Temporal Decoupling and Quantum Keepers

How to Read This Lesson

For TLM, resist the temptation to picture pins. Picture a C++ function call carrying a transaction object, then add timing only where the architectural question needs it.

Temporal decoupling is a performance technique for loosely timed virtual platforms. Instead of synchronizing with the SystemC kernel after every small operation, an initiator accumulates local time and synchronizes periodically.

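SystemC/TLM literature often describes this as a way to improve simulation speed while keeping virtual-platform models interoperable. The mechanism rests on the global quantum defined in the TLM-2.0 clauses of the LRM. The standard defines the concept but does not require every model to use it.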

Source and LRM Trail

For TLM, use the IEEE 1666 TLM clauses in Docs/LRMs/SystemC_LRM_1666-2023.pdf as the portable contract. Then inspect .codex-src/systemc/src/tlm_core/tlm_2: tlm_generic_payload, tlm_fw_transport_if, tlm_bw_transport_if, tlm_initiator_socket, tlm_target_socket, tlm_dmi, and tlm_global_quantum. The tlm_quantumkeeper utility itself ships with the tlm_utils headers (it is included as tlm_utils/tlm_quantumkeeper.h).

The Cost Being Avoided

Every kernel synchronization (e.g. wait(delay)) has overhead because it forces a context switch out of the executing thread back to the scheduler. If an instruction set simulator synchronizes after every instruction, the platform can become unnecessarily slow.

With temporal decoupling, the initiator runs ahead locally, accumulating a time offset. The initiator still respects a global quantum (a maximum time slice). It is not allowed to run arbitrarily far ahead of the rest of the simulated system.
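
The saving can be made concrete with a small library-free sketch. It does not use SystemC at all: RunStats, run_lockstep, and run_decoupled are hypothetical names, time is a plain integer count of nanoseconds, and a "kernel sync" is just a counter increment standing in for a wait() call.

```cpp
#include <cassert>
#include <cstdint>

// Outcome of one hypothetical run of an initiator.
struct RunStats {
  std::uint64_t kernel_syncs;  // number of wait()-like yields to the scheduler
  std::uint64_t end_time_ns;   // simulated time handed to the kernel in total
};

// No temporal decoupling: yield to the kernel after every instruction.
RunStats run_lockstep(std::uint64_t instructions, std::uint64_t cost_ns) {
  RunStats s{0, 0};
  for (std::uint64_t i = 0; i < instructions; ++i) {
    s.end_time_ns += cost_ns;  // time advances through the kernel each step
    ++s.kernel_syncs;
  }
  return s;
}

// Temporal decoupling: accumulate a local offset, yield once per quantum.
RunStats run_decoupled(std::uint64_t instructions, std::uint64_t cost_ns,
                       std::uint64_t quantum_ns) {
  assert(quantum_ns > 0 && "a zero quantum would mean no decoupling");
  RunStats s{0, 0};
  std::uint64_t local_ns = 0;  // how far this initiator has run ahead
  for (std::uint64_t i = 0; i < instructions; ++i) {
    local_ns += cost_ns;
    if (local_ns >= quantum_ns) {  // quantum used up: hand time to the kernel
      s.end_time_ns += local_ns;
      local_ns = 0;
      ++s.kernel_syncs;
    }
  }
  if (local_ns > 0) {  // final flush so no simulated time is dropped
    s.end_time_ns += local_ns;
    ++s.kernel_syncs;
  }
  return s;
}
```

Under these assumptions, 1000 instructions of 10 ns each cost 1000 yields in lockstep but only 10 yields with a 1000 ns quantum, while both runs end at the same simulated time.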

Complete Quantum Keeper Example

A temporally decoupled initiator sends transactions with an annotated delay that includes its local time offset. Targets add their own delay to that reference. The initiator updates its local quantum keeper after the call.

Here is a complete example of a TLM initiator using a tlm_quantumkeeper. It should compile against a SystemC installation that provides the TLM-2.0 headers:

#include <iostream>
#include <systemc>
#include <tlm>
#include <tlm_utils/simple_initiator_socket.h>
#include <tlm_utils/simple_target_socket.h>
#include <tlm_utils/tlm_quantumkeeper.h>
 
using namespace sc_core;
 
SC_MODULE(FastTarget) {
  tlm_utils::simple_target_socket<FastTarget> socket{"socket"};
  SC_CTOR(FastTarget) {
    socket.register_b_transport(this, &FastTarget::b_transport);
  }
  void b_transport(tlm::tlm_generic_payload& trans, sc_time& delay) {
    // Target adds its processing time to the delay annotation
    delay += sc_time(10, SC_NS); 
    trans.set_response_status(tlm::TLM_OK_RESPONSE);
  }
};
 
SC_MODULE(DecoupledInitiator) {
  tlm_utils::simple_initiator_socket<DecoupledInitiator> socket{"socket"};
  tlm_utils::tlm_quantumkeeper qk;
 
  SC_CTOR(DecoupledInitiator) {
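    // Set global quantum. 30 ns is deliberately small so that the five
    // 10 ns transactions below reach a sync point; with a much larger
    // value the need_sync() branch in run() would never be taken.
    tlm::tlm_global_quantum::instance().set(sc_time(30, SC_NS));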
    // Reset local quantum keeper
    qk.reset();
    
    SC_THREAD(run);
  }
 
  void run() {
    for (int i = 0; i < 5; ++i) {
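      tlm::tlm_generic_payload trans;
      unsigned char data[4] = {0};  // buffer the target could read into
      trans.set_command(tlm::TLM_READ_COMMAND);
      trans.set_address(0x0);
      // The initiator is expected to set the remaining payload attributes
      trans.set_data_ptr(data);
      trans.set_data_length(sizeof(data));
      trans.set_streaming_width(sizeof(data));
      trans.set_byte_enable_ptr(nullptr);
      trans.set_dmi_allowed(false);
      trans.set_response_status(tlm::TLM_INCOMPLETE_RESPONSE);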
      
      // Pass local time offset instead of SC_ZERO_TIME
      sc_time local_delay = qk.get_local_time();
      socket->b_transport(trans, local_delay);
      
      // Update quantum keeper with the new delay computed by target
      qk.set(local_delay);
 
      std::cout << "Transaction " << i 
                << " finished at kernel time " << sc_time_stamp()
                << " + local offset " << qk.get_local_time() << "\n";
 
      // Once kernel time plus local offset reaches the next sync point,
      // yield to the kernel so the rest of the platform can catch up
      if (qk.need_sync()) {
        std::cout << "--> Syncing with kernel!\n";
        qk.sync();
      }
    }
  }
};
 
int sc_main(int argc, char* argv[]) {
  DecoupledInitiator init("init");
  FastTarget tgt("tgt");
  init.socket.bind(tgt.socket);
  
  sc_start();
  return 0;
}

This keeps transaction timing meaningful without forcing a kernel context switch for every operation.

Choosing the Quantum

A larger quantum usually improves speed but reduces timing precision. A smaller quantum improves synchronization precision but increases overhead.

Choose based on the model's purpose:

firmware bring-up: larger quantum may be fine
interrupt-latency exploration: smaller quantum
bus-performance exploration: validate against a more detailed model
cycle-level questions: temporal decoupling may be the wrong abstraction
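
The trade-off behind these choices can be estimated with simple arithmetic. The sketch below is an illustration only: QuantumTradeoff and estimate_tradeoff are hypothetical names, and it assumes an initiator that is busy all the time and yields only at quantum boundaries.

```cpp
#include <cassert>
#include <cstdint>

// Rough cost/precision figures for one candidate quantum.
struct QuantumTradeoff {
  std::uint64_t syncs_per_ms;  // kernel yields per simulated millisecond
  std::uint64_t max_skew_ns;   // how far the initiator may run ahead
};

QuantumTradeoff estimate_tradeoff(std::uint64_t quantum_ns) {
  assert(quantum_ns > 0 && "a zero quantum would mean no decoupling");
  const std::uint64_t ns_per_ms = 1000000;
  // Round up: a partial quantum at the end of the millisecond still
  // costs one yield. The run-ahead bound is the quantum itself.
  return {(ns_per_ms + quantum_ns - 1) / quantum_ns, quantum_ns};
}
```

For example, a 1 us quantum gives about 1000 yields per simulated millisecond with at most 1 us of skew, while a 100 us quantum gives about 10 yields with up to 100 us of skew. Which side of that trade matters depends on the question the model is meant to answer.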

Interrupts and Synchronization

Interrupts are the classic trap. If a CPU model runs far ahead locally, it may observe interrupts late unless the platform uses synchronization points carefully. Temporal decoupling is a controlled lie for speed. It is useful when the lie does not invalidate the engineering question.
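
A small worked model shows how late an interrupt can be seen. It is a sketch under stated assumptions: the CPU model notices interrupts only when it synchronizes, sync points fall on multiples of the quantum, and observed_at_ns and observation_delay_ns are hypothetical helper names rather than any library API.

```cpp
#include <cassert>
#include <cstdint>

// Simulated time at which the decoupled CPU notices an interrupt that was
// raised at raise_ns, given that it only looks at quantum boundaries.
std::uint64_t observed_at_ns(std::uint64_t raise_ns, std::uint64_t quantum_ns) {
  assert(quantum_ns > 0 && "a zero quantum would mean no decoupling");
  // Round the raise time up to the next quantum boundary.
  return ((raise_ns + quantum_ns - 1) / quantum_ns) * quantum_ns;
}

// Extra latency introduced purely by temporal decoupling.
std::uint64_t observation_delay_ns(std::uint64_t raise_ns,
                                   std::uint64_t quantum_ns) {
  return observed_at_ns(raise_ns, quantum_ns) - raise_ns;
}
```

With a 1000 ns quantum, an interrupt raised at 1250 ns is noticed at 2000 ns, 750 ns late. Shrinking the quantum to 100 ns brings that down to 50 ns, at the price of ten times as many synchronizations.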

Under the Hood: tlm_quantumkeeper and Local Time

To minimize context switching, TLM initiators use a tlm_quantumkeeper. Instead of calling wait(delay) after every operation (which suspends the thread and returns control to the scheduler in sc_simcontext), the initiator accumulates the delay in the keeper's m_local_time member. The keeper derives its next sync point from the globally configured quantum, held as m_global_quantum in tlm_global_quantum (src/tlm_core/tlm_2/tlm_quantum/tlm_global_quantum.cpp). need_sync() reports when the current kernel time plus m_local_time has reached that sync point. The initiator then calls sync(), which performs wait(m_local_time) and resets the local time, bringing the thread back in step with the main SystemC scheduler. Depending on the workload and the quantum, this can speed up instruction set simulators (ISS) considerably; the 10x-100x range is an order-of-magnitude expectation, not a guarantee.
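
The bookkeeping described above can be mimicked without SystemC. The class below is a simplified stand-in, not the library source: ToyQuantumKeeper is a hypothetical name, time is an integer number of nanoseconds, and the wait() call is replaced by advancing a plain kernel_ns_ counter. The method names loosely follow the keeper's public interface (inc, set, need_sync, sync, reset).

```cpp
#include <cassert>
#include <cstdint>

class ToyQuantumKeeper {
 public:
  explicit ToyQuantumKeeper(std::uint64_t quantum_ns) : quantum_ns_(quantum_ns) {
    assert(quantum_ns > 0 && "a zero quantum would mean no decoupling");
    reset();
  }

  void inc(std::uint64_t ns) { local_ns_ += ns; }  // consume local time
  void set(std::uint64_t ns) { local_ns_ = ns; }   // adopt an annotated delay
  std::uint64_t local_time() const { return local_ns_; }
  std::uint64_t kernel_time() const { return kernel_ns_; }
  std::uint64_t sync_count() const { return sync_count_; }

  // True once kernel time plus the local offset reaches the sync point.
  bool need_sync() const { return kernel_ns_ + local_ns_ >= next_sync_ns_; }

  // Stand-in for wait(local time): hand the offset to the "kernel".
  void sync() {
    kernel_ns_ += local_ns_;
    ++sync_count_;
    reset();
  }

  // Clear the offset and aim at the next multiple of the quantum.
  void reset() {
    local_ns_ = 0;
    next_sync_ns_ = (kernel_ns_ / quantum_ns_ + 1) * quantum_ns_;
  }

 private:
  std::uint64_t quantum_ns_;
  std::uint64_t kernel_ns_ = 0;     // what sc_time_stamp() would report
  std::uint64_t local_ns_ = 0;      // time run ahead of the kernel
  std::uint64_t next_sync_ns_ = 0;  // absolute time of the next sync point
  std::uint64_t sync_count_ = 0;    // number of simulated context switches
};
```

With a 30 ns quantum and 10 ns steps, need_sync() first becomes true after the third inc(10). A following sync() moves the toy kernel time to 30 ns, clears the local offset, and sets the next sync point to 60 ns.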