Chapter 13: Modeling Best Practices

Modeling Best Practices: Abstraction

How to decide what to model, what to approximate, what to document, and how to keep SystemC examples production-like.

How to Read This Lesson

This best-practice lesson is written for code reviews. Use it to decide what should be portable standard behavior, what is an implementation detail, and what needs a project rule.

The most important SystemC skill is not knowing every API. It is choosing the right abstraction level. The IEEE 1666 LRM provides mechanisms ranging from bit-accurate, delta-cycle-level RTL modeling all the way up to Loosely Timed (LT) Transaction Level Modeling (TLM-2.0).

A good architect uses the simplest, fastest abstraction that still answers the system's design questions. Looking at the Accellera SystemC kernel source explains why some abstractions run far faster than others.

Source and LRM Trail

Best-practice lessons should be traceable. Use Docs/LRMs/SystemC_LRM_1666-2023.pdf, the domain LRMs for AMS/CCI/UVM when relevant, .codex-src/systemc, .codex-src/cci, .codex-src/uvm-systemc, and .codex-src/systemc-common-practices. Mark what is portable, what is source insight, and what is project policy.

The Spectrum of Abstraction

  1. Cycle-Accurate (RTL Level): Uses sc_signal, sc_logic, clocks, and delta cycles.

    • Kernel Reality: Every clock edge notifies an sc_event, which places all sensitive SC_METHODs into the runnable queue (sc_simcontext::m_runnable). When sc_signal::write() is called, the signal is added to the kernel's update list (kept by sc_prim_channel_registry in the Accellera implementation). After the evaluate phase, sc_simcontext::crunch() walks the update list and calls update() on each channel; each changed signal notifies its value-changed event, which schedules more processes and causes further delta cycles. This per-edge bookkeeping is what makes the style slow.
    • Pros: Can match the hardware cycle by cycle.
    • Cons: Very slow simulation (kHz range).
    • When to use: Hardware validation, HLS (High-Level Synthesis) generation.
  2. Approximately Timed (AT) TLM: Uses TLM-2.0 Non-Blocking Transport (nb_transport_fw and nb_transport_bw), modeling multiple protocol phases (BEGIN_REQ, END_REQ, BEGIN_RESP, and END_RESP in the base protocol) with annotated delays.

    • Kernel Reality: It avoids sc_signal and delta-cycle updates, passing payloads via function calls. However, phases are typically synchronized using tlm_utils::peq_with_cb_and_phase (a Payload Event Queue), which internally relies on sc_event::notify(sc_time). This still pushes sc_event_timed objects into sc_simcontext::m_timed_events and requires kernel wake-ups, though far fewer than RTL.
    • Pros: Highly accurate bus contention and performance profiling.
    • Cons: Harder to write, moderate simulation speed.
    • When to use: Interconnect performance analysis, cache coherency studies.
  3. Loosely Timed (LT) TLM (Virtual Platforms): Uses TLM-2.0 Blocking Transport (b_transport), temporal decoupling, and quantum keepers.

    • Kernel Reality: A b_transport is just a virtual C++ function call. A target module consumes time by adding to a local sc_time delay parameter rather than calling wait(). The thread is temporally decoupled, meaning it runs ahead of sc_simcontext::m_curr_time without yielding to the scheduler. No context switching (QuickThreads/pthreads) occurs, and sc_simcontext::crunch() is completely bypassed until the local time exceeds the TLM quantum.
    • Pros: Very fast simulation (on the order of hundreds of MHz of simulated clock), fast enough to make booting an operating system such as Linux practical.
    • Cons: Timing is approximate; cannot find race conditions on the bus.
    • When to use: Firmware development, early software bring-up, OS porting.
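The temporal decoupling described for the LT level can be sketched without the SystemC kernel at all. The following is a minimal standalone C++ model, not the TLM-2.0 tlm_quantumkeeper itself: the names LocalTimeKeeper, consume, sync, and syncs_for are hypothetical, and the ">=" comparison against the quantum is an assumption of this sketch. It only counts how often a decoupled initiator would have to yield to the scheduler.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a quantum keeper. It tracks how far an initiator
// has run ahead of global simulated time, in nanoseconds.
struct LocalTimeKeeper {
    std::uint64_t quantum_ns;           // maximum allowed run-ahead
    std::uint64_t global_ns = 0;        // stand-in for the kernel's current time
    std::uint64_t local_offset_ns = 0;  // time consumed but not yet synchronized
    std::uint64_t sync_count = 0;       // how often we yielded to the "kernel"

    explicit LocalTimeKeeper(std::uint64_t q) : quantum_ns(q) { assert(q > 0); }

    // Called once per transaction: accumulate the annotated delay instead of waiting.
    void consume(std::uint64_t delay_ns) {
        local_offset_ns += delay_ns;
        if (local_offset_ns >= quantum_ns) {  // assumption: sync when the quantum is reached
            sync();
        }
    }

    // Models the single bulk wait a decoupled thread performs at a sync point.
    void sync() {
        global_ns += local_offset_ns;
        local_offset_ns = 0;
        ++sync_count;
    }
};

// Runs n transactions of a fixed delay and returns the number of synchronizations.
inline std::uint64_t syncs_for(std::uint64_t n, std::uint64_t delay_ns,
                               std::uint64_t quantum_ns) {
    LocalTimeKeeper keeper(quantum_ns);
    for (std::uint64_t i = 0; i < n; ++i) {
        keeper.consume(delay_ns);
    }
    return keeper.sync_count;
}
```

Under these assumptions, 1,000 transactions of 10 ns each need 10 synchronizations with a 1,000 ns quantum, but 1,000 synchronizations when the quantum equals the transaction delay. The larger the quantum, the less often the scheduler is involved, at the cost of coarser timing interleaving between initiators.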

Avoid Accidental RTL

If you are building a Virtual Platform (LT), avoid mixing RTL-style modeling inside it. If you model every clock edge of a UART transmitter using sc_signal<bool> and wait(), your entire platform's speed will be bottlenecked by that one UART.

Instead, model the UART at a high level: when software writes a byte, wait once for the time the whole byte takes on the wire (wait(byte_delay)), and then raise an interrupt.
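The byte_delay used above has to come from somewhere. One plausible way to derive it, shown here as a standalone C++ sketch with hypothetical names (UartFrame, byte_delay_ps), is to divide the number of bits in one frame by the baud rate. The frame layout (one start bit, 8 data bits, no parity, one stop bit) and the picosecond resolution are assumptions chosen for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical description of one UART frame. Defaults model the common 8N1 format.
struct UartFrame {
    unsigned data_bits = 8;
    unsigned parity_bits = 0;
    unsigned stop_bits = 1;

    // One start bit is always present in front of the data bits.
    unsigned total_bits() const { return 1 + data_bits + parity_bits + stop_bits; }
};

// Time needed to transmit one frame, in picoseconds, rounded to the nearest value.
inline std::uint64_t byte_delay_ps(const UartFrame& frame, std::uint64_t baud) {
    assert(baud > 0);
    const std::uint64_t ps_per_second = 1000000000000ULL;
    return (frame.total_bits() * ps_per_second + baud / 2) / baud;
}
```

In a SystemC model the result would be converted once into an sc_time and passed to a single wait() per byte, so a 10-bit frame costs one kernel wake-up instead of one per bit or per clock edge.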

Complete Example: Abstracting a Timer

Here is a complete sc_main example that demonstrates the difference between an RTL-style timer (bad for VPs) and an abstract TLM-style timer (good for VPs).

#include <systemc>
#include <iostream>
 
// ---------------------------------------------------------
// BAD for Virtual Platforms: Cycle-Accurate Timer (RTL Style)
// ---------------------------------------------------------
SC_MODULE(RtlTimer) {
    sc_core::sc_in<bool> clk{"clk"};
    sc_core::sc_out<bool> irq{"irq"};
    
    int counter = 0;
    int limit = 1000;
 
    SC_CTOR(RtlTimer) {
        SC_METHOD(tick);
        sensitive << clk.pos();
    }
 
    void tick() {
        // Wakes up on EVERY SINGLE CLOCK CYCLE! Very slow simulation.
        // Causes sc_simcontext to evaluate this method continuously.
        counter++;
        if (counter >= limit) {
            irq.write(true);
            counter = 0;
        } else {
            irq.write(false);
        }
    }
};
 
// ---------------------------------------------------------
// GOOD for Virtual Platforms: Abstract Timer (TLM Style)
// ---------------------------------------------------------
SC_MODULE(AbstractTimer) {
    sc_core::sc_event irq_event;
    sc_core::sc_time clock_period;
    int limit = 1000;
 
    SC_CTOR(AbstractTimer) : clock_period(10, sc_core::SC_NS) {
        SC_THREAD(timer_process);
    }
 
    void timer_process() {
        while (true) {
            // Wakes up ONLY when the interrupt is actually due!
            // Skips 1000 clock cycles instantly. High simulation speed.
            sc_core::sc_time wait_time = clock_period * limit;
            
            // Internally creates an sc_event_timed and pushes to m_timed_events
            // Yields the QuickThread coroutine context back to the scheduler.
            wait(wait_time);
            
            std::cout << "@ " << sc_core::sc_time_stamp() 
                      << " [AbstractTimer] Interrupt Fired!\n";
            
            irq_event.notify(sc_core::SC_ZERO_TIME);
        }
    }
};
 
int sc_main(int argc, char* argv[]) {
    // We only simulate the AbstractTimer for this demonstration.
    AbstractTimer t_abs("abstract_timer");
 
    std::cout << "Starting Virtual Platform simulation...\n";
    sc_core::sc_start(25, sc_core::SC_US);
    
    return 0;
}

Explanation of the Execution

Starting Virtual Platform simulation...
@ 10 us [AbstractTimer] Interrupt Fired!
@ 20 us [AbstractTimer] Interrupt Fired!

With a 10 ns clock, the RtlTimer would require the SystemC kernel to run tick() 2,500 times to simulate 25 microseconds. Under the hood, sc_simcontext::crunch() would run at least once per clock edge, repeatedly processing the runnable queue and the update list.

The AbstractTimer needs only two resumptions of timer_process() after its initial start, one at 10 us and one at 20 us. By abstracting away the clock signal and computing the whole interval at once, the thread schedules a single timed event (an sc_event_timed in the Accellera kernel) and yields to the scheduler; with the QuickThreads backend this happens through the qt_block assembly routine. Simulated time then jumps directly to the target timestamp, skipping the 1,000 clock edges in between. Simulation performance increases by orders of magnitude, making firmware development practical.
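The arithmetic behind this comparison can be written down directly. The sketch below is plain C++ with hypothetical names (ActivationCounts, count_activations); it assumes one tick() activation per full clock period and one timer wake-up per full period * limit interval, and it ignores the thread's initial run at time zero.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: process activations needed by each modeling style.
struct ActivationCounts {
    std::uint64_t rtl;             // tick() calls, one per clock period
    std::uint64_t abstract_timer;  // timer_process() resumptions, one per interrupt
};

// Counts activations over a simulated window, under the assumptions stated above.
inline ActivationCounts count_activations(std::uint64_t duration_ns,
                                          std::uint64_t period_ns,
                                          std::uint64_t limit) {
    assert(period_ns > 0 && limit > 0);
    return { duration_ns / period_ns, duration_ns / (period_ns * limit) };
}
```

For the values used in the example (25,000 ns window, 10 ns period, limit of 1,000) this gives 2,500 activations for the RTL style and 2 for the abstract timer, matching the two lines of output shown above.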

Deep Dive: Accellera Source for sc_signal and update()

The sc_signal<T> channel illustrates the Evaluate-Update paradigm of SystemC. In the Accellera source (the sc_signal files under src/sysc/communication, with the template code in the header), sc_signal inherits from sc_prim_channel.

The write() Implementation

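When you call write(const T&), the signal does not immediately change its value. Instead, it stores the requested value in m_new_val and, if that value differs from the current one, registers itself with the kernel. The listing below is a simplified paraphrase; the real code also runs writer-policy checks:

template<class T>
inline void sc_signal<T>::write(const T& value_) {
    // Compare against the value readers currently see, not the pending one.
    bool value_changed = !(m_cur_val == value_);
    m_new_val = value_;
    if( value_changed ) {
        this->request_update(); // Inherited from sc_prim_channel
    }
}

The request_update() call adds the channel to the kernel's update list, which the Accellera implementation keeps in sc_prim_channel_registry.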

The update() Phase

After the Evaluate phase finishes (all ready processes have run), the kernel iterates over the update list and calls the update() virtual function on each primitive channel. For sc_signal, a simplified version looks like this:

template<class T>
inline void sc_signal<T>::update() {
    if( !(m_new_val == m_cur_val) ) {
        m_cur_val = m_new_val;
        // Delta notification: processes sensitive to value_changed_event()
        // run in the next delta cycle. (Simplified; the Accellera code
        // notifies through a lazily created event pointer.)
        m_value_changed_event.notify(SC_ZERO_TIME);
    }
}

This guarantees that all processes running in the same evaluate phase see the same old value until the delta cycle advances, which mimics how hardware registers sample their inputs before any output changes.
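The effect of separating write() from update() can be reproduced in a few lines of standalone C++. This is a toy model with hypothetical names (ToyChannel, ToyKernel, ToySignal); it is not the Accellera implementation and has no events or processes. It only shows that a value written during the evaluate phase is not visible to readers until the update phase runs.

```cpp
#include <cassert>
#include <vector>

// Hypothetical base class so the toy kernel can hold signals of any type.
struct ToyChannel {
    virtual ~ToyChannel() = default;
    virtual void update() = 0;
};

// Hypothetical stand-in for the kernel's update list.
struct ToyKernel {
    std::vector<ToyChannel*> update_list;

    void request_update(ToyChannel* channel) { update_list.push_back(channel); }

    // End of the evaluate phase: commit every pending write.
    void run_update_phase() {
        for (ToyChannel* channel : update_list) {
            channel->update();
        }
        update_list.clear();
    }
};

template <typename T>
class ToySignal : public ToyChannel {
public:
    ToySignal(ToyKernel& kernel, T init) : kernel_(kernel), cur_(init), new_(init) {}

    const T& read() const { return cur_; }  // always the committed value

    void write(const T& value) {
        new_ = value;
        if (!(value == cur_) && !pending_) {  // register at most once per phase
            pending_ = true;
            kernel_.request_update(this);
        }
    }

    void update() override {
        pending_ = false;
        if (!(new_ == cur_)) {
            cur_ = new_;
        }
    }

private:
    ToyKernel& kernel_;
    T cur_;
    T new_;
    bool pending_ = false;
};

// A write is invisible until the update phase: returns the value read in between.
inline int read_before_update(int init, int written) {
    ToyKernel kernel;
    ToySignal<int> sig(kernel, init);
    sig.write(written);
    return sig.read();
}

// Two "processes" exchange values through signals within one evaluate phase.
inline bool swap_in_one_delta(int a0, int b0) {
    ToyKernel kernel;
    ToySignal<int> a(kernel, a0), b(kernel, b0);
    a.write(b.read());
    b.write(a.read());  // still reads the old value of a
    kernel.run_update_phase();
    return a.read() == b0 && b.read() == a0;
}
```

Because both writes are deferred, the swap works regardless of the order in which the two statements run, which is the same order-independence that the evaluate-update scheme gives to concurrent SystemC processes.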
