Chapter 14: Synthesis Subset

Synthesizing TLM-2.0

The challenges and solutions for High-Level Synthesis of TLM-2.0 models.

How to Read This Lesson

For synthesis, the question changes from 'can C++ run this?' to 'can hardware be built from this?' Keep storage, timing, and static structure in your head as you read.

Synthesizing TLM-2.0

Transaction Level Modeling (TLM) is the cornerstone of Virtual Prototyping. TLM-2.0 was designed for simulation speed, not for hardware generation, so synthesizing TLM-2.0 models into hardware remains one of the harder problems in EDA (Electronic Design Automation).

Source and LRM Trail

For synthesis, use Docs/LRMs/SystemC_Synthesis_Subset_1_4_7.pdf as the primary contract and Docs/LRMs/SystemC_LRM_1666-2023.pdf for base SystemC semantics. Source internals explain simulation behavior, but synthesizability is a tool contract: focus on static structure, reset modeling, wait placement, and bounded loops.

Why is TLM-2.0 Hard to Synthesize?

Standard TLM-2.0 relies on several concepts that are fundamentally incompatible with basic hardware generation:

  1. Pointers and Dynamic Payloads: tlm_generic_payload is usually allocated on the heap, passed by pointer, and contains pointers to data buffers. Hardware cannot easily pass memory pointers across physical bus wires.
  2. Function Calls Across Boundaries: b_transport is a blocking function call. In simulation, module A calls a function that executes inside module B's context. In hardware, module A and module B are separate physical blocks of silicon.
  3. Timing Annotations: The sc_time delay passed in TLM-2.0 is meant for loosely-timed simulation. Hardware operates on physical clock cycles.
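To make the first point concrete, here is a minimal sketch in plain C++ (no SystemC required) of what a pointer-free transaction can look like. The names FlatWriteRequest, PackedRequest, pack, and unpack are hypothetical and are not part of TLM-2.0 or of any vendor library. The idea is only that every field has a fixed bit width and the data travels by value, so the whole request can be laid out on a fixed number of wires.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size request: no pointers, every field has a known width.
struct FlatWriteRequest {
    std::uint32_t address;             // 32 address bits
    std::array<std::uint8_t, 4> data;  // 32 data bits, carried by value
    bool is_write;                     // 1 control bit
};

// The same request as a 65-bit bundle: 64 bits in 'low' plus one flag bit.
struct PackedRequest {
    std::uint64_t low;  // bits 0..31 = address, bits 32..63 = data bytes
    bool write_flag;    // bit 64
};

inline PackedRequest pack(const FlatWriteRequest& req) {
    std::uint64_t word = req.address;
    // Bounded loop: exactly 4 iterations, known at compile time.
    for (std::size_t i = 0; i < req.data.size(); ++i) {
        word |= static_cast<std::uint64_t>(req.data[i]) << (32 + 8 * i);
    }
    return PackedRequest{word, req.is_write};
}

inline FlatWriteRequest unpack(const PackedRequest& p) {
    FlatWriteRequest req{};
    req.address = static_cast<std::uint32_t>(p.low & 0xFFFFFFFFu);
    for (std::size_t i = 0; i < req.data.size(); ++i) {
        req.data[i] = static_cast<std::uint8_t>(p.low >> (32 + 8 * i));
    }
    req.is_write = p.write_flag;
    return req;
}
```

A tlm_generic_payload cannot be packed this way as it stands, because its data pointer refers to a buffer of run-time length somewhere in host memory.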

Synthesizable TLM Subsets

To address these problems, EDA vendors provide synthesizable TLM subsets or specialized libraries. The exact behavior is tool-specific, but when such a tool maps a socket to an AXI master, a b_transport() call is typically handled as follows:

  1. The HLS tool recognizes the TLM socket.
  2. It suspends the caller's SC_CTHREAD.
  3. It drives the address, data, and control pins of the generated AXI interface.
  4. It waits for the AXI VALID/READY handshakes (and, for a write, the write response) to complete.
  5. It resumes the SC_CTHREAD when the transaction completes.
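The steps above can be sketched as a small cycle-level state machine in plain C++. This is an illustration under simplifying assumptions, not what any particular HLS tool generates: it models a single VALID/READY channel rather than the full set of AXI channels, and WritePort, BlockingWriteFsm, and cycles_until_done are hypothetical names.

```cpp
#include <cstdint>

// Hypothetical single-channel write port, loosely following VALID/READY.
struct WritePort {
    bool valid = false;      // driven by the initiator
    bool ready = false;      // driven by the target
    std::uint32_t addr = 0;
    std::uint32_t data = 0;
};

enum class State { Idle, Driving, Done };

// Models the "blocked" caller: it stays in Driving until the handshake occurs.
class BlockingWriteFsm {
public:
    void start(std::uint32_t addr, std::uint32_t data) {
        addr_ = addr;
        data_ = data;
        state_ = State::Driving;
    }

    // Call once per rising clock edge.
    void clock(WritePort& port) {
        switch (state_) {
        case State::Driving:
            if (port.valid && port.ready) {
                // VALID and READY were both high on this edge: transfer accepted.
                port.valid = false;
                state_ = State::Done;
            } else {
                // Drive the pins and keep VALID asserted until accepted.
                port.valid = true;
                port.addr = addr_;
                port.data = data_;
            }
            break;
        case State::Idle:
        case State::Done:
            port.valid = false;
            break;
        }
    }

    bool done() const { return state_ == State::Done; }

private:
    State state_ = State::Idle;
    std::uint32_t addr_ = 0;
    std::uint32_t data_ = 0;
};

// Counts clock edges until the write completes; the target raises READY
// only after 'target_latency' edges. Returns -1 if max_cycles is exceeded.
inline int cycles_until_done(std::uint32_t addr, std::uint32_t data,
                             int target_latency, int max_cycles = 64) {
    WritePort port;
    BlockingWriteFsm fsm;
    fsm.start(addr, data);
    for (int cycle = 1; cycle <= max_cycles; ++cycle) {
        port.ready = (cycle > target_latency);
        fsm.clock(port);
        if (fsm.done()) return cycle;
    }
    return -1;
}
```

The number of cycles the caller spends in Driving depends on the target, which is why the sc_time annotation of a loosely-timed model cannot simply be carried over into hardware.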

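Here is a complete example that compiles against a standard SystemC/TLM-2.0 installation and demonstrates the structural pattern for a synthesizable TLM-2.0 Initiator. It avoids allocating the payload with new and uses an illustrative pragma written in the style of AMD/Xilinx HLS tools. Pragmas are tool-specific, not vendor-agnostic: a standard C++ compiler ignores pragmas it does not recognize (possibly with a warning), and each HLS tool documents its own syntax.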

#include <systemc>
#include <tlm>
#include <tlm_utils/simple_initiator_socket.h>
#include <tlm_utils/simple_target_socket.h>
 
using namespace sc_core;
 
SC_MODULE(SynthesizableInitiator) {
    sc_in_clk clk{"clk"};
    sc_in<bool> rst{"rst"};
    tlm_utils::simple_initiator_socket<SynthesizableInitiator> init_socket{"init_socket"};
 
    SC_CTOR(SynthesizableInitiator) {
        SC_CTHREAD(run_fsm, clk.pos());
        reset_signal_is(rst, true);
    }
 
    void run_fsm() {
        wait(); // Reset synchronization
        
        while (true) {
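            // Payload and data buffer have automatic storage and a fixed size
            // (no explicit 'new'), which is what most HLS flows require.
            tlm::tlm_generic_payload trans;
            unsigned char data[4] = {0xAA, 0xBB, 0xCC, 0xDD};
            
            trans.set_command(tlm::TLM_WRITE_COMMAND);
            trans.set_address(0x4000);
            trans.set_data_ptr(data);
            trans.set_data_length(4);
            trans.set_streaming_width(4);       // Equal to data length: not a streaming access
            trans.set_byte_enable_ptr(nullptr); // No byte enables
            trans.set_dmi_allowed(false);
            trans.set_response_status(tlm::TLM_INCOMPLETE_RESPONSE);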
 
            sc_time delay = SC_ZERO_TIME;
            
            // Illustrative vendor-style pragma: a tool that supports it may map the
            // socket to an AXI master; syntax and placement are tool-specific.
            #pragma HLS interface m_axi port=init_socket
            init_socket->b_transport(trans, delay);
            
            if (trans.is_response_error()) {
                SC_REPORT_ERROR(name(), "TLM Write Failed");
            }
            wait();
        }
    }
};
 
SC_MODULE(DummyTarget) {
    tlm_utils::simple_target_socket<DummyTarget> target_socket{"target_socket"};
    SC_CTOR(DummyTarget) {
        target_socket.register_b_transport(this, &DummyTarget::b_transport);
    }
    void b_transport(tlm::tlm_generic_payload& trans, sc_time& delay) {
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }
};
 
int sc_main(int argc, char* argv[]) {
    sc_clock clk("clk", 10, SC_NS);
    sc_signal<bool> rst("rst");
 
    SynthesizableInitiator init("init");
    DummyTarget tgt("tgt");
 
    init.clk(clk);
    init.rst(rst);
    init.init_socket.bind(tgt.target_socket);
 
    rst.write(true);
    sc_start(10, SC_NS);
    rst.write(false);
    sc_start(50, SC_NS);
 
    return 0;
}

Best Practices for Synthesizable TLM

  • Avoid DMI (Direct Memory Interface): DMI relies entirely on raw C++ pointers mapping directly to host memory. This cannot be synthesized.
  • Keep Data Static: Ensure the data arrays you pass into the generic payload are statically sized and locally allocated within the module, as shown above.
  • Vendor-Specific Pragmas: You will almost certainly need to annotate your TLM sockets with #pragma HLS (or equivalent) to tell the compiler which physical hardware bus to generate.
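As a rough illustration of the "Keep Data Static" rule, the following plain C++ sketch uses compile-time checks to keep a buffer type free of hidden dynamic storage. BurstBuffer, kMaxBurstBytes, and load are hypothetical names, and the checks shown are an assumption about what is useful, not a requirement of any specific HLS tool.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

constexpr std::size_t kMaxBurstBytes = 16;  // hypothetical fixed capacity

// Fixed-capacity buffer intended to be owned by the module as a data member.
struct BurstBuffer {
    std::array<std::uint8_t, kMaxBurstBytes> bytes;
    std::uint8_t length;
};

// A member such as std::vector would make these checks fail at compile time.
static_assert(std::is_trivially_copyable<BurstBuffer>::value,
              "BurstBuffer must not own dynamic storage");
static_assert(std::is_standard_layout<BurstBuffer>::value,
              "BurstBuffer must have a simple, fixed layout");

// Copy a statically sized source into the buffer. The loop bound N is a
// compile-time constant, and oversized sources are rejected when compiling.
template <std::size_t N>
void load(BurstBuffer& buf, const std::array<std::uint8_t, N>& src) {
    static_assert(N <= kMaxBurstBytes, "source larger than the static buffer");
    for (std::size_t i = 0; i < N; ++i) {
        buf.bytes[i] = src[i];
    }
    buf.length = static_cast<std::uint8_t>(N);
}
```

These traits do not prove that a type is synthesizable; they only catch the common mistake of adding a heap-backed container to a type that is meant to stay fixed in size.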

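By following the SystemC Synthesizable Subset, engineers can write an algorithm once in C++, verify it in a Virtual Platform at high speed, and then push the same source code through an HLS compiler to generate RTL, which the usual implementation flow then turns into silicon. In practice the code usually still needs tool-specific annotations and some restructuring along the way.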

Under the Hood: TLM Synthesis Limitations

TLM-2.0 tlm_generic_payload relies heavily on dynamic memory allocation, pointers, and virtual functions, all of which are difficult or impossible to synthesize into RTL. When you attempt to synthesize TLM interfaces, HLS tools often impose strict constraints: payloads must have a fixed size and must not be allocated with new (for example, they are declared as local variables or module members), and extension pointers are usually forbidden. Some vendors provide synthesizable socket wrappers or bus libraries that map a subset of TLM blocking transport calls onto AXI4 memory-mapped bus protocols. Note that tlm_fifo is not such a wrapper: it is a TLM-1.0 channel for simulation.
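One common way around virtual dispatch is to resolve the target type at compile time with a template. The sketch below is plain C++ and is not the TLM-2.0 socket API: SimplePayload, RegisterFileTarget, and StaticInitiator are hypothetical names chosen for illustration, and the payload is a small struct passed by reference rather than a tlm_generic_payload.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical fixed-size payload with no pointers and no extensions.
struct SimplePayload {
    bool is_write;
    std::uint32_t address;
    std::uint32_t data;
    bool ok;
};

// Hypothetical target: 16 word-addressed registers.
struct RegisterFileTarget {
    std::array<std::uint32_t, 16> regs{};

    void b_transport(SimplePayload& p) {
        const std::size_t index = (p.address / 4) % regs.size();
        if (p.is_write) {
            regs[index] = p.data;
        } else {
            p.data = regs[index];
        }
        p.ok = true;
    }
};

// The target type is a template parameter, so the call to b_transport is
// resolved at compile time instead of through a virtual function table.
template <typename Target>
class StaticInitiator {
public:
    explicit StaticInitiator(Target& target) : target_(target) {}

    bool write(std::uint32_t address, std::uint32_t data) {
        SimplePayload p{true, address, data, false};
        target_.b_transport(p);
        return p.ok;
    }

    std::uint32_t read(std::uint32_t address) {
        SimplePayload p{false, address, 0, false};
        target_.b_transport(p);
        return p.data;
    }

private:
    Target& target_;  // bound once at construction, like a port at elaboration
};
```

Because the binding is fixed when the objects are constructed, a tool can in principle inline the call and see the whole data path. Whether a given HLS tool accepts this pattern is something to check in its documentation.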

Deep Dive: Accellera Source for sc_signal and update()

The sc_signal<T> channel illustrates the Evaluate-Update paradigm of SystemC. In the Accellera source (src/sysc/communication/sc_signal.h, with supporting non-template code in sc_signal.cpp), sc_signal derives from sc_prim_channel, directly or indirectly depending on the release.

The write() Implementation

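When you call write(const T&), the signal does not immediately change its value. Instead, it stores the requested value in m_new_val and registers itself with the kernel. In simplified form (the real code also takes a writer-policy template parameter and performs a writer check):

template<class T>
inline void sc_signal<T>::write(const T& value_) {
    // Compare against the current value, not the pending one
    bool value_changed = !(m_cur_val == value_);
    m_new_val = value_;
    if( value_changed ) {
        this->request_update(); // Inherited from sc_prim_channel
    }
}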

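The request_update() call adds the channel to the kernel's update list. In the Accellera implementation this list is managed by sc_prim_channel_registry rather than stored directly in sc_simcontext.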

The update() Phase

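After the Evaluate phase finishes (all ready processes have run), the kernel walks the update list and calls the update() virtual function on each primitive channel that requested it. For sc_signal, a simplified version looks like this (the real code notifies its internal change event for the next delta cycle, which is what notify(SC_ZERO_TIME) expresses here):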

template<class T>
inline void sc_signal<T>::update() {
    if( !(m_new_val == m_cur_val) ) {
        m_cur_val = m_new_val;
        m_value_changed_event.notify(SC_ZERO_TIME); // Notify processes sensitive to value_changed_event()
    }
}

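This guarantees that all processes running in the same Evaluate phase read the old value. The new value becomes visible only in the next delta cycle, which models the way a register's output changes only after the clock edge that samples its input.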
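The two-phase behavior can be reproduced outside SystemC with a few lines of plain C++. MiniKernel and MiniSignal below are hypothetical stand-ins written for this sketch; they are not the Accellera classes and omit events, processes, and writer checks.

```cpp
#include <vector>

class MiniSignal;

// Hypothetical stand-in for the kernel's update list.
class MiniKernel {
public:
    void request_update(MiniSignal& s) { update_list_.push_back(&s); }
    void run_update_phase();  // defined after MiniSignal

private:
    std::vector<MiniSignal*> update_list_;
};

// Hypothetical stand-in for sc_signal<int>.
class MiniSignal {
public:
    MiniSignal(MiniKernel& kernel, int initial)
        : kernel_(kernel), cur_(initial), new_(initial) {}

    int read() const { return cur_; }  // always the current (old) value

    void write(int value) {
        new_ = value;
        if (value != cur_ && !pending_) {
            pending_ = true;
            kernel_.request_update(*this);  // defer the change
        }
    }

    void update() {
        pending_ = false;
        if (new_ != cur_) {
            cur_ = new_;       // the value becomes visible only now
            ++change_count_;   // stands in for notifying the change event
        }
    }

    int change_count() const { return change_count_; }

private:
    MiniKernel& kernel_;
    int cur_;
    int new_;
    bool pending_ = false;
    int change_count_ = 0;
};

inline void MiniKernel::run_update_phase() {
    std::vector<MiniSignal*> pending;
    pending.swap(update_list_);  // take the list, leaving it empty for the next cycle
    for (MiniSignal* s : pending) {
        s->update();
    }
}
```

With two signals a and b, calling a.write(b.read()) and b.write(a.read()) in the same evaluate phase and then running the update phase swaps their values, just as two cross-coupled registers would on a clock edge.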
