Chapter 12: Virtual Platform Construction

Router DMI and Debug Transport

Speeding up virtual platform (VP) memory accesses with the Direct Memory Interface (DMI), and inspecting state safely through backdoor debug transport.

How to Read This Lesson

For virtual platforms, imagine a firmware engineer trying to boot real software on your model. Every abstraction choice should help that person move faster without lying about the hardware.

Router DMI and Debug Transport

A Loosely Timed (LT) Virtual Platform booting an operating system performs millions of memory accesses (instruction fetches, stack pushes, loads and stores). If every one of them has to set up a tlm_generic_payload, traverse the router, decode the address, and make a b_transport function call, that per-access overhead dominates the run time and the simulation slows to a crawl.

The IEEE 1666 standard provides the Direct Memory Interface (DMI) so that, after a one-time pointer request through the socket, an initiator can bypass the transport path entirely for subsequent accesses. Let's look at how this works mechanically.

Source and LRM Trail

Virtual platform lessons combine standard TLM behavior with architecture practice. Use Docs/LRMs/SystemC_LRM_1666-2023.pdf for TLM and kernel rules, .codex-src/systemc/src/tlm_core/tlm_2 for sockets and payloads, .codex-src/cci for configurable platforms, and .codex-src/systemc-common-practices for reusable patterns.

Direct Memory Interface (DMI)

DMI allows an initiator to request a direct C++ pointer to the target's physical memory array.

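  1. CPU attempts an access, typically an ordinary b_transport; the target can mark the response with set_dmi_allowed(true) to hint that DMI is available.
  2. CPU calls get_direct_mem_ptr through its initiator socket, which is bound to the Router, to request DMI for that address region.
  3. Router decodes the address, converts it to the RAM's local address, and forwards the request to RAM.
  4. RAM fills in the tlm_dmi object passed by reference (the raw unsigned char* pointer, the allowed address range, the access rights, and the latencies) and returns true to grant access.
  5. CPU caches this descriptor and serves future reads/writes in that range by direct array indexing (ptr[offset]), skipping the payload, router, and b_transport call entirely, so memory accesses run at close to native host speed.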
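On the initiator side, step 5 amounts to a small lookup table. The sketch below is a plain C++ model, not SystemC: DmiRegion and DmiCache are hypothetical names whose fields mirror what tlm::tlm_dmi exposes (get_dmi_ptr, get_start_address, get_end_address, is_read_allowed, is_write_allowed), and the cached ranges are assumed to already be in global addresses. A false return means the caller should fall back to an ordinary b_transport.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the tlm::tlm_dmi fields an initiator needs.
struct DmiRegion {
    unsigned char* ptr = nullptr;  // points at the byte for address 'start'
    std::uint64_t start = 0;       // first global address covered
    std::uint64_t end = 0;         // last global address covered (inclusive)
    bool can_read = false;
    bool can_write = false;
};

// Hypothetical initiator-side DMI cache (not a SystemC class).
class DmiCache {
public:
    void insert(const DmiRegion& r) {
        assert(r.ptr != nullptr && r.start <= r.end);
        regions_.push_back(r);
    }

    // Region covering every byte of [addr, addr + len), or nullptr on a miss.
    const DmiRegion* find(std::uint64_t addr, unsigned len) const {
        if (len == 0) return nullptr;
        for (const DmiRegion& r : regions_)
            if (addr >= r.start && addr + (len - 1) <= r.end) return &r;
        return nullptr;
    }

    // Fast-path read; false means "fall back to b_transport".
    bool read(std::uint64_t addr, unsigned char* dst, unsigned len) const {
        const DmiRegion* r = find(addr, len);
        if (r == nullptr || !r->can_read) return false;
        std::memcpy(dst, r->ptr + (addr - r->start), len);  // ptr[offset]
        return true;
    }

    // Fast-path write; false means "fall back to b_transport".
    bool write(std::uint64_t addr, const unsigned char* src, unsigned len) const {
        const DmiRegion* r = find(addr, len);
        if (r == nullptr || !r->can_write) return false;
        std::memcpy(r->ptr + (addr - r->start), src, len);  // ptr[offset]
        return true;
    }

private:
    std::vector<DmiRegion> regions_;
};
```

An access that straddles the end of a cached region misses deliberately, so it takes the slow path rather than running off the end of the target's array.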

Complete DMI Target Example

This example implements a DMI-capable RAM target, plus an sc_main that instantiates it (the initiator side is omitted).

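#include <systemc>
#include <tlm>
#include <tlm_utils/simple_target_socket.h>
#include <cstring>   // std::memcpy
#include <iostream>
#include <vector>

class DMI_RAM : public sc_core::sc_module {
public:
    tlm_utils::simple_target_socket<DMI_RAM> socket;

    // No SC_THREAD/SC_METHOD in this module, so SC_HAS_PROCESS is not needed
    DMI_RAM(sc_core::sc_module_name name, unsigned int size_bytes)
        : sc_core::sc_module(name), socket("socket"), memory(size_bytes, 0) {
        // Standard Transport
        socket.register_b_transport(this, &DMI_RAM::b_transport);
        // Register DMI Hook
        socket.register_get_direct_mem_ptr(this, &DMI_RAM::get_direct_mem_ptr);
    }

private:
    std::vector<unsigned char> memory;  // zero-initialized backing store, freed automatically

    void b_transport(tlm::tlm_generic_payload& trans, sc_core::sc_time& delay) {
        sc_dt::uint64  addr = trans.get_address();   // local address (Router already removed the base)
        unsigned int   len  = trans.get_data_length();
        unsigned char* data = trans.get_data_ptr();

        if (addr >= memory.size() || len > memory.size() - addr) {
            trans.set_response_status(tlm::TLM_ADDRESS_ERROR_RESPONSE);
            return;
        }
        if (trans.get_byte_enable_ptr() != nullptr) {   // byte enables not supported by this model
            trans.set_response_status(tlm::TLM_BYTE_ENABLE_ERROR_RESPONSE);
            return;
        }
        if (trans.get_streaming_width() < len) {        // streaming bursts not supported
            trans.set_response_status(tlm::TLM_BURST_ERROR_RESPONSE);
            return;
        }

        if (trans.is_read())       std::memcpy(data, memory.data() + addr, len);
        else if (trans.is_write()) std::memcpy(memory.data() + addr, data, len);
        // TLM_IGNORE_COMMAND: nothing to move

        delay += sc_core::sc_time(10, sc_core::SC_NS);  // same latency as advertised over DMI

        // Hint to the initiator that DMI is available for this region
        trans.set_dmi_allowed(true);
        trans.set_response_status(tlm::TLM_OK_RESPONSE);
    }

    // LRM DMI Method
    bool get_direct_mem_ptr(tlm::tlm_generic_payload& trans, tlm::tlm_dmi& dmi_data) {
        if (trans.get_address() >= memory.size())
            return false;  // not our address: deny DMI

        // Provide the direct C++ pointer to the memory array
        dmi_data.set_dmi_ptr(memory.data());

        // Define the valid range for this pointer (LOCAL offsets)
        dmi_data.set_start_address(0x00000000);
        dmi_data.set_end_address(memory.size() - 1);

        // Allow both read and write
        dmi_data.set_granted_access(tlm::tlm_dmi::DMI_ACCESS_READ_WRITE);

        // Set latencies so the initiator can keep accumulating time
        dmi_data.set_read_latency(sc_core::sc_time(10, sc_core::SC_NS));
        dmi_data.set_write_latency(sc_core::sc_time(10, sc_core::SC_NS));

        std::cout << "@" << sc_core::sc_time_stamp() << " [RAM] Granted DMI Access." << std::endl;
        return true;
    }
};

int sc_main(int argc, char* argv[]) {
    DMI_RAM ram("ram", 0x10000); // 64 KiB RAM
    // Initiator omitted for brevity
    return 0;
}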

The Router DMI Address Adjustment Rule

Under the Hood: In the get_direct_mem_ptr implementation above, the RAM reports the range 0x0 through 0xFFFF because it only knows its local address space. The CPU, however, requested a global address (e.g., 0x4000_0000). If the CPU cached a tlm_dmi with bounds 0x0-0xFFFF, its next lookup for a global address would miss, and every access would fall back to b_transport. The Router must therefore translate in both directions. On the way in, it subtracts the target's base_address from the payload address before forwarding get_direct_mem_ptr. On the way back, it intercepts the returned tlm_dmi and executes dmi_data.set_start_address( dmi_data.get_start_address() + base_address ) and dmi_data.set_end_address( dmi_data.get_end_address() + base_address ) before returning it to the CPU. Because the DMI pointer corresponds to the start address, the CPU then reaches any global address in the range as dmi_ptr[ global_address - dmi_start_address ].
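The arithmetic is easy to get backwards, so here it is as a standalone C++ sketch (no SystemC). MapEntry, DmiRange, to_local, to_global, and dmi_byte are hypothetical names; DmiRange stands in for the pointer and start/end fields of tlm::tlm_dmi. Clipping the end address to the router's window is an extra defensive step this sketch adds, for the case where a target grants more than the router actually maps.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical mirror of the tlm_dmi fields the router rewrites.
struct DmiRange {
    unsigned char* ptr;   // byte at 'start'
    std::uint64_t start;  // inclusive
    std::uint64_t end;    // inclusive
};

// Hypothetical router memory-map entry: target mapped at [base, base + size).
struct MapEntry {
    std::uint64_t base;
    std::uint64_t size;
};

// Forward path: global request address -> target-local address.
std::uint64_t to_local(const MapEntry& m, std::uint64_t global) {
    assert(global >= m.base && global - m.base < m.size);
    return global - m.base;
}

// Return path: rebase the target's local DMI range into global addresses.
DmiRange to_global(const MapEntry& m, DmiRange d) {
    d.end = std::min(d.end, m.size - 1);  // defensive: never grant beyond the window
    assert(d.start <= d.end);
    d.start += m.base;                    // set_start_address(get_start_address() + base)
    d.end += m.base;                      // set_end_address(get_end_address() + base)
    return d;                             // ptr unchanged: still the byte at 'start'
}

// CPU side: the host byte backing a global address inside the granted range.
unsigned char* dmi_byte(const DmiRange& g, std::uint64_t global) {
    assert(global >= g.start && global <= g.end);
    return g.ptr + (global - g.start);    // dmi_ptr[global_address - dmi_start_address]
}
```

With the RAM mapped at 0x4000_0000, a local grant of 0x0-0xFFFF becomes 0x4000_0000-0x4000_FFFF, and global address 0x4000_0020 lands on byte 0x20 of the RAM's array.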

Debug Transport (transport_dbg)

When a debugger (like GDB attached to the Virtual Platform) inspects memory, it must NOT alter the hardware state. Reading a UART FIFO should not pop the FIFO.

The LRM provides transport_dbg, a backdoor path that is expected to be free of side effects. It takes no sc_time argument, must not call wait(), and completes in zero simulation time. Under the Hood: Unlike b_transport, transport_dbg does not report its outcome through set_response_status. Instead, the method returns an unsigned int giving the number of bytes actually read or written; if the address is invalid, the target simply returns 0. Targets should implement transport_dbg if architectural inspection utilities and GDB stubs are to see their contents, and interconnects such as the Router should forward it with the same address translation they apply to b_transport.
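The UART example can be modeled in plain C++ to show the contract. UartModel, its register offsets (0x0 for RX data, 0x4 for a status byte holding the FIFO level), and its method names are hypothetical. In a SystemC target, the debug_read logic would live in a method with the signature unsigned int transport_dbg(tlm::tlm_generic_payload&), registered with simple_target_socket::register_transport_dbg.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>

// Hypothetical UART model: offset 0x0 is RX data (a bus read pops), 0x4 is status.
class UartModel {
public:
    void receive(unsigned char c) { rx_fifo_.push_back(c); }

    // Functional read, as b_transport would do it: popping is the side effect.
    unsigned char bus_read_rx() {
        if (rx_fifo_.empty()) return 0;
        unsigned char c = rx_fifo_.front();
        rx_fifo_.pop_front();
        return c;
    }

    // Debug read in the spirit of transport_dbg: const, so it cannot change
    // state; returns the number of bytes copied, or 0 for an unmapped offset.
    unsigned int debug_read(std::uint64_t offset, unsigned char* dst, unsigned int len) const {
        assert(dst != nullptr || len == 0);
        if (len == 0) return 0;
        switch (offset) {
        case kRxData:  // peek at the head of the FIFO, do not pop
            dst[0] = rx_fifo_.empty() ? static_cast<unsigned char>(0) : rx_fifo_.front();
            return 1;
        case kStatus:  // current FIFO level
            dst[0] = static_cast<unsigned char>(rx_fifo_.size());
            return 1;
        default:
            return 0;
        }
    }

    std::size_t rx_level() const { return rx_fifo_.size(); }

private:
    static constexpr std::uint64_t kRxData = 0x0;
    static constexpr std::uint64_t kStatus = 0x4;
    std::deque<unsigned char> rx_fifo_;
};
```

Reading offset 0x0 through debug_read any number of times reports the same head byte and leaves the FIFO level unchanged; only bus_read_rx() consumes data.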
