How to Read This Lesson

This best-practice lesson is written for code reviews. Use it to decide what should be portable standard behavior, what is an implementation detail, and what needs a project rule.

Modeling Best Practices: Datatypes and Performance

Datatype choice is a critical architectural decision in SystemC. It directly affects simulation speed, memory footprint, and the clarity of the C++ code. The IEEE 1666 standard provides powerful hardware-accurate types, but using them where native types would do is a common cause of poor performance in software-oriented Virtual Platforms (VPs).

Looking at the Accellera reference implementation shows where the runtime cost of each datatype comes from.

Source and LRM Trail

Best-practice lessons should be traceable. Use Docs/LRMs/SystemC_LRM_1666-2023.pdf, the domain LRMs for AMS/CCI/UVM when relevant, .codex-src/systemc, .codex-src/cci, .codex-src/uvm-systemc, and .codex-src/systemc-common-practices. Mark what is portable, what is source insight, and what is project policy.

The Performance Hierarchy

When selecting a datatype, always start at the top of this hierarchy and only move down when the hardware semantics strictly demand it.

Native C++ Types (uint32_t, bool, std::array):
- Performance: Native CPU speed.
- Kernel Reality: Zero overhead. The compiler maps these directly to CPU registers and native load/store instructions.
- When to use: Memory arrays, software-visible registers, counters, flags, TLM-2.0 payload data pointers.
SystemC Fixed-Width Integers (sc_dt::sc_uint<W>, sc_dt::sc_int<W>):
- Performance: High.
- Kernel Reality: In the Accellera implementation, an sc_uint<W> (where $W \le 64$) wraps a single 64-bit unsigned integer member (m_val), so plain arithmetic stays close to native speed. The cost comes from proxy classes. When you call .range(a,b) or operator[], SystemC creates an sc_uint_subref or sc_uint_bitref proxy object on the stack. The proxy's operator= performs the masking and shifting (&, |, <<) so that only the selected bits change and the value is truncated to the field width.
- When to use: Exact hardware bit-width modeling, register field extraction, bit-level concatenation where $W \le 64$.
SystemC Arbitrary-Precision Integers (sc_dt::sc_biguint<W>):
- Performance: Slow.
- Kernel Reality: Inherits from sc_unsigned, which stores the value as an array of sc_digit words (typically 32 bits each). Arithmetic loops over these words to propagate carries, so the cost grows with W. Depending on the library version and build configuration, the digit storage and temporaries may be heap-allocated (new[]), which adds further cost.
- When to use: Cryptographic keys, very wide buses (e.g., 256-bit memory controllers).
SystemC Bit Vectors (sc_dt::sc_bv<W>):
- Performance: Slower.
- Kernel Reality: Internally backed by sc_bv_base. Bits are packed into an array of sc_digits. While logic operations are bitwise, proxy overhead is severe for individual bit manipulation compared to native masking.
- When to use: When you need to manipulate or observe uninterpreted streams of bits, but do not need 'X' or 'Z' states.
SystemC Logic Vectors (sc_dt::sc_lv<W>, sc_dt::sc_logic):
- Performance: Very Slow.
- Kernel Reality: Implements 4-state logic ('0', '1', 'Z', 'X'). Under the hood, sc_lv_base maintains two distinct sc_digit arrays: m_data (Data Plane) and m_ctrl (Control Plane). Every vector operation must combine both planes with several bitwise operations per word, and single-bit sc_logic operations go through lookup tables.
- When to use: Only for pin-level RTL interfaces where High-Impedance ('Z') or Unknown ('X') states are actively modeled and verified.

TLM Payload Data and Endianness

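TLM-2.0 generic payloads (tlm_generic_payload) transfer data through an unsigned char* data pointer. Avoid casting this pointer to a C++ struct or to a wider integer pointer (such as uint32_t*). The result depends on the host machine's endianness, the buffer may not be suitably aligned for the wider type, and the access can violate C++ aliasing rules, so the cast is only defensible when you are certain of all three.

Instead, construct each value explicitly from its individual bytes using shifts and masks.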

Complete Example: High-Performance Modeling

Here is a complete sc_main program that demonstrates the performance best practices: using a native C++ container for memory, keeping intermediate values native and converting to SystemC types only at the register boundary, and safely packing/unpacking TLM-style byte arrays.

#include <systemc>
#include <cstdint>   // uint8_t, uint32_t
#include <iostream>
#include <vector>
 
SC_MODULE(HighPerformanceMemory) {
    // 1. Native C++ type for large memory (Fast, low overhead)
    std::vector<uint8_t> ram;
 
    // 2. Hardware-accurate register for control logic
    sc_dt::sc_uint<32> status_register;
 
    SC_CTOR(HighPerformanceMemory) : ram(1024, 0), status_register(0) {
        // No sensitivity list: run_tests executes once, during initialization
        SC_METHOD(run_tests);
    }
 
    // Helper function to safely read 32-bits from a byte array (Endian-safe)
    uint32_t read_le32(const uint8_t* p) {
        return uint32_t(p[0])
             | (uint32_t(p[1]) << 8)
             | (uint32_t(p[2]) << 16)
             | (uint32_t(p[3]) << 24);
    }
 
    // Helper function to safely write 32-bits to a byte array (Endian-safe)
    void write_le32(uint8_t* p, uint32_t val) {
        p[0] = static_cast<uint8_t>(val & 0xFF);
        p[1] = static_cast<uint8_t>((val >> 8) & 0xFF);
        p[2] = static_cast<uint8_t>((val >> 16) & 0xFF);
        p[3] = static_cast<uint8_t>((val >> 24) & 0xFF);
    }
 
    void run_tests() {
        // --- Test 1: TLM Payload Processing ---
        uint32_t test_val = 0xDEADBEEF;
        write_le32(&ram[0], test_val);
 
        uint32_t recovered = read_le32(&ram[0]);
        std::cout << "[Memory] Wrote: 0x" << std::hex << test_val 
                  << " Recovered: 0x" << recovered << "\n";
 
        // --- Test 2: Field Updates at the Register Boundary ---
        // Keep intermediate computation in native types and convert to a
        // SystemC type only where exact hardware width matters.
        
        uint32_t status_flags = 0x5; // Native
        sc_dt::sc_uint<4> hw_flags = status_flags; // Boundary conversion (truncates to 4 bits)
        
        // Each range() call creates a short-lived sc_uint_subref proxy whose
        // operator= masks the new bits into the underlying 64-bit value.
        // That is fine for occasional register writes; avoid it in hot loops.
        status_register.range(3, 0) = hw_flags;
        status_register.range(31, 28) = 0xF;
 
        // Print the native value so std::hex (still set above) formats it like test_val
        std::cout << "[Register] Status Reg: 0x" << status_register.to_uint() << "\n";
    }
};
 
int sc_main(int argc, char* argv[]) {
    HighPerformanceMemory mem("mem");
    
    std::cout << "Starting Simulation...\n";
    sc_core::sc_start();
    
    return 0;
}

Explanation of the Execution

Apart from the SystemC startup banner, the program prints:

Starting Simulation...
[Memory] Wrote: 0xdeadbeef Recovered: 0xdeadbeef
[Register] Status Reg: 0xf0000005

Keeping the 1024-byte RAM as a std::vector<uint8_t> means the storage is 1024 contiguous bytes plus the vector's small fixed bookkeeping, and each access compiles to an ordinary load or store. If the RAM were instead an array of sc_dt::sc_lv<8>, every element would carry the m_data/m_ctrl planes and the per-object state needed for 4-state logic, so the footprint would grow many times over, and every read or write would go through datatype-library code that handles both planes.

The read_le32 and write_le32 functions ensure that the modeled hardware behaves as a little-endian device regardless of host byte order, whether the code is compiled on a little-endian host such as x86 or on a big-endian configuration of ARM or PowerPC.