Rust's repr: Optimize Struct Memory for Cache Efficiency

The repr attribute controls struct memory layout, which is critical for low-level optimization in high-throughput systems where cache locality drives performance.

How They Work

repr(C): Enforces C-compatible layout with fields ordered sequentially as declared, adding padding to align each field to its natural alignment (e.g., u32 aligns to 4 bytes). Ensures predictable interoperability and typically aligns well with CPU cache lines (often 64 bytes).

repr(packed): Removes all padding, packing fields tightly together regardless of alignment. Minimizes memory usage but can lead to unaligned memory accesses, which are slower on most architectures.

Optimization for Cache Locality

With repr(C), the compiler adds padding to align fields, increasing struct size but ensuring efficient, aligned access:

#[repr(C)]
struct Data {
    flag: bool,   // 1 byte + 3 bytes padding (on 32-bit alignment)
    value: u32,   // 4 bytes
    counter: u64, // 8 bytes
}
// Size: 16 bytes (due to padding for alignment)

Here, repr(C) ensures value and counter are aligned—great for loops accessing value repeatedly. Aligned reads are fast and cache-friendly, but padding after flag wastes space.

With repr(packed):

#[repr(packed)]
struct PackedData {
    flag: bool,   // 1 byte
    value: u32,   // 4 bytes, unaligned
    counter: u64, // 8 bytes, unaligned
}
// Size: 13 bytes (no padding)

This shrinks size to 13 bytes, ideal for tight memory constraints, but unaligned accesses to value and counter incur significant performance penalties.

Trade-Offs

Aspect	`repr(C)`	`repr(packed)`
Performance	Fast aligned access, cache-efficient	Slower unaligned access penalties
Memory Usage	Larger due to padding	Minimal footprint
Portability	Safe across platforms	Risk of UB or panics on strict architectures

Performance: repr(C) wins for speed—aligned access is faster and cache-efficient
Memory Usage: repr(packed) reduces footprint, critical for large arrays or tight constraints
Portability: repr(C) is safer; repr(packed) risks undefined behavior with unsafe dereferencing

Example Scenario

Real-time packet parser in a network server processing millions of packets per second:

#[repr(C)]
struct Packet {
    header: u8,   // 1 byte + 3 padding
    id: u32,      // 4 bytes
    payload: u64, // 8 bytes
}

With repr(C), size is 16 bytes, and id/payload are aligned, speeding up field access in tight loops checking id. Cache locality is decent since the struct fits in a 64-byte cache line.

If using repr(packed) (13 bytes), I'd save 3 bytes per packet, but unaligned id and payload accesses could halve throughput due to penalties—unacceptable for this workload.

Choice: repr(C) for performance-critical code. Consider reordering fields (payload, id, header) to group hot fields together.

Alternative scenario: Serializing thousands of tiny structs to disk with infrequent access—repr(packed) might make sense to minimize storage, accepting slower deserialization.

Advanced Considerations

Use profiling tools like perf to confirm cache miss reductions
Consider #[repr(C, packed)] for C-compatible but packed layout
Field reordering can optimize cache line usage without changing repr
Test trade-offs on target hardware, especially ARM vs x86_64

Key Takeaways

✅ repr(C): Choose for performance-critical code where cache efficiency matters
✅ repr(packed): Use for memory-constrained scenarios with infrequent access
🚀 Profile cache performance before and after to validate optimizations

Try This: What happens if you access a field in a repr(packed) struct through a raw pointer?
Answer: Unaligned access through raw pointers can cause panics on strict architectures or performance penalties—always measure on your target platform!