When optimizing, treat performance as a design constraint: prevent contention/cache inefficiency, ensure server-side work is bounded per call, and keep tests from becoming slow bottlenecks.

1) Cache-line & contention hygiene (C structs)

Example pattern:

typedef struct __attribute__((aligned(CACHE_LINE_SIZE))) {
    redisAtomic long long io_reads_processed;
    redisAtomic long long io_writes_processed;
    /* other hot, per-IO-thread counters */
} IOThreadStats;

typedef struct {
    /* hot per-IO-thread fields */
    IOThreadStats stats; /* each thread updates its own cache line */
    /* cold / infrequently read fields after */
    int unused_cold_field;
} IOThread;

2) Bound work for collection-style commands

3) Keep tests fast and deterministic