Dynamic Allocation and Latency: A Simple Benchmark

April 23, 2026 · Luciano Muratore

On the path to understanding low-latency C++, one lesson shows up early and repeatedly: dynamic memory allocation is expensive. Not in an abstract sense—measurably, concretely, nanosecond-by-nanosecond expensive. A simple benchmark is enough to make this visible.


The Two Programs

The experiment compares two versions of the same loop, each running one million iterations.

Version 1 — with dynamic allocation (main_new):

#include <iostream>
#include <chrono>

int main() {
    const int N = 1'000'000;          // number of operations
    volatile long long sum = 0;        // volatile: keeps the accumulation from being optimized away

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < N; ++i) {
        int* p = new int(i);           // allocate one int on the heap
        sum += *p;                     // read it back and accumulate (timed together with new/delete)
        delete p;                      // return the block to the allocator
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);

    std::cout << "Total time: " << duration.count() << " ns\n";
    std::cout << "Average latency per operation: "
              << static_cast<double>(duration.count()) / N << " ns\n";

    return 0;
}

Version 2 — without dynamic allocation (main):

#include <iostream>
#include <chrono>

int main() {
    const int N = 1'000'000;          // number of operations
    volatile long long sum = 0;        // volatile: keeps the accumulation from being optimized away

    auto start = std::chrono::high_resolution_clock::now();

    for (int i = 0; i < N; ++i) {
        sum += i;                      // the operation we're timing
    }

    auto end = std::chrono::high_resolution_clock::now();
    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);

    std::cout << "Total time: " << duration.count() << " ns\n";
    std::cout << "Average latency per operation: "
              << static_cast<double>(duration.count()) / N << " ns\n";

    return 0;
}

The only difference is the loop body: Version 1 allocates an int with new, reads it back through the pointer, and frees it with delete, while Version 2 adds i directly. Everything else is identical.


The Results

main_new — Total time: 141644700 ns
           Average latency per operation: 141.645 ns

main     — Total time: 1835900 ns
           Average latency per operation: 1.8359 ns

The version with dynamic allocation is roughly 77 times slower per operation (141.645 ns / 1.8359 ns ≈ 77.2). One million iterations of new/delete take about 142 ms. One million iterations of a plain addition into a stack variable take under 2 ms.
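The arithmetic behind those figures can be checked with a few lines of C++. This is only a sketch: the constants are the totals reported above, and the function names (per_op_ns, slowdown_ratio, extra_ns_per_op) are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Totals reported above, in nanoseconds, for one million iterations each.
constexpr std::int64_t kTotalWithNewNs = 141'644'700;  // main_new
constexpr std::int64_t kTotalPlainNs   = 1'835'900;    // main
constexpr std::int64_t kIterations     = 1'000'000;

// Average cost of one iteration, in nanoseconds.
inline double per_op_ns(std::int64_t total_ns, std::int64_t iterations) {
    assert(iterations > 0);
    return static_cast<double>(total_ns) / static_cast<double>(iterations);
}

// How many times slower the allocating loop is (about 77.15).
inline double slowdown_ratio() {
    return static_cast<double>(kTotalWithNewNs) / static_cast<double>(kTotalPlainNs);
}

// Extra time attributable to new/delete per iteration (about 139.8 ns).
inline double extra_ns_per_op() {
    return per_op_ns(kTotalWithNewNs - kTotalPlainNs, kIterations);
}
```

Subtracting the baseline matters: the loop counter and the volatile store are paid in both versions, so about 139.8 ns of the 141.6 ns is the part that belongs to the allocation itself.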


Why Allocation Is Slow

new and delete are not free. Behind a single call to new, several things can happen:

  • The heap allocator must find a suitable free block of memory. This may require traversing a free list, splitting a block, or requesting more memory from the OS.
  • A thread-safe allocator must synchronize access to shared heap state, typically with an internal lock. Even when no other thread competes for it, taking that lock has a cost.
  • The newly allocated memory may not be in the CPU cache, causing a cache miss on the first access.
  • delete must return the block to the allocator and may coalesce it with adjacent free memory, which is more bookkeeping.

Each of these is individually small. Together, across a million calls, they add up to roughly 140 ms of overhead in this benchmark.
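To make the bookkeeping concrete, here is a toy first-fit free list. It is a teaching model only, not how any production allocator is implemented: ToyFreeList and FreeBlock are hypothetical names, it hands out byte offsets rather than real memory, and it uses std::list (which itself allocates) purely for brevity. The steps() counter shows how much list walking each request causes.

```cpp
#include <cassert>
#include <cstddef>
#include <iterator>
#include <list>

// One contiguous free range: [offset, offset + size).
struct FreeBlock {
    std::size_t offset;
    std::size_t size;
};

// Hypothetical toy allocator over the offsets [0, capacity).
class ToyFreeList {
public:
    static constexpr std::size_t npos = static_cast<std::size_t>(-1);

    explicit ToyFreeList(std::size_t capacity) {
        free_.push_back(FreeBlock{0, capacity});
    }

    // First fit: walk the list until a block is large enough, then split it.
    // Returns the offset of the allocated range, or npos if nothing fits.
    std::size_t allocate(std::size_t size) {
        assert(size > 0);
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            ++steps_;
            if (it->size < size) continue;
            const std::size_t offset = it->offset;
            it->offset += size;              // split: keep the tail as free space
            it->size -= size;
            if (it->size == 0) free_.erase(it);
            return offset;
        }
        return npos;
    }

    // Return a range and merge it with adjacent free neighbours (coalescing).
    void release(std::size_t offset, std::size_t size) {
        auto it = free_.begin();
        while (it != free_.end() && it->offset < offset) { ++it; ++steps_; }
        it = free_.insert(it, FreeBlock{offset, size});
        auto next = std::next(it);
        if (next != free_.end() && it->offset + it->size == next->offset) {
            it->size += next->size;          // merge with the following block
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->offset + prev->size == it->offset) {
                prev->size += it->size;      // merge with the preceding block
                free_.erase(it);
            }
        }
    }

    std::size_t free_blocks() const { return free_.size(); }
    std::size_t steps() const { return steps_; }   // list nodes visited so far

private:
    std::list<FreeBlock> free_;   // kept sorted by offset
    std::size_t steps_ = 0;
};
```

For example, with a capacity of 100, allocating 10, 20 and 30 bytes and then releasing the first range leaves two free blocks; a later request for 15 bytes has to skip the 10-byte hole before it finds room. Real allocators use far more elaborate structures, but the search, split and coalesce steps are the same kind of work.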

A stack variable, by contrast, lives at a fixed offset from the stack pointer. Space for it is typically reserved by adjusting the stack pointer once on function entry, so there is no search and no lock, and the top of the stack is usually already in cache.


Why This Matters for Low-Latency Code

In real-time audio, game engines, trading systems, or any latency-sensitive context, the cost of allocation is not just about throughput. It is about worst-case timing. A general-purpose heap allocator gives no upper bound on how long a call will take. On a busy system, an allocation could stall for an unpredictable amount of time.
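An average hides exactly this kind of stall. One way to look for it is to time every new/delete pair on its own and keep the slowest sample next to the mean. The sketch below assumes that approach; LatencyStats and measure_alloc_latency are hypothetical names, and each sample also includes the cost of the two clock reads, so the absolute values are inflated compared with the benchmark above.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstdint>

// Summary of one measurement run (hypothetical helper type).
struct LatencyStats {
    std::int64_t total_ns;    // sum of all samples
    std::int64_t worst_ns;    // slowest single new/delete pair
    double       average_ns;  // total_ns / number of samples
};

// Times each new/delete pair separately and tracks the slowest one.
inline LatencyStats measure_alloc_latency(int iterations) {
    assert(iterations > 0);
    using clock = std::chrono::steady_clock;   // monotonic, suited to intervals

    volatile long long sink = 0;               // keeps the read of *p alive
    std::int64_t total = 0;
    std::int64_t worst = 0;

    for (int i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        int* p = new int(i);
        sink = sink + *p;
        delete p;
        const auto end = clock::now();

        const std::int64_t ns =
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count();
        total += ns;
        worst = std::max(worst, ns);
    }

    return LatencyStats{total, worst, static_cast<double>(total) / iterations};
}
```

Calling measure_alloc_latency(1'000'000) and printing worst_ns beside average_ns shows how far the slowest iteration sits from the typical one on a given machine. No expected values are given here because they depend entirely on the allocator, the OS and the system load.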

This is exactly why, as discussed in the context of JUCE’s Audio Thread, real-time callbacks must never allocate. The usual solution is to preallocate everything before the real-time loop begins, and to do nothing but use that memory during the time-critical section.
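A minimal sketch of that preallocation pattern, under the assumption that the maximum block size is known up front. BlockProcessor is a hypothetical class, not a JUCE type, and the halving in process() is a stand-in for real work.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical processor: all memory is acquired in the constructor,
// before any time-critical code runs.
class BlockProcessor {
public:
    explicit BlockProcessor(std::size_t max_block_size)
        : scratch_(max_block_size) {}          // the only allocation happens here

    // Time-critical path: reuses scratch_ and never touches the heap.
    float process(const float* input, std::size_t count) {
        assert(count <= scratch_.size());      // caller must respect the preallocated size
        float sum = 0.0f;
        for (std::size_t i = 0; i < count; ++i) {
            scratch_[i] = input[i] * 0.5f;     // placeholder for the real computation
            sum += scratch_[i];
        }
        return sum;
    }

    // Exposed only so callers can check that the buffer is never reallocated.
    const float* scratch_data() const { return scratch_.data(); }

private:
    std::vector<float> scratch_;
};
```

The object is constructed during setup, and only process() is called from the real-time loop. Because process() never resizes the vector, scratch_data() returns the same pointer before and after every call, which is an easy property to assert in a test.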


Summary

  • Dynamic allocation (new/delete) introduces latency through free-block search, allocator locking, and cache misses.
  • A benchmark of one million iterations shows a ~77x latency difference: 141.6 ns per operation with allocation vs 1.8 ns without.
  • Stack variables avoid that bookkeeping entirely: they sit at a fixed offset from the stack pointer, which made the allocation-free loop roughly 77 times faster here.
  • In low-latency contexts, the heap allocator’s unpredictability is as dangerous as its average cost.
  • Final Insight: Dynamic allocation is not just slow, it is unpredictably slow. In real-time code, that unpredictability is the real enemy. The numbers make it impossible to ignore.