Dynamic Allocation and Latency: A Simple Benchmark
April 23, 2026 · Luciano Muratore
On the path to understanding low-latency C++, one lesson shows up early and repeatedly: dynamic memory allocation is expensive. Not in an abstract sense—measurably, concretely, nanosecond-by-nanosecond expensive. A simple benchmark is enough to make this visible.
The Two Programs
The experiment compares two versions of the same loop, each running one million iterations.
Version 1 — with dynamic allocation (main_new):
```cpp
#include <iostream>
#include <chrono>

int main() {
    const int N = 1'000'000;      // number of operations
    volatile long long sum = 0;   // volatile: every read/write of sum must happen, so the loop is not optimized away

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < N; ++i) {
        int* p = new int(i);      // dynamically allocate memory
        sum = sum + *p;           // the operation we're timing (+= on a volatile is deprecated in C++20)
        delete p;                 // free allocated memory
    }
    auto end = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::cout << "Total time: " << duration.count() << " ns\n";
    std::cout << "Average latency per operation: "
              << static_cast<double>(duration.count()) / N << " ns\n";
    return 0;
}
```
Version 2 — without dynamic allocation (main):
```cpp
#include <iostream>
#include <chrono>

int main() {
    const int N = 1'000'000;      // number of operations
    volatile long long sum = 0;   // volatile: every read/write of sum must happen, so the loop is not optimized away

    auto start = std::chrono::high_resolution_clock::now();
    for (int i = 0; i < N; ++i) {
        sum = sum + i;            // the operation we're timing (+= on a volatile is deprecated in C++20)
    }
    auto end = std::chrono::high_resolution_clock::now();

    auto duration = std::chrono::duration_cast<std::chrono::nanoseconds>(end - start);
    std::cout << "Total time: " << duration.count() << " ns\n";
    std::cout << "Average latency per operation: "
              << static_cast<double>(duration.count()) / N << " ns\n";
    return 0;
}
```
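The only difference is inside the loop body: Version 1 allocates an int with new, adds it to sum through the pointer, and frees it with delete, while Version 2 adds i directly. Everything else is identical.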
The Results
main_new — Total time: 141644700 ns
Average latency per operation: 141.645 ns
main — Total time: 1835900 ns
Average latency per operation: 1.8359 ns
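The version with dynamic allocation is roughly 77 times slower per operation (141.645 / 1.8359 ≈ 77). One million iterations of new/delete take about 141.6 ms; one million iterations of a plain addition take under 2 ms. Absolute numbers will vary with compiler, optimization level, allocator, and hardware. An optimizing compiler is also permitted (since C++14) to remove a matched new/delete pair like this one entirely, so it is worth confirming that the allocation actually survives in the build being measured.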
Why Allocation Is Slow
new and delete are not free. Behind a single call to new, several things can happen:
- The heap allocator must find a suitable free block of memory. This may require traversing a free list, splitting a block, or requesting more memory from the OS.
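- A general-purpose allocator must stay thread-safe. Many allocators keep per-thread caches so the common path avoids locking, but slower paths may still acquire an internal mutex, which is a synchronization cost and a possible point of contention.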
- The newly allocated memory may not be in the CPU cache, causing a cache miss on the first access.
- delete must return the block to the allocator and may coalesce it with adjacent free blocks, which is more bookkeeping.
Each of these costs is small on its own, and not every call pays all of them, but across a million calls they add up to roughly 141 ms.
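A stack variable, by contrast, needs no allocator. The compiler reserves space for a function's locals in its stack frame, typically with a single stack-pointer adjustment on function entry, and each variable is then just a fixed offset from the stack pointer, or lives entirely in a register. There is no search, no lock, and no bookkeeping, and the top of the stack is almost always already in cache.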
Why This Matters for Low-Latency Code
In real-time audio, game engines, trading systems, or any latency-sensitive context, the cost of allocation is not just about throughput; it is about worst-case timing. The heap allocator does not guarantee how long it will take. On a busy system, an allocation could stall for an unpredictable amount of time, for example while waiting on a lock held by another thread or while the allocator requests more memory from the OS.
This is exactly why, as discussed in the context of JUCE’s Audio Thread, real-time callbacks must never allocate. The solution is always the same: preallocate everything before the real-time loop begins, and do nothing but use that memory during the time-critical section.
Summary
- Dynamic allocation (new/delete) introduces latency through heap traversal, internal mutex contention, and cache misses.
- A benchmark of one million iterations shows a ~77x latency difference: 141.6 ns per operation with allocation vs 1.8 ns without.
- Stack variables avoid this cost entirely: the compiler reserves their space in the function's stack frame, so there is no allocator call and no bookkeeping.
- In low-latency contexts, the heap allocator’s unpredictability is as dangerous as its average cost.
- Final Insight: Dynamic allocation is not just slow—it is unpredictably slow. In real-time code, that unpredictability is the real enemy. The numbers make it impossible to ignore.