Noise Cancellation in C++ and Python: Phase 1 — Real-Time Audio I/O with PortAudio
April 29, 2026 · Luciano Muratore
This is the first article in a series about building a noise cancellation system using C++ as the audio engine and Python as the AI layer. Before any AI model can clean up a signal, the program needs to reliably capture audio from a microphone and send it back out to a speaker or headphone. That is what Phase 1 is about.
The Architecture
The full system is built as two separate processes that communicate over a bridge:
[Microphone] → [C++ Audio Engine] → [Python AI Model] → [C++ Output] → [Speakers]
C++ owns everything that touches hardware and requires real-time performance. Python owns the AI model that predicts and removes noise. The bridge between them will be built in a later phase. For now, the goal is to get the C++ audio pipeline working end to end.
Why C++ for Audio I/O
Real-time audio processing has strict timing requirements. Every chunk of audio must be captured, processed, and played back within a fixed time window — typically a few milliseconds. If that window is missed, the result is an audible glitch.
C++ is the right tool here for three reasons. It gives direct control over memory with no garbage collector that could pause execution at the wrong moment. It runs close to the hardware with predictable latency. And it has mature audio libraries, PortAudio being the most widely used, that are designed around these constraints.
What Is PortAudio
PortAudio is a cross-platform C library that provides a unified interface for audio input and output. The same code works on Windows, macOS, and Linux, regardless of the underlying audio system (WASAPI, CoreAudio, ALSA, JACK, and others). It handles the low-level details of talking to audio drivers and exposes a simple callback-based API to the developer.
The Callback Model
PortAudio does not work by letting you ask for audio whenever you want. Instead, it calls your function automatically every time a new chunk of audio is ready. This is the callback model, and it is the most important concept in this phase.
PortAudio (high-priority thread)
↓ calls every ~32ms
audioCallback(inputBuffer, outputBuffer, frameCount, ...)
↓
Your code runs here
The callback runs on a dedicated high-priority thread managed by PortAudio. Because of this, there are strict rules about what you can do inside it. You must never allocate or free memory, never perform file or network I/O, and never block on a lock. Any of these can cause the callback to miss its deadline and produce a glitch.
Key Configuration Values
Three constants define the behaviour of the audio pipeline:
constexpr int SAMPLE_RATE = 16000; // samples per second
constexpr int FRAMES_PER_BUFFER = 512; // chunk size per callback
constexpr int NUM_CHANNELS = 1; // mono audio
A sample rate of 16,000 Hz means the audio is sampled 16,000 times per second. This is the standard rate for speech and voice processing: by the Nyquist theorem it captures frequencies up to 8,000 Hz, which covers the range that matters for speech, while keeping the processing load manageable.
A buffer size of 512 frames means each callback receives and produces 512 samples at a time. At 16,000 Hz, this corresponds to exactly 32 milliseconds of audio per callback (512 / 16,000 = 0.032 s). Smaller buffers reduce latency but increase CPU pressure; larger buffers are more forgiving but add latency.
Mono is sufficient for a microphone noise cancellation use case. Stereo would double the data without adding meaningful information for this task.
The Full Code
#include <iostream>
#include <vector>
#include <portaudio.h>

// -----------------------------------------------
// Configuration
// -----------------------------------------------
constexpr int SAMPLE_RATE = 16000;
constexpr int FRAMES_PER_BUFFER = 512;
constexpr int NUM_CHANNELS = 1;

// -----------------------------------------------
// Audio Buffer
// -----------------------------------------------
struct AudioData {
    std::vector<float> buffer;
};

// -----------------------------------------------
// PortAudio Callback
// Called automatically every time a new audio
// chunk is ready. Runs on a high-priority thread.
// IMPORTANT: No memory allocation or I/O here!
// -----------------------------------------------
static int audioCallback(
    const void* inputBuffer,
    void* outputBuffer,
    unsigned long framesPerBuffer,
    const PaStreamCallbackTimeInfo* timeInfo,
    PaStreamCallbackFlags statusFlags,
    void* userData)
{
    const float* in = static_cast<const float*>(inputBuffer);
    float* out = static_cast<float*>(outputBuffer);

    if (inputBuffer == nullptr) {
        for (unsigned long i = 0; i < framesPerBuffer; ++i)
            out[i] = 0.0f;
        return paContinue;
    }

    // PASSTHROUGH: input goes directly to output
    // Phase 3 will replace this with AI-cleaned audio
    for (unsigned long i = 0; i < framesPerBuffer; ++i)
        out[i] = in[i];

    return paContinue;
}

// -----------------------------------------------
// List all available audio devices
// -----------------------------------------------
void listAudioDevices() {
    int deviceCount = Pa_GetDeviceCount();
    std::cout << "\n=== Available Audio Devices ===\n";
    for (int i = 0; i < deviceCount; ++i) {
        const PaDeviceInfo* info = Pa_GetDeviceInfo(i);
        std::cout << "[" << i << "] " << info->name
                  << " | IN: " << info->maxInputChannels
                  << " | OUT: " << info->maxOutputChannels
                  << "\n";
    }
    std::cout << "\nDefault Input Device: " << Pa_GetDefaultInputDevice() << "\n";
    std::cout << "Default Output Device: " << Pa_GetDefaultOutputDevice() << "\n";
    std::cout << "================================\n\n";
}

// -----------------------------------------------
// Main
// -----------------------------------------------
int main() {
    std::cout << "=== Noise Cancellation — Phase 1: Audio I/O ===\n";

    // 1. Initialize PortAudio
    PaError err = Pa_Initialize();
    if (err != paNoError) {
        std::cerr << "PortAudio init failed: " << Pa_GetErrorText(err) << "\n";
        return 1;
    }

    // 2. List devices
    listAudioDevices();

    // 3. Input (mic) parameters
    PaStreamParameters inputParams;
    inputParams.device = Pa_GetDefaultInputDevice();
    inputParams.channelCount = NUM_CHANNELS;
    inputParams.sampleFormat = paFloat32;
    inputParams.suggestedLatency = Pa_GetDeviceInfo(inputParams.device)->defaultLowInputLatency;
    inputParams.hostApiSpecificStreamInfo = nullptr;

    // 4. Output (speakers) parameters
    PaStreamParameters outputParams;
    outputParams.device = Pa_GetDefaultOutputDevice();
    outputParams.channelCount = NUM_CHANNELS;
    outputParams.sampleFormat = paFloat32;
    outputParams.suggestedLatency = Pa_GetDeviceInfo(outputParams.device)->defaultLowOutputLatency;
    outputParams.hostApiSpecificStreamInfo = nullptr;

    // 5. Open the stream
    AudioData audioData;
    PaStream* stream;
    err = Pa_OpenStream(
        &stream,
        &inputParams,
        &outputParams,
        SAMPLE_RATE,
        FRAMES_PER_BUFFER,
        paClipOff,
        audioCallback,
        &audioData
    );
    if (err != paNoError) {
        std::cerr << "Failed to open stream: " << Pa_GetErrorText(err) << "\n";
        Pa_Terminate();
        return 1;
    }

    // 6. Start the stream
    err = Pa_StartStream(stream);
    if (err != paNoError) {
        std::cerr << "Failed to start stream: " << Pa_GetErrorText(err) << "\n";
        Pa_Terminate();
        return 1;
    }

    std::cout << "✓ Audio stream started!\n";
    std::cout << "  Sample Rate: " << SAMPLE_RATE << " Hz\n";
    std::cout << "  Buffer Size: " << FRAMES_PER_BUFFER << " frames\n";
    std::cout << "  Latency (approx): "
              << (FRAMES_PER_BUFFER * 1000.0 / SAMPLE_RATE) << " ms\n\n";
    std::cout << "Speak into your mic — you should hear yourself in the output.\n";
    std::cout << "Press ENTER to stop...\n";
    std::cin.get();

    // 7. Clean up
    Pa_StopStream(stream);
    Pa_CloseStream(stream);
    Pa_Terminate();

    std::cout << "Stream stopped. Phase 1 complete!\n";
    return 0;
}
Walking Through the Code
Initialization — Pa_Initialize() starts the PortAudio engine and detects all available audio devices on the system. It must be called before any other PortAudio function, and its counterpart Pa_Terminate() must always be called before the program exits.
Device listing — Pa_GetDeviceCount() and Pa_GetDeviceInfo() let the program enumerate every audio device the operating system exposes. This is useful for debugging and for understanding what hardware is available. In Phase 1, the program simply uses the system defaults.
Stream parameters — PaStreamParameters is a struct that describes one side of the audio connection. Two are created: one for input (the microphone) and one for output (the speakers or headphones). The most important fields are device, channelCount, sampleFormat, and suggestedLatency. The sample format paFloat32 means each audio sample is a 32-bit floating point number in the range [-1.0, 1.0]. This is the format the AI model will consume in later phases.
Opening the stream — Pa_OpenStream() connects the two parameter structs and registers the callback function. After this call, PortAudio knows what hardware to use, what format the audio is in, and what function to call when data is ready. The stream is not running yet at this point.
Starting the stream — Pa_StartStream() begins the real-time audio loop. From this moment, PortAudio calls audioCallback continuously on its internal high-priority thread.
The passthrough — Inside the callback, the current implementation simply copies every sample from the input buffer to the output buffer. This is the simplest possible thing the callback can do, and it confirms that the full pipeline — microphone capture, callback invocation, output playback — is working correctly. Speaking into the microphone should produce your own voice in the headphones, slightly delayed.
Cleanup — Pa_StopStream(), Pa_CloseStream(), and Pa_Terminate() shut everything down in order. Skipping any of these can leave audio drivers in a bad state.
The Passthrough as a Foundation
The passthrough is not the final behaviour. It is a deliberate placeholder that serves one purpose: proving the pipeline works before adding complexity.
In Phase 3, the single line that copies input to output:
out[i] = in[i];
will be replaced by audio that has been processed by the Python AI model. The callback will read from a shared buffer that Python writes into, rather than reading directly from the microphone. But the structure of the callback — the function signature, the loop over frames, the return value — stays exactly the same.
What This Phase Establishes
Phase 1 establishes three things that the rest of the project depends on. It confirms the audio hardware is accessible and working. It establishes 16kHz mono Float32 as the audio format for the entire pipeline, which the AI model in Phase 2 will be trained to expect. And it introduces the callback pattern that will remain the heartbeat of the C++ engine through all subsequent phases.
The next phase moves to Python, where DeepFilterNet will be set up to process audio chunks in exactly this format.
Things to improve
There is a noticeable delay between speaking into the microphone and hearing the audio back in the headphones. A rough breakdown of where that latency comes from:
Mic captures audio (~5ms)
Windows audio driver (~10-20ms)
PortAudio buffer (~32ms ← our FRAMES_PER_BUFFER = 512)
Output driver (~10-20ms)
─────────────────────────────────
Total (~57-72ms)
GitHub link: https://github.com/Dextromethorpan/Noise_Cancellation