A Mutex Is Not Enough: How Do Condition Variables Let Threads Sleep?
In the previous article, we added a mutex to the task queue.
The problem it solved was:
If multiple threads operate on the task queue at the same time, they may corrupt the queue state.
So when we take a task, we lock the mutex:
pthread_mutex_lock(&pool->mutex);
task = pool->queue[pool->queue_head];
pool->queue_head = (pool->queue_head + 1) % pool->queue_capacity;
pool->queue_size -= 1;
pthread_mutex_unlock(&pool->mutex);
This protects shared state.
But a thread pool still has another important problem:
What should a worker do when the queue is empty?
If there is no task, the worker should not keep burning CPU.
It should:
sleep when there is nothing to do
wake up when a task arrives
That is the problem solved by condition variables.
The core idea of this article is:
A mutex alone is not enough. A thread also needs a mechanism for "sleep when nothing is ready, wake when something changes".
1. What should a worker do when the queue is empty?
The simplest worker loop looks like this:
while true:
    take one task from the queue
    run the task
But sometimes the queue is empty.
One naive implementation is:
for (;;) {
    if (thread_pool_take(pool, &task) == 0) {
        task.fn(task.arg);
    }
}
This is called busy waiting.
The worker repeatedly checks the queue.
If the queue is empty, it immediately checks again.
The thread is doing no useful work, but it still consumes CPU.
That is wasteful.
What we really want is:
If the queue is empty, the worker sleeps.
When submit adds a task, one worker wakes up.
2. Can we just sleep for a while?
Another idea is:
for (;;) {
    if (thread_pool_take(pool, &task) == 0) {
        task.fn(task.arg);
    } else {
        sleep(1);
    }
}
This avoids spinning.
But it creates a new problem: latency.
Suppose the worker checks the queue and finds it empty.
Then it sleeps for one second.
Right after it goes to sleep, the main thread submits a task.
The task is now in the queue, but the worker will not notice until the sleep finishes.
So the task waits for no good reason.
If we reduce the sleep time, CPU usage goes up again.
If we increase the sleep time, latency gets worse.
This is not a clean synchronization mechanism.
We need something more precise:
sleep exactly when the condition is not satisfied
wake when another thread changes the condition
That mechanism is a condition variable.
3. Enter condition variables
In pthreads, a condition variable is represented by:
pthread_cond_t
You can think of it as:
A notification mechanism that lets a thread wait for a shared state to change.
For a thread pool, the shared state might be:
queue_size > 0
This means:
there is at least one task in the queue
The condition variable itself does not store that condition.
The actual condition is still stored in shared state:
pool->queue_size
The condition variable is only used to wait and notify:
pthread_cond_wait(...)
pthread_cond_signal(...)
pthread_cond_broadcast(...)
So the relationship is:
mutex protects shared state
condition variable lets threads wait for changes to shared state
The most important sentence is:
A condition variable is not the condition itself.
The condition is still something you check manually:
while (pool->queue_size == 0) {
    pthread_cond_wait(&pool->not_empty, &pool->mutex);
}
Here, pool->not_empty does not mean the queue is definitely non-empty.
It means:
threads waiting here want to be notified when the queue may become non-empty
After waking up, a thread must check queue_size again.
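As a rough sketch of that split, the snippet below keeps the condition in queue_size and uses not_empty purely as a notification channel. The struct pool_state and the pool_state_* helpers are hypothetical names chosen for illustration; only the field names come from the article.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical struct for illustration; only the field names follow the article. */
struct pool_state {
    pthread_mutex_t mutex;      /* protects queue_size */
    pthread_cond_t  not_empty;  /* notification channel, holds no queue state */
    int             queue_size; /* the real condition lives here */
};

/* Returns 0 on success, or the pthread error code on failure. */
int pool_state_init(struct pool_state *p)
{
    int rc;

    assert(p != NULL);
    rc = pthread_mutex_init(&p->mutex, NULL);
    if (rc != 0)
        return rc;
    rc = pthread_cond_init(&p->not_empty, NULL);
    if (rc != 0) {
        pthread_mutex_destroy(&p->mutex);
        return rc;
    }
    p->queue_size = 0;
    return 0;
}

/* Reads the condition from shared state while holding the mutex. */
int pool_state_has_task(struct pool_state *p)
{
    int has_task;

    pthread_mutex_lock(&p->mutex);
    has_task = p->queue_size > 0;
    pthread_mutex_unlock(&p->mutex);
    return has_task;
}

void pool_state_destroy(struct pool_state *p)
{
    pthread_cond_destroy(&p->not_empty);
    pthread_mutex_destroy(&p->mutex);
}
```

Calling pthread_cond_signal(&p.not_empty) on a freshly initialized pool_state with no waiters succeeds, but queue_size stays 0 and pool_state_has_task still reports 0. The signal is not remembered anywhere, which is why the check on queue_size cannot be skipped.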
4. A condition variable must be used with a mutex
This is not optional.
pthread_cond_wait always works together with a mutex.
The usual pattern is:
pthread_mutex_lock(&pool->mutex);
while (pool->queue_size == 0) {
    pthread_cond_wait(&pool->not_empty, &pool->mutex);
}
/* the condition is satisfied, use shared state */
pthread_mutex_unlock(&pool->mutex);
Why does it need the mutex?
Because the condition you are checking is shared state:
pool->queue_size
Reading it must be protected by the same mutex that protects modifications to it.
The submitting thread also changes that state under the mutex:
pthread_mutex_lock(&pool->mutex);
pool->queue[pool->queue_tail].fn = fn;
pool->queue[pool->queue_tail].arg = arg;
pool->queue_tail = (pool->queue_tail + 1) % pool->queue_capacity;
pool->queue_size += 1;
pthread_cond_signal(&pool->not_empty);
pthread_mutex_unlock(&pool->mutex);
So both sides follow the same rule:
check or change queue_size while holding the mutex
wait or notify through the condition variable
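The two sides of that rule can be put together in a small sketch. To keep it short, tasks are only counted rather than stored, so work_counter and its functions are hypothetical stand-ins for the pool, not the article's actual implementation.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the pool: tasks are counted, not stored. */
struct work_counter {
    pthread_mutex_t mutex;
    pthread_cond_t  not_empty;
    int             queue_size;
};

void work_counter_init(struct work_counter *w)
{
    pthread_mutex_init(&w->mutex, NULL);
    pthread_cond_init(&w->not_empty, NULL);
    w->queue_size = 0;
}

/* Submit side: change queue_size under the mutex, then notify. */
void work_counter_add(struct work_counter *w)
{
    pthread_mutex_lock(&w->mutex);
    w->queue_size += 1;
    pthread_cond_signal(&w->not_empty);
    pthread_mutex_unlock(&w->mutex);
}

/* Worker side: check queue_size under the mutex, sleep while it is 0. */
void work_counter_take(struct work_counter *w)
{
    pthread_mutex_lock(&w->mutex);
    while (w->queue_size == 0)
        pthread_cond_wait(&w->not_empty, &w->mutex);
    assert(w->queue_size > 0); /* guaranteed by the loop; mutex is held */
    w->queue_size -= 1;
    pthread_mutex_unlock(&w->mutex);
}

/* Thread entry point for a demo: returns after three items were taken. */
void *work_counter_take_three(void *arg)
{
    struct work_counter *w = arg;

    for (int i = 0; i < 3; i++)
        work_counter_take(w);
    return NULL;
}
```

A thread started with work_counter_take_three blocks without spinning until another thread has called work_counter_add three times. Whether the adds happen before or after the worker starts waiting does not matter, because the worker always checks queue_size before sleeping.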
5. Why must wait release the lock?
This part is easy to misunderstand.
Suppose a worker locks the mutex and sees:
queue_size == 0
It wants to sleep.
But if it keeps holding the mutex while sleeping, no other thread can submit a task.
The submitter needs the same mutex:
pthread_mutex_lock(&pool->mutex);
If the sleeping worker never releases the mutex, the submitter cannot add a task, and therefore cannot wake the worker.
That would deadlock.
So pthread_cond_wait releases the mutex and puts the thread to sleep as one atomic step, and it re-acquires the mutex before it returns:
pthread_cond_wait(&pool->not_empty, &pool->mutex);
Conceptually, it does:
1. Release the mutex.
2. Put the current thread to sleep on the condition variable.
3. When woken up, lock the mutex again before returning.
The key detail is:
It atomically releases the mutex and enters the wait state.
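This prevents the classic lost wake-up problem: if unlocking and going to sleep were two separate steps, a submitter could add a task and signal in the gap between them, and the worker would then sleep through a notification that had already been sent.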
When pthread_cond_wait returns, the thread holds the mutex again.
That is why the code after pthread_cond_wait can safely check shared state again.
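One way to observe the release is a small experiment, sketched below. The gate type and its functions are hypothetical and not part of the thread pool. The opener only proceeds once it holds the mutex and sees that the waiter has locked the mutex but has not yet called unlock, which can only happen if pthread_cond_wait gave the mutex up.

```c
#include <assert.h>
#include <pthread.h>
#include <sched.h>

/* Hypothetical demo type; not part of the article's thread pool. */
struct gate {
    pthread_mutex_t mutex;
    pthread_cond_t  changed;
    int             ready;        /* the condition the waiter sleeps on */
    int             waiter_state; /* 0 = not started, 1 = waiting, 2 = done */
};

void gate_init(struct gate *g)
{
    pthread_mutex_init(&g->mutex, NULL);
    pthread_cond_init(&g->changed, NULL);
    g->ready = 0;
    g->waiter_state = 0;
}

void *gate_waiter(void *arg)
{
    struct gate *g = arg;

    pthread_mutex_lock(&g->mutex);
    g->waiter_state = 1;
    while (!g->ready)
        pthread_cond_wait(&g->changed, &g->mutex); /* mutex released while asleep */
    assert(g->ready); /* the mutex is held again here */
    g->waiter_state = 2;
    pthread_mutex_unlock(&g->mutex);
    return NULL;
}

/*
 * Opens the gate once the waiter is inside its wait loop.
 * Returns 1 to report that the mutex was acquired while the waiter
 * was still between its lock and unlock calls.
 */
int gate_open(struct gate *g)
{
    for (;;) {
        pthread_mutex_lock(&g->mutex);
        if (g->waiter_state == 1) {
            /* We hold the mutex although the waiter never called unlock:
             * pthread_cond_wait must have released it. */
            g->ready = 1;
            pthread_cond_signal(&g->changed);
            pthread_mutex_unlock(&g->mutex);
            return 1;
        }
        pthread_mutex_unlock(&g->mutex);
        sched_yield(); /* demo only: poll until the waiter has started */
    }
}
```

The polling loop in gate_open exists only so the experiment can catch the waiter in the waiting state. Real code would not poll like this.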
6. Why use while, not if?
This is the standard pattern (pool->stop is the shutdown flag discussed in section 9):
while (pool->queue_size == 0 && !pool->stop) {
    pthread_cond_wait(&pool->not_empty, &pool->mutex);
}
Many beginners write:
if the queue is empty:
    wait
But the correct pattern is:
while the condition is not satisfied:
    wait
There are several reasons.
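First, pthread_cond_wait is allowed to return even though no thread signaled or broadcast the condition variable, so a worker may wake up when no task is available.
This is called a spurious wakeup.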
Second, more than one worker may wake up, but only one of them gets the task.
The others must check the condition again.
Third, a wake-up only means:
something may have changed
It does not mean:
the condition is definitely true
So this is wrong:
if (pool->queue_size == 0) {
    pthread_cond_wait(&pool->not_empty, &pool->mutex);
}
This is right:
while (pool->queue_size == 0 && !pool->stop) {
    pthread_cond_wait(&pool->not_empty, &pool->mutex);
}
The rule is:
Always put wait inside a while loop. After waking up, always check the condition again.
Another way to say it:
condition variables notify changes
they do not guarantee facts
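The second reason can be made concrete with a sketch in which one item is added but every waiter is woken on purpose. The shelf type and its functions are hypothetical, and tasks are only counted. Each consumer re-checks queue_size after waking, so a consumer that loses the race goes back to sleep instead of taking an item that is no longer there.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical demo type; tasks are only counted. */
struct shelf {
    pthread_mutex_t mutex;
    pthread_cond_t  not_empty;
    int             queue_size;
    int             taken; /* total items taken by all consumers */
};

void shelf_init(struct shelf *s)
{
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->not_empty, NULL);
    s->queue_size = 0;
    s->taken = 0;
}

/* Thread entry point: takes exactly one item, sleeping until one exists. */
void *shelf_take_one(void *arg)
{
    struct shelf *s = arg;

    pthread_mutex_lock(&s->mutex);
    while (s->queue_size == 0)
        pthread_cond_wait(&s->not_empty, &s->mutex);
    /* With `if` instead of `while`, a consumer woken after another one
     * already took the only item could reach this line with
     * queue_size == 0 and drive it negative. */
    assert(s->queue_size > 0);
    s->queue_size -= 1;
    s->taken += 1;
    pthread_mutex_unlock(&s->mutex);
    return NULL;
}

/* Adds one item but deliberately wakes every waiter. */
void shelf_add_one(struct shelf *s)
{
    pthread_mutex_lock(&s->mutex);
    s->queue_size += 1;
    pthread_cond_broadcast(&s->not_empty);
    pthread_mutex_unlock(&s->mutex);
}
```

With two consumer threads and two calls to shelf_add_one, each consumer ends up with exactly one item and queue_size returns to 0, regardless of how the wakeups interleave.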
7. not_empty and not_full: do not mix different conditions
In the thread pool, we usually have more than one condition.
For workers, the important condition is:
the queue is not empty
For submitters, the important condition may be:
the queue is not full
So we use two condition variables:
pthread_cond_t not_empty;
pthread_cond_t not_full;
Workers wait on not_empty:
while (pool->queue_size == 0 && !pool->stop) {
    pthread_cond_wait(&pool->not_empty, &pool->mutex);
}
Submitters wait on not_full:
while (pool->queue_size == pool->queue_capacity && !pool->stop) {
    pthread_cond_wait(&pool->not_full, &pool->mutex);
}
This makes the intent clear:
not_empty wakes workers
not_full wakes submitters
When a task is submitted, the queue may become non-empty, so we notify not_empty.
When a worker takes a task, the queue may become non-full, so we notify not_full.
This separation is better than putting unrelated waiting reasons on one condition variable.
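A minimal bounded queue with both condition variables might look like the sketch below. It stores plain ints instead of tasks and leaves out the stop flag to stay short. The bounded_queue type, the bq_* functions, and the capacity of 4 are assumptions made for the example.

```c
#include <assert.h>
#include <pthread.h>

#define BQ_CAPACITY 4 /* assumed capacity for the sketch */

/* Hypothetical int queue; the article's pool stores tasks instead. */
struct bounded_queue {
    pthread_mutex_t mutex;
    pthread_cond_t  not_empty; /* consumers wait here */
    pthread_cond_t  not_full;  /* producers wait here */
    int             items[BQ_CAPACITY];
    int             queue_head;
    int             queue_tail;
    int             queue_size;
};

void bq_init(struct bounded_queue *q)
{
    pthread_mutex_init(&q->mutex, NULL);
    pthread_cond_init(&q->not_empty, NULL);
    pthread_cond_init(&q->not_full, NULL);
    q->queue_head = q->queue_tail = q->queue_size = 0;
}

/* Blocks while the queue is full, then appends and notifies consumers. */
void bq_put(struct bounded_queue *q, int value)
{
    pthread_mutex_lock(&q->mutex);
    while (q->queue_size == BQ_CAPACITY)
        pthread_cond_wait(&q->not_full, &q->mutex);
    q->items[q->queue_tail] = value;
    q->queue_tail = (q->queue_tail + 1) % BQ_CAPACITY;
    q->queue_size += 1;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mutex);
}

/* Blocks while the queue is empty, then removes and notifies producers. */
int bq_get(struct bounded_queue *q)
{
    int value;

    pthread_mutex_lock(&q->mutex);
    while (q->queue_size == 0)
        pthread_cond_wait(&q->not_empty, &q->mutex);
    assert(q->queue_size > 0 && q->queue_size <= BQ_CAPACITY);
    value = q->items[q->queue_head];
    q->queue_head = (q->queue_head + 1) % BQ_CAPACITY;
    q->queue_size -= 1;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mutex);
    return value;
}

/* Thread entry point for a demo producer: puts the values 1..100. */
void *bq_produce_1_to_100(void *arg)
{
    struct bounded_queue *q = arg;

    for (int i = 1; i <= 100; i++)
        bq_put(q, i);
    return NULL;
}
```

Because the queue holds at most four items, a producer running bq_produce_1_to_100 has to sleep on not_full whenever it gets four items ahead of the consumer calling bq_get. Neither side ever waits on the other side's condition variable.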
8. Use signal on the normal path
When submit adds one task, normally only one worker needs to wake up. With the mutex already held, submit does:
pool->queue[pool->queue_tail].fn = fn;
pool->queue[pool->queue_tail].arg = arg;
pool->queue_tail = (pool->queue_tail + 1) % pool->queue_capacity;
pool->queue_size += 1;
pthread_cond_signal(&pool->not_empty);
Only one new task was produced.
Waking one worker is usually enough.
Similarly, when a worker takes one task (again with the mutex held), it frees one queue slot:
task = pool->queue[pool->queue_head];
pool->queue_head = (pool->queue_head + 1) % pool->queue_capacity;
pool->queue_size -= 1;
pool->working_count += 1;
pthread_cond_signal(&pool->not_full);
Only one slot was freed.
Waking one blocked submitter is usually enough.
So the normal producer-consumer path is:
submit one task -> signal not_empty
worker takes one task -> signal not_full
9. Use broadcast when shutting down the thread pool
Shutdown is different.
When destroying the pool, we change a global state:
pool->stop = 1;
This state affects all waiting threads.
Some workers may be waiting on not_empty:
pthread_cond_wait(&pool->not_empty, &pool->mutex);
Some submitters may be waiting on not_full:
pthread_cond_wait(&pool->not_full, &pool->mutex);
If the pool is stopping, all of them should wake up and re-check stop.
So destroy should use broadcast:
pthread_mutex_lock(&pool->mutex);
pool->stop = 1;
pthread_cond_broadcast(&pool->not_empty);
pthread_cond_broadcast(&pool->not_full);
pthread_mutex_unlock(&pool->mutex);
A worker wakes up and checks:
if (pool->stop && pool->queue_size == 0) {
    pthread_mutex_unlock(&pool->mutex);
    break;
}
A submitter wakes up and checks:
if (pool->stop) {
    pthread_mutex_unlock(&pool->mutex);
    return -1;
}
So the shutdown path is:
global state changed -> broadcast to all relevant waiters
This is the key difference:
signal: wake one waiter
broadcast: wake all waiters
Normal production and consumption often use signal.
Shutdown uses broadcast.
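The shutdown path can be sketched on its own, without a real task queue. In the example below, stoppable and its functions are hypothetical, tasks are only counted, and the exited counter exists solely so the effect of the broadcast can be checked.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical demo type; tasks are only counted. */
struct stoppable {
    pthread_mutex_t mutex;
    pthread_cond_t  not_empty;
    int             queue_size;
    int             stop;
    int             exited; /* workers that have left their loop */
};

void stoppable_init(struct stoppable *s)
{
    pthread_mutex_init(&s->mutex, NULL);
    pthread_cond_init(&s->not_empty, NULL);
    s->queue_size = 0;
    s->stop = 0;
    s->exited = 0;
}

/* Worker loop: sleep until there is a task or the pool is stopping. */
void *stoppable_worker(void *arg)
{
    struct stoppable *s = arg;

    for (;;) {
        pthread_mutex_lock(&s->mutex);
        while (s->queue_size == 0 && !s->stop)
            pthread_cond_wait(&s->not_empty, &s->mutex);
        assert(s->queue_size > 0 || s->stop);
        if (s->stop && s->queue_size == 0) {
            s->exited += 1;
            pthread_mutex_unlock(&s->mutex);
            break;
        }
        s->queue_size -= 1; /* "take" a task; running it is omitted */
        pthread_mutex_unlock(&s->mutex);
    }
    return NULL;
}

/* Shutdown: change the global flag, then wake every waiting worker. */
void stoppable_shutdown(struct stoppable *s)
{
    pthread_mutex_lock(&s->mutex);
    s->stop = 1;
    pthread_cond_broadcast(&s->not_empty);
    pthread_mutex_unlock(&s->mutex);
}
```

If stoppable_shutdown used pthread_cond_signal instead, only one sleeping worker would be guaranteed to wake, and joining the remaining workers could block forever.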
10. Why does all_done also use broadcast?
The thread pool also has a condition for waiting until all tasks are finished.
The condition is:
pool->queue_size == 0 && pool->working_count == 0
queue_size == 0 means:
there are no queued tasks
working_count == 0 means:
no worker is currently running a task
The waiting function looks like this:
while (pool->queue_size > 0 || pool->working_count > 0) {
    pthread_cond_wait(&pool->all_done, &pool->mutex);
}
After a worker finishes a task, it locks the mutex again and updates working_count:
pool->working_count -= 1;
if (pool->queue_size == 0 && pool->working_count == 0) {
    pthread_cond_broadcast(&pool->all_done);
}
Why broadcast?
Because more than one thread could theoretically be waiting for the pool to become idle.
When the pool becomes idle, all of them can proceed.
Even if your current program has only one waiting thread, using broadcast here keeps the semantics clear:
the pool-wide completion condition became true
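Putting the waiting side and the worker side together gives roughly the sketch below. The tracker type and its functions are hypothetical, tasks are only counted, and the completed counter is added just to make the result observable.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical mini pool; tasks are counted, not stored. */
struct tracker {
    pthread_mutex_t mutex;
    pthread_cond_t  not_empty;
    pthread_cond_t  all_done;
    int             queue_size;    /* queued, not yet taken */
    int             working_count; /* taken, still running */
    int             completed;     /* finished tasks, for the demo */
    int             stop;
};

void tracker_init(struct tracker *t)
{
    pthread_mutex_init(&t->mutex, NULL);
    pthread_cond_init(&t->not_empty, NULL);
    pthread_cond_init(&t->all_done, NULL);
    t->queue_size = t->working_count = t->completed = t->stop = 0;
}

void tracker_submit(struct tracker *t)
{
    pthread_mutex_lock(&t->mutex);
    t->queue_size += 1;
    pthread_cond_signal(&t->not_empty);
    pthread_mutex_unlock(&t->mutex);
}

void *tracker_worker(void *arg)
{
    struct tracker *t = arg;

    for (;;) {
        pthread_mutex_lock(&t->mutex);
        while (t->queue_size == 0 && !t->stop)
            pthread_cond_wait(&t->not_empty, &t->mutex);
        if (t->stop && t->queue_size == 0) {
            pthread_mutex_unlock(&t->mutex);
            break;
        }
        /* Move one task from "queued" to "running" in one locked step. */
        t->queue_size -= 1;
        t->working_count += 1;
        pthread_mutex_unlock(&t->mutex);

        /* The task itself would run here, outside the lock. */

        pthread_mutex_lock(&t->mutex);
        assert(t->working_count > 0);
        t->working_count -= 1;
        t->completed += 1;
        if (t->queue_size == 0 && t->working_count == 0)
            pthread_cond_broadcast(&t->all_done);
        pthread_mutex_unlock(&t->mutex);
    }
    return NULL;
}

/* Blocks until nothing is queued and nothing is running. */
void tracker_wait_all(struct tracker *t)
{
    pthread_mutex_lock(&t->mutex);
    while (t->queue_size > 0 || t->working_count > 0)
        pthread_cond_wait(&t->all_done, &t->mutex);
    pthread_mutex_unlock(&t->mutex);
}

void tracker_stop(struct tracker *t)
{
    pthread_mutex_lock(&t->mutex);
    t->stop = 1;
    pthread_cond_broadcast(&t->not_empty);
    pthread_mutex_unlock(&t->mutex);
}
```

Decrementing queue_size and incrementing working_count under the same lock matters here. If the two updates were split across separate critical sections, a waiter could observe both counters at 0 while a task was still in flight.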
11. Summary
A mutex solves one problem:
protect shared state from concurrent modification
A condition variable solves another problem:
let threads sleep until shared state may have changed
In the thread pool, the common condition variables are:
not_empty: workers wait for tasks
not_full: submitters wait for queue space
all_done: waiters wait for all tasks to finish
The most important usage pattern is:
lock mutex
while condition is not satisfied:
    cond_wait
use shared state
unlock mutex
The most important rule is:
wait must be inside while
For notification:
normal path:
    submit one task -> signal not_empty
    worker takes one task -> signal not_full
shutdown path:
    set stop -> broadcast not_empty and not_full
completion path:
    queue empty and no active workers -> broadcast all_done
With mutexes and condition variables together, the thread pool finally stops busy-waiting.
Workers can sleep when there is no work, wake when work arrives, and exit cleanly when the pool is shutting down.