CONCURRENCY IN C++
Yuqing Xia
CSCI 5828
Prof. Kenneth M. Anderson
University of Colorado at Boulder
OUTLINE
• Introduction
  • Importance of concurrency
• Thread launching
• Protection of shared data
  • Atomics
  • Mutexes
• Communication
  • Condition variables
• Memory model
  • Definition
  • Operation orders
INTRODUCTION: WHY CONCURRENCY IS NECESSARY
• Processor clock speeds have stopped rising; the extra transistors from Moore's law now go into parallel hardware, so parallelism is how we keep increasing speed.
• Concurrent programming is necessary to utilize parallel hardware.
• Sometimes it is natural to describe a problem with multiple threads, just as we divide a problem into several steps.
CONCURRENCY INTRODUCED IN C++11
• The original C++ Standard, published in 1998, supported only single-threaded programming.
• The new C++ Standard (referred to as C++11 or C++0x), published in 2011, acknowledges the existence of multithreaded programs. A memory model for concurrency is also introduced.
WHY WRITE CONCURRENT PROGRAMS
• Dividing a problem into multiple executing threads is an important programming technique.
• Multiple executing threads may be the best way to describe a problem.
• With multiple executing threads, we can write highly efficient programs by taking advantage of any parallelism available in the computer system.
CONCURRENCY REQUIREMENTS ON A PROGRAMMING LANGUAGE
• Thread creation – be able to create another thread of control.
• Thread synchronization – be able to establish timing relationships among threads.
  • One thread waits until another thread has reached a certain point in its code.
  • One thread is ready to transmit information while the other is ready to receive it, simultaneously.
• Thread communication – be able to correctly transmit data among threads.
THREAD CREATION
• C++11 introduced a new thread library, including utilities for starting and managing threads.
• Creating an instance of std::thread automatically starts a new thread.
• The join function forces the current thread to wait for the thread th to finish. Otherwise the main function may exit before th has finished.
Two threads are involved: the main thread launches a new thread when it reaches the statement std::thread th(...), and the new thread executes the function threadFunction().
CRITICAL SECTION
• Data are usually shared between threads. There is a problem when multiple threads attempt to operate on the same object simultaneously.
  • If the operation is atomic (not divisible), meaning no other thread can modify any partial result during the operation on the object, then it is safe. Otherwise, we are in a race condition.
• A critical section is a piece of code that accesses a shared resource (data structure or device) that must not be concurrently accessed by more than one thread of execution.
• Preventing simultaneous execution of a critical section by multiple threads is called mutual exclusion.
EXAMPLE
• Sharing objects between threads leads to synchronization issues. For example:
Five threads each try to increase a counter 5000 times, for 25000 increments in total. This program has a synchronization problem. Here are some results obtained on my computer:
24138
20326
23345
25000
17715
The result is not the same every time.
PROTECT SHARED DATA
• The problem is that the increment is not an atomic operation.
  • Atomic operation: one during which a processor can read a location and write it in the same bus operation. Atomic implies indivisibility and irreducibility, so an atomic operation is either performed entirely or not performed at all.
• The increment in the example is made of three operations:
  • Read the current value of value
  • Add one to the current value
  • Write that new value to value
• So when you launch more than one thread, their operations might interleave with each other and make the result impossible to predict.
PROTECT SHARED DATA
• Solutions
  • Semaphores — a mutex is a binary semaphore.
  • Atomic references
  • Monitors — guarantee that only one thread can be active within the monitor at a time. C++ does not support monitors, but Java does.
  • Condition variables
  • Compare-and-swap — compares the contents of a memory location to a given value and, only if they are the same, modifies the contents of that memory location to a given new value.
  • Etc.
• Here we will only introduce the most common solutions in C++: mutexes and atomic references.
PROTECT SHARED DATA WITH MUTEXES
• Mutexes (named after mutual exclusion) enable us to mark the code that accesses the data structure as mutually exclusive, so that if any thread is running that code, any other thread that tries to access the data has to wait until the first thread is finished.
• In C++, you create a mutex by constructing an instance of std::mutex, lock it with a call to the member function lock(), and unlock it with a call to the member function unlock().
• lock(): lets a thread obtain the lock, blocking other threads.
• unlock(): releases the lock, unblocking waiting threads.
RAII IDIOM
mutex.lock() is called when the instance of std::lock_guard is constructed, and mutex.unlock() is called when the instance guard is destructed.
Because of the mutex, only one thread can execute counter.increment() at a time, ensuring the correctness of our result.
It is not wise to call the member functions directly, because you would have to remember to call unlock() on every code path out of a function, including those due to exceptions.
The template std::lock_guard implements the Resource Acquisition Is Initialization (RAII) idiom for a mutex.
ADVANCED LOCKING WITH MUTEXES
• Recursive locking
  • std::recursive_mutex
  • Recursive locking enables the same thread to lock the same mutex twice without deadlocking.
• Timed locking
  • std::timed_mutex, std::recursive_timed_mutex
  • Timed locking enables a thread to do something else when a lock cannot be acquired within a given time.
• Call once
  • std::call_once(std::once_flag& flag, function);
  • Sometimes we want a function to be called only one time no matter how many threads are launched. Each std::call_once is matched to a std::once_flag variable.
USING ATOMIC TYPES
• The C++11 concurrency library introduces atomic types as a template class: std::atomic. You can use any trivially copyable type with the template, and the operations on that variable will be atomic and therefore thread-safe.
  • std::atomic<Type> object;
• A different locking technique is applied according to the data type and size:
  • Lock-free technique: for small types such as int, long, and float. It is much faster than the mutex technique.
  • Mutex technique: for big types (such as a structure of, say, 2 MB). There is no performance advantage of atomic types over mutexes in that case.
EXAMPLE OF USING ATOMIC TYPES
The same example with the atomic template; speed comparison between atomic types and mutexes.
SYNCHRONIZATION BETWEEN THREADS
• Besides protecting shared data, we also need to synchronize actions on separate threads.
• In the C++ Standard Library, condition variables and futures are provided to handle synchronization problems.
  • The condition_variable class is a synchronization primitive that can be used to block a thread, or multiple threads at the same time, until:
    • a notification is received from another thread
    • a timeout expires
  • Any thread that intends to wait on a std::condition_variable has to acquire a std::unique_lock first. The wait operation atomically releases the mutex and suspends the execution of the thread. When the condition variable is notified, the thread is awakened and the mutex is reacquired.
EXAMPLE
• A queue is used to pass data between two threads.
• When data is ready, the preparing thread locks the mutex and pushes the data into the queue (#2), then calls the notify_one() member function on the std::condition_variable instance to notify the waiting thread (#3).
EXAMPLE
• On the other side, the processing thread first locks the mutex with std::unique_lock. The thread then calls wait() on the condition variable, checking the condition in the lambda function.
• When the condition variable is notified by a call to notify_one() from the data-preparation thread, the thread wakes, reacquires the mutex, checks the condition, and, if the condition is true, goes on to process the next item.
MORE ABOUT UNIQUE_LOCK
• Condition variables require std::unique_lock rather than std::lock_guard — the waiting thread must unlock the mutex while it is waiting and lock it again afterwards, and std::lock_guard does not provide that flexibility.
• The flexibility to unlock a std::unique_lock is not just used for the call to wait(); it is also used once we have the data to process, but before processing it (#6): processing data can potentially be a time-consuming operation, and it is a bad idea to hold a lock on a mutex for longer than necessary.
ONE-OFF EVENT WITH FUTURES
• If a thread needs to wait for a specific one-off event, it obtains a future representing that event. The thread can poll the future to see whether the event has occurred while performing some other task.
• There are two sorts of future templates in the C++ Standard Library:
  • std::future<> (named std::unique_future<> in pre-standard drafts) — the instance is the only one that refers to its associated event.
  • std::shared_future<> — multiple instances may refer to the same event. All the instances become ready at the same time, and they may all access any data associated with the event.
ONE-OFF EVENT WITH FUTURES
• The first thread, running wait_for_flight1(), obtains a std::shared_future<boarding_information> with the boarding information (#1) and calls get(), which waits for the future to become ready.
• The second thread, running wait_for_flight2(), after obtaining the future (#2), does something else while periodically checking whether the flight is ready to board by calling is_ready() on the future (#4).
A GLANCE AT THE MEMORY MODEL
• Why a C++ memory model?
  • Problem: it is hard for programmers to reason about correctness.
  • Without precise semantics, it is hard to reason about whether the compiler will violate them.
  • Compiler transformations could introduce data races without violating the language specification — the resulting execution could yield unexpected behaviors.
MEMORY MODEL
• Two aspects to the memory model:
I. The basic structural aspects — how things are laid out in memory
  1. Every variable is an object, including those that are members of other objects.
  2. Every object occupies at least one memory location.
  3. Variables of fundamental type such as int or char occupy exactly one memory location, whatever their size, even if they're adjacent or part of an array.
  4. Adjacent bit fields are part of the same memory location.
II. The concurrency aspects
  1. If there is no enforced ordering between two accesses to a single memory location from separate threads, and these accesses are not atomic,
  2. and one or both of the accesses is a write, then this is a data race and causes undefined behaviour.
MEMORY MODEL: OPERATION ORDERS
• Each of the operations on atomic types has an optional memory-ordering argument that can be used to specify the required memory-ordering semantics. These operations can be divided into three categories:
1. Store operations, which can have memory_order_relaxed, memory_order_release, or memory_order_seq_cst ordering
2. Load operations, which can have memory_order_relaxed, memory_order_consume, memory_order_acquire, or memory_order_seq_cst ordering
3. Read-modify-write operations, which can have memory_order_relaxed, memory_order_consume, memory_order_acquire, memory_order_release, memory_order_acq_rel, or memory_order_seq_cst ordering
EXAMPLE
• With plain (non-atomic) shared variables, the behavior here would be undefined.
• The default mode for atomic loads and stores in C++11 is to enforce sequential consistency, which means all loads and stores must behave "as if" they happened in the order you wrote them within each thread. With thread 1 storing x and then y, and thread 2 printing y and then x, the possible outputs are:
0 0 (thread 2 runs before thread 1)
37 17 (thread 2 runs after thread 1)
0 17 (thread 2 runs after thread 1 assigns to x but before it assigns to y)
Let's see how the memory ordering affects the result.
EXAMPLE
Relaxed ordering: there are no constraints on the reordering of memory accesses around the atomic variable, so you might additionally get the output
37 0
Otherwise the possible results are the same as under sequential consistency.
MEMORY MODEL
• The memory model provides low-level atomic operations.
  • Expert programmers can use them to maximize performance.
• Atomic operations can be explicitly parameterized with respect to memory-ordering constraints, allowing instructions to be reordered with other memory operations.
• For read-modify-write operations, the programmer can specify whether the operation acts as an acquire operation, a release operation, neither (relaxed), or both (acquire-release).
CONCLUSION
• The C++ committee introduced concurrency into C++0x/C++11, making the language support multiple threads and adapting C++ to the current programming style.
• To make concurrency possible, a language should support thread launching, thread synchronization, and thread communication.
• The basic problems in concurrency are how to protect shared data and how to synchronize threads. Mutexes and the atomic template are the solutions to race conditions.
• Lower-level control based on the memory model lets us make the semantics of concurrent operations precise.
Questions?
Thank you!