C++ - atomic fetch_add vs add performance


The code below demonstrates some curiosities of multi-threaded programming, in particular the performance of a std::memory_order_relaxed increment versus a regular increment in a single thread. I do not understand why fetch_add(relaxed) in a single thread is twice as slow as a regular increment.

    #include <atomic>
    #include <benchmark/benchmark.h>

    static void bm_incrementcounterlocal(benchmark::State& state) {
      // Each benchmark thread increments its own local atomic counter.
      volatile std::atomic_int val2(0);

      while (state.KeepRunning()) {
        for (int i = 0; i < 10; ++i) {
          benchmark::DoNotOptimize(val2.fetch_add(1, std::memory_order_relaxed));
        }
      }
    }
    BENCHMARK(bm_incrementcounterlocal)->ThreadRange(1, 8);

    static void bm_incrementcounterlocalint(benchmark::State& state) {
      // Same loop with a plain volatile int, for comparison.
      volatile int val3 = 0;

      while (state.KeepRunning()) {
        for (int i = 0; i < 10; ++i) {
          benchmark::DoNotOptimize(++val3);
        }
      }
    }
    BENCHMARK(bm_incrementcounterlocalint)->ThreadRange(1, 8);

Output:

    Benchmark                                  Time(ns)    CPU(ns) Iterations
    -------------------------------------------------------------------------
    bm_incrementcounterlocal/threads:1               59         60   11402509
    bm_incrementcounterlocal/threads:2               30         61   11284498
    bm_incrementcounterlocal/threads:4               19         62   11373100
    bm_incrementcounterlocal/threads:8               17         62   10491608
    bm_incrementcounterlocalint/threads:1            31         31   22592452
    bm_incrementcounterlocalint/threads:2            15         31   22170842
    bm_incrementcounterlocalint/threads:4             8         31   22214640
    bm_incrementcounterlocalint/threads:8             9         31   21889704

With volatile int, the compiler must ensure that it does not optimize away and/or reorder any reads or writes of the variable.

With fetch_add, the CPU must take precautions to make the read-modify-write operation atomic.

These are two different requirements. The atomicity requirement means the CPU has to communicate with the other CPUs on the machine, ensuring that they don't read or write the given memory location between its own read and write. If the compiler compiles fetch_add using a compare-and-swap instruction, it will emit a short loop to catch the case where some other CPU modified the value in between.
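The compare-and-swap loop mentioned above can be sketched roughly as follows. This is only an illustration under assumptions: `fetch_add_via_cas` and `check_fetch_add_via_cas` are hypothetical names that do not appear in the question, and on x86 compilers typically emit a single locked instruction (`lock xadd`) for `fetch_add` rather than a loop like this.

```cpp
#include <atomic>
#include <cassert>

// Hypothetical helper: emulates fetch_add with a compare-and-swap loop.
// Returns the value the counter held just before the addition.
int fetch_add_via_cas(std::atomic<int>& target, int delta) {
    int expected = target.load(std::memory_order_relaxed);
    // compare_exchange_weak stores expected + delta only if target still
    // equals expected. On failure (another thread changed the value, or a
    // spurious failure) it reloads expected and the loop retries.
    while (!target.compare_exchange_weak(expected, expected + delta,
                                         std::memory_order_relaxed)) {
    }
    return expected;
}

// Small single-threaded self-check of the helper.
void check_fetch_add_via_cas() {
    std::atomic<int> counter{5};
    int previous = fetch_add_via_cas(counter, 3);
    assert(previous == 5);
    assert(counter.load() == 8);
}
```

Even when the loop body runs only once, as it does in a single thread, the exchange itself is an atomic instruction, which is where the extra cost compared to a plain increment comes from.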

For volatile int, no such communication is necessary. On the contrary, volatile requires that the compiler does not invent any reads: volatile is designed for single-threaded communication with hardware registers, where the mere act of reading a value may have side effects.
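As a rough sketch of what the extra cost of fetch_add buys, the snippet below lets several threads increment one shared std::atomic<int> with relaxed ordering. The names `count_with_atomic` and `check_count_with_atomic` are hypothetical and not taken from the question. The equivalent loop on a shared volatile int would be a data race, so it is deliberately not shown.

```cpp
#include <atomic>
#include <cassert>
#include <thread>
#include <vector>

// Hypothetical helper: num_threads threads each add 1 to a shared atomic
// counter increments_per_thread times; returns the final count.
int count_with_atomic(int num_threads, int increments_per_thread) {
    std::atomic<int> counter{0};
    std::vector<std::thread> workers;
    workers.reserve(num_threads);
    for (int t = 0; t < num_threads; ++t) {
        workers.emplace_back([&counter, increments_per_thread] {
            for (int i = 0; i < increments_per_thread; ++i) {
                // Relaxed ordering still guarantees an atomic
                // read-modify-write, so no increment is lost.
                counter.fetch_add(1, std::memory_order_relaxed);
            }
        });
    }
    for (auto& worker : workers) {
        worker.join();  // join() makes all increments visible here
    }
    return counter.load(std::memory_order_relaxed);
}

// Small self-check: the total is exact regardless of thread scheduling.
void check_count_with_atomic() {
    assert(count_with_atomic(4, 10000) == 40000);
}
```

Unlike the benchmark in the question, where every thread has its own local counter, the threads here contend for one memory location, which is the situation the atomic instruction is designed for.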

