c++ - C++11 atomic memory ordering - is this a correct usage of relaxed (release-consume) ordering?


I have made a port to C++11, using std::atomic, of a triple buffer to be used as a concurrency sync mechanism. The idea behind this thread sync approach is that in a producer-consumer situation where the producer runs faster than the consumer, triple buffering can give some benefits, since the producer thread won't be "slowed down" by having to wait for the consumer. In my case, I have a physics thread that is updated at ~120 fps and a render thread running at ~60 fps. Obviously, I want the render thread to always get the most recent state possible, but I also know that I will be skipping a lot of frames from the physics thread because of the difference in rates. On the other hand, I want my physics thread to maintain its constant update rate and not be limited by the slower render thread locking my data.

The original C code was made by remis-thoughts, and the full explanation is in his blog. I encourage anyone interested to read it for a further understanding of the original implementation.

My implementation can be found here.

The basic idea is to have an array with 3 positions (buffers) and an atomic flag that is compare-and-swapped to define which array elements correspond to which state at any given time. This way, only one atomic variable is used to model all 3 indexes of the array and the logic behind the triple buffering. The buffer's 3 positions are named dirty, clean and snap. The producer always writes to the dirty index, and can flip the writer to swap the dirty index with the current clean index. The consumer can request a new snap, which swaps the current snap index with the clean index to get the most recent buffer. The consumer always reads the buffer in the snap position.

The flag consists of an 8-bit unsigned int, and the bits correspond to:

(unused) (new write) (2x dirty) (2x clean) (2x snap)

The newwrite bit flag is set by the writer and cleared by the reader. The reader can use it to check whether there have been any writes since the last snap, and if not, it won't take another snap. The flag and the indexes can be obtained using simple bitwise operations.
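To make the bit layout concrete, here is a minimal sketch that unpacks a flags byte using the masks listed in the class comments. The `decoded_flags` struct and the `decode` function are hypothetical names introduced only for this illustration; they are not part of the triple buffer class.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type, used only to illustrate the layout:
// (unused) (new write) (2x dirty) (2x clean) (2x snap)
struct decoded_flags {
    bool new_write;  // bit 6
    unsigned dirty;  // bits 5-4
    unsigned clean;  // bits 3-2
    unsigned snap;   // bits 1-0
};

// Unpack the four fields from a flags byte.
inline decoded_flags decode(std::uint_fast8_t flags) {
    return decoded_flags{
        (flags & 0x40) != 0,
        static_cast<unsigned>((flags & 0x30) >> 4),
        static_cast<unsigned>((flags & 0x0c) >> 2),
        static_cast<unsigned>(flags & 0x03)
    };
}
```

For instance, `decode(0x06)` gives dirty = 0, clean = 1, snap = 2 with no pending write, which matches the initial value 0x6 stored by the constructors.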

OK, now for the code:

    #include <atomic>
    #include <cstdint>

    template <typename t>
    class triplebuffer {

    public:

        triplebuffer();
        triplebuffer(const t& init);

        // non-copyable behavior
        triplebuffer(const triplebuffer&) = delete;
        triplebuffer& operator=(const triplebuffer&) = delete;

        t snap() const;            // get the current snap to read
        void write(const t newt);  // write a new value
        bool newsnap();            // swap to the latest value, if any
        void flipwriter();         // flip writer positions dirty / clean

        t readlast();              // wrapper to read the last available element (newsnap + snap)
        void update(t newt);       // wrapper to update with a new element (write + flipwriter)

    private:

        bool isnewwrite(uint_fast8_t flags); // check if the newwrite bit is 1
        uint_fast8_t swapsnapwithclean(uint_fast8_t flags); // swap snap and clean indexes
        uint_fast8_t newwriteswapcleanwithdirty(uint_fast8_t flags); // set newwrite to 1 and swap clean and dirty indexes

        // 8 bit flags are (unused) (new write) (2x dirty) (2x clean) (2x snap)
        // newwrite   = (flags & 0x40)
        // dirtyindex = (flags & 0x30) >> 4
        // cleanindex = (flags & 0xc) >> 2
        // snapindex  = (flags & 0x3)
        mutable std::atomic_uint_fast8_t flags;

        t buffer[3];
    };

And the implementation:

    template <typename t>
    triplebuffer<t>::triplebuffer(){

        t dummy = t();

        buffer[0] = dummy;
        buffer[1] = dummy;
        buffer[2] = dummy;

        flags.store(0x6, std::memory_order_relaxed); // initially dirty = 0, clean = 1 and snap = 2
    }

    template <typename t>
    triplebuffer<t>::triplebuffer(const t& init){

        buffer[0] = init;
        buffer[1] = init;
        buffer[2] = init;

        flags.store(0x6, std::memory_order_relaxed); // initially dirty = 0, clean = 1 and snap = 2
    }

    template <typename t>
    t triplebuffer<t>::snap() const{

        return buffer[flags.load(std::memory_order_consume) & 0x3]; // read the snap index
    }

    template <typename t>
    void triplebuffer<t>::write(const t newt){

        buffer[(flags.load(std::memory_order_consume) & 0x30) >> 4] = newt; // write into the dirty index
    }

    template <typename t>
    bool triplebuffer<t>::newsnap(){

        uint_fast8_t flagsnow(flags.load(std::memory_order_consume));
        do {
            if( !isnewwrite(flagsnow) ) // nothing new, no need to swap
                return false;
        } while(!flags.compare_exchange_weak(flagsnow,
                                             swapsnapwithclean(flagsnow),
                                             std::memory_order_release,
                                             std::memory_order_consume));
        return true;
    }

    template <typename t>
    void triplebuffer<t>::flipwriter(){

        uint_fast8_t flagsnow(flags.load(std::memory_order_consume));
        while(!flags.compare_exchange_weak(flagsnow,
                                           newwriteswapcleanwithdirty(flagsnow),
                                           std::memory_order_release,
                                           std::memory_order_consume));
    }

    template <typename t>
    t triplebuffer<t>::readlast(){
        newsnap();     // get the most recent value
        return snap(); // return it
    }

    template <typename t>
    void triplebuffer<t>::update(t newt){
        write(newt);  // write the new value
        flipwriter(); // change dirty/clean buffer positions for the next update
    }

    template <typename t>
    bool triplebuffer<t>::isnewwrite(uint_fast8_t flags){
        // check if the newwrite bit is 1
        return ((flags & 0x40) != 0);
    }

    template <typename t>
    uint_fast8_t triplebuffer<t>::swapsnapwithclean(uint_fast8_t flags){
        // swap snap with clean (this also clears the newwrite bit)
        return (flags & 0x30) | ((flags & 0x3) << 2) | ((flags & 0xc) >> 2);
    }

    template <typename t>
    uint_fast8_t triplebuffer<t>::newwriteswapcleanwithdirty(uint_fast8_t flags){
        // set the newwrite bit to 1 and swap clean with dirty
        return 0x40 | ((flags & 0xc) << 2) | ((flags & 0x30) >> 2) | (flags & 0x3);
    }

As you can see, I have decided to use a release-consume pattern for memory ordering. The release (memory_order_release) on the store assures that no writes in the current thread can be reordered after the store. On the other side, the consume assures that no reads in the current thread that depend on the value loaded can be reordered before the load. This ensures that writes to dependent variables in other threads that release the same atomic variable are visible in the current thread.
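As a rough illustration of the release-consume pairing described above, the sketch below publishes a pointer with a release store and reads through it after a consume load. The `payload` type and the `publish_and_read` function are hypothetical names made up for this example and are not part of the triple buffer. Note also that, as far as I know, mainstream compilers commonly implement consume by treating it as acquire.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example type, not part of the triple buffer.
struct payload { int value; };

// Publishes `data` from one thread and reads it from another.
inline int publish_and_read() {
    payload data{0};
    std::atomic<payload*> ptr{nullptr};
    int seen = -1;

    std::thread producer([&] {
        data.value = 42;                              // plain write
        ptr.store(&data, std::memory_order_release);  // the write above cannot be reordered after this store
    });

    std::thread consumer([&] {
        payload* p = nullptr;
        while ((p = ptr.load(std::memory_order_consume)) == nullptr) {
            // spin until the pointer is published
        }
        seen = p->value;  // this read depends on p, so it is ordered after the load
    });

    producer.join();
    consumer.join();
    return seen;  // join() makes the consumer's write to `seen` visible here
}
```

Here the consumer can only observe the value written before the release store, because the read goes through the loaded pointer and therefore carries a dependency on it.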

If my understanding is correct, since I only need the flags to be set atomically, operations on other variables that don't directly affect the flags can be reordered freely by the compiler, allowing for more optimizations. From reading some documents on the new memory model, I am also aware that these relaxed atomics will only have a noticeable effect on platforms such as ARM and POWER (they were introduced mainly because of them). Since I am targeting ARM, I believe that I could benefit from these operations and be able to squeeze a little bit more performance out.

Now for the question:

Am I correctly using the release-consume relaxed ordering for this specific problem?

Thanks,

André

PS: Sorry for the long post, but I believed that some decent context was needed for a better view of the problem.

EDIT: Implemented @yakk's suggestions:

  • Fixed the flags read in newsnap() and flipwriter(), which was using direct assignment and hence the default load(std::memory_order_seq_cst).
  • Moved the bit-fiddling operations to dedicated functions for clarity.
  • Added a bool return type to newsnap(); it now returns false when there's nothing new and true otherwise.
  • Defined the class as non-copyable using the = delete idiom, since both the copy constructor and the copy assignment operator are unsafe if the triplebuffer is being used.

EDIT 2: Fixed the description, which was incorrect (thanks @useless). It is the consumer that requests a new snap and reads from the snap index (not the "writer"). Sorry for the distraction, and thanks to useless for pointing it out.

EDIT 3: Optimized the newsnap() and flipwriter() functions according to @display name's suggestions, removing 2 redundant load()s per loop cycle.

Why are you loading the old flags value twice in your CAS loops? The first time is by flags.load(), and the second is by compare_exchange_weak(), which the standard specifies will, on CAS failure, load the previous value into the first argument, in this case flagsnow.

According to http://en.cppreference.com/w/cpp/atomic/atomic/compare_exchange, "Otherwise, loads the actual value stored in *this into expected (performs load operation)." So what your loop is doing is this: on failure, compare_exchange_weak() reloads flagsnow, then the loop repeats, and the first statement loads it once again, immediately after the load by compare_exchange_weak(). It seems to me that your loop ought instead to have the load pulled outside the loop. For example, newsnap() would be:

    uint_fast8_t flagsnow(flags.load(std::memory_order_consume));
    do {
        if( !isnewwrite(flagsnow)) return false; // nothing new, no need to swap
    } while(!flags.compare_exchange_weak(flagsnow, swapsnapwithclean(flagsnow), std::memory_order_release, std::memory_order_consume));

And flipwriter():

    uint_fast8_t flagsnow(flags.load(std::memory_order_consume));
    while(!flags.compare_exchange_weak(flagsnow, newwriteswapcleanwithdirty(flagsnow), std::memory_order_release, std::memory_order_consume));
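The reload-on-failure behaviour quoted from cppreference can be checked in isolation. The following is a minimal single-threaded sketch; the function name `expected_after_failed_cas` and the specific values are assumptions chosen for the illustration, not code from the question.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Returns the value left in `expected` after a CAS that is bound to fail.
inline std::uint_fast8_t expected_after_failed_cas() {
    std::atomic<std::uint_fast8_t> flags{0x46};  // value actually stored
    std::uint_fast8_t expected = 0x06;           // stale guess, differs from the stored value
    bool ok = flags.compare_exchange_weak(expected, 0x09,
                                          std::memory_order_release,
                                          std::memory_order_relaxed);
    assert(!ok);      // the values differ, so the exchange must fail
    return expected;  // the failed CAS has overwritten it with the stored value
}
```

After the failed call, `expected` holds 0x46, so a CAS loop can retry straight away with the refreshed value and does not need its own extra load().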
