Ring Buffers Are Cool
I’ve kinda really gotten into Ring Buffers. It’s such a simple data structure, yet there are a million ways to optimize around it, and it’s still so massively useful. Kind of reminds me of the situation with B+ Trees or LSM Trees, except that Ring Buffers are way easier to grasp. Here’s a pretty simple introduction to Ring Buffers: Data Structure and Algorithms: Ring Buffer. I do like the Python version because we get pseudo-infinite read/write pointers thanks to Python’s arbitrary-precision ints, though typically a uint64 is more than enough. Here’s my Python version:
class RingBuffer:
    def __init__(self, size=8):
        if (size & (size - 1)) != 0 or size <= 1:
            raise ValueError("Ring Buffer size must be power of 2 and greater than 1")
        self.size = size
        self.data = [None] * self.size
        self.write = 0
        self.read = 0
        self.mask = size - 1

    def push(self, value):
        if self.is_full():
            raise OverflowError("Ring Buffer is full")
        self.data[self.write & self.mask] = value
        self.write += 1

    def pop(self):
        if self.is_empty():
            raise IndexError("Ring Buffer is empty")
        read_ptr = self.read & self.mask
        tmp = self.data[read_ptr]
        self.data[read_ptr] = None
        self.read += 1
        return tmp

    def reads_left(self):
        return self.write - self.read

    def writes_left(self):
        return self.size - self.reads_left()

    def is_empty(self):
        return self.reads_left() == 0

    def is_full(self):
        return self.writes_left() == 0

    def __iter__(self):
        read_ptr = self.read
        write_ptr = self.write
        data = self.data[:]
        while read_ptr < write_ptr:
            yield data[read_ptr & self.mask]
            read_ptr += 1

    def __str__(self):
        return '[' + ','.join(map(repr, self)) + ']'
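To see how the monotonically increasing pointers interact with the mask, here’s a quick sketch with made-up numbers (illustrative only, not output from the class above):

```python
size = 8
mask = size - 1  # 0b111

# The counters only ever increase; the mask maps them into the
# backing array. Suppose we've done 10 writes and 3 reads:
write, read = 10, 3

reads_left = write - read        # 7 items buffered
writes_left = size - reads_left  # 1 free slot

# The next write lands at index 10 & 7 == 2, wrapping around the
# array, while the next read comes from index 3 & 7 == 3.
print(write & mask)  # 2
print(read & mask)   # 3
```

The nice part of this scheme is that full vs. empty falls out of plain subtraction; no extra flag or wasted slot needed.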
I made some interesting choices in this Python version! This Ring Buffer isn’t thread-safe, and the __iter__ function doubles down on that. Although I create a snapshot of the ring buffer with data = self.data[:] and iterate over that, there’s still a race condition: read_ptr and write_ptr are captured separately from the data, which can change while the snapshot is being created. This is especially destructive for larger ring buffers, since the copy takes longer. You could fix this by using a lock instead!
import threading

class RingBuffer:
    def __init__(self, size=8):
        # rest of code here
        self.lock = threading.Lock()

    def __iter__(self):
        with self.lock:
            read_ptr = self.read
            write_ptr = self.write
            data = self.data[:]
        # rest of code here
        while read_ptr < write_ptr:
            pass
Though you’d also have to lock the push and pop code to make everything fully thread-safe. Interestingly, since CPython (currently?) still has the GIL, individual bytecode operations are atomic, so you won’t see corrupted memory. But the GIL doesn’t make multi-statement sequences atomic, so logical races between push, pop, and __iter__ are still possible, and that’s a Python-specific implementation detail either way.
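As a sketch of what fully locking push and pop might look like, here’s a condensed, hypothetical variant (LockedRingBuffer is just my name for it, and it folds the helper methods inline):

```python
import threading

class LockedRingBuffer:
    """Minimal thread-safe sketch: every public method takes the lock."""

    def __init__(self, size=8):
        if (size & (size - 1)) != 0 or size <= 1:
            raise ValueError("size must be power of 2 and greater than 1")
        self.size = size
        self.data = [None] * size
        self.write = 0
        self.read = 0
        self.mask = size - 1
        self.lock = threading.Lock()

    def push(self, value):
        with self.lock:
            if self.write - self.read == self.size:
                raise OverflowError("Ring Buffer is full")
            self.data[self.write & self.mask] = value
            self.write += 1

    def pop(self):
        with self.lock:
            if self.write == self.read:
                raise IndexError("Ring Buffer is empty")
            read_ptr = self.read & self.mask
            tmp = self.data[read_ptr]
            self.data[read_ptr] = None
            self.read += 1
            return tmp
```

Note that the full/empty checks moved inside the lock; checking is_full() before acquiring the lock would reintroduce a race between the check and the write.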
I really like the little detail of the __str__ function: you can print a Ring Buffer and its contents via the __iter__ function by passing repr and self to map. repr feels a bit more natural than str, since the output makes it clear when a value is intentionally None versus the string "None". Everything is so concise and kinda neat looking. I think Ring Buffers make a decent class-based object. Oh, I should note that printing a ring buffer without popping the contents is usually unnecessary.
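The None-vs-"None" distinction is easy to see in isolation; a tiny sketch with a throwaway list:

```python
# With str, a stored None and the string "None" print identically;
# repr keeps them distinct by quoting strings.
items = [None, "None", 3]
print('[' + ','.join(map(str, items)) + ']')   # [None,None,3]
print('[' + ','.join(map(repr, items)) + ']')  # [None,'None',3]
```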
Here’s my Odin version for comparison:
package main

import "core:fmt"

RingBuffer :: struct($T: typeid, $N: int)
    where (N & (N - 1)) == 0 && N > 0 {
    data:  [N]T,
    write: int,
    read:  int,
}

rb_push :: proc(rb: ^RingBuffer($T, $N), value: T) -> bool {
    if rb_is_full(rb) {
        return false
    }
    rb.data[rb.write & (N - 1)] = value
    rb.write += 1
    return true
}

rb_pop :: proc(rb: ^RingBuffer($T, $N)) -> (T, bool) {
    if rb_is_empty(rb) {
        return {}, false
    }
    read_ptr := rb.read & (N - 1)
    tmp := rb.data[read_ptr]
    rb.data[read_ptr] = {}
    rb.read += 1
    return tmp, true
}

rb_reads_left :: proc(rb: ^RingBuffer($T, $N)) -> int {
    return rb.write - rb.read
}

rb_writes_left :: proc(rb: ^RingBuffer($T, $N)) -> int {
    return N - rb_reads_left(rb)
}

rb_is_empty :: proc(rb: ^RingBuffer($T, $N)) -> bool {
    return rb_reads_left(rb) == 0
}

rb_is_full :: proc(rb: ^RingBuffer($T, $N)) -> bool {
    return rb_writes_left(rb) == 0
}

main :: proc() {
    my_rb := RingBuffer(int, 16){}
    rb_push(&my_rb, 6)
    rb_push(&my_rb, 5)
    rb_push(&my_rb, 4)
    rb_push(&my_rb, 3)
    fmt.println("Test pop: ", rb_pop(&my_rb))
    fmt.println("Test pop: ", rb_pop(&my_rb))
    fmt.println("Test pop: ", rb_pop(&my_rb))
    fmt.println("Test pop: ", rb_pop(&my_rb))
}
My favorite thing about the Odin version is the where clause that’s embedded into the struct definition. Power-of-2 sizing for our ring buffer is important because then we get to avoid the modulo operator and use bitmasking instead. This isn’t especially useful in the Python version, but it can make a meaningful difference in the Odin version: a bitwise AND is typically a single-cycle operation, whereas a hardware integer divide (x86 DIV/IDIV) can cost 20+ cycles depending on your CPU, though compilers will often turn a modulo by a constant power of 2 into a mask anyway. Plus, bitmasking feels simpler on the noggin’ when combined with monotonically increasing read/write heads.
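The mask-equals-modulo identity only holds for power-of-2 sizes, which is easy to check for yourself (a quick illustrative snippet, not part of either implementation):

```python
# For a power-of-2 size, x % size == x & (size - 1) for all
# non-negative x, because size - 1 is a contiguous run of low bits.
size = 8
assert all(i % size == i & (size - 1) for i in range(1000))

# For a non-power-of-2 size the identity breaks:
# 7 % 6 == 1, but 7 & 5 == 5.
assert 7 % 6 != 7 & 5
```

This is exactly why both constructors reject non-power-of-2 sizes up front.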
Both versions have their own unique tensions around error handling. Python allows for raising exceptions, whereas in Odin we return a bool that the caller has to check. Another huge difference is that Python’s ring buffer will live on the heap, whereas Odin’s version is on the stack by default. How much this matters depends on where you’re running the ring buffer, how big it’s going to be, and your performance requirements. In embedded contexts the stack is going to be far more limited in size, but if you can fit the buffer there, you eliminate heap fragmentation and should get better cache locality; maybe it even stays resident in L1 cache if everything fits?
I’ve been reading a ton of ring buffer papers recently, so I have more optimizations to get into and study, particularly around atomics and cache alignment; I’ll write about those next. My biggest goal for the year is to poop out a ring buffer in Odin that implements as many optimizations as I can think of.