Benchmarking Data-Oriented Design, part 2

· jollygood's blog

#data-oriented-design

Welcome to part 2 of Benchmarking Data-Oriented Design.

In part 1¹ we benchmarked data-oriented design by comparing a struct of arrays vs an array of structs. Both of these data structures used value types as array elements. Today we will compare how the type of array element affects performance: value types vs reference types.

First, a short intro to value types and reference types and how they relate to our data-oriented design benchmark.

# Value types

Value types, such as booleans, bytes, integers, floats, or structs, are stored in one or more contiguous bytes of memory. Arrays of value types are stored as one big contiguous section of memory like so:

[Figure: Array of value types — all elements stored in one contiguous block of memory]

When we iterate over an array of value types we are effectively reading a single contiguous section of memory sequentially, which means we utilize the CPU cache efficiently in a couple of ways.

  1. Cache prefetching. The CPU's prefetcher typically recognizes when we are reading memory sequentially and predicts which section of memory we will read next. It then prefetches the following memory sections into cache before we need them. This gives us cache hits, which is good.
  2. Cache line utilization. If values are small and multiple values fit into a single cache line, then we get even more cache hits.

# Reference types

Reference types contain a reference to one or more values which are located in different memory locations, separate from the reference. An array of reference types stores all references in a contiguous section of memory, but the values are scattered around at different places in memory like so:

[Figure: Array of reference types — contiguous references pointing to values scattered across memory]

The reference types introduce a level of indirection: first we read the reference, then we load the data section from the referenced memory address. This usually leads to random memory access patterns, which is bad for CPU cache performance. When we iterate over an array of reference types we get the CPU cache benefits only for the references. Typically, we get a cache miss each time we read a data section.²

# Benchmark

Let's expand the benchmark from part 1: iterating over 1 million users and finding out which email addresses are still unconfirmed. Today we'll use pointers, a simple form of reference type, and benchmark arrays of pointers against the arrays of value types from part 1.

```go
// userRefs is an array of references
var userRefs []*user

type user struct {
	ID                 int
	IsActive           bool
	Name               string
	Username           string
	PasswordHash       []byte
	MustChangePassword bool
	Email              string
	EmailConfirmed     bool
}

// users is a struct with arrays of references
var users userRefData

type userRefData struct {
	ID                 []*int
	IsActive           []*bool
	Name               []*string
	Username           []*string
	PasswordHash       []*[]byte
	MustChangePassword []*bool
	Email              []*string
	EmailConfirmed     []*bool
}
```

Running the benchmark³, we get the following results:

                                              Iterations          Time                  Memory throughput       Data processed
BenchmarkValueTypes/array-of-structs-4               121           9906752 ns/op        10497.89 MB/s            99.18 MB/op
BenchmarkValueTypes/struct-of-arrays-4               588           1974807 ns/op          586.80 MB/s            1.105 MB/op
BenchmarkRefTypes/array-of-references-4              127           9240654 ns/op        12120.35 MB/s            106.8 MB/op
BenchmarkRefTypes/struct-of-ref-arrays-4             123           9478477 ns/op          974.58 MB/s            8.810 MB/op

The BenchmarkValueTypes lines benchmark value types and the BenchmarkRefTypes lines benchmark the reference types.

Both reference type benchmarks perform quite close to the BenchmarkValueTypes/array-of-structs benchmark in terms of time per operation. This shows that shrinking the referenced element, from a whole struct down to a single value, didn't improve performance.

An interesting detail is that BenchmarkRefTypes/struct-of-ref-arrays reads an order of magnitude less data per operation than BenchmarkRefTypes/array-of-references (8.810 MB/op vs 106.8 MB/op), yet they perform similarly in terms of time per operation. This indicates that the random memory access patterns introduced by reference types prevent the CPU cache from being used efficiently.

As in part 1, BenchmarkValueTypes/struct-of-arrays shows that an array of small value-type elements lets us take full advantage of the CPU cache. The struct-of-arrays (value type) variant performs about 5x faster than the others.

# Summary

To benefit from data-oriented design, it is important to understand the difference between value types and reference types and what their memory layouts look like. It is equally important to understand the random memory access patterns that reference types introduce and their effect on CPU cache utilization.

CPU cache utilization is not a precise form of engineering, but benchmarks give us a good idea of how different data structures and algorithms perform. They also help us build a better understanding of, and intuition for, how the CPU cache works.


  1. Benchmarking Data-Oriented Design, part 1 ↩︎

  2. It might be possible to get a good cache hit ratio for an array of reference types if we have allocated the referenced data sections in a contiguous section of memory, and are iterating over it in a predictable pattern. For example, if the data sections were stored as an array of value types. ↩︎

  3. Benchmark source code and results for part 2 ↩︎