go caching Archives - Start.io - A Sell-side Omnichannel Advertising Platform

Guest Post written by Sidney Feiner, Start.io’s Data Platform Team

Here at Start.io, we handle millions of events per second from our SDK, implemented in many applications both on Android and iOS. To successfully handle and store all those events, we develop scalable applications (previously Spark, currently Flink) that are very focused on performance.

Then, our favorite people over at Data take the data ingested by our databases and perform their ETLs.

Once the data is done, we develop APIs that access this data in the most efficient way and return the results to whomever needs them.

Some of those APIs have a very high throughput (thousands of requests per second) and require quick responses (5-20 ms!), so in order to support those requirements, we must implement efficient caching. And that’s what we are going to explore in this blog post – how we did it and how you can too. I hope it’s helpful!

Motivation

To improve query performance and also decrease storage size, we want to avoid a string in every record. Instead, we store a lookup value (int) for every string, and save that key’s original string value in a separate table. This is also very useful in instances of columns with repeating values, such as countries, cities, etc.

That lookup key is generated during ingestion (getOrCreate), and it’s important that when our APIs query the data, they return the original string values.

In both stages (ingestion and APIs), it would be useful to have a cache for the lookups, in order to avoid querying the database each and every time.

Some APIs require querying multiple tables and data sources concurrently. Those APIs are implemented in Scala (Play framework), but lately we’ve been thinking of reproducing them in Golang. So, the first thing we need to do is find a caching library that best fits our use-case.

Use-case

Generally, our use-case is as follows:

During ingestion, we check the local cache (cache read) for every value we need to use a lookup for (value to key).
1. If it exists, return the value.
2. If it doesn’t, query the database table that holds all the lookups.
  1. If it exists in the database, save its value inside the cache (cache write) and return it.
  2. If it doesn’t:
    - Generate a key for it.
    - Insert into database.
    - Return the key.
When the Data API queries data (key to value), we first read the local cache (cache read).
1. If it exists, return the key.
2. If it doesn’t, query the database table that holds all the lookups.
  1. If it exists, save the value-to-key mapping inside the cache (cache write) and return it.
  2. If it doesn’t, return null.

And to allow all those at scale, we actually need a cache that has good “cache-read” benchmarks when read with high concurrency.

Golang caches included

For those benchmarks, we conducted benchmark tests on some Golang caching libraries that had a commit in the last year:

BigCache

Saves data in heap but isn’t collected by GC
Allows enforcing a maximum size for a single entry

FreeCache

Described as “nearly LRU” (Least Recently Used eviction)
Maximum allowed memory must be pre-allocated

Ristretto

Newest of the bunch so it’s usable but still under active development
Developed by DGraph, which wasn’t content with the performance of the existing cache libraries. DGraph’s main requirements were:

Concurrent cache that scales well as number of cores and go-routines increase
High cache-hit ratio

groupcache was considered but left out because it has some IO overhead and, in our case, we don’t need a distributed cache.

Benchmarking

Now that we use Golang, we have the power of go-routines. So we can change the “key retrieval” process by concurrently querying the cache and the database, instead of first querying the cache and only then querying the database if needed.

The benchmarking process consisted of a few steps:

Without concurrent actions on cache and retrieving keys in random order.
With concurrent actions on cache and changing cache sizes, simulating real-life pipeline values (cache read and write).
With concurrent actions on cache and changing cache sizes. Populate cache at first and then test only cache reads.

First Step – No concurrency, random keys

For the first step, we conducted the benchmarking with no concurrency, and with generating a sequence of values we want to retrieve from the cache. (Remember: “retrieving” a value from the cache means that we try fetching it from the cache and if it doesn’t exist, we retrieve it from the database and insert it into the cache).

When generating keys for the benchmarks, we split the results into 3 “sub-tests” with the following expected hit rates:

As the expected hit rate goes up, we obviously expect a shorter read time, because there is no reason to wait for the database’s response as the item should already be in the cache.

For every expected hit rate, we generated a sequence of lookups keys. This sequence was applied on every cache in the same order to ensure fairness for all caches. The sequence was generated in the following way:

Find keys that definitely DON’T exist in lookup table (keys to be used when we want the longest possible time for a read).
Find keys that definitely DO exist in lookup table (keys to be used when we want a miss for the first time and then only hits).
Generate over 2 million values and based on hit probability:

If we need a miss, we either query a key that doesn’t exist in the table or query a key that does and will then be saved in cache (50/50 for those cases).
If we need a hit, we query a key that exists in table AND that we queried before, and therefore it should be in cache.

Once the sequence is generated, we exhaust the whole sequence on a single cache, for every cache. For every retrieval, we log the amount of time the retrieval took, whether we got a hit, and if we got a hit – if it was a local hit (found in cache) or not (retrieved from database).

Results

Writes

The following graph shows the average time of retrieving a key’s value from the database and inserting it in the cache. Note that the write times do not only reflect the time it took to write a new entry into the cache, but also include the key retrieval from the database.

Because we’re looking at the average write time, and all caches queried the same values from the database, we can still deduct that FreeCache has the advantage here.

Reads

The following graph shows the average read time of a key when it was already in the cache, per hit rate. The difference between the hit rates could indicate a difference in performance when the cache gets fuller.

It seems that FreeCache is the more stable, but Ristretto has better read times (when we have more reads, Ristretto shows better results).

Overall times

The following graph shows the overall average time of retrieving keys, whether it was found locally in the cache or the database.

Cache overall get times by hit probability

To sum up, it appears that FreeCache is the clear winner.

Conclusions

When we query randomly without concurrency, FreeCache is the clear winner.

Now would be the time to guess whether this trend will continue, because who doesn’t like a good guessing game?

Second Step – With concurrency and different cache sizes

For this step, we actually wanted to drill a bit deeper:

Analyze the cache’s performance when it has different sizes.
Instead of randomly simulating keys, see how the cache behaves when we use a real-life sequence of values.

To implement the second bullet, a million events were fetched from our historic data, ordered according to their ingestion time. This ensures we simulate a real-life sequence of events and, consequently, real life hit rates.

Benchmarks results were split into the following concurrencies:

The configured concurrency defined the number of go-routines to be created in order to retrieve keys from the cache, thus creating opportunities for concurrently reading and/or writing from the cache.