Bcache uses "buckets" on the SSD to store cached HDD sectors. These buckets contain multiple "extents" of sectors that get appended to the bucket until it's full. Extents can slowly get expired as the keys in the btree that refer to them get updated by writes that invalidate those sectors (these writes could be cached into other buckets, or just bypass the cache, depending on settings and throttling etc). This means buckets slowly "empty" of valid sectors, and may not be full. The cache needs empty buckets for caching new sectors, so buckets need to be expired and/or GC'ed, and ideally you want to expire the buckets containing the least useful data. This means both buckets that are mostly empty, and/or have sectors that are not giving enough cache hits. For GC'ing you want to GC the most empty buckets. Note that expiring nearly empty buckets also effectively does GC for you, probably resulting in less GC required.

Currently buckets have a "prio" priority used for GC'ing, and there is also effectively a "fifo" of their index order. In alloc.c there are 3 different cache expiry methods selected via sysfs; lru, fifo, random. These try to fill a free_inc fifo with buckets to invalidate/expire. Not all buckets can be expired at the time of the invalidate run, some can be not marked reclaimable, or pinned, or their "gen" might be too high and at risk of wrapping (an incrementing "gen" counter is how buckets are marked invalid). So it's possible not enough buckets are found to fill the free_inc fifo, in which case wake_up_gc() is used to trigger a GC to free up some buckets.

The random expiry is exactly what it says on the tin.

The fifo expiry cycles through all the buckets in order.

The lru expiry uses prio as an indicator of how recently the bucket has been used. It is set to a high number (0x8000) at each access and decremented by one every chached_sectors/1024 sectors read, so it's decremented by 1024 for every full cache worth of sectors is read. This means prio is smaller the older a bucket is. At each invalidation run the buckets are fed into an overflowing-heap to find the oldest buckets for expiry. This is interesting for an lru, because a heap is less efficient than a linked list for LRU, and appears to have been designed for a fancier expiry policy than LRU. It also multiplies the prio by the amount of sectors in the bucket when pushing into the heap, so buckets with less sectors effectively have a lower priority.

The resetting of a bucket's prio back to the large number on cache hits is done inside request.c inside cache_lookup_fn().

If another policy was to to be implemented, it would probably change alloc.c:bch_rescale_priorities() to rescale prio differently, and change request.c:cache_lookup_fn() to call a bch_increment_priority(bucket, sectors) to adjust the buckets prio for the number of sectors hit.

A good one would be to make prio represent the number of ms saved by hits to the bucket over the past T x cache-size worth of cache data read. It should use an exponential decay in bch_rescale_priorities(), and increment prio based on seek+sector-read time differences between HDD and SSD for reading that number of sectors. (HDD seek/sector-read ~= 10ms/4us, SSD seek/sector-read ~= 100us/1us).

Both movinggc.c and alloc.c add buckets to a small "overflowing" heap to find the best candidates for GC'ing or expiring. When the heap is full, it compares the new element to the heap_peek() and swaps it in if it's better. This is implemented in 2 places and could be done with one helper macro, though arguably they are doing slightly different things with the resulting swap (movingcg.c needs to update a count of sectors_to_move in the buckets in the heap). In alloc.c buckets are added to a priority max-heap (so the heap top has the highest priority of all the lowest priorities, and we only swap in buckets with lower priorities), and then re-heapified into a min-heap to extract buckets in lowest priority first. In movinggc.c buckets are added to a sectors-used max-heap, but it's not then re-heapified. Instead it pops the fullest buckets off the heap until the number of sectors to GC remaining in the heap is less than the bucket-space available to GC into.

  ( 2 subscribers )