LRU, and More LRU

Interviewers love to ask you to hand-roll an LRU; boost has a nice specimen:

https://www.boost.org/doc/libs/1_67_0/boost/compute/detail/lru_cache.hpp

// a cache which evicts the least recently used item when it is full
template<class Key, class Value>
class lru_cache
{
public:
    typedef Key key_type;
    typedef Value value_type;
    typedef std::list<key_type> list_type;
    typedef std::map<
                key_type,
                std::pair<value_type, typename list_type::iterator>
            > map_type;

    lru_cache(size_t capacity)
        : m_capacity(capacity)
    {
    }

    ~lru_cache()
    {
    }

    size_t size() const;
    size_t capacity() const;
    bool empty() const;
    bool contains(const key_type &key);
    void insert(const key_type &key, const value_type &value);
    boost::optional<value_type> get(const key_type &key);
    void clear();

private:
    void evict();

private:
    map_type m_map;
    list_type m_list;
    size_t m_capacity;
};

This one is short and sweet:

  1. list_type holds the keys in LRU order: new keys go in at the front, and touching an existing key moves it back to the front
  2. m_map holds map[key -> (value, list_iter)]

This should be fairly easy to follow. But its simplicity also hides LRU's real complexity:

  1. To evict, we pop a victim off the list, then still have to look its key up in m_map to erase the entry; with boost's std::map that extra erase is O(log n) (it would be O(1) with a hash map), but either way it is an extra trip into m_map (see the evict() sketch below)
  2. Under concurrency, the whole map has to sit behind one big lock
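
For point 1, the evict() in the linked boost header shows that extra map access concretely; it looks roughly like this:

void evict()
{
    // evict item from the end of most recently used list
    typename list_type::iterator i = --m_list.end();
    m_map.erase(*i);
    m_list.erase(i);
}

The list hands us the victim key in O(1), but erasing the entry still requires a second probe into m_map.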

Both of these turn out to be real problems. There is also an "inside vs. outside" question for the LRU's callers; the CMU 15-445 Lab 1 interface (sketched after this bullet) is a good reference here:

  • a caller may take an entry out of the LRU as an active, in-use object, and only hand it back to the LRU once it is done with it
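
As a concrete example of that pattern, the LRUReplacer from BusTub (15-445's teaching system) looks roughly like this; treat the exact names as from-memory assumptions, since the handout changes between semesters:

#include <cstddef>
#include <cstdint>

// Sketch of a BusTub-style LRUReplacer. The replacer only tracks frames
// that are *evictable*; a frame actively used by the buffer pool is
// Pin()'d out of the replacer entirely and only re-enters it on Unpin().
using frame_id_t = int32_t;

class LRUReplacer {
 public:
  // Evict the least recently used frame, storing its id in *frame_id.
  // Returns false if nothing is evictable.
  bool Victim(frame_id_t* frame_id);

  // The frame was handed to a client: remove it from eviction candidates.
  void Pin(frame_id_t frame_id);

  // The client is done with the frame: it becomes an eviction candidate.
  void Unpin(frame_id_t frame_id);

  // How many frames are currently evictable.
  size_t Size();
};

This is exactly the "inside/outside" split: the LRU only ever orders the idle entries, while in-use entries live outside it.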

LevelDB LRU

LevelDB implements an intrusive LRU and splits it into multiple shards, which addresses complexities (1) and (2) above respectively. First, the interface:

LEVELDB_EXPORT Cache* NewLRUCache(size_t capacity);

class LEVELDB_EXPORT Cache {
 public:
  Cache() = default;

  Cache(const Cache&) = delete;
  Cache& operator=(const Cache&) = delete;

  // Destroys all existing entries by calling the "deleter"
  // function that was passed to the constructor.
  virtual ~Cache();

  // Opaque handle to an entry stored in the cache.
  struct Handle {};

  // Insert a mapping from key->value into the cache and assign it
  // the specified charge against the total cache capacity.
  //
  // Returns a handle that corresponds to the mapping. The caller
  // must call this->Release(handle) when the returned mapping is no
  // longer needed.
  //
  // When the inserted entry is no longer needed, the key and
  // value will be passed to "deleter".
  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) = 0;

  // If the cache has no mapping for "key", returns nullptr.
  //
  // Else return a handle that corresponds to the mapping. The caller
  // must call this->Release(handle) when the returned mapping is no
  // longer needed.
  virtual Handle* Lookup(const Slice& key) = 0;

  // Release a mapping returned by a previous Lookup().
  // REQUIRES: handle must not have been released yet.
  // REQUIRES: handle must have been returned by a method on *this.
  virtual void Release(Handle* handle) = 0;

  // Return the value encapsulated in a handle returned by a
  // successful Lookup().
  // REQUIRES: handle must not have been released yet.
  // REQUIRES: handle must have been returned by a method on *this.
  virtual void* Value(Handle* handle) = 0;

  // If the cache contains entry for key, erase it. Note that the
  // underlying entry will be kept around until all existing handles
  // to it have been released.
  virtual void Erase(const Slice& key) = 0;

  // Return a new numeric id. May be used by multiple clients who are
  // sharing the same cache to partition the key space. Typically the
  // client will allocate a new id at startup and prepend the id to
  // its cache keys.
  virtual uint64_t NewId() = 0;

  // Remove all cache entries that are not actively in use. Memory-constrained
  // applications may wish to call this method to reduce memory usage.
  // Default implementation of Prune() does nothing. Subclasses are strongly
  // encouraged to override the default implementation. A future release of
  // leveldb may change Prune() to a pure abstract method.
  virtual void Prune() {}

  // Return an estimate of the combined charges of all elements stored in the
  // cache.
  virtual size_t TotalCharge() const = 0;

 private:
  void LRU_Remove(Handle* e);
  void LRU_Append(Handle* e);
  void Unref(Handle* e);

  struct Rep;
  Rep* rep_;
};

First, a bit of reading comprehension:

  1. struct Handle is a type with no members at all: a Cache::Handle* is really just treated as an opaque address. Look at the definition of LRUHandle below; its first member is value. When LRUCache hands an entry out, it does reinterpret_cast<Cache::Handle*>(lru_handle_ptr), so the value sits right at the front of that address (it can also be reached through Cache::Value)
  2. When a Lookup happens, the LRUCache entry is Ref'd; on Release it is Unref'd
  3. Insert taking a deleter is unremarkable; charge is the less obvious part and is best understood together with sharding: a shard's capacity is counted not in number of handles but in total charge. In the block cache (configured via options.block_cache, caching table content), an entry's charge is the block's size; in the TableCache (a cache of handles to SST files), the charge is 1
  4. NewId exists because of how the cache is used: in the block cache, a key is encoded as a 64-bit cache id (one per table) plus a 64-bit block offset, so every client sharing a block cache must be able to generate a unique id (see the sketch below)
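
Point 4 is easiest to see at the call site. leveldb's table.cc builds a block-cache key roughly like this (16 bytes: the per-table cache id followed by the block's offset in the file):

char cache_key_buffer[16];
EncodeFixed64(cache_key_buffer, table->rep_->cache_id);  // id obtained via NewId()
EncodeFixed64(cache_key_buffer + 8, handle.offset());    // block offset within the SST file
Slice key(cache_key_buffer, sizeof(cache_key_buffer));
Cache::Handle* cache_handle = block_cache->Lookup(key);

And here is the LRUHandle definition that point 1 refers to:
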
struct LRUHandle {
  void* value;
  void (*deleter)(const Slice&, void* value);
  LRUHandle* next_hash;
  LRUHandle* next;
  LRUHandle* prev;
  // charge is a bookkeeping quantity: in the block cache it is the
  // block's size(); in the TableCache it is 1.
  // Note: the TableCache and the block cache are not the same object.
  size_t charge;  // TODO(opt): Only allow uint32_t?
  size_t key_length;
  bool in_cache;     // Whether entry is in the cache.
  uint32_t refs;     // References, including cache reference, if present.
  uint32_t hash;     // Hash of key(); used for fast sharding and comparisons
  char key_data[1];  // Beginning of key

  Slice key() const {
    // next_ is only equal to this if the LRU handle is the list head of an
    // empty list. List heads never have meaningful keys.
    assert(next != this);

    return Slice(key_data, key_length);
  }
};

Now let's take a look at the concrete LRUCache class:

class LRUCache {
 public:
  LRUCache();
  ~LRUCache();

  // Separate from constructor so caller can easily make an array of LRUCache
  void SetCapacity(size_t capacity) { capacity_ = capacity; }

  // Like Cache methods, but with an extra "hash" parameter.
  Cache::Handle* Insert(const Slice& key, uint32_t hash, void* value,
                        size_t charge,
                        void (*deleter)(const Slice& key, void* value));
  Cache::Handle* Lookup(const Slice& key, uint32_t hash);
  void Release(Cache::Handle* handle);
  void Erase(const Slice& key, uint32_t hash);
  void Prune();
  // charge is user-level accounting.
  size_t TotalCharge() const {
    MutexLock l(&mutex_);
    return usage_;
  }

 private:
  void LRU_Remove(LRUHandle* e);
  void LRU_Append(LRUHandle* list, LRUHandle* e);
  void Ref(LRUHandle* e);
  void Unref(LRUHandle* e);
  bool FinishErase(LRUHandle* e) EXCLUSIVE_LOCKS_REQUIRED(mutex_);

  // Initialized before use.
  size_t capacity_;

  // mutex_ protects the following state.
  mutable port::Mutex mutex_;
  size_t usage_ GUARDED_BY(mutex_);

  // Dummy head of LRU list.
  // lru.prev is newest entry, lru.next is oldest entry.
  // Entries have refs==1 and in_cache==true.
  LRUHandle lru_ GUARDED_BY(mutex_);

  // Dummy head of in-use list.
  // Entries are in use by clients, and have refs >= 2 and in_cache==true.
  LRUHandle in_use_ GUARDED_BY(mutex_);

  HandleTable table_ GUARDED_BY(mutex_);
};

LRUCache::LRUCache() : capacity_(0), usage_(0) {
  // Make empty circular linked lists.
  lru_.next = &lru_;
  lru_.prev = &lru_;
  in_use_.next = &in_use_;
  in_use_.prev = &in_use_;
}

  1. LRUCache is made of two big pieces: in_use_ (a doubly linked list of entries clients are currently using) and lru_ (the LRU list proper). This is tied to LRUHandle, whose refs field is a reference count that drives movement between the two lists (see Ref/Unref below). lru_ plays the same role as m_list in the boost version
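
The interplay between refs and the two lists is easiest to read straight from LevelDB's Ref/Unref (util/cache.cc, lightly annotated):

void LRUCache::Ref(LRUHandle* e) {
  if (e->refs == 1 && e->in_cache) {  // If on lru_ list, move to in_use_ list.
    LRU_Remove(e);
    LRU_Append(&in_use_, e);
  }
  e->refs++;
}

void LRUCache::Unref(LRUHandle* e) {
  assert(e->refs > 0);
  e->refs--;
  if (e->refs == 0) {  // Deallocate.
    assert(!e->in_cache);
    (*e->deleter)(e->key(), e->value);
    free(e);
  } else if (e->in_cache && e->refs == 1) {
    // No longer in use; move back to the lru_ list, where it is evictable.
    LRU_Remove(e);
    LRU_Append(&lru_, e);
  }
}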

But LRUHandle also carries a mysterious field, next_hash. So what does HandleTable look like? That is where today's tricky part lives:

class HandleTable {
 public:
  HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); }
  ~HandleTable() { delete[] list_; }

  LRUHandle* Lookup(const Slice& key, uint32_t hash) {
    return *FindPointer(key, hash);
  }

  LRUHandle* Insert(LRUHandle* h) {
    LRUHandle** ptr = FindPointer(h->key(), h->hash);
    LRUHandle* old = *ptr;
    h->next_hash = (old == nullptr ? nullptr : old->next_hash);
    *ptr = h;
    if (old == nullptr) {
      ++elems_;
      if (elems_ > length_) {
        // Since each cache entry is fairly large, we aim for a small
        // average linked list length (<= 1).
        Resize();
      }
    }
    return old;
  }

  LRUHandle* Remove(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = FindPointer(key, hash);
    LRUHandle* result = *ptr;
    if (result != nullptr) {
      *ptr = result->next_hash;
      --elems_;
    }
    return result;
  }

 private:
  // The table consists of an array of buckets where each bucket is
  // a linked list of cache entries that hash into the bucket.
  uint32_t length_;
  uint32_t elems_;
  LRUHandle** list_;

  // Return a pointer to slot that points to a cache entry that
  // matches key/hash. If there is no such cache entry, return a
  // pointer to the trailing slot in the corresponding linked list.
  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

  void Resize() {
    uint32_t new_length = 4;
    while (new_length < elems_) {
      new_length *= 2;
    }
    LRUHandle** new_list = new LRUHandle*[new_length];
    memset(new_list, 0, sizeof(new_list[0]) * new_length);
    uint32_t count = 0;
    for (uint32_t i = 0; i < length_; i++) {
      LRUHandle* h = list_[i];
      while (h != nullptr) {
        LRUHandle* next = h->next_hash;
        uint32_t hash = h->hash;
        LRUHandle** ptr = &new_list[hash & (new_length - 1)];
        h->next_hash = *ptr;
        *ptr = h;
        h = next;
        count++;
      }
    }
    assert(elems_ == count);
    delete[] list_;
    list_ = new_list;
    length_ = new_length;
  }
};

Don't be intimidated by the structure: it is just a chained hash table. It allocates a bucket array and threads the chains through next_hash, somewhat like an absl flat_hash_set whose elements are the LRUHandles themselves. With everything intrusive, eviction can locate the table slot directly from the entry's stored hash plus a table probe.

Compared with boost's version this looks like a big detour, but Google reports roughly a 5% performance win from it (the boost version is presumably held back by its node-based map structure, paying both space overhead and locality costs).

That leaves one question: why does LRUHandle store only next_hash, with no prev_hash? With a prev_hash, Erase would not even need a table probe. Look at the Insert code again:

LRUHandle* Insert(LRUHandle* h) {
  LRUHandle** ptr = FindPointer(h->key(), h->hash);
  LRUHandle* old = *ptr;
  h->next_hash = (old == nullptr ? nullptr : old->next_hash);
  *ptr = h;
  if (old == nullptr) {
    ++elems_;
    if (elems_ > length_) {
      // Since each cache entry is fairly large, we aim for a small
      // average linked list length (<= 1).
      Resize();
    }
  }
  return old;
}

The bet is that the next_hash chains stay short (Resize keeps the average chain length <= 1), so prev_hash is dropped to save space at negligible cost.
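
That addresses complexity (1). Complexity (2), the single big lock, is what the shards are for: NewLRUCache actually returns a ShardedLRUCache that hashes every key onto one of 16 independently locked LRUCache shards. Condensed from util/cache.cc (elisions marked):

static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;

class ShardedLRUCache : public Cache {
 private:
  LRUCache shard_[kNumShards];
  // ...

  static inline uint32_t HashSlice(const Slice& s) {
    return Hash(s.data(), s.size(), 0);
  }

  // The top hash bits pick the shard; the full hash is also passed down so
  // the shard's HandleTable can reuse it.
  static uint32_t Shard(uint32_t hash) { return hash >> (32 - kNumShardBits); }

 public:
  Handle* Insert(const Slice& key, void* value, size_t charge,
                 void (*deleter)(const Slice& key, void* value)) override {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
  }
  Handle* Lookup(const Slice& key) override {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Lookup(key, hash);
  }
  // ... Release / Erase / Prune dispatch to a shard the same way.
};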

RocksDB LRU

RocksDB's LRU broadly follows LevelDB's, but it additionally supports something like a 2-level LRU: under certain conditions, an entry gets promoted into a high-priority pool:

enum Flags : uint8_t {
  // Whether this entry is referenced by the hash table.
  IN_CACHE = (1 << 0),
  // Whether this entry is high priority entry.
  IS_HIGH_PRI = (1 << 1),
  // Whether this entry is in high-pri pool.
  IN_HIGH_PRI_POOL = (1 << 2),
  // Whether this entry has had any lookups (hits).
  HAS_HIT = (1 << 3),
};

void LRUCacheShard::LRU_Insert(LRUHandle* e) {
  assert(e->next == nullptr);
  assert(e->prev == nullptr);
  size_t total_charge = e->CalcTotalCharge(metadata_charge_policy_);
  if (high_pri_pool_ratio_ > 0 && (e->IsHighPri() || e->HasHit())) {
    // Insert "e" to head of LRU list.
    e->next = &lru_;
    e->prev = lru_.prev;
    e->prev->next = e;
    e->next->prev = e;
    e->SetInHighPriPool(true);
    high_pri_pool_usage_ += total_charge;
    MaintainPoolSize();
  } else {
    // Insert "e" to the head of low-pri pool. Note that when
    // high_pri_pool_ratio is 0, head of low-pri pool is also head of LRU list.
    e->next = lru_low_pri_->next;
    e->prev = lru_low_pri_;
    e->prev->next = e;
    e->next->prev = e;
    e->SetInHighPriPool(false);
    lru_low_pri_ = e;
  }
  lru_usage_ += total_charge;
}
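
MaintainPoolSize is what holds the high-pri pool to its configured ratio; roughly (as in rocksdb's cache/lru_cache.cc), it keeps demoting the oldest high-pri entry into the low-pri pool until usage fits:

void LRUCacheShard::MaintainPoolSize() {
  while (high_pri_pool_usage_ > high_pri_pool_capacity_) {
    // Overflow last entry in high-pri pool to low-pri pool.
    lru_low_pri_ = lru_low_pri_->next;
    assert(lru_low_pri_ != &lru_);
    lru_low_pri_->SetInHighPriPool(false);
    size_t total_charge =
        lru_low_pri_->CalcTotalCharge(metadata_charge_policy_);
    assert(high_pri_pool_usage_ >= total_charge);
    high_pri_pool_usage_ -= total_charge;
  }
}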

Approximate LRU and Other Approaches

Note that everything above is an LRU behind a lock: the hash table plus doubly linked list makes concurrency genuinely awkward. It is worth looking at approximate LRU algorithms, which have friendlier costs in the algorithm itself:

  1. redis's LRU
    1. It barely feels like an LRU at all: each key just carries an LRU clock value; eviction samples a few keys and discards the "least recently used" one among the sample. It is LRU blended with probability, built on sampling plus randomness (see the sketch after this list)
  2. RocksDB's clock algorithm
    1. clock is inherently much easier to parallelize than LRU; RocksDB's implementation builds on a concurrent hash map from tbb
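
To make the redis idea concrete, here is a toy, self-contained sketch of sampled eviction (my own illustration, not redis's code; real redis samples random dict buckets and stores a shared 24-bit LRU clock per object):

#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>

// Approximate LRU via sampling: a read only bumps a counter (no list
// relinking, hence no list to lock); eviction draws a few random entries
// and drops the stalest one in the sample.
class SampledLRU {
 public:
  explicit SampledLRU(size_t capacity) : capacity_(capacity) {}

  void Put(const std::string& key, std::string value) {
    if (map_.size() >= capacity_ && map_.find(key) == map_.end()) Evict();
    map_[key] = Entry{std::move(value), ++clock_};
  }

  std::string* Get(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    it->second.last_used = ++clock_;  // this is the whole "recently used" bookkeeping
    return &it->second.value;
  }

 private:
  struct Entry {
    std::string value;
    uint64_t last_used = 0;
  };

  void Evict() {
    constexpr int kSamples = 5;  // redis's maxmemory-samples also defaults to 5
    auto victim = map_.end();
    for (int i = 0; i < kSamples && !map_.empty(); i++) {
      // O(n) sampling for brevity; redis picks random dict buckets instead.
      auto it = map_.begin();
      std::advance(it, rng_() % map_.size());
      if (victim == map_.end() ||
          it->second.last_used < victim->second.last_used) {
        victim = it;
      }
    }
    if (victim != map_.end()) map_.erase(victim);
  }

  size_t capacity_;
  uint64_t clock_ = 0;  // logical clock; redis uses a coarse 24-bit wall clock
  std::unordered_map<std::string, Entry> map_;
  std::mt19937 rng_{std::random_device{}()};
};

The trade-off: Get never touches any shared list structure, which makes it easy to shard or run nearly lock-free, at the cost of occasionally evicting something that is not truly the oldest.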

Beyond these, there are policies such as Adaptive Replacement Cache (ARC) and LFU. Benchmark and profile (e.g. with pprof) against your own workload before choosing one.