LRU, and More LRU

Interviewers love to ask you to hand-roll an LRU; boost has a nice specimen:

https://www.boost.org/doc/libs/1_67_0/boost/compute/detail/lru_cache.hpp

// a cache which evicts the least recently used item when it is full
template<class Key, class Value>
class lru_cache
{
public:
    typedef Key key_type;
    typedef Value value_type;
    typedef std::list<key_type> list_type;
    typedef std::map<
                key_type,
                std::pair<value_type, typename list_type::iterator>
            > map_type;

    lru_cache(size_t capacity)
        : m_capacity(capacity)
    {
    }

    ~lru_cache()
    {
    }

    size_t size() const;
    size_t capacity() const;
    bool empty() const;
    bool contains(const key_type &key);
    void insert(const key_type &key, const value_type &value);
    boost::optional<value_type> get(const key_type &key);
    void clear();

private:
    void evict();

private:
    map_type m_map;
    list_type m_list;
    size_t m_capacity;
};

This one is short and sweet:

  1. list_type holds the keys in LRU order: new keys go in at the front, and touching an existing key moves it back to the front
  2. m_map holds map[key -> (value, list_iter)]

This should be fairly easy to follow. But its simplicity also hides LRU's real complexity:

  1. To evict, we pop a victim off the list, then still have to look its key up in m_map to erase the entry; with boost's std::map that extra erase is O(log n) (it would be O(1) with a hash map), but either way it is an extra trip into m_map (see the evict() sketch below)
  2. Under concurrency, the whole map has to sit behind one big lock
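
For point 1, the evict() in the linked boost header shows that extra map access concretely; it looks roughly like this:

void evict()
{
    // evict item from the end of most recently used list
    typename list_type::iterator i = --m_list.end();
    m_map.erase(*i);
    m_list.erase(i);
}

The list hands us the victim key in O(1), but erasing the entry still requires a second probe into m_map.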

Both of these turn out to be real problems. There is also an "inside vs. outside" question for the LRU's callers; the CMU 15-445 Lab 1 interface (sketched after this bullet) is a good reference here:

  • a caller may take an entry out of the LRU as an active, in-use object, and only hand it back to the LRU once it is done with it
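
As a concrete example of that pattern, the LRUReplacer from BusTub (15-445's teaching system) looks roughly like this; treat the exact names as from-memory assumptions, since the handout changes between semesters:

#include <cstddef>
#include <cstdint>

// Sketch of a BusTub-style LRUReplacer. The replacer only tracks frames
// that are *evictable*; a frame actively used by the buffer pool is
// Pin()'d out of the replacer entirely and only re-enters it on Unpin().
using frame_id_t = int32_t;

class LRUReplacer {
 public:
  // Evict the least recently used frame, storing its id in *frame_id.
  // Returns false if nothing is evictable.
  bool Victim(frame_id_t* frame_id);

  // The frame was handed to a client: remove it from eviction candidates.
  void Pin(frame_id_t frame_id);

  // The client is done with the frame: it becomes an eviction candidate.
  void Unpin(frame_id_t frame_id);

  // How many frames are currently evictable.
  size_t Size();
};

This is exactly the "inside/outside" split: the LRU only ever orders the idle entries, while in-use entries live outside it.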

LevelDB LRU

LevelDB implements an intrusive LRU and splits it into multiple shards, which addresses complexities (1) and (2) above respectively. First, the interface:

LEVELDB_EXPORT Cache* NewLRUCache(size_t capacity);

class LEVELDB_EXPORT Cache {
 public:
  Cache() = default;

  Cache(const Cache&) = delete;
  Cache& operator=(const Cache&) = delete;

  // Destroys all existing entries by calling the "deleter"
  // function that was passed to the constructor.
  virtual ~Cache();

  // Opaque handle to an entry stored in the cache.
  struct Handle {};

  // Insert a mapping from key->value into the cache and assign it
  // the specified charge against the total cache capacity.
  //
  // Returns a handle that corresponds to the mapping. The caller
  // must call this->Release(handle) when the returned mapping is no
  // longer needed.
  //
  // When the inserted entry is no longer needed, the key and
  // value will be passed to "deleter".
  virtual Handle* Insert(const Slice& key, void* value, size_t charge,
                         void (*deleter)(const Slice& key, void* value)) = 0;

  // If the cache has no mapping for "key", returns nullptr.
  //
  // Else return a handle that corresponds to the mapping. The caller
  // must call this->Release(handle) when the returned mapping is no
  // longer needed.
  virtual Handle* Lookup(const Slice& key) = 0;

  // Release a mapping returned by a previous Lookup().
  // REQUIRES: handle must not have been released yet.
  // REQUIRES: handle must have been returned by a method on *this.
  virtual void Release(Handle* handle) = 0;

  // Return the value encapsulated in a handle returned by a
  // successful Lookup().
  // REQUIRES: handle must not have been released yet.
  // REQUIRES: handle must have been returned by a method on *this.
  virtual void* Value(Handle* handle) = 0;

  // If the cache contains entry for key, erase it. Note that the
  // underlying entry will be kept around until all existing handles
  // to it have been released.
  virtual void Erase(const Slice& key) = 0;

  // Return a new numeric id. May be used by multiple clients who are
  // sharing the same cache to partition the key space. Typically the
  // client will allocate a new id at startup and prepend the id to
  // its cache keys.
  virtual uint64_t NewId() = 0;

  // Remove all cache entries that are not actively in use. Memory-constrained
  // applications may wish to call this method to reduce memory usage.
  // Default implementation of Prune() does nothing. Subclasses are strongly
  // encouraged to override the default implementation. A future release of
  // leveldb may change Prune() to a pure abstract method.
  virtual void Prune() {}

  // Return an estimate of the combined charges of all elements stored in the
  // cache.
  virtual size_t TotalCharge() const = 0;

 private:
  void LRU_Remove(Handle* e);
  void LRU_Append(Handle* e);
  void Unref(Handle* e);

  struct Rep;
  Rep* rep_;
};

First, a bit of reading comprehension:

  1. struct Handle is a type with no members at all: a Cache::Handle* is really just treated as an opaque address. Look at the definition of LRUHandle below; its first member is value. When LRUCache hands an entry out, it does reinterpret_cast<Cache::Handle*>(lru_handle_ptr), so the value sits right at the front of that address (it can also be reached through Cache::Value)
  2. When a Lookup happens, the LRUCache entry is Ref'd; on Release it is Unref'd
  3. Insert taking a deleter is unremarkable; charge is the less obvious part and is best understood together with sharding: a shard's capacity is counted not in number of handles but in total charge. In the block cache (configured via options.block_cache, caching table content), an entry's charge is the block's size; in the TableCache (a cache of handles to SST files), the charge is 1
  4. NewId exists because of how the cache is used: in the block cache, a key is encoded as a 64-bit cache id (one per table) plus a 64-bit block offset, so every client sharing a block cache must be able to generate a unique id (see the sketch below)
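
Point 4 is easiest to see at the call site. leveldb's table.cc builds a block-cache key roughly like this (16 bytes: the per-table cache id followed by the block's offset in the file):

char cache_key_buffer[16];
EncodeFixed64(cache_key_buffer, table->rep_->cache_id);  // id obtained via NewId()
EncodeFixed64(cache_key_buffer + 8, handle.offset());    // block offset within the SST file
Slice key(cache_key_buffer, sizeof(cache_key_buffer));
Cache::Handle* cache_handle = block_cache->Lookup(key);

And here is the LRUHandle definition that point 1 refers to:
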
struct LRUHandle {
  void* value;
  void (*deleter)(const Slice&, void* value);
  LRUHandle* next_hash;
  LRUHandle* next;
  LRUHandle* prev;
  // charge is a bookkeeping quantity: in the block cache it is the
  // block's size(); in the TableCache it is 1.
  // Note: the TableCache and the block cache are not the same object.
  size_t charge;  // TODO(opt): Only allow uint32_t?
  size_t key_length;
  bool in_cache;     // Whether entry is in the cache.
  uint32_t refs;     // References, including cache reference, if present.
  uint32_t hash;     // Hash of key(); used for fast sharding and comparisons
  char key_data[1];  // Beginning of key

  Slice key() const {
    // next_ is only equal to this if the LRU handle is the list head of an
    // empty list. List heads never have meaningful keys.
    assert(next != this);

    return Slice(key_data, key_length);
  }
};

Now let's take a look at the concrete LRUCache class:

class LRUCache {
 public:
  LRUCache();
  ~LRUCache();

  // Separate from constructor so caller can easily make an array of LRUCache
  void SetCapacity(size_t capacity) { capacity_ = capacity; }

  // Like Cache methods, but with an extra "hash" parameter.
  Cache::Handle* Insert(const Slice& key, uint32_t hash, void* value,
                        size_t charge,
                        void (*deleter)(const Slice& key, void* value));
  Cache::Handle* Lookup(const Slice& key, uint32_t hash);
  void Release(Cache::Handle* handle);
  void Erase(const Slice& key, uint32_t hash);
  void Prune();
  // charge is user-level accounting.
  size_t TotalCharge() const {
    MutexLock l(&mutex_);
    return usage_;
  }

 private:
  void LRU_Remove(LRUHandle* e);
  void LRU_Append(LRUHandle* list, LRUHandle* e);
  void Ref(LRUHandle* e);
  void Unref(LRUHandle* e);
  bool FinishErase(LRUHandle* e) EXCLUSIVE_LOCKS_REQUIRED(mutex_);

  // Initialized before use.
  size_t capacity_;

  // mutex_ protects the following state.
  mutable port::Mutex mutex_;
  size_t usage_ GUARDED_BY(mutex_);

  // Dummy head of LRU list.
  // lru.prev is newest entry, lru.next is oldest entry.
  // Entries have refs==1 and in_cache==true.
  LRUHandle lru_ GUARDED_BY(mutex_);

  // Dummy head of in-use list.
  // Entries are in use by clients, and have refs >= 2 and in_cache==true.
  LRUHandle in_use_ GUARDED_BY(mutex_);

  HandleTable table_ GUARDED_BY(mutex_);
};

LRUCache::LRUCache() : capacity_(0), usage_(0) {
  // Make empty circular linked lists.
  lru_.next = &lru_;
  lru_.prev = &lru_;
  in_use_.next = &in_use_;
  in_use_.prev = &in_use_;
}

  1. LRUCache is made of two big pieces: in_use_ (a doubly linked list of entries clients are currently using) and lru_ (the LRU list proper). This is tied to LRUHandle, whose refs field is a reference count that drives movement between the two lists (see Ref/Unref below). lru_ plays the same role as m_list in the boost version
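
The interplay between refs and the two lists is easiest to read straight from LevelDB's Ref/Unref (util/cache.cc, lightly annotated):

void LRUCache::Ref(LRUHandle* e) {
  if (e->refs == 1 && e->in_cache) {  // If on lru_ list, move to in_use_ list.
    LRU_Remove(e);
    LRU_Append(&in_use_, e);
  }
  e->refs++;
}

void LRUCache::Unref(LRUHandle* e) {
  assert(e->refs > 0);
  e->refs--;
  if (e->refs == 0) {  // Deallocate.
    assert(!e->in_cache);
    (*e->deleter)(e->key(), e->value);
    free(e);
  } else if (e->in_cache && e->refs == 1) {
    // No longer in use; move back to the lru_ list, where it is evictable.
    LRU_Remove(e);
    LRU_Append(&lru_, e);
  }
}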

But LRUHandle also carries a mysterious field, next_hash. So what does HandleTable look like? That is where today's tricky part lives:

class HandleTable {
 public:
  HandleTable() : length_(0), elems_(0), list_(nullptr) { Resize(); }
  ~HandleTable() { delete[] list_; }

  LRUHandle* Lookup(const Slice& key, uint32_t hash) {
    return *FindPointer(key, hash);
  }

  LRUHandle* Insert(LRUHandle* h) {
    LRUHandle** ptr = FindPointer(h->key(), h->hash);
    LRUHandle* old = *ptr;
    h->next_hash = (old == nullptr ? nullptr : old->next_hash);
    *ptr = h;
    if (old == nullptr) {
      ++elems_;
      if (elems_ > length_) {
        // Since each cache entry is fairly large, we aim for a small
        // average linked list length (<= 1).
        Resize();
      }
    }
    return old;
  }

  LRUHandle* Remove(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = FindPointer(key, hash);
    LRUHandle* result = *ptr;
    if (result != nullptr) {
      *ptr = result->next_hash;
      --elems_;
    }
    return result;
  }

 private:
  // The table consists of an array of buckets where each bucket is
  // a linked list of cache entries that hash into the bucket.
  uint32_t length_;
  uint32_t elems_;
  LRUHandle** list_;

  // Return a pointer to slot that points to a cache entry that
  // matches key/hash. If there is no such cache entry, return a
  // pointer to the trailing slot in the corresponding linked list.
  LRUHandle** FindPointer(const Slice& key, uint32_t hash) {
    LRUHandle** ptr = &list_[hash & (length_ - 1)];
    while (*ptr != nullptr && ((*ptr)->hash != hash || key != (*ptr)->key())) {
      ptr = &(*ptr)->next_hash;
    }
    return ptr;
  }

  void Resize() {
    uint32_t new_length = 4;
    while (new_length < elems_) {
      new_length *= 2;
    }
    LRUHandle** new_list = new LRUHandle*[new_length];
    memset(new_list, 0, sizeof(new_list[0]) * new_length);
    uint32_t count = 0;
    for (uint32_t i = 0; i < length_; i++) {
      LRUHandle* h = list_[i];
      while (h != nullptr) {
        LRUHandle* next = h->next_hash;
        uint32_t hash = h->hash;
        LRUHandle** ptr = &new_list[hash & (new_length - 1)];
        h->next_hash = *ptr;
        *ptr = h;
        h = next;
        count++;
      }
    }
    assert(elems_ == count);
    delete[] list_;
    list_ = new_list;
    length_ = new_length;
  }
};

Don't be intimidated by the structure: it is just a chained hash table. It allocates a bucket array and threads the chains through next_hash, somewhat like an absl flat_hash_set whose elements are the LRUHandles themselves. With everything intrusive, eviction can locate the table slot directly from the entry's stored hash plus a table probe.

Compared with boost's version this looks like a big detour, but Google reports roughly a 5% performance win from it (the boost version is presumably held back by its node-based map structure, paying both space overhead and locality costs).

That leaves one question: why does LRUHandle store only next_hash, with no prev_hash? With a prev_hash, Erase would not even need a table probe. Look at the Insert code again:

LRUHandle* Insert(LRUHandle* h) {
  LRUHandle** ptr = FindPointer(h->key(), h->hash);
  LRUHandle* old = *ptr;
  h->next_hash = (old == nullptr ? nullptr : old->next_hash);
  *ptr = h;
  if (old == nullptr) {
    ++elems_;
    if (elems_ > length_) {
      // Since each cache entry is fairly large, we aim for a small
      // average linked list length (<= 1).
      Resize();
    }
  }
  return old;
}

The bet is that the next_hash chains stay short (Resize keeps the average chain length <= 1), so prev_hash is dropped to save space at negligible cost.
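
That addresses complexity (1). Complexity (2), the single big lock, is what the shards are for: NewLRUCache actually returns a ShardedLRUCache that hashes every key onto one of 16 independently locked LRUCache shards. Condensed from util/cache.cc (elisions marked):

static const int kNumShardBits = 4;
static const int kNumShards = 1 << kNumShardBits;

class ShardedLRUCache : public Cache {
 private:
  LRUCache shard_[kNumShards];
  // ...

  static inline uint32_t HashSlice(const Slice& s) {
    return Hash(s.data(), s.size(), 0);
  }

  // The top hash bits pick the shard; the full hash is also passed down so
  // the shard's HandleTable can reuse it.
  static uint32_t Shard(uint32_t hash) { return hash >> (32 - kNumShardBits); }

 public:
  Handle* Insert(const Slice& key, void* value, size_t charge,
                 void (*deleter)(const Slice& key, void* value)) override {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Insert(key, hash, value, charge, deleter);
  }
  Handle* Lookup(const Slice& key) override {
    const uint32_t hash = HashSlice(key);
    return shard_[Shard(hash)].Lookup(key, hash);
  }
  // ... Release / Erase / Prune dispatch to a shard the same way.
};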

RocksDB LRU

RocksDB's LRU broadly follows LevelDB's, but it additionally supports something like a 2-level LRU: under certain conditions, an entry gets promoted into a high-priority pool:

enum Flags : uint8_t {
  // Whether this entry is referenced by the hash table.
  IN_CACHE = (1 << 0),
  // Whether this entry is high priority entry.
  IS_HIGH_PRI = (1 << 1),
  // Whether this entry is in high-pri pool.
  IN_HIGH_PRI_POOL = (1 << 2),
  // Whether this entry has had any lookups (hits).
  HAS_HIT = (1 << 3),
};

void LRUCacheShard::LRU_Insert(LRUHandle* e) {
  assert(e->next == nullptr);
  assert(e->prev == nullptr);
  size_t total_charge = e->CalcTotalCharge(metadata_charge_policy_);
  if (high_pri_pool_ratio_ > 0 && (e->IsHighPri() || e->HasHit())) {
    // Insert "e" to head of LRU list.
    e->next = &lru_;
    e->prev = lru_.prev;
    e->prev->next = e;
    e->next->prev = e;
    e->SetInHighPriPool(true);
    high_pri_pool_usage_ += total_charge;
    MaintainPoolSize();
  } else {
    // Insert "e" to the head of low-pri pool. Note that when
    // high_pri_pool_ratio is 0, head of low-pri pool is also head of LRU list.
    e->next = lru_low_pri_->next;
    e->prev = lru_low_pri_;
    e->prev->next = e;
    e->next->prev = e;
    e->SetInHighPriPool(false);
    lru_low_pri_ = e;
  }
  lru_usage_ += total_charge;
}
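
MaintainPoolSize is what holds the high-pri pool to its configured ratio; roughly (as in rocksdb's cache/lru_cache.cc), it keeps demoting the oldest high-pri entry into the low-pri pool until usage fits:

void LRUCacheShard::MaintainPoolSize() {
  while (high_pri_pool_usage_ > high_pri_pool_capacity_) {
    // Overflow last entry in high-pri pool to low-pri pool.
    lru_low_pri_ = lru_low_pri_->next;
    assert(lru_low_pri_ != &lru_);
    lru_low_pri_->SetInHighPriPool(false);
    size_t total_charge =
        lru_low_pri_->CalcTotalCharge(metadata_charge_policy_);
    assert(high_pri_pool_usage_ >= total_charge);
    high_pri_pool_usage_ -= total_charge;
  }
}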

Approximate LRU and Other Approaches

Note that everything above is an LRU behind a lock: the hash table plus doubly linked list makes concurrency genuinely awkward. It is worth looking at approximate LRU algorithms, which have friendlier costs in the algorithm itself:

  1. redis's LRU
    1. It barely feels like an LRU at all: each key just carries an LRU clock value; eviction samples a few keys and discards the "least recently used" one among the sample. It is LRU blended with probability, built on sampling plus randomness (see the sketch after this list)
  2. RocksDB's clock algorithm
    1. clock is inherently much easier to parallelize than LRU; RocksDB's implementation builds on a concurrent hash map from tbb
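
To make the redis idea concrete, here is a toy, self-contained sketch of sampled eviction (my own illustration, not redis's code; real redis samples random dict buckets and stores a shared 24-bit LRU clock per object):

#include <cstddef>
#include <cstdint>
#include <iterator>
#include <random>
#include <string>
#include <unordered_map>

// Approximate LRU via sampling: a read only bumps a counter (no list
// relinking, hence no list to lock); eviction draws a few random entries
// and drops the stalest one in the sample.
class SampledLRU {
 public:
  explicit SampledLRU(size_t capacity) : capacity_(capacity) {}

  void Put(const std::string& key, std::string value) {
    if (map_.size() >= capacity_ && map_.find(key) == map_.end()) Evict();
    map_[key] = Entry{std::move(value), ++clock_};
  }

  std::string* Get(const std::string& key) {
    auto it = map_.find(key);
    if (it == map_.end()) return nullptr;
    it->second.last_used = ++clock_;  // this is the whole "recently used" bookkeeping
    return &it->second.value;
  }

 private:
  struct Entry {
    std::string value;
    uint64_t last_used = 0;
  };

  void Evict() {
    constexpr int kSamples = 5;  // redis's maxmemory-samples also defaults to 5
    auto victim = map_.end();
    for (int i = 0; i < kSamples && !map_.empty(); i++) {
      // O(n) sampling for brevity; redis picks random dict buckets instead.
      auto it = map_.begin();
      std::advance(it, rng_() % map_.size());
      if (victim == map_.end() ||
          it->second.last_used < victim->second.last_used) {
        victim = it;
      }
    }
    if (victim != map_.end()) map_.erase(victim);
  }

  size_t capacity_;
  uint64_t clock_ = 0;  // logical clock; redis uses a coarse 24-bit wall clock
  std::unordered_map<std::string, Entry> map_;
  std::mt19937 rng_{std::random_device{}()};
};

The trade-off: Get never touches any shared list structure, which makes it easy to shard or run nearly lock-free, at the cost of occasionally evicting something that is not truly the oldest.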

Beyond these, there are policies such as Adaptive Replacement Cache (ARC) and LFU. Benchmark and profile (e.g. with pprof) against your own workload before choosing one.