Limit the cluster cache by memory instead of number of clusters #956

Open
wants to merge 10 commits into base: main
60 changes: 49 additions & 11 deletions src/concurrent_cache.h
@@ -23,13 +23,38 @@

#include "lrucache.h"

#include <chrono>
#include <cstddef>
#include <future>
#include <mutex>

namespace zim
{

template<typename CostEstimation>
struct FutureToValueCostEstimation {
template<typename T>
static size_t cost(const std::shared_future<T>& future) {
// The future is the value stored in the cache.
// When getOrPut is called and the key is not in the cache,
// we add a future and only then compute the value and set it on the future.
// But lrucache calls us when the future is added, i.e. before the value has
// been computed. If we waited here (or called future.get()), we would deadlock,
// as we must return before the value can be set.
// So in that case we return 0. `ConcurrentCache::getOrPut` will correctly increase
// the current cache cost once it has an actual value.
// We still need to compute the cost of the value when the future is ready, as this
// function is also used to decrease the cache cost when the value is dropped.
std::future_status status = future.wait_for(std::chrono::nanoseconds::zero());
if (status == std::future_status::ready) {
return CostEstimation::cost(future.get());
} else {
return 0;
}
}

};
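For orientation, a concrete cost-estimation policy plugged into this mechanism could look roughly like the sketch below. The `ClusterMemorySize` name is mentioned later in this PR; the `getMemorySize()` accessor used here is only an assumption for illustration, not necessarily the real Cluster API.

// Hypothetical sketch of a cost policy returning the memory footprint of a cluster.
// `getMemorySize()` is an assumed accessor, not necessarily part of the actual Cluster class.
struct ClusterMemorySize {
  static size_t cost(const std::shared_ptr<const Cluster>& cluster) {
    return cluster->getMemorySize();
  }
};

// It would then be combined with the future-aware wrapper above, e.g.:
// ConcurrentCache<cluster_index_type, ClusterHandle, ClusterMemorySize> clusterCache(maxMemory);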

/**
ConcurrentCache implements a concurrent thread-safe cache

@@ -39,16 +64,16 @@ namespace zim
safe, and, in case of a cache miss, will block until that element becomes
available.
*/
template <typename Key, typename Value>
class ConcurrentCache: private lru_cache<Key, std::shared_future<Value>>
template <typename Key, typename Value, typename CostEstimation>
class ConcurrentCache: private lru_cache<Key, std::shared_future<Value>, FutureToValueCostEstimation<CostEstimation>>
{
private: // types
typedef std::shared_future<Value> ValuePlaceholder;
typedef lru_cache<Key, ValuePlaceholder> Impl;
typedef lru_cache<Key, ValuePlaceholder, FutureToValueCostEstimation<CostEstimation>> Impl;

public: // types
explicit ConcurrentCache(size_t maxEntries)
: Impl(maxEntries)
explicit ConcurrentCache(size_t maxCost)
: Impl(maxCost)
{}

// Gets the entry corresponding to the given key. If the entry is not in the
@@ -70,6 +95,19 @@ class ConcurrentCache: private lru_cache<Key, std::shared_future<Value>>
if ( x.miss() ) {
try {
valuePromise.set_value(f());
auto cost = CostEstimation::cost(x.value().get());
// There is a small window during which the valuePromise may be dropped from the lru
// cache after we set the value but before we increase the cost of the cache.
// In that case the cache cost has already been decreased by `cost` before we increase it here.
// First of all, this should be pretty rare: we have just put the future in the cache, so it
// should not be the least recently used item.
// If it happens, it is not a problem as long as the current cost is bigger than `cost` (most of the time).
// For the really rare case of the current cache cost being lower than `cost` (if that is even possible),
// `decreaseCost` will clamp the new cost to 0.
Collaborator

I don't like such subtle pitfalls here and there (they may turn into time bombs). That makes me come up with the following question and proposal:

Is memory usage accounting going to expand beyond clusters? If not, then why don't we implement item cost tracking at ConcurrentCache level, leaving lru_cache absolutely intact?

Collaborator Author

Is memory usage accounting going to expand beyond clusters?

I don't know. The main purpose of all this is to avoid exaggerated memory consumption for a long-running kiwix-serve with a "lot" (not defined, relative to the system) of ZIM files to serve.

We have assumed it is the cluster cache that takes all this memory, mainly because we currently have a limit per ZIM file (and so a lot of ZIM files make the cache mostly unlimited) and this cache should consume more memory than the dirent cache.

So there is no current plan to extend this limitation to anything else, but we may do so if we find that our assumption was wrong.

If not, then why don't we implement item cost tracking at ConcurrentCache level, leaving lru_cache absolutely intact?

Your idea seems good at first glance. However, thinking twice, here are a few limitations (not necessarily blockers):

  • We would have to make the inner lru_cache used by ConcurrentCache unlimited (easily done with a huge size_t limit, but still).
  • ConcurrentCache would have to re-implement the put/putMissing and drop methods to drop the clusters from the cache itself and track the allocated memory.
  • Everything about what the (constant) memory consumption of a cluster is would still stand.

In the end, it would simply move everything about memory consumption one level up (into ConcurrentCache) and make it ask lru_cache to drop an item even if its internal limit is not reached.
It would indeed leave lru_cache almost unmodified, but I'm not sure it would be worth it.
(And I don't see a reason why we would want to keep lru_cache clean of memory limitation but accept the modifications in ConcurrentCache.)

Collaborator

(And I don't see a reason we would like to keep lru_cache clean from memory limitation but accept the modifications in ConcurrentCache)

The main benefit would be avoiding design quirks.

  • ConcurrentCache would have to re-implement the put/putMissing and drop methods to drop the clusters from the cache itself and track the allocated memory.

ConcurrentCache is a relatively lightweight wrapper around lru_cache (without any is-a relation between them). Its public API related to memory usage management consists of just three member functions:

  • getOrPut()
  • drop()
  • setMaxSize()

Maintaining memory limits entirely within that small set of functions can be achieved with fewer changes. It also simplifies the implementation, in which the cost of an item is evaluated only once (on addition) and is stored with the item.
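A self-contained sketch of that idea is shown below: cost accounting kept entirely at the wrapper level, with the per-item cost evaluated once and stored next to the item. Names are illustrative only (a std::map stands in for the inner, effectively unlimited lru_cache); this is not the actual libzim code.

#include <cstddef>
#include <map>

// Hypothetical wrapper-level cost tracking (illustrative names, not the libzim API).
template <typename Key, typename Value, typename CostEstimation>
class CostTrackingWrapper {
public:
  explicit CostTrackingWrapper(size_t maxCost) : m_maxCost(maxCost), m_currentCost(0) {}

  void put(const Key& key, const Value& value) {
    const size_t cost = CostEstimation::cost(value); // evaluated only once, on addition
    m_items[key] = value;
    m_costs[key] = cost;
    m_currentCost += cost;
    while (m_currentCost > m_maxCost && m_items.size() > 1) {
      dropOne(); // eviction is decided by the wrapper, not by the inner cache
    }
  }

  size_t getCurrentCost() const { return m_currentCost; }

private:
  void dropOne() {
    // A real implementation would ask the inner lru_cache for its least recently
    // used entry; a std::map iterator merely stands in for that here.
    auto it = m_items.begin();
    m_currentCost -= m_costs[it->first];
    m_costs.erase(it->first);
    m_items.erase(it);
  }

  std::map<Key, Value> m_items;
  std::map<Key, size_t> m_costs;
  size_t m_maxCost;
  size_t m_currentCost;
};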

Collaborator Author

The main benefit would be avoiding design quirks.

But ConcurrentCache is a thin wrapper whose purpose is to make lru_cache thread safe.
Adding a memory limitation to it seems to me to be exactly the kind of quirk we should avoid.

Maintaining memory limits entirely within that small set of functions can be achieved with fewer changes. It also simplifies the implementation, in which the cost of an item is evaluated only once (on addition) and is stored with the item.

I've started implementing your idea to see if I was wrong. Here are a few things I have faced:

  • We have to decrease the cost when we drop an item, so we need to make lru_cache::dropLast return the dropped item. However, the current drop implementation returns false if the key is not found. As we want to return the value and can't use std::optional (we limit ourselves to C++14), we would have to throw std::out_of_range instead of returning false, and so all users of drop would have to catch it (see the sketch after this list).
  • We still have to drop all clusters of an archive at archive destruction, so we need a dropAll. Either it is implemented in lru_cache (and we have to return all the dropped items to be able to decrease the cost) or it is implemented in ConcurrentCache (and we break encapsulation).
  • lru_cache may drop values by itself. This should not happen in practice, as ConcurrentCache will configure it to be unlimited, but it still creates a somewhat inconsistent API: there is code that can drop a value from lru_cache without ConcurrentCache noticing.
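As an illustration of the first point, a C++14 variant of dropLast that reports what was evicted (so the caller can decrease its cost) might look like the following sketch. It reuses the member names of lru_cache but is not necessarily what the commit actually does.

// Sketch only: return the evicted key/value pair instead of dropping it silently.
// Throws std::out_of_range when the cache is empty (no std::optional in C++14).
key_value_pair_t dropLast() {
  if (_cache_items_list.empty()) {
    throw std::out_of_range("dropLast() called on an empty cache");
  }
  key_value_pair_t dropped = _cache_items_list.back();
  _cache_items_map.erase(dropped.first);
  _cache_items_list.pop_back();
  return dropped;
}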

Anyway, the last commit implements your idea. We can discuss the actual implementation.

Collaborator

Indeed, it was more complicated than I thought, but it helped to surface a potential problem that was present with the old implementation too. I felt that something was wrong, but I thought that it was just an uncomfortable feeling caused by the arcane interplay between lru_cache and ConcurrentCache. Now with the cost accounting fully done in a single class the problem was easier to spot.

Consider working with two ZIM files - one on a (very) slow drive and the other on a fast drive. A cluster from the slow ZIM file is requested. A placeholder for it is created in the cache, and while the cluster is loading, a burst of activity on the fast ZIM file fills the cache with entries from the fast ZIM file and drops the placeholder entry of the slow ZIM file. Since the cost of that placeholder entry is not known, a fake value of 0 is used instead (so the removal of the entry has no effect and leaves no trace). Then the loading of the slow cluster completes and the cache cost is increased by its actual memory usage, but that item is no longer in the cache! Repetition of that scenario can lead to the current "fake" cost of the cache being above the limit even though the cache is empty!

Collaborator

The same scenario can occur more naturally as follows:

  • slow access: cluster not in the cache (cache miss)
  • fast access: cluster already in the cache (cache hit, the cluster is moved to the beginning of the LRU list)

Thus, a cache miss, followed by a large enough number of cache hits (so that the cache miss is pushed to the end of the LRU queue), followed by another cache miss (occurring before the first cache miss is fully resolved), will result in the first cache miss being evicted from the cache before it was even properly recorded, leaving the cache memory usage balance off by the cost of that ill-fated cluster.
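To make the sequence concrete, here is the same interleaving written out as a commented timeline (pseudocode, not the actual API; "slow" and "fast" refer to the two clusters involved):

// T1: getOrPut(slowCluster)   -> miss: a future placeholder is cached with cost 0
// T2: many getOrPut(fastX)    -> hits/misses push slowCluster to the LRU tail
// T2: one more miss           -> eviction of slowCluster: only 0 is subtracted from the tracked cost
// T1: f() finally returns     -> the tracked cost is increased by the real cluster cost,
//                                but the corresponding entry is no longer in the cache
// Net effect: the tracked cost drifts upward by the cost of the evicted cluster.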

Collaborator Author

The last commit should fix this point.
(I have removed the "WIP" commit.)

Collaborator

I think it doesn't

{
std::unique_lock<std::mutex> l(lock_);
Impl::increaseCost(cost);
}
} catch (std::exception& e) {
drop(key);
throw;
@@ -85,19 +123,19 @@ class ConcurrentCache: private lru_cache<Key, std::shared_future<Value>>
return Impl::drop(key);
}

size_t getMaxSize() const {
size_t getMaxCost() const {
std::unique_lock<std::mutex> l(lock_);
return Impl::getMaxSize();
return Impl::getMaxCost();
}

size_t getCurrentSize() const {
size_t getCurrentCost() const {
std::unique_lock<std::mutex> l(lock_);
return Impl::size();
return Impl::cost();
}

void setMaxSize(size_t newSize) {
void setMaxCost(size_t newSize) {
std::unique_lock<std::mutex> l(lock_);
return Impl::setMaxSize(newSize);
return Impl::setMaxCost(newSize);
}

private: // data
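For orientation, a hedged usage sketch of the renamed API follows; readClusterFromDisk() and the limit values are made up for illustration and do not come from this PR.

// Illustrative usage of the renamed cost-based API (loader name hypothetical).
ConcurrentCache<cluster_index_type, ClusterHandle, UnitCostEstimation> cache(512);
auto cluster = cache.getOrPut(idx, [&]() { return readClusterFromDisk(idx); });
size_t used = cache.getCurrentCost();  // with UnitCostEstimation this is just the item count
cache.setMaxCost(256);                 // evicts least recently used entries until the cost fits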
8 changes: 4 additions & 4 deletions src/dirent_accessor.h
@@ -55,9 +55,9 @@ class LIBZIM_PRIVATE_API DirectDirentAccessor
std::shared_ptr<const Dirent> getDirent(entry_index_t idx) const;
entry_index_t getDirentCount() const { return m_direntCount; }

size_t getMaxCacheSize() const { return m_direntCache.getMaxSize(); }
size_t getCurrentCacheSize() const { return m_direntCache.size(); }
void setMaxCacheSize(size_t nbDirents) const { m_direntCache.setMaxSize(nbDirents); }
size_t getMaxCacheSize() const { return m_direntCache.getMaxCost(); }
size_t getCurrentCacheSize() const { return m_direntCache.cost(); }
void setMaxCacheSize(size_t nbDirents) const { m_direntCache.setMaxCost(nbDirents); }

private: // functions
std::shared_ptr<const Dirent> readDirent(offset_t) const;
Expand All @@ -67,7 +67,7 @@ class LIBZIM_PRIVATE_API DirectDirentAccessor
std::unique_ptr<const Reader> mp_pathPtrReader;
entry_index_t m_direntCount;

mutable lru_cache<entry_index_type, std::shared_ptr<const Dirent>> m_direntCache;
mutable lru_cache<entry_index_type, std::shared_ptr<const Dirent>, UnitCostEstimation> m_direntCache;
mutable std::mutex m_direntCacheLock;

mutable std::vector<char> m_bufferDirentZone;
6 changes: 3 additions & 3 deletions src/fileimpl.cpp
@@ -792,13 +792,13 @@ bool checkTitleListing(const IndirectDirentAccessor& accessor, entry_index_type


size_t FileImpl::getClusterCacheMaxSize() const {
return clusterCache.getMaxSize();
return clusterCache.getMaxCost();
}
size_t FileImpl::getClusterCacheCurrentSize() const {
return clusterCache.getCurrentSize();
return clusterCache.getCurrentCost();
}
void FileImpl::setClusterCacheMaxSize(size_t nbClusters) {
clusterCache.setMaxSize(nbClusters);
clusterCache.setMaxCost(nbClusters);
}

size_t FileImpl::getDirentCacheMaxSize() const {
3 changes: 2 additions & 1 deletion src/fileimpl.h
@@ -36,6 +36,7 @@
#include "file_reader.h"
#include "file_compound.h"
#include "fileheader.h"
#include "lrucache.h"
#include "zim_types.h"
#include "direntreader.h"

@@ -55,7 +56,7 @@ namespace zim
std::unique_ptr<const IndirectDirentAccessor> mp_titleDirentAccessor;

typedef std::shared_ptr<const Cluster> ClusterHandle;
ConcurrentCache<cluster_index_type, ClusterHandle> clusterCache;
ConcurrentCache<cluster_index_type, ClusterHandle, UnitCostEstimation> clusterCache;

const bool m_hasFrontArticlesIndex;
const entry_index_t m_startUserEntry;
108 changes: 89 additions & 19 deletions src/lrucache.h
@@ -43,10 +43,38 @@
#include <cstddef>
#include <stdexcept>
#include <cassert>
#include <iostream>

namespace zim {

template<typename key_t, typename value_t>
struct UnitCostEstimation {
template<typename value_t>
static size_t cost(const value_t& value) {
return 1;
}
};

/**
* An LRU cache where the cost of each item can be different from 1.
*
* Most LRU caches are limited by the number of items they store.
* This implementation may associate a different "cost" with each item, so the current
* cost of the cache is not the number of items but the sum of all the items' costs.
*
* The implementation used here is pretty simple (dumb) and has a few limitations:
* - We consider that the cost of an item does not change over time. In particular, the cost
* of an item when we put it MUST be equal to the cost of the same item when we drop it.
* - Cache eviction is still Least Recently Used (LRU), so we drop the least recently used item(s)
* until we have enough space. No other consideration is used to select which item to drop.
*
* This LRU cache is parametrized by a CostEstimation type. The type must have a static method
* `cost` taking a reference to a `value_t` and returning its "cost". As said above, this method
* must always return the same cost for the same value.
*
* While the cost could be any kind of value, this implementation is intended to be used only with
* `UnitCostEstimation` (classic LRU) and `FutureToValueCostEstimation<ClusterMemorySize>`.
*/
template<typename key_t, typename value_t, typename CostEstimation>
class lru_cache {
Comment on lines +78 to 79
Collaborator

This is a serious enhancement that may be difficult to do right. LRU is about the heuristic of maintaining items in the cache based on the order in which they have been used. Introducing cost adds an important dimension to consider - when facing the choice of dropping from the cache an expensive old item or a cheap not-so-old one, which one is more justified? Or the other way around - it may be preferable to get rid of a heavy item that is not the first candidate for removal according to the LRU heuristics, since that will make more room in the cache for multiple lightweight items.

public: // types
typedef typename std::pair<key_t, value_t> key_value_pair_t;
@@ -81,9 +109,10 @@ class lru_cache {
};

public: // functions
explicit lru_cache(size_t max_size) :
_max_size(max_size) {
}
explicit lru_cache(size_t max_cost) :
_max_cost(max_cost),
_current_cost(0)
{}

// If 'key' is present in the cache, returns the associated value,
// otherwise puts the given value into the cache (and returns it with
@@ -103,6 +132,8 @@
auto it = _cache_items_map.find(key);
if (it != _cache_items_map.end()) {
_cache_items_list.splice(_cache_items_list.begin(), _cache_items_list, it->second);
decreaseCost(CostEstimation::cost(it->second->second));
increaseCost(CostEstimation::cost(value));
it->second->second = value;
} else {
putMissing(key, value);
@@ -120,37 +151,72 @@
}

bool drop(const key_t& key) {
list_iterator_t list_it;
try {
auto list_it = _cache_items_map.at(key);
_cache_items_list.erase(list_it);
_cache_items_map.erase(key);
return true;
list_it = _cache_items_map.at(key);
} catch (std::out_of_range& e) {
return false;
}
decreaseCost(CostEstimation::cost(list_it->second));
_cache_items_list.erase(list_it);
_cache_items_map.erase(key);
return true;
}

bool exists(const key_t& key) const {
return _cache_items_map.find(key) != _cache_items_map.end();
}

size_t size() const {
return _cache_items_map.size();
size_t cost() const {
return _current_cost;
}

size_t getMaxSize() const {
return _max_size;
size_t getMaxCost() const {
return _max_cost;
}

void setMaxSize(size_t newSize) {
while (newSize < this->size()) {
void setMaxCost(size_t newMaxCost) {
while (newMaxCost < this->cost()) {
dropLast();
}
_max_size = newSize;
_max_cost = newMaxCost;
}

protected:

void increaseCost(size_t extra_cost) {
// increaseCost is called after we have added a value to the cache, in order to update
// the current cost of the cache.
// We must ensure that we don't drop the value we just added.
// While it is technically ok to keep no value at all when the max cache cost is 0 (or the memory limit is smaller than the size of one cluster),
// that would force us to recreate the value all the time.
// Let's be nice with our users and be tolerant to misconfiguration.
if (!extra_cost) {
// Don't try to remove an item when the extra cost is 0.
// This is the case when the concurrent cache adds a future without a value.
// We will handle the real cost increase when the concurrent cache calls us directly.
return;
}
_current_cost += extra_cost;
while (_current_cost > _max_cost && size() > 1) {
dropLast();
}
}

void decreaseCost(size_t costToRemove) {
if (costToRemove > _current_cost) {
std::cerr << "WARNING: We have detected inconsistant cache management, trying to remove " << costToRemove << " from a cache with size " << _current_cost << std::endl;
std::cerr << "Please open an issue on https://github.com/openzim/libzim/issues with this message and the zim file you use" << std::endl;
_current_cost = 0;
} else {
_current_cost -= costToRemove;
}
}

private: // functions
void dropLast() {
auto list_it = _cache_items_list.back();
decreaseCost(CostEstimation::cost(list_it.second));
_cache_items_map.erase(_cache_items_list.back().first);
_cache_items_list.pop_back();
}
Expand All @@ -159,15 +225,19 @@ class lru_cache {
assert(_cache_items_map.find(key) == _cache_items_map.end());
_cache_items_list.push_front(key_value_pair_t(key, value));
_cache_items_map[key] = _cache_items_list.begin();
if (_cache_items_map.size() > _max_size) {
dropLast();
}
increaseCost(CostEstimation::cost(value));
}

size_t size() const {
return _cache_items_map.size();
}


private: // data
std::list<key_value_pair_t> _cache_items_list;
std::map<key_t, list_iterator_t> _cache_items_map;
size_t _max_size;
size_t _max_cost;
size_t _current_cost;
};

} // namespace zim