mirror of
https://github.com/Zygo/bees.git
synced 2025-06-17 10:06:16 +02:00
hash: remove the experimental shared hash-table and shared mmap features
The experiments are over, and the results were not a success. Having two filesystems cohabiting in the same hash table results in a lot of false positives, each of which requires some heavy IO to resolve. Using MAP_SHARED to share a beeshash.dat between processes results in catastrophically bad performance.

These features were abandoned long ago, but some of the code--and even worse, its documentation--still remains.

Bees wants a hash table false positive rate below 0.1%. With a shared hash table the FP rate is about the same as the dedup rate. Typically files that are duplicated on one filesystem are duplicated on many filesystems.

One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation produce extremely poor performance results. A five-order-of-magnitude speedup was achieved by implementing paging in userspace with worker threads. We no longer need the support code for the MAP_SHARED case.

It is still possible to run many BeesContexts in a single process, but now the only thing contexts share is the FD cache.
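The false-positive argument above can be illustrated with a toy model. This is a minimal sketch and not bees code: `Entry`, `LookupStats`, and `scan` are hypothetical names, block contents are reduced to plain integer hashes, and a "false positive" is taken to mean a lookup that hits an entry inserted by a different filesystem, where the stored address points at nothing usable.

```cpp
#include <cstdint>
#include <unordered_map>
#include <vector>

// Hypothetical model: each entry remembers which filesystem inserted it.
struct Entry {
    int fs_id;
    uint64_t addr;
};

struct LookupStats {
    unsigned hits = 0;            // lookups that found an existing entry
    unsigned false_positives = 0; // hits whose entry belongs to another filesystem
};

// Scan the block hashes of filesystem `fs_id` against `table`,
// inserting every hash that is not already present.
LookupStats scan(std::unordered_map<uint64_t, Entry> &table,
                 int fs_id,
                 const std::vector<uint64_t> &blocks)
{
    LookupStats stats;
    uint64_t addr = 0;
    for (uint64_t hash : blocks) {
        auto it = table.find(hash);
        if (it == table.end()) {
            table[hash] = Entry{fs_id, addr};
        } else {
            ++stats.hits;
            if (it->second.fs_id != fs_id) {
                // The stored address is only meaningful on the other filesystem.
                ++stats.false_positives;
            }
        }
        addr += 4096; // assumed block size for the toy address space
    }
    return stats;
}
```

Under these assumptions, scanning two filesystems that hold mostly the same data through one shared table makes nearly every hit on the second filesystem a false positive, while giving each filesystem its own table produces none. That matches the observation that the FP rate tracks the dedup rate when the table is shared.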
@@ -247,8 +247,6 @@ BeesContext::BeesContext(shared_ptr<BeesContext> parent) :
 	m_parent_ctx(parent)
 {
 	if (m_parent_ctx) {
-		m_hash_table = m_parent_ctx->hash_table();
-		m_hash_table->set_shared(true);
 		m_fd_cache = m_parent_ctx->fd_cache();
 	}
 }