1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-18 13:55:44 +02:00

11 Commits

Author SHA1 Message Date
Zygo Blaxell
382f8bf06a hash: prevent eleventy-gigabyte core dumps
Add MADV_DONTDUMP to the list of advice flags.

There are now three flags which may or may not be supported by the
target kernel.  Try each one and log its success or failure separately.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-12 22:55:08 -05:00
Zygo Blaxell
5e91529ad2 hash: remove the unused m_prefetch_rate_limit
The hash table statistics calculation in BeesHashTable::prefetch_loop
and the data-driven operation of the extent scanner always pulls the
hash table into RAM as fast as the disk will push the data.  We never
use the prefetch rate limit, so remove it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-11 21:15:12 -05:00
Zygo Blaxell
bddc07bd28 hash: make thread status message more consistent
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-11 21:15:12 -05:00
Zygo Blaxell
1b261b1ba7 build: move BEES_VERSION to a separate C file to avoid unnecessary building
Every git commit was causing bees.cc and bees-hash.cc to be rebuilt,
which was expensive and unnecessary.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 23:23:05 -05:00
Zygo Blaxell
e8eaa7e471 trivial: mass purge of whitespace errors
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-06 22:14:50 -05:00
Zygo Blaxell
efda609f66 log: remove path from thread name
The thread name has an arbitrarily limited size, and we are eventually
removing support for multiple paths in a single bees daemon process.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2016-12-27 15:15:16 -05:00
Zygo Blaxell
77c11bb90f bees: add version string and put it in main() and stats file
Now that we have more than one bees release it's somewhat important
to know which one each bug report is for...
2016-12-08 23:55:59 -05:00
Zygo Blaxell
b5c01c1985 hash: don't throw an exception if MADV_HUGEPAGE fails
We don't _need_ transparent hugepages.  We like them because they can
be faster, but it's not a requirement, and some people will disable
transparent hugepages because they make non-Bees-like workloads slow.

Try to use MADV_HUGEPAGE, but if it fails, just log the error and
continue.

MADV_DONTFORK would be useful if we still fork()ed, but we don't currently
do that.  It's still a useful flag to have because a fork() with more
than 50% of RAM in mlocked pages would result in a kernel OOM crash.
I don't think it's possible to run Bees on a kernel that does not support
the MADV_DONTFORK flag, so don't bother checking for that flag separately.
2016-12-08 23:55:59 -05:00
Zygo Blaxell
642581e89a hash: remove the experimental shared hash-table and shared mmap features
The experiments are over, and the results were not success.

Having two filesystems cohabiting in the same hash table results in a
lot of false positives, each of which requires some heavy IO to resolve.

Using MAP_SHARED to share a beeshash.dat between processes results in
catastrophically bad performance.

These features were abandoned long ago, but some of the code--and even
worse, its documentation--still remains.

Bees wants a hash table false positive rate below 0.1%.  With a shared
hash table the FP rate is about the same as the dedup rate.  Typically
duplicate files on one filesystem are duplicate on many filesystems.

One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation
produce extremely poor performance results.  A five-order-of-magnitude
speedup was achieved by implementing paging in userspace with worker
threads.  We no longer need the support code for the MAP_SHARED case.

It is still possible to run many BeesContexts in a single process,
but now the only thing contexts share is the FD cache.
2016-12-02 00:26:02 -05:00
Zygo Blaxell
6fa8de660b hash: create beeshash.dat if it does not exist
BeesHashTable can now create a beeshash.dat if the file does not already
exist.  Currently the default size is one hash table extent (16MB) and
there's no way to change that (yet), so users should still create their
own hash tables for now.

The opening of the hash table is deferred (slightly) in preparation for
hash table resizing.

No doc as the feature is currently unfinished.
2016-12-02 00:20:30 -05:00
Zygo Blaxell
cca0ee26a8 bees: remove local cruft, throw at github 2016-11-17 12:12:13 -05:00