The vector<uint8_t> in the hash table doesn't hurt very much--only a few
microseconds per 128K hash block.
The vector<uint8_t> in BeesBlockData hurts a bit more--we run that
constructor thousands of times per second.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The hash table is one of the few cases in bees where a non-trivial amount
of page cache memory will be used in a predictable way, so we can advise
the kernel about our IO demands in advance.
Use WILLNEED to prefetch hash table pages at startup.
Use DONTNEED to trigger writeback on hash table pages at shutdown.
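A minimal sketch of the calls involved, assuming table and table_size
describe the mapped hash table (the identifiers are illustrative, not
the actual bees code):

    #include <cstddef>
    #include <cstdio>
    #include <sys/mman.h>

    // Advise the kernel about our future use of the mapped hash table.
    // WILLNEED at startup begins readahead on the whole table now;
    // DONTNEED at shutdown starts writeback of dirty file-backed pages
    // and lets the kernel reclaim them early.
    void hash_table_advise(void *table, size_t table_size, bool startup)
    {
        if (madvise(table, table_size, startup ? MADV_WILLNEED : MADV_DONTNEED)) {
            perror("madvise");  // advice is best-effort, never fatal
        }
    }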
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Get rid of the thread-local TempFiles and use Pool instead. This
eliminates a potential FD leak when the loadavg governor repeatedly
creates and destroys threads.
With the old per-thread TempFiles, we were guaranteed to have exclusive
ownership of the TempFile object within the current thread. Pool is
somewhat stricter: it only guarantees ownership while the checked-out
Handle exists. Adjust the users of TempFile objects to ensure they hold
the Handle object until they are finished using the TempFile.
It appears that maintaining large, heavily-reflinked, long-lived temporary
files costs more than truncating after every use: btrfs has to write
multiple references to the temporary file's extents, then some commits
later, remove references as the temporary file is deleted or truncated.
Using the temporary file in a dedupe operation flushes the data to disk,
so nothing is saved by pretending that there is writeback pipelining and
trying to avoid flushes in truncate. Pool provides usage tracking and
a checkin callback, so use it to truncate the temporary file immediately
after every use.
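This is not the crucible Pool API verbatim, but a minimal sketch of the
shape: a generator, an idle list, and a checkin callback run by the
Handle's destructor (the sketch assumes both callbacks are set and the
Pool outlives all Handles):

    #include <functional>
    #include <list>
    #include <memory>
    #include <mutex>

    template <class T>
    struct Pool {
        using Ptr = std::shared_ptr<T>;
        std::function<Ptr()>             m_generate;  // create a new T on demand
        std::function<void(const Ptr &)> m_checkin;   // runs when a Handle dies
        std::mutex                       m_mutex;
        std::list<Ptr>                   m_idle;

        // Check out an object. The returned pointer acts as the Handle:
        // when the last copy is destroyed, the checkin callback runs and
        // the object goes back on the idle list.
        Ptr operator()() {
            Ptr obj;
            {
                std::unique_lock<std::mutex> lock(m_mutex);
                if (!m_idle.empty()) {
                    obj = m_idle.front();
                    m_idle.pop_front();
                }
            }
            if (!obj) obj = m_generate();
            return Ptr(obj.get(), [this, obj](T *) {
                m_checkin(obj);
                std::unique_lock<std::mutex> lock(m_mutex);
                m_idle.push_back(obj);
            });
        }
    };

bees then points the checkin callback at something like
[](const Ptr &tf) { tf->reset(); } so the temporary file is truncated
the moment its last Handle disappears.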
Redesign TempFile so that every instance creates exactly one Fd which
persists over the lifetime of the TempFile object. Provide a reset()
method which resets the file back to the initial state and call it from
the Pool checkin callback. This makes TempFile's lifetime equivalent to
its Fd's lifetime, which simplifies interactions with FdCache and Roots.
This change means we can now blacklist temporary files without creating
an effective memory leak, so do that. We also now have a reason to
remove entries from the blacklist, so add a method for that too.
In order to move to extent-centric addressing, we need to be able to
reliably open temporary files by root and inode number. Previously we
would place TempFile fd's into the cache with insert_root_ino, but the
cache would be cleared periodically, and it would not be possible to
reopen temporary files after that happened. Now that the TempFile's
lifetime is the same as the TempFile Fd's lifetime, we can have TempFile
manage a separate FileId -> Fd map in Roots which is unaffected by the
periodic cache clearing. BeesRoots::open_root_ino_nocache will check
this map before attempting to open the file via btrfs root+ino lookup,
and return it through the cache as if Roots had opened the file via btrfs.
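Roughly the new lookup order, as a sketch (the map layout and the
fallback helper are illustrative, with a plain int standing in for
crucible's Fd type):

    #include <cstdint>
    #include <map>
    #include <utility>

    using Fd = int;  // stand-in for crucible's Fd type

    // TempFiles register their (root, ino) -> Fd here for their whole
    // lifetime; unlike the FD cache, this map is never cleared.
    std::map<std::pair<uint64_t, uint64_t>, Fd> tmpfile_map;

    // Hypothetical fallback: resolve root+ino via btrfs, open the path.
    Fd open_by_btrfs_lookup(uint64_t root, uint64_t ino);

    Fd open_root_ino_nocache(uint64_t root, uint64_t ino)
    {
        auto found = tmpfile_map.find(std::make_pair(root, ino));
        if (found != tmpfile_map.end()) {
            // A temporary file: return its Fd as if btrfs had opened it.
            return found->second;
        }
        return open_by_btrfs_lookup(root, ino);
    }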
Hold a reference to BeesRoots in BeesTempFile because the usual way
to get such a reference now throws an exception in BeesTempFile's
destructor.
These changes make BeesTempFile::create() and all methods named
insert_root_ino unnecessary, so delete them.
We construct and destroy TempFiles much less often now, so make their
constructor and destructor more informative.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Localize the hash function in bees to a single spot to make it easier
to change later (or at runtime).
Remove some code that was using a property of CRC as an optimization.
The optimization doesn't work for other hash functions, and running the
CRC function takes more CPU time than the optimization saved.
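The shape of the change is roughly this, with illustrative names (the
real code keeps the CRC implementation it already had):

    #include <cstddef>
    #include <cstdint>

    uint64_t crc64(const void *data, size_t len);  // existing hash implementation

    // The single spot where block hashes are computed. Swapping the
    // algorithm later (at build time or runtime) touches only this
    // function; no caller knows which hash is in use.
    static uint64_t bees_hash(const void *data, size_t len)
    {
        return crc64(data, len);
    }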
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We stopped supporting shared hash tables a long time ago. Remove comments
describing the behavior of shared hash tables.
Add an event counter for pushing a hash to the front when it is already at
the front.
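A sketch of the push-to-front path with the new counter (the entry type
and counter name are illustrative):

    #include <algorithm>

    // Move *found to the front of its bucket (the MRU position), or
    // just count the event if it is already there.
    template <class Entry>
    void push_front_hash(Entry *bucket_begin, Entry *found,
                         unsigned long &count_already_at_front)
    {
        if (found == bucket_begin) {
            ++count_already_at_front;  // the new event counter
            return;
        }
        Entry tmp = *found;
        // shift [bucket_begin, found) one slot toward the back...
        std::move_backward(bucket_begin, found, found + 1);
        // ...and drop the saved entry into the vacated front slot
        *bucket_begin = tmp;
    }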
Audited the code for a bug related to bucket handling that impairs space
efficiency when the bucket size is greater than 1. Didn't find one.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Capture SIGINT and SIGTERM and shut down, preserving current completed
crawl and hash table state.
* Executing tasks are completed, queued tasks are paused.
* Crawl state is saved.
* The crawl master and crawl writeback threads are terminated.
* The task queue is flushed.
* Dirty hash table extents are flushed.
* Hash prefetch and writeback threads are terminated.
* Hash table is deallocated.
* FD caches and tmpfiles are destroyed.
* Assuming the above didn't crash or deadlock, bees exits.
The above order isn't the fastest, but it does roughly follow the
shared_ptr dependencies and avoids data races--especially those that
might lead to bees reporting an extent scanned when it was only queued
for future scanning that did not occur.
In case of a violation of expected shared_ptr dependency order,
exceptions in BeesContext child object accessor methods (i.e. roots(),
hash_table(), etc) prevent any further progress in threads that somehow
remain unexpectedly active.
Move some threads from main into BeesContext so they can be stopped
via BeesContext. The main thread now runs a loop waiting for signals.
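One common shape for that loop (a sketch; the real bees code may
differ): block the termination signals before any worker threads are
created so every thread inherits the mask, then park the main thread
in sigwait():

    #include <csignal>
    #include <pthread.h>

    int wait_for_termination()
    {
        sigset_t sigs;
        sigemptyset(&sigs);
        sigaddset(&sigs, SIGINT);
        sigaddset(&sigs, SIGTERM);
        // must happen before any worker thread is spawned
        pthread_sigmask(SIG_BLOCK, &sigs, nullptr);
        int sig = 0;
        sigwait(&sigs, &sig);
        return sig;  // caller starts the ordered shutdown described above
    }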
A slow FD leak was discovered in TempFile handling. This has not been
fixed yet, but an implementation detail of the C++ runtime library makes
the leak so slow it may never be important enough to fix.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Faster and more reliable toxic extent detection means we can now be much
less paranoid about creating toxic extents.
The paranoia has significant impact on dedupe hit rates because every
extent that contains even one toxic hash is abandoned. The preloaded
toxic hashes were chosen because they occur more frequently than any
other block contents in typical filesystem data. The combination of these
resulted in as much as 30% of duplicate extents being left untouched.
Remove the preloaded toxic extent blacklist, and rely on the new
kernel-CPU-usage-based workaround instead.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
"saved" is used only during hash table correctness analysis, which is
normally not enabled at compile time, and requires source modification
to enable.
Remove the pointless copy and save a tiny bit of CPU.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
BEESLOGNOTE was intended to combine BEESLOG and BEESNOTE, i.e. write a
log message and set the task status message from a single expression.
With the log levels we would now need several more variants
(BEESLOGNOTEDEBUG, BEESLOGNOTEERR...) or a parameter (BEESNOTELOG(DEBUG,
...)).
Or we give up on the idea. This combination was used only 3 times so far.
The log messages and the note message have different editorial styles.
Remove the three instances of BEESLOGNOTE, and make the BEESLOGNOTE
definition equivalent to BEESLOG at LOG_NOTICE level for consistency.
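i.e. the definition goes from "log and set the note" to a plain
NOTICE-level log, roughly (a sketch, not the verbatim macros):

    // before: one expression wrote a log line *and* set the thread's
    // status note
    // #define BEESLOGNOTE(x) do { BEESLOG(x); BEESNOTE(x); } while (0)

    // after: just a log message at NOTICE level
    #define BEESLOGNOTE(x) BEESLOG(LOG_NOTICE, x)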
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
No public version of bees ever created old-style compressed hash table
entries. Remove the code that supports them.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Having too many "write a message to the log" primitives is confusing,
and having one that intermittently and silently discards output is even
_more_ confusing.
Replace all BEESINFO with appropriate BEESLOG*s. Usually DEBUG.
Except for one or two that occur too often. Just delete those.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit adds log levels to the output. Under systemd this produces
colored lines; otherwise the level is probably just a number. Bees is
very chatty, so this paves the way for log level filtering.
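The mechanism is the sd-daemon convention: each stderr line carries a
"<N>" priority prefix that journald parses and colors, while a plain
console just shows the number. A sketch:

    #include <iostream>
    #include <string>
    #include <syslog.h>

    // Emit one log line with its syslog priority prefixed, e.g. "<5>"
    // for LOG_NOTICE. journald strips and interprets the prefix;
    // elsewhere it appears literally.
    void log_line(int level, const std::string &msg)
    {
        std::cerr << "<" << level << ">" << msg << std::endl;
    }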
Signed-off-by: Kai Krakow <kai@kaishome.de>
The mlock runs much faster, probably because the hash fetches are doing
most of the work that mlock would otherwise do. This makes bees startup
latency smaller for testing, even if startup takes more time in
absolute terms.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This avoids PERFORMANCE warnings when large hash tables are used on slow
CPUs or with lots of worker threads. It also simplifies the code (no
locksets, only one object-wide mutex instead of two).
Fixed a few minor bugs along the way (e.g. we were not setting the dirty
flag on the right hash table extent when we detected hash table errors).
Simplified error handling: IO errors on the hash table are ignored,
instead of throwing an exception into the function that tried to use the
hash table.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Add MADV_DONTDUMP to the list of advice flags.
There are now three flags which may or may not be supported by the
target kernel. Try each one and log its success or failure separately.
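A sketch of the per-flag attempt-and-log loop (the flag list is from
this series; the function name is illustrative):

    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <sys/mman.h>

    struct Advice { int flag; const char *name; };

    static const Advice hash_table_advice[] = {
        { MADV_HUGEPAGE, "MADV_HUGEPAGE" },
        { MADV_DONTFORK, "MADV_DONTFORK" },
        { MADV_DONTDUMP, "MADV_DONTDUMP" },
    };

    // Try each piece of advice independently and report each result:
    // one unsupported flag must not prevent applying the others.
    void advise_hash_table(void *addr, size_t size)
    {
        for (const auto &a : hash_table_advice) {
            if (madvise(addr, size, a.flag)) {
                fprintf(stderr, "madvise(%s): %s\n", a.name, strerror(errno));
            } else {
                fprintf(stderr, "madvise(%s): OK\n", a.name);
            }
        }
    }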
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The hash table statistics calculation in BeesHashTable::prefetch_loop
and the data-driven operation of the extent scanner always pulls the
hash table into RAM as fast as the disk will push the data. We never
use the prefetch rate limit, so remove it.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Every git commit was causing bees.cc and bees-hash.cc to be rebuilt,
which was expensive and unnecessary.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The thread name has an arbitrarily limited size, and we are eventually
removing support for multiple paths in a single bees daemon process.
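For context: Linux limits thread names to 16 bytes including the
terminating NUL, and pthread_setname_np() fails with ERANGE on anything
longer, so every character spent on a path prefix is lost from the
useful part of the name. A sketch of the defensive truncation:

    #include <pthread.h>
    #include <string>

    // Keep at most 15 characters; longer names are rejected outright.
    void set_thread_name(const std::string &name)
    {
        pthread_setname_np(pthread_self(), name.substr(0, 15).c_str());
    }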
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We don't _need_ transparent hugepages. We like them because they can
be faster, but it's not a requirement, and some people will disable
transparent hugepages because they make non-Bees-like workloads slow.
Try to use MADV_HUGEPAGE, but if it fails, just log the error and
continue.
MADV_DONTFORK would be useful if we still fork()ed, but we don't currently
do that. It's still a useful flag to have because a fork() with more
than 50% of RAM in mlocked pages would result in a kernel OOM crash.
I don't think it's possible to run Bees on a kernel that does not support
the MADV_DONTFORK flag, so don't bother checking for that flag separately.
The experiments are over, and the results were not a success.
Having two filesystems cohabiting in the same hash table results in a
lot of false positives, each of which requires some heavy IO to resolve.
Using MAP_SHARED to share a beeshash.dat between processes results in
catastrophically bad performance.
These features were abandoned long ago, but some of the code--and even
worse, its documentation--still remains.
Bees wants a hash table false positive rate below 0.1%. With a shared
hash table the FP rate is about the same as the dedup rate. Typically
duplicate files on one filesystem are duplicate on many filesystems.
One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation
produce extremely poor performance results. A five-order-of-magnitude
speedup was achieved by implementing paging in userspace with worker
threads. We no longer need the support code for the MAP_SHARED case.
It is still possible to run many BeesContexts in a single process,
but now the only thing contexts share is the FD cache.
BeesHashTable can now create a beeshash.dat if the file does not already
exist. Currently the default size is one hash table extent (16MB) and
there's no way to change that (yet), so users should still create their
own hash tables for now.
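A sketch of the create-if-missing step under those defaults (the
constant and function names are illustrative):

    #include <fcntl.h>
    #include <sys/types.h>
    #include <unistd.h>

    static const off_t BEES_HASH_EXTENT = 16 * 1024 * 1024;  // 16MB

    // Open beeshash.dat, creating and sizing it on first use.
    int open_or_create_hash_table(const char *path)
    {
        int fd = open(path, O_RDWR | O_CREAT, 0600);
        if (fd < 0) return -1;
        off_t size = lseek(fd, 0, SEEK_END);
        if (size == 0 && ftruncate(fd, BEES_HASH_EXTENT)) {
            // could not allocate the default one-extent table
            close(fd);
            return -1;
        }
        return fd;
    }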
The opening of the hash table is deferred (slightly) in preparation for
hash table resizing.
No documentation yet, as the feature is currently unfinished.