Automatically fall back to LOGICAL_INO if LOGICAL_INO_V2 fails and no
_V2 flags are used.
Add methods to set the flags argument, keeping the build portable to
older headers that lack the _V2 definitions.
Use thread_local storage for the somewhat large buffers used by
LOGICAL_INO_V2 (and other users of BtrfsDataContainer like INO_PATHS).
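A minimal sketch of the fallback, assuming <linux/btrfs.h> and a caller
that has already filled in the args structure (the function name and
error handling are illustrative, not the exact bees code):

    #include <linux/btrfs.h>
    #include <sys/ioctl.h>
    #include <cerrno>
    #include <cstdint>

    #ifndef BTRFS_IOC_LOGICAL_INO_V2
    // Building against older headers: supply the ioctl number ourselves.
    #define BTRFS_IOC_LOGICAL_INO_V2 _IOWR(BTRFS_IOCTL_MAGIC, 59, \
                    struct btrfs_ioctl_logical_ino_args)
    #endif

    int do_logical_ino(int fd, struct btrfs_ioctl_logical_ino_args *args,
                       uint64_t flags)
    {
            int rv = ioctl(fd, BTRFS_IOC_LOGICAL_INO_V2, args);
            // Fall back only when the kernel lacks _V2 and no _V2-only
            // flags were requested, so the two calls are equivalent.
            if (rv && errno == ENOTTY && flags == 0) {
                    rv = ioctl(fd, BTRFS_IOC_LOGICAL_INO, args);
            }
            return rv;
    }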
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
ExtentWalker doesn't gain significant benefits from caching, and perf
blamed the extra SEARCH_V2 ioctls for a 33% kernel CPU overhead.
Reduce the number of extents to 16 in lieu of fixing the caching.
This gives a significant speed boost on CPU-bound workloads compared
to the original 1024--almost 40% faster on a single SSD with a filesystem
consisting of raw VM images mounted with compress=zstd.
This also seems to reduce LOGICAL_INO overhead. Perhaps SEARCH_V2 and
LOGICAL_INO were trying to lock the same extents, and interfering with
each other?
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The -g option limits the number of worker threads when the target load
average is exceeded. On some systems the load normally runs high, and
continuous bees operation is required to avoid running out of disk space.
Add a -G/--thread-min option to force at least some threads to continue
running.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Add -g / --loadavg-target parameter to track system load and add or
remove bees worker threads dynamically to keep system load close to the
loadavg target. Thread count may vary from zero to the maximum
specified by -c or -C, and is adjusted every 5 seconds.
This is better than implementing a similar load average scheme from
outside of the process (though that is still possible) because the
in-process load tracker does not disrupt the performance timing feedback
mechanisms as a freezer cgroup or SIGSTOP would when controlling bees
from outside. The internal load average tracker can also adjust the
number of active threads, while an external tracker can only choose
between the maximum and zero.
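Roughly, the adjustment loop looks like this (a sketch with assumed
names; set_worker_count() is hypothetical):

    #include <cstdlib>   // getloadavg(3)
    #include <thread>
    #include <chrono>

    void set_worker_count(size_t n);  // hypothetical TaskMaster setter

    void loadavg_thread(double target, size_t thread_max)
    {
            size_t threads = thread_max;
            while (true) {
                    double load = 0;
                    if (getloadavg(&load, 1) == 1) {
                            // Step toward the target: shed a worker when
                            // overloaded, add one back when there is headroom.
                            if (load > target && threads > 0)
                                    --threads;
                            else if (load < target && threads < thread_max)
                                    ++threads;
                            set_worker_count(threads);
                    }
                    std::this_thread::sleep_for(std::chrono::seconds(5));
            }
    }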
Also fix a bug where a Task could deadlock waiting for itself to exit
if it tries to insert a new Task after the number of worker threads has
been set to zero.
Also correct usage message for --scan-mode (values are 0..2) since
we are touching adjacent lines anyway.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
set() was broken and redundant. Calling hold() and discarding the
returned object has the correct effect.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The task queue can become very large with many subvols, requiring hours
for the queue to clear. Saving 'beescrawl.dat' in the meantime records
the work currently scheduled, not the work currently completed.
Fix by tracking progress with ProgressTracker. ProgressTracker::begin()
gives the last completed crawl position. ProgressTracker::end() gives
the last scheduled crawl position. begin() does not advance while any
item between begin() and end() is not yet completed. In between
are crawled extents that are on the task queue but not yet processed.
The file 'beescrawl.dat' saves the begin() position while the extent
scanning task queue is fed from the end() position.
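A simplified model of those semantics (illustrative only; the real
ProgressTracker differs in detail):

    #include <map>
    #include <mutex>

    template <class Pos>
    struct ProgressModel {
            std::mutex          mtx;
            std::map<Pos, bool> items;   // position -> completed?
            Pos                 m_begin{}, m_end{};

            void schedule(Pos p) {       // feeds the task queue
                    std::lock_guard<std::mutex> lock(mtx);
                    items[p] = false;
                    m_end = p;
            }
            void complete(Pos p) {       // a queued task finished
                    std::lock_guard<std::mutex> lock(mtx);
                    items[p] = true;
                    // begin() advances only over a contiguous prefix of
                    // completed items, so it is always safe to save.
                    while (!items.empty() && items.begin()->second) {
                            m_begin = items.begin()->first;
                            items.erase(items.begin());
                    }
            }
            Pos begin() { std::lock_guard<std::mutex> lock(mtx); return m_begin; }
            Pos end()   { std::lock_guard<std::mutex> lock(mtx); return m_end; }
    };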
Also remove an unused method crawl_state_get() and repurpose the
operator<(BeesCrawlState) that nobody was using.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Clearing the FD cache could trigger a lot of inode evicts in the kernel,
which will block the cache entry destructors called by map::clear().
This prevents any cache lookups or new file opens while it happens.
Move the map to an auto variable and destroy it after releasing the
mutex lock. This probably has the same net result (all the bees threads
will be blocked in the kernel instead of on a bees mutex), but at least
the problem is outside of userspace now.
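The shape of the change, assuming a member map m_fd_cache guarded by
m_mutex (names illustrative):

    void clear_cache()
    {
            decltype(m_fd_cache) dead_entries;
            {
                    std::unique_lock<std::mutex> lock(m_mutex);
                    // Take the contents out from under the lock...
                    dead_entries = std::move(m_fd_cache);
                    m_fd_cache.clear();
            }
            // ...so the destructors (and their inode evicts) run
            // here, after the mutex has been released.
    }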
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The default constructor makes it more convenient to use Task as a
class member.
The ID is useful to disambiguate Task references.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
update_monotonic does not reset the counter if a new count is smaller
than earlier counts. This is useful when consuming an unsorted stream
of event counts.
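The semantics, roughly (a sketch, not the exact RateEstimator code):

    void update_monotonic(uint64_t new_count)
    {
            std::unique_lock<std::mutex> lock(m_mutex);
            // Ignore samples that arrive out of order; only a larger
            // count advances the counter.
            if (new_count > m_count) {
                    m_count = new_count;
                    m_condvar.notify_all();
            }
    }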
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Perf was blaming more than 50% of cycles on TREE_SEARCH_V2. strace
showed 4 TREE_SEARCH_V2 calls for every pread in grow_backward().
Fix by increasing the extent fetch batch size so it is more likely
to include the desired items in the first fetch attempt.
This removes TREE_SEARCH_V2 from the top 10 list of cycle consumers.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
RateEstimator estimates the rate of external events by sampling a
counter.
Conversion functions are provided to predict the time when the
event counter will be incremented to particular values based on past
observations of the event counter.
Synchronization functions are provided to block a thread until a specific
counter value is reached.
Event polling is supported using the history of previous event counts
to determine the predicted time of the next event. A decay function
emphasizes more recent event history.
Polling delays are bounded by minimum and maximum values in the constructor
parameters.
wait_for() and wait_until() block the calling thread until the target
event count is reached (or the counter is reset). These functions are
not bounded by min_delay or max_delay, and require a separate thread
to call update(). wait_for() waits for the counter to be incremented
from its current value by the given count. wait_until() waits for the
counter to reach an absolute value.
update() counts external events and unblocks threads that are blocked
in wait_for() or wait_until(). If the event counter decreases then it
is reset to the new value.
duration() and time_point() convert relative and absolute event counts
into relative and absolute C++11 time quantities based on the last update
time, last observed event count, and the observed event rate.
Convenience functions seconds_for() and seconds_until() calculate
polling delays for the desired relative and absolute event counts
respectively. These delays are bounded by max and min delay parameters.
rate() and ratio() provide conversion factors based on the current
estimated event rate.
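Illustrative usage of the interface described above (exact signatures
may differ):

    RateEstimator re;

    // Producer thread: sample the external event counter as it changes.
    re.update(event_count);

    // Consumer thread: block until 100 more events have been counted
    // (requires some other thread to keep calling update()).
    re.wait_for(100);

    // Poller: compute a bounded delay until the counter should reach
    // a target value, then sleep that long instead of busy-waiting.
    double delay = re.seconds_until(target_count);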
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Since we are now unconditionally rendering the print_fn as a static
string, there is no need for it to be a function. We also need it to
be brief and mostly constant.
Use a string instead. Put the string before the function in the Task
constructor arguments so that the title string appears as a heading in
code, since we are making a breaking API change already.
Drop TASK_MACRO as it is broken by this change, but there is no similar
usage of Task anywhere to make it worth fixing.
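Roughly, the API change (sketched, not the exact diff):

    // Before: a print_fn rendered the description on demand.
    Task before([]() { do_work(); },
            [](ostream &os) -> ostream & { return os << "do_work"; });

    // After: a brief, mostly constant title string comes first,
    // so it reads like a heading in the code.
    Task after("do_work", []() { do_work(); });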
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This enables bees' thread introspection to use task descriptions in
status and log messages.
BeesNote will be calling Task::current_task() from non-Task contexts,
which means we need to allow Task's shared state pointer to be null.
Remove some asserts that will ruin our day in that case.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit adds log levels to the output. Under systemd it makes
colored lines; otherwise it's probably just a number. Bees is very
chatty, so
this paves the road for log level filtering.
Signed-off-by: Kai Krakow <kai@kaishome.de>
We need a better cache expiration algorithm than "make a copy of
the entire thing, sort it while holding a lock, and delete half
the items in a single burst."
Replace the Lamport clock with a doubly-linked list. Each insert
or lookup operation moves the affected item to the head of the list.
Each erase operation deletes one single item at the tail of the list.
Also sort out some iterator invalidation nonsense by doing erases before
inserts instead of "insert, erase, find the inserted item again because
we invalidated the found iterator during the erase."
The new implementation adds a second word-sized member to each Value
as well as a copy of the Key. Hopefully the enlarged size is not
a deal-breaker.
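The standard shape of this design (illustrative; the real LRUCache
differs in detail):

    #include <list>
    #include <map>

    template <class Key, class Val>
    struct LruSketch {
            struct Entry {
                    Val                               val;
                    typename std::list<Key>::iterator pos;  // the new member
            };
            std::list<Key>       lru;    // front = most recently used
            std::map<Key, Entry> map;
            size_t               limit = 1024;

            void insert(const Key &k, Val v) {
                    // Erase before insert, as described above, so no
                    // iterator is left dangling across the two steps.
                    auto found = map.find(k);
                    if (found != map.end()) {
                            lru.erase(found->second.pos);
                            map.erase(found);
                    }
                    if (map.size() >= limit && !lru.empty()) {
                            map.erase(lru.back());   // one erase, at the tail
                            lru.pop_back();
                    }
                    lru.push_front(k);
                    map.emplace(k, Entry{std::move(v), lru.begin()});
            }
            void touch(const Key &k) {   // lookup moves the item to the head
                    auto found = map.find(k);
                    if (found != map.end())
                            lru.splice(lru.begin(), lru, found->second.pos);
            }
    };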
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We need a mechanism for distributing work across processor cores and
disks.
Task implements a simple FIFO/LIFO queue model for executing closures.
Some locking primitives are included (mutex and barrier).
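Hypothetical usage of the queue model (the signature is sketched from
the description above):

    // Closures are queued and executed by a pool of worker threads.
    Task crawl([]() {
            // ... one unit of work; the closure may queue more Tasks ...
    });
    crawl.run();   // enqueue; a worker thread picks it up later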
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We were holding weak refs until the next time the resource ID was used.
This is a bad thing if resource IDs are sparse (e.g. pointers or hashes)
because we'll never see an ID twice.
To fix, determine whether we released the last instance of a resource,
and if so, free its weak ref immediately.
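The shape of the fix (illustrative; names assumed):

    void release(const Key &id)
    {
            std::unique_lock<std::mutex> lock(m_mutex);
            auto found = m_weak_map.find(id);
            // Once the strong count hits zero this weak_ptr can never
            // be locked again, and with sparse IDs nobody will ever
            // look it up either, so drop it now instead of leaking it.
            if (found != m_weak_map.end() && found->second.expired()) {
                    m_weak_map.erase(found);
            }
    }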
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The bugs in other parts of the code have been identified and fixed,
so the overprotective locks around shared_ptr can be removed.
Keep the other improvements to the Resource class.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Use a static function instead of embedding side-effects in the constructor
of an unrelated class.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit 85106bd9a9fb3be2c574df1a3aed6674e3951465)
To make bees more friendly to use with syslog/systemd, we add an option
to omit timestamps from the log output.
Signed-off-by: Kai Krakow <kai@kaishome.de>
If you have a few big (or a lot of) nocow files, such as VM images,
which contain many potential deduplication candidates, bees becomes
incredibly slow, churning through a lot of "invalid operation"
exceptions.
Let's just skip over such files to get more bang for the buck. I did no
regression testing as this patch seems trivial (and I cannot imagine any
pitfalls either). The process progresses much faster for me now.
Keep track of the locking thread so we can see why we are deadlocked
in gdb.
Use a handle type for locks based on shared_ptr. Change the handle type
name to flush out any non-auto local variables.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit aa0b22d445664409c36503c6fd808bc49b6816d0)
check_overflow() will invalidate iterators if it decides there are too
many cache entries.
If items are deleted from the cache, search for the inserted item again
to ensure the iterator is valid.
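The fix looks roughly like this (names assumed):

    auto inserted = m_map.insert(std::make_pair(key, value)).first;
    if (check_overflow()) {
            // check_overflow() erased entries and may have invalidated
            // 'inserted', so look the item up again before using it.
            inserted = m_map.find(key);
    }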
Increase size of timestamp to size_t.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Before:

    unique_lock<mutex> lock(some_mutex);
    // run lock.~unique_lock() because return
    // return reference to unprotected heap
    return foo[bar];

After:

    unique_lock<mutex> lock(some_mutex);
    // make copy of object on heap protected by mutex lock
    auto tmp_copy = foo[bar];
    // run lock.~unique_lock() because return
    // pass locally allocated object to copy constructor
    return tmp_copy;
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
If we release the lock first (and C++ destructor order says we do), then
the return value will be constructed from data living in an unprotected
container object. That data might be destroyed before we get to the
copy constructor for the return value.
Make a temporary copy of the return value that won't be destroyed by any
other thread, then unlock the mutex, then return the copy object.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Get rid of the ResourceHolder class.
Fix GCC static template member instantiation issues.
Replace assert() with exceptions.
shared_ptr can't seem to do reference counting in a multi-threaded
environment. The code looks correct (for both ResourceHandle and
std::shared_ptr); however, continual segfaults don't lie.
Carpet-bomb with mutex locks to reduce the likelihood of losing shared_ptr
races.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Found by valgrind. It was mostly harmless because the range of
usable values is limited by m_burst (which was initialized) and 0.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Extend the LockSet class so that the total number of locked (active)
items can be limited. When the limit is reached, no new items can be
locked until some existing locked items are unlocked.
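Sketched against the description above (not the exact LockSet code):

    void lock(const key_type &name)
    {
            std::unique_lock<std::mutex> lock(m_mutex);
            // Wait until the name is free AND the active set is below
            // the limit; unlock() notifies waiters for both conditions.
            m_condvar.wait(lock, [&]() {
                    return !m_set.count(name) && m_set.size() < m_max_size;
            });
            m_set.insert(name);
    }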
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This gets rid of some more big memsets. It may replace them
with a lot of tiny mallocs, though. If this turns out to be
a bad idea then at least we can easily revert the change.
We really do need some large buffers for BtrfsIoctlSearchKey in some
cases, but we don't need to zero them out first. Skip the zeroing to
save some CPU.
Reduce the default buffer size to 4K because most BISK users don't
need much more than 1K. Set the buffer size explicitly to the product of
the number of items and the desired item size in the places that really
need a lot of items.
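Illustrative sizing at a call site that really does want many items
(the constant and the size-taking constructor are assumptions):

    // Pay for exactly nr_items worth of headers plus item payload
    // instead of a large zero-filled default.
    const size_t nr_items = 1024;
    const size_t buf_size = nr_items *
            (sizeof(btrfs_ioctl_search_header) +
             sizeof(btrfs_file_extent_item));
    BtrfsIoctlSearchKey sk(buf_size);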
The current crc64 algorithm is a variant of the Redis implementation.
Change it to a variant of the Adler implementation as described
at https://matt.sh/redis-crcspeed
Test program at https://github.com/PeeJay/crc64-compare
    Filesize: 1.1G
    Asking crc64-redis to sum "/media/peejay/BTRFS/1/ubuntu-14.04.5-desktop-amd64.iso"...
    Asking crc64-adler to sum "/media/peejay/BTRFS/1/ubuntu-14.04.5-desktop-amd64.iso"...
    Redis CRC-64: f971f9ac6c8ba458
    Adler CRC-64: f971f9ac6c8ba458
    Adler throughput: 1659.913308 MB/s
    Redis throughput: 437.284661 MB/s
    Adler is 3.79x faster than Redis
Signed-off-by: Paul Jones <paul@pauljones.id.au>