Before:
unique_lock<mutex> lock(some_mutex);
// lock.~unique_lock() runs during return, so this
// returns a reference to the now-unprotected heap object
return foo[bar];
After:
unique_lock<mutex> lock(some_mutex);
// make a copy of the mutex-protected heap object while the lock is held
auto tmp_copy = foo[bar];
// lock.~unique_lock() runs during return, after the locally
// allocated copy is passed to the copy constructor
return tmp_copy;
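For context, here is a self-contained sketch of the pattern above
(types and names are hypothetical, not the actual bees code):

#include <map>
#include <mutex>
#include <string>

std::mutex some_mutex;
std::map<int, std::string> foo;  // heap-backed data protected by some_mutex

// Unsafe: the reference is returned after the lock is released, so the
// caller reads foo's heap storage with no mutex protection.
// const std::string &get_unsafe(int bar) {
//     std::unique_lock<std::mutex> lock(some_mutex);
//     return foo[bar];
// }

// Safe: copy under the lock; the copy is constructed into the return
// value before lock.~unique_lock() runs.
std::string get_safe(int bar)
{
    std::unique_lock<std::mutex> lock(some_mutex);
    auto tmp_copy = foo[bar];
    return tmp_copy;
}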
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Previously, the scanner processed each subvol in sequence, one at a
time. This required very large amounts of temporary disk space, since
a full filesystem scan had to complete before any shared extents could
be deduped. If the hash table RAM was underprovisioned, this meant
some shared dup blocks were evicted from the hash table before they
could be deduped.
Currently, the scan order takes the first unscanned extent from each
subvol. This works well if, and only if, the subvols are either empty
or children of a common ancestor. It forces the same inode/offset pairs
to be read at close to the same time from each subvol.
When a new snapshot is created, this ordering diverts scanning to the
new subvol until it catches up to the existing subvols. For large
filesystems with frequent snapshot creation, this means that the scanner
never reaches the end of all subvols. Each new subvol effectively
resets the current scan position for the entire filesystem to zero.
This prevents bees from ever completing the first filesystem scan.
Change the order again, so that we now read one unscanned extent from
each subvol in round-robin fashion. When a new subvol is created, we
share scan time between old and new subvols. This ensures that we
eventually finish scanning the initial subvols and enter the
incremental scanning state.
The cost of this change is more repeated reading of shared extents at
scan time with less benefit from disk-device-level caching; however, the
only way to really fix this problem is to implement scanning on tree 2
(the btrfs extent tree) instead of the subvol trees.
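A minimal sketch of the round-robin ordering described above
(hypothetical names and data, not the bees implementation): each subvol
yields one extent per pass, then rotates to the back of the queue until
it is finished.

#include <deque>
#include <iostream>

struct Subvol {
    int id;
    int next_extent;   // scan position within this subvol
    int extent_count;  // total extents to scan
    bool done() const { return next_extent >= extent_count; }
};

int main()
{
    std::deque<Subvol> subvols = { {1, 0, 3}, {2, 0, 1}, {3, 0, 2} };
    while (!subvols.empty()) {
        // Take one extent from the subvol at the head of the queue...
        Subvol sv = subvols.front();
        subvols.pop_front();
        std::cout << "scan subvol " << sv.id
                  << " extent " << sv.next_extent++ << "\n";
        // ...then rotate it to the back unless it is finished.
        if (!sv.done())
            subvols.push_back(sv);
    }
    return 0;
}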
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The thread name has an arbitrarily limited size, and we are eventually
removing support for multiple paths in a single bees daemon process.
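For context on the size limit (a standalone demo, not bees code): on
Linux, pthread_setname_np() rejects names longer than 15 characters
plus the terminating NUL, so there is little room for a path suffix.
The thread names below are hypothetical examples of what fits and what
fails.

// Compile with g++ (which defines _GNU_SOURCE) and link with -pthread.
#include <cerrno>
#include <cstdio>
#include <pthread.h>

int main()
{
    // 15 characters fit; 16 characters exceed the limit and fail.
    int rc_ok  = pthread_setname_np(pthread_self(), "bees_crawl_1234");
    int rc_err = pthread_setname_np(pthread_self(), "bees_crawl_12345");
    std::printf("15 chars: %d, 16 chars: %d (ERANGE=%d)\n",
                rc_ok, rc_err, ERANGE);
    return 0;
}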
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We really do need some large buffers for BtrfsIoctlSearchKey in some
cases, but we don't need to zero them out first. Stop doing that to
save some CPU.
Reduce the default buffer size to 4K because most BISK users don't
need much more than 1K. Set the buffer size explicitly to the product of
the number of items and the desired item size in the places that really
need a lot of items.
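A minimal sketch of the allocation change (the helper name and layout
are assumptions based on this commit text, not the actual
BtrfsIoctlSearchKey code):

#include <cstddef>
#include <cstdint>
#include <memory>

static const size_t default_buf_size = 4096;

std::unique_ptr<uint8_t[]> make_search_buffer(size_t nr_items = 0,
                                              size_t item_size = 0)
{
    // Size the buffer explicitly from the item count in the places
    // that need many items; otherwise fall back to the 4K default.
    size_t size = (nr_items && item_size) ? nr_items * item_size
                                          : default_buf_size;
    // new uint8_t[size] default-initializes, i.e. leaves the bytes
    // uninitialized; std::vector<uint8_t>(size) would zero them first.
    return std::unique_ptr<uint8_t[]>(new uint8_t[size]);
}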
Linux kernel commit 7f8e406 ("btrfs: improve delayed refs iterations")
seems to dramatically improve LOGICAL_INO performance. Hopefully this
commit will find its way into mainline Linux soon.
This means that most of the time in bees is now spent on block reads
(50-75%); however, there is still a big gap between block reads and
the sum of everything else we are measuring with the "*_ms" counters.
This gap is about 30% of the run time, so it would be good to find out
what's in the gap.
Add ms counters around the crawl and open calls to capture where we are
spending all the time.
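A minimal sketch of what such an ms counter could look like (assumed,
not the bees implementation): an RAII timer that accumulates elapsed
wall-clock milliseconds into a named "*_ms" counter.

#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>
#include <string>

static std::mutex counter_mutex;
static std::map<std::string, uint64_t> ms_counters;

struct MsTimer {
    std::string m_name;
    std::chrono::steady_clock::time_point m_start;
    explicit MsTimer(const std::string &name)
        : m_name(name), m_start(std::chrono::steady_clock::now()) {}
    ~MsTimer() {
        auto elapsed = std::chrono::steady_clock::now() - m_start;
        uint64_t ms = std::chrono::duration_cast<
            std::chrono::milliseconds>(elapsed).count();
        std::lock_guard<std::mutex> lock(counter_mutex);
        ms_counters[m_name] += ms;
    }
};

// Usage: time a crawl call and accumulate the result into "crawl_ms".
// {
//     MsTimer timer("crawl_ms");
//     do_crawl();  // hypothetical
// }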