1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-18 13:55:44 +02:00

61 Commits

Author SHA1 Message Date
Zygo Blaxell
fbf6b395c8 types: member m_fd in BeesFileRange must be protected against data races
We had an unfortunate pattern of:

	const BeesFileRange bfr;
	shared_ptr<BeesContext> ctx;
	// ...
	BEESNOTE("foo " << bfr);
	bfr.fd(ctx);
	BEESNOTE("foo after opening: " << bfr);

If dump_status started running after the first BEESNOTE, but before
the second, then bfr.fd() might expose a single Fd object's shared_ptr
member to two threads at the same time (the thread running dump_status
and the thread running BEESNOTE) without protection by a lock.  One of
the threads would see a partially-initialized Fd object, and the other
thread would crash on an assertion failure, e.g.

	#0  __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50
	#1  0x00007f4c4fde5537 in __GI_abort () at abort.c:79
	#2  0x00007f4c4fde540f in __assert_fail_base (fmt=0x7f4c4ff4e128 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=0x5557605629dd "!m_destroyed", file=0x5557605627c0 "../include/crucible/namedptr.h", line=77, function=<optimized out>) at assert.c:92
	#3  0x00007f4c4fdf4662 in __GI___assert_fail (assertion=assertion@entry=0x5557605629dd "!m_destroyed", file=file@entry=0x5557605627c0 "../include/crucible/namedptr.h", line=line@entry=77,
	    function=function@entry=0x555760562970 "crucible::NamedPtr<Return, Arguments>::Value::~Value() [with Return = crucible::IOHandle; Arguments = {int}]") at assert.c:101
	#4  0x00005557605306f6 in crucible::NamedPtr<crucible::IOHandle, int>::Value::~Value (this=0x7f4a3c2ff0d0, __in_chrg=<optimized out>) at ../include/crucible/namedptr.h:77
	#5  0x00005557605137da in std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f4a3c2ff0c0) at /usr/include/c++/10/bits/shared_ptr_base.h:151
	#6  std::_Sp_counted_base<(__gnu_cxx::_Lock_policy)2>::_M_release (this=0x7f4a3c2ff0c0) at /usr/include/c++/10/bits/shared_ptr_base.h:151
	#7  std::__shared_count<(__gnu_cxx::_Lock_policy)2>::~__shared_count (this=0x7f4c4c5b5f28, __in_chrg=<optimized out>) at /usr/include/c++/10/bits/shared_ptr_base.h:733
	#8  std::__shared_ptr<crucible::IOHandle, (__gnu_cxx::_Lock_policy)2>::~__shared_ptr (this=0x7f4c4c5b5f20, __in_chrg=<optimized out>) at /usr/include/c++/10/bits/shared_ptr_base.h:1183
	#9  std::shared_ptr<crucible::IOHandle>::~shared_ptr (this=0x7f4c4c5b5f20, __in_chrg=<optimized out>) at /usr/include/c++/10/bits/shared_ptr.h:121
	#10 crucible::Fd::~Fd (this=0x7f4c4c5b5f20, __in_chrg=<optimized out>) at ../include/crucible/fd.h:46
	#11 BeesFileRange::file_size (this=0x7f4c4e5ba4a0) at bees-types.cc:156
	#12 0x0000555760513950 in operator<< (os=..., bfr=...) at bees-types.cc:80
	#13 0x000055576050d662 in std::function<void (std::ostream&)>::operator()(std::ostream&) const (__args#0=..., this=0x7f4c4e5b9f60) at /usr/include/c++/10/bits/std_function.h:622
	#14 BeesNote::get_status[abi:cxx11]() () at bees-trace.cc:165
	#15 0x00005557604c9676 in BeesContext::dump_status (this=0x5557611c4de0) at bees-context.cc:89
	#16 0x00005557605206fb in std::function<void ()>::operator()() const (this=this@entry=0x7f4c4c5b65f0) at /usr/include/c++/10/bits/std_function.h:622
	#17 crucible::catch_all(std::function<void ()> const&, std::function<void (std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> >)> const&) (f=..., explainer=...) at error.cc:55
	#18 0x000055576050aaa7 in operator() (__closure=0x5557611c52c8) at bees-thread.cc:22
	#19 0x00007f4c501beed0 in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
	#20 0x00007f4c502c8ea7 in start_thread (arg=<optimized out>) at pthread_create.c:477
	#21 0x00007f4c4febddef in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Fix by making BeesFileRange::m_fd really const (not just mutable),
then fix all the broken code referencing it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-12-19 15:10:02 -05:00
Zygo Blaxell
84094c7cb9 context: use consistent status for dedupe in log and thread note
Once the physical addresses are known, put them where they can be
seen in BEESTATUS as well as the log.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-12-19 15:10:02 -05:00
Zygo Blaxell
a83c68eb18 bees: style cleanups: const, size_t, symbolic names
No functional changes.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-12-19 15:10:02 -05:00
Zygo Blaxell
6d6686eb5b context: get rid of resolve (LOGICAL_INO) serializer
There are kernel bugs in LOGICAL_INO from time to time; however, we
can't avoid these bugs by serializing LOGICAL_INO calls.

It hasn't been used for some time, so remove the code and
less-than-completely-accurate comments.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-12-19 15:10:02 -05:00
Zygo Blaxell
c698fd7211 context: stop using deprecated memset_zero template
Use ordinary literal initialization instead.  The ioctl doesn't
need initialization of args at all.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-11-29 21:27:48 -05:00
Zygo Blaxell
ecf110f377 context: add a comment explaining why we are not adding bees_unreadahead
At the end of scanning one extent, in theory we do not need that extent
any more.  In practice, it hurts benchmark scores if we drop the extents
after reading them.

Add a comment to note this where we put the bees_unreadhead call.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-11-29 21:27:48 -05:00
Zygo Blaxell
7f7f919d08 context: fix the status message that will never be seen
BEESNOTE can only be seen if the status thread is running at the time,
making the log of activities during shutdown incomplete.

Wake up the status thread early during shutdown so the logged sequence
of shutdown actions is complete.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-11-29 21:27:48 -05:00
Zygo Blaxell
11fabd66a8 context: add experimental code for avoiding tiny extents
In the current architecture we can't directly measure the physical extent
size, and we can't make good decisions with the extent data (reference)
item alone.  If the early return is enabled here, there is a small speedup
and a large drop in dedupe hit rate, especially when extent splits occur.

Leave the early return commented for now, but collect the event statistics.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-11-29 21:27:48 -05:00
Zygo Blaxell
522e52618e context: calculate TOTAL RATES correctly
The denominator for TOTAL RATES is the total running time, not the delta
running time.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-08-30 18:23:42 -04:00
Zygo Blaxell
d9e3c0070b context: stop creating new refs when there are too many already
LOGICAL_INO_V2 has a maximum limit of 655050 references per extent.
Although it no longer has a crippling performance problem, at roughly
two seconds to process extent, it's too slow to be useful.

When an extent gains an absurd number of references, stop making any
more.  Returning zero extent refs will make bees believe the extent
was deleted, and it will remove the block from the hash table.

This helps speed processing of highly duplicated large files like
VM images, and the cost of a slightly lower dedupe hit rate.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 21:05:55 -04:00
Zygo Blaxell
cf4b5417c9 context: remove unnecessary copies
These were added while debugging a crash that was fixed years ago.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
77ef6a0638 roots: split constructor into separate start method
This allows us to use the fd cache and inode resolve functions
without starting crawler threads.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
0f0da21198 context: track record extent reference counts
This might be interesting information, though most of the motivation for
this evaporated when kernel 5.7 came out.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
8a70bca011 bees: misc comment updates
These have been accumulating in unpublished bees commits.  Squash them all
into one.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
20b8f8ae0b bees: use helper function for readahead
There seem to be multiple ways to do readahead in Linux, and only some
of them work.  Hopefully reading the actual data is one of them.

This is an attempt to avoid page-by-page reads in the generic dedupe code.
We load both extents into the VFS cache (read sequentially) and hope they
are still there by the time we call dedupe on them.

We also call readahead(2) and hopefully that either helps or does nothing.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:54 -04:00
Zygo Blaxell
0afd2850f4 cache: emit log messages when clearing FD cache
This enables us to correlate FD cache clears with external events such
as btrfs inode eviction storms.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:56:46 -04:00
Zygo Blaxell
4f032ab85b context: report Task instance count
Report the number of Task objects that currently exist as well as the number
on the global work queue.

	THREADS (work queue 298 of 2385 tasks, 16 workers):

This helps spot leaks, since Task objects that are blocked on other Task
post-exec queues are otherwise invisible.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
0bbaddd54c docs: finally concede that the consensus spelling is "dedupe"
Change documentation and comments to use the word "dedupe," not "dedup"
as found in circa-3.15 kernel sources.

No changes in code or program output--if they used "dedup" before, they
will continue to be spelled "dedup" now.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
80c69f1ce4 context: get rid of shared_ptr<BeesContext> in every single cached Fd object
Support for multiple BeesContext objects sharing a FdCache was wasting
significant space and atomic inc/dec memory cycles for no good reason
since the shared-FdCache feature was deprecated.

open_root and open_root_ino still need a BeesContext to work.  Pass the
BeesContext pointer through the function object instead of the cache
key arguments.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-04-28 21:54:00 -04:00
Zygo Blaxell
db65031c2b context: get rid of all instances of pthread_cancel
pthread_cancel doesn't really work properly.  It was only being used in
bees to bring threads to a stop if the BeesContext is destroyed early.
It is frequently implicated in core dump reports because of the fragility
of the C++ iostream / C stdio / library infrastructure, particularly
surrounding upgrades on the host running bees.  The pthread_cancel call
itself often simply fails even when it doesn't call terminate().

Defer creation of the status and progress threads until after the
BeesContext::start method is invoked.  At that point, the existing
ask-threads-nicely-to-stop code is up and running, and normal condvars
can be used to bring bees to a stop, without having to resort to
pthread_cancel.

Since we're deleting half of the BeesContext constructor in this change,
let's remove the other half too, and put an end to the deprecated support
for multiple BeesContexts sharing a process.  It's still possible to run
multiple BeesContexts, but they will not share a FD cache.  This will
allow the FD cache's keys to become smaller and hopefully save some
memory later on.

Fixes: #171

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-04-28 21:42:03 -04:00
Zygo Blaxell
bcf3e7de3e uuid: drop dependency on uuid.h
The weird things distros do to the path where uuid.h gets installed
have broken bees builds for the last time.

We were only using uuid to support a legacy feature that was removed
over four years ago.

Hypothetical users who are upgrading directly from bees v0.1 should
probably restart all the crawlers anyway--there were bugs.  Also, if any
such users exist, I respect their tremendous patience with the horrible
performance all these years--bees got about 30x faster since v0.1.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-04-23 08:16:50 -04:00
Zygo Blaxell
636e69267e resolve: add bees.h constants for balance and logical_ino serialization
Make these workarounds configurable in src/bees.h instead of #if 0
code blocks.  Someday we'll make the constants in bees.h configurable
through a file or similar.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-12-17 18:07:36 -05:00
Zygo Blaxell
5248985da0 context: fix shutdown log messages identifying the wrong thread
We are waiting for the status thread, not the progress thread.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-12-17 17:54:51 -05:00
Zygo Blaxell
6705cd9c26 context: move TempFile from TLS to Pool and fix some FdCache issues
Get rid of the thread-local TempFiles and use Pool instead.  This
eliminates a potential FD leak when the loadavg governor repeatedly
creates and destroys threads.

With the old per-thread TempFiles, we were guaranteed to have exclusive
ownership of the TempFile object within the current thread.  Pool is
somewhat stricter:  it only guarantees ownership while the checked-out
Handle exists.  Adjust the users of TempFile objects to ensure they hold
the Handle object until they are finished using the TempFile.

It appears that maintaining large, heavily-reflinked, long-lived temporary
files costs more than truncating after every use: btrfs has to write
multiple references to the temporary file's extents, then some commits
later, remove references as the temporary file is deleted or truncated.
Using the temporary file in a dedupe operation flushes the data to disk,
so nothing is saved by pretending that there is writeback pipelining and
trying to avoid flushes in truncate.  Pool provides usage tracking and
a checkin callback, so use it to truncate the temporary file immediately
after every use.

Redesign TempFile so that every instance creates exactly one Fd which
persists over the lifetime of the TempFile object.  Provide a reset()
method which resets the file back to the initial state and call it from
the Pool checkin callback.  This makes TempFile's lifetime equivalent to
its Fd's lifetime, which simplifies interactions with FdCache and Roots.

This change means we can now blacklist temporary files without having
an effective memory leak, so do that.  We also have a reason to ever
remove something from the blacklist, so add a method for that too.

In order to move to extent-centric addressing, we need to be able to
reliably open temporary files by root and inode number.  Previously we
would place TempFile fd's into the cache with insert_root_ino, but the
cache would be cleared periodically, and it would not be possible to
reopen temporary files after that happened.  Now that the TempFile's
lifetime is the same as the TempFile Fd's lifetime, we can have TempFile
manage a separate FileId -> Fd map in Roots which is unaffected by the
periodic cache clearing.  BeesRoots::open_root_ino_nocache will check
this map before attempting to open the file via btrfs root+ino lookup,
and return it through the cache as if Roots had opened the file via btrfs.

Hold a reference to BeesRoots in BeesTempFile because the usual way
to get such a reference now throws an exception in BeesTempFile's
destructor.

These changes make method BeesTempFile::create() and all methods named
insert_root_ino unnecessary, so delete them.

We construct and destroy TempFiles much less often now, so make their
constructor and destructor more informative.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-12-17 17:54:51 -05:00
Zygo Blaxell
b49d458792 context: move prealloc dedupe to a separate Task
A prealloc extent reference can be deduped immediately and asynchronously.
There is no need to slow down extent scanning to do it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-12-17 17:54:51 -05:00
Zygo Blaxell
f263c8751e bees context: make it build with clang
Remove unused function getenv_or_die.  All of our environment variable
parameters are optional or have default values.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-12-17 17:54:51 -05:00
Zygo Blaxell
8e9b53b3fd stats: remove nonsense dedup_unique_bytes stat
A long time ago, when bees used dedicated threads to scan each subvol, the
calculation of the "dedup_unique_bytes" statistic was still wrong.

This stat can only be calculated when dedupe runs on extent data items
instead of extent reference items.  Remove the stat variable until
that happens.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-12-17 17:54:51 -05:00
Zygo Blaxell
c4f0e4abee context: workaround to prevent LOGICAL_INO and btrfs balance from running concurrently
This avoids some kernel bugs.  One of them is fixed in 5.3.4 and later:

	efad8a853a "Btrfs: fix use-after-free when using the tree modification log"

There are apparently others in current kernels, so for now just put bees
on pause until the balance is done.

At some point we may want to provide an option to disable this
workaround; however, running bees and balance at the same time makes
neither particularly fast, so maybe we'll just leave it this way.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-11-28 00:13:15 -05:00
Kai Krakow
44f446e17e
bees-context: Remove confusing log message
Saying just "This feature" at some log levels could be puzzling. Let's
remove this message, the feature works without problems for a year.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2019-11-06 09:04:52 +01:00
Zygo Blaxell
978c577412 status: report number of active worker threads in status output
This is especially useful when dynamic load management allocates more
worker threads than active tasks, so the extra threads are effectively
invisible.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-01-07 22:52:12 -05:00
Zygo Blaxell
570b3f7de0 bees: handle SIGTERM and SIGINT, force immediate flush and exit
Capture SIGINT and SIGTERM and shut down, preserving current completed
crawl and hash table state.

  * Executing tasks are completed, queued tasks are paused.
  * Crawl state is saved.
  * The crawl master and crawl writeback threads are terminated.
  * The task queue is flushed.
  * Dirty hash table extents are flushed.
  * Hash prefetch and writeback threads are terminated.
  * Hash table is deallocated.
  * FD caches and tmpfiles are destroyed.
  * Assuming the above didn't crash or deadlock, bees exits.

The above order isn't the fastest, but it does roughly follow the
shared_ptr dependencies and avoids data races--especially those that
might lead to bees reporting an extent scanned when it was only queued
for future scanning that did not occur.

In case of a violation of expected shared_ptr dependency order,
exceptions in BeesContext child object accessor methods (i.e. roots(),
hash_table(), etc) prevent any further progress in threads that somehow
remain unexpectedly active.

Move some threads from main into BeesContext so they can be stopped
via BeesContext.  The main thread now runs a loop waiting for signals.

A slow FD leak was discovered in TempFile handling.  This has not been
fixed yet, but an implementation detail of the C++ runtime library makes
the leak so slow it may never be important enough to fix.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-12-09 23:39:44 -05:00
Zygo Blaxell
bf2a014607 roots: improve "RO root 6094" message
This sequence of log messages isn't clear:

	crawl_master: WORKAROUND: Avoiding RO subvol 6094
	crawl_master: WORKAROUND: RO root 6094

The first is from a cache miss, and appears wherever a root is opened
(dedupe or crawl).  The second is skipping an entire subvol scan, and
only happens in crawl_master.

Elaborate on the second message a little.

Also use the term "root" consistently when referring to subvol tree IDs.
btrfs refers to these objects by (at least) three distinct names:  tree,
subvol, and root.  Using three different words for the same thing is worse
than using a single wrong word consistently to refer to the same concept.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-22 21:10:15 -05:00
Zygo Blaxell
23f3e4ec42 workarounds: add workaround for btrfs send
Introduce --workaround options which trade performance or effectiveness to
avoid triggering kernel bugs.

The first such option is --workaround-btrfs-send, which avoids making any
modification to read-only subvols to avoid btrfs send bugs.

Clean up usage message:  no tabs for formatting, split options into
sections by theme.

Make scan mode a non-static data member like all (most?) other options.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-21 21:49:16 -05:00
Zygo Blaxell
c2762740ef context: remove limit on the number of references to an extent
Better toxic extent detection means we can now handle extents with
many more references--easily hundreds of thousands.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-05 21:12:11 -05:00
Zygo Blaxell
542371684c context: better detection for toxic extents
We detect toxic extents by measuring how long the LOGICAL_INO ioctl takes
to run.  If it is above some threshold, we consider the extent toxic,
and blacklist it; otherwise, we process the extent normally.

The detector was using the execution time of the ioctl, which detects
toxic extents, but it also detects pauses of the bees process and
transaction commit latency due to load.  This leads to a significant
number of false positives.  The detection threshold was also very long,
burning a lot of kernel CPU before the detection was triggered.

Use the per-thread system CPU statistics to measure the kernel CPU usage
of the LOGICAL_INO call directly.  This is much more reliable because it
is not confounded by other threads, and it's faster because we can set
the time threshold two orders of magnitude lower.

Also remove the lock and mutex added in "context: serialize LOGICAL_INO
calls" because we theoretically no longer need it (but leave the code
there with #if 0 in case we do need it in practice).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-10-31 21:12:16 -04:00
Zygo Blaxell
1a51bb53bf context: cache result of home_fd()
BeesContext::home_fd() is supposed to open $BEESHOME once and cache
the Fd for later calls; however, instead it was reopening a new Fd each
time it was called, and _also_ holding that Fd in a BeesContext member.
Fds clean themselves up when they are forgotten, so it was not leaking
per se, but it certainly had more open Fds than it needed to.

Check to see if we have m_home_fd open, and return that if so.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-10-30 21:12:16 -04:00
Zygo Blaxell
63ddbb9a4f context: serialize LOGICAL_INO calls
LOGICAL_INO can trip over the btrfs slow-backrefs bug, resulting in
some very long in-kernel runtimes.  If too many threads are executing
LOGICAL_INO then there may be no cores left on the system to run other
tasks.

Toxic extent detection is done by a very rudimentary algorithm which
can be confused by unrelated sources of latency within btrfs (especially
commit latency).  The algorithm can also be confused by other threads
executing the LOGICAL_INO ioctl.

These are two good reasons to prevent any two threads in a single bees
process instance from executing LOGICAL_INO at the same time, so let's
do that.

It is possible to limit the number of threads executing LOGICAL_INO with
the -c and -C options; however, this also limits the number of threads
which can perform any operation, while only LOGICAL_INO (*) has such a
profound effect on the rest of system operation.

Also make the status message clearer about exactly when LOGICAL_INO is
executed, as opposed to merely waiting to acquire a lock before executing
the ioctl.

(*) or maybe FILE_EXTENT_SAME.  The problem function that keeps showing
up in kernel stack traces is find_parent_nodes, which is called by both
the LOGICAL_INO and FILE_EXTENT_SAME ioctls.  We'll try this change
first and see if it prevents any recurrences of forced watchdog reboots;
if it does not, then we'll limit FILE_EXTENT_SAME the same way.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-10-30 21:12:16 -04:00
Zygo Blaxell
96eb100ded bees: use readahead instead of posix_fadvise
Other btrfs utils use readahead() not posix_fadvise().

There does not appear to be a performance or correctness difference
between the three (none, posix_fadvise, or readahead()).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:00 -04:00
Zygo Blaxell
b22db12390 context: log dedups with single unbroken log message
When BEESLOGINFO is called multiple times it generates separate log
records that can be mixed up when multiple threads dedup.

Use a single BEESLOGINFO call for each dedup to prevent this.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:00 -04:00
Zygo Blaxell
33d274eabd resolve: break up long intra-extent dedup loops
When both block candidates for dedup are located in the same extent, bees
excludes them from deduplication because the dedup operation would not
free any space (both blocks are still referenced, so neither is deleted).
Candidates in other extents are still considered.

Typically a few blocks are duplicated many thousands or even millions
of times within a filesystem.  Many of these blocks appear in the same
extent as each other.  In cases where an extent contains an extremely
common duplicate block, it may appear multiple times in many extents.
bees can get into a loop with a very bad worst-case running time:  32768
blocks per extent * 2560 bees reference limit * 256 distinct hash table
entries = 21.5 *billion* iterations...squared, because this loop happens
every time bees encounteres any of the references.  Not an infinite
number, but close enough.

In each iteration of the loop, replace_dst detects that both src and dst
block are part of the same btrfs extent data item and therefore should
not be deduped; however, this occurs after the block has been allocated
and read by chase_extent_ref.  This dst is discarded, but the outer
loop tries again with another reference to the same block and gets the
same result.

An easy fix for this problem is to stop the loop immediately when the
same physical extent is found in both src and dst.  The condition is rare
enough to ignore the negligible space efficiency loss, and filesystem
scan stops dead if the loop is allowed to proceed.  An exception is
thrown to terminate the loop at scan_one_extent from within replace_dst.

It would be better to determine the extent bytenr of each candidate
extent and filter them out in scan_one_extent (which reduces the number
of LOGICAL_INO calls as a side-effect), but bees has no code capable of
doing extent data tree lookups with backward iteration yet.  Even better
would be to change the hash table format so that the extent bytenr can
be decoded directly from the hash table entry (this already exists for
compressed extents).  Both of these changes are too large for v0.6.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-25 10:08:42 -05:00
Zygo Blaxell
d367c6364c context: improve toxic match logs
Reword log message for discovery of new toxic extents vs. lookup of
previously known toxic extents.  Also add the block data (especially
filename) to the discovery message.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:48:06 -05:00
Zygo Blaxell
e74c0a9d80 scan: fix length mismatch exception for prealloc extents at EOF
Prealloc extent sizes were taken from the Extent object and did not
take the file size into account.  If a file with a non-4K-aligned
size is preallocated, the resulting dedup fails with an exception
because the size of both ranges of the BeesRangePair do not match.

Limit the size of the replacement hole extent to not extend past the
end of the file.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-28 01:46:08 -05:00
Zygo Blaxell
ded26ff044 FdCache: clear cache on every new transid / crawl cycle
The periodic cache age check was not protected by a lock, so multiple
threads may decide to concurrently clear the cache.  This led to
duplicate log messages.

Fix by moving the cache expiry trigger out of FdCache and into Roots,
which knows when transids change and can perform cache clears at exactly
the time they are most relevant, i.e. after something that was deleted
becomes permanently so.

This removes the last references to BEES_COMMIT_INTERVAL, so get rid
of its definition too.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
f6909dac17 bees: drop BEESINFO
Having too many "write a message to the log" primitives is confusing,
and having one that intermittently and silently discards output is even
_more_ confusing.

Replace all BEESINFO with appropriate BEESLOG*s.  Usually DEBUG.
Except for one or two that occur too often.  Just delete those.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Kai Krakow
677da5de45 Logging: Add log levels to output
This commit adds log levels to the output. In systemd, it makes colored
lines, otherwise it's probably just a number. Bees is very chatty, so
this paves the road for log level filtering.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 23:41:29 +01:00
Zygo Blaxell
055c8d4c75 roots: scan in parallel using Tasks
Distribute incoming extents across a thread pool for faster execution
on multi-core, multi-disk environments.

Switch extent enumeration model to scan extent refs consecutively(ish).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:52:00 -05:00
Zygo Blaxell
a175ee0689 bees: clean up #if 0 ... fsync ... #endif code
Remove some dead code because dedup-related deadlocks have not been
observed since Linux kernel v4.11.

Preserve rationale of remaining #if 0 block (why we do write/rename
instead of write/fsync/rename) so that people don't try to replace the
"missing" fsync() there.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
Zygo Blaxell
8d3a27bf85 subvol-threads: increase resource and thread limits
With kernel 4.14 there is no sign of the previous LOGICAL_INO performance
problems, so there seems to be no need to throttle threads using this
ioctl.

Increase the FD cache size limits and scan thread count.  Let the kernel
figure out scheduling.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
Zygo Blaxell
77614a0e99 scan: insert toxic matched extents into hash table as they are discovered
When a toxic extent is discovered, insert the offending hash/address/toxic
entry into the hash table.

When a previously discovered toxic extent is encountered, do nothing,
i.e. allow the offending hash/address/toxic entry in the hash table
to expire.

Previously both inserts were removed from the code, but the former one
is required.  The latter prevents bees from forgiving toxic extents
(or any hash matching one) should they be relocated, deleted, or simply
become non-toxic.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-12-21 13:56:15 -05:00
Zygo Blaxell
732896b471 log: simplify output for dedup and scan
With many threads it is inconvenient to reassemble the elided parts of
the dedup src/dst and scan filenames output.  Simply output them
unconditionally, and balance the line lengths.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-09-16 17:30:30 -04:00