GGLinnk/bees - bees - Virtual World Git

mirror of https://github.com/Zygo/bees.git synced 2025-07-12 05:12:25 +02:00

Author	SHA1	Message	Date
Zygo Blaxell	b2d4a07c6f	roots: add a TRACE for transid_max search and crawl_transid thread Some users are hitting an exception somewhere in crawl_transid, which forces bees to return back to the transid_max calculation over and over. Also out-of-range transids. Add some BEESTRACE so we can see what we were doing in the exception handler. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	7008c74113	bees: trace and log improvements during roots and context startup Currently if crawl throws an exception, we don't have basic information about what was being crawled or even if the crawler was running at all. These traces also help identify the causes of early exception failures. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	77ef6a0638	roots: split constructor into separate start method This allows us to use the fd cache and inode resolve functions without starting crawler threads. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	8a70bca011	bees: misc comment updates These have been accumulating in unpublished bees commits. Squash them all into one. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	ffac407a9b	roots: clean up crawl_master Remove some broken #if 0 code, and take advantage of new Task non-repeating execution semantics. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:49:15 -04:00
Zygo Blaxell	0bbaddd54c	docs: finally concede that the consensus spelling is "dedupe" Change documentation and comments to use the word "dedupe," not "dedup" as found in circa-3.15 kernel sources. No changes in code or program output--if they used "dedup" before, they will continue to be spelled "dedup" now. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:49:15 -04:00
Zygo Blaxell	80c69f1ce4	context: get rid of shared_ptr<BeesContext> in every single cached Fd object Support for multiple BeesContext objects sharing a FdCache was wasting significant space and atomic inc/dec memory cycles for no good reason since the shared-FdCache feature was deprecated. open_root and open_root_ino still need a BeesContext to work. Pass the BeesContext pointer through the function object instead of the cache key arguments. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-04-28 21:54:00 -04:00
Zygo Blaxell	bcf3e7de3e	uuid: drop dependency on uuid.h The weird things distros do to the path where uuid.h gets installed have broken bees builds for the last time. We were only using uuid to support a legacy feature that was removed over four years ago. Hypothetical users who are upgrading directly from bees v0.1 should probably restart all the crawlers anyway--there were bugs. Also, if any such users exist, I respect their tremendous patience with the horrible performance all these years--bees got about 30x faster since v0.1. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-04-23 08:16:50 -04:00
Zygo Blaxell	7f660f50b8	lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks The Linux kernel's btrfs headers are better than the libbtrfs-dev headers: - the libbtrfs-dev headers have C++ language compatibility issues - upstream version in Linux kernel is more accurate and up to date - macros in libbtrfs-dev's ctree.h hide information that would enable bees to perform runtime buffer length checking - enum types whose presence cannot be detected with #ifdef When accessing members of metadata items from the filesystem, we want to verify that the member we are accessing is within the boundaries of the item that was retrieved; otherwise, a memory access violation may occur or garbage may be returned to the caller. A simple C++ template, given a pointer to a structure member and a buffer, can determine that the buffer contains enough bytes to safely access a struct member. This was implemented back in 2016, but left unused due to ctree.h issues. Some btrfs metadata structures have variable length despite using a fixed-size in-memory structure. The members that appear earliest in the structure contain information about which following members of the structure are used. The item stored in the filesystem is truncated after the last used member, and all following members must not be accessed. 'btrfs_stack_*' accessor macros obscure the memory boundaries of the members they access, which makes it impossible for a C++ template to verify the memory access. If the template checks the length of the entire structure, it will find an access violation for variable-length metadata items because the item is rarely large enough for the entire structure. Get rid of all the libbtrfs-dev accessor macros and reimplement them with the necessary buffer length checks. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-02-22 20:06:43 -05:00
Zygo Blaxell	420c218c83	cache: remove unused #includes Also fix bees-roots's missing headers. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:52 -05:00
Zygo Blaxell	6705cd9c26	context: move TempFile from TLS to Pool and fix some FdCache issues Get rid of the thread-local TempFiles and use Pool instead. This eliminates a potential FD leak when the loadavg governor repeatedly creates and destroys threads. With the old per-thread TempFiles, we were guaranteed to have exclusive ownership of the TempFile object within the current thread. Pool is somewhat stricter: it only guarantees ownership while the checked-out Handle exists. Adjust the users of TempFile objects to ensure they hold the Handle object until they are finished using the TempFile. It appears that maintaining large, heavily-reflinked, long-lived temporary files costs more than truncating after every use: btrfs has to write multiple references to the temporary file's extents, then some commits later, remove references as the temporary file is deleted or truncated. Using the temporary file in a dedupe operation flushes the data to disk, so nothing is saved by pretending that there is writeback pipelining and trying to avoid flushes in truncate. Pool provides usage tracking and a checkin callback, so use it to truncate the temporary file immediately after every use. Redesign TempFile so that every instance creates exactly one Fd which persists over the lifetime of the TempFile object. Provide a reset() method which resets the file back to the initial state and call it from the Pool checkin callback. This makes TempFile's lifetime equivalent to its Fd's lifetime, which simplifies interactions with FdCache and Roots. This change means we can now blacklist temporary files without having an effective memory leak, so do that. We also have a reason to ever remove something from the blacklist, so add a method for that too. In order to move to extent-centric addressing, we need to be able to reliably open temporary files by root and inode number. 
Previously we would place TempFile fd's into the cache with insert_root_ino, but the cache would be cleared periodically, and it would not be possible to reopen temporary files after that happened. Now that the TempFile's lifetime is the same as the TempFile Fd's lifetime, we can have TempFile manage a separate FileId -> Fd map in Roots which is unaffected by the periodic cache clearing. BeesRoots::open_root_ino_nocache will check this map before attempting to open the file via btrfs root+ino lookup, and return it through the cache as if Roots had opened the file via btrfs. Hold a reference to BeesRoots in BeesTempFile because the usual way to get such a reference now throws an exception in BeesTempFile's destructor. These changes make method BeesTempFile::create() and all methods named insert_root_ino unnecessary, so delete them. We construct and destroy TempFiles much less often now, so make their constructor and destructor more informative. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	de6282c6cd	roots: separate crawl sizes into bytes and items Number of items should be low enough that we don't have too many stale items, but high enough to amortize system call overhead to a reasonable ratio. Number of bytes should be constant: one worst-case metadata page (the btrfs limit is 64K, though 16K is much more common) so that we always have enough space for one worst-case item; otherwise, we get EOVERFLOW if we set the number of items too low and there's a big item in the tree, and we can't make further progress. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	d332616eff	roots: report the search parameters on tree search ioctl error There are lots of ways the search can fail, but it's hard to pick one without knowing the parameters. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	bbaf55b2b0	roots: make it build with clang Remove an unnecessary cast that was breaking namespace lookup for clang. Closes: #159 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	570b3f7de0	bees: handle SIGTERM and SIGINT, force immediate flush and exit Capture SIGINT and SIGTERM and shut down, preserving current completed crawl and hash table state. * Executing tasks are completed, queued tasks are paused. * Crawl state is saved. * The crawl master and crawl writeback threads are terminated. * The task queue is flushed. * Dirty hash table extents are flushed. * Hash prefetch and writeback threads are terminated. * Hash table is deallocated. * FD caches and tmpfiles are destroyed. * Assuming the above didn't crash or deadlock, bees exits. The above order isn't the fastest, but it does roughly follow the shared_ptr dependencies and avoids data races--especially those that might lead to bees reporting an extent scanned when it was only queued for future scanning that did not occur. In case of a violation of expected shared_ptr dependency order, exceptions in BeesContext child object accessor methods (i.e. roots(), hash_table(), etc) prevent any further progress in threads that somehow remain unexpectedly active. Move some threads from main into BeesContext so they can be stopped via BeesContext. The main thread now runs a loop waiting for signals. A slow FD leak was discovered in TempFile handling. This has not been fixed yet, but an implementation detail of the C++ runtime library makes the leak so slow it may never be important enough to fix. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-12-09 23:39:44 -05:00
Zygo Blaxell	f4464c6896	roots: quick fix for task scheduling bug leading to loss of crawl_master The crawl_master task had a simple atomic variable that was supposed to prevent duplicate crawl_master tasks from ending up in the queue; however, this had a race condition that could lead to m_task_running being set with no crawl_master task running to clear it. This would in turn prevent crawl_thread from scheduling any further crawl_master tasks, and bees would eventually stop doing any more work. A proper fix is to modify the Task class and its friends such that Task::run() guarantees that 1) at most one instance of a Task is ever scheduled or running at any time, and 2) if a Task is scheduled while an instance of the Task is running, the scheduling is deferred until after the current instance completes. This is part of a fairly large planned change set, but it's not ready to push now. So instead, unconditionally push a new crawl_master Task into the queue on every poll, then silently and quickly exit if the queue is too full or the supply of new extents is empty. Drop the scheduling-related members of BeesRoots as they will not be needed when the proper fix lands. Fixes: `4f0bc78a` "crawl: don't block a Task waiting for new transids" Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-11-25 23:46:55 -05:00
Zygo Blaxell	bf2a014607	roots: improve "RO root 6094" message This sequence of log messages isn't clear: crawl_master: WORKAROUND: Avoiding RO subvol 6094 crawl_master: WORKAROUND: RO root 6094 The first is from a cache miss, and appears wherever a root is opened (dedupe or crawl). The second is skipping an entire subvol scan, and only happens in crawl_master. Elaborate on the second message a little. Also use the term "root" consistently when referring to subvol tree IDs. btrfs refers to these objects by (at least) three distinct names: tree, subvol, and root. Using three different words for the same thing is worse than using a single wrong word consistently to refer to the same concept. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-11-22 21:10:15 -05:00
Zygo Blaxell	23f3e4ec42	workarounds: add workaround for btrfs send Introduce --workaround options which trade performance or effectiveness to avoid triggering kernel bugs. The first such option is --workaround-btrfs-send, which avoids making any modification to read-only subvols to avoid btrfs send bugs. Clean up usage message: no tabs for formatting, split options into sections by theme. Make scan mode a non-static data member like all (most?) other options. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-11-21 21:49:16 -05:00
Zygo Blaxell	9a97699dd9	roots: reimplement transid_max_nocache using extent tree root ROOT_TREE contains the ROOT_ITEM for EXTENT_TREE. Every modification (that we care about) to a btrfs must go through EXTENT_TREE, and must modify the page in ROOT_TREE pointing to the root of EXTENT_TREE... which makes that a very good source for the filesystem transid. Remove the loop and the root lookups, and just look at one item for max_transid. Also note that every caller of transid_max_nocache() immediately feeds the return value to m_transid_re.update(), so don't do that inside transid_max_nocache(). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-31 00:09:49 -04:00
Zygo Blaxell	0e8b591232	Revert "roots: simplify BeesRoots::transid_max_nocache" It turns out that we do need to scan all the subvols in order to find transid_max. Keep the bug fix though. This reverts commit `bf6ae80eee`. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 23:29:05 -04:00
Zygo Blaxell	bf6ae80eee	roots: simplify BeesRoots::transid_max_nocache BeesRoots::transid_max_nocache calls btrfs_get_root_transid() which retrieves the transid of the root of the given Fd. Since the FS_TREE (subvol 5) is the root of the subvol hierarchy, it will always have the highest transid on the filesystem, and we do not need to look at any others. Also fix a bug where we pass BTRFS_FS_TREE_OBJECTID instead of the file descriptor root_fd() to btrfs_get_root_transid(). If BEESHOME is somewhere on the same btrfs filesystem, and there are no leaked FDs at bees startup, then BTRFS_FS_TREE_OBJECTID (5) usually has the same integer value as a valid file descriptor of some object on the filesystem that has a regularly increasing transid value. If Fd 5 happens to be a file in BEESHOME then bees itself drives the transid increments. This, combined with the search of all subvol roots, hides the bug (unless Fd 5 gets closed somehow). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:17 -04:00
Zygo Blaxell	373b9ef038	roots: fix subvol scan rollover on subvols with empty transid range The ordering function for BeesCrawlState did not consider root 292 inode 0 min_transid 2345 max_transid 3456 to be larger than root 292 inode 258 min_transid 2345 max_transid 2345 so when we attempted to update the end pointer for the crawl progress, the new state was not considered newer than the old state because the min_transid was equal, but the new crawl state's inode number was smaller. Normally this is not a problem because subvol scans typically begin and end in separate transactions (in part because we don't start a subvol scan until at least two transactions are available); however, the cleanup code for the aftermath of the recent transid_min() bug can create crawlers with equal max_transid and min_transid records. Fix this by ordering both transid fields before any others in the crawl state. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:14 -04:00
Zygo Blaxell	866a35c7fb	roots: do not accept 18446744073709551615 as max_transid in beescrawl.dat Due to an earlier bug some beescrawl.dat files will contain uint64_t max as max_transid. This prevents any further scanning on the subvol because there is no possibility of having a real transid (or any other uint64_t number) larger than uint64_t max. If we detect a bad transid in beescrawl.dat, log a warning, then use some more plausible value: either min_transid to repeat the previous incremental crawl, or 0 to restart the subvol scan from the beginning. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:14 -04:00
Zygo Blaxell	90132182fd	roots: do not allow transid_min to be numeric_limits<uint64_t>::max() On a few test machines max_transid on subvols is getting set to 18446744073709551615 (aka uint64_t max). Prevent transid_min() from ever returning this value. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:14 -04:00
Zygo Blaxell	3d536ea6df	roots: if queue is full run again The task queue may already be full of tasks when the crawl task is executed. In this case simply reschedule the crawl task at the end of the current queue. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:06 -04:00
Zygo Blaxell	e66086516f	bees: dynamic thread pool size based on system load average Add -g / --loadavg-target parameter to track system load and add or remove bees worker threads dynamically to keep system load close to the loadavg target. Thread count may vary from zero to the maximum specified by -c or -C, and is adjusted every 5 seconds. This is better than implementing a similar load average scheme from outside of the process (though that is still possible) because the in-process load tracker does not disrupt the performance timing feedback mechanisms as a freezer cgroup or SIGSTOP would when controlling bees from outside. The internal load average tracker can also adjust the number of active threads while an external tracker can only choose from the maximum or zero. Also fix a bug where a Task could deadlock waiting for itself to exit if it tries to insert a new Task after the number of worker threads has been set to zero. Also correct usage message for --scan-mode (values are 0..2) since we are touching adjacent lines anyway. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:03 -04:00
Zygo Blaxell	8bc4bee8a3	crucible: progress: drop the set() method set() was broken and redundant. Calling hold() and discarding the returned object has the correct effect. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:49:54 -04:00
Zygo Blaxell	c3effe0a20	crawl: use custom order instead of (ab)using BeesFileRange::operator< This makes the code clearer and keeps changes to BeesFileRange ordering isolated. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-05-18 00:16:08 -04:00
Zygo Blaxell	5bdad7fc93	crucible: progress: a progress tracker for worker queues The task queue can become very large with many subvols, requiring hours for the queue to clear. 'beescrawl.dat' saves in the meantime will save the work currently scheduled, not the work currently completed. Fix by tracking progress with ProgressTracker. ProgressTracker::begin() gives the last completed crawl position. ProgressTracker::end() gives the last scheduled crawl position. begin() does not advance if any item between begin() and end() is not yet completed. In between are crawled extents that are on the task queue but not yet processed. The file 'beescrawl.dat' saves the begin() position while the extent scanning task queue is fed from the end() position. Also remove an unused method crawl_state_get() and repurpose the operator<(BeesCrawlState) that nobody was using. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-02-28 23:49:39 -05:00
Zygo Blaxell	8f0e88433e	roots: get rid of common error messages, add more error counters One very common case is losing a race to open a file that was deleted. No need to spam the logs with mere ENOENT reports. Other errors are more significant. Log those with errno, and add event counters to record them. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-02-07 23:12:01 -05:00
Zygo Blaxell	6aad124241	crawl: somebody should set max_transid The previous commit had both max_transid assignments commented out. It happens to work because we set max_transid in the constructor and it doesn't change after that, but it's cleaner to assign it explicitly. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-31 22:52:12 -05:00
Zygo Blaxell	087ec26c44	crawl: filter extents correctly When an extent ref is modified, all of the refs in the same metadata page get the same transid in the TREE_SEARCH_V2 header. This causes two problems: - Extents with generation < min_transid are included if they happen to be referenced by pages with generation >= min_transid. - Extent refs with generation > max_transid are excluded even if they reference extents with generation <= max_transid. Both of these are wrong: the first causes some extents to be repeatedly scanned, the second causes some extents to not be scanned at all. Change the TREE_SEARCH_V2 parameters so that Crawl sees all extents newer than min_transid (i.e. set max_transid to max). The TREE_SEARCH_V2 kernel logic already operates this way, i.e. it fetches every page with transid >= min_transid and discards newer items if they are too new for max_transid. Filter strictly by the extent reference generation field (i.e. the copy of the extent generation that is in the extent reference). Note this still scans extent data multiple times, but it should now be exactly once per extent reference. A proper fix for this requires extent-based scanning instead of extent-ref-based scanning. Formerly commit `5a8c655fc4` "roots: filter out obsolete extents from extent refs" which landed in the subvol-threads branch but not master. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-31 22:48:39 -05:00
Zygo Blaxell	af250f7732	roots: determine transid_max without open()ing every subvol root Scan the roots tree directly for roots other than 5 (the FS root), and use btrfs_get_root_transid on root_fd for root 5. This avoids filling up the root FD cache every time we want a new transid_max. Now the only reason we open a subvol root FD is to open a file within the subvol. transid_max may be the same as the FS root's transid, in which case the search loop is not necessary. Place a counter (transid_max_miss) to see if we ever need to look at root items. If this counter never goes above zero, or does so very rarely, we can delete the search loop. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-29 21:37:39 -05:00
Zygo Blaxell	4f0bc78a4c	crawl: don't block a Task waiting for new transids Task should not block for extended periods of time. Remove the RateEstimator::wait_for() in crawl_roots. When crawl_roots runs out of data, let the last crawl_task end without rescheduling. Schedule crawl_task again on transid polls if it was not already running. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-29 21:37:39 -05:00
Zygo Blaxell	636328fdc2	roots: add scan-mode 2 "oldest crawler first" Add a third scan mode with alternative trade-offs. Benefits: Good sequential read performance. Avoids race conditions described in https://github.com/Zygo/bees/issues/27. Avoids diverting scan resources into short-lived snapshots before their long-lived origin subvols are fully scanned. Drawbacks: Takes the longest time of the three implemented scan-modes to free space in extents that are shared between snapshots. Uses the maximum amount of temporary space. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-29 00:48:05 -05:00
Zygo Blaxell	ef44947145	roots: move common code for creating crawl Tasks into a method Duplicated code between the different scan modes has slowly been becoming less and less trivial. Move the code to a method and make both scan-modes call it. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-28 22:52:17 -05:00
Zygo Blaxell	762f833ab0	roots: poll every 10 transids Restarting scans for each transid is a bit aggressive. Scan every 10 transids for a polling rate close to the former BEES_COMMIT_INTERVAL. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	48e78bbe82	roots: use RateEstimator as a transid_max cache and clean up logs transid_max is now measured at a single point in the crawl_transid thread. Move the Crawl deferred logic into BeesRoots so it restarts all crawls when transid_max increases. Gets rid of some messy time arithmetic. Change name of Crawl thread to "crawl_master" in both thread name and log messages. Replace "Next transid" with "Crawl started". Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	ded26ff044	FdCache: clear cache on every new transid / crawl cycle The periodic cache age check was not protected by a lock, so multiple threads may decide to concurrently clear the cache. This led to duplicate log messages. Fix by moving the cache expiry trigger out of FdCache and into Roots, which knows when transids change and can perform cache clears at exactly the time they are most relevant, i.e. after something that was deleted becomes permanently so. This removes the last references to BEES_COMMIT_INTERVAL, so get rid of its definition too. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	72857e84c0	crawl: combine two messages per crawl cycle into one Now that the polling interval is up to 30 times faster, next_transid seems too verbose again. Make it clearer that the interval quoted in the "Deferring..." message is the computed transaction polling interval. Combine "Next transid" and "Restarted crawl" into a single message. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	0fdae37962	roots: use RateEstimator to track transids Make the crawl polling interval more closely track the commit interval on the btrfs filesystem. In the future this will provide opportunities to do things like clear FD caches and stop crawls on deleted subvols, but triggered by transaction commits instead of arbitrary time intervals. Rename the "crawl" thread so it no longer has the same name as the "crawl" task, and repurpose it for dedicated transid polling. Cancel the deletion of crawl_thread and repurpose it to trigger new crawls and wake up the main crawl Task when it runs out of data. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	a3f02d5dec	roots: comment updates and general cleanup Fix discussion of nodatasum files, clarifying what we can and cannot do. Get rid of some BEESNOTE and BEESTRACE calls which cannot be observed (well, BEESNOTE can, but you have to be quick!). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	f6909dac17	bees: drop BEESINFO Having too many "write a message to the log" primitives is confusing, and having one that intermittently and silently discards output is even _more_ confusing. Replace all BEESINFO with appropriate BEESLOG*s. Usually DEBUG. Except for one or two that occur too often. Just delete those. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	f64fc78e36	Task: convert print_fn to a string Since we are now unconditionally rendering the print_fn as a static string, there is no need for it to be a function. We also need it to be brief and mostly constant. Use a string instead. Put the string before the function in the Task constructor arguments so that the title string appears as a heading in code, since we are making a breaking API change already. Drop TASK_MACRO as it is broken by this change, but there is no similar usage of Task anywhere to make it worth fixing. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:04 -05:00
Zygo Blaxell	4c05c53d28	roots: update Task print functions for new usage This restores the old "crawl" prefix in the case of Crawler log messages. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-20 14:00:52 -05:00
Zygo Blaxell	e970ac6c02	crawl: make logging less verbose Silence the three(!) log messages per crawl increment an extra one at the end of the subvol. The three critical messages per subvol crawl cycle are: Next transid in BeesCrawlState <SUBVOL>:0 offset 0x0 transid <A>..<B> started <T> (<AGO>s ago) Subvol has been completely scanned and a new transaction range will be created. CrawlState is the state of the old subvol. Restarted crawl BeesCrawlState <SUBVOL>:0 offset 0x0 transid <B>..<C> started <T+AGO> (0s ago) Subvol has been restarted. CrawlState is the state of the new subvol. Deferring next transid in BeesCrawlState <SUBVOL>:0 offset 0x0 transid <B>..<C> started <T+AGO> (0s ago) Subvol has been completely scanned, but it is too soon to start a new scan. Fix the "Restart..." message to use the correct verb tense and to use the correct BeesCrawlState data. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-20 13:50:47 -05:00
Kai Krakow	677da5de45	Logging: Add log levels to output This commit adds log levels to the output. In systemd, it makes colored lines, otherwise it's probably just a number. Bees is very chatty, so this paves the road for log level filtering. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-01-18 23:41:29 +01:00
Zygo Blaxell	56c23c4517	crawl: implement two crawler algorithms and adjust scheduling parameters There are two subvol scan algorithms implemented so far. The two modes are unimaginatively named 0 and 1. 0: sorts extents by (inode, subvol, offset), 1: scans extents round-robin from all subvols. Algorithm 0 scans references to the same extent at close to the same time, which is good for performance; however, whenever a snapshot is created, the scan of the entire filesystem restarts at the beginning of the new snapshot. Algorithm 1 makes continuous forward progress even when new snapshots are created, but it does not benefit from caching and will force the kernel to reread data multiple times when there are snapshots. The algorithm can be selected at run-time using the -m or --scan-mode option. We can collect some field data on these before replacing them with an extent-tree-based scanner. Alternatively, for pre-4.14 kernels, we can keep these two modes as non-default options. Currently these algorithms have terrible names. TODO: fix that, but also TODO: delete all that code and do scans directly from the extent tree instead. Augment the scan algorithms relative to their earlier implementation by batching multiple extents to scan from each subvol before switching to a different subvol. Sprinkle some BEESNOTEs on the Task objects so that they don't disappear from the thread status output. Adjust some timing constants to deal with the increased latency from competing threads. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-17 22:53:49 -05:00
Zygo Blaxell	055c8d4c75	roots: scan in parallel using Tasks Distribute incoming extents across a thread pool for faster execution on multi-core, multi-disk environments. Switch extent enumeration model to scan extent refs consecutively(ish). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-17 22:52:00 -05:00
Zygo Blaxell	796aaed7f8	roots: remove dead code and #if blocks In both instances the code contained within (or the conditional compilation surrounding it) is no longer controversial. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-17 22:52:00 -05:00
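
The ordering fix described in commit 373b9ef038 ("roots: fix subvol scan rollover on subvols with empty transid range") can be illustrated with a small sketch. This is not the bees source: the struct `CrawlStateSketch` and the function `crawl_state_less` are hypothetical names, and the exact order of the two transid fields relative to each other is an assumption. The commit message only says that both transid fields are ordered before any others.

```cpp
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for a crawl position. Field names follow the
// commit message; this is not the actual BeesCrawlState definition.
struct CrawlStateSketch {
    uint64_t root;
    uint64_t inode;
    uint64_t min_transid;
    uint64_t max_transid;
};

// Compare the transid fields first, then root and inode, so that a state
// covering a later transid range sorts after an earlier one even when its
// inode number is smaller. (Assumed field order: min_transid, max_transid,
// root, inode.)
inline bool crawl_state_less(const CrawlStateSketch &a, const CrawlStateSketch &b)
{
    return std::tie(a.min_transid, a.max_transid, a.root, a.inode)
         < std::tie(b.min_transid, b.max_transid, b.root, b.inode);
}
```

With the two states from the commit message, root 292 inode 258 transid 2345..2345 and root 292 inode 0 transid 2345..3456, this comparison treats the second state as the larger one, which is the behaviour the commit describes as the fix.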

63 Commits