GGLinnk/bees - bees - Virtual World Git

mirror of https://github.com/Zygo/bees.git synced 2025-07-14 22:12:26 +02:00

Author	SHA1	Message	Date
Zygo Blaxell	48b7fbda9c	progress: adjust minimum thresholds for ETA to 10 seconds and 1 GiB of data 1% is a lot of data on a petabyte filesystem, and a long time to wait for an ETA. After 1 GiB we should have some idea of how fast we're reading the data. Increase the time to 10 seconds to avoid a nonsense result just after a scan starts. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-02-06 22:42:15 -05:00
Zygo Blaxell	874832dc58	openat2: log a warning when we fall back to openat This should occur only once per run, but it's worth leaving a note that it has happened. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-19 22:19:42 -05:00
Zygo Blaxell	5fe89d85c3	extent scan: make sure we run every extent crawler once per transaction There's a pathological case where all of the extent scan crawlers except one are at the end of a crawl cycle, but the one crawler that is still running is keeping the Task queue full. The result is that bees never starts the other extent scan crawlers, because the queue is always full at the instant a new transid triggers the start of a new scan. That's bad because it will result in bees falling behind when new data from the inactive size tiers appears. To fix this, check for throttling _after_ creating at least one scan task in each crawler. That will keep the crawlers running, and possibly allow them to claw back some space in the Task queue. It slightly overcommits the Task queue, so there will be a few more Tasks than nominally allowed. Also (re)introduce some hysteresis in the queue size limit and reduce it a little, so that bees isn't continually stopping and restarting crawls every time one task is created or completed, and so that we stay under the configured Task limit despite overcommitting. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-19 22:19:42 -05:00
Zygo Blaxell	a2b3e1e0c2	log: demote a lot of BEESLOGWARN to higher verbosity levels Toxic extent workarounds are going away because the underlying kernel bugs have been fixed. They are no longer worthy of spamming non-developer logs. INO_PATHS can return no paths if an inode has been deleted. It doesn't need a log message at all, much less one at WARN level. Dedupe failure can be INFO, the same level as dedupe itself, especially since the "NO dedupe" message doesn't mention what was [not] deduped. Inspired by Kai Krakow's "context: demote "abandoned toxic match" to debug log level". Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-19 01:08:28 -05:00
Zygo Blaxell	d4a681c8a2	Revert "roots: use a non-idle task for next_transid" next_transid tasks don't respect queue selection very well, because they effectively end up spinning in a loop until all other worker threads become busy. Back this out, and fix the priority handling in the Task library. This reverts commit `58db4071de`.	2025-01-12 18:48:33 -05:00
Zygo Blaxell	b8dd9a2db0	progress: put a timestamp in the bottom row This records the time when the progress data was calculated, to help indicate when the data might be very old. While we're here, move "now" out of the loop so there's only one value. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-11 23:39:55 -05:00
Zygo Blaxell	2f2a68be3d	roots: use openat2 instead of openat when available This increases resistance to symlink and mount attacks. Previously, bees could follow a symlink or a mount point in a directory component of a subvol or file name. Once the file is opened, the open file descriptor would be checked to see if its subvol and inode matches the expected file in the target filesystem. Files that fail to match would be immediately closed. With openat2 resolve flags, symlinks and mount points terminate path resolution in the kernel. Paths that lead through symlinks or onto mount points cannot be opened at all. Fall back to openat() if openat2() returns ENOSYS, so bees will still run on kernels before v5.6. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-09 02:26:53 -05:00
Zygo Blaxell	613ddc3c71	progress: rename "ctime" -> "tm_left" "ctime", an abbreviation of "cycle time", collides with "ctime", an abbreviation of "st_ctime", a well-known filesystem term. "tm_left" fits in the column, so use that. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-06 12:50:50 -05:00
Zygo Blaxell	c3a39b7691	progress: rework the progress table after github discussion * Report position within cycle in units that cannot be mistaken for size or percentage * Put the total/maximum values in their own row * Add a start time column * Change column titles to reference "cycles" * Use "idle" instead of "finished" when a crawler is not running * Replace "transid" with "gen" because it's shorter Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:45:37 -05:00
Zygo Blaxell	58db4071de	roots: use a non-idle task for next_transid The scanners which finish early can become stuck behind scanners that are able to keep the queue full. Switch the next_transid task to the normal Task queues so that we force scanners to restart on every new transaction, possibly deferring already queued work to do so. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:36:53 -05:00
Zygo Blaxell	1af5fcdf34	roots: don't access a shared variable after releasing a lock Access the local copy of `m_root_crawl_map` instead. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:36:53 -05:00
Zygo Blaxell	87472b6086	extent scan: don't put non-data block groups in the data extent map The total data size should not include metadata or system block groups, and already does not; however, we still have these block groups in the map for mapping the crawl pointer to a logical offset within the filesystem. Rearrange a few lines around the `if` statement so that the map doesn't contain anything it should not. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:32:48 -05:00
Zygo Blaxell	ca351d389f	extent scan: pick the right block groups for mixed-bg filesystems The progress indicator was failing on a mixed-bg filesystem because those filesystems have block groups which have both _DATA and _METADATA bits, and the filesystem size calculation was excluding block groups that have _METADATA set. It should exclude block groups that have _DATA not set. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:15:37 -05:00
Zygo Blaxell	231593bfbc	throttle: don't hold the multilock during throttle Release the lock before entering the throttle sleep, so that other threads can still run. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:15:37 -05:00
Zygo Blaxell	ea45982293	throttle: add delays to match deferred request rate to btrfs completion rate Measure the time spent running various operations that extend btrfs transaction completion times (`LOGICAL_INO`, tmpfiles, and dedupe) and arrange for each operation to run for not less than the average amount of time by adding a sleep after each operation that takes less than the average. The delay after each operation is intended to slow down the rate of deferred and long-running requests from bees to match the rate at which btrfs is actually completing them. This may help avoid big spikes in latency if btrfs has so many requests queued that it has to force a commit to release memory. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-16 23:32:18 -05:00
Zygo Blaxell	c4b31bdd5c	extent scan: no need for "No ref for extent" debug message While a snapshot is being deleted, there will be a continuous stream of "No ref for extent" messages. This is a common event that does not need to be reported. There is an analogous situation when a call to open() fails with ENOENT. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-14 15:02:39 -05:00
Zygo Blaxell	08fe145988	context: wait for btrfs send to finish, then try dedupe again Dedupe is not possible on a subvol where a btrfs send is running: BTRFS warning (device dm-22): cannot deduplicate to root 259417 while send operations are using it (1 in progress) btrfs informs a process with EAGAIN that a dedupe could not be performed due to a running send operation. It would be possible to save the crawler state at the affected point, fork a new crawler that avoids the subvol under send, and resume the crawler state after a successful dedupe is detected; however, this only helps the intersection of the set of users who have unrelated subvols that don't share extents, and the set of users who cannot simply delay dedupe until send is finished. The simplest approach is to simply stop and wait until the send goes away. The simplest approach is taken here. When a dedupe fails with EAGAIN, affected Tasks will poll, approximately once per transaction, until the dedupe succeeds or fails with a different error. bees dedupe performance corresponds with the availability of subvols that can accept dedupe requests. While the dedupe is paused, no new Tasks can be performed by the worker thread. If subvols are small and isolated from the bulk of the filesystem data, the result will be a small but partial loss of dedupe performance during the send as some worker threads get stuck on the sending subvol. If subvols heavily share extents with duplicate data in other subvols, worker threads will all become blocked, and the entire bees process will pause until at least some of the running sends terminate. During the polling for btrfs send, the dedupe Task will hold its dst file open. This open FD won't interfere with snapshot or file delete because send subvols are always read-only (it is not possible to delete a file on a RO subvol, open or otherwise) and send itself holds the affected subvol open, preventing its deletion. Once the send terminates, the dedupe will terminate soon after, and the normal FD release can occur. This pausing during btrfs send is unrelated to the `--workaround-btrfs-send` option, although `--workaround-btrfs-send` will cause the pausing to trigger less often. It applies to all scan modes. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-14 14:51:28 -05:00
Zygo Blaxell	bb09b1ab0e	roots: drop method `transid_re` There are no callers of this method any more, and it exposes more of BeesRoots than we really want things to have access to. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-13 23:19:43 -05:00
Zygo Blaxell	94d9945d04	roots: move the transid cache update into transid_max_nocache() All callers of the `transid_max_nocache` method update `m_transid_re` with the return value, so do that in `transid_max_nocache` itself. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-13 23:19:43 -05:00
Zygo Blaxell	b9abcceacb	progress: move the "finished" tag to a column where it won't obscure data The "done" pointer and the "%done" fields are still useful because they indicate _actual_ progress, not the work that has been _promised_. So it is possible for a crawl to be "finished" (all extents queued) but not "100.0000%" (some of those extents still active or in the queue). "deferred" state isn't particularly useful, so drop it. "finished" state implies no ETA, so that column is unused. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-12 23:10:15 -05:00
Zygo Blaxell	31f3a8d67d	progress: relabel the inaccurate ETA column ETA is calculated using a sample obtained by snooping on bees's normal crawling operations. This sample is heavily biased and not representative of the entire filesystem. If the distribution of extent sizes in the filesystem is not uniform, the ETA can be wildly wrong. Collecting an accurate sample set would require extra IO and CPU time which should be spent doing dedupes instead. Explicitly label the ETA as inaccurate to avoid having too many users report the same bug. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-12 23:10:15 -05:00
Zygo Blaxell	3e89fe34ed	roots: avoid copying a BtrfsIoctlSearchKey Although all the members of BtrfsExtentDataFetcher are theoretically copiable, there's no need to actually make any such copy. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-03 16:54:14 -05:00
Zygo Blaxell	c1af219246	progress: squeeze the progress table into 80 columns or less We don't need the subvol numbers since they're only interesting to developers. We don't need both max and min sizes, pick one and drop the other. Replace "16E" with "max"--it is the same number of characters, but doesn't require the user to know what 1<<64 is off the top of their head. Shorten "remain" to "todo" because sometimes those extra two columns matter. Drop the seconds field in ETA timestamps. Long scan arrival times are years away, and short scan arrival times are only updated once every 5 minutes, so the extra precision isn't useful. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:51 -05:00
Zygo Blaxell	9c183c2c22	progress: put the progress table in the stats and status files Make the progress information more accessible, without having to enable full debug log and fish it out of the stream with grep. Also increase the progress log level to INFO. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:51 -05:00
Zygo Blaxell	59f8a467c3	extent scan: fix crawl_map creation There are two crawl_maps in extent scan's next_transid: one gets initialized, the other gets used. This works OK as long as bees is resuming an existing scan, because the two maps are identical; however, it fails if bees is starting without an existing set of crawl data, and one of the two maps is empty or partially filled. The failure is intermittent, as the crawl map is being populated at the same time next_transid runs. It will eventually be completed after several transaction cycles, at which point bees runs normally. It does add significant delays during startup for benchmarks. There's only one crawl_map in extent scan, it always has the same crawlers, and extent scan's `next_transid` creates it by itself. Ignore the map from BeesRoots/BeesCrawl. Also throw in some missing but helpful trace statements. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:51 -05:00
Zygo Blaxell	9987aa8583	progress: estimate actual data sizes for progress report Replace pointers in the "done" and "total" columns with estimated data sizes for each size tier. The estimation is based on statistics collected from extents scanned during the current bees run. Move the total size for the entire filesystem up to the heading. Report the _completed_ position (i.e. the one that would be saved in `beescrawl.dat`), not the _queued_ position (i.e. the one where the next Task would be created in memory). At the end of the data, the crawl pointer ends up at some random point in the filesystem just after the newest extent, so the progress gets to 99.7% and then goes to some random value like 47% or 3%, not to 100%. Report "deferred" in the "done" column when the crawler is waiting for the next transid, and "finished" in the "%done" column when the crawler has reached the end of the data. Suppress the ETA when finished. This makes it clear that there's no further work to do for these crawlers. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:51 -05:00
Zygo Blaxell	8080abac97	extent scan: refactor BeesScanMode so derived classes decide their own scan scheduling BeesScanModeExtent uses six scan Tasks instead of one, which leads to awkwardness like the do_scan method to tell crawl_roots how to do what it shouldn't need to know how to do anyway. Move the crawl_roots logic into the ::scan methods themselves. This also deletes the very popular "crawl_more ran out of data" message. Extent scan explicitly indicates when a scan is complete, so there's no longer a need to fish this message out of the log. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:51 -05:00
Zygo Blaxell	1e139d0ccc	extent scan: put all the refs in a single Task, sort them, use idle task The sorting avoids problematic read orders, like extent refs in the same inode with descending offsets, that btrfs is not optimized for. Putting everything in one Task keeps the queue sizes small, and manages the lock contention much more calmly. We only want to be mapping extent refs if there's not enough extents already in the queue to keep worker threads busy, so use the `idle()` method instead of `run()`. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:51 -05:00
Zygo Blaxell	6542917ffa	extent scan: introduce SCAN_MODE_EXTENT The EXTENT scan mode reads the extent tree, splits it into tiers by extent size, converts each tier's extents into subvol/inode/offset refs, then runs the legacy bees dedupe engine on the refs. The extent scan mode can cheaply compute completion percentage and ETA, so do that every time a new transid is observed. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-11-30 23:30:33 -05:00
Zygo Blaxell	05bf1ebf76	counters: fix counter names for scan_eof, scan_no_fd, scanf_deferred_inode This code gets moved around from time to time and ends up with the wrong prefix. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-11-30 23:30:33 -05:00
Zygo Blaxell	a7baa565e4	crawl: rename next_transid() to avoid confusion with BeesScanMode::next_transid() Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-11-30 23:30:33 -05:00
Zygo Blaxell	717bdf5eb5	roots: make sure transid_max's computed value isn't max We check the result of transid_max_nocache(), but not the result of transid_max(). The latter is a computed result that is even more likely to be wrong[citation needed]. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-25 03:45:29 -05:00
Zygo Blaxell	d5a99c2f5e	roots: don't share a RootFetcher between threads If the send workaround is enabled, it is possible for two threads (a thread running the crawl_new task, and a thread attempting to apply the send workaround) to access the same RootFetcher object at the same time. That never ends well. Give each function its own BtrfsRootFetcher object. Fixes: https://github.com/Zygo/bees/issues/250 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-02-20 11:14:34 -05:00
Zygo Blaxell	a115587fad	roots: fix extent lock failure handling Drop the crawl_restart counter, it doesn't happen here (or anywhere else). Add the crawl_again counter for extents that are restarted due to an extent-level lock. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-01-05 01:10:17 -05:00
Zygo Blaxell	c3b664fea5	context: don't forget to retry locked extents The caller of scan_forward has to stop advancing the BeesFileCrawl position when an extent lock blocks a scan, so that it will resume from the same position when the Task is scheduled again; otherwise, bees simply skips over the extent and leaves it incompletely deduped. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-22 23:46:36 -05:00
Zygo Blaxell	bbcfd9daa6	roots: replace BEES_TRANSID_FACTOR with BEES_TRANSID_POLL_INTERVAL Restart crawl_more (and update crawl roots and flush FD caches) every time the transid changes, and only when the transid changes, but not more often than a reasonable minimum poll interval. Clean up the log message: use the proper thread name and remove the wildly inaccurate estimate of when crawl will resume. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:01 -05:00
Zygo Blaxell	d6d3e1045e	context: keep the resolve cache smaller We don't need to cache 65536 extent maps, especially if each one can have almost 700K references. Valgrind's massif tool points to the extent map cache as a very large memory allocator, but test runs with memcg disagree. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:01 -05:00
Zygo Blaxell	d5d17cbe62	roots: run insert_new_crawl from within a Task If we have loadavg targeting enabled, there may be no worker threads available to respond to new subvols, so we should not bother updating the subvols list. Put insert_new_crawl into a Task so it only executes when a worker is available. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:01 -05:00
Zygo Blaxell	7267707687	roots: disable recent sorting by max_transid On large filesystems where the min_transid of all subvols gets stuck at 0, bees may lose the ability to effectively track recent data. A secondary sort by max_transid will allow scanning newer subvols that were created after bees started running on the filesystem, but before bees completed the first scan of all subvols. On the other hand, the secondary sort does a reverse version of the sequential scan mode, and the sequential scan mode is simply awful. Disable it for now. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:01 -05:00
Zygo Blaxell	03f809bf22	roots: reimplement scan modes using virtual base and methods Split each scan mode into two distinct phases: 1. A heavy discovery phase, where we search the entire filesystem for something (new items in subvol trees in this case). 2. A light consuming phase, where we fetch extents to dedupe from places that we found in the discovery phase. Part 1 recomputes the subvol ordering every time there is a new transid. For some scan modes this computation is quite expensive, far too costly to pay for every extent, so we do it no more than once per transaction. Part 2 is run every time a worker thread hits the crawl_more Task. It simply pulls one extent from the first crawler off a sorted list, removing the crawler from the list when the crawler runs out of data. Part 1 creates a new structure and swaps it into place, while Part 2 continues to run using the previous structure. Neither of these needs to block the other, so they don't. The separate class and base pointer also make it easier to add new scan modes that are not based on subvol trees or that don't use BeesCrawl. While we're here, fix up some method visibility in BeesRoots. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:01 -05:00
Zygo Blaxell	f5c4714a28	roots: add 'recent' crawl mode for a mix of new and old data Crawl mode 3 'recent' prioritizes data from new updates to previously scanned subvols over subvols that have not been completely scanned yet. If no such new data exists, falls back to a variation of 'lockstep' scan mode. This enables us to keep up with new data as it arrives, a key weakness of all the other scan modes, and worth violating our unwritten "no new scan modes until we have extent-tree dedupe working" policy for. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	de96a38460	roots: emit "crawl finished" at the correct time The correct time is when we set the deferred bit after a tree search returns empty. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	82c2b5bafe	roots: improve thread status tracking messages Don't dereference a shared_ptr inside a thread status function. Do trace the crawl start events. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	84f91af503	context: don't let multiple worker Tasks get stuck on a single extent or inode When two Tasks attempt to lock the same extent, append the later Task to the earlier Task's post-exec work queue. This will guarantee that all Tasks which attempt to manipulate the same extent will execute sequentially, and free up threads to process other extents. Similarly, if two scanner threads operate on the same inode, any dedupe they perform will lock out other scanner threads in btrfs. Avoid this by serializing Task objects that reference the same file. This does theoretically use an unbounded amount of memory, but in practice a Task that encounters a contended extent or inode quickly stops spawning new Tasks that might increase the queue size, and all Tasks that might contend for the same lock(s) end up on a single FIFO queue. Note that the scope of inode locks is intentionally global, i.e. when an inode is locked, it locks every inode with the same number in every subvol. This avoids significant lock contention and task queue growth when the same inode with the same file extents appear in snapshots. Fixes: https://github.com/Zygo/bees/issues/158 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	31d26bcfc6	roots: organize scan workers by inode instead of extent Split crawlers into two separate Tasks: 1. a Task which locates the next inode with a new data extent. 2. a Task which scans every new extent in that inode. This simplifies some lock contention and execution ordering issues. Files are read sequentially. Workers dynamically scale up or down as needed, without creating thousands of deferred Task objects. Workers obtain inode locks for different inodes in btrfs, so they can work in parallel instead of waiting for each other. This change in behavior comes with new names for the worker Tasks: "crawl_master" is now "crawl_more", the singular Task which creates inode-scanning Tasks. "crawl_<subvol>" is now "crawl_<subvol>_<inode>". Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	7cef1133be	roots: use symbolic names for SCAN_MODEs This was done on the development branch three years ago, and has been creating annoying merge conflicts ever since. Sync up the branches so they have the same names for these. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	f98599407f	roots: rework btrfs send workaround using btrfs-tree Drop the cache since we no longer have to open a file every time we check a subvol's status. Also stop counting workaround events at the root level twice. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:50:59 -05:00
Zygo Blaxell	31b2aa3c0d	context: speed up orderly process termination Quite often bees exceeds its service timeout for termination because it is waiting for a loop embedded in a Task to finish some long-running btrfs operation. This can cause bees to be aborted by SIGKILL before it can completely flush the hash table or save crawl state. There are only two important things SIGTERM does when bees terminates: 1. Save crawl progress 2. Flush out the hash table Everything else is automatically handled by the kernel when the process is terminated by SIGKILL, so we don't have to bother doing it ourselves. This can save considerable time at shutdown since we don't have to wait for every thread to reach a point where it becomes idle, or force loops to terminate by throwing exceptions, or check a condition every time we access a pointer. Instead, we need do only the things in the list above, and then call _exit() to clean up everything else. Hash table and crawl state writeback can happen in their background threads instead of the foreground one. Separate the "stop" method for these classes into "stop_request" and "stop_wait" so that these writebacks can run at the same time. Deprecate and remove all references to the BeesHalt exception, and remove several unnecessary checks for BeesContext::stop_requested. Pause the task queue instead of cancelling it, which preserves the crawl progress state and stops new Tasks from competing for iops and CPU during writeback. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:50:58 -05:00
Zygo Blaxell	be9321cdb3	roots: correctly track crawl dirty state If there's an error while writing the crawl state, the state should remain dirty. If the crawl state is successfully written, the state is only clean if there were no changes to crawl state since the write was committed. We need to release the lock while writing the state but correctly set the dirty flag when the state is written successfully. Replace the bool with a version number counter. Track the last version successfully saved and the current version of the crawl state. The state is dirty if these counters disagree and clean if they agree. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:50:54 -05:00
Zygo Blaxell	07a4c9e8c0	roots: sprinkle on some more const Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-10-25 12:56:16 -04:00
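
Commit ea45982293 describes padding each operation so that it takes no less than the average operation time. A minimal sketch of that idea follows. The class name `ThrottleSketch`, the use of a cumulative mean over raw operation times, and the choice to return the delay instead of sleeping are all assumptions made for illustration; this is not the bees implementation.

```cpp
#include <chrono>
#include <cstddef>

// Hypothetical helper, not taken from bees: tracks the mean duration of an
// operation and reports how long to sleep after a faster-than-average one.
class ThrottleSketch {
public:
    using Duration = std::chrono::duration<double>;

    // Record one operation that took `elapsed` and return the extra delay
    // needed so that the operation plus delay lasts at least the mean
    // observed before this call. The first call never delays.
    Duration record(Duration elapsed) {
        Duration delay = Duration::zero();
        if (m_count > 0 && elapsed < m_average) {
            delay = m_average - elapsed;
        }
        // Assumption: the mean is computed from raw operation times,
        // excluding any delay that was added.
        ++m_count;
        m_average += (elapsed - m_average) / static_cast<double>(m_count);
        return delay;
    }

    Duration average() const { return m_average; }

private:
    Duration m_average = Duration::zero();
    std::size_t m_count = 0;
};
```

A caller would time the operation, pass the elapsed time to `record()`, and sleep for the returned duration (for example with `std::this_thread::sleep_for`) after releasing any locks, in line with the follow-up commit 231593bfbc.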
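
Commit ca351d389f changes the block group filter from "exclude if _METADATA is set" to "exclude if _DATA is not set". The two predicates can be compared in a small sketch. The constant and function names here are hypothetical; the bit values mirror the btrfs block group type flags (DATA = bit 0, SYSTEM = bit 1, METADATA = bit 2), and the "old" predicate is only a guess at the shape of the previous check based on the commit message.

```cpp
#include <cstdint>

// Local stand-ins for the btrfs block group type bits (names are hypothetical).
constexpr std::uint64_t BG_DATA     = 1ULL << 0;
constexpr std::uint64_t BG_SYSTEM   = 1ULL << 1;
constexpr std::uint64_t BG_METADATA = 1ULL << 2;

// Assumed shape of the old filter: keep anything without the METADATA bit.
// A mixed block group (DATA | METADATA) is wrongly rejected.
constexpr bool lacks_metadata(std::uint64_t flags) {
    return (flags & BG_METADATA) == 0;
}

// Shape of the filter described in the commit: keep anything with the DATA bit.
// A mixed block group is accepted; pure metadata and system groups are not.
constexpr bool holds_data(std::uint64_t flags) {
    return (flags & BG_DATA) != 0;
}
```

On a mixed-bg filesystem every data-bearing block group also carries the metadata bit, so a filter of the first shape would count no data at all, which is consistent with the progress indicator failure the commit describes.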
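
Commit be9321cdb3 replaces a dirty bool with a pair of version counters so that changes made while a write is in flight are not lost. The following is a minimal sketch of that pattern, not the BeesRoots code; the class and method names are hypothetical.

```cpp
#include <cstdint>
#include <mutex>

// Hypothetical illustration of dirty tracking with version counters.
class VersionedState {
public:
    // Call whenever the in-memory state changes.
    void modify() {
        std::lock_guard<std::mutex> lock(m_mutex);
        ++m_current_version;
    }

    // Capture the version about to be written. The caller then writes the
    // state to disk without holding the lock.
    std::uint64_t begin_save() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_current_version;
    }

    // Call only if the write succeeded. On a failed write, skip this call
    // and the state stays dirty.
    void finish_save(std::uint64_t written_version) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (written_version > m_saved_version) {
            m_saved_version = written_version;
        }
    }

    // Dirty when the last saved version differs from the current one.
    bool dirty() const {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_saved_version != m_current_version;
    }

private:
    mutable std::mutex m_mutex;
    std::uint64_t m_current_version = 0;
    std::uint64_t m_saved_version = 0;
};
```

With a plain bool, a successful write would clear the flag even if another thread changed the state during the write. Comparing counters keeps the state dirty in that case, because the saved version lags behind the current one.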


119 Commits