GGLinnk/bees - bees - Virtual World Git

mirror of https://github.com/Zygo/bees.git synced 2025-08-03 14:23:29 +02:00

Author	SHA1	Message	Date
Zygo Blaxell	9a9644659c	trace: clean up the formatting around top-level exception log messages Fewer newlines. More consistent application of the "TRACE:" prefix. All at the same log level. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-06-18 21:17:48 -04:00
Zygo Blaxell	51b3bcdbe4	trace: deprecate BEESLOGTRACE, align trace logs with exception notices Exceptions were logged at level NOTICE while the stack traces were logged at level DEBUG. That produced useless noise in the output with `-v5` or `-v6`, where there were exception headings logged, but no details. Fix that by placing the exceptions and traces at level DEBUG, but prefix them with `TRACE:` for easy grepping. Most of the events associated with BEESLOGTRACE either never happen, or they are harmless (e.g. trying to open deleted files or subvols). Reassign them to ordinary BEESLOGDEBUG, with one exception for unrecognized Extent flags that should be debugged if any appear. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-02-13 23:59:42 -05:00
Zygo Blaxell	3e7eb43b51	BeesStringFile: figure out when to call--or _not_ call--fsync Older kernel versions featured some bugs in btrfs `fsync`, which could leave behind "ghost dirents", orphan filename items that did not have a corresponding inode. These dirents were created during log replay during the first mount after a crash due to several different bugs in the log tree and its use over the years. The last known bug of this kind was fixed in kernel 5.16. As of this writing, no fixes for this bug have been backported to any earlier LTS kernel. Some filesystems, including btrfs, will flush the contents of a new file before renaming it over an old file. On paper, btrfs can do this very cheaply since the contents of the new file are not referenced, and the old file not dereferenced, until a tree commit which includes both actions atomically; however, in real life, btrfs provides `fsync`-like semantics and uses the log-tree infrastructure to implement them, which compromises performance and acts as a magnet for bugs. The benefit of this trade-off is that `rename` can be used as a synchronization point for data outside of the btrfs, which would not happen if everything `rename` does was simply deferred to the next tree commit. The cost of this trade-off is that for the first 8 years of its existence, bees would trigger the bug so often that the project recommended its users put $BEESHOME in its own subvol to make it easy to remove ghost dirents left behind by the bug. Some other filesystems, such as xfs, don't have any special semantics for `rename`, and require `fsync` to avoid garbage or missing data after a crash. Even filesystems which do have a special case for `rename` can be configured to turn it off. btrfs will silently delete data from files in the event that an unrecoverable data block write error occurs. 
Kernel version 6.2 adds important new and unexpected cases where this can happen on filesystems using raid56 data, but it also happens in all usable btrfs versions (the silent deletion behavior was introduced in kernel version 3.9). Unrecoverable write errors are currently reported to userspace only through `fsync`. Since the failed extents are deleted, they cannot be detected via csum failures or scrub after the fact--and it's too late by then, the data is already gone. `fsync` is the last opportunity to detect the write failure before the `rename`. If the error is not detected, the contents of the file will be silently discarded in btrfs. The impact on bees is that scans will abruptly restart from zero after a crash combined with some other reasonably common failures. Putting all of this together leads to a rather complex workaround: if the filesystem under $BEESHOME (specifically, the filesystem where BeesStringFile objects such as `beescrawl.dat` are written) is a btrfs filesystem, and the host kernel is a version prior to 5.16, then don't call `fsync` before `rename`. In all other cases, do call `fsync`, and prevent dependent writes (i.e. the following `rename`) in the event of errors. Since present kernel versions still require `fsync`, we don't need an upper bound on the kernel version check until someone fixes btrfs `rename` (or perhaps adds a flag to `renameat2` which prevents use of the log tree) in the kernel. Once that fix happens, we can drop the `fsync` call for kernels after that fixed version. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-02-10 21:04:20 -05:00
Zygo Blaxell	88b1e4ca6e	main: unconditionally enable workaround for the logical_ino-vs-clone kernel bug This obviously doesn't fix or prevent the kernel bug, but it does prevent bees from triggering the bug without assistance from another application. The bug can still be triggered by running bees at the same time as an application which uses clone or LOGICAL_INO. `btdu` uses LOGICAL_INO, while `cp` from coreutils (and many others) use clone (reflink copy). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-02-06 23:14:16 -05:00
Zygo Blaxell	440740201a	main: the base directory for `--strip-paths` should be root_fd, not cwd The cwd is where core dumps and various profiling and verification libraries want to write their data, whereas root_fd is the root of the target filesystem. These are often intentionally different. When they are different, `--strip-paths` sets the wrong prefix to strip from paths. Once the root fd has been established, we can set the path prefix to the string prefix that we'll get from future calls to `name_fd`. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-02-06 22:42:15 -05:00
Zygo Blaxell	30cd375d03	readahead: clean up the code, update docs Remove dubious comments and #if 0 section. Document new event counters, and add one for read failures. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-02-06 22:42:15 -05:00
Zygo Blaxell	a2b3e1e0c2	log: demote a lot of BEESLOGWARN to higher verbosity levels Toxic extent workarounds are going away because the underlying kernel bugs have been fixed. They are no longer worthy of spamming non-developer logs. INO_PATHS can return no paths if an inode has been deleted. It doesn't need a log message at all, much less one at WARN level. Dedupe failure can be INFO, the same level as dedupe itself, especially since the "NO dedupe" message doesn't mention what was [not] deduped. Inspired by Kai Krakow's "context: demote "abandoned toxic match" to debug log level". Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-19 01:08:28 -05:00
Zygo Blaxell	1f0b8c623c	options: improve message when too many--or too few--path arguments given Running bees with no arguments complains about "Only one" path argument. Replace this with "Exactly one" which uses similar terminology to other btrfs tools. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:15:37 -05:00
Zygo Blaxell	74296c644a	options: return EXIT_SUCCESS after displaying help message `getopt_long` already supplies a message when an option cannot be parsed, so there isn't a need to distinguish option parse failures from help requests. Fixes: https://github.com/Zygo/bees/pull/277 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:15:37 -05:00
Zygo Blaxell	81bbf7e1d4	throttle: set default to 0.0 Longer latency testing runs are not showing a consistent gain from a throttle factor of 1.0. Make the default more conservative. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:15:37 -05:00
Zygo Blaxell	2a1ed0b455	throttle: track time values more closely Decaying averages by 10% every 5 minutes gives roughly a half-hour half-life to the rolling average. Speed that up to once per minute. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:14:31 -05:00
Zygo Blaxell	d160edc15a	throttle: add --throttle-factor option to control throttling factor Also change the initializer syntax for the option list to use C99 compound literals. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2025-01-03 23:13:51 -05:00
Zygo Blaxell	e79b242ce2	options: clean up the parser, prepare for new options with no short form We're not adding any more short options, but the debugging code doesn't work with optvals above 255. Also clean up constness and variable lifetimes. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-16 23:32:18 -05:00
Zygo Blaxell	ea45982293	throttle: add delays to match deferred request rate to btrfs completion rate Measure the time spent running various operations that extend btrfs transaction completion times (`LOGICAL_INO`, tmpfiles, and dedupe) and arrange for each operation to run for not less than the average amount of time by adding a sleep after each operation that takes less than the average. The delay after each operation is intended to slow down the rate of deferred and long-running requests from bees to match the rate at which btrfs is actually completing them. This may help avoid big spikes in latency if btrfs has so many requests queued that it has to force a commit to release memory. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-16 23:32:18 -05:00
Zygo Blaxell	0580c10082	main: add support for pause (SIGUSR1) and resume (SIGUSR2) These are simple on/off switches for the task queue. They are lightweight requests for bees to be paused temporarily, but allow bees to release open files and save progress while paused. These signals are an alternative to SIGSTOP and SIGCONT, or using the cgroup freezer's FROZEN and THAWED states, which pause and resume the bees process, but do not allow the bees process to release open files or save progress. Snapshot and file deletes can occur on the filesystem while bees is paused by SIGUSR1 but not by SIGSTOP. These signals are also an alternative to SIGTERM and restart, which flush out the whole hash table and progress state on exit, and read the whole table back into memory on restart. This feature is experimental and may be replaced by a more general configuration or runtime control mechanism in the future. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-12 23:01:19 -05:00
Zygo Blaxell	e40339856f	readahead: use the right parameter order when checking the range In some cases the offset and size arguments were flipped when checking to see if a range had already been read. This would have been OK as long as the same mistake had been made consistently, since `bees_readahead_check` only does a cache lookup on the parameters, it doesn't try to use them to read a file. Alas, there was one case where the correct order was used, albeit a relatively rare one. Fix all the calls to use the correct order. Also fix a comment: the recent request cache is global to all threads. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-04 11:17:44 -05:00
Zygo Blaxell	43d38ca536	extent scan: don't serialize dedupe and LOGICAL_INO when using extent scan mode The serialization doesn't seem to be necessary for the extent scan mode. No infinite loops in the kernel have been observed in the past two years, despite never having used MultiLock for the extent scanner. Leave the serialization for now on the subvol scanners. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:52 -05:00
Zygo Blaxell	8d4d153d1d	main: set default scan mode to mode 4 (EXTENT) Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-12-01 00:17:51 -05:00
Zygo Blaxell	e78e05e212	readahead: inject more sanity at the foundation of an insane architecture This solves a third bad problem with bees reads: 3. The architecture above the read operations will issue read requests for the same physical blocks over and over in a short period of time. Fixing that properly requires rewriting the upper-level code, but a simple small table of recent read requests can reduce the effect of the problem by orders of magnitude. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-11-30 23:30:33 -05:00
Zygo Blaxell	8d08a3c06f	readahead: inject some sanity at the foundation of an insane architecture This solves some of the worst problems with bees reads: 1. The kernel readahead doesn't work. More precisely, it's much better adapted for a very different use case: a single thread alternating between reading a file sequentially and processing the data that was read. bees has multiple threads which compete for access to IO and then issue reads in random order immediately after the call to readahead. The kernel uses idle ioprio scheduling for the readaheads, so the readaheads get preempted by the random reads, or the kernel cancels the readaheads because the data access pattern isn't sequential after the readahead was issued. 2. Seeking drives perform terribly with multiple competing readers, especially with btrfs striped profiles where the iops are broken into tiny stripe-sized pieces. At one point I intended to read the btrfs device map and figure out which devices can be read in parallel, but to make that useful, the user needs to have an array with multiple drives in single profile, or 4+ drives in raid1 profile. In all other cases, the elaborate calculations always return the same result: there can be only one reader at a time. This commit fixes both problems: 1. Don't use the kernel readahead. Use normal reads into a dummy buffer instead. 2. Allow only one thread to readahead at any time. Once the read is completed, the data is in the page cache, and all the random-order small reads that bees does will hit the page cache, not a spinning disk. In some cases we need to read two things close together, so add a `bees_readahead_pair` which holds one lock across both reads. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2024-11-30 23:30:33 -05:00
Zygo Blaxell	d27621b779	main: catch exceptions and exit gracefully Calling 'bees -m4' should not call 'std::terminate()', but it does. Use catch_all instead. It will still pass the exit value to return from main. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-01-05 01:10:17 -05:00
Zygo Blaxell	c327e0bb10	readahead: report the original size in BEESTOOLONG BEESTOOLONG was always reporting a size of zero, and the offset of the end of the readahead region. Report the original size instead (and also in BEESTRACE and BEESNOTE). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2023-01-05 01:10:17 -05:00
Zygo Blaxell	e13c62084b	roots: use scan mode 'independent' by default Independent subvol scanners fairly consistently outperform either of the correlated scan modes. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	7cef1133be	roots: use symbolic names for SCAN_MODEs This was done on the development branch three years ago, and has been creating annoying merge conflicts ever since. Sync up the branches so they have the same names for these. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:51:00 -05:00
Zygo Blaxell	d345ea2b78	readahead: use emulation It seems that readahead() does not work on btrfs, or at least it has no discernable effect. Enable the workaround instead. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:50:55 -05:00
Zygo Blaxell	cc87125e41	bees: drop bees_sync, we will not need it bees_sync() was an exception-trapping wrapper around fsync() which is not needed in any of the contexts from which it was called: 1. dedupe operations implicitly flush the src data, so there is no need to call fsync() to do that twice. 2. crawl position is written to a temporary file and renamed over the original, which always forces a flush when the original exists. On the first write, where there is no original, a crash would result in starting over with an empty or hole-filled beescrawl file, which is the initial state of bees. There is also a long history of kernel bugs triggered by fsync() in this case. 3. we use unreadahead to trigger writeback for flushing the hash table to persistent storage. Here is a space where we might use fsync after all, as part of bees_unreadahead's emulation of POSIX_FADV_DONTNEED, but we need to get read-once behavior from the scanner before we can use this capability. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2022-12-20 20:50:54 -05:00
Zygo Blaxell	ba694b4881	hash: move the random generator out of bees-hash.cc We need random numbers in more places, so centralize the engines. Initialize with a proper random seed so every worker thread gets different behavior. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-11-29 21:27:48 -05:00
Zygo Blaxell	5e379b4c48	readahead: update comments to reflect bakeoff results It turns out that readahead() alone is fastest. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-11-29 21:27:48 -05:00
Zygo Blaxell	a353d8cc6e	hash: use POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED The hash table is one of the few cases in bees where a non-trivial amount of page cache memory will be used in a predictable way, so we can advise the kernel about our IO demands in advance. Use WILLNEED to prefetch hash table pages at startup. Use DONTNEED to trigger writeback on hash table pages at shutdown. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-10-04 20:41:09 -04:00
Zygo Blaxell	97d70ef4c5	bees: readahead() in the kernel is posix_fadvise(..., POSIX_FADV_WILLNEED) In theory, we don't need the pread() loop, because the kernel will do a better job with readahead(). In practice, we might still need the pread() code, as the readahead will occur at idle IO priority, which could adversely affect bees performance. More testing is required. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-10-04 20:21:01 -04:00
Kai Krakow	081a6af278	bees: Avoid unused result with -Werror=unused-result Fixes: commit `20b8f8ae0b` ("bees: use helper function for readahead") Signed-off-by: Kai Krakow <kai@kaishome.de>	2021-06-19 10:35:28 +02:00
Zygo Blaxell	03532effed	trace: move BeesTrace and BeesNote into their own translation unit This allows these components to be used by test executables without pulling in all of bees, and more rapidly iterate their code. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	1fd26a03b2	tracer: annotate both ends of the stack trace Add a matching "--- BEGIN TRACE..." line to complement the "--- END TRACE..." line. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	7008c74113	bees: trace and log improvements during roots and context startup Currently if crawl throws an exception, we don't have basic information about what was being crawled or even if the crawler was running at all. These traces also help identify the causes of early exception failures. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	20b8f8ae0b	bees: use helper function for readahead There seem to be multiple ways to do readahead in Linux, and only some of them work. Hopefully reading the actual data is one of them. This is an attempt to avoid page-by-page reads in the generic dedupe code. We load both extents into the VFS cache (read sequentially) and hope they are still there by the time we call dedupe on them. We also call readahead(2) and hopefully that either helps or does nothing. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	0bbaddd54c	docs: finally concede that the consensus spelling is "dedupe" Change documentation and comments to use the word "dedupe," not "dedup" as found in circa-3.15 kernel sources. No changes in code or program output--if they used "dedup" before, they will continue to be spelled "dedup" now. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:49:15 -04:00
Zygo Blaxell	fbd1091052	options: remove default 8 CPU thread limit Higher CPU core counts became more common, and kernel bugs became less common, since the arbitrary 8-thread limit was introduced. We can remove the limit now, and treat any remaining scaling inefficiency as a bug to be removed. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:49:15 -04:00
Zygo Blaxell	10af3f9763	bees: remove si_addr_lsb from siginfo debug message to fix FTBFS Apparently it is missing in newer Linux headers, making builds fail. We don't need it, so remove it. Closes: #160 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-21 19:26:22 -05:00
Zygo Blaxell	d1f1c386bc	tempfile: remove size limit in realign() Now that tempfiles are using pool checkin functions to control their size, we don't need a size limit in realign(). We keep the limit in make_copy because it's a sanity check against letting a multi-terabyte copy operation slip through. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	6705cd9c26	context: move TempFile from TLS to Pool and fix some FdCache issues Get rid of the thread-local TempFiles and use Pool instead. This eliminates a potential FD leak when the loadavg governor repeatedly creates and destroys threads. With the old per-thread TempFiles, we were guaranteed to have exclusive ownership of the TempFile object within the current thread. Pool is somewhat stricter: it only guarantees ownership while the checked-out Handle exists. Adjust the users of TempFile objects to ensure they hold the Handle object until they are finished using the TempFile. It appears that maintaining large, heavily-reflinked, long-lived temporary files costs more than truncating after every use: btrfs has to write multiple references to the temporary file's extents, then some commits later, remove references as the temporary file is deleted or truncated. Using the temporary file in a dedupe operation flushes the data to disk, so nothing is saved by pretending that there is writeback pipelining and trying to avoid flushes in truncate. Pool provides usage tracking and a checkin callback, so use it to truncate the temporary file immediately after every use. Redesign TempFile so that every instance creates exactly one Fd which persists over the lifetime of the TempFile object. Provide a reset() method which resets the file back to the initial state and call it from the Pool checkin callback. This makes TempFile's lifetime equivalent to its Fd's lifetime, which simplifies interactions with FdCache and Roots. This change means we can now blacklist temporary files without having an effective memory leak, so do that. We also have a reason to ever remove something from the blacklist, so add a method for that too. In order to move to extent-centric addressing, we need to be able to reliably open temporary files by root and inode number. 
Previously we would place TempFile fd's into the cache with insert_root_ino, but the cache would be cleared periodically, and it would not be possible to reopen temporary files after that happened. Now that the TempFile's lifetime is the same as the TempFile Fd's lifetime, we can have TempFile manage a separate FileId -> Fd map in Roots which is unaffected by the periodic cache clearing. BeesRoots::open_root_ino_nocache will check this map before attempting to open the file via btrfs root+ino lookup, and return it through the cache as if Roots had opened the file via btrfs. Hold a reference to BeesRoots in BeesTempFile because the usual way to get such a reference now throws an exception in BeesTempFile's destructor. These changes make method BeesTempFile::create() and all methods named insert_root_ino unnecessary, so delete them. We construct and destroy TempFiles much less often now, so make their constructor and destructor more informative. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	1e7dbc6f97	tempfile: remove old comments about fsync and deadlock bugs I was never able to prove a connection between fsync() and deadlock bugs. There were too many deadlock bugs to be able to isolate a bug that is triggered specifically by fsync. Update the comment (which has been unchanged since kernel 4.14). We still may want to do fsync() on temporary files someday, but there's a full internal API rewrite between here and there. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	e654e29f45	bees: move usage message out of source file and fix a few inaccuracies It's a pain to read, edit, and format large blocks of text in C++ code, so rip the usage message out of bees.cc and put it in a plain text file. Use a minimal translator to convert it into a C string. While we're here, remove the multiple roots feature from the command line synopsis, as we don't really support it any more. Also clarify that "id 5" is "subvol id 5", and describe in one sentence what workaround-btrfs-send does. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	17d8759011	bees: make it build with clang Remove unused "addr check" functions. We have ranged_cast for detecting overflow bits. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	15ab981d9e	bees: replace uncaught_exception(), deprecated in C++17 uncaught_exception() had only the one valid use case, and it can be reimplemented by literally calling current_exception() instead. current_exception() has several valid use cases, so it is not likely to be deprecated any time soon. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-10-09 12:07:10 -04:00
Zygo Blaxell	05bd65444d	bees: initialize context in the correct order We cannot use BeesContext::roots() until after BeesContext::set_root_path() has been called. Save up the parameter settings until then. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-08-31 22:35:17 -04:00
Zygo Blaxell	4363463342	process: Fix gettid() ambiguity with glibc >= 2.30 In version 2.30 glibc added its own gettid() function. This resulted in "error: call of overloaded ‘gettid()’ is ambiguous" because gettid() now exists in both namespace crucible and std. For now, use explicit references to namespace crucible. This continues to work with new and old libc without having to test specific library versions. At some point, glibc gettid() will be deployed widely enough that we can remove the crucible version entirely. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2019-10-30 00:12:33 -04:00
Zygo Blaxell	2c3d1822f7	bees: don't try to print si_lower and si_upper Some build environments (ARM? AARCH64?) do not have the fields si_lower and si_upper in siginfo. bees doesn't need them, so don't try to access them. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2019-06-12 22:48:05 -04:00
Zygo Blaxell	be2c55119e	bees: make exceptions less prominent in log output Introduce a mechanism to suppress exceptions which do not produce a full stack trace for common known cases where a loop should be aborted. Use this mechanism to suppress the infamous "FIXME" exception. Reduce the log level to at most NOTICE, and in some cases DEBUG. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2019-01-06 01:48:35 -05:00
Zygo Blaxell	570b3f7de0	bees: handle SIGTERM and SIGINT, force immediate flush and exit Capture SIGINT and SIGTERM and shut down, preserving current completed crawl and hash table state. * Executing tasks are completed, queued tasks are paused. * Crawl state is saved. * The crawl master and crawl writeback threads are terminated. * The task queue is flushed. * Dirty hash table extents are flushed. * Hash prefetch and writeback threads are terminated. * Hash table is deallocated. * FD caches and tmpfiles are destroyed. * Assuming the above didn't crash or deadlock, bees exits. The above order isn't the fastest, but it does roughly follow the shared_ptr dependencies and avoids data races--especially those that might lead to bees reporting an extent scanned when it was only queued for future scanning that did not occur. In case of a violation of expected shared_ptr dependency order, exceptions in BeesContext child object accessor methods (i.e. roots(), hash_table(), etc) prevent any further progress in threads that somehow remain unexpectedly active. Move some threads from main into BeesContext so they can be stopped via BeesContext. The main thread now runs a loop waiting for signals. A slow FD leak was discovered in TempFile handling. This has not been fixed yet, but an implementation detail of the C++ runtime library makes the leak so slow it may never be important enough to fix. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-12-09 23:39:44 -05:00
Zygo Blaxell	389dd52cc1	tempfile: drop the fsync() The deadlock seems to be fixed now (if there ever was one--there certainly were deadlocks, but matching deadlocks to root causes is non-trivial and a number of distinct deadlock cases have been fixed in recent years). The benchmark data is inconclusive about whether it is better to fsync or not to fsync. A paranoia option might be useful here. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-12-09 01:00:36 -05:00
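
The `fsync`-before-`rename` rule described in commit 3e7eb43b51 above (skip `fsync` only when the target is btrfs and the kernel is older than 5.16, otherwise call `fsync` and block the `rename` on error) can be sketched roughly as follows. This is a minimal illustration, not the code from bees: the names `kernel_older_than`, `should_fsync_before_rename` and `write_then_rename` are hypothetical, the kernel version is assumed to be parseable from the `uname` release string, and `EINTR` handling is omitted.

```cpp
#include <cstdio>
#include <fcntl.h>
#include <string>
#include <sys/utsname.h>
#include <sys/vfs.h>
#include <unistd.h>

// Hypothetical helper: true when a kernel release string such as
// "5.15.0-91-generic" is older than major.minor.
bool kernel_older_than(const std::string &release, int major, int minor) {
    int have_major = 0, have_minor = 0;
    if (std::sscanf(release.c_str(), "%d.%d", &have_major, &have_minor) != 2) {
        return false;  // unparseable: treat as a current kernel, i.e. do fsync
    }
    return have_major < major || (have_major == major && have_minor < minor);
}

// The rule from the commit message: skip fsync only on btrfs before 5.16.
bool should_fsync_before_rename(bool is_btrfs, const std::string &release) {
    return !(is_btrfs && kernel_older_than(release, 5, 16));
}

// Hypothetical helper: write data to tmp_path, then rename it over final_path.
// Any write, fsync, or close error prevents the dependent rename.
bool write_then_rename(const std::string &tmp_path,
                       const std::string &final_path,
                       const std::string &data) {
    const int fd = open(tmp_path.c_str(),
                        O_WRONLY | O_CREAT | O_TRUNC | O_CLOEXEC, 0600);
    if (fd < 0) return false;

    struct statfs sfs;
    struct utsname uts;
    const unsigned long btrfs_magic = 0x9123683EUL;  // BTRFS_SUPER_MAGIC
    const bool is_btrfs = fstatfs(fd, &sfs) == 0 &&
        static_cast<unsigned long>(sfs.f_type) == btrfs_magic;
    const std::string release = uname(&uts) == 0 ? uts.release : "";

    bool ok = true;
    size_t done = 0;
    while (ok && done < data.size()) {
        const ssize_t rv = write(fd, data.data() + done, data.size() - done);
        if (rv <= 0) ok = false; else done += static_cast<size_t>(rv);
    }
    if (ok && should_fsync_before_rename(is_btrfs, release)) {
        ok = fsync(fd) == 0;  // last chance to see a write error
    }
    ok = (close(fd) == 0) && ok;
    if (ok) ok = rename(tmp_path.c_str(), final_path.c_str()) == 0;
    if (!ok) unlink(tmp_path.c_str());  // leave the old file untouched
    return ok;
}
```

Keeping the decision in a pure function of the filesystem type and release string makes it easy to test separately from the I/O. No upper bound on the kernel version is checked, matching the commit's note that present kernels still require `fsync`.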


95 Commits
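
Commit 2a1ed0b455 in the list above says that decaying an average by 10% every 5 minutes gives roughly a half-hour half-life. That arithmetic can be checked with a short sketch. The function name `decay_half_life_minutes` is hypothetical and does not come from bees, and the one-minute case assumes the 10% decay factor stays the same, which the commit message does not state.

```cpp
#include <cmath>

// Half-life, in minutes, of a value that is multiplied by `keep`
// (e.g. 0.9 for a 10% decay) once every `interval_minutes`.
// Solves keep^(t / interval) = 0.5 for t.
double decay_half_life_minutes(double keep, double interval_minutes) {
    return interval_minutes * std::log(0.5) / std::log(keep);
}
```

`decay_half_life_minutes(0.9, 5.0)` evaluates to about 32.9 minutes, consistent with the "roughly a half-hour" figure. With the same factor applied once per minute, `decay_half_life_minutes(0.9, 1.0)` is about 6.6 minutes.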