GGLinnk/bees - bees - Virtual World Git

mirror of https://github.com/Zygo/bees.git synced 2025-12-26 13:30:20 +01:00

Author	SHA1	Message	Date
Zygo Blaxell	a353d8cc6e	hash: use POSIX_FADV_WILLNEED and POSIX_FADV_DONTNEED The hash table is one of the few cases in bees where a non-trivial amount of page cache memory will be used in a predictable way, so we can advise the kernel about our IO demands in advance. Use WILLNEED to prefetch hash table pages at startup. Use DONTNEED to trigger writeback on hash table pages at shutdown. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-10-04 20:41:09 -04:00
Zygo Blaxell	97d70ef4c5	bees: readahead() in the kernel is posix_fadvise(..., POSIX_FADV_WILLNEED) In theory, we don't need the pread() loop, because the kernel will do a better job with readahead(). In practice, we might still need the pread() code, as the readahead will occur at idle IO priority, which could adversely affect bees performance. More testing is required. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-10-04 20:21:01 -04:00
Kai Krakow	081a6af278	bees: Avoid unused result with -Werror=unused-result Fixes: commit `20b8f8ae0b` ("bees: use helper function for readahead") Signed-off-by: Kai Krakow <kai@kaishome.de>	2021-06-19 10:35:28 +02:00
Zygo Blaxell	03532effed	trace: move BeesTrace and BeesNote into their own translation unit This allows these components to be used by test executables without pulling in all of bees, and more rapidly iterate their code. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	1fd26a03b2	tracer: annotate both ends of the stack trace Add a matching "--- BEGIN TRACE..." line to complement the "--- END TRACE..." line. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	7008c74113	bees: trace and log improvements during roots and context startup Currently if crawl throws an exception, we don't have basic information about what was being crawled or even if the crawler was running at all. These traces also help identify the causes of early exception failures. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	20b8f8ae0b	bees: use helper function for readahead There seem to be multiple ways to do readahead in Linux, and only some of them work. Hopefully reading the actual data is one of them. This is an attempt to avoid page-by-page reads in the generic dedupe code. We load both extents into the VFS cache (read sequentially) and hope they are still there by the time we call dedupe on them. We also call readahead(2) and hopefully that either helps or does nothing. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:56:54 -04:00
Zygo Blaxell	0bbaddd54c	docs: finally concede that the consensus spelling is "dedupe" Change documentation and comments to use the word "dedupe," not "dedup" as found in circa-3.15 kernel sources. No changes in code or program output--if they used "dedup" before, they will continue to be spelled "dedup" now. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:49:15 -04:00
Zygo Blaxell	fbd1091052	options: remove default 8 CPU thread limit Higher CPU core counts became more common, and kernel bugs became less common, since the arbitrary 8-thread limit was introduced. We can remove the limit now, and treat any remaining scaling inefficiency as a bug to be removed. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2021-06-11 20:49:15 -04:00
Zygo Blaxell	10af3f9763	bees: remove si_addr_lsb from siginfo debug message to fix FTBFS Apparently it is missing in newer Linux headers, making builds fail. We don't need it, so remove it. Closes: #160 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-21 19:26:22 -05:00
Zygo Blaxell	d1f1c386bc	tempfile: remove size limit in realign() Now that tempfiles are using pool checkin functions to control their size, we don't need a size limit in realign(). We keep the limit in make_copy because it's a sanity check against letting a multi-terabyte copy operation slip through. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	6705cd9c26	context: move TempFile from TLS to Pool and fix some FdCache issues Get rid of the thread-local TempFiles and use Pool instead. This eliminates a potential FD leak when the loadavg governor repeatedly creates and destroys threads. With the old per-thread TempFiles, we were guaranteed to have exclusive ownership of the TempFile object within the current thread. Pool is somewhat stricter: it only guarantees ownership while the checked-out Handle exists. Adjust the users of TempFile objects to ensure they hold the Handle object until they are finished using the TempFile. It appears that maintaining large, heavily-reflinked, long-lived temporary files costs more than truncating after every use: btrfs has to write multiple references to the temporary file's extents, then some commits later, remove references as the temporary file is deleted or truncated. Using the temporary file in a dedupe operation flushes the data to disk, so nothing is saved by pretending that there is writeback pipelining and trying to avoid flushes in truncate. Pool provides usage tracking and a checkin callback, so use it to truncate the temporary file immediately after every use. Redesign TempFile so that every instance creates exactly one Fd which persists over the lifetime of the TempFile object. Provide a reset() method which resets the file back to the initial state and call it from the Pool checkin callback. This makes TempFile's lifetime equivalent to its Fd's lifetime, which simplifies interactions with FdCache and Roots. This change means we can now blacklist temporary files without having an effective memory leak, so do that. We also have a reason to ever remove something from the blacklist, so add a method for that too. In order to move to extent-centric addressing, we need to be able to reliably open temporary files by root and inode number. Previously we would place TempFile fd's into the cache with insert_root_ino, but the cache would be cleared periodically, and it would not be possible to reopen temporary files after that happened. Now that the TempFile's lifetime is the same as the TempFile Fd's lifetime, we can have TempFile manage a separate FileId -> Fd map in Roots which is unaffected by the periodic cache clearing. BeesRoots::open_root_ino_nocache will check this map before attempting to open the file via btrfs root+ino lookup, and return it through the cache as if Roots had opened the file via btrfs. Hold a reference to BeesRoots in BeesTempFile because the usual way to get such a reference now throws an exception in BeesTempFile's destructor. These changes make method BeesTempFile::create() and all methods named insert_root_ino unnecessary, so delete them. We construct and destroy TempFiles much less often now, so make their constructor and destructor more informative. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	1e7dbc6f97	tempfile: remove old comments about fsync and deadlock bugs I was never able to prove a connection between fsync() and deadlock bugs. There were too many deadlock bugs to be able to isolate a bug that is triggered specifically by fsync. Update the comment (which has been unchanged since kernel 4.14). We still may want to do fsync() on temporary files someday, but there's a full internal API rewrite between here and there. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	e654e29f45	bees: move usage message out of source file and fix a few inaccuracies It's a pain to read, edit, and format large blocks of text in C++ code, so rip the usage message out of bees.cc and put it in a plain text file. Use a minimal translator to convert it into a C string. While we're here, remove the multiple roots feature from the command line synopsis, as we don't really support it any more. Also clarify that "id 5" is "subvol id 5", and describe in one sentence what workaround-btrfs-send does. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	17d8759011	bees: make it build with clang Remove unused "addr check" functions. We have ranged_cast for detecting overflow bits. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-12-17 17:54:51 -05:00
Zygo Blaxell	15ab981d9e	bees: replace uncaught_exception(), deprecated in C++17 uncaught_exception() had only the one valid use case, and it can be reimplemented by literally calling current_exception() instead. current_exception() has several valid use cases, so it is not likely to be deprecated any time soon. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-10-09 12:07:10 -04:00
Zygo Blaxell	05bd65444d	bees: initialize context in the correct order We cannot use BeesContext::roots() until after BeesContext::set_root_path() has been called. Save up the parameter settings until then. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2020-08-31 22:35:17 -04:00
Zygo Blaxell	4363463342	process: Fix gettid() ambiguity with glibc >= 2.30 In version 2.30 glibc added its own gettid() function. This resulted in "error: call of overloaded ‘gettid()’ is ambiguous" because gettid() now exists in both namespace crucible and std. For now, use explicit references to namespace crucible. This continues to work with new and old libc without having to test specific library versions. At some point, glibc gettid() will be deployed widely enough that we can remove the crucible version entirely. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2019-10-04 00:12:33 -04:00
Zygo Blaxell	2c3d1822f7	bees: don't try to print si_lower and si_upper Some build environments (ARM? AARCH64?) do not have the fields si_lower and si_upper in siginfo. bees doesn't need them, so don't try to access them. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2019-06-12 22:48:05 -04:00
Zygo Blaxell	be2c55119e	bees: make exceptions less prominent in log output Introduce a mechanism to suppress exceptions which do not produce a full stack trace for common known cases where a loop should be aborted. Use this mechanism to suppress the infamous "FIXME" exception. Reduce the log level to at most NOTICE, and in some cases DEBUG. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2019-01-06 01:48:35 -05:00
Zygo Blaxell	570b3f7de0	bees: handle SIGTERM and SIGINT, force immediate flush and exit Capture SIGINT and SIGTERM and shut down, preserving current completed crawl and hash table state. * Executing tasks are completed, queued tasks are paused. * Crawl state is saved. * The crawl master and crawl writeback threads are terminated. * The task queue is flushed. * Dirty hash table extents are flushed. * Hash prefetch and writeback threads are terminated. * Hash table is deallocated. * FD caches and tmpfiles are destroyed. * Assuming the above didn't crash or deadlock, bees exits. The above order isn't the fastest, but it does roughly follow the shared_ptr dependencies and avoids data races--especially those that might lead to bees reporting an extent scanned when it was only queued for future scanning that did not occur. In case of a violation of expected shared_ptr dependency order, exceptions in BeesContext child object accessor methods (i.e. roots(), hash_table(), etc) prevent any further progress in threads that somehow remain unexpectedly active. Move some threads from main into BeesContext so they can be stopped via BeesContext. The main thread now runs a loop waiting for signals. A slow FD leak was discovered in TempFile handling. This has not been fixed yet, but an implementation detail of the C++ runtime library makes the leak so slow it may never be important enough to fix. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-12-09 23:39:44 -05:00
Zygo Blaxell	389dd52cc1	tempfile: drop the fsync() The deadlock seems to be fixed now (if there ever was one--there certainly were deadlocks, but matching deadlocks to root causes is non-trivial and a number of distinct deadlock cases have been fixed in recent years). The benchmark data is inconclusive about whether it is better to fsync or not to fsync. A paranoia option might be useful here. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-12-09 01:00:36 -05:00
Zygo Blaxell	cdca2bcdcd	main: single BeesContext instance per process After weeks of testing I copied part of a change to main without copying the rest of the change, leading to an immediate segfault on startup. So here is the rest of the change: limit the number of BeesContexts per process to 1. This change was discussed at https://github.com/Zygo/bees/issues/54#issuecomment-360332529 but there are more reasons to do it now: the candidates to replace the current hash table format are less forgiving of sharing hash tables, and it may even become necessary to have more than one hash table per BeesContext instance (e.g. to keep datasum and nodatasum data separate). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-11-22 20:40:30 -05:00
Zygo Blaxell	34b04f4255	bees: soft-limit computed thread counts to 8 https://github.com/Zygo/bees/issues/91 describes problems encountered when running bees on systems with many CPU cores. Limit the computed number of threads (using --thread-factor or the default) to a maximum of 8 (i.e. the number of logical cores in a modern laptop). Users can override the limit by using --thread-count. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-11-21 21:49:16 -05:00
Zygo Blaxell	23f3e4ec42	workarounds: add workaround for btrfs send Introduce --workaround options which trade performance or effectiveness to avoid triggering kernel bugs. The first such option is --workaround-btrfs-send, which avoids making any modification to read-only subvols to avoid btrfs send bugs. Clean up usage message: no tabs for formatting, split options into sections by theme. Make scan mode a non-static data member like all (most?) other options. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-11-21 21:49:16 -05:00
Zygo Blaxell	e3247d3471	stats: streamline add_count Perf was blaming BeesStats::add_count for >1% of instructions. Trim the instruction count a little. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-11-08 23:31:50 -05:00
Zygo Blaxell	9dbe2d6fee	bees: add -G/--thread-min option for minimum thread count The -g option limits the number of worker threads when the target load average is exceeded. On some systems the load normally runs high, and continuous bees operation is required to avoid running out of disk space. Add a -G/--thread-min option to force at least some threads to continue running. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:07 -04:00
Zygo Blaxell	e66086516f	bees: dynamic thread pool size based on system load average Add -g / --loadavg-target parameter to track system load and add or remove bees worker threads dynamically to keep system load close to the loadavg target. Thread count may vary from zero to the maximum specified by -c or -C, and is adjusted every 5 seconds. This is better than implementing a similar load average scheme from outside of the process (though that is still possible) because the in-process load tracker does not disrupt the performance timing feedback mechanisms as a freezer cgroup or SIGSTOP would when controlling bees from outside. The internal load average tracker can also adjust the number of active threads while an external tracker can only choose from the maximum or zero. Also fix a bug where a Task could deadlock waiting for itself to exit if it tries to insert a new Task after the number of worker threads has been set to zero. Also correct usage message for --scan-mode (values are 0..2) since we are touching adjacent lines anyway. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:03 -04:00
Zygo Blaxell	041ad717a5	bees: configurable log verbosity Log messages were already labelled with log levels, but there was no way to filter by log level at run time. Implement the filter inside the bees process so it can skip evaluation of the BEESLOG* arguments if the log messages would not be emitted. Fixes: https://github.com/Zygo/bees/issues/67 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:00 -04:00
Zygo Blaxell	26039cd559	tempfile: update comments around bees_sync Deadlock reproduced on kernel 4.14.34. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-05-18 00:16:04 -04:00
Kai Krakow	408b6ae138	Code style: Fix wrong indentation This had spaces instead of tabs by accident. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-01-29 21:37:40 -05:00
Kai Krakow	5590fc0b13	Cmdline: Fix text alignment Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-01-29 21:37:40 -05:00
Kai Krakow	29d40ca359	Cmdline: Rename "relative-paths" to "strip-paths" The previous name didn't match what this option really does. Affects: #41 Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-01-29 21:37:40 -05:00
Kai Krakow	b164717a25	Cmdline: Rename "notimestamps" to "no-timestamps" That aligns better with the other options. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-01-29 21:37:40 -05:00
Zygo Blaxell	f6909dac17	bees: drop BEESINFO Having too many "write a message to the log" primitives is confusing, and having one that intermittently and silently discards output is even _more_ confusing. Replace all BEESINFO with appropriate BEESLOG*s. Usually DEBUG. Except for one or two that occur too often. Just delete those. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:05 -05:00
Zygo Blaxell	f64fc78e36	Task: convert print_fn to a string Since we are now unconditionally rendering the print_fn as a static string, there is no need for it to be a function. We also need it to be brief and mostly constant. Use a string instead. Put the string before the function in the Task constructor arguments so that the title string appears as a heading in code, since we are making a breaking API change already. Drop TASK_MACRO as it is broken by this change, but there is no similar usage of Task anywhere to make it worth fixing. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:48:04 -05:00
Zygo Blaxell	0710208354	BeesNote: thread naming fixes Move pthread_setname_np to the same place we do pthread_getname_np. Detect errors in pthread_getname_np--but don't throw an exception because we would call ourself recursively from the exception handler when it tries to log the exception. Fix the order of set_name and the first BEESNOTE/BEESLOG call in threads, closing small time intervals where logs have the wrong thread name, and that wrong name becomes persistent for the thread. Make the main thread's name "bees" because Linux kernel stack traces use the pthread name of the main thread instead of the name of the process. Anonymous threads get the process name (usually "bees"). We should not have any such threads, but we do. This appears to occur mostly during exception stack unwinding. GCC/pthread bug? Fixes: https://github.com/Zygo/bees/issues/51 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-26 23:47:47 -05:00
Zygo Blaxell	5063a635fc	logging: get Task names for log messages When a Task worker thread is executing a Task, the thread name is less useful than the Task description. Use the Task description instead of the thread name if the thread has no BeesThread name and the thread is currently executing a task. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-20 14:00:51 -05:00
Zygo Blaxell	fef7aed8fa	BeesNote: if thread name was not set, get it from Task or pthread_getname_np Threads from the Task module in libcrucible don't set BeesNote::tl_name. Even if they did, in Task context the thread name is unspecific to the point of meaninglessness. Use the Task::print method as the name for such threads, and be sure that future Task print functions are designed for that usage. The extra complexity in BeesNote::get_name() seems preferable to bombarding pthread_setname_np hundreds or thousands of times per second. FIXME: we are now calling Task::print() on every BeesNote, which is effectively unconditionally. Maybe we should have Task::print() and get_name() return a closure, or just evaluate Task::print() once and cache it in TaskState, or define Task's constructor with a string argument instead of the current print_fn closure. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-20 13:57:51 -05:00
Kai Krakow	677da5de45	Logging: Add log levels to output This commit adds log levels to the output. In systemd, it makes colored lines, otherwise it's probably just a number. Bees is very chatty, so this paves the road for log level filtering. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-01-18 23:41:29 +01:00
Zygo Blaxell	56c23c4517	crawl: implement two crawler algorithms and adjust scheduling parameters There are two subvol scan algorithms implemented so far. The two modes are unimaginatively named 0 and 1. 0: sorts extents by (inode, subvol, offset), 1: scans extents round-robin from all subvols. Algorithm 0 scans references to the same extent at close to the same time, which is good for performance; however, whenever a snapshot is created, the scan of the entire filesystem restarts at the beginning of the new snapshot. Algorithm 1 makes continuous forward progress even when new snapshots are created, but it does not benefit from caching and will force the kernel to reread data multiple times when there are snapshots. The algorithm can be selected at run-time using the -m or --scan-mode option. We can collect some field data on these before replacing them with an extent-tree-based scanner. Alternatively, for pre-4.14 kernels, we can keep these two modes as non-default options. Currently these algorithms have terrible names. TODO: fix that, but also TODO: delete all that code and do scans directly from the extent tree instead. Augment the scan algorithms relative to their earlier implementation by batching multiple extents to scan from each subvol before switching to a different subvol. Sprinkle some BEESNOTEs on the Task objects so that they don't disappear from the thread status output. Adjust some timing constants to deal with the increased latency from competing threads. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-17 22:53:49 -05:00
Zygo Blaxell	055c8d4c75	roots: scan in parallel using Tasks Distribute incoming extents across a thread pool for faster execution on multi-core, multi-disk environments. Switch extent enumeration model to scan extent refs consecutively(ish). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-17 22:52:00 -05:00
Zygo Blaxell	a175ee0689	bees: clean up #if 0 ... fsync ... #endif code Remove some dead code because dedup-related deadlocks have not been observed since Linux kernel v4.11. Preserve rationale of remaining #if 0 block (why we do write/rename instead of write/fsync/rename) so that people don't try to replace the "missing" fsync() there. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-17 22:30:07 -05:00
Zygo Blaxell	8d3a27bf85	subvol-threads: increase resource and thread limits With kernel 4.14 there is no sign of the previous LOGICAL_INO performance problems, so there seems to be no need to throttle threads using this ioctl. Increase the FD cache size limits and scan thread count. Let the kernel figure out scheduling. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-01-17 22:30:07 -05:00
Kai Krakow	270a91cf17	Fix a fallthrough error in GCC 7+ GCC 7 and higher turn a previous warning into an error for implicit fallthrough. Let's hint the compiler that this is intentional here. Signed-off-by: Kai Krakow <kai@kaishome.de>	2017-11-14 06:58:43 +01:00
Kai Krakow	f7320baa56	Fix indentation/alignment after integration	2017-11-14 06:58:43 +01:00
Kai Krakow	52997936d5	getopt: Add logic to set relative path from $CWD This commit adds a new option to set relative path output for name_fd(). Signed-off-by: Kai Krakow <kai@kaishome.de>	2017-11-14 01:16:06 +01:00
Zygo Blaxell	71514e7229	main: use static function to control timestamps in log output Adjust bees to match changes in Chatter's interface. Signed-off-by: Zygo Blaxell <bees@furryterror.org> (cherry picked from commit `66fd28830d`)	2017-11-11 15:18:46 -05:00
Kai Krakow	c6be07e158	Add option for prefixing timestamps To make bees more friendly to use with syslog/systemd, we add an option to omit timestamps from the log output. Signed-off-by: Kai Krakow <kai@kaishome.de>	2017-10-27 23:02:47 +02:00
Kai Krakow	c6bf6bfe1d	Implement getopt options parser This commit adds a simple getopt options parser to show help. This can be used as a boilerplate for adding more options later. Signed-off-by: Kai Krakow <kai@kaishome.de>	2017-10-27 22:36:00 +02:00
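
Commits a353d8cc6e and 97d70ef4c5 above describe advising the kernel about hash table IO with POSIX_FADV_WILLNEED (prefetch at startup) and POSIX_FADV_DONTNEED (trigger writeback at shutdown). The following is a minimal sketch of that pattern, not the code from bees: the helper names `advise_prefetch` and `advise_drop` are hypothetical, and only the `posix_fadvise` call and its advice constants come from POSIX.

```cpp
#include <cassert>
#include <cerrno>
#include <fcntl.h>      // posix_fadvise, POSIX_FADV_WILLNEED, POSIX_FADV_DONTNEED
#include <sys/types.h>  // off_t

// Hypothetical helper (not from bees): hint that the byte range
// [offset, offset + length) of fd will be read soon, so the kernel may
// start loading it into the page cache. A length of 0 means "to end of file".
// Returns 0 on success or an error number; posix_fadvise does not set errno.
inline int advise_prefetch(int fd, off_t offset, off_t length)
{
    assert(offset >= 0 && length >= 0);
    return posix_fadvise(fd, offset, length, POSIX_FADV_WILLNEED);
}

// Hypothetical helper (not from bees): hint that the byte range is no longer
// needed. The commit message above relies on this to trigger writeback of
// hash table pages at shutdown.
inline int advise_drop(int fd, off_t offset, off_t length)
{
    assert(offset >= 0 && length >= 0);
    return posix_fadvise(fd, offset, length, POSIX_FADV_DONTNEED);
}
```

Both calls are advisory: a zero return only means the hint was accepted, not that any IO has completed. Commit 97d70ef4c5 notes that an explicit pread() loop might still be needed if the kernel's readahead runs at idle IO priority.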

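Commit 041ad717a5 above filters log messages by level inside the process so that the arguments of a suppressed message are never evaluated. A rough sketch of how such a filter can work is shown here. Every name in it (`sketch::verbosity`, `SKETCH_LOG`, `expensive`) is hypothetical and is not the BEESLOG* implementation; the `<N>` prefix follows the syslog-style numbering that systemd understands, which is assumed to be the kind of level output commit 677da5de45 refers to.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

namespace sketch {
    // Syslog-style levels: a lower number is more severe.
    enum Level { ERR = 3, WARNING = 4, NOTICE = 5, INFO = 6, DEBUG = 7 };

    // Run-time verbosity: messages with a level above this are dropped.
    inline int &verbosity() { static int v = INFO; return v; }

    // Last emitted message, kept only so the sketch can be checked.
    inline std::string &last_message() { static std::string s; return s; }

    inline void emit(int level, const std::string &msg)
    {
        last_message() = msg;
        std::clog << "<" << level << ">" << msg << "\n";
    }

    // Stand-in for a costly log argument; counts how often it runs.
    inline int &expensive_calls() { static int n = 0; return n; }
    inline int expensive() { ++expensive_calls(); return 42; }
}

// The level check wraps the stream expression, so `expr` is only
// evaluated when the message will actually be emitted.
#define SKETCH_LOG(level, expr)                              \
    do {                                                     \
        if ((level) <= sketch::verbosity()) {                \
            std::ostringstream sketch_oss_;                  \
            sketch_oss_ << expr;                             \
            sketch::emit((level), sketch_oss_.str());        \
        }                                                    \
    } while (0)

namespace sketch {
    // Usage check: with verbosity INFO, the DEBUG line is skipped and
    // expensive() is not called; the NOTICE line is emitted.
    inline void demo()
    {
        verbosity() = INFO;
        expensive_calls() = 0;
        SKETCH_LOG(DEBUG, "value " << expensive());
        assert(expensive_calls() == 0);
        SKETCH_LOG(NOTICE, "value " << expensive());
        assert(expensive_calls() == 1);
    }
}
```

A macro is used rather than a function because function arguments are evaluated before the call, which would defeat the purpose of the filter.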
67 Commits