GGLinnk/bees - bees - Virtual World Git

mirror of https://github.com/Zygo/bees.git synced 2026-01-08 20:00:22 +01:00

Author	SHA1	Message	Date
rsjaffe	8bec9624da	systemd service replace deprecated parameters Replace CPU shares and IO block weight with CPU weight and IO weight. Note that the new parameters are roughly 1/100 of the old ones--I believe that's the right conversion. Also removed duplicate Nice parameter and alphabetized the parameters for ease of reading.	2018-11-05 12:35:17 -08:00
Zygo Blaxell	aa74a238b3	hash: remove preloaded toxic hash blacklist Faster and more reliable toxic extent detection means we can now be much less paranoid about creating toxic extents. The paranoia has significant impact on dedupe hit rates because every extent that contains even one toxic hash is abandoned. The preloaded toxic hashes were chosen because they occur more frequently than any other block contents in typical filesystem data. The combination of these resulted in as much as 30% of duplicate extents being left untouched. Remove the preloaded toxic extent blacklist, and rely on the new kernel-CPU-usage-based workaround instead. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-31 23:03:01 -04:00
Zygo Blaxell	6e6b08ea0e	scripts: put AL16M back to avoid breaking existing scripts Leave AL16M defined in beesd to avoid breaking scripts based on beesd.conf.sample which used this constant. Use the absolute size in beesd.conf.sample to avoid any future problems. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-31 22:50:36 -04:00
Zygo Blaxell	542371684c	context: better detection for toxic extents We detect toxic extents by measuring how long the LOGICAL_INO ioctl takes to run. If it is above some threshold, we consider the extent toxic, and blacklist it; otherwise, we process the extent normally. The detector was using the execution time of the ioctl, which detects toxic extents, but it also detects pauses of the bees process and transaction commit latency due to load. This leads to a significant number of false positives. The detection threshold was also very long, burning a lot of kernel CPU before the detection was triggered. Use the per-thread system CPU statistics to measure the kernel CPU usage of the LOGICAL_INO call directly. This is much more reliable because it is not confounded by other threads, and it's faster because we can set the time threshold two orders of magnitude lower. Also remove the lock and mutex added in "context: serialize LOGICAL_INO calls" because we theoretically no longer need it (but leave the code there with #if 0 in case we do need it in practice). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-31 21:12:16 -04:00
Zygo Blaxell	9a97699dd9	roots: reimplement transid_max_nocache using extent tree root ROOT_TREE contains the ROOT_ITEM for EXTENT_TREE. Every modification (that we care about) to a btrfs must go through EXTENT_TREE, and must modify the page in ROOT_TREE pointing to the root of EXTENT_TREE... which makes that a very good source for the filesystem transid. Remove the loop and the root lookups, and just look at one item for max_transid. Also note that every caller of transid_max_nocache() immediately feeds the return value to m_transid_re.update(), so don't do that inside transid_max_nocache(). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-31 00:09:49 -04:00
Zygo Blaxell	0e8b591232	Revert "roots: simplify BeesRoots::transid_max_nocache" It turns out that we do need to scan all the subvols in order to find transid_max. Keep the bug fix though. This reverts commit `bf6ae80eee`. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 23:29:05 -04:00
Zygo Blaxell	bf6ae80eee	roots: simplify BeesRoots::transid_max_nocache BeesRoots::transid_max_nocache calls btrfs_get_root_transid() which retrieves the transid of the root of the given Fd. Since the FS_TREE (subvol 5) is the root of the subvol hierarchy, it will always have the highest transid on the filesystem, and we do not need to look at any others. Also fix a bug where we pass BTRFS_FS_TREE_OBJECTID instead of the file descriptor root_fd() to btrfs_get_root_transid(). If BEESHOME is somewhere on the same btrfs filesystem, and there are no leaked FDs at bees startup, then BTRFS_FS_TREE_OBJECTID (5) usually has the same integer value as a valid file descriptor of some object on the filesystem that has a regularly increasing transid value. If Fd 5 happens to be a file in BEESHOME then bees itself drives the transid increments. This, combined with the search of all subvol roots, hides the bug (unless Fd 5 gets closed somehow). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:17 -04:00
Zygo Blaxell	1a51bb53bf	context: cache result of home_fd() BeesContext::home_fd() is supposed to open $BEESHOME once and cache the Fd for later calls; however, instead it was reopening a new Fd each time it was called, and _also_ holding that Fd in a BeesContext member. Fds clean themselves up when they are forgotten, so it was not leaking per se, but it certainly had more open Fds than it needed to. Check to see if we have m_home_fd open, and return that if so. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:16 -04:00
Zygo Blaxell	35b21687bc	bees: drop unused member m_uuid There is a m_root_uuid which is used. m_uuid is not, so drop it and save a tiny amount of memory. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:16 -04:00
Zygo Blaxell	63ddbb9a4f	context: serialize LOGICAL_INO calls LOGICAL_INO can trip over the btrfs slow-backrefs bug, resulting in some very long in-kernel runtimes. If too many threads are executing LOGICAL_INO then there may be no cores left on the system to run other tasks. Toxic extent detection is done by a very rudimentary algorithm which can be confused by unrelated sources of latency within btrfs (especially commit latency). The algorithm can also be confused by other threads executing the LOGICAL_INO ioctl. These are two good reasons to prevent any two threads in a single bees process instance from executing LOGICAL_INO at the same time, so let's do that. It is possible to limit the number of threads executing LOGICAL_INO with the -c and -C options; however, this also limits the number of threads which can perform any operation, while only LOGICAL_INO (*) has such a profound effect on the rest of system operation. Also make the status message clearer about exactly when LOGICAL_INO is executed, as opposed to merely waiting to acquire a lock before executing the ioctl. (*) or maybe FILE_EXTENT_SAME. The problem function that keeps showing up in kernel stack traces is find_parent_nodes, which is called by both the LOGICAL_INO and FILE_EXTENT_SAME ioctls. We'll try this change first and see if it prevents any recurrences of forced watchdog reboots; if it does not, then we'll limit FILE_EXTENT_SAME the same way. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:16 -04:00
Zygo Blaxell	373b9ef038	roots: fix subvol scan rollover on subvols with empty transid range The ordering function for BeesCrawlState did not consider root 292 inode 0 min_transid 2345 max_transid 3456 to be larger than root 292 inode 258 min_transid 2345 max_transid 2345 so when we attempted to update the end pointer for the crawl progress, the new state was not considered newer than the old state because the min_transid was equal, but the new crawl state's inode number was smaller. Normally this is not a problem because subvol scans typically begin and end in separate transactions (in part because we don't start a subvol scan until at least two transactions are available); however, the cleanup code for the aftermath of the recent transid_min() bug can create crawlers with equal max_transid and min_transid records. Fix this by ordering both transid fields before any others in the crawl state. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:14 -04:00
Zygo Blaxell	866a35c7fb	roots: do not accept 18446744073709551615 as max_transid in beescrawl.dat Due to an earlier bug some beescrawl.dat files will contain uint64_t max as max_transid. This prevents any further scanning on the subvol because there is no possibility of having a real transid (or any other uint64_t number) larger than uint64_t max. If we detect a bad transid in beescrawl.dat, log a warning, then use some more plausible value: either min_transid to repeat the previous incremental crawl, or 0 to restart the subvol scan from the beginning. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:14 -04:00
Zygo Blaxell	90132182fd	roots: do not allow transid_min to be numeric_limits<uint64_t>::max() On a few test machines max_transid on subvols is getting set to 18446744073709551615 (aka uint64_t max). Prevent transid_min() from ever returning this value. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-30 21:12:14 -04:00
Zygo Blaxell	90f98250c2	hash: remove pointless copy "saved" is used only during hash table correctness analysis, which is normally not enabled at compile time, and requires source modification to enable. Remove the pointless copy and save a tiny bit of CPU. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-19 20:21:04 -04:00
Zygo Blaxell	0c714cd55c	scripts: use multiples (not power) of 128K Adjust the scripts for the new smaller hash table extent size. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-19 20:21:04 -04:00
Zygo Blaxell	924008603e	hash: reduce hash table extent size to 128KB The 16MB hash table extent size did not serve any useful defragmentation or compression purpose, and for very small filesystems (under 100GB), 16MB is much larger than necessary. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-19 20:21:04 -04:00
Zygo Blaxell	c01f129eee	src: add bees-version.new.c to .gitignore Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-19 20:21:04 -04:00
Zygo Blaxell	5a49870fc9	docs: add coredumpctl systemd-coredumpctl collects core files for later analysis with gdb. It's a convenient thing if the keys you use to encrypt /var/lib/systemd/coredump are the same as the keys you use to encrypt the filesystem where you're running bees. Add it to the documentation just before the hand-rolled version. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-19 20:21:04 -04:00
Zygo Blaxell	14b35e3426	docs: add "what to do when something goes wrong" page Standard crash backtrace collection, plus $BEESSTATUS for the high-level overview of what bees is doing. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-04 20:54:08 -04:00
Zygo Blaxell	7bba096077	Merge remote-tracking branch 'nilninull/master'	2018-10-02 22:13:55 -04:00
nilninull	aa324de9ed	FIX: The systemd service file is always installed	2018-10-03 10:19:43 +09:00
Zygo Blaxell	e8298570ed	README: split into sections, reformat for github.io Split the rather large README into smaller sections with a pitch and a ToC at the top. Move the sections into docs/ so that Github Pages can read them. 'make doc' produces a local HTML tree. Update the kernel bugs and gotchas list. Add some information that has been accumulating in Github comments. Remove information about bugs in kernels earlier than 4.14. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-10-02 03:41:31 -04:00
Kai Krakow	32d2739b0d	Makefile: Specify version when building from tarball When package maintainers build from a tarball, the .git directory does not exist to extract the version tag. Let's add a hack to work around this issue and let them specify `BEES_VERSION="v0.y"` on the make cmdline. Github-Bug: https://github.com/Zygo/bees/issues/75 Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-30 04:20:26 +02:00
Kai Krakow	faf11b1c0c	Update references to Gentoo Gentoo has officially merged the ebuild into portage as of: https://github.com/gentoo/gentoo/pull/9925 Let's update the readme and get rid of the `contrib/gentoo-bees` directory, so we have no potentially outdated information in the future. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-29 22:26:56 +02:00
Kai Krakow	3504439d5c	contrib/gentoo: Update ebuild Now that the packaging preparations were merged, we should update the ebuild to reflect the upstream master branch. Signed-off-by: Kai Krakow <kai@kaishome.de> (tag: v0.6)	2018-09-27 10:55:24 +02:00
Zygo Blaxell	d4b3836493	extentwalker: don't fetch absurd numbers of extents just to throw them away ExtentWalker doesn't gain significant benefits from caching, and the extra SEARCH_V2 ioctls were blamed for a 33% kernel CPU overhead by perf. Reduce the number of extents to 16 in lieu of fixing the caching. This gives a significant speed boost on CPU-bound workloads compared to the original 1024--almost 40% faster on a single SSD with a filesystem consisting of raw VM images mounted with compress=zstd. This also seems to reduce LOGICAL_INO overhead. Perhaps SEARCH_V2 and LOGICAL_INO were trying to lock the same extents, and interfering with each other? Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-26 23:29:56 -04:00
Kai Krakow	f053e0e1a7	beesd: Fix the wrapper not finding any config file `grep -q something | grep -q something_else` will never find anything. The for-loop is redundant anyways because `grep -l` can already work for us. Let's replace this with a shorter and working version. CC: Timofey Titovets <timofey.titovets@synesis.ru> (fixes: commit `06d41fd` "Rewrite beesd arg parser") Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-16 17:56:31 -04:00
Zygo Blaxell	bcfc3cf08b	Merge https://github.com/Zygo/bees/pull/62	2018-09-15 00:09:46 -04:00
Zygo Blaxell	9dbe2d6fee	bees: add -G/--thread-min option for minimum thread count The -g option limits the number of worker threads when the target load average is exceeded. On some systems the load normally runs high, and continuous bees operation is required to avoid running out of disk space. Add a -G/--thread-min option to force at least some threads to continue running. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:07 -04:00
Zygo Blaxell	dd3c32a43d	README: spell 'available' correctly Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:07 -04:00
Zygo Blaxell	3d536ea6df	roots: if queue is full run again The task queue may already be full of tasks when the crawl task is executed. In this case simply reschedule the crawl task at the end of the current queue. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:06 -04:00
Zygo Blaxell	e66086516f	bees: dynamic thread pool size based on system load average Add -g / --loadavg-target parameter to track system load and add or remove bees worker threads dynamically to keep system load close to the loadavg target. Thread count may vary from zero to the maximum specified by -c or -C, and is adjusted every 5 seconds. This is better than implementing a similar load average scheme from outside of the process (though that is still possible) because the in-process load tracker does not disrupt the performance timing feedback mechanisms as a freezer cgroup or SIGSTOP would when controlling bees from outside. The internal load average tracker can also adjust the number of active threads while an external tracker can only choose from the maximum or zero. Also fix a bug where a Task could deadlock waiting for itself to exit if it tries to insert a new Task after the number of worker threads has been set to zero. Also correct usage message for --scan-mode (values are 0..2) since we are touching adjacent lines anyway. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:03 -04:00
Zygo Blaxell	96eb100ded	bees: use readahead instead of posix_fadvise Other btrfs utils use readahead() not posix_fadvise(). There does not appear to be a performance or correctness difference between the three (none, posix_fadvise, or readahead()). Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:00 -04:00
Zygo Blaxell	041ad717a5	bees: configurable log verbosity Log messages were already labelled with log levels, but there was no way to filter by log level at run time. Implement the filter inside the bees process so it can skip evaluation of the BEESLOG* arguments if the log messages would not be emitted. Fixes: https://github.com/Zygo/bees/issues/67 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:00 -04:00
Zygo Blaxell	b22db12390	context: log dedups with single unbroken log message When BEESLOGINFO is called multiple times it generates separate log records that can be mixed up when multiple threads dedup. Use a single BEESLOGINFO call for each dedup to prevent this. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:50:00 -04:00
Zygo Blaxell	8938caa029	README.md: update build-deps btrfs/ioctl.h has been moved to a different package. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:49:57 -04:00
Zygo Blaxell	8bc4bee8a3	crucible: progress: drop the set() method set() was broken and redundant. Calling hold() and discarding the returned object has the correct effect. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:49:54 -04:00
Zygo Blaxell	1beb61fb78	crucible: error: record location of exception in what() message Make the log show where the exception is thrown from. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2018-09-14 23:49:51 -04:00
Timofey Titovets	06d41fd518	Rewrite beesd arg parser Signed-off-by: Timofey Titovets <timofey.titovets@synesis.ru>	2018-09-15 00:21:06 +03:00
Kai Krakow	788774731b	Gentoo: Rework Gentoo ebuild into overlay This commit squashes all the little changes from the previous integration branch into one, adjusts to the new Makefile changes, and introduces an overlay layout so that the contrib/gentoo-bees subtree can be directly added as a Portage overlay to the system. The following list contains the previous commit descriptions: sys-fs/bees: Keyword tested architecture ~amd64 Bees was tested on this platform. sys-fs/bees: Add kernel version checks Add checking the kernel versions and write some info and/or warnings before building and installing the package. Running bees on older kernels may have some serious performance and stability impacts, let's tell the user about it. Closes #55 sys-fs/bees: Add metadata.xml sys-fs/bees: There's no configure script So, there's no point in calling "default". sys-fs/bees: Simplify src_configure() sys-fs/bees: Don't depend on markdown It makes no sense to install both README.md and README.html, and we can get rid of one dependency. Dependencies: btrfs-progs is no longer a buildtime-only dep It is actually needed by the bees service wrapper script, as pointed out by Gentoo QA review. sys-fs/bees: DOCS is not needed "COPYING" is already covered by the licensing. The ebuild defaults already include README* sys-fs/bees: Make warnings exclusive It was recommended by Gentoo QA to show only either one or another warning, and change the texts accordingly. sys-fs/bees: RDEPEND is not implicit RDEPEND does not implicitly default to DEPEND. Let's explicitly set the variable. sys-fs/bees: IUSE=test is only needed for explicit dependencies Thus, remove it. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 05:06:39 +02:00
Kai Krakow	679a327ac5	Makefile: Do not force optimizations by default Make life easier for package maintainers by not forcing architecture or compiler optimizations by default. E.g., Gentoo QA refuses to accept both "-march=native" and "-O3". These are usually provided by the package tooling. Instead, we provide easily accessible templates in "makeflags". Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 04:05:15 +02:00
Kai Krakow	31b41bb3c2	Makefile: Do not force making README.html This forces us to depend on markdown which would be otherwise optional. Most of the time it is sufficient to let package managers just install the README.md file. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 03:34:48 +02:00
Kai Krakow	d7e235c178	Makefile: "which" is not portable It was pointed out by Gentoo QA that "type -P" is a better choice. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 03:14:18 +02:00
Kai Krakow	51108f839d	Makefile: Due to VPATH, libcrucible links to hard-coded libuuid path Due to VPATH and how make resolves source paths, libcrucible.so ends up with a hard-coded path to link against libuuid.so. Let's fix it by turning the general rule into an explicit rule for libcrucible.so. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 03:07:20 +02:00
Kai Krakow	8d102abf8b	Makefile: create a template compiler This creates a simple template compiler using sed, stored in a reusable variable. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 02:59:54 +02:00
Kai Krakow	83e8f87dc9	Scripts: Don't prefix timestamps when running with systemd Since systemd prefixes its own timestamps, we can unconditionally remove timestamps when bees is executed by systemd. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 02:59:54 +02:00
Kai Krakow	4417b18d9e	Makefile: .version.o is made from a generated file We should probably not put it into the objects list. Let's instead explicitly put it as a depend of libcrucible.so. This allows us to not use *.cc as a depend for .version.cc which makes more sense as CRUCIBLE_OBJS is also explicitly defined and not built from wildcards. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 02:59:54 +02:00
Kai Krakow	8636312cab	Compilation: Let the code know about package config This commit adds support for putting package configuration options into header files. This is needed to prepare reading config files from /etc. Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 02:59:54 +02:00
Kai Krakow	17e1171464	Installation: Remove USR_PREFIX from Makefile This commit removes USR_PREFIX and introduces ETC_PREFIX instead. The purpose of PREFIX is the installation prefix in the system, not the installation destination. The latter one is what DESTDIR is used for. This should clear up the confusion. PREFIX was already mis-used as installation destination. But that doesn't mix well with how the make targets are designed. CC: Timofey Titovets <nefelim4ag@gmail.com> Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 02:59:52 +02:00
Kai Krakow	9069201036	Scripts: Fix systemd unit not being templated Signed-off-by: Kai Krakow <kai@kaishome.de>	2018-09-08 02:21:08 +02:00


422 Commits
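
Commit 542371684c switches toxic-extent detection from wall-clock ioctl time to the calling thread's kernel CPU time. Below is a minimal sketch of that measurement. It assumes Linux `getrusage(RUSAGE_THREAD)` as the per-thread source; the commit only says "per-thread system CPU statistics". `kernel_cpu_seconds` and `looks_toxic` are hypothetical names, the threshold is left to the caller, and no btrfs ioctl is issued here:

```cpp
#include <sys/resource.h>  // getrusage(), RUSAGE_THREAD (Linux; g++ defines _GNU_SOURCE)
#include <sys/time.h>      // struct timeval
#include <cassert>
#include <cerrno>
#include <system_error>

// Convert a struct timeval to seconds.
inline double timeval_seconds(const timeval &tv)
{
	return static_cast<double>(tv.tv_sec) + static_cast<double>(tv.tv_usec) / 1e6;
}

// System (kernel-mode) CPU time consumed so far by the calling thread only.
inline double thread_sys_seconds()
{
	rusage ru{};
	if (getrusage(RUSAGE_THREAD, &ru) != 0) {
		const int err = errno;
		throw std::system_error(err, std::generic_category(), "getrusage(RUSAGE_THREAD)");
	}
	return timeval_seconds(ru.ru_stime);
}

// Run fn() and return the kernel CPU time this thread spent while it ran.
// Time spent sleeping or blocked is not counted, unlike wall-clock timing.
template <class Fn>
double kernel_cpu_seconds(Fn &&fn)
{
	const double before = thread_sys_seconds();
	fn();
	return thread_sys_seconds() - before;
}

// Hypothetical classifier; the threshold is a caller-chosen tunable.
inline bool looks_toxic(double kernel_seconds, double threshold_seconds)
{
	return kernel_seconds > threshold_seconds;
}
```

`ru_stime` only advances while the thread executes in kernel mode. Time spent waiting on a transaction commit, or preempted by other threads, therefore does not count toward the threshold. That waiting time is the source of the false positives the commit describes.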
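
Commit 1a51bb53bf fixes `home_fd()` so that it returns the cached descriptor instead of reopening `$BEESHOME` on every call. The pattern is sketched below. The `OwnedFd` and `HomeContext` types are hypothetical stand-ins for bees' own `Fd` and `BeesContext`:

```cpp
#include <fcntl.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>
#include <utility>

// Minimal RAII file descriptor (hypothetical stand-in for bees' Fd type).
class OwnedFd {
public:
	OwnedFd() = default;
	explicit OwnedFd(int fd) : m_fd(fd) {}
	OwnedFd(const OwnedFd &) = delete;
	OwnedFd &operator=(const OwnedFd &) = delete;
	OwnedFd(OwnedFd &&other) noexcept : m_fd(std::exchange(other.m_fd, -1)) {}
	OwnedFd &operator=(OwnedFd &&other) noexcept
	{
		if (this != &other) {
			reset();
			m_fd = std::exchange(other.m_fd, -1);
		}
		return *this;
	}
	~OwnedFd() { reset(); }
	int get() const { return m_fd; }
	bool is_open() const { return m_fd >= 0; }
	void reset()
	{
		if (m_fd >= 0) {
			::close(m_fd);
			m_fd = -1;
		}
	}
private:
	int m_fd = -1;
};

// Hypothetical context that opens its home directory at most once.
class HomeContext {
public:
	explicit HomeContext(std::string home) : m_home(std::move(home)) {}

	// Return the cached fd if one is open; otherwise open it and cache it.
	int home_fd()
	{
		if (!m_home_fd.is_open()) {
			const int fd = ::open(m_home.c_str(), O_RDONLY | O_DIRECTORY | O_CLOEXEC);
			if (fd < 0) {
				const int err = errno;
				throw std::system_error(err, std::generic_category(), "open " + m_home);
			}
			m_home_fd = OwnedFd(fd);
		}
		return m_home_fd.get();
	}
private:
	std::string m_home;
	OwnedFd m_home_fd;
};
```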
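
Commit 63ddbb9a4f allows only one thread per process inside LOGICAL_INO at a time. It also makes the status message distinguish waiting for the lock from running the ioctl. The sketch below shows that idea with a process-wide `std::mutex`. `run_serialized`, `t_status`, and `peak_concurrency` are hypothetical, and the "expensive call" is any callable rather than a real ioctl. Commit 542371684c later disabled this lock with `#if 0`.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <mutex>
#include <string>
#include <thread>
#include <vector>

// One process-wide lock: at most one thread runs the expensive call at a time.
static std::mutex g_logical_ino_mutex;

// Hypothetical per-thread status, so a status report can tell
// "waiting for the lock" apart from "inside the call".
thread_local std::string t_status = "idle";

template <class Fn>
void run_serialized(Fn &&expensive_call)
{
	t_status = "waiting to run LOGICAL_INO";
	std::lock_guard<std::mutex> lock(g_logical_ino_mutex);
	t_status = "running LOGICAL_INO";
	expensive_call();
	t_status = "idle";
}

// Demonstration helper: run `threads` workers through run_serialized and
// return the highest number observed inside the critical call at once.
inline int peak_concurrency(int threads)
{
	std::atomic<int> active{0};
	std::atomic<int> peak{0};
	std::vector<std::thread> workers;
	for (int i = 0; i < threads; ++i) {
		workers.emplace_back([&] {
			run_serialized([&] {
				const int now = ++active;
				int seen = peak.load();
				// compare_exchange_weak refreshes `seen` on failure
				while (now > seen && !peak.compare_exchange_weak(seen, now)) {
				}
				std::this_thread::sleep_for(std::chrono::milliseconds(2));
				--active;
			});
		});
	}
	for (auto &w : workers) {
		w.join();
	}
	return peak.load();
}
```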
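
Commit 373b9ef038 fixes the `BeesCrawlState` ordering by comparing both transid fields before any others. The sketch below reproduces the example from the commit message using a hypothetical `CrawlState` struct. The field list and the "old" ordering are assumptions, chosen to show the symptom; they are not bees' actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <tuple>

// Hypothetical crawl position; field names are illustrative only.
struct CrawlState {
	uint64_t root;
	uint64_t objectid;  // inode number
	uint64_t offset;
	uint64_t min_transid;
	uint64_t max_transid;
};

// An ordering consistent with the bug: inode is compared before max_transid.
inline bool less_old(const CrawlState &a, const CrawlState &b)
{
	return std::tie(a.min_transid, a.objectid, a.offset, a.root, a.max_transid)
	     < std::tie(b.min_transid, b.objectid, b.offset, b.root, b.max_transid);
}

// Fixed ordering: both transid fields come before every other field.
inline bool less_fixed(const CrawlState &a, const CrawlState &b)
{
	return std::tie(a.min_transid, a.max_transid, a.root, a.objectid, a.offset)
	     < std::tie(b.min_transid, b.max_transid, b.root, b.objectid, b.offset);
}
```

`std::tie` compares tuples lexicographically. Moving `max_transid` ahead of the inode number is therefore enough to make the newer state (inode 0, max_transid 3456) sort after the older one (inode 258, max_transid 2345).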
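
Commit 866a35c7fb rejects `uint64_t` max as `max_transid` when reading `beescrawl.dat`. The fallback is sketched below using a hypothetical `CrawlRange` type. The commit does not say when it picks `min_transid` versus 0, so the rule used here is an assumption: reuse `min_transid` if it is itself valid, otherwise restart from 0.

```cpp
#include <cassert>
#include <cstdint>
#include <iostream>
#include <limits>

// The impossible value left behind by the earlier bug.
constexpr uint64_t kBadTransid = std::numeric_limits<uint64_t>::max();

// Hypothetical subset of a crawl record loaded from beescrawl.dat.
struct CrawlRange {
	uint64_t min_transid;
	uint64_t max_transid;
};

// Replace an impossible max_transid with a plausible one and log a warning.
// Assumption: reuse min_transid (repeat the last incremental crawl) when it is
// valid; otherwise reset both fields to 0 (rescan the subvol from the start).
inline CrawlRange sanitize_crawl_range(CrawlRange r)
{
	if (r.max_transid == kBadTransid) {
		std::cerr << "warning: ignoring max_transid " << r.max_transid << " from crawl state\n";
		if (r.min_transid == kBadTransid) {
			r.min_transid = 0;
			r.max_transid = 0;
		} else {
			r.max_transid = r.min_transid;
		}
	}
	return r;
}
```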
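
Commits 924008603e and 0c714cd55c shrink the hash table extent to 128 KiB and make the scripts use multiples of it. The arithmetic for rounding a requested size up to such a multiple is shown below, in C++ rather than shell. The commits do not say whether beesd rounds up or rejects other sizes, so rounding up is an assumption. `round_up_to_extent` is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t kHashExtent = 128 * 1024;  // 128 KiB hash table extent

// Round a requested hash table size up to the next multiple of 128 KiB.
// (Sizes within 128 KiB of UINT64_MAX would overflow; not a practical concern here.)
constexpr uint64_t round_up_to_extent(uint64_t bytes)
{
	return (bytes + kHashExtent - 1) / kHashExtent * kHashExtent;
}
```

Any size that was valid under the old 16 MiB extent stays valid, because 16 MiB is 128 × 128 KiB.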
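
Commit e66086516f adds `-g`/`--loadavg-target`, which grows or shrinks the worker pool every 5 seconds. Commit 9dbe2d6fee adds `-G`/`--thread-min` as a floor. The commits do not give the control law, so the proportional step below is an assumption. `next_thread_count` and `one_minute_loadavg` are hypothetical helpers, and the result is clamped between the `-G` minimum and the `-c`/`-C` maximum:

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>  // getloadavg() (glibc/BSD extension)

// Hypothetical controller: step the worker count toward the load average target.
// Headroom below the target adds threads; load above the target removes them.
inline int next_thread_count(int current, double loadavg, double target,
                             int thread_min, int thread_max)
{
	assert(thread_min <= thread_max);
	const double headroom = target - loadavg;
	const int next = current + static_cast<int>(std::lround(headroom));
	return std::clamp(next, thread_min, thread_max);
}

// Read the 1-minute load average; returns -1.0 if it is unavailable.
inline double one_minute_loadavg()
{
	double avg[1];
	return getloadavg(avg, 1) == 1 ? avg[0] : -1.0;
}
```

A caller would read `one_minute_loadavg()` every 5 seconds and pass it in along with the current count. If the load average is unavailable (-1.0), keeping the current count is the conservative choice.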
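
Commit 96eb100ded replaces `posix_fadvise()` with `readahead()`. Both are real Linux calls and are shown side by side below. Their error conventions differ: `readahead()` returns -1 and sets `errno`, while `posix_fadvise()` returns the error number directly. `hint_readahead`, `hint_fadvise`, and `make_scratch_file` are hypothetical helpers. `readahead()` is Linux-specific and needs `_GNU_SOURCE`, which g++ defines by default.

```cpp
#include <fcntl.h>      // readahead() (Linux, _GNU_SOURCE), posix_fadvise()
#include <sys/types.h>
#include <unistd.h>     // write(), close(), unlink()
#include <cassert>
#include <cerrno>
#include <cstdlib>      // mkstemp()

// Ask the kernel to load [offset, offset + len) of fd into the page cache.
// Returns 0 on success or the errno value on failure.
inline int hint_readahead(int fd, off_t offset, size_t len)
{
	return ::readahead(fd, offset, len) == 0 ? 0 : errno;
}

// The equivalent POSIX hint; posix_fadvise() already returns an error number.
inline int hint_fadvise(int fd, off_t offset, off_t len)
{
	return ::posix_fadvise(fd, offset, len, POSIX_FADV_WILLNEED);
}

// Create an unlinked scratch file holding a few bytes; returns the fd or -1.
inline int make_scratch_file()
{
	char path[] = "/tmp/hint-demo-XXXXXX";
	const int fd = ::mkstemp(path);
	if (fd < 0) {
		return -1;
	}
	::unlink(path);  // the file lives until fd is closed
	const char data[] = "some bytes to cache";
	if (::write(fd, data, sizeof data) != static_cast<ssize_t>(sizeof data)) {
		::close(fd);
		return -1;
	}
	return fd;
}
```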
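
Commit 041ad717a5 filters log messages by level inside the process, so the `BEESLOG*` arguments are not evaluated when a message would be dropped. One way to get that property is to test the level inside the macro, before the stream expression runs. `DEMO_LOG` and `g_max_log_level` are hypothetical and are not the bees macros. The level numbers are the standard `<syslog.h>` constants, where a lower number means more severe.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <syslog.h>  // LOG_ERR, LOG_WARNING, LOG_INFO, LOG_DEBUG

// Messages with a level number above this are dropped.
static int g_max_log_level = LOG_INFO;

// The level test happens before `expr` is evaluated, so expensive
// arguments cost nothing when the message is filtered out.
#define DEMO_LOG(level, expr) \
	do { \
		if ((level) <= g_max_log_level) { \
			std::ostringstream demo_log_oss_; \
			demo_log_oss_ << expr; \
			std::cerr << demo_log_oss_.str() << '\n'; \
		} \
	} while (0)
```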
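
Commit b22db12390 emits each dedupe as a single log call, so that records from different threads cannot interleave. The sketch below shows the compose-then-emit pattern. `format_dedupe`, `emit_record`, and the message layout are hypothetical. The mutex is an additional assumption, added so the single emit is atomic within this process.

```cpp
#include <cassert>
#include <iostream>
#include <mutex>
#include <sstream>
#include <string>

// Build the whole multi-line record first, as one string.
inline std::string format_dedupe(const std::string &src, const std::string &dst,
                                 unsigned long long bytes)
{
	std::ostringstream oss;
	oss << "dedup: src " << src << "\n"
	    << "       dst " << dst << "\n"
	    << "       " << bytes << " bytes";
	return oss.str();
}

// Emit the finished record in one locked write, so no other thread's
// output can appear between its lines.
inline void emit_record(const std::string &record)
{
	static std::mutex emit_mutex;
	std::lock_guard<std::mutex> lock(emit_mutex);
	std::clog << record << '\n';
}
```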
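
Commit 1beb61fb78 adds the throw location to exception `what()` messages. The sketch below uses a macro that captures `__FILE__`, `__LINE__`, and `__func__` at the throw site. `located_error` and `THROW_LOCATED` are hypothetical names and are not crucible's API.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Exception type whose what() text ends with the throw site.
class located_error : public std::runtime_error {
public:
	located_error(const std::string &msg, const char *file, int line, const char *func)
		: std::runtime_error(msg + " at " + file + ":" + std::to_string(line) + " in " + func)
	{
	}
};

// Capture the location where the macro is expanded, not where the class is defined.
#define THROW_LOCATED(msg) throw located_error((msg), __FILE__, __LINE__, __func__)
```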