This should help clean up some of the uglier status outputs.
Supports:
* multi-line table cells
* character fills
* sparse tables
* insert, delete by row and column
* vertical separators
and not much else.
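For illustration, usage might look something like this (a hypothetical sketch; the type and method names are illustrative, not the actual API):

```cpp
// Hypothetical usage sketch; names are illustrative, not the real API.
Table status;
status.insert_row(0, { "thread", "progress" });       // header row
status.insert_row(1, { "crawl_1", "42%\n(12 GiB)" }); // multi-line cell
status.insert_col(2, { "-", "-" });                   // fill a sparse column
status.mid(" | ");                                    // vertical separator
cout << status << endl;
```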
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We can no longer reliably determine the number of hash table matches,
since we'll stop counting after the first one.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We were doing a `LOGICAL_INO` ioctl on every _block_ of a matching extent,
just to see how long it takes. It takes a while!
This could be modified to do an ioctl with the `IGNORE_OFFSET` flag,
once per new extent, but the kernel bug was fixed a long time ago, so
we can start removing all the toxic extent code.
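For reference, the once-per-extent variant would look something like this (a sketch, assuming kernel headers new enough to provide the `LOGICAL_INO_V2` definitions; error handling omitted):

```cpp
#include <linux/btrfs.h>
#include <sys/ioctl.h>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch: one LOGICAL_INO_V2 call per extent, with IGNORE_OFFSET so a
// single call covers every offset within the extent.
int logical_ino_once(int root_fd, uint64_t extent_bytenr)
{
	std::vector<uint8_t> buf(65536); // buffer size is illustrative
	btrfs_logical_ino_args args;
	memset(&args, 0, sizeof(args));
	args.logical = extent_bytenr;
	args.size = buf.size();
	args.flags = BTRFS_LOGICAL_INO_ARGS_IGNORE_OFFSET;
	args.inodes = reinterpret_cast<uintptr_t>(buf.data());
	return ioctl(root_fd, BTRFS_IOC_LOGICAL_INO_V2, &args);
}
```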
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
When we have multiple possible matches for a block, we proceed in three
phases:
1. retrieve each match's extent refs and put them in a list,
2. iterate over the list converting viable block matches into range matches,
3. sort and flatten the list of range matches into a non-overlapping
list of ranges that cover all duplicate blocks exactly once.
The separation of phases 1 and 2 creates a performance issue when there
are many block matches in phase 1 and all the range matches in phase
2 are the same length. Even though we might quickly find the longest
possible matching range early in phase 2, phase 1 has already extracted
the extent refs from every possible matching block, and most of those
refs will never be used.
Fix this by moving the extent ref retrieval from phase 1 into a single
loop in phase 2, and by stopping the loop over matching blocks as soon
as any dedupe range is created. This avoids iterating over a large list
of blocks with expensive `LOGICAL_INO` ioctls in an attempt to improve
the match when there is no hope of improvement, e.g. when all match
ranges are 4K and the content is extremely prevalent in the data.
If we find a matched block that is part of a short matching range,
we can replace it with a block that is part of a long matching range,
because there is a good chance we will find a matching hash block in
the long range by looking up hashes after the end of the short range.
In that case, overlapping dedupe ranges covering both blocks in the
target extent will be inserted into the dedupe list, and the longest
matches will be selected in phase 3. This usually produces a result
similar to that of the old phase 1 loop, but _much_ more efficiently.
Some operations are left in phase 1, but they all use internal
functions rather than ioctls.
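In outline, the new phase 2 loop looks something like this (a heavily simplified sketch; the type and helper names are illustrative):

```cpp
#include <vector>

// Heavily simplified sketch; type and helper names are illustrative.
struct BlockMatch;  // one hash-table hit
struct ExtentRef;   // one extent reference from LOGICAL_INO
struct RangeMatch;  // a grown match, input to phase 3

std::vector<ExtentRef> resolve_extent_refs(const BlockMatch &bm); // the expensive ioctl
void grow_range_matches(const std::vector<ExtentRef> &refs,
                        const BlockMatch &bm,
                        std::vector<RangeMatch> &out);

std::vector<RangeMatch> find_range_matches(const std::vector<BlockMatch> &block_matches)
{
	std::vector<RangeMatch> range_matches;
	for (const auto &bm : block_matches) {
		// Former phase 1 work, now done lazily per block:
		const auto refs = resolve_extent_refs(bm);
		grow_range_matches(refs, bm, range_matches);
		// Stop as soon as any dedupe range exists; iterating
		// further would burn LOGICAL_INO calls with no hope of
		// a better match.
		if (!range_matches.empty()) break;
	}
	return range_matches; // phase 3 sorts and flattens these
}
```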
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
A laundry list of problems fixed:
* Track which physical blocks have been read recently without making
any changes, and don't read them again.
* Separate dedupe, split, and hole-punching operations into distinct
planning and execution phases.
* Keep only the longest dedupe among overlapping dedupe matches, and
flatten them into non-overlapping operations.
* Don't scan extents that have blocks already in the hash table.
We can't (yet) touch such an extent without making unreachable space.
Let them go.
* Give better information in the scan summary visualization: show dedupe
range start and end points (<ddd>), matching blocks (=), copy blocks
(+), zero blocks (0), inserted blocks (.), unresolved match blocks
(M), should-have-been-inserted-but-for-some-reason-wasn't blocks (i),
and there's-a-bug-we-didn't-do-this-one blocks (#).
* Drop cached data from extents that have been inserted into the hash
table without modification.
* Rewrite the hole punching for uncompressed extents, which apparently
hasn't worked properly since the beginning.
Nuisance dedupe elimination:
* Don't do more than 100 dedupe, copy, or hole-punch operations per
extent ref.
* Don't split an extent or punch a hole unless dedupe would save at
least half of the extent ref's size.
* Write a "skip:" summary showing the planned work when nuisance
dedupe elimination decides to skip an extent.
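The two nuisance limits above reduce to checks like these (a sketch; the names and exact accounting are illustrative):

```cpp
#include <cstddef>
#include <cstdint>

// Sketch of the two nuisance-dedupe limits; names are illustrative.
constexpr size_t max_ops_per_ref = 100;

// Limit 1: no more than 100 dedupe, copy, or hole-punch operations
// per extent ref.
bool too_many_ops(size_t planned_ops)
{
	return planned_ops > max_ops_per_ref;
}

// Limit 2: only split an extent or punch a hole when dedupe would
// save at least half of the extent ref's size.
bool split_worthwhile(uint64_t bytes_saved_by_dedupe, uint64_t ref_bytes)
{
	return bytes_saved_by_dedupe * 2 >= ref_bytes;
}
```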
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Add a master switch to turn off the entire MultiLock infrastructure for
testing, without having to remove and add all the individual entry points.
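The switch amounts to one static flag consulted by every entry point (a sketch modeled loosely on MultiLocker; details may differ):

```cpp
// Sketch of a master switch; assumes every entry point consults one
// static flag instead of being individually removed for testing.
class MultiLocker {
	static bool s_enabled; // defaults to true
public:
	static void enabled(bool e) { s_enabled = e; }
	// Lock-acquiring entry points return immediately when s_enabled
	// is false, so all call sites can stay in place.
};
```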
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This prevents the storms of exceptions that occur when a subvol is
deleted. We simply treat the entire tree as if it were empty.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
try_lock allows specification of a different Task to be run instead of
the current Task when the lock is busy.
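The shape of the new entry point is roughly (a sketch modeled loosely on crucible's Exclusion; the real signature may differ):

```cpp
// Sketch; the exact crucible signature may differ.
class Exclusion {
public:
	// If the lock is free, returns a held lock object.
	// If the lock is busy, queues task_when_busy to run when the
	// current holder releases, and returns an empty lock object so
	// the caller can back out instead of blocking.
	ExclusionLock try_lock(const Task &task_when_busy);
};
```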
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Originally the limit was 2730 (64KiB worth of ref pointers). This limit
was a little too low for some common workloads, so it was then raised by
a factor of 256 to 699050, but there are a lot of problems with extent
counts that large. Most of those problems are memory usage and speed
problems, but some of them trigger subtle kernel MM issues.
699050 references is too many to be practical. Set the limit to 9999,
only 3-4x larger than the original 2730, to give up on deduplication
when each deduped ref reduces the amount of space by no more than 0.01%.
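For reference, the arithmetic behind those limits (24 bytes per ref, i.e. three u64 values, is implied by the 64KiB/2730 figure above):

```cpp
#include <cstddef>

// 24 bytes per extent ref (three u64 values):
//   64 KiB / 24 bytes =   2730 refs  (the original limit)
//   16 MiB / 24 bytes = 699050 refs  (the raised limit)
// At 9999 refs, one additional deduped ref reduces space usage by at
// most 1/9999, roughly 0.01% of the extent's size.
static const size_t BEES_MAX_EXTENT_REF_COUNT = 9999;
```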
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This solves a third bad problem with bees reads:
3. The architecture above the read operations will issue read requests
for the same physical blocks over and over in a short period of time.
Fixing that properly requires rewriting the upper-level code, but a
simple small table of recent read requests can reduce the effect of the
problem by orders of magnitude.
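A minimal sketch of such a table, assuming a simple time-stamped map (the window length and all names here are illustrative, and a real implementation would also cap the table's size):

```cpp
#include <chrono>
#include <cstdint>
#include <map>
#include <mutex>

// Sketch: remember recently read physical addresses and suppress repeats.
class RecentReads {
	using clock = std::chrono::steady_clock;
	std::mutex                            m_mutex;
	std::map<uint64_t, clock::time_point> m_seen; // physical addr -> last read
public:
	// Returns true if this physical address was read within the window,
	// in which case the caller skips the read entirely.
	bool seen_recently(uint64_t physical,
	                   std::chrono::seconds window = std::chrono::seconds(60))
	{
		std::unique_lock<std::mutex> lock(m_mutex);
		const auto now = clock::now();
		const auto it = m_seen.find(physical);
		if (it != m_seen.end() && now - it->second < window) return true;
		m_seen[physical] = now;
		return false;
	}
};
```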
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This solves some of the worst problems with bees reads:
1. The kernel readahead doesn't work. More precisely, it's much better
adapted for a very different use case: a single thread alternating
between reading a file sequentially and processing the data that was read.
bees has multiple threads which compete for access to IO and then issue
reads in random order immediately after the call to readahead. The kernel
uses idle ioprio scheduling for the readaheads, so the readaheads get
preempted by the random reads, or the kernel cancels them because the
data access pattern isn't sequential after the readahead was issued.
2. Seeking drives perform terribly with multiple competing readers,
especially with btrfs striped profiles where the iops are broken into
tiny stripe-sized pieces. At one point I intended to read the btrfs
device map and figure out which devices can be read in parallel, but to
make that useful, the user needs to have an array with multiple drives
in single profile, or 4+ drives in raid1 profile. In all other cases,
the elaborate calculations always return the same result: there can be
only one reader at a time.
This commit fixes both problems:
1. Don't use the kernel readahead. Use normal reads into a dummy
buffer instead.
2. Allow only one thread to readahead at any time. Once the read is
completed, the data is in the page cache, and all the random-order small
reads that bees does will hit the page cache, not a spinning disk.
In some cases we need to read two things close together, so add a
`bees_readahead_pair` which holds one lock across both reads.
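The replacement readahead is roughly (a simplified sketch of the approach, not the exact bees code; the buffer size is illustrative):

```cpp
#include <algorithm>
#include <mutex>
#include <vector>
#include <unistd.h>

// Simplified sketch: one readahead at a time, implemented as plain
// reads into a throwaway buffer so the data lands in the page cache.
static std::mutex s_readahead_mutex;

void bees_readahead(int fd, off_t offset, size_t size)
{
	std::unique_lock<std::mutex> lock(s_readahead_mutex);
	std::vector<char> dummy(128 * 1024); // buffer size is illustrative
	while (size > 0) {
		const size_t chunk = std::min(dummy.size(), size);
		const ssize_t rv = pread(fd, dummy.data(), chunk, offset);
		if (rv <= 0) break; // EOF or error: nothing more to warm up
		offset += rv;
		size -= size_t(rv);
	}
}

// bees_readahead_pair does the same for two nearby ranges while holding
// the lock once across both reads (this signature is a sketch).
void bees_readahead_pair(int fd, off_t offset, size_t size,
                         int fd2, off_t offset2, size_t size2);
```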
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The hash table is read sequentially and from a single thread, so
the kernel's implementation of readahead is appropriate here.
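Expressed as code, that choice is just a hint to the kernel (a sketch; `posix_fadvise` is shown here, though the exact call used may differ):

```cpp
#include <fcntl.h>

// Sketch: for a single-threaded sequential reader, the kernel's own
// readahead logic works well; a whole-file sequential hint is enough.
void hint_sequential(int fd)
{
	posix_fadvise(fd, 0, 0, POSIX_FADV_SEQUENTIAL);
}
```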
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Commit c3b664fea54cfd8ac25411cbdb9536e4f24b008e ("context: don't forget
to retry locked extents") removed the critical return that prevents a
Task from processing an extent that is locked.
Put the return back.
Fixes: c3b664fea54cfd8ac25411cbdb9536e4f24b008e ("context: don't forget to retry locked extents")
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The kernel has not required a 16 MiB limit on dedupe requests since
v4.18-rc1 b67287682688 ("Btrfs: dedupe_file_range ioctl: remove 16MiB
restriction").
Kernels before v4.18 would truncate the request and return the size
actually deduped in `bytes_deduped`. Kernel v4.18 and later will loop
in the kernel until the entire request is satisfied (although still
in 16 MiB chunks, so larger extents will be split).
Modify the loop in userspace to measure the size the kernel actually
deduped, instead of assuming the kernel will only accept 16 MiB.
On current kernels this will always loop exactly once.
Since we now rely on `bytes_deduped`, make sure it has a sane value.
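The loop then becomes roughly the following (a sketch using the `FIDEDUPERANGE` interface from `linux/fs.h`; error handling trimmed):

```cpp
#include <linux/fs.h>
#include <sys/ioctl.h>
#include <cstdint>
#include <cstring>
#include <vector>

// Sketch: dedupe [src_off, src_off+len) against dst, advancing by the
// size the kernel reports in bytes_deduped instead of assuming 16 MiB.
int dedupe_range(int src_fd, uint64_t src_off, uint64_t len,
                 int dst_fd, uint64_t dst_off)
{
	std::vector<uint8_t> buf(sizeof(file_dedupe_range) +
	                         sizeof(file_dedupe_range_info));
	auto *fdr = reinterpret_cast<file_dedupe_range *>(buf.data());
	while (len > 0) {
		memset(buf.data(), 0, buf.size());
		fdr->src_offset = src_off;
		fdr->src_length = len;
		fdr->dest_count = 1;
		fdr->info[0].dest_fd = dst_fd;
		fdr->info[0].dest_offset = dst_off;
		if (ioctl(src_fd, FIDEDUPERANGE, fdr) < 0) return -1;
		if (fdr->info[0].status != FILE_DEDUPE_RANGE_SAME) return -1;
		const uint64_t done = fdr->info[0].bytes_deduped;
		if (done == 0) break; // sanity check: avoid an infinite loop
		src_off += done;
		dst_off += done;
		len -= done;
	}
	return 0;
}
```

On kernels before v4.18 this loops once per truncated chunk; on current kernels it loops exactly once.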
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
These were added to crucible all the way back in 2018 (1beb61fb78ba
"crucible: error: record location of exception in what() message"),
but they're even more useful in the stack tracer in bees.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Since we'll never process more than BEES_MAX_EXTENT_REF_COUNT extent
references by definition, it follows that we should not allocate buffer
space for more than that when we perform the LOGICAL_INO ioctl.
There is some evidence (particularly
https://github.com/Zygo/bees/issues/260#issuecomment-1627598058) that
the kernel is subjecting the page cache to a lot of disruption when
trying to allocate large buffers for LOGICAL_INO.
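The buffer math is straightforward (a sketch; the 24 bytes per ref, three u64 values, follows from the 64KiB/2730 figure in the earlier limit commit):

```cpp
#include <linux/btrfs.h>
#include <cstddef>
#include <cstdint>

// Sketch: size the LOGICAL_INO buffer for exactly the refs we can use,
// rather than a large fixed allocation.
const size_t BEES_MAX_EXTENT_REF_COUNT = 9999; // from the earlier commit
const size_t logical_ino_buf_size =
	sizeof(btrfs_data_container) +
	BEES_MAX_EXTENT_REF_COUNT * 3 * sizeof(uint64_t);
```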
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Apparently reinterpret_cast<uint64_t> sign-extends 32-bit pointers.
This is OK when running on a 32-bit kernel that will truncate the pointer
to 32 bits, but when running on a 64-bit kernel, the extra bits are
interpreted as part of the (now very invalid) address.
Use <uintptr_t> instead, which is an unsigned integer type of the same
word size as the arch's pointer type. Ordinary numeric conversion can
take it from there, filling the rest of the word with zeros.
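The fix in miniature:

```cpp
#include <cstdint>

uint64_t ptr_to_u64(const void *p)
{
	// reinterpret_cast<uint64_t>(p) may sign-extend a 32-bit pointer;
	// uintptr_t is unsigned and pointer-sized, so the subsequent
	// widening to 64 bits zero-fills the upper half of the word.
	return reinterpret_cast<uintptr_t>(p);
}
```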
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The bug is:
v6.3-rc6: f349b15e183d mm: vmalloc: avoid warn_alloc noise caused by fatal signal
The fixes are:
v6.4: 95a301eefa82 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
v6.3.10: c189994b5dd3 mm/vmalloc: do not output a spurious warning when huge vmalloc() fails
The bug has been backported to LTS, but the fix has not:
v6.2.11: 61334bc29781 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
v6.1.24: ef6bd8f64ce0 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
v5.15.107: a184df0de132 mm: vmalloc: avoid warn_alloc noise caused by fatal signal
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
There was a bug in kernel 6.3 where LOGICAL_INO with IGNORE_OFFSET
sometimes failed to ignore the offset. That bug is now fixed, but
LOGICAL_INO still returns 0 refs much more often than seems appropriate.
This is most likely because bees frequently deletes extents while there
is still work waiting for them in Task queues. In this case, LOGICAL_INO
correctly returns an empty list, because every reference to the extent
has been deleted, but the new extent tree with that extent removed is
not yet committed in btrfs.
Add a DEBUG-level log message and an event counter to track these events.
In the absence of a kernel bug, the debug message may indicate CPU time
was wasted performing a search whose outcome could have been predicted.
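The instrumentation is roughly the following (a sketch using bees' existing log and counter macros; the counter name and surrounding variables are illustrative):

```cpp
// Sketch; the counter name "resolve_empty" is illustrative.
if (refs.empty()) {
	BEESLOGDEBUG("LOGICAL_INO returned 0 refs at " << to_hex(addr));
	BEESCOUNT(resolve_empty);
}
```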
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Toxic extents are much less of a problem now than they were in kernels
before 5.7. Downgrade the log message level to reflect their lesser
importance.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
The critical kernel bugs in send have been fixed for years.
The limitations that remain aren't bugs, and bees has no sustainable
workaround for them.
Also update copyright year range.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
We check the result of transid_max_nocache(), but not the result of
transid_max(). The latter is a computed result that is even more likely
to be wrong[citation needed].
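A sketch of the added check, using crucible's THROW_CHECK macro family (the exact predicate in bees may differ):

```cpp
// Sketch: sanity-check the computed value the same way as the
// non-cached one.
const auto transid = transid_max();
THROW_CHECK1(runtime_error, transid, transid > 0);
```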
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
At least one user was significantly confused by "designed for large
filesystems".
The btrfs send workarounds aren't new any more.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Clarify that "too large" and "too small" are some distance away from each other.
The Goldilocks zone is _wide_.
The interval between cache drops is now shorter.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>