An empty BeesBlockData from the chasing algorithm used to mean that
data was found at the expected location but did not match; however,
there are now other reasons for an empty result, and they occur much
more often. The name is misleading.
Change the name to report more accurately what happens: no data,
without any guess about the reason.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
When both block candidates for dedup are located in the same extent, bees
excludes them from deduplication because the dedup operation would not
free any space (both blocks are still referenced, so neither is deleted).
Candidates in other extents are still considered.
Typically a few blocks are duplicated many thousands or even millions
of times within a filesystem. Many of these blocks appear in the same
extent as each other. When an extent contains an extremely common
duplicate block, that block may appear multiple times within each of
many extents.
bees can get into a loop with a very bad worst-case running time: 32768
blocks per extent * 2560 bees reference limit * 256 distinct hash table
entries = 21.5 *billion* iterations...squared, because this loop happens
every time bees encounters any of the references. Not an infinite
number, but close enough.
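For the record, the worst case multiplies out to:

    32768 blocks/extent * 2560 references * 256 hash entries
      = 21,474,836,480 (about 21.5 billion) iterations per pass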
In each iteration of the loop, replace_dst detects that both the src
and dst blocks are part of the same btrfs extent data item and
therefore should not be deduped; however, this detection occurs after
the block has been allocated and read by chase_extent_ref. The dst is
discarded, but the outer loop tries again with another reference to
the same block and gets the same result.
An easy fix for this problem is to stop the loop immediately when the
same physical extent is found in both src and dst. The condition is
rare enough that the negligible space efficiency loss can be ignored,
and the filesystem scan stops dead if the loop is allowed to proceed.
An exception is thrown from within replace_dst to terminate the loop
at scan_one_extent.
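A minimal sketch of the early exit, using hypothetical type and
exception names in place of the real bees code:

    // Sketch only: hypothetical names; the real bees types differ.
    #include <cstdint>
    #include <stdexcept>

    struct BlockCandidate {
            uint64_t extent_bytenr;  // physical extent holding the block
    };

    // Thrown from replace_dst to unwind the whole candidate loop;
    // caught at scan_one_extent, which moves on to the next extent.
    struct SameExtentAbort : std::runtime_error {
            SameExtentAbort() : std::runtime_error("src and dst share an extent") {}
    };

    void replace_dst(const BlockCandidate &src, const BlockCandidate &dst)
    {
            if (src.extent_bytenr == dst.extent_bytenr) {
                    // Dedup would free no space, and retrying other
                    // references to the same block gets the same
                    // result billions of times over.  Abandon the
                    // whole extent now.
                    throw SameExtentAbort();
            }
            // ...proceed with dedup as before...
    }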
It would be better to determine the extent bytenr of each candidate
extent and filter them out in scan_one_extent (which reduces the number
of LOGICAL_INO calls as a side-effect), but bees has no code capable of
doing extent data tree lookups with backward iteration yet. Even better
would be to change the hash table format so that the extent bytenr can
be decoded directly from the hash table entry (this already exists for
compressed extents). Both of these changes are too large for v0.6.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
No public version of bees ever created old-style compressed hash table
entries. Remove the code that supports them.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Having too many "write a message to the log" primitives is confusing,
and having one that intermittently and silently discards output is even
_more_ confusing.
Replace all BEESINFO with appropriate BEESLOG*s, usually DEBUG, except
for one or two messages that occur far too often; just delete those.
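A before/after sketch of the typical conversion, assuming the
stream-style BEESLOG* macros named above (the message is illustrative):

    // Before: BEESINFO rate-limits and can silently drop output.
    BEESINFO("resolve: addr " << addr << " not found");

    // After: explicit level, never silently discarded.
    BEESLOGDEBUG("resolve: addr " << addr << " not found");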
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit adds log levels to the output. Under systemd the levels
produce colored lines; otherwise each line just gains a numeric
prefix. bees is very chatty, so this paves the way for log level
filtering.
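A minimal standalone sketch of the mechanism, assuming the sd-daemon
convention of a "<N>" severity prefix on each line (journald parses
the prefix to choose the color):

    #include <iostream>
    #include <string>

    // Syslog-style severities: "<0>" (emerg) through "<7>" (debug).
    // systemd-journald colors lines according to this prefix; on a
    // plain terminal it is just a number in angle brackets.
    static void log_line(int level, const std::string &msg)
    {
            std::cerr << "<" << level << ">" << msg << std::endl;
    }

    int main()
    {
            log_line(6, "bees: scanning subvol 257");  // LOG_INFO
            log_line(7, "bees: hash table miss");      // LOG_DEBUG
            return 0;
    }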
Signed-off-by: Kai Krakow <kai@kaishome.de>
The btrfs LOGICAL_INO ioctl has no way to report references to compressed
blocks precisely, so we must always consider all references to a
compressed block, and discard those that do not have the desired offset.
When we encounter compressed shared extents containing a mix of unique
and duplicate data, we attempt to replace all references to the mixed
extent with the same number of references to multiple extents consisting
entirely of unique or duplicate blocks. An early exit from the loop
in BeesResolver::for_each_extent_ref was cutting this operation short
after replacing as few as one shared reference. This left other
shared references to the unique data on the filesystem, effectively
creating new duplicate data.
The failing pattern looks like this:
    dedup: replace 0x14000..0x18000 from some other extent
    copy:  0x10000..0x14000
    dedup: replace 0x10000..0x14000 with the copy
    [may be multiple dedup lines due to multiple shared references]
    copy:  0x18000..0x1c000
    [missing dedup 0x18000..0x1c000 with the copy here]
    scan:  0x10000 [++++dddd++++] 0x1c000
If the extent 0x10000..0x1c000 is shared and compressed, we will make
a copy of the extent at 0x18000..0x1c000. When we try to dedup this
copy extent, LOGICAL_INO will return a mix of references to the data
at logical 0x10000 and 0x18000 (which are both references to the
original shared extent with different offsets). If we break out
of the loop too early, we will stop as soon as a reference to 0x10000
is found, and ignore all other references to the extent we are trying
to remove.
The copy at the beginning of the extent (0x10000..0x14000) usually
works because all references to the extent cover the entire extent.
When bees performs the dedup at 0x14000..0x18000, bees itself creates
the shared references with different offsets.
Uncompressed extents were not affected because LOGICAL_INO can locate
physical blocks precisely if they reside in uncompressed extents.
This change will hurt performance when looking up old physical addresses
that belong to new data, but that is a much less urgent problem.
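A schematic of the fix, using stand-in types for the resolver; the
point is only that the loop must visit every reference instead of
returning after the first successful replacement:

    #include <cstdint>
    #include <functional>
    #include <vector>

    // Sketch only: stand-in for one LOGICAL_INO result.
    struct ExtentRef {
            uint64_t root, inode, offset;
    };

    // Visit every reference.  Stopping after the first successful
    // dedup leaves the remaining shared references behind, which
    // effectively creates new duplicate data.
    void for_each_extent_ref(const std::vector<ExtentRef> &refs,
                             const std::function<void(const ExtentRef &)> &visit)
    {
            for (const auto &ref : refs) {
                    visit(ref);  // no early break on success
            }
    }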
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Previously, the scanner processed each subvol to completion before
moving to the next. This required very large amounts of temporary
disk space, as a full filesystem scan was required before any shared
extents could be deduped. If the hash table RAM was underprovisioned,
some shared dup blocks were removed from the hash table before they
could be deduped.
Currently the scan order takes the first unscanned extent from each
subvol. This works well if--and only if--the subvols are either empty
or children of a common ancestor. It forces the same inode/offset pairs
to be read at close to the same time from each subvol.
When a new snapshot is created, this ordering diverts scanning to the
new subvol until it catches up to the existing subvols. For large
filesystems with frequent snapshot creation, this means that the scanner
never reaches the end of all subvols. Each new subvol effectively
resets the current scan position for the entire filesystem to zero.
This prevents bees from ever completing the first filesystem scan.
Change the order again, so that we now read one unscanned extent from
each subvol in round-robin fashion. When a new subvol is created, we
share scan time between the old and new subvols. This ensures we
eventually finish scanning the initial subvols and enter the
incremental scanning state.
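A schematic of the new order, using a hypothetical cursor structure
(the real per-subvol scan state lives elsewhere in bees):

    #include <cstdint>
    #include <deque>

    // Sketch only: one scan cursor per subvol.
    struct SubvolCursor {
            uint64_t subvol_id = 0;
            uint64_t next_pos  = 0;  // current scan position
            uint64_t end_pos   = 0;  // position where the subvol is done
    };

    // Hypothetical stand-in: scan one extent, advance the cursor.
    static void scan_one_extent_from(SubvolCursor &cur)
    {
            ++cur.next_pos;
    }

    // One extent from each unfinished subvol per round, so a freshly
    // created snapshot shares scanner time with existing subvols
    // instead of monopolizing it until it catches up.
    static void scan_round_robin(std::deque<SubvolCursor> subvols)
    {
            while (!subvols.empty()) {
                    SubvolCursor cur = subvols.front();
                    subvols.pop_front();
                    scan_one_extent_from(cur);
                    if (cur.next_pos < cur.end_pos) {
                            subvols.push_back(cur);  // another turn next round
                    }
            }
    }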
The cost of this change is more repeated reading of shared extents at
scan time with less benefit from disk-device-level caching; however, the
only way to really fix this problem is to implement scanning on tree 2
(the btrfs extent tree) instead of the subvol trees.
Signed-off-by: Zygo Blaxell <bees@furryterror.org>