GGLinnk/bees - bees - Virtual World Git

mirror of https://github.com/Zygo/bees.git synced 2025-07-05 18:12:27 +02:00

Author	SHA1	Message	Date
Zygo Blaxell	1052119a53	log: simplify output for dedup and scan With many threads it is inconvenient to reassemble the elided parts of the dedup src/dst and scan filenames output. Simply output them unconditionally, and balance the line lengths. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-09-16 16:42:52 -04:00
Zygo Blaxell	917fc8c412	context: drop dead code in dedup wrapper This code has been #if 0 for a long time, and it seems unlikely it will ever be useful in the future. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-09-16 16:37:04 -04:00
Zygo Blaxell	59fe9f4617	bees: drop unused BeesWorkQueue classes Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-09-16 16:35:42 -04:00
Zygo Blaxell	cc7b4f22b5	bees: trace calls to BeesResolver This helps identify causes of the "same physical address in dedup" exception. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-06-17 10:09:24 -04:00
Zygo Blaxell	c1dbd30d82	bees: don't limit number of active crawlers All testing so far indicates more crawlers go faster up to a limit much larger than btrfs's performance limitations on subvols, even on spinning rust. Remove the artificial constraint. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-06-17 10:06:16 -04:00
Zygo Blaxell	d43199e3d6	bees: change formatting for physical bytenr ranges in dedup Use a different character to make it easier to search for bytenr ranges in the logs. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-06-17 09:50:59 -04:00
Zygo Blaxell	9daa51edaa	bees: limit FD cache size explicitly This will allow the default size limit for cache objects to be changed with impunity. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-06-17 09:50:59 -04:00
Zygo Blaxell	b004b22e47	Merge branch 'master' into subvol-threads	2017-06-17 08:15:34 -04:00
Zygo Blaxell	dc00dce842	context: purge FD cache every COMMIT_INTERVAL Holding file FDs open for long periods of time delays inode destruction. For very large files this can lead to excessive delays while bees dedups data that will cease to be reachable. Use the same workaround for file FDs (in the root_ino cache) that is used for subvols (in the root cache): forcibly close all cached FDs at regular intervals. The FD cache will reacquire FDs from files that still have existing paths, and will abandon FDs from files that no longer have existing paths. The non-existing-path case is not new (bees has always been able to discover deleted inodes) so it is already handled by existing code. Fixes: https://github.com/Zygo/bees/issues/18 Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-02-08 22:01:00 -05:00
Zygo Blaxell	99fe452101	context: raise limit on the number of concurrent ioctls to cpu_cores/2 This might improve performance on systems with more than 3 CPU cores...or it might bring such a machine to its knees. TODO: find out which of those two things happens. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-01-23 21:18:05 -05:00
Zygo Blaxell	be1aa049c6	context: allow concurrent dedup Dedup was spending a lot of time waiting for the ioctl mutex while it was held by non-dedup ioctls; however, when dedup finally locked the mutex, its average run time was comparatively short and the variance was low. With the various workarounds and kernel fixes in place, FILE_EXTENT_SAME and bees play well enough together that we can allow multiple threads to do dedup at the same time. The extent bytenr lockset should be sufficient to prevent undesirable results (i.e. dup data not removed, or deadlocks on old kernels). Remove the ioctl lock on dedup. LOGICAL_INO and SEARCH_V2 (as used by BeesCrawl) remain under the ioctl mutex because they can still have arbitrarily large run times. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-01-23 21:18:03 -05:00
Zygo Blaxell	e46b96d23c	context: lock extents by bytenr instead of globally prohibiting tmpfiles This prevents two threads from attempting to dispose of the same physical extent at the same time. This is a more precise exclusion than the general lock on all tmpfiles. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-01-23 21:18:03 -05:00
Zygo Blaxell	b22b4ed427	bees: process each subvol in its own thread This is yet another multi-threaded Bees experiment. This time we are dividing the work by subvol: one thread is created to process each subvol in the filesystem. There is no change in behavior on filesystems containing only one subvol. In order to avoid or mitigate the impact of kernel bugs and performance issues, the btrfs ioctls FILE_EXTENT_SAME, SEARCH_V2, and LOGICAL_INO are serialized. Only one thread may execute any of these ioctls at any time. All three ioctls share a single lock. In order to simplify the implementation, only one thread is permitted to create a temporary file during one call to scan_one_extent. This prevents multiple threads from racing to replace the same physical extent with separate physical copies. The single "crawl" thread is replaced by one "crawl_<root_number>" for each subvol. The crawl size is reduced from 4096 items to 1024. This reduces the memory requirement per subvol and keeps the data in memory fresher. It also increases the number of log messages, so turn some of them off. TODO: Currently there is no configurable limit on the total number of threads. The number of CPUs is used as an upper bound on the number of active threads; however, we still have one thread per subvol even if all most of the threads do is wait for locks. TODO: Some of the single-threaded code is left behind until I make up my mind about whether this experiment is successful. Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-01-23 21:17:54 -05:00
Zygo Blaxell	db8ea92133	bees: fix further instances of copy-after-unlock bug Before: unique_lock<mutex> lock(some_mutex); // run lock.~unique_lock() because return // return reference to unprotected heap return foo[bar]; After: unique_lock<mutex> lock(some_mutex); // make copy of object on heap protected by mutex lock auto tmp_copy = foo[bar]; // run lock.~unique_lock() because return // pass locally allocated object to copy constructor return tmp_copy; Signed-off-by: Zygo Blaxell <bees@furryterror.org>	2017-01-22 22:00:27 -05:00
Zygo Blaxell	eec80944cd	roots: add a counter for crawl_ms, open_root and open_root_ino Linux kernel commit 7f8e406 ("btrfs: improve delayed refs iterations") seems to dramatically improve LOGICAL_INO performance. Hopefully this commit will find its way into mainline Linux soon. This means that most of the time in Bees is now spent on block reading (50-75%); however, there is still a big gap between block read and the sum of everything else we are measuring with the "*_ms" counters. This gap is about 30% of the run time, so it would be good to find out what's in the gap. Add ms counters around the crawl and open calls to capture where we are spending all the time.	2016-12-08 23:55:39 -05:00
Zygo Blaxell	642581e89a	hash: remove the experimental shared hash-table and shared mmap features The experiments are over, and the results were not a success. Having two filesystems cohabiting in the same hash table results in a lot of false positives, each of which requires some heavy IO to resolve. Using MAP_SHARED to share a beeshash.dat between processes results in catastrophically bad performance. These features were abandoned long ago, but some of the code--and even worse, its documentation--still remains. Bees wants a hash table false positive rate below 0.1%. With a shared hash table the FP rate is about the same as the dedup rate. Typically duplicate files on one filesystem are duplicate on many filesystems. One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation produce extremely poor performance results. A five-order-of-magnitude speedup was achieved by implementing paging in userspace with worker threads. We no longer need the support code for the MAP_SHARED case. It is still possible to run many BeesContexts in a single process, but now the only thing contexts share is the FD cache.	2016-12-02 00:26:02 -05:00
Zygo Blaxell	fdfa78a81b	context: default and relative BEESHOME Allow relative paths with BEESHOME. These paths will be relative to the root of the dedup target filesystem. BEESHOME is now optional. If not specified, '.beeshome' is used. We don't try to create BEESHOME if it doesn't exist. BEESHOME might not be on a btrfs filesystem, so we can't insist it be a subvol.	2016-12-02 00:22:18 -05:00
Zygo Blaxell	1303fb9da8	build: fix FTBFS on GCC 6.2 I'm not surprised that GCC 6 doesn't let me send an ostream ref to itself, even inside an uninstantiated template specialization. I am a little surprised I was trying to, and 4.9 let me get away with it. It's 2016. auto_ptr is deprecated now. Some things were including vector that don't any more. https://github.com/Zygo/bees/issues/1	2016-11-24 22:20:11 -05:00
Zygo Blaxell	cca0ee26a8	bees: remove local cruft, throw at github	2016-11-17 12:12:13 -05:00
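
Commit db8ea92133 in the log above describes a "copy-after-unlock" pattern and its fix. The following is a minimal C++ sketch of that pattern, assuming an accessor that hands out a reference to data guarded by a mutex. `NameCache`, `get_unsafe`, and `get` are hypothetical names chosen for illustration; they are not taken from the bees source, and the log does not show the actual functions that were changed.

```cpp
#include <map>
#include <mutex>
#include <string>

// Hypothetical class for illustration only; not part of bees.
class NameCache {
    std::mutex m_mutex;
    std::map<int, std::string> m_names;

public:
    void set(int key, const std::string &value) {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_names[key] = value;
    }

    // "Before" shape: the lock is released when the function returns,
    // but the caller is left holding a reference into the map. Any later
    // read through that reference happens without the mutex held.
    const std::string &get_unsafe(int key) {
        std::unique_lock<std::mutex> lock(m_mutex);
        return m_names[key];
    }

    // "After" shape: copy the element while the lock is still held,
    // then return the local copy by value.
    std::string get(int key) {
        std::unique_lock<std::mutex> lock(m_mutex);
        auto tmp_copy = m_names[key];
        return tmp_copy;
    }
};
```

With the second accessor, a value obtained from `get` stays unchanged even if another thread calls `set` on the same key afterwards, because the caller owns an independent copy.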

19 Commits
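
Commit e46b96d23c in the log above replaces a global tmpfile lock with per-extent locking keyed by bytenr, so that two threads cannot work on the same physical extent at once. A rough C++ sketch of such a keyed lock set follows. `BytenrLockSet` and `BytenrLock` are hypothetical names, and this is only one plausible way to build the idea; the log does not show how bees implements its lockset.

```cpp
#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <set>

// Hypothetical keyed lock set for illustration; not the bees implementation.
class BytenrLockSet {
    std::mutex m_mutex;
    std::condition_variable m_cv;
    std::set<std::uint64_t> m_held;   // bytenrs currently locked

public:
    // Block until no other thread holds this bytenr, then take it.
    void lock(std::uint64_t bytenr) {
        std::unique_lock<std::mutex> guard(m_mutex);
        m_cv.wait(guard, [&] { return m_held.count(bytenr) == 0; });
        m_held.insert(bytenr);
    }

    // Take the bytenr only if it is free; return whether we got it.
    bool try_lock(std::uint64_t bytenr) {
        std::unique_lock<std::mutex> guard(m_mutex);
        return m_held.insert(bytenr).second;
    }

    // Release the bytenr and wake any waiters.
    void unlock(std::uint64_t bytenr) {
        {
            std::unique_lock<std::mutex> guard(m_mutex);
            m_held.erase(bytenr);
        }
        m_cv.notify_all();
    }
};

// RAII holder: locks one bytenr for the lifetime of the object.
class BytenrLock {
    BytenrLockSet &m_set;
    std::uint64_t m_bytenr;

public:
    BytenrLock(BytenrLockSet &set, std::uint64_t bytenr)
        : m_set(set), m_bytenr(bytenr) { m_set.lock(m_bytenr); }
    ~BytenrLock() { m_set.unlock(m_bytenr); }
    BytenrLock(const BytenrLock &) = delete;
    BytenrLock &operator=(const BytenrLock &) = delete;
};
```

Threads working on different bytenrs never wait on each other here, which is the "more precise exclusion" the commit message contrasts with a single lock on all tmpfiles.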