commit 31b2aa3c0d (Zygo Blaxell)
context: speed up orderly process termination
Quite often bees exceeds its service timeout for termination because
it is waiting for a loop embedded in a Task to finish some long-running
btrfs operation.  This can cause bees to be aborted by SIGKILL before
it can completely flush the hash table or save crawl state.

There are only two important things bees must do when terminating on SIGTERM:
 1.  Save crawl progress
 2.  Flush out the hash table

Everything else is automatically handled by the kernel when the process
is terminated by SIGKILL, so we don't have to bother doing it ourselves.
This can save considerable time at shutdown since we don't have to wait
for every thread to reach a point where it becomes idle, or force loops
to terminate by throwing exceptions, or check a condition every time we
access a pointer.  Instead, we need do only the things in the list
above, and then call _exit() to clean up everything else.
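
A minimal sketch of that fast path (not bees's actual code; the save/flush
helpers and data file names here are stand-ins):

    #include <atomic>
    #include <csignal>
    #include <cstdlib>
    #include <unistd.h>

    static std::atomic<bool> stop_requested{false};

    // Stand-ins for the only two writebacks that must finish before exit.
    static void save_crawl_state() { /* write crawl progress (e.g. beescrawl.dat) */ }
    static void flush_hash_table() { /* write the hash table (e.g. beeshash.dat) */ }

    static void handle_sigterm(int) {
        // Async-signal-safe: just set a flag and act on it in the main loop.
        stop_requested = true;
    }

    int main() {
        std::signal(SIGTERM, handle_sigterm);
        while (!stop_requested) {
            // ... scan and dedupe ...
            sleep(1);
        }
        save_crawl_state();   // 1. save crawl progress
        flush_hash_table();   // 2. flush out the hash table
        _exit(EXIT_SUCCESS);  // kernel reclaims everything else
    }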

Hash table and crawl state writeback can happen in their background
threads instead of the foreground one.  Separate the "stop" method for
these classes into "stop_request" and "stop_wait" so that these writebacks
can run at the same time.
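
A sketch of the split, assuming one background thread per writeback
(illustrative names, not bees's exact interfaces):

    #include <atomic>
    #include <chrono>
    #include <thread>

    class Writeback {
        std::atomic<bool> m_stop{false};
        std::thread m_thread;
    public:
        Writeback() : m_thread([this] {
            while (!m_stop)
                std::this_thread::sleep_for(std::chrono::seconds(1)); // periodic writeback
            // ...final writeback runs here, in the background thread...
        }) {}
        void stop_request() { m_stop = true; }   // returns immediately
        void stop_wait()    { m_thread.join(); } // blocks until writeback completes
    };

    int main() {
        Writeback hash_table, crawl_state;
        // A monolithic stop() would serialize the two final writebacks;
        // requesting both first lets them overlap.
        hash_table.stop_request();
        crawl_state.stop_request();
        hash_table.stop_wait();
        crawl_state.stop_wait();
    }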

Deprecate and remove all references to the BeesHalt exception, and remove
several unnecessary checks for BeesContext::stop_requested.

Pause the task queue instead of cancelling it, which preserves the
crawl progress state and stops new Tasks from competing for iops and
CPU during writeback.
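
The difference can be sketched with a toy queue (hypothetical, not the actual
Task library): pausing hides pending tasks from workers without discarding
them, while cancelling would throw away the state they carry.

    #include <deque>
    #include <functional>
    #include <mutex>
    #include <optional>

    class TaskQueue {
        std::deque<std::function<void()>> m_tasks;
        std::mutex m_mutex;
        bool m_paused = false;
    public:
        void push(std::function<void()> t) {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.push_back(std::move(t));
        }
        void pause() {  // workers stop dequeuing; pending tasks survive
            std::lock_guard<std::mutex> lock(m_mutex);
            m_paused = true;
        }
        void cancel() { // pending tasks (and the state they carry) are lost
            std::lock_guard<std::mutex> lock(m_mutex);
            m_tasks.clear();
        }
        std::optional<std::function<void()>> pop() {
            std::lock_guard<std::mutex> lock(m_mutex);
            if (m_paused || m_tasks.empty())
                return std::nullopt;
            auto t = std::move(m_tasks.front());
            m_tasks.pop_front();
            return t;
        }
    };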

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Date: 2022-12-20 20:50:58 -05:00

BEES

Best-Effort Extent-Same, a btrfs deduplication agent.

About bees

bees is a block-oriented userspace deduplication agent designed for large btrfs filesystems. It combines offline deduplication with incremental data scanning to minimize the time data spends on disk between being written and being deduplicated.

Strengths

  • Space-efficient hash table and matching algorithms - can use as little as a 1 GB hash table per 10 TB of unique data (0.1 GB/TB; see the worked example after this list)
  • Daemon incrementally dedupes new data using btrfs tree search
  • Works with btrfs compression - dedupe any combination of compressed and uncompressed files
  • NEW Works around btrfs send problems with dedupe and incremental parent snapshots
  • Works around btrfs filesystem structure to free more disk space
  • Persistent hash table for rapid restart after shutdown
  • Whole-filesystem dedupe - including snapshots
  • Constant hash table size - no increased RAM usage if data set becomes larger
  • Works on live data - no scheduled downtime required
  • Automatic self-throttling based on system load
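
To make the hash table sizing above concrete: at the minimum ratio of 0.1 GB/TB, 40 TB of unique data needs only about a 4 GB hash table. A smaller table still works; it simply finds fewer duplicate matches.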

Weaknesses

  • Whole-filesystem dedupe - has no include/exclude filters, does not accept file lists
  • Requires root privilege (or CAP_SYS_ADMIN)
  • First run may require temporary disk space for extent reorganization
  • First run may increase metadata space usage if many snapshots exist
  • Constant hash table size - no decreased RAM usage if data set becomes smaller
  • btrfs only

Installation and Usage

More Information

Bug Reports and Contributions

Email bug reports and patches to Zygo Blaxell <bees@furryterror.org>.

You can also use Github:

    https://github.com/Zygo/bees

Copyright 2015-2022 Zygo Blaxell <bees@furryterror.org>.

GPL (version 3 or later).
