Zygo Blaxell 8d08a3c06f readahead: inject some sanity at the foundation of an insane architecture
This solves some of the worst problems with bees reads:

1.  The kernel readahead doesn't work.  More precisely, it's much better
adapted to a very different use case:  a single thread alternating
between reading a file sequentially and processing the data that was
read.  bees has multiple threads which compete for access to IO and then
issue reads in random order immediately after the call to readahead.
The kernel services the readaheads at idle ioprio, so the readaheads get
preempted by the random reads, or cancelled by the kernel because the
data access pattern isn't sequential after the readahead was issued
(this pattern is sketched after the list).

2.  Seeking drives perform terribly with multiple competing readers,
especially with btrfs striped profiles, where the IOPS are broken into
tiny stripe-sized pieces.  At one point I intended to read the btrfs
device map and figure out which devices can be read in parallel, but
for that to be useful, the user needs an array with multiple drives in
the single profile, or 4+ drives in the raid1 profile.  In all other
cases, the elaborate calculations always return the same result:  there
can be only one reader at a time.
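For reference, the pattern problem 1 describes is roughly the stock
readahead(2) hint followed immediately by out-of-order reads.  A
simplified illustration, not the actual bees code:

```cpp
#define _GNU_SOURCE     // readahead(2) is Linux-specific
#include <fcntl.h>

// Hint the kernel, then immediately issue small reads in random order.
// The kernel queues the readahead at idle I/O priority, so the hint
// loses the race against the random reads that follow it, or is
// dropped because the observed access pattern is no longer sequential.
static void kernel_readahead_hint(int fd, off_t offset, size_t size)
{
	readahead(fd, offset, size);    // advisory only; may do nothing
}
```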

This commit fixes both problems:

1.  Don't use the kernel readahead.  Use normal reads into a dummy
buffer instead (see the sketch after this list).

2.  Allow only one thread to readahead at any time.  Once the read is
completed, the data is in the page cache, and all the random-order small
reads that bees does will hit the page cache, not a spinning disk.
In some cases we need to read two things close together, so add a
`bees_readahead_pair` which holds one lock across both reads.
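
A minimal sketch of both fixes, assuming a single global mutex.  The
names `bees_readahead` and `bees_readahead_pair` follow the commit, but
the signatures, chunk size, and error handling here are illustrative
rather than the actual bees code:

```cpp
#include <fcntl.h>
#include <unistd.h>

#include <algorithm>
#include <mutex>
#include <vector>

// One global lock: only one thread may run readahead at any time.
static std::mutex s_readahead_mutex;

// Fix 1: no readahead(2).  Ordinary pread() calls into a throwaway
// buffer populate the page cache, so the random-order small reads
// that follow become cache hits instead of disk seeks.
static void readahead_nolock(int fd, off_t offset, size_t size)
{
	std::vector<char> dummy(128 * 1024);  // arbitrary chunk size
	while (size > 0) {
		const size_t chunk = std::min(size, dummy.size());
		const ssize_t rv = pread(fd, dummy.data(), chunk, offset);
		if (rv <= 0) break;  // EOF or error: stop warming the cache
		offset += rv;
		size -= rv;
	}
}

// Fix 2: serialize all readahead behind the global lock.
void bees_readahead(int fd, off_t offset, size_t size)
{
	const std::lock_guard<std::mutex> lock(s_readahead_mutex);
	readahead_nolock(fd, offset, size);
}

// Two ranges that will be used together are read under a single
// acquisition of the lock, so no other reader can seek between them.
void bees_readahead_pair(int fd, off_t offset, size_t size,
                         int fd2, off_t offset2, size_t size2)
{
	const std::lock_guard<std::mutex> lock(s_readahead_mutex);
	readahead_nolock(fd, offset, size);
	readahead_nolock(fd2, offset2, size2);
}
```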

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00

BEES

Best-Effort Extent-Same, a btrfs deduplication agent.

About bees

bees is a block-oriented userspace deduplication agent designed for large btrfs filesystems. It combines offline dedupe with an incremental data scan capability, minimizing the time data spends on disk between write and dedupe.

Strengths

  • Space-efficient hash table and matching algorithms - can use as little as 1 GB hash table per 10 TB unique data (0.1 GB/TB)
  • Daemon incrementally dedupes new data using btrfs tree search
  • Works with btrfs compression - dedupe any combination of compressed and uncompressed files
  • Works around btrfs filesystem structure to free more disk space
  • Persistent hash table for rapid restart after shutdown
  • Whole-filesystem dedupe - including snapshots
  • Constant hash table size - no increased RAM usage if data set becomes larger
  • Works on live data - no scheduled downtime required
  • Automatic self-throttling based on system load

Weaknesses

  • Whole-filesystem dedupe - has no include/exclude filters, does not accept file lists
  • Requires root privilege (or CAP_SYS_ADMIN)
  • First run may require temporary disk space for extent reorganization
  • First run may increase metadata space usage if many snapshots exist
  • Constant hash table size - no decreased RAM usage if data set becomes smaller
  • btrfs only

Installation and Usage

More Information

Bug Reports and Contributions

Email bug reports and patches to Zygo Blaxell <bees@furryterror.org>.

You can also use Github:

    https://github.com/Zygo/bees

Copyright 2015-2023 Zygo Blaxell <bees@furryterror.org>.

GPL (version 3 or later).
