mirror of https://github.com/Zygo/bees.git synced 2025-08-23 22:42:20 +02:00


Zygo Blaxell 7f660f50b8 lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks

The Linux kernel's btrfs headers are better than the libbtrfs-dev headers:

	- the libbtrfs-dev headers have C++ language compatibility issues

	- the upstream version in the Linux kernel is more accurate and up to date

	- macros in libbtrfs-dev's ctree.h hide information that would
	enable bees to perform runtime buffer length checking

	- the libbtrfs-dev headers define enum types whose presence cannot
	be detected with #ifdef

When accessing members of metadata items from the filesystem, we want
to verify that the member we are accessing is within the boundaries of
the item that was retrieved; otherwise, a memory access violation may
occur or garbage may be returned to the caller.  A simple C++ template,
given a pointer to a structure member and a buffer, can determine whether
the buffer contains enough bytes to safely access that struct member.
This was implemented back in 2016, but left unused due to ctree.h issues.
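The member-bounds check described above can be sketched as follows. This is an illustration, not the bees implementation: `member_offset`, `get_member_checked`, and `demo_item` are hypothetical names, and the sketch assumes a trivial, default-constructible structure (as on-disk metadata structures are). The member's offset is derived from the pointer-to-member, and the access is refused unless every byte of that member lies inside the retrieved item.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <vector>

// Byte offset of member `m` within T, measured on a value-initialized
// dummy object (avoids the undefined null-pointer offsetof trick).
template <class T, class M>
size_t member_offset(M T::*m)
{
	T dummy{};
	return static_cast<size_t>(reinterpret_cast<const char *>(&(dummy.*m)) -
	                           reinterpret_cast<const char *>(&dummy));
}

// Copy member `m` out of a T stored at the start of `item`, but only if
// the whole member fits inside the bytes actually retrieved.
template <class T, class M>
M get_member_checked(M T::*m, const std::vector<uint8_t> &item)
{
	const size_t end = member_offset(m) + sizeof(M);
	if (end > item.size()) {
		throw std::out_of_range("member extends past end of item");
	}
	M value;
	// memcpy makes no alignment assumptions about the buffer.
	std::memcpy(&value, item.data() + member_offset(m), sizeof(M));
	return value;
}

// Hypothetical 16-byte item layout, for illustration only.
struct demo_item {
	uint64_t generation; // offset 0
	uint32_t flags;      // offset 8
	uint32_t size;       // offset 12
};
```

With a 12-byte item, `flags` (bytes 8..11) is readable, while `size` (bytes 12..15) is rejected instead of being read from past the end of the buffer.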

Some btrfs metadata structures have variable length despite using a
fixed-size in-memory structure.  The members that appear earliest in
the structure contain information about which following members of the
structure are used.  The item stored in the filesystem is truncated after
the last used member, and all following members must not be accessed.

'btrfs_stack_*' accessor macros obscure the memory boundaries of the
members they access, which makes it impossible for a C++ template to
verify the memory access.  If the template checks the length of the
entire structure, it will find an access violation for variable-length
metadata items because the item is rarely large enough for the entire
structure.
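The difference between a whole-structure check and a per-member check on a variable-length item can be sketched with a made-up layout. `demo_varlen`, its versioning rule, and the helper functions below are hypothetical and do not describe any real btrfs structure; they only model "leading members say which later members are used, and the stored item is truncated after the last used one."

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical variable-length item: `version` decides which later
// members are present; the stored item ends after the last used member.
struct demo_varlen {
	uint64_t version;  // 1 or 2
	uint64_t common;   // present in every version
	uint64_t v2_extra; // present only when version >= 2
};

// Stored length for each version under the hypothetical rule above.
size_t required_len(uint64_t version)
{
	return version >= 2 ? sizeof(demo_varlen) : offsetof(demo_varlen, v2_extra);
}

// Naive check against the entire in-memory structure: rejects perfectly
// valid version-1 items, which are shorter than sizeof(demo_varlen).
bool whole_struct_fits(size_t item_len)
{
	return item_len >= sizeof(demo_varlen);
}

// Per-member check: the member is accessible if its last byte is in the item.
bool member_fits(size_t member_offset, size_t member_size, size_t item_len)
{
	return member_offset + member_size <= item_len;
}
```

For a 16-byte version-1 item, the whole-structure check reports a violation even though `common` is safely readable; only the per-member check distinguishes `common` (valid) from `v2_extra` (absent).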

Get rid of all the libbtrfs-dev accessor macros and reimplement them
with the necessary buffer length checks.
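A replacement for a `btrfs_stack_*` style accessor can be sketched as below. The libbtrfs-dev stack accessors convert little-endian on-disk values to host byte order; this sketch does the same byte by byte, but checks the field's bounds against the retrieved item first. `get_le_checked` and `demo_item_generation` are hypothetical, and the field position (a 64-bit value at byte offset 8) is an assumption chosen for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <vector>

// Decode an unsigned little-endian integer of type U at `offset` in
// `item`, after verifying all sizeof(U) bytes lie inside the item.
template <class U>
U get_le_checked(const std::vector<uint8_t> &item, size_t offset)
{
	static_assert(std::is_unsigned<U>::value, "U must be an unsigned integer type");
	if (offset > item.size() || item.size() - offset < sizeof(U)) {
		throw std::out_of_range("field is outside the item");
	}
	U value = 0;
	for (size_t i = 0; i < sizeof(U); ++i) {
		// Byte i of a little-endian value contributes bits 8*i..8*i+7.
		value |= static_cast<U>(static_cast<U>(item[offset + i]) << (8 * i));
	}
	return value;
}

// Hypothetical accessor: a 64-bit field assumed to sit at byte offset 8.
uint64_t demo_item_generation(const std::vector<uint8_t> &item)
{
	return get_le_checked<uint64_t>(item, 8);
}
```

Decoding byte by byte keeps the result independent of host endianness and alignment; an item too short to contain the field raises an exception instead of returning garbage.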

Signed-off-by: Zygo Blaxell <bees@furryterror.org>

2021-02-22 20:06:43 -05:00

bin

bees: remove local cruft, throw at github

2016-11-17 12:12:13 -05:00

docs

stats: remove nonsense dedup_unique_bytes stat

2020-12-17 17:54:51 -05:00

include/crucible

lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks

2021-02-22 20:06:43 -05:00

lib

lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks

2021-02-22 20:06:43 -05:00

scripts

systemd service replace deprecated parameters

2018-11-05 12:35:17 -08:00

src

lib: fs: stop using libbtrfs-dev helper functions to re-enable buffer length checks

2021-02-22 20:06:43 -05:00

test

build: include localconf everywhere

2021-02-22 20:06:43 -05:00

.gitignore

Compilation: Let the code know about package config

2018-09-08 02:59:54 +02:00

COPYING

GPL-3: license it

2016-11-17 12:12:15 -05:00

Defines.mk

Makefile: create a template compiler

2018-09-08 02:59:54 +02:00

Makefile

build: make libcrucible a static library

2018-12-09 23:39:44 -05:00

makeflags

build: make libcrucible a static library

2018-12-09 23:39:44 -05:00

README.md

README: highlight DATA CORRUPTION WARNING

2019-06-12 22:48:05 -04:00

README.md

BEES

Best-Effort Extent-Same, a btrfs deduplication agent.

About bees

bees is a block-oriented userspace deduplication agent designed for large btrfs filesystems. It combines offline dedupe with an incremental data scan capability to minimize the time data spends on disk between write and dedupe.

Strengths

Space-efficient hash table and matching algorithms - can use as little as 1 GB of hash table per 10 TB of unique data (0.1 GB/TB)
Daemon incrementally dedupes new data using btrfs tree search
Works with btrfs compression - dedupe any combination of compressed and uncompressed files
NEW Works around btrfs send problems with dedupe and incremental parent snapshots
Works around btrfs filesystem structure to free more disk space
Persistent hash table for rapid restart after shutdown
Whole-filesystem dedupe - including snapshots
Constant hash table size - no increased RAM usage if data set becomes larger
Works on live data - no scheduled downtime required
Automatic self-throttling based on system load
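The space-efficiency figure in the list above works out to 1 byte of hash table per 10,000 bytes of unique data. The sketch below assumes decimal units (1 GB = 10^9 bytes, 1 TB = 10^12 bytes) and treats the figure as the best case the README describes ("as little as"), so the result is the smallest table that ratio implies, not a sizing recommendation. `min_hash_table_bytes` is a hypothetical helper, not part of bees.

```cpp
#include <cassert>
#include <cstdint>

// Best-case ratio from the README: 1 GB of hash table per 10 TB of unique
// data. With decimal units (an assumption here) that is 1:10000.
constexpr uint64_t unique_bytes_per_hash_byte = 10000;

// Smallest hash table size (bytes) implied by that ratio, rounded up.
uint64_t min_hash_table_bytes(uint64_t unique_data_bytes)
{
	return (unique_data_bytes + unique_bytes_per_hash_byte - 1) / unique_bytes_per_hash_byte;
}
```

For example, 4 TB of unique data corresponds to a 400 MB hash table at this ratio. Because the hash table size is constant, the table does not grow or shrink on its own as the data set changes.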

Weaknesses

Whole-filesystem dedupe - has no include/exclude filters, does not accept file lists
Requires root privilege (or CAP_SYS_ADMIN)
First run may require temporary disk space for extent reorganization
First run may increase metadata space usage if many snapshots exist
Constant hash table size - no decreased RAM usage if data set becomes smaller
btrfs only

Installation and Usage

More Information

Bug Reports and Contributions

Email bug reports and patches to Zygo Blaxell <bees@furryterror.org>.

You can also use Github:

    https://github.com/Zygo/bees

Copyright & License

GPL (version 3 or later).
