1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-17 13:25:45 +02:00
Zygo Blaxell f9a697518d btrfs-tree: introduce BtrfsDataExtentTreeFetcher to read data extents without metadata
Binary searches can be extremely slow if the target bytenr is near a
metadata block group, because metadata items are not visible to the
binary search algorithm.  In a non-mixed-bg filesystem, there can be
hundreds of thousands of metadata items between data extent items, and
since the binary search algorithm can't see them, it will run searches
that iterate over hundreds of thousands of objects about a dozen times.

This is less of a problem for mixed-bg filesystems because the data and
metadata blocks are not isolated from each other.  The binary search
algorithm still can't see the metadata items, but there are usually
some data items close by to prevent the linear item filter from running
too long.

Introduce a new fetcher class (all the good names were taken) that tracks
where the end of the current block group is.  When the end of the current
block group is reached in the linear search, skip ahead to a block group
that can contain data items.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-02-06 22:42:15 -05:00
2016-11-17 12:12:15 -05:00
2025-01-11 23:39:55 -05:00

BEES

Best-Effort Extent-Same, a btrfs deduplication agent.

About bees

bees is a block-oriented userspace deduplication agent designed to scale up to large btrfs filesystems. It is an offline dedupe combined with an incremental data scan capability to minimize time data spends on disk from write to dedupe.

Strengths

  • Space-efficient hash table - can use as little as 1 GB hash table per 10 TB unique data (0.1GB/TB)
  • Daemon mode - incrementally dedupes new data as it appears
  • Largest extents first - recover more free space during fixed maintenance windows
  • Works with btrfs compression - dedupe any combination of compressed and uncompressed files
  • Whole-filesystem dedupe - scans data only once, even with snapshots and reflinks
  • Persistent hash table for rapid restart after shutdown
  • Constant hash table size - no increased RAM usage if data set becomes larger
  • Works on live data - no scheduled downtime required
  • Automatic self-throttling - reduces system load
  • btrfs support - recovers more free space from btrfs than naive dedupers

Weaknesses

Installation and Usage

More Information

Bug Reports and Contributions

Email bug reports and patches to Zygo Blaxell bees@furryterror.org.

You can also use Github:

    https://github.com/Zygo/bees

Copyright 2015-2025 Zygo Blaxell bees@furryterror.org.

GPL (version 3 or later).

Description
Best-Effort Extent-Same, a btrfs dedupe agent
Readme 1.7 MiB
Languages
C++ 97%
C 1.6%
Makefile 0.8%
Shell 0.6%