bees Configuration
==================

The only configuration parameter that *must* be provided is the hash
table size. Other parameters are optional or hardcoded, and the defaults
are reasonable in most cases.

Hash Table Sizing
-----------------

Hash table entries are 16 bytes per data block. The hash table stores
the most recently read unique hashes. Once the hash table is full,
each new entry in the table evicts an old entry.

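To make the 16-byte-per-block unit and the eviction behaviour concrete,
here is a toy model in Python (an illustrative sketch only, not bees'
actual on-disk format or eviction policy):

```python
from collections import OrderedDict

class ToyHashTable:
    """A fixed-capacity table: once full, each new hash evicts an old one."""

    ENTRY_BYTES = 16  # one 16-byte entry per data block, as described above

    def __init__(self, size_bytes):
        self.capacity = size_bytes // self.ENTRY_BYTES
        self.entries = OrderedDict()  # block hash -> block address

    def insert(self, block_hash, block_addr):
        if block_hash in self.entries:
            self.entries.move_to_end(block_hash)  # refresh a hash seen again
        elif len(self.entries) >= self.capacity:
            self.entries.popitem(last=False)      # evict the oldest entry
        self.entries[block_hash] = block_addr

    def lookup(self, block_hash):
        return self.entries.get(block_hash)       # None if evicted or never seen
```
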
Here are some numbers to estimate appropriate hash table sizes:

unique data size | hash table size | average dedupe extent size
-----------------|-----------------|---------------------------
1TB              | 4GB             | 4K
1TB              | 1GB             | 16K
1TB              | 256MB           | 64K
1TB              | 128MB           | 128K <- recommended
1TB              | 16MB            | 1024K
64TB             | 1GB             | 1024K

Notes:

* If the hash table is too large, no extra dedupe efficiency is
obtained, and the extra space just wastes RAM. Extra space can also slow
bees down by preventing old data from being evicted, so bees wastes time
looking for matching data that is no longer present on the filesystem.

* If the hash table is too small, bees extrapolates from matching
blocks to find matching adjacent blocks in the filesystem that have been
evicted from the hash table. In other words, bees only needs to find
one block in common between two extents in order to be able to dedupe
the entire extents. This provides significantly more dedupe hit rate
per hash table byte than other dedupe tools.

* When counting unique data in compressed data blocks to estimate
optimum hash table size, count the *uncompressed* size of the data.

* Another way to approach the hash table size is to simply decide how much
RAM can be spared without too much discomfort, give bees that amount of
RAM, and accept whatever dedupe hit rate occurs as a result. bees will
do the best job it can with the RAM it is given.

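The arithmetic behind the table above is straightforward: with 16 bytes
per entry, and roughly one surviving entry needed per dedupe extent, the
hash table size is `unique data size / average dedupe extent size * 16`.
A quick sketch of that estimate (rounded power-of-two units assumed):

```python
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

def hash_table_bytes(unique_data_bytes, avg_dedupe_extent_bytes, entry_bytes=16):
    """Roughly one surviving 16-byte entry is needed per dedupe extent."""
    return unique_data_bytes // avg_dedupe_extent_bytes * entry_bytes

# Reproduce a few rows of the table above:
print(hash_table_bytes(1 * TiB, 4 * KiB) // GiB)     # 4 (GiB)
print(hash_table_bytes(1 * TiB, 128 * KiB) // MiB)   # 128 (MiB) <- recommended
print(hash_table_bytes(64 * TiB, 1024 * KiB) // GiB) # 1 (GiB)
```
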
Factors affecting optimal hash table size
-----------------------------------------

It is difficult to predict the net effect of data layout and access
patterns on dedupe effectiveness without performing deep inspection of
both the filesystem data and its structure--a task that is as expensive
as performing the deduplication.

* **Compression** on the filesystem reduces the average extent length
compared to uncompressed filesystems. The maximum compressed extent
length on btrfs is 128KB, while the maximum uncompressed extent length
is 128MB. Longer extents decrease the optimum hash table size while
shorter extents increase the optimum hash table size, because the
probability of a hash table entry being present (i.e. unevicted) in
each extent is proportional to the extent length.

  As a rule of thumb, the optimal hash table size for a compressed
  filesystem is 2-4x larger than the optimal hash table size for the
  same data on an uncompressed filesystem. Dedupe efficiency falls
  dramatically with hash tables smaller than 128MB/TB, because the
  corresponding average dedupe extent size becomes larger than the
  largest possible compressed extent (128KB). A rough worked comparison
  follows this list.

* **Short writes** also shorten the average extent length and increase
optimum hash table size. If a database writes to files randomly using
4K page writes, all of these extents will be 4K in length, and the hash
table size must be increased to retain each one (or the user must accept
a lower dedupe hit rate).

  Defragmenting files that have had many short writes increases the
  extent length and therefore reduces the optimum hash table size.

* **Time between duplicate writes** also affects the optimum hash table
size. bees reads data blocks in logical order during its first pass,
and after that new data blocks are read incrementally a few seconds or
minutes after they are written. bees finds more matching blocks if there
is a smaller amount of data between the matching reads, i.e. there are
fewer blocks evicted from the hash table. If most identical writes to
the filesystem occur near the same time, the optimum hash table size is
smaller. If most identical writes occur over longer intervals of time,
the optimum hash table size must be larger to avoid evicting hashes from
the table before matches are found.

  For example, a build server normally writes out very similar source
  code files over and over, so it will need a smaller hash table than a
  backup server which has to refer to the oldest data on the filesystem
  every time a new client machine's data is added to the server.

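To get a feel for how these factors move the numbers, the rough estimate
from the sizing sketch above can be evaluated with different average
extent lengths (the specific averages below are illustrative assumptions,
not measurements of any real filesystem):

```python
KiB, MiB, GiB, TiB = 2**10, 2**20, 2**30, 2**40

def hash_table_bytes(unique_data_bytes, avg_dedupe_extent_bytes, entry_bytes=16):
    # Same rough estimate as in the sizing sketch above.
    return unique_data_bytes // avg_dedupe_extent_bytes * entry_bytes

# 1 TiB of unique data with three hypothetical average extent lengths:
large_extents = hash_table_bytes(1 * TiB, 128 * KiB)  # mostly long, uncompressed extents
compressed    = hash_table_bytes(1 * TiB, 32 * KiB)   # extents capped by the 128KB compression limit
short_writes  = hash_table_bytes(1 * TiB, 4 * KiB)    # database doing random 4K page writes

print(large_extents // MiB)  # 128 (MiB)
print(compressed // MiB)     # 512 (MiB), 4x larger, within the 2-4x rule of thumb
print(short_writes // GiB)   # 4 (GiB)
```
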
Scanning modes for multiple subvols
-----------------------------------

The `--scan-mode` option affects how bees schedules worker threads
between subvolumes. Scan modes are an experimental feature and will
likely be deprecated in favor of a better solution.

Scan mode can be changed at any time by restarting bees with a different
mode option. Scan state tracking is the same for all of the currently
implemented modes. The difference between the modes is the order in
which subvols are selected.

If a filesystem has only one subvolume with data in it, then the
`--scan-mode` option has no effect. In this case, there is only one
subvolume to scan, so worker threads will all scan that one.

Within a subvol, there is a single optimal scan order: files are scanned
in ascending numerical inode order. Each worker will scan a different
inode to avoid having the threads contend with each other for locks.
File data is read sequentially and in order, but old blocks from earlier
scans are skipped.

Between subvols, there are several scheduling algorithms with different
trade-offs:

Scan mode 0, "lockstep", scans the same inode number in each subvol at
close to the same time. This is useful if the subvols are snapshots
with a common ancestor, since the same inode number in each subvol will
have similar or identical contents. This maximizes the likelihood
that all of the references to a snapshot of a file are scanned at
close to the same time, improving dedupe hit rate and possibly taking
advantage of VFS caching in the Linux kernel. If the subvols are
unrelated (i.e. not snapshots of a single subvol) then this mode does
not provide significant benefit over random selection. This mode uses
smaller amounts of temporary space for shorter periods of time when most
subvols are snapshots. When a new snapshot is created, this mode will
stop scanning other subvols and scan the new snapshot until the same
inode number is reached in each subvol, which will effectively stop
dedupe temporarily as this data has already been scanned and deduped
in the other snapshots.

Scan mode 1, "independent", scans the next inode with new data in each
subvol. Each subvol's scanner shares inodes uniformly with all other
subvol scanners until the subvol has no new inodes left. This mode makes
continuous forward progress across the filesystem and provides average
performance across a variety of workloads, but is slow to respond to new
data, and may spend a lot of time deduping short-lived subvols that will
soon be deleted when it is preferable to dedupe long-lived subvols that
will be the origin of future snapshots. When a new snapshot is created,
previous subvol scans continue as before, but the time is now divided
among one more subvol.

Scan mode 2, "sequential", scans one subvol at a time, in numerical subvol
ID order, processing each subvol completely before proceeding to the
next subvol. This avoids spending time scanning short-lived snapshots
that will be deleted before they can be fully deduped (e.g. those used
for `btrfs send`). Scanning is concentrated on older subvols that are
more likely to be origin subvols for future snapshots, eliminating the
need to dedupe future snapshots separately. This mode uses the largest
amount of temporary space for the longest time, and typically requires
a larger hash table to maintain dedupe hit rate.

Scan mode 3, "recent", scans the subvols with the highest `min_transid`
value first (i.e. the ones that were most recently completely scanned),
then falls back to "independent" mode to break ties. This interrupts
long scans of old subvols to give a rapid dedupe response to new data,
then returns to the old subvols after the new data is scanned. It is
useful for large filesystems with multiple active subvols and rotating
snapshots, where the first-pass scan can take months, but new duplicate
data appears every day.

The default scan mode is 1, "independent".

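The difference between the modes is easiest to see as the order in which
(subvol, inode) pairs are visited. The following toy simulation sketches
that ordering only; the subvol IDs and inode lists are made up, and it
ignores new-snapshot handling, `min_transid` tracking, and therefore
mode 3 entirely:

```python
from itertools import zip_longest

# Made-up example: subvol ID -> inodes with new data, in ascending order.
subvols = {256: [1, 2, 3, 4], 257: [1, 2, 3, 4], 260: [7, 8]}

def lockstep(subvols):
    """Mode 0: visit the same inode number in every subvol at about the same time."""
    for ino in sorted(set().union(*subvols.values())):
        for subvol in sorted(subvols):
            if ino in subvols[subvol]:
                yield subvol, ino

def independent(subvols):
    """Mode 1: each subvol advances to its own next inode, shared uniformly."""
    ids = sorted(subvols)
    for batch in zip_longest(*(subvols[i] for i in ids)):
        for subvol, ino in zip(ids, batch):
            if ino is not None:
                yield subvol, ino

def sequential(subvols):
    """Mode 2: finish each subvol completely, in ascending subvol ID order."""
    for subvol in sorted(subvols):
        for ino in subvols[subvol]:
            yield subvol, ino

for name, mode in [("lockstep", lockstep), ("independent", independent), ("sequential", sequential)]:
    print(name, list(mode(subvols))[:6])
# lockstep    [(256, 1), (257, 1), (256, 2), (257, 2), (256, 3), (257, 3)]
# independent [(256, 1), (257, 1), (260, 7), (256, 2), (257, 2), (260, 8)]
# sequential  [(256, 1), (256, 2), (256, 3), (256, 4), (257, 1), (257, 2)]
```
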
If you are using bees for the first time on a filesystem with many
existing snapshots, you should read about [snapshot gotchas](gotchas.md).

Threads and load management
---------------------------

By default, bees creates one worker thread for each CPU detected.
These threads then perform scanning and dedupe operations. The number of
worker threads can be set with the [`--thread-count` and `--thread-factor`
options](options.md).

If desired, bees can automatically increase or decrease the number of
worker threads in response to system load. This reduces the impact on
the rest of the system by pausing bees while other CPU- and IO-intensive
loads are active, and resuming it when those loads become inactive.
This is configured with the [`--loadavg-target` and
`--thread-min` options](options.md).

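The load-management behaviour can be pictured roughly as follows (a
conceptual sketch only, not bees' actual algorithm; the threshold and
thread counts are hypothetical example values):

```python
import os

def adjust_worker_count(current, loadavg_target, thread_min, thread_max):
    """Shrink the worker pool while the 1-minute load average is above the
    target, and grow it back toward the maximum when the load drops."""
    load_1min = os.getloadavg()[0]
    if load_1min > loadavg_target and current > thread_min:
        return current - 1   # back off: other workloads need the CPU and IO
    if load_1min < loadavg_target and current < thread_max:
        return current + 1   # load is low again: allow more scanning threads
    return current

# Hypothetical example: 8 CPUs, --loadavg-target 5, --thread-min 1
workers = 8
workers = adjust_worker_count(workers, loadavg_target=5.0, thread_min=1, thread_max=8)
```
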
Log verbosity
-------------

bees can be made less chatty with the [`--verbose` option](options.md).