1
0
mirror of https://github.com/Zygo/bees.git synced 2025-05-17 13:25:45 +02:00

566 Commits

Author SHA1 Message Date
Zygo Blaxell
38fffa8e27 lib: add a version string
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-18 22:17:02 -05:00
Zygo Blaxell
50417e961f crucible: rework the Resource class
Get rid of the ResourceHolder class.

Fix GCC static template member instantiation issues.

Replace assert() with exceptions.

shared_ptr can't seem to do reference counting in a multi-threaded
environment.  The code looks correct (for both ResourceHandle and
std::shared_ptr); however, continual segfaults don't lie.

Carpet-bomb with mutex locks to reduce the likelihood of losing shared_ptr
races.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-18 22:09:18 -05:00
Zygo Blaxell
6cc9b267ef crucible: time: fix uninitialized member
Found by valgrind.  It was mostly harmless because the range of
usable values is limited by m_burst (which was initialized) and 0.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-16 22:02:14 -05:00
Zygo Blaxell
9f120e326b bees: fix deadlock in thread status reporting
"s_name" was a thread_local variable, not static, and did not require a
mutex to protect access.  A deadlock is possible if a thread triggers an
exception with a handler that attempts to log a message (as the top-level
exception handler in bees does).

Remove multiple unnecessary mutex locks.  Rename the thread_local variables
to make their scope clearer.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-15 01:55:34 -05:00
Zygo Blaxell
382f8bf06a hash: prevent eleventy-gigabyte core dumps
Add MADV_DONTDUMP to the list of advice flags.

There are now three flags which may or may not be supported by the
target kernel.  Try each one and log its success or failure separately.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-12 22:55:08 -05:00
Zygo Blaxell
5e91529ad2 hash: remove the unused m_prefetch_rate_limit
The hash table statistics calculation in BeesHashTable::prefetch_loop
and the data-driven operation of the extent scanner always pulls the
hash table into RAM as fast as the disk will push the data.  We never
use the prefetch rate limit, so remove it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-11 21:15:12 -05:00
Zygo Blaxell
bddc07bd28 hash: make thread status message more consistent
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-11 21:15:12 -05:00
Zygo Blaxell
ffe2a767d3 crucible: extentwalker: add compressed() and bytenr() methods
Also use C++11 syntax for construction.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-11 21:15:11 -05:00
Zygo Blaxell
845267821c main: count arguments correctly
Replace one braindead mistake for another.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-10 01:10:38 -05:00
Zygo Blaxell
3138002a1f main: ArgList would silently drop the first argument
This fixes a bug where bees tries to process itself as a btrfs filesystem.
This is a species of bug that I only notice *after* pushing to a public
git repo.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 23:42:02 -05:00
Zygo Blaxell
4a57c5f499 README: update copyright year, remove some obsolete statements 2017-01-09 23:32:33 -05:00
Zygo Blaxell
bda4638048 crucible: LockSet: add a maximum size constraint
Extend the LockSet class so that the total number of locked (active)
items can be limited.  When the limit is reached, no new items can be
locked until some existing locked items are unlocked.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 23:23:51 -05:00
Zygo Blaxell
fa8607bae0 crucible: get rid of DefaultBool, just use C++11 initializer syntax
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 23:23:32 -05:00
Zygo Blaxell
1b261b1ba7 build: move BEES_VERSION to a separate C file to avoid unnecessary building
Every git commit was causing bees.cc and bees-hash.cc to be rebuilt,
which was expensive and unnecessary.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 23:23:05 -05:00
Zygo Blaxell
6980935463 README: "btrfs: improve delayed refs iterations" has been merged into v4.10-rc1
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-09 00:05:18 -05:00
Zygo Blaxell
cf04fb17de crucible: remove unused execpipe
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-08 23:45:05 -05:00
Zygo Blaxell
4a9f26d12e crucible: remove ArgList and drop the unimplemented interpreter classes
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-08 23:45:05 -05:00
Zygo Blaxell
e8eaa7e471 trivial: mass purge of whitespace errors
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2017-01-06 22:14:50 -05:00
Timofey Titovets
22e601912e Make filters configurable
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-12-30 04:26:42 +03:00
Timofey Titovets
badfa6e9b9 Add filter to remove time from bees output
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-12-29 17:04:47 +03:00
Timofey Titovets
03609f73db Add help section to Makefile
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-12-29 13:40:13 +03:00
Timofey Titovets
7f92f22dea Add install_scripts subcommand to make
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-12-29 13:30:08 +03:00
Timofey Titovets
f7c71a7a25 Add install subcommand to make
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-12-29 13:27:47 +03:00
Timofey Titovets
37713c2dd4 Scripts: Remove code for short path name in log
Commit: "log: remove path from thread name" remove path from logs, so this useless

Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-12-29 13:14:17 +03:00
Zygo Blaxell
65a950bc41 README.md: 32-bit hosts work now v0.3 2016-12-27 18:01:30 -05:00
Zygo Blaxell
ef8d92a3cb resolve: don't stop at the first physical address lookup failure
The btrfs LOGICAL_INO ioctl has no way to report references to compressed
blocks precisely, so we must always consider all references to a
compressed block, and discard those that do not have the desired offset.

When we encounter compressed shared extents containing a mix of unique
and duplicate data, we attempt to replace all references to the mixed
extent with the same number of references to multiple extents consisting
entirely of unique or duplicate blocks.  An early exit from the loop
in BeesResolver::for_each_extent_ref was stopping this operation early,
after replacing as few as one shared reference.  This left other shared
references to the unique data on the filesystem, effectively creating
new dup data.

The failing pattern looks like this:

    dedup: replace 0x14000..0x18000 from some other extent
    copy: 0x10000..0x14000
    dedup: replace 0x10000..0x14000 with the copy
    [may be multiple dedup lines due to multiple shared references]
    copy: 0x18000..0x1c000
    [missing dedup 0x18000..0x1c000 with the copy here]
    scan: 0x10000 [++++dddd++++] 0x1c000

If the extent 0x10000..0x1c000 is shared and compressed, we will make
a copy of the extent at 0x18000..1c0000.  When we try to dedup this
copy extent, LOGICAL_INO will return a mix of references to the data
at logical 0x10000 and 0x18000 (which are both references to the
original shared extent with different offsets).  If we break out
of the loop too early, we will stop as soon as a reference to 0x10000
is found, and ignore all other references to the extent we are trying
to remove.

The copy at the beginning of the extent (0x10000..0x14000) usually
works because all references to the extent cover the entire extent.
When bees performs the dedup at 0x14000..0x18000, bees itself creates
the shared references with different offsets.

Uncompressed extents were not affected because LOGICAL_INO can locate
physical blocks precisely if they reside in uncompressed extents.

This change will hurt performance when looking up old physical addresses
that belong to new data, but that is a much less urgent problem.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2016-12-27 15:23:40 -05:00
Zygo Blaxell
6e7137f282 bees: work around btrfs fsync bug
btrfs provides a flush on rename when the rename target exists, so the
fsync is not necessary.  In the initialization case (when the rename
target does not exist and the implicit flush does not occur), the file
may be empty or a hole after a crash.  Bees treats this case the same
as if the file did not exist.  Since this condition occurs for only the
first 15 minutes of the lifetime of a bees installation, it's not worth
bothering to fix.

If we attempt to fsync the file ourselves, on a crash with log replay,
btrfs will end up with a directory entry pointing to a non-existent inode.
This directory entry cannot be deleted or renamed except by deleting
the entire subvol.  On large filesystems this bug is triggered by nearly
every crash (verified on kernels up to 4.5.7).

Remove the fsync to avoid the btrfs bug, and accept the failure mode
that occurs in the first 15 minutes after a bees install.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2016-12-27 15:20:31 -05:00
Zygo Blaxell
c1e31004b6 crawl: change scan order to make forward progress at all times
Previously, the scan order processed each subvol in order.  This required
very large amounts of temporary disk space, as a full filesystem scan
was required before any shared extents could be deduped.  If the hash
table RAM was underprovisioned this would mean some shared dup blocks
were removed from the hash table before they could be deduped.

Currently the scan order takes the first unscanned extent from each
subvol.  This works well if--and only if--the subvols are either empty
or children of a common ancestor.  It forces the same inode/offset pairs
to be read at close to the same time from each subvol.

When a new snapshot is created, this ordering diverts scanning to the
new subvol until it catches up to the existing subvols.  For large
filesystems with frequent snapshot creation this means that the scanner
never reaches the end of all subvols.  Each new subvol effectively
resets the current scan position for the entire filesystem to zero.
This prevents bees from ever completing the first filesystem scan.

Change the order again, so that we now read one unscanned extent from
each subvol in round-robin fashion.  When a new subvol is created, we
share scan time between old and new subvols.  This ensures we eventually
finish scanning initial subvols and enter the incremental scanning state.

The cost of this change is more repeated reading of shared extents at
scan time with less benefit from disk-device-level caching; however, the
only way to really fix this problem is to implement scanning on tree 2
(the btrfs extent tree) instead of the subvol trees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2016-12-27 15:15:42 -05:00
Zygo Blaxell
7ecead1700 doc: comment updates
We stopped using FIEMAP for a number of reasons.  Document some of them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2016-12-27 15:15:42 -05:00
Zygo Blaxell
efda609f66 log: remove path from thread name
The thread name has an arbitrarily limited size, and we are eventually
removing support for multiple paths in a single bees daemon process.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2016-12-27 15:15:16 -05:00
Zygo Blaxell
abd696c524 build: add -D_FILE_OFFSET_BITS=64 to makeflags to build on 32-bit hosts
Also update the tests to insist that off_t be at least 64 bits wide.
2016-12-14 19:02:01 -05:00
Zygo Blaxell
e835e8766e crucible: use set instead of vector in BtrfsExtentWalker
This gets rid of some more big memsets.  It may replace them
with a lot of tiny mallocs, though.  If this turns out to be
a bad idea then at least we can easily revert the change.
2016-12-13 21:46:41 -05:00
Zygo Blaxell
7782b79e4b crucible: reduce buffer size and CPU overhead for BtrfsIoctlSearchKey
We really do need some large buffers for BtrfsIoctlSearchKey in some
cases, but we don't need to zero them out first.  Don't do that so we
save some CPU.

Reduce the default buffer size to 4K because most BISK users don't get
need much more than 1K.  Set the buffer size explicitly to the product of
the number of items and the desired item size in the places that really
need a lot of items.
2016-12-13 21:46:35 -05:00
Paul Jones
d7c065e17e Add native compiler optimization's to compiler flags
Signed-off-by: Paul Jones <paul@pauljones.id.au>
2016-12-13 12:53:29 +11:00
Paul Jones
334f5f83ee Remove unused crc64 function
Signed-off-by: Paul Jones <paul@pauljones.id.au>
2016-12-13 12:52:26 +11:00
Paul Jones
8abdeabddc Make crc64 go faster
The current crc64 algorithm is a variant of the Redis implementation.
Change it to a variant of the Adler implementation as described
at https://matt.sh/redis-crcspeed

Test program at https://github.com/PeeJay/crc64-compare
Filesize: 1.1G
Asking crc64-redis to sum "/media/peejay/BTRFS/1/ubuntu-14.04.5-desktop-amd64.iso"...
Asking crc64-adler to sum "/media/peejay/BTRFS/1/ubuntu-14.04.5-desktop-amd64.iso"...
Redis CRC-64: f971f9ac6c8ba458
Adler CRC-64: f971f9ac6c8ba458
Adler throughput: 1659.913308 MB/s
Redis throughput: 437.284661 MB/s
Adler is 3.79x faster than Redis

Signed-off-by: Paul Jones <paul@pauljones.id.au>
2016-12-13 12:41:10 +11:00
Zygo Blaxell
f5f4d69ba3 lib: In 2016, Ubuntu still insists on topologically sorted libraries while linking
This fixes builds on Ubuntu Server 16.04.

Fixes: https://github.com/Zygo/bees/issues/8
2016-12-11 19:53:32 -05:00
Zygo Blaxell
ec9d4a1d15 crucible: fs: use a much smaller default search buffer size
It turns out we never use a value for m_buf_size that isn't the default,
and we also never ask for more than a few thousand items; however,
we do spend a ton of time memsetting the huge buffer to zero.

I don't know what the ideal size is, but 16K is a far better guess
than 1MB.  Let's reduce it for some immediate CPU benefit, and determine
what the size should be later.

Reported at https://github.com/Zygo/bees/issues/11
2016-12-11 13:24:44 -05:00
Zygo Blaxell
77c11bb90f bees: add version string and put it in main() and stats file
Now that we have more than one bees release it's somewhat important
to know which one each bug report is for...
2016-12-08 23:55:59 -05:00
Zygo Blaxell
b5c01c1985 hash: don't throw an exception if MADV_HUGEPAGE fails
We don't _need_ transparent hugepages.  We like them because they can
be faster, but it's not a requirement, and some people will disable
transparent hugepages because they make non-Bees-like workloads slow.

Try to use MADV_HUGEPAGE, but if it fails, just log the error and
continue.

MADV_DONTFORK would be useful if we still fork()ed, but we don't currently
do that.  It's still a useful flag to have because a fork() with more
than 50% of RAM in mlocked pages would result in a kernel OOM crash.
I don't think it's possible to run Bees on a kernel that does not support
the MADV_DONTFORK flag, so don't bother checking for that flag separately.
2016-12-08 23:55:59 -05:00
Zygo Blaxell
d82909387d README: upgrade kernel requirement to 4.4.3 because of kernel bugs 2016-12-08 23:55:58 -05:00
Zygo Blaxell
1cd6263552 README: document impact of 7f8e406 ("btrfs: improve delayed refs iterations") 2016-12-08 23:55:57 -05:00
Zygo Blaxell
eec80944cd roots: add a counter for crawl_ms, open_root and open_root_ino
Linux kernel commit 7f8e406 ("btrfs: improve delayed refs iterations")
seems to dramatically improve LOGICAL_INO performance.  Hopefully this
commit will find its way into mainline Linux soon.

This means that most of the time in Bees is now spent on block reading
(50-75%); however, there is still a big gap between block read and
the sum of everything else we are measuring with the "*_ms" counters.
This gap is about 30% of the run time, so it would be good to find out
what's in the gap.

Add ms counters around the crawl and open calls to capture where we are
spending all the time.
2016-12-08 23:55:39 -05:00
Zygo Blaxell
5a4ff9a0b8 Merge remote-tracking branch 'nefelim4ag/master' 2016-12-02 00:35:51 -05:00
Zygo Blaxell
9506406cff README: BEESHOME is now relative, UUIDs removed, resizing, file contents 2016-12-02 00:32:32 -05:00
Zygo Blaxell
1c4af5ce5a main: update usage message
BEESHOME is downgraded from required to optional.

Don't document the deprecated shared hash table feature.
2016-12-02 00:32:32 -05:00
Zygo Blaxell
642581e89a hash: remove the experimental shared hash-table and shared mmap features
The experiments are over, and the results were not success.

Having two filesystems cohabiting in the same hash table results in a
lot of false positives, each of which requires some heavy IO to resolve.

Using MAP_SHARED to share a beeshash.dat between processes results in
catastrophically bad performance.

These features were abandoned long ago, but some of the code--and even
worse, its documentation--still remains.

Bees wants a hash table false positive rate below 0.1%.  With a shared
hash table the FP rate is about the same as the dedup rate.  Typically
duplicate files on one filesystem are duplicate on many filesystems.

One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation
produce extremely poor performance results.  A five-order-of-magnitude
speedup was achieved by implementing paging in userspace with worker
threads.  We no longer need the support code for the MAP_SHARED case.

It is still possible to run many BeesContexts in a single process,
but now the only thing contexts share is the FD cache.
2016-12-02 00:26:02 -05:00
Zygo Blaxell
fdfa78a81b context: default and relative BEESHOME
Allow relative paths with BEESHOME.  These paths will be relative
to the root of the dedup target filesystem.

BEESHOME is now optional.  If not specified, '.beeshome' is used.

We don't try to create BEESHOME if it doesn't exist.  BEESHOME might
not be on a btrfs filesystem, so we can't insist it be a subvol.
2016-12-02 00:22:18 -05:00
Zygo Blaxell
6fa8de660b hash: create beeshash.dat if it does not exist
BeesHashTable can now create a beeshash.dat if the file does not already
exist.  Currently the default size is one hash table extent (16MB) and
there's no way to change that (yet), so users should still create their
own hash tables for now.

The opening of the hash table is deferred (slightly) in preparation for
hash table resizing.

No doc as the feature is currently unfinished.
2016-12-02 00:20:30 -05:00
Zygo Blaxell
d58de9b76d bees: introduce BEESLOGNOTE macro
Quite often we have the same message in BEESLOG and BEESNOTE, so
make a macro to combine them.
2016-12-02 00:20:29 -05:00