mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

437 Commits

Author SHA1 Message Date
Zygo Blaxell
efda609f66 log: remove path from thread name
The thread name has an arbitrarily limited size, and we are eventually
removing support for multiple paths in a single bees daemon process.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2016-12-27 15:15:16 -05:00
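On Linux the limit is 16 bytes including the terminating NUL, and pthread_setname_np() rejects longer names with ERANGE. The sketch below shows one way to keep a name within that limit by clipping it. clip_thread_name and set_current_thread_name are hypothetical helpers, not bees functions.

```cpp
#include <pthread.h>
#include <cassert>
#include <cstddef>
#include <string>

// Linux allows 16 bytes per thread name, including the terminating NUL.
constexpr std::size_t kMaxThreadNameLen = 15;

// Hypothetical helper: clip a name so it always fits the kernel limit.
std::string clip_thread_name(const std::string &name)
{
	return name.substr(0, kMaxThreadNameLen);
}

// Returns 0 on success, or an errno value, as pthread_setname_np() does.
int set_current_thread_name(const std::string &name)
{
	return pthread_setname_np(pthread_self(), clip_thread_name(name).c_str());
}
```

Without the clipping step, a long name such as a crawler name combined with a mount path would make the call fail instead of being shortened.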
Zygo Blaxell
abd696c524 build: add -D_FILE_OFFSET_BITS=64 to makeflags to build on 32-bit hosts
Also update the tests to insist that off_t be at least 64 bits wide.
2016-12-14 19:02:01 -05:00
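On 32-bit glibc targets, -D_FILE_OFFSET_BITS=64 makes off_t 64 bits wide. The macro must be defined before any system header is included, which is why it belongs in the compiler flags. The "at least 64 bits" test can be written as a compile-time check. This is a sketch, and kBigOffset is only an illustrative constant.

```cpp
#include <sys/types.h>
#include <limits>

// Refuse to build if off_t cannot hold offsets beyond 2 GiB.
static_assert(sizeof(off_t) >= 8,
              "off_t must be at least 64 bits; build with -D_FILE_OFFSET_BITS=64");

// Illustrative constant: an offset past 4 GiB must fit in off_t.
constexpr off_t kBigOffset = static_cast<off_t>(5) << 32;
```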
Zygo Blaxell
e835e8766e crucible: use set instead of vector in BtrfsExtentWalker
This gets rid of some more big memsets.  It may replace them
with a lot of tiny mallocs, though.  If this turns out to be
a bad idea then at least we can easily revert the change.
2016-12-13 21:46:41 -05:00
Zygo Blaxell
7782b79e4b crucible: reduce buffer size and CPU overhead for BtrfsIoctlSearchKey
We really do need some large buffers for BtrfsIoctlSearchKey in some
cases, but we don't need to zero them out first.  Don't do that so we
save some CPU.

Reduce the default buffer size to 4K because most BISK users don't
need much more than 1K.  Set the buffer size explicitly to the product of
the number of items and the desired item size in the places that really
need a lot of items.
2016-12-13 21:46:35 -05:00
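The sketch below shows both ideas: a small default buffer, a size computed from item count times item size, and no zero-fill. The zero-fill is skipped because new char[n] without a trailing "()" leaves the bytes uninitialized, while std::vector<char>(n) would memset them. SearchBuffer is a hypothetical stand-in, not crucible's BtrfsIoctlSearchKey.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <memory>

// Hypothetical stand-in for an ioctl search result buffer.
class SearchBuffer {
public:
	// Small default; most searches return far less than this.
	static constexpr std::size_t default_size = 4096;

	explicit SearchBuffer(std::size_t size = default_size)
		// new char[size] (no "()") skips zero-initialization: the
		// kernel overwrites whatever part of the buffer it uses.
		: m_size(size), m_buf(new char[size])
	{
	}

	// Size the buffer for callers that really need many items.
	static SearchBuffer for_items(std::size_t nr_items, std::size_t item_size)
	{
		assert(item_size == 0 || nr_items <= SIZE_MAX / item_size);
		return SearchBuffer(nr_items * item_size);
	}

	char *data() { return m_buf.get(); }
	std::size_t size() const { return m_size; }

private:
	std::size_t m_size;
	std::unique_ptr<char[]> m_buf;
};
```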
Paul Jones
d7c065e17e Add native compiler optimizations to compiler flags
Signed-off-by: Paul Jones <paul@pauljones.id.au>
2016-12-13 12:53:29 +11:00
Paul Jones
334f5f83ee Remove unused crc64 function
Signed-off-by: Paul Jones <paul@pauljones.id.au>
2016-12-13 12:52:26 +11:00
Paul Jones
8abdeabddc Make crc64 go faster
The current crc64 algorithm is a variant of the Redis implementation.
Change it to a variant of the Adler implementation as described
at https://matt.sh/redis-crcspeed

Test program at https://github.com/PeeJay/crc64-compare
Filesize: 1.1G
Asking crc64-redis to sum "/media/peejay/BTRFS/1/ubuntu-14.04.5-desktop-amd64.iso"...
Asking crc64-adler to sum "/media/peejay/BTRFS/1/ubuntu-14.04.5-desktop-amd64.iso"...
Redis CRC-64: f971f9ac6c8ba458
Adler CRC-64: f971f9ac6c8ba458
Adler throughput: 1659.913308 MB/s
Redis throughput: 437.284661 MB/s
Adler is 3.79x faster than Redis

Signed-off-by: Paul Jones <paul@pauljones.id.au>
2016-12-13 12:41:10 +11:00
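The speedup comes from processing eight input bytes per step with eight lookup tables (Mark Adler's slice-by-8 technique) instead of one byte per step. The sketch below assumes the Redis CRC-64 parameters (Jones polynomial 0xad93d23594c935a9, reflected, initial value 0, no final XOR). It is an illustration of the technique, not the code in this commit, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Reflected form of the Jones polynomial 0xad93d23594c935a9.
constexpr uint64_t kPoly = 0x95ac9329ac4bc9b5ULL;

// Bit-at-a-time reference implementation.
uint64_t crc64_bitwise(uint64_t crc, const unsigned char *p, std::size_t len)
{
	while (len--) {
		crc ^= *p++;
		for (int k = 0; k < 8; ++k)
			crc = (crc & 1) ? (crc >> 1) ^ kPoly : crc >> 1;
	}
	return crc;
}

// table[k][n] is the CRC of byte n followed by k zero bytes.
static uint64_t table[8][256];

// Must be called once before crc64_slice8().
void crc64_init()
{
	for (unsigned n = 0; n < 256; ++n) {
		uint64_t c = n;
		for (int k = 0; k < 8; ++k)
			c = (c & 1) ? (c >> 1) ^ kPoly : c >> 1;
		table[0][n] = c;
	}
	for (unsigned n = 0; n < 256; ++n)
		for (int k = 1; k < 8; ++k)
			table[k][n] = (table[k - 1][n] >> 8) ^ table[0][table[k - 1][n] & 0xff];
}

// Slice-by-8: fold eight input bytes into the CRC per iteration.
uint64_t crc64_slice8(uint64_t crc, const unsigned char *p, std::size_t len)
{
	while (len >= 8) {
		// Assemble 8 bytes little-endian so the code is endian-neutral.
		uint64_t w = 0;
		for (int i = 7; i >= 0; --i)
			w = (w << 8) | p[i];
		crc ^= w;
		crc = table[7][crc & 0xff] ^ table[6][(crc >> 8) & 0xff] ^
		      table[5][(crc >> 16) & 0xff] ^ table[4][(crc >> 24) & 0xff] ^
		      table[3][(crc >> 32) & 0xff] ^ table[2][(crc >> 40) & 0xff] ^
		      table[1][(crc >> 48) & 0xff] ^ table[0][crc >> 56];
		p += 8;
		len -= 8;
	}
	// Finish the remaining 0-7 bytes one at a time.
	while (len--)
		crc = table[0][(crc ^ *p++) & 0xff] ^ (crc >> 8);
	return crc;
}
```

Both functions must produce identical results, as the two "Redis CRC-64" and "Adler CRC-64" lines above do. Because the initial value is 0 and there is no final XOR, a CRC can also be continued across buffers by passing the previous result back in.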
Zygo Blaxell
f5f4d69ba3 lib: In 2016, Ubuntu still insists on topologically sorted libraries while linking
This fixes builds on Ubuntu Server 16.04.

Fixes: https://github.com/Zygo/bees/issues/8
2016-12-11 19:53:32 -05:00
Zygo Blaxell
ec9d4a1d15 crucible: fs: use a much smaller default search buffer size
It turns out we never use a value for m_buf_size that isn't the default,
and we also never ask for more than a few thousand items; however,
we do spend a ton of time memsetting the huge buffer to zero.

I don't know what the ideal size is, but 16K is a far better guess
than 1MB.  Let's reduce it for some immediate CPU benefit, and determine
what the size should be later.

Reported at https://github.com/Zygo/bees/issues/11
2016-12-11 13:24:44 -05:00
Zygo Blaxell
77c11bb90f bees: add version string and put it in main() and stats file
Now that we have more than one bees release it's somewhat important
to know which one each bug report is for...
2016-12-08 23:55:59 -05:00
Zygo Blaxell
b5c01c1985 hash: don't throw an exception if MADV_HUGEPAGE fails
We don't _need_ transparent hugepages.  We like them because they can
be faster, but it's not a requirement, and some people will disable
transparent hugepages because they make non-Bees-like workloads slow.

Try to use MADV_HUGEPAGE, but if it fails, just log the error and
continue.

MADV_DONTFORK would be useful if we still fork()ed, but we don't currently
do that.  It's still a useful flag to have because a fork() with more
than 50% of RAM in mlocked pages would result in a kernel OOM crash.
I don't think it's possible to run Bees on a kernel that does not support
the MADV_DONTFORK flag, so don't bother checking for that flag separately.
2016-12-08 23:55:59 -05:00
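The sketch below maps an anonymous region and requests hugepages on a best-effort basis. A failed MADV_HUGEPAGE is logged and ignored, while MADV_DONTFORK is treated as always available, so its failure is an error. map_hash_table is a hypothetical name, and the anonymous mapping is a simplification of how a hash table might be allocated.

```cpp
#include <sys/mman.h>
#include <cassert>
#include <cerrno>
#include <cstddef>
#include <cstring>
#include <iostream>

// Map a private anonymous region for a hash table; nullptr on failure.
void *map_hash_table(std::size_t size)
{
	void *p = mmap(nullptr, size, PROT_READ | PROT_WRITE,
	               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return nullptr;

#ifdef MADV_HUGEPAGE
	// Transparent hugepages may be disabled: log the error and continue.
	if (madvise(p, size, MADV_HUGEPAGE) != 0)
		std::cerr << "madvise(MADV_HUGEPAGE): " << std::strerror(errno) << "\n";
#endif

	// Keep a future fork() from duplicating this (possibly mlocked)
	// region. Assumed always supported, so failure is an error.
	if (madvise(p, size, MADV_DONTFORK) != 0) {
		std::cerr << "madvise(MADV_DONTFORK): " << std::strerror(errno) << "\n";
		munmap(p, size);
		return nullptr;
	}
	return p;
}
```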
Zygo Blaxell
d82909387d README: upgrade kernel requirement to 4.4.3 because of kernel bugs
2016-12-08 23:55:58 -05:00
Zygo Blaxell
1cd6263552 README: document impact of 7f8e406 ("btrfs: improve delayed refs iterations")
2016-12-08 23:55:57 -05:00
Zygo Blaxell
eec80944cd roots: add a counter for crawl_ms, open_root and open_root_ino
Linux kernel commit 7f8e406 ("btrfs: improve delayed refs iterations")
seems to dramatically improve LOGICAL_INO performance.  Hopefully this
commit will find its way into mainline Linux soon.

This means that most of the time in Bees is now spent on block reading
(50-75%); however, there is still a big gap between block read and
the sum of everything else we are measuring with the "*_ms" counters.
This gap is about 30% of the run time, so it would be good to find out
what's in the gap.

Add ms counters around the crawl and open calls to capture where we are
spending all the time.
2016-12-08 23:55:39 -05:00
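A common way to wrap calls in "*_ms" counters is an RAII timer that adds the elapsed milliseconds when it goes out of scope, so every exit path is counted. In the sketch below, Counters and ScopedMsTimer are hypothetical names, not bees' stats classes, and "crawl_ms" is used only as an example key.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical thread-safe counter registry.
class Counters {
public:
	void add(const std::string &name, long long value)
	{
		std::lock_guard<std::mutex> lock(m_mutex);
		m_counts[name] += value;
	}
	long long get(const std::string &name)
	{
		std::lock_guard<std::mutex> lock(m_mutex);
		auto it = m_counts.find(name);
		return it == m_counts.end() ? 0 : it->second;
	}
private:
	std::mutex m_mutex;
	std::map<std::string, long long> m_counts;
};

// Adds elapsed wall-clock milliseconds to a counter on scope exit.
class ScopedMsTimer {
public:
	ScopedMsTimer(Counters &counters, std::string name)
		: m_counters(counters), m_name(std::move(name)),
		  m_start(std::chrono::steady_clock::now()) {}
	~ScopedMsTimer()
	{
		auto elapsed = std::chrono::steady_clock::now() - m_start;
		m_counters.add(m_name,
			std::chrono::duration_cast<std::chrono::milliseconds>(elapsed).count());
	}
	ScopedMsTimer(const ScopedMsTimer &) = delete;
	ScopedMsTimer &operator=(const ScopedMsTimer &) = delete;
private:
	Counters &m_counters;
	std::string m_name;
	std::chrono::steady_clock::time_point m_start;
};
```

Usage: declare `ScopedMsTimer t(counters, "crawl_ms");` at the top of the crawl call. Comparing the sum of all such counters with the total run time exposes any unmeasured gap.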
Zygo Blaxell
5a4ff9a0b8 Merge remote-tracking branch 'nefelim4ag/master'
2016-12-02 00:35:51 -05:00
Zygo Blaxell
9506406cff README: BEESHOME is now relative, UUIDs removed, resizing, file contents
2016-12-02 00:32:32 -05:00
Zygo Blaxell
1c4af5ce5a main: update usage message
BEESHOME is downgraded from required to optional.

Don't document the deprecated shared hash table feature.
2016-12-02 00:32:32 -05:00
Zygo Blaxell
642581e89a hash: remove the experimental shared hash-table and shared mmap features
The experiments are over, and the results were not a success.

Having two filesystems cohabiting in the same hash table results in a
lot of false positives, each of which requires some heavy IO to resolve.

Using MAP_SHARED to share a beeshash.dat between processes results in
catastrophically bad performance.

These features were abandoned long ago, but some of the code--and even
worse, its documentation--still remains.

Bees wants a hash table false positive rate below 0.1%.  With a shared
hash table the FP rate is about the same as the dedup rate.  Typically
duplicate files on one filesystem are duplicate on many filesystems.

One or more of Linux VFS and the btrfs mmap(MAP_SHARED) implementation
produce extremely poor performance results.  A five-order-of-magnitude
speedup was achieved by implementing paging in userspace with worker
threads.  We no longer need the support code for the MAP_SHARED case.

It is still possible to run many BeesContexts in a single process,
but now the only thing contexts share is the FD cache.
2016-12-02 00:26:02 -05:00
Zygo Blaxell
fdfa78a81b context: default and relative BEESHOME
Allow relative paths with BEESHOME.  These paths will be relative
to the root of the dedup target filesystem.

BEESHOME is now optional.  If not specified, '.beeshome' is used.

We don't try to create BEESHOME if it doesn't exist.  BEESHOME might
not be on a btrfs filesystem, so we can't insist it be a subvol.
2016-12-02 00:22:18 -05:00
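The resolution rule can be written as a small pure function. When BEESHOME is unset, ".beeshome" is used. Relative paths are joined to the root of the target filesystem, and absolute paths pass through unchanged. Nothing is created. resolve_beeshome is a hypothetical name, and the caller is assumed to pass std::getenv("BEESHOME").

```cpp
#include <filesystem>

namespace fs = std::filesystem;

// env_value is the raw BEESHOME value, or nullptr if it is not set.
fs::path resolve_beeshome(const char *env_value, const fs::path &fs_root)
{
	fs::path home = env_value ? fs::path(env_value) : fs::path(".beeshome");
	// Relative paths are interpreted against the dedup target's root.
	if (home.is_relative())
		home = fs_root / home;
	return home;
}
```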
Zygo Blaxell
6fa8de660b hash: create beeshash.dat if it does not exist
BeesHashTable can now create a beeshash.dat if the file does not already
exist.  Currently the default size is one hash table extent (16MB) and
there's no way to change that (yet), so users should still create their
own hash tables for now.

The opening of the hash table is deferred (slightly) in preparation for
hash table resizing.

No doc as the feature is currently unfinished.
2016-12-02 00:20:30 -05:00
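Create-if-missing can be done without a race by opening with O_CREAT | O_EXCL and sizing the file only when that call succeeds. If the file already exists, it is opened as-is. The sketch below assumes a 16MB default extent and 0600 permissions. ftruncate_or_die here is a hypothetical helper in the spirit of crucible's *_or_die functions, and it is not crucible's actual signature.

```cpp
#include <fcntl.h>
#include <sys/stat.h>
#include <unistd.h>
#include <cassert>
#include <cerrno>
#include <string>
#include <system_error>

// Default size: one hash table extent.
constexpr off_t kHashTableExtentSize = 16 * 1024 * 1024;

// Hypothetical helper: throw on ftruncate() failure.
void ftruncate_or_die(int fd, off_t size)
{
	if (ftruncate(fd, size) != 0)
		throw std::system_error(errno, std::generic_category(), "ftruncate");
}

// Open beeshash.dat, creating it at the default size if it does not exist.
int open_or_create_hash_file(const std::string &path)
{
	int fd = open(path.c_str(), O_RDWR | O_CREAT | O_EXCL | O_CLOEXEC, 0600);
	if (fd >= 0) {
		try {
			ftruncate_or_die(fd, kHashTableExtentSize);
		} catch (...) {
			close(fd);
			throw;
		}
		return fd;
	}
	if (errno != EEXIST)
		throw std::system_error(errno, std::generic_category(), "create " + path);
	fd = open(path.c_str(), O_RDWR | O_CLOEXEC);
	if (fd < 0)
		throw std::system_error(errno, std::generic_category(), "open " + path);
	return fd;
}
```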
Zygo Blaxell
d58de9b76d bees: introduce BEESLOGNOTE macro
Quite often we have the same message in BEESLOG and BEESNOTE, so
make a macro to combine them.
2016-12-02 00:20:29 -05:00
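A combined macro like this can be two statements: one log call, then a note object that must stay alive in the caller's scope. For that reason the note half cannot be wrapped in do { } while (0). In the sketch below, BEESLOG, BEESNOTE, ScopedNote and g_thread_note are stand-in definitions for illustration only, and bees' own macros differ.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>
#include <utility>

// Stand-in: what the current thread is doing, for status reports.
thread_local std::string g_thread_note;

// Sets the note for the enclosing scope and restores the old one on exit.
struct ScopedNote {
	std::string saved;
	explicit ScopedNote(std::string note) : saved(g_thread_note)
	{
		g_thread_note = std::move(note);
	}
	~ScopedNote() { g_thread_note = saved; }
};

#define BEES_CAT2(a, b) a##b
#define BEES_CAT(a, b) BEES_CAT2(a, b)

#define BEESLOG(x) do { std::clog << x << std::endl; } while (0)
#define BEESNOTE(x) ScopedNote BEES_CAT(bees_note_, __LINE__){ \
	[&] { std::ostringstream os_; os_ << x; return os_.str(); }() }
// Log the message and keep it as the thread's note until scope exit.
#define BEESLOGNOTE(x) BEESLOG(x); BEESNOTE(x)
```

Because BEESLOGNOTE expands to two statements, it belongs in a braced scope, not directly under an unbraced if.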
Zygo Blaxell
ea0910ee6c crucible: fd: remove dead reference to unlink_or_die, introduce ftruncate_or_die
2016-12-02 00:19:37 -05:00
Zygo Blaxell
dd21e6f848 crucible: add missing template specializations of pwrite helper functions
I got a little too enthusiastic when redacting the code, and removed some
overloaded functions bees was using.  C++ silently found replacements,
and the result was a bug that prevented any data from being persisted
from the hash table.

Fixes: https://github.com/Zygo/bees/issues/7
v0.2
2016-12-02 00:16:51 -05:00
Zygo Blaxell
06e111c229 crawl: remove UUID from file names
Unfortunately we don't get to remove the libuuid dependency because
we still want to read a file that exists in the legacy location.
2016-12-02 00:16:03 -05:00
Timofey Titovets
606d48acc1 Add option to make mnt path shorter in logs
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-11-28 08:23:50 +03:00
Timofey Titovets
bf4e31ae71 Add default values to vars
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-11-27 06:23:42 +03:00
Timofey Titovets
03c116c3f1 Add Systemd service for bash wrapper
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-11-27 03:19:31 +03:00
Timofey Titovets
a384cd976a Add bash wrapper
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2016-11-27 03:19:19 +03:00
Zygo Blaxell
38bb70f5d0 build: OK, maybe 32-bit machines could work
I accidentally did a pre-push verification on a 32-bit build host.
There were a surprisingly small number of problems, so fix them.

Bees now builds on a 32-bit host.  Let's not update README just yet,
though:  the 32-bit ioctl support fails immediately after startup on a
64-bit kernel.
2016-11-26 02:06:28 -05:00
Zygo Blaxell
a57404442c execpipe: remove unreachable debug code
This is tripping up builds in stricter build environments.

https://github.com/Zygo/bees/issues/2
2016-11-26 01:06:44 -05:00
Zygo Blaxell
1e621cf4e7 README: Improve "about" section and update compiler dependency
"agent" is a nice generic term for the set of things that userspace
btrfs deduplicators are.  Let's call it that.

Throw out the awkward and rambling "About" text and use the announcement
from linux-btrfs instead.  Terrible English writing I at am.
2016-11-24 23:06:28 -05:00
Zygo Blaxell
1303fb9da8 build: fix FTBFS on GCC 6.2
I'm not surprised that GCC 6 doesn't let me send an ostream ref to itself,
even inside an uninstantiated template specialization.  I am a little
surprised I was trying to, and 4.9 let me get away with it.

It's 2016.  auto_ptr is deprecated now.

Some files were still including vector even though they no longer used it.

https://github.com/Zygo/bees/issues/1
2016-11-24 22:20:11 -05:00
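std::auto_ptr was deprecated in C++11 and removed in C++17, and std::unique_ptr is its replacement. The practical difference is that auto_ptr silently transferred ownership on copy. unique_ptr is move-only, so every transfer must be written out with std::move. Widget, make_widget and take_ownership are hypothetical names used for illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

struct Widget {
	int value = 0;
};

std::unique_ptr<Widget> make_widget(int v)
{
	auto w = std::make_unique<Widget>();
	w->value = v;
	return w;  // returning a local moves it; no std::move needed
}

// Taking unique_ptr by value makes the ownership transfer explicit.
std::unique_ptr<Widget> take_ownership(std::unique_ptr<Widget> w)
{
	return w;
}
```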
Zygo Blaxell
876b76d761 README.md: answer some questions that came in after release
2016-11-17 15:13:47 -05:00
Zygo Blaxell
74de78947d README: more docs
v0.1
2016-11-17 12:12:18 -05:00
Zygo Blaxell
4c9982e870 GPL-3: license it
2016-11-17 12:12:15 -05:00
Zygo Blaxell
d126ebf930 markdown: add it and write some
2016-11-17 12:12:14 -05:00
Zygo Blaxell
cca0ee26a8 bees: remove local cruft, throw at github
2016-11-17 12:12:13 -05:00