mirror of https://github.com/Zygo/bees.git synced 2025-08-02 05:43:29 +02:00

190 Commits
v0.5 ... v0.6.5

Author SHA1 Message Date
Zygo Blaxell
a466ccf2f1 build: include localconf everywhere
Overriding makeflags did not work from localconf in the src, lib, or
test directories.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:52:45 -05:00
Zygo Blaxell
ba04fe1349 roots: make it build with clang
Remove an unnecessary cast that was breaking namespace lookup for clang.

Closes: #159

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:49:48 -05:00
Zygo Blaxell
830df63d4c chatter: make it build with clang
Silence the unused variable warning.  The compiler is correct, but we
may implement line-level debug at some point in the future, so we
want to keep the member and parameters.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:49:42 -05:00
Zygo Blaxell
20c9d2ff6a clang: fix struct/class declaration/definition mismatches
clang does not like a defined class to be declared as a struct.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:49:40 -05:00
Zygo Blaxell
7bbb4d14cb bees context: make it build with clang
Remove unused function getenv_or_die.  All of our environment variable
parameters are optional or have default values.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:49:38 -05:00
Zygo Blaxell
363c45b8cd bees: make it build with clang
Remove unused "addr check" functions.  We have ranged_cast for detecting
overflow bits.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:49:35 -05:00
Zygo Blaxell
4ec2b8ac16 task: make it build with clang
Remove unused closure captures.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:49:30 -05:00
Zygo Blaxell
26d31225fa extentwalker: make it build with clang
Remove unused MAX_OFFSET.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-02-08 12:49:27 -05:00
Zygo Blaxell
21ae937201 roots: reimplement transid_max_nocache using extent tree root
Commit 9a97699dd9 upstream.

This commit accidentally fixes a bug where we call btrfs_get_root_transid
with BTRFS_FS_TREE_OBJECTID instead of m_ctx->root_fd().  This leads
to storms of messages like this:

	crawl_transid[5334]: exception type std::system_error: BTRFS_IOC_INO_LOOKUP: rv = readlink(path.c_str(), buf, size + 1): No such file or directory at fs.cc:430: No such file or directory

The code was working before because BTRFS_FS_TREE_OBJECTID == 5.
bees is constantly opening files, and the Linux kernel fills in unused
fd numbers starting from 0, so it's quite likely that the process has fd
5 open to some existing file somewhere on the target btrfs filesystem
most of the time.  If fd 5 is closed, or if it is open to an orphan
file (one without an existing name), the ioctl in btrfs_get_root_id
(called by btrfs_get_root_transid) will fail and throw an exception.
The exception breaks out of the crawl_transid task before it can do any
scanning work, so bees will stop deduping until FD 5 is open again with
an existing file.  This can only happen if other threads are opening
files, so if bees is idle at the instant when this failure occurs,
it will never dedupe again until the process is terminated and restarted.

The remainder is the original commit message:

ROOT_TREE contains the ROOT_ITEM for EXTENT_TREE.  Every modification
(that we care about) to a btrfs must go through EXTENT_TREE, and must
modify the page in ROOT_TREE pointing to the root of EXTENT_TREE...
which makes that a very good source for the filesystem transid.

Remove the loop and the root lookups, and just look at one item for
max_transid.
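
To illustrate (a simplified sketch, not the bees code, which wraps these
ioctls in crucible helpers; error handling is omitted):

	#include <linux/btrfs.h>
	#include <linux/btrfs_tree.h>
	#include <sys/ioctl.h>
	#include <cstdint>
	#include <cstring>

	// Sketch: read the transid of the ROOT_TREE page that holds the
	// ROOT_ITEM for EXTENT_TREE.  Returns 0 on any failure.
	uint64_t extent_tree_transid(int root_fd)
	{
		struct {
			btrfs_ioctl_search_args_v2 args;
			char buf[4096];                 // room for one result item
		} s;
		memset(&s, 0, sizeof(s));
		auto &k = s.args.key;
		k.tree_id      = BTRFS_ROOT_TREE_OBJECTID;    // search ROOT_TREE
		k.min_objectid = k.max_objectid = BTRFS_EXTENT_TREE_OBJECTID;
		k.min_type     = k.max_type     = BTRFS_ROOT_ITEM_KEY;
		k.max_offset   = UINT64_MAX;
		k.max_transid  = UINT64_MAX;
		k.nr_items     = 1;                           // just one item
		s.args.buf_size = sizeof(s.buf);
		if (ioctl(root_fd, BTRFS_IOC_TREE_SEARCH_V2, &s) < 0 || k.nr_items < 1)
			return 0;
		auto hdr = reinterpret_cast<btrfs_ioctl_search_header *>(s.args.buf);
		return hdr->transid;   // transid of the page holding the ROOT_ITEM
	}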

Also note that every caller of transid_max_nocache() immediately
feeds the return value to m_transid_re.update(), so don't do that
inside transid_max_nocache().

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-12-23 17:27:44 -05:00
Zygo Blaxell
7283126e5c bees: initialize context in the correct order
We cannot use BeesContext::roots() until after
BeesContext::set_root_path() has been called.
Save up the parameter settings until then.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2020-08-31 22:39:51 -04:00
Zygo Blaxell
ac53e50d3e context: workaround to prevent LOGICAL_INO and btrfs balance from running concurrently
This avoids some kernel bugs.  One of them is fixed in 5.3.4 and later:

	efad8a853a "Btrfs: fix use-after-free when using the tree modification log"

There are apparently others in current kernels, so for now just put bees
on pause until the balance is done.

At some point we may want to provide an option to disable this
workaround; however, running bees and balance at the same time makes
neither particularly fast, so maybe we'll just leave it this way.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-11-28 11:32:30 +01:00
Zygo Blaxell
6e75857d71 main: single BeesContext instance per process
After weeks of testing I copied part of a change to main without copying
the rest of the change, leading to an immediate segfault on startup.

So here is the rest of the change:  limit the number of
BeesContexts per process to 1.  This change was discussed at
https://github.com/Zygo/bees/issues/54#issuecomment-360332529 but there
are more reasons to do it now:  the candidates to replace the current
hash table format are less forgiving of sharing hash tables, and it may
even become necessary to have more than one hash table per BeesContext
instance (e.g. to keep datasum and nodatasum data separate).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-10-30 00:07:59 -04:00
Zygo Blaxell
9a9dd89177 process: Fix gettid() ambiguity with glibc >= 2.30
In version 2.30 glibc added its own gettid() function. This resulted in
"error: call of overloaded ‘gettid()’ is ambiguous" because gettid()
now exists in both namespace crucible and std.

For now, use explicit references to namespace crucible.  This continues
to work with new and old libc without having to test specific library
versions.
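
For illustration, the qualified call looks like this (the
crucible/process.h header path is an assumption here):

	#include <sys/types.h>
	#include "crucible/process.h"   // declares crucible::gettid()

	pid_t current_tid()
	{
		// explicit namespace avoids ambiguity with glibc 2.30's gettid()
		return crucible::gettid();
	}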

At some point, glibc gettid() will be deployed widely enough that we can
remove the crucible version entirely.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-10-29 23:34:36 -04:00
Zygo Blaxell
7e5c9b6bbf lib: fix non-local lambda expression cannot have a capture-default
We got away with this because GCC 4.8 (and apparently every GCC prior
to 9) didn't notice or care, and because there is nothing referenced
inside the lambda function body that isn't accessible from any other
kind of function body (i.e. the capture wasn't needed at all).

GCC 9 now enforces what the C++ standard said all along:  there is
no need to allow capture-default in this case, so it is not allowed.

Fix by removing the offending capture-default.
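
A minimal reproduction of the rule:

	// at namespace scope these lambdas are "non-local":
	auto broken = [&]() { return 0; };  // GCC 9: error: non-local lambda
	                                    // expression cannot have a
	                                    // capture-default
	auto fixed  = []() { return 0; };   // nothing was captured anyway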

Fixes: https://github.com/Zygo/bees/issues/112
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-10-29 23:19:31 +01:00
Zygo Blaxell
5ee09ef9e8 tempfile: drop the fsync()
The deadlock seems to be fixed now (if there ever was one--there certainly
were deadlocks, but matching deadlocks to root causes is non-trivial
and a number of distinct deadlock cases have been fixed in recent years).

The benchmark data is inconclusive about whether it is better to fsync or
not to fsync.  A paranoia option might be useful here.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-10-29 23:19:31 +01:00
Zygo Blaxell
a5b9919d26 roots: quick fix for task scheduling bug leading to loss of crawl_master
The crawl_master task had a simple atomic variable that was supposed
to prevent duplicate crawl_master tasks from ending up in the queue;
however, this had a race condition that could lead to m_task_running
being set with no crawl_master task running to clear it.  This would in
turn prevent crawl_thread from scheduling any further crawl_master tasks,
and bees would eventually stop doing any more work.
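
The racy pattern was roughly this shape (a sketch with simplified
names, not the exact code):

	#include <atomic>
	#include <functional>

	std::atomic<bool> m_task_running { false };

	// stand-in for Task scheduling; work may be silently dropped if the
	// queue is full
	void schedule(std::function<void()> fn);

	void poll_crawl()
	{
		if (!m_task_running.exchange(true)) {
			schedule([] {
				// ... crawl work ...
				m_task_running = false;  // never runs if the task was
			});                          // dropped, so the flag stays set
		}                                // and nothing is ever scheduled
	}                                    // again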

A proper fix is to modify the Task class and its friends such that
Task::run() guarantees that 1) at most one instance of a Task is ever
scheduled or running at any time, and 2) if a Task is scheduled while
an instance of the Task is running, the scheduling is deferred until
after the current instance completes.  This is part of a fairly large
planned change set, but it's not ready to push now.

So instead, unconditionally push a new crawl_master Task into the queue
on every poll, then silently and quickly exit if the queue is too full
or the supply of new extents is empty.  Drop the scheduling-related
members of BeesRoots as they will not be needed when the proper fix lands.

Fixes: 4f0bc78a "crawl: don't block a Task waiting for new transids"
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-10-29 23:19:31 +01:00
Zygo Blaxell
04dbfd5bf1 bees: soft-limit computed thread counts to 8
https://github.com/Zygo/bees/issues/91 describes problems encountered
when running bees on systems with many CPU cores.

Limit the computed number of threads (using --thread-factor or the
default) to a maximum of 8 (i.e. the number of logical cores in a modern
laptop).  Users can override the limit by using --thread-count.
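
In outline (a sketch, not the exact code):

	#include <algorithm>
	#include <thread>

	unsigned computed_thread_count(double thread_factor)
	{
		unsigned cores    = std::max(1u, std::thread::hardware_concurrency());
		unsigned computed = std::max(1u, unsigned(cores * thread_factor));
		return std::min(computed, 8u);   // soft cap; --thread-count overrides
	}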

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-10-29 23:14:35 +01:00
Zygo Blaxell
df640062e7 workarounds: add workaround for btrfs send
Introduce --workaround options which trade performance or effectiveness to
avoid triggering kernel bugs.

The first such option is --workaround-btrfs-send, which avoids making any
modification to read-only subvols to avoid btrfs send bugs.

Clean up usage message:  no tabs for formatting, split options into
sections by theme.

Make scan mode a non-static data member like all (most?) other options.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-10-29 23:14:35 +01:00
Kai Krakow
1d369a3c18 Makefile: Fix git usage for non-git source archive
We didn't take enough care to fix all invocations of git in this
scenario.

Fixes: 32d2739 ("Makefile: Specify version when building from tarball")
Signed-off-by: Kai Krakow <kai@kaishome.de>
2019-10-29 23:13:51 +01:00
Kai Krakow
14ccf88050 crucible: Try repairing a build failure around swap macro
Gentoo-Bug: https://bugs.gentoo.org/670606
Fixes: https://github.com/Zygo/bees/issues/85
Suggested-by: Zygo Blaxell <bees@furryterror.org>
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-11-09 06:48:01 +01:00
rsjaffe
b6e4511446 systemd service replace deprecated parameters
Replace CPU shares and IO block weight with CPU weight and IO weight. Note that the new parameters are roughly 1/100 of the old ones--I believe that's the right conversion. Also remove the duplicate Nice parameter and alphabetize the parameters for ease of reading.
2018-11-09 06:48:01 +01:00
Zygo Blaxell
256da15ac1 context: cache result of home_fd()
BeesContext::home_fd() is supposed to open $BEESHOME once and cache
the Fd for later calls; however, instead it was reopening a new Fd each
time it was called, and _also_ holding that Fd in a BeesContext member.
Fds clean themselves up when they are forgotten, so it was not leaking
per se, but it certainly had more open Fds than it needed to.

Check to see if we have m_home_fd open, and return that if so.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-09 06:48:01 +01:00
Zygo Blaxell
c426794542 roots: fix subvol scan rollover on subvols with empty transid range
The ordering function for BeesCrawlState did not consider

	root 292 inode 0 min_transid 2345 max_transid 3456

to be larger than

	root 292 inode 258 min_transid 2345 max_transid 2345

so when we attempted to update the end pointer for the crawl progress,
the new state was not considered newer than the old state because the
min_transid was equal, but the new crawl state's inode number was smaller.

Normally this is not a problem because subvol scans typically begin
and end in separate transactions (in part because we don't start a
subvol scan until at least two transactions are available); however,
the cleanup code for the aftermath of the recent transid_min() bug can
create crawlers with equal max_transid and min_transid records.

Fix this by ordering both transid fields before any others in the
crawl state.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-08 01:04:14 +01:00
Zygo Blaxell
ce0c1ab629 roots: do not accept 18446744073709551615 as max_transid in beescrawl.dat
Due to an earlier bug some beescrawl.dat files will contain uint64_t
max as max_transid.  This prevents any further scanning on the subvol
because there is no possibility of having a real transid (or any other
uint64_t number) larger than uint64_t max.

If we detect a bad transid in beescrawl.dat, log a warning, then use
some more plausible value:  either min_transid to repeat the previous
incremental crawl, or 0 to restart the subvol scan from the beginning.
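
The check amounts to something like this (function and parameter names
assumed):

	#include <cstdint>
	#include <limits>

	uint64_t sanitize_max_transid(uint64_t max_transid, uint64_t min_transid)
	{
		if (max_transid == std::numeric_limits<uint64_t>::max()) {
			// log a warning, then fall back to a plausible value:
			// min_transid repeats the previous incremental crawl,
			// 0 would restart the subvol scan from the beginning
			return min_transid;
		}
		return max_transid;
	}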

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-08 01:04:10 +01:00
Zygo Blaxell
d11906c4e8 roots: do not allow transid_min to be numeric_limits<uint64_t>::max()
On a few test machines max_transid on subvols is getting set to
18446744073709551615 (aka uint64_t max).

Prevent transid_min() from ever returning this value.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-08 01:04:08 +01:00
Zygo Blaxell
e7fbd0c732 src: add bees-version.new.c to .gitignore
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-11-08 01:02:03 +01:00
Kai Krakow
cf9d1d0b78 Makefile: Specify version when building from tarball
When package maintainers build from a tarball, the .git directory does
not exist to extract the version tag. Let's add a hack to work around
this issue and let them specify `BEES_VERSION="v0.y"` on the make
cmdline.

Github-Bug: https://github.com/Zygo/bees/issues/75
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-11-08 01:01:25 +01:00
Kai Krakow
3504439d5c contrib/gentoo: Update ebuild
Now that the packaging preparations were merged, we should update the
ebuild to reflect the upstream master branch.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-27 10:55:24 +02:00
Zygo Blaxell
d4b3836493 extentwalker: don't fetch absurd numbers of extents just to throw them away
ExtentWalker doesn't gain significant benefits from caching, and the
extra SEARCH_V2 ioctls were blamed for a 33% kernel CPU overhead by perf.

Reduce the number of extents to 16 in lieu of fixing the caching.

This gives a significant speed boost on CPU-bound workloads compared
to the original 1024--almost 40% faster on a single SSD with a filesystem
consisting of raw VM images mounted with compress=zstd.

This also seems to reduce LOGICAL_INO overhead.  Perhaps SEARCH_V2 and
LOGICAL_INO were trying to lock the same extents, and interfering with
each other?

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-26 23:29:56 -04:00
Kai Krakow
f053e0e1a7 beesd: Fix the wrapper not finding any config file
`grep -q something | grep -q something_else` will never find anything.
The for-loop is redundant anyway because `grep -l` can already do the
work for us.  Let's replace this with a shorter, working version.

CC: Timofey Titovets <timofey.titovets@synesis.ru>
(fixes: commit 06d41fd "Rewrite beesd arg parser")
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-16 17:56:31 -04:00
Zygo Blaxell
bcfc3cf08b Merge https://github.com/Zygo/bees/pull/62 2018-09-15 00:09:46 -04:00
Zygo Blaxell
9dbe2d6fee bees: add -G/--thread-min option for minimum thread count
The -g option limits the number of worker threads when the target load
average is exceeded.  On some systems the load normally runs high, and
continuous bees operation is required to avoid running out of disk space.

Add a -G/--thread-min option to force at least some threads to continue
running.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:07 -04:00
Zygo Blaxell
dd3c32a43d README: spell 'available' correctly
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:07 -04:00
Zygo Blaxell
3d536ea6df roots: if queue is full run again
The task queue may already be full of tasks when the crawl task is
executed.  In this case simply reschedule the crawl task at the
end of the current queue.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:06 -04:00
Zygo Blaxell
e66086516f bees: dynamic thread pool size based on system load average
Add -g / --loadavg-target parameter to track system load and add or
remove bees worker threads dynamically to keep system load close to the
loadavg target.  Thread count may vary from zero to the maximum
specified by -c or -C, and is adjusted every 5 seconds.
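
One adjustment step might look like this (a sketch under those rules,
not the exact bees code):

	#include <cstdlib>   // getloadavg()

	unsigned adjust_thread_count(double loadavg_target, unsigned current,
	                             unsigned thread_min, unsigned thread_max)
	{
		double load = 0.0;
		if (getloadavg(&load, 1) != 1)
			return current;                        // no sample, no change
		if (load > loadavg_target && current > thread_min)
			return current - 1;                    // shed a worker
		if (load < loadavg_target && current < thread_max)
			return current + 1;                    // add a worker
		return current;
	}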

This is better than implementing a similar load average scheme from
outside of the process (though that is still possible) because the
in-process load tracker does not disrupt the performance timing feedback
mechanisms as a freezer cgroup or SIGSTOP would when controlling bees
from outside.  The internal load average tracker can also adjust the
number of active threads while an external tracker can only choose from
the maximum or zero.

Also fix a bug where a Task could deadlock waiting for itself to exit
if it tries to insert a new Task after the number of worker threads has
been set to zero.

Also correct usage message for --scan-mode (values are 0..2) since
we are touching adjacent lines anyway.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:03 -04:00
Zygo Blaxell
96eb100ded bees: use readahead instead of posix_fadvise
Other btrfs utils use readahead(), not posix_fadvise().

There does not appear to be a performance or correctness difference
between the three (none, posix_fadvise, or readahead()).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:00 -04:00
Zygo Blaxell
041ad717a5 bees: configurable log verbosity
Log messages were already labelled with log levels, but there was no
way to filter by log level at run time.

Implement the filter inside the bees process so it can skip evaluation
of the BEESLOG* arguments if the log messages would not be emitted.
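
The key property, schematically (macro and variable names assumed):
the stream expression is only evaluated when the message passes the
filter.

	#include <iostream>

	extern int bees_log_level;   // set from the command line (name assumed)

	#define BEESLOG_AT(level, expr) do { \
		if ((level) <= bees_log_level) { \
			std::cerr << expr << std::endl; \
		} \
	} while (0)

	// BEESLOG_AT(LOG_DEBUG, "resolved " << addr) evaluates the stream
	// arguments only at log levels that include LOG_DEBUG.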

Fixes: https://github.com/Zygo/bees/issues/67

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:00 -04:00
Zygo Blaxell
b22db12390 context: log dedups with single unbroken log message
When BEESLOGINFO is called multiple times it generates separate log
records that can be mixed up when multiple threads dedup.

Use a single BEESLOGINFO call for each dedup to prevent this.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:00 -04:00
Zygo Blaxell
8938caa029 README.md: update build-deps
btrfs/ioctl.h has been moved to a different package.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:49:57 -04:00
Zygo Blaxell
8bc4bee8a3 crucible: progress: drop the set() method
set() was broken and redundant.  Calling hold() and discarding the
returned object has the correct effect.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:49:54 -04:00
Zygo Blaxell
1beb61fb78 crucible: error: record location of exception in what() message
Make the log show where the exception is thrown from.
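
Schematically (the actual crucible macro names may differ):

	#include <sstream>
	#include <stdexcept>

	// capture the throw site so what() reports it, as in the
	// "... at fs.cc:430" log line quoted elsewhere in this history
	#define THROW_RUNTIME_ERROR(msg) do { \
		std::ostringstream _oss; \
		_oss << msg << " at " << __FILE__ << ":" << __LINE__; \
		throw std::runtime_error(_oss.str()); \
	} while (0)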

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:49:51 -04:00
Timofey Titovets
06d41fd518 Rewrite beesd arg parser
Signed-off-by: Timofey Titovets <timofey.titovets@synesis.ru>
2018-09-15 00:21:06 +03:00
Kai Krakow
788774731b Gentoo: Rework Gentoo ebuild into overlay
This commit squashes all the little changes from the previous
integration branch into one, adjusts to the new Makefile changes, and
introduces an overlay layout so that the contrib/gentoo-bees subtree
can be directly added as a Portage overlay to the system.

The following list contains the previous commit descriptions:

sys-fs/bees: Keyword tested architecture ~amd64

    Bees was tested on this platform.

sys-fs/bees: Add kernel version checks

    Add checking the kernel versions and write some info and/or warnings
    before building and installing the package. Running bees on older
    kernels may have some serious performance and stability impacts, let's
    tell the user about it.

    Closes #55

sys-fs/bees: Add metadata.xml

sys-fs/bees: There's no configure script

    So, there's no point in calling "default".

sys-fs/bees: Simplify src_configure()

sys-fs/bees: Don't depend on markdown

    It makes no sense to install both README.md and README.html, and we can
    get rid of one dependency.

Dependencies: btrfs-progs is no longer a buildtime-only dep

    It is actually needed by the bees service wrapper script, as pointed out
    by Gentoo QA review.

sys-fs/bees: DOCS is not needed

    "COPYING" is already covered by the licensing. The ebuild defaults
    already include README*

sys-fs/bees: Make warnings exclusive

    It was recommended by Gentoo QA to show only either one or another
    warning, and change the texts accordingly.

sys-fs/bees: RDEPEND is not implicit

    RDEPEND does not implicitly default to DEPEND. Let's explicitly set the
    variable.

sys-fs/bees: IUSE=test is only needed for explicit dependencies

    Thus, remove it.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 05:06:39 +02:00
Kai Krakow
679a327ac5 Makefile: Do not force optimizations by default
Make life easier for package maintainers by not forcing architecture or
compiler optimizations by default. E.g., Gentoo QA refuses to accept
both "-march=native" and "-O3". These are usually provided by the
package tooling.

Instead, we provide easily accessible templates in "makeflags".

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 04:05:15 +02:00
Kai Krakow
31b41bb3c2 Makefile: Do not force making README.html
This forces us to depend on markdown which would be otherwise optional.
Most of the time it is sufficient to let package managers just install
the README.md file.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 03:34:48 +02:00
Kai Krakow
d7e235c178 Makefile: "which" is not portable
It was pointed out by Gentoo QA that "type -P" is a better choice.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 03:14:18 +02:00
Kai Krakow
51108f839d Makefile: Due to VPATH, libcrucible links to hard-coded libuuid path
Due to VPATH and how make resolves source paths, libcrucible.so ends up
with a hard-coded path to link against libuuid.so. Let's fix it by
turning the general rule into an explicit rule for libcrucible.so.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 03:07:20 +02:00
Kai Krakow
8d102abf8b Makefile: create a template compiler
This creates a simple template compiler using sed, kept in a reusable
make variable.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:59:54 +02:00
Kai Krakow
83e8f87dc9 Scripts: Don't prefix timestamps when running with systemd
Since systemd prefixes its own timestamps, we can unconditionally remove
timestamps when bees is executed by systemd.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:59:54 +02:00
Kai Krakow
4417b18d9e Makefile: .version.o is made from a generated file
We should probably not put it into the objects list. Let's instead
explicitly put it as a depend of libcrucible.so.

This allows us to not use *.cc as a depend for .version.cc which makes
more sense as CRUCIBLE_OBJS is also explicitly defined and not built
from wildcards.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:59:54 +02:00
Kai Krakow
8636312cab Compilation: Let the code know about package config
This commit adds support for putting package configuration options into
header files. This is needed to prepare reading config files from /etc.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:59:54 +02:00
Kai Krakow
17e1171464 Installation: Remove USR_PREFIX from Makefile
This commit removes USR_PREFIX and introduces ETC_PREFIX instead. The
purpose of PREFIX is the installation prefix in the system, not the
installation destination. The latter one is what DESTDIR is used for.

This should clear up the confusion. PREFIX was already mis-used as
installation destination. But that doesn't mix well with how the make
targets are designed.

CC: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:59:52 +02:00
Kai Krakow
9069201036 Scripts: Fix systemd unit not being templated
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:21:08 +02:00
Kai Krakow
ace814321f Makefile: Auto-detect systemd unit path
This uses pkg-config to detect the system unit dir.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:21:08 +02:00
Kai Krakow
451f0ad9aa Makefile: Allow installation of fiemap/fiewalk support tools
There's now a new make target called "install_tools" which would not run
by default on installation.

One can add "OPTIONAL_INSTALL_TARGETS=install_tools" into localconf to
install these by default.

fiewalk is installed to sbin, as only root can run it; the other tool
goes to bin.

Gentoo can use this to optionally install these tools as a package
feature.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:20:59 +02:00
Kai Krakow
85f9265034 Makefile: make installing libs a separate target
This will allow installing fiemap/fiewalk support tools as an optional
install target.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:13:27 +02:00
Kai Krakow
5b28aad27f Makefile: Run install tests only for default target "reallyall"
Otherwise, tests would still run during "make install".

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:13:27 +02:00
Kai Krakow
6c47bb61c1 Makefile: remove tests from "make all"
Instead, introduce "make reallyall" and make it the default target. Now,
one can override the default target using localconf.

Needed for preparing Gentoo ebuild test behavior.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-09-08 02:13:27 +02:00
Timofey Titovets
2d14fd90e4 Update options in sample config
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2018-08-29 11:44:25 +03:00
Timofey Titovets
e0f315d47a Make beesd -h useful
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2018-08-29 11:44:25 +03:00
Zygo Blaxell
e564d27dda README: update known bugs and issues list
Also split "bad feature interactions" into "unknown" (which is what it
really was before) and "bad" (which includes some filesystem-destroying
problems).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-05-18 00:16:09 -04:00
Zygo Blaxell
c3effe0a20 crawl: use custom order instead of (ab)using BeesFileRange::operator<
This makes the code clearer and keeps changes to BeesFileRange ordering
isolated.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-05-18 00:16:08 -04:00
Zygo Blaxell
f8c27f5c6a bees: revert TOXIC_INTERVAL back to pre-4.14 levels
Linux kernel 4.14, while resistant to extent toxicity, is not immune to it.

Go back to the paranoid setting to avoid tying up filesystems in
ridiculously long kernel loops in find_parent_nodes.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-05-18 00:16:08 -04:00
Zygo Blaxell
26039cd559 tempfile: update comments around bees_sync
Deadlock reproduced on kernel 4.14.34.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-05-18 00:16:04 -04:00
Zygo Blaxell
e9aef89293 fs: fix FTBFS on GCC 8
The memset is just doing an assignment from one dereferenced pointer to
another, so do an assignment to keep GCC 8 happy.

Fixes: https://github.com/Zygo/bees/issues/64

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-05-18 00:15:37 -04:00
Zygo Blaxell
c21518d8ff stats: rename "chase_wrong_data" to "chase_no_data"
An empty BeesBlockData from the chasing algorithm used to mean that data
was found at the expected location but it does not match; however, there
are now other reasons for this and they occur much more often.  The name
is misleading.

Change the name to report more correctly what happens:  no data, without
any guess about the reason.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-03-01 00:01:13 -05:00
Zygo Blaxell
082f04818f BeesBlockData: fix data type issues
Not sure if these cause any problems, but they are theoretically
incorrect data types.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-28 23:58:28 -05:00
Zygo Blaxell
5bdad7fc93 crucible: progress: a progress tracker for worker queues
The task queue can become very large with many subvols, requiring hours
for the queue to clear.  'beescrawl.dat' saves in the meantime will save
the work currently scheduled, not the work currently completed.

Fix by tracking progress with ProgressTracker.  ProgressTracker::begin()
gives the last completed crawl position.  ProgressTracker::end() gives
the last scheduled crawl position.  begin() does not advance if there
is any item between begin() and end() is not yet completed.  In between
are crawled extents that are on the task queue but not yet processed.
The file 'beescrawl.dat' saves the begin() position while the extent
scanning task queue is fed from the end() position.
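
Hypothetical usage, to make the begin()/end() distinction concrete
(scan_extent and save_state are stand-in names):

	ProgressTracker<BeesCrawlState> tracker(saved_state);
	{
		auto holder = tracker.hold(next_state);  // end() advances now
		scan_extent(next_state);                 // the queued work
	}   // holder released: begin() advances once no earlier hold remains
	save_state(tracker.begin());                 // beescrawl.dat gets begin()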

Also remove an unused method crawl_state_get() and repurpose the
operator<(BeesCrawlState) that nobody was using.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-28 23:49:39 -05:00
Zygo Blaxell
90c32c3f05 crucible: MAP_32BIT is not defined on ARM
Also fix a stray #if that should be #ifdef.
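
The portable guard pattern, for illustration:

	#include <sys/mman.h>

	// MAP_32BIT exists only on some architectures (x86-64, not ARM);
	// compile it in conditionally and fall back to no extra flag
	#ifdef MAP_32BIT
	static const int map_32bit_flag = MAP_32BIT;
	#else
	static const int map_32bit_flag = 0;
	#endif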

Closes:  https://github.com/Zygo/bees/issues/59

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-25 10:08:44 -05:00
Zygo Blaxell
33d274eabd resolve: break up long intra-extent dedup loops
When both block candidates for dedup are located in the same extent, bees
excludes them from deduplication because the dedup operation would not
free any space (both blocks are still referenced, so neither is deleted).
Candidates in other extents are still considered.

Typically a few blocks are duplicated many thousands or even millions
of times within a filesystem.  Many of these blocks appear in the same
extent as each other.  In cases where an extent contains an extremely
common duplicate block, it may appear multiple times in many extents.
bees can get into a loop with a very bad worst-case running time:  32768
blocks per extent * 2560 bees reference limit * 256 distinct hash table
entries = 21.5 *billion* iterations...squared, because this loop happens
every time bees encounters any of the references.  Not an infinite
number, but close enough.

In each iteration of the loop, replace_dst detects that both src and dst
block are part of the same btrfs extent data item and therefore should
not be deduped; however, this occurs after the block has been allocated
and read by chase_extent_ref.  This dst is discarded, but the outer
loop tries again with another reference to the same block and gets the
same result.

An easy fix for this problem is to stop the loop immediately when the
same physical extent is found in both src and dst.  The condition is rare
enough to ignore the negligible space efficiency loss, and filesystem
scan stops dead if the loop is allowed to proceed.  An exception is
thrown to terminate the loop at scan_one_extent from within replace_dst.

It would be better to determine the extent bytenr of each candidate
extent and filter them out in scan_one_extent (which reduces the number
of LOGICAL_INO calls as a side-effect), but bees has no code capable of
doing extent data tree lookups with backward iteration yet.  Even better
would be to change the hash table format so that the extent bytenr can
be decoded directly from the hash table entry (this already exists for
compressed extents).  Both of these changes are too large for v0.6.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-25 10:08:42 -05:00
Zygo Blaxell
2ac94438bd README: FD caches are now cleared every 10 transactions
Also some other minor editorial changes.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-14 21:09:05 -05:00
Zygo Blaxell
9063c6442f README: clarify that bees is not to be used on old kernels
Also note that there is currently no released Linux kernel that is free
of relevant bugs.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-14 20:54:48 -05:00
Zygo Blaxell
86afa69cd1 cache: release lock before clearing
Clearing the FD cache could trigger a lot of inode evicts in the kernel,
which will block the cache entry destructors called by map::clear().
This prevents any cache lookups or new file opens while it happens.

Move the map to an auto variable and destroy it after releasing the
mutex lock.  This probably has the same net result (all the bees threads
will be blocked in the kernel instead of on a bees mutex), but at least
the problem is outside of userspace now.
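
The pattern in miniature (a generic sketch, not the bees cache class):

	#include <map>
	#include <mutex>

	template <class Key, class Value>
	void clear_outside_lock(std::map<Key, Value> &cache, std::mutex &mutex)
	{
		std::map<Key, Value> victim;
		{
			std::unique_lock<std::mutex> lock(mutex);
			victim.swap(cache);   // O(1); no destructors run under the lock
		}
		// victim goes out of scope here: entry destructors (and any
		// resulting inode evictions) happen after the lock is released
	}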

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-07 23:14:38 -05:00
Zygo Blaxell
8f0e88433e roots: get rid of common error messages, add more error counters
One very common case is losing a race to open a file that was deleted.
No need to spam the logs with mere ENOENT reports.

Other errors are more significant.  Log those with errno, and
add event counters to record them.
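
In outline (the BEESCOUNT/BEESLOGERR macros are as used elsewhere in
bees; the counter names here are made up):

	#include <fcntl.h>
	#include <cerrno>
	#include <cstring>
	#include <string>

	int open_for_scan(const std::string &path)
	{
		int fd = open(path.c_str(), O_RDONLY);
		if (fd < 0) {
			if (errno == ENOENT) {
				// lost the race: file was deleted; count, don't log
				BEESCOUNT(open_enoent);
			} else {
				BEESLOGERR("open: " << path << ": " << strerror(errno));
				BEESCOUNT(open_error);
			}
		}
		return fd;
	}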

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-07 23:12:01 -05:00
Zygo Blaxell
5c1b45d67c extentwalker: remove wrong constraint check
Extents that extend past EOF will have ipos = (file size rounded up
to next block) and e.end() = (file size not rounded), which fails this
constraint check.

The constraint check is wrong.  Remove it for now.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-02-07 00:07:57 -05:00
Zygo Blaxell
6aad124241 crawl: somebody should set max_transid
The previous commit had both max_transid assignments commented out.
It happens to work because we set max_transid in the constructor and
it doesn't change after that, but it's cleaner to assign it explicitly.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-31 22:52:12 -05:00
Zygo Blaxell
087ec26c44 crawl: filter extents correctly
When an extent ref is modified, all of the refs in the same metadata
page get the same transid in the TREE_SEARCH_V2 header.  This causes
two problems:

	- Extents with generation < min_transid are included if they
	happen to be referenced by pages with generation >= min_transid.

	- Extent refs with generation > max_transid are excluded even
	if they reference extents with generation <= max_transid.

Both of these are wrong:  the first causes some extents to be repeatedly
scanned, the second causes some extents to not be scanned at all.

Change the TREE_SEARCH_V2 parameters so that Crawl sees all extents
newer than min_transid (i.e. set max_transid to max).  The TREE_SEARCH_V2
kernel logic already operates this way, i.e. it fetches every page with
transid >= min_transid and discards newer items if they are too new for
max_transid.  Filter strictly by the extent reference generation field
(i.e. the copy of the extent generation that is in the extent reference).

Note this still scans extent data multiple times, but it should now
be exactly once per extent reference.  A proper fix for this requires
extent-based scanning instead of extent-ref-based scanning.

Formerly commit 5a8c655fc4 "roots: filter
out obsolete extents from extent refs" which landed in the subvol-threads
branch but not master.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-31 22:48:39 -05:00
Kai Krakow
408b6ae138 Code style: Fix wrong indentation
This had spaces instead of tabs by accident.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-29 21:37:40 -05:00
Kai Krakow
e3c4a07216 Makefile: Unclutter "make test" output
This adds a .txt Makefile target to create a text file which receives
the test program output. In case the test failed, it will cat the
contents and fail the target.

Execution of each test itself is forced, so it would run every time make
is invoked, thus no failing test would be missed.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-29 21:37:40 -05:00
Kai Krakow
d8241a7720 README: Add notes about packaging
Give some pointers on how to package bees for a distribution.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-29 21:37:40 -05:00
Kai Krakow
5590fc0b13 Cmdline: Fix text alignment
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-29 21:37:40 -05:00
Kai Krakow
29d40ca359 Cmdline: Rename "relative-paths" to "strip-paths"
The previous name didn't match what this option really does.

Affects: #41

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-29 21:37:40 -05:00
Kai Krakow
b164717a25 Cmdline: Rename "notimestamps" to "no-timestamps"
That aligns better with the other options.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-29 21:37:40 -05:00
Zygo Blaxell
af250f7732 roots: determine transid_max without open()ing every subvol root
Scan the roots tree directly for roots other than 5 (the FS root), and
use btrfs_get_root_transid on root_fd for root 5.  This avoids filling
up the root FD cache every time we want a new transid_max.  Now the only
reason we open a subvol root FD is to open a file within the subvol.

transid_max may be the same as the FS root's transid, in which case
the search loop is not necessary.  Place a counter (transid_max_miss)
to see if we ever need to look at root items. If this counter never goes
above zero, or does so very rarely, we can delete the search loop.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 21:37:39 -05:00
Zygo Blaxell
4f0bc78a4c crawl: don't block a Task waiting for new transids
Task should not block for extended periods of time.

Remove the RateEstimator::wait_for() in crawl_roots.  When crawl_roots
runs out of data, let the last crawl_task end without rescheduling.
Schedule crawl_task again on transid polls if it was not already running.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 21:37:39 -05:00
Zygo Blaxell
b67fba0acd log: BEESLOGNOTE doesn't do what we think it does
BEESLOGNOTE was intended to combine BEESLOG and BEESNOTE, i.e. write a
log message and set the task status message from a single expression.
With the log levels we would now need several more variants
(BEESLOGNOTEDEBUG, BEESLOGNOTEERR...) or a parameter (BEESNOTELOG(DEBUG,
...)).

Or we give up on the idea.  This combination was used only 3 times so far.
The log messages and the note message have different editorial styles.

Remove the three instances of BEESLOGNOTE, and make the BEESLOGNOTE
definition equivalent to BEESLOG at LOG_NOTICE level for consistency.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 21:37:38 -05:00
Zygo Blaxell
92fda34a68 task: allow user access to ID and default constructor
The default constructor makes it more convenient to use Task as a
class member.

The ID is useful to disambiguate Task references.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:54:06 -05:00
Zygo Blaxell
2aacdcd95f time: add update_monotonic to RateEstimator
update_monotonic does not reset the counter if a new count is smaller than
earlier counts.  Useful when consuming an unsorted stream of event counts.
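
Schematically (member names assumed):

	void RateEstimator::update_monotonic(uint64_t new_count)
	{
		// unlike update(), a smaller count does not reset the estimator;
		// out-of-order samples from an unsorted stream are simply dropped
		if (new_count > m_last_event_count) {
			update(new_count);
		}
	}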

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:51:13 -05:00
Zygo Blaxell
d367c6364c context: improve toxic match logs
Reword log message for discovery of new toxic extents vs. lookup of
previously known toxic extents.  Also add the block data (especially
filename) to the discovery message.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:48:06 -05:00
Zygo Blaxell
591a44e59a resolve: drop support for old-style compressed BeesAddr
No public version of bees ever created old-style compressed hash table
entries.  Remove the code that supports them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:48:06 -05:00
Zygo Blaxell
27125b8140 README: add scan-mode 2 and expand descriptions of modes 0 and 1
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:48:06 -05:00
Zygo Blaxell
636328fdc2 roots: add scan-mode 2 "oldest crawler first"
Add a third scan mode with alternative trade-offs.

Benefits:  Good sequential read performance.  Avoids race conditions
described in https://github.com/Zygo/bees/issues/27.  Avoids diverting
scan resources into short-lived snapshots before their long-lived
origin subvols are fully scanned.

Drawbacks:  Takes the longest time of the three implemented scan-modes
to free space in extents that are shared between snapshots.  Uses the
maximum amount of temporary space.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:48:05 -05:00
Zygo Blaxell
ef44947145 roots: move common code for creating crawl Tasks into a method
Duplicated code between the different scan modes has slowly been
becoming less and less trivial.  Move the code to a method and
make both scan-modes call it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-28 22:52:17 -05:00
Zygo Blaxell
72cc9c2b60 ExtentWalker: increase efficiency for typical btrfs extent sizes
Perf was blaming more than 50% of cycles on TREE_SEARCH_V2.  strace
showed 4 TREE_SEARCH_V2 calls for every pread in grow_backward().

Fix by increasing the extent fetch batch size so it is more likely
to include the desired items in the first fetch attempt.

This removes TREE_SEARCH_V2 from the top 10 list of cycle consumers.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-28 22:52:07 -05:00
Zygo Blaxell
e74c0a9d80 scan: fix length mismatch exception for prealloc extents at EOF
Prealloc extent sizes were taken from the Extent object and did not
take the file size into account.  If a file with a non-4K-aligned
size is preallocated, the resulting dedup fails with an exception
because the size of both ranges of the BeesRangePair do not match.

Limit the size of the replacement hole extent to not extend past the
end of the file.
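
The fix reduces to a clamp, roughly (names assumed):

	#include <algorithm>
	#include <cstdint>

	// limit a prealloc extent's replacement hole so it stops at EOF;
	// otherwise the two halves of the BeesRangePair differ in length
	uint64_t hole_end_offset(uint64_t extent_end, uint64_t file_size)
	{
		return std::min(extent_end, file_size);
	}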

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-28 01:46:08 -05:00
Zygo Blaxell
762f833ab0 roots: poll every 10 transids
Restarting scans for each transid is a bit aggressive.  Scan every 10
transids for a polling rate close to the former BEES_COMMIT_INTERVAL.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
48e78bbe82 roots: use RateEstimator as a transid_max cache and clean up logs
transid_max is now measured at a single point in the crawl_transid thread.

Move the Crawl deferred logic into BeesRoots so it restarts all crawls
when transid_max increases.  Gets rid of some messy time arithmetic.

Change name of Crawl thread to "crawl_master" in both thread name and
log messages.

Replace "Next transid" with "Crawl started".

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
ded26ff044 FdCache: clear cache on every new transid / crawl cycle
The periodic cache age check was not protected by a lock, so multiple
threads may decide to concurrently clear the cache.  This led to
duplicate log messages.

Fix by moving the cache expiry trigger out of FdCache and into Roots,
which knows when transids change and can perform cache clears at exactly
the time they are most relevant, i.e. after something that was deleted
becomes permanently so.

This removes the last references to BEES_COMMIT_INTERVAL, so get rid
of its definition too.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
72857e84c0 crawl: combine two messages per crawl cycle into one
Now that the polling interval is up to 30 times faster,
next_transid seems too verbose again.

Make it clearer that the interval quoted in the "Deferring..."
message is the computed transaction polling interval.

Combine "Next transid" and "Restarted crawl" into a single message.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
0fdae37962 roots: use RateEstimator to track transids
Make the crawl polling interval more closely track the commit interval
on the btrfs filesystem.  In the future this will provide opportunities
to do things like clear FD caches and stop crawls on deleted subvols,
but triggered by transaction commits instead of arbitrary time intervals.

Rename the "crawl" thread so it no longer has the same name as the "crawl"
task, and repurpose it for dedicated transid polling.  Cancel the deletion
of crawl_thread and repurpose it to trigger new crawls and wake up the
main crawl Task when it runs out of data.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
4694c7d250 time: add RateEstimator, a class for optimally polling irregular external events
RateEstimator estimates the rate of external events by sampling a
counter.

Conversion functions are provided to predict the time when the
event counter will be incremented to particular values based on past
observations of the event counter.

Synchronization functions are provided to block a thread until a specific
counter value is reached.

Event polling is supported using the history of previous event counts
to determine the predicted time of the next event.  A decay function
emphasizes more recent event history.

Polling delays are bounded by minimum and maximum values in the constructor
parameters.

wait_for() and wait_until() block the calling thread until the target
event count is reached (or the counter is reset).  These functions are
not bounded by min_delay or max_delay, and require a separate thread
to call update().  wait_for() waits for the counter to be incremented
from its current value by the given count.  wait_until() waits for the
counter to reach an absolute value.

update() counts external events and unblocks threads that are blocked
in wait_for() or wait_until().  If the event counter decreases then it
is reset to the new value.

duration() and time_point() convert relative and absolute event counts
into relative and absolute C++11 time quantities based on the last update
time, last observed event count, and the observed event rate.

Convenience functions seconds_for() and seconds_until() calculate
polling delays for the desired relative and absolute event counts
respectively.  These delays are bounded by max and min delay parameters.

rate() and ratio() provide conversion factors based on the current
estimated event rate.
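
Hypothetical usage, assuming the constructor takes the min and max
polling delays:

	RateEstimator transid_re(1.0, 3600.0);  // assumed: min_delay, max_delay
	transid_re.update(current_transid);     // sample the event counter
	double delay = transid_re.seconds_until(current_transid + 1);
	// wait_until() blocks; it needs another thread calling update():
	transid_re.wait_until(current_transid + 1);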

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
a3f02d5dec roots: comment updates and general cleanup
Fix discussion of nodatasum files, clarifying what we can and cannot do.

Get rid of some BEESNOTE and BEESTRACE calls which cannot be observed
(well, BEESNOTE can, but you have to be quick!).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
f6909dac17 bees: drop BEESINFO
Having too many "write a message to the log" primitives is confusing,
and having one that intermittently and silently discards output is even
_more_ confusing.

Replace all BEESINFO with appropriate BEESLOG*s.  Usually DEBUG.
Except for one or two that occur too often.  Just delete those.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
bd2a15733c README: update Linux kernel bugs list (v4.14)
Add the new WARN_ON bug in v4.14.

Clarify what happens when bees is run on a kernel that is too old.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
4ecd467ca0 BeesBlockData: don't leak file contents in the log
The data field of BeesBlockData is only interesting to those who want
to debug the BeesBlockData implementation or other battle-tested parts
of bees.  Users who want to do this can modify and rebuild the source
to enable the output.

To everyone else, the data field is a huge, ongoing infoleak through
the log.

Don't bother with an option, just output the length of the data field
and nothing else.

Fixes:  https://github.com/Zygo/bees/issues/53

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
71be53eff6 types: don't throw an exception when it's likely we are already reporting an exception
Empty files are a thing that can happen.  Don't bomb out just reporting
one's existence.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
67ac537c5e time: drop unused Timer methods
Timer::set(double d) in particular seems...wrong.

Nothing uses them, so don't bother to fix them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
f64fc78e36 Task: convert print_fn to a string
Since we are now unconditionally rendering the print_fn as a static
string, there is no need for it to be a function.  We also need it to
be brief and mostly constant.

Use a string instead.  Put the string before the function in the Task
constructor arguments so that the title string appears as a heading in
code, since we are making a breaking API change already.

Drop TASK_MACRO as it is broken by this change, but there is no similar
usage of Task anywhere to make it worth fixing.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
0710208354 BeesNote: thread naming fixes
Move pthread_setname_np to the same place we do pthread_getname_np.

Detect errors in pthread_getname_np--but don't throw an exception
because we would call ourself recursively from the exception handler
when it tries to log the exception.

Fix the order of set_name and the first BEESNOTE/BEESLOG call in threads,
closing small time intervals where logs have the wrong thread name,
and that wrong name becomes persistent for the thread.

Make the main thread's name "bees" because Linux kernel stack traces use
the pthread name of the main thread instead of the name of the process.

Anonymous threads get the process name (usually "bees").  We should not
have any such threads, but we do.  This appears to occur mostly during
exception stack unwinding.  GCC/pthread bug?

Fixes:  https://github.com/Zygo/bees/issues/51

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:47:47 -05:00
Kai Krakow
c17618c371 README: Some things are simply no longer true
Environment variables are no longer the /only/ option.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:47:04 -05:00
Kai Krakow
dee6f189bb README: Fix markdown syntax error
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:47:04 -05:00
Kai Krakow
de6d7d6f25 Makefile: Get rid of test for-loop
Tests could now be run in parallel. Additionally, single tests can be
run by simply using "make testname", i.e. "make chatter" would run the
chatter test.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:44:27 -05:00
Kai Krakow
63f249f005 Makefile: force rebuilding tests when Makefile changed
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:43:37 -05:00
Kai Krakow
ca1a3bed12 Makefile: -lXXXXX is really a filename parameter
According to gcc docs, -l is converted to a filename which makes it a
filename parameter. Let's move it to the end.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:43:21 -05:00
Kai Krakow
d6312c338b Logging: Improve text layout when discarding log timestamps
When timestamps are removed from logging, the current text layout shows
lines like

tid 12345 thread_name: Example log

Let's convert it to a more conforming layout:

thread_name[12345]: Example log

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:42:49 -05:00
Zygo Blaxell
5533d09b3d Merge remote-tracking branch 'kakra/proposal/prepare-for-more-libs' 2018-01-20 14:23:55 -05:00
Zygo Blaxell
4c05c53d28 roots: update Task print functions for new usage
This restores the old "crawl" prefix in the case of Crawler log messages.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 14:00:52 -05:00
Zygo Blaxell
5063a635fc logging: get Task names for log messages
When a Task worker thread is executing a Task, the thread name is less
useful than the Task description.

Use the Task description instead of the thread name if the thread has
no BeesThread name and the thread is currently executing a task.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 14:00:51 -05:00
Zygo Blaxell
fef7aed8fa BeesNote: if thread name was not set, get it from Task or pthread_getname_np
Threads from the Task module in libcrucible don't set BeesNote::tl_name.
Even if they did, in Task context the thread name is unspecific to the point
of meaninglessness.

Use the Task::print method as the name for such threads, and be sure
that future Task print functions are designed for that usage.

The extra complexity in BeesNote::get_name() seems preferable to
bombarding pthread_setname_np hundreds or thousands of times per second.

FIXME:  we are now calling Task::print() on every BeesNote, which
is effectively unconditionally.  Maybe we should have Task::print()
and get_name() return a closure, or just evaluate Task::print() once
and cache it in TaskState, or define Task's constructor with a string
argument instead of the current print_fn closure.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:57:51 -05:00
Zygo Blaxell
3f60a0efde task: allow external access to Task print function
This enables bees' thread introspection to use task descriptions in
status and log messages.

BeesNote will be calling Task::current_task() from non-Task contexts,
which means we need to allow Task's shared state pointer to be null.
Remove some asserts that will ruin our day in that case.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:51:05 -05:00
Zygo Blaxell
e970ac6c02 crawl: make logging less verbose
Silence the three(!) log messages per crawl increment and the extra one
at the end of the subvol.

The three critical messages per subvol crawl cycle are:

	Next transid in BeesCrawlState <SUBVOL>:0 offset 0x0 transid <A>..<B> started <T> (<AGO>s ago)

Subvol has been completely scanned and a new transaction range will
be created.  CrawlState is the state of the old subvol.

	Restarted crawl BeesCrawlState <SUBVOL>:0 offset 0x0 transid <B>..<C> started <T+AGO> (0s ago)

Subvol has been restarted.  CrawlState is the state of the new subvol.

	Deferring next transid in BeesCrawlState <SUBVOL>:0 offset 0x0 transid <B>..<C> started <T+AGO> (0s ago)

Subvol has been completely scanned, but it is too soon to start a
new scan.

Fix the "Restart..." message to use the correct verb tense and to use
the correct BeesCrawlState data.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:50:47 -05:00
Zygo Blaxell
38ccf5c921 counters: track pair growing time
When we find a matching block we attempt to extend ("grow") the matched
pair around the first matching block.  This function takes the IO hit of
reading the second extent from each duplicate extent pair.  It's also
very slow--too many allocations, too small reads, reads in the wrong
order, an order of magnitude too many calls to TREE_SEARCH_V2, and it
is usually in the top 3 most frequent PERFORMANCE warnings.

Start tracking the running time of grows using the pairforward_ms
and pairbackward_ms counters so that we can compare it to various
replacements.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:04:56 -05:00
Kai Krakow
826b27fde2 Makefile: Fix some dependencies
Some deps are already referenced by depends.mk, some were actually
missing.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-19 01:50:13 +01:00
Kai Krakow
8a5f790a03 Makefile: Some cleanups
Reorder and reformat some arguments so it looks more streamlined during
the build process.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-19 01:50:13 +01:00
Kai Krakow
677da5de45 Logging: Add log levels to output
This commit adds log levels to the output. In systemd, it makes colored
lines, otherwise it's probably just a number. Bees is very chatty, so
this paves the road for log level filtering.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 23:41:29 +01:00
Kai Krakow
d6b847db0d Makefile: speedup dependency generation
Dependencies can be generated in parallel, which can be much faster.  It
also does away with the problem that the for loop may fail multiple times
in a row, leaving behind a broken intermediate file which would be picked
up by successive runs.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
b8f933d360 Makefile: do not be verbose about mv
A small left-over from me fixing the same problem as Zygo did in his
merged branch.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
27b12821ee Makefile: Generalize the .version.cc target
This enables us to move the file around later.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
fdf434e8eb Makefile: fix dependency generation
Let's generalize the depends.mk target so we can easily move files
around later. While doing it, let's also fix the "gcc -M" call to use
explicit target names and not clobber it with preprocessor output.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
bc1b67fde1 Makefile: rename OBJS to CRUCIBLE_OBJS
This paves the way for building different .so libs.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
4cfd5b43da Makefile: generalize .so target
We can generalize the .so target by moving its dependencies into rules
without build instructions.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
4789445d7b Makefile: .o already depends on its .h file
We can remove the explicit dependency on the .h file because that is
covered by depends.mk. Let's instead depend on makeflags, which makes
more sense.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
c8787fecd2 Makefile: depends.mk is not an optional include
We really need depends.mk in the following Makefile reorganization.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Zygo Blaxell
4943a07cce crucible: cache: linked-list LRU implementation
We need a better cache expiration algorithm than "make a copy of
the entire thing, sort it while holding a lock, and delete half
the items in a single burst."

Replace the Lamport clock with a double-linked list.  Each insert
or lookup operation moves the affected item to the head of the list.
Each erase operation deletes one single item at the tail of the list.

Also sort out some iterator invalidation nonsense by doing erases before
inserts instead of "insert, erase, find the inserted item again because
we invalidated the found iterator during the erase."

The new implementation adds a second word-sized member to each Value
as well as a copy of the Key.  Hopefully the enlarged size is not
a deal-breaker.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:58:44 -05:00
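
The idea can be sketched with std::list and splice (the crucible code in
the diff below uses an intrusive doubly-linked list instead; this sketch
only illustrates the O(1) move-to-front and single-item eviction):

	#include <cstddef>
	#include <list>
	#include <unordered_map>
	#include <utility>

	// Sketch of a linked-list LRU: hits splice the node to the front,
	// eviction pops exactly one node off the tail.  No copy, no sort,
	// no burst deletion of half the items.
	template <class K, class V>
	class MiniLRU {
		using List = std::list<std::pair<K, V>>;
		std::size_t m_max;
		List m_list;	// front = most recently used
		std::unordered_map<K, typename List::iterator> m_index;
	public:
		MiniLRU(std::size_t max_size) : m_max(max_size) {}
		void insert(const K &k, const V &v) {
			auto it = m_index.find(k);
			if (it != m_index.end()) {
				it->second->second = v;
				m_list.splice(m_list.begin(), m_list, it->second);
				return;
			}
			m_list.emplace_front(k, v);
			m_index[k] = m_list.begin();
			if (m_list.size() > m_max) {	// evict one item at the tail
				m_index.erase(m_list.back().first);
				m_list.pop_back();
			}
		}
		V *lookup(const K &k) {
			auto it = m_index.find(k);
			if (it == m_index.end()) return nullptr;
			m_list.splice(m_list.begin(), m_list, it->second);
			return &it->second->second;
		}
	};
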
Zygo Blaxell
00d9b8ed76 hash: do the mlock after loading the table
The mlock runs much faster, probably because the hash fetches are
doing most of the work that mlock does.

It makes bees startup latency smaller for testing, even if startup takes
more time in absolute terms.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:58:44 -05:00
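
A sketch of the ordering change (file name from the README; sizes and
error handling are illustrative): reading the table first faults its
pages in, so the later mlock() has little work left to do:

	#include <fcntl.h>
	#include <sys/mman.h>
	#include <unistd.h>
	#include <vector>

	// Load the hash table first, then pin it.  The read() loop brings
	// the pages into memory, so mlock() mostly pins resident pages.
	int main()
	{
		const size_t table_size = 16 * 1024 * 1024;	// 16M, illustrative
		std::vector<char> table(table_size);
		int fd = open("beeshash.dat", O_RDONLY);
		if (fd < 0) return 1;
		size_t done = 0;
		while (done < table_size) {
			ssize_t n = read(fd, table.data() + done, table_size - done);
			if (n <= 0) break;
			done += n;
		}
		close(fd);
		mlock(table.data(), table.size());	// pin after loading
		return 0;
	}
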
Zygo Blaxell
e8b4ab54c6 README: describe the scanning mode (-m option)
Include a brief description of the two algorithms without getting
into too much detail for an ostensibly temporary feature.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:58:44 -05:00
Zygo Blaxell
56c23c4517 crawl: implement two crawler algorithms and adjust scheduling parameters
There are two subvol scan algorithms implemented so far.  The two modes
are unimaginatively named 0 and 1.

	0:  sorts extents by (inode, subvol, offset),

	1:  scans extents round-robin from all subvols.

Algorithm 0 scans references to the same extent at close to the same
time, which is good for performance; however, whenever a snapshot is
created, the scan of the entire filesystem restarts at the beginning of
the new snapshot.

Algorithm 1 makes continuous forward progress even when new snapshots
are created, but it does not benefit from caching and will force the
kernel to reread data multiple times when there are snapshots.

The algorithm can be selected at run-time using the -m or --scan-mode
option.

We can collect some field data on these before replacing them with
an extent-tree-based scanner.  Alternatively, for pre-4.14 kernels,
we can keep these two modes as non-default options.

Currently these algorithms have terrible names.  TODO:  fix that, but
also TODO: delete all that code and do scans directly from the extent
tree instead.

Augment the scan algorithms relative to their earlier implementation by
batching multiple extents to scan from each subvol before switching to
a different subvol.

Sprinkle some BEESNOTEs on the Task objects so that they don't
disappear from the thread status output.

Adjust some timing constants to deal with the increased latency from
competing threads.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:53:49 -05:00
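
For reference, mode 0's ordering amounts to a lexicographic comparator
like the following sketch (type and field names are illustrative, not
the bees data structures):

	#include <cstdint>
	#include <tuple>

	struct ExtentRef {
		uint64_t inode, subvol, offset;	// illustrative fields
	};

	// Mode 0: ascending (inode, subvol, offset) order keeps all the
	// references to a shared extent close together in the scan.
	static bool mode0_less(const ExtentRef &a, const ExtentRef &b)
	{
		return std::tie(a.inode, a.subvol, a.offset)
		     < std::tie(b.inode, b.subvol, b.offset);
	}
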
Zygo Blaxell
055c8d4c75 roots: scan in parallel using Tasks
Distribute incoming extents across a thread pool for faster execution
on multi-core, multi-disk environments.

Switch extent enumeration model to scan extent refs consecutively(ish).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:52:00 -05:00
Zygo Blaxell
090d79e13b crucible: remove unused TimeQueue and WorkQueue classes
WorkQueue is superseded by Task.  TimeQueue will be replaced by
something based on Tasks.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:52:00 -05:00
Zygo Blaxell
796aaed7f8 roots: remove dead code and #if blocks
In both instances the code contained within (or the conditional
compilation surrounding it) is no longer controversial.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:52:00 -05:00
Zygo Blaxell
8849e57bf0 crucible: add Task class
We need a mechanism for distributing work across processor cores and
disks.

Task implements a simple FIFO/LIFO queue model for executing closures.
Some locking primitives are included (mutex and barrier).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:51:59 -05:00
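
A usage sketch against the interface added here (see
include/crucible/task.h in the diff below); the sleep is only a crude
stand-in for real synchronization:

	#include "crucible/task.h"

	#include <unistd.h>

	using namespace crucible;

	int main()
	{
		TaskMaster::set_thread_count(4);	// worker pool size

		// The closure may run before or after run() returns,
		// in this thread or another.
		Task hello("say hello", []() {
			// ... do some work on a worker thread ...
		});
		hello.run();			// enqueue at end of queue

		Task urgent("urgent work", []() { /* ... */ });
		urgent.run_earlier();		// schedule before queued tasks

		sleep(1);			// crude wait, sketch only
		return 0;
	}
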
Zygo Blaxell
844a488157 README: update dependencies and Linux kernel bugs list
Bees will someday rely on features available only in kernel v4.14.

Let's start now by removing workarounds for bugs that were fixed in v4.11.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:51:59 -05:00
Zygo Blaxell
a175ee0689 bees: clean up #if 0 ... fsync ... #endif code
Remove some dead code because dedup-related deadlocks have not been
observed since Linux kernel v4.11.

Preserve rationale of remaining #if 0 block (why we do write/rename
instead of write/fsync/rename) so that people don't try to replace the
"missing" fsync() there.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
Zygo Blaxell
f376b8e90d test: add -lpthread to Makefile
This resolves missing symbol build errors.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
Zygo Blaxell
3da755713a Makefiles: don't append to depends.mk.new
Fixes errors such as:

	depends.mk:765: *** multiple target patterns.  Stop.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
Zygo Blaxell
8d3a27bf85 subvol-threads: increase resource and thread limits
With kernel 4.14 there is no sign of the previous LOGICAL_INO performance
problems, so there seems to be no need to throttle threads using this
ioctl.

Increase the FD cache size limits and scan thread count.  Let the kernel
figure out scheduling.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
Zygo Blaxell
42a6053229 roots: remove open_root_cache correctly
BEESNOTE puts a message on the status message stack.  BEESINFO logs a
message with rate limiting.  The message that was flooding the logs
was coming from BEESINFO not BEESNOTE.

Fix earlier commit which removed the wrong message.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
Zygo Blaxell
c477618924 crucible: resource: optimize map cleanup
We were holding weak refs until the next time the resource ID was used.
This is a bad thing if resource IDs are sparse (e.g. pointers or hashes)
because we'll never see an ID twice.

To fix, determine whether we released the last instance of a resource,
and if so, free its weak ref immediately.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:07 -05:00
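
The cleanup rule can be sketched like this (simplified; the actual logic
is in the resource.h destructor further down in the diff):

	#include <map>
	#include <memory>

	// After dropping our shared_ptr, erase the map entry right away if
	// its weak_ptr has expired, instead of waiting for the (possibly
	// never-reused) resource ID to come around again.
	template <class Key, class Res>
	void release(std::map<Key, std::weak_ptr<Res>> &weak_map,
		     const Key &key, std::shared_ptr<Res> &ptr)
	{
		ptr.reset();	// drop our (possibly last) reference
		auto found = weak_map.find(key);
		if (found != weak_map.end() && found->second.expired()) {
			weak_map.erase(found);	// last instance released
		}
	}
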
Zygo Blaxell
35100c2b9e crucible: resource: remove excess locking
The bugs in other parts of the code have been identified and fixed,
so the overprotective locks around shared_ptr can be removed.

Keep the other improvements to the Resource class.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:06 -05:00
Zygo Blaxell
116f15ace5 lockset: drop unused method wait_unlock
This function is not used and does not appear to be useful.

Remove it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:30:06 -05:00
Zygo Blaxell
8a68b5f20b crucible: add cleanup class
Store a function (or closure) in an instance and invoke the function
from the destructor.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-15 11:07:48 -05:00
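
A usage sketch for the class (interface from include/crucible/cleanup.h
in the diff below; the file handle is an illustrative resource):

	#include "crucible/cleanup.h"

	#include <cstdio>

	using namespace crucible;

	void example()
	{
		FILE *f = fopen("/tmp/example", "w");	// illustrative resource
		if (!f) return;
		// The closure runs in ~Cleanup, on every exit path from
		// this scope, including exceptions and early returns.
		Cleanup closer([f]() { fclose(f); });
		fputs("data\n", f);
	}
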
Kai Krakow
6d6aedd8ec Makefile: Fail gracefully if markdown is not installed
Previously, MARKDOWN could end up empty. This commit should fix that.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 21:25:12 +01:00
Kai Krakow
025b14f38f Installation: Depend Gentoo ebuild on markdown
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 20:56:48 +01:00
Kai Krakow
dd6d8caaa2 Installation: Remove superfluous cruft from Gentoo ebuild
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 20:56:34 +01:00
Zygo Blaxell
4bfb637b0e Merge remote-tracking branch 'nefelim4ag/master' 2018-01-10 23:43:00 -05:00
Zygo Blaxell
4aa5978a89 hash: reduce mutex contention using one mutex per hash table extent
This avoids PERFORMANCE warnings when large hash tables are used on slow
CPUs or with lots of worker threads.  It also simplifies the code (no
locksets, only one object-wide mutex instead of two).

Fixed a few minor bugs along the way (e.g. we were not setting the dirty
flag on the right hash table extent when we detected hash table errors).

Simplified error handling:  IO errors on the hash table are ignored,
instead of throwing an exception into the function that tried to use the
hash table.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-10 23:25:45 -05:00
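
The locking scheme can be sketched as follows (names and the extent
count are illustrative; the point is that threads touching different
extents no longer serialize on a single lock):

	#include <array>
	#include <cstddef>
	#include <mutex>

	class ShardedHashTable {
		std::array<std::mutex, 64> m_extent_mutex;	// one per extent
	public:
		template <class Fn>
		void with_extent(std::size_t extent, Fn fn) {
			// Lock only the extent being used; other extents
			// stay available to other worker threads.
			std::unique_lock<std::mutex> lock(
				m_extent_mutex[extent % m_extent_mutex.size()]);
			fn();
		}
	};
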
Kai Krakow
365a913a26 Installation: Add Gentoo ebuild
This commit adds an ebuild for Gentoo. Version 9999 builds live from
current git, currently using kakra:integration because it has some
installation and build fixes important for Gentoo.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 03:03:17 +01:00
Kai Krakow
634a1d0bf6 Installation: -fPIC should not be used unconditionally
According to the Gentoo packaging guide, -fPIC should only be used on
shared libraries, and not added unconditionally to every linker call.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 02:30:12 +01:00
Kai Krakow
3a24cd3010 Installation: Fix soname QA warning in Gentoo
Gentoo warns about libs missing a proper soname during QA phase. Let's
fix this.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 02:30:12 +01:00
Kai Krakow
3391593cb9 Installation: Keep version tag in a variable
To prepare soname handling, we need to keep the version tag in a
variable.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 02:02:36 +01:00
Kai Krakow
fdd8350239 Installation: Improve filesystem layout flexibility
In preparation for Gentoo QA checks during the ebuild merge phase, let's
make some more of the filesystem layout adjustable.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-11 02:02:36 +01:00
Kai Krakow
60cd9c6165 Installation: Introduce DESTDIR into Makefile
In Gentoo, usage of DESTDIR is automatically handled by the build system
to support installation into a clean image from which the package is
created.

Thus, let's add DESTDIR to the install targets. One can now correctly
install bees with packaging systems simply by running:

$ DESTDIR=/tmp/bees-image make all install

This no longer interferes with the PREFIX setting.

CC: Timofey Titovets <nefelim4ag@gmail.com>
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-10 22:35:22 +01:00
Kai Krakow
f0e02478ef Installation: Document optional dependency on blkid
If using `scripts/beesd`, we need `blkid`, which is part of util-linux.
It should be available on every distribution, but let's document it
anyway.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-10 21:47:01 +01:00
Kai Krakow
421641e242 Makefile: Document scripts/beesd
Add a paragraph about the helper script `scripts/beesd` to automatically
set up and configure bees.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-10 21:06:49 +01:00
Kai Krakow
0fce10991b Installation: Add Arch Linux instructions
Closes #34

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-10 20:43:54 +01:00
Kai Krakow
a465d997bd Makefile: Document Makefile changes 2018-01-10 20:41:56 +01:00
Kai Krakow
361ef0bebf Installation: Add new section to README 2018-01-10 20:41:37 +01:00
Kai Krakow
1fcf07cc2a Installation: Prepare README
Rename a section in preparation for a new install section.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-10 20:41:00 +01:00
Kai Krakow
333b2f7746 Makefile improvement
Now you can make bees fly as pointed out in the README... ;-)
2018-01-10 20:09:38 +01:00
Timofey Titovets
ff9e0e3571 Fix: exec bees - breaks bash trap handling of umount bees workdir
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2018-01-09 23:25:57 +01:00
Timofey Titovets
2d49d98bd2 Fix: exec bees - breaks bash trap handling of umount bees workdir
Signed-off-by: Timofey Titovets <nefelim4ag@gmail.com>
2018-01-09 22:33:57 +03:00
Kai Krakow
92aa13a6ae Add beesd@.service to gitignore
It's a generated file. We should ignore it, so it won't accidentally be
checked in.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 01:56:22 +01:00
Kai Krakow
f0c516f33b Makefile: let "make install" install the complete distribution
It happened more than once that I ran just "make install", which
doesn't install the scripts.

Let's fix this by renaming the previous install target to install_bees,
and then make a new install target which depends on each install target
and thus installs the complete distribution.

It doesn't hurt to install those few scripts. I don't see the point in
separating the install targets as it was previously done.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 01:54:41 +01:00
Kai Krakow
8e2139d6ed Makefile: depend install_scripts on scripts
For consistency with the other install target, let's depend
install_scripts on its build targets.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 01:51:18 +01:00
Kai Krakow
b959af1a15 systemd: Provide URL and better description
Let's direct users to the support site when they ask systemd for help
about the service unit, or when they look at error messages.

Also, let's adjust the description to be more pleasing to the eyes. The
previous long description with uncommon formatting really stuck out in
the boot logs.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 01:32:17 +01:00
Kai Krakow
78f96a9fbd systemd: Don't start without essential system services
Starting bees right after local-fs.target is probably not what we want,
as basic setup of the system might not have been done (like udev,
cryptsetup, sysctl, swap, etc).

Let's start only after sysinit.target instead, which guarantees that all
basic setup has been done; most importantly, sysctl, udev, and swap have
been set up, which may apply important tweaks, configuration, and tuning.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 01:29:05 +01:00
Kai Krakow
953c158868 systemd: Don't start in system-update.target
Because bees is installed into local-fs.target, bees also runs during
system-update.target. This should not be done: system-update.target is
meant as an isolated bootup mode for applying updates offline, that is,
only essential services are running.

Fix this by making it WantedBy basic.target instead. According to
system-update.target and "man bootup", system-update.target pulls in
sysinit.target, as does basic.target. So essentially, basic.target is
not part of the system-update.target transaction.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 01:24:35 +01:00
Kai Krakow
f7f99f52b5 Generalize sed invocation rule
Remove the redundant sed call by generalizing the rule to apply sed to
.in templates.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 00:48:14 +01:00
Kai Krakow
abeb6e74b2 Add scripts to "make all" target
This prevents scripts from being generated by "root" during the "sudo
make install" phase.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 00:48:14 +01:00
Kai Krakow
6c67ae0d5e Don't zap localconf in "make clean"
When you run "make clean", localconf is removed. In most cases this is
probably not intentional.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-09 00:48:14 +01:00
Zygo Blaxell
ba981c133a Merge remote-tracking branches 'kakra/feature/add-relative-path-option' and 'kakra/integration' 2018-01-07 21:39:01 -05:00
Kai Krakow
3024e43355 Fix a fallthrough error in GCC 7+
GCC 7 and higher turn a previous warning into an error for implicit
fallthrough. Let's hint the compiler that this is intentional here.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2017-11-14 07:00:28 +01:00
Kai Krakow
270a91cf17 Fix a fallthrough error in GCC 7+
GCC 7 and higher turn a previous warning into an error for implicit
fallthrough. Let's hint the compiler that this is intentional here.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2017-11-14 06:58:43 +01:00
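
For reference, GCC 7's -Wimplicit-fallthrough accepts either a marker
comment or, in C++17, the [[fallthrough]] attribute. This is a sketch of
the mechanism, not the bees code:

	int classify(int c)
	{
		int score = 0;
		switch (c) {
		case 2:
			score += 10;
			// fall through
		case 1:
			score += 1;
			[[fallthrough]];	// C++17 attribute form
		case 0:
			score += 1;
			break;
		}
		return score;
	}
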
Kai Krakow
93ba0f48de Make clear that options must be supplied in one variable
Previously, expectations could fail when just uncommenting both lines.
2017-11-14 06:58:43 +01:00
Kai Krakow
d930136484 Remove process forking from frontend script
Now that the patches to filter logging output are integrated, we can
finally stop forking a subprocess and redirecting file descriptors.

We instead use exec to replace the process with the final daemon.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2017-11-14 06:58:43 +01:00
Kai Krakow
f7320baa56 Fix indentation/alignment after integration 2017-11-14 06:58:43 +01:00
Kai Krakow
21212cd3e3 Fix example config for timestamp logging 2017-11-14 06:58:43 +01:00
Kai Krakow
0c6a4d00c8 Remove filter path logic from frontend script
Now that relative path filtering is in place, we can give up spawning
subshells in the frontend script.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2017-11-14 01:16:06 +01:00
Kai Krakow
52997936d5 getopt: Add logic to set relative path from $CWD
This commit adds a new option to set relative path output for name_fd().

Signed-off-by: Kai Krakow <kai@kaishome.de>
2017-11-14 01:16:06 +01:00
Kai Krakow
755f16a948 crucible: Allow setting a relative path option for name_fd()
This commit adds an option to store a relative path in preparation for
more human-friendly log output.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2017-11-14 01:16:06 +01:00
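
A usage sketch based on the declarations added to include/crucible/fd.h
(visible in the diff below); the name_fd() and open_or_die() calls are
left commented because their exact signatures are not part of this
excerpt, and the resulting behavior is assumed from the commit message:

	#include "crucible/fd.h"

	#include <unistd.h>

	using namespace crucible;

	int main()
	{
		char cwd[4096];
		if (getcwd(cwd, sizeof(cwd))) {
			set_relative_path(cwd);	// log paths relative to $CWD
		}
		// Fd f = open_or_die("some/file");
		// name_fd(f) would now yield "some/file" instead of an
		// absolute path (assumed behavior, per the commit message).
		return 0;
	}
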
53 changed files with 3456 additions and 1355 deletions

.gitignore

@@ -1,6 +1,7 @@
*.[ao]
*.bak
*.new
*.dep
*.so*
Doxyfile
README.html
@@ -11,4 +12,6 @@ latex/
make.log
make.log.new
localconf
lib/configure.h
scripts/beesd
scripts/beesd@.service

Defines.mk (new file)

@@ -0,0 +1,8 @@
MAKE += PREFIX=$(PREFIX) LIBEXEC_PREFIX=$(LIBEXEC_PREFIX) ETC_PREFIX=$(ETC_PREFIX)
define TEMPLATE_COMPILER =
sed $< >$@ \
-e's#@PREFIX@#$(PREFIX)#' \
-e's#@ETC_PREFIX@#$(ETC_PREFIX)#' \
-e's#@LIBEXEC_PREFIX@#$(LIBEXEC_PREFIX)#'
endef

Makefile

@@ -1,35 +1,51 @@
PREFIX ?= /
LIBEXEC_PREFIX ?= $(PREFIX)/usr/lib/bees
PREFIX ?= /usr
ETC_PREFIX ?= /etc
LIBDIR ?= lib
MARKDOWN := $(firstword $(shell which markdown markdown2 markdown_py 2>/dev/null))
MARKDOWN ?= markdown
LIB_PREFIX ?= $(PREFIX)/$(LIBDIR)
LIBEXEC_PREFIX ?= $(LIB_PREFIX)/bees
SYSTEMD_SYSTEM_UNIT_DIR ?= $(shell pkg-config systemd --variable=systemdsystemunitdir)
MARKDOWN := $(firstword $(shell type -P markdown markdown2 markdown_py 2>/dev/null || echo markdown))
BEES_VERSION ?= $(shell git describe --always --dirty || echo UNKNOWN)
# allow local configuration to override above variables
-include localconf
default all: lib src test README.html
DEFAULT_MAKE_TARGET ?= reallyall
ifeq ($(DEFAULT_MAKE_TARGET),reallyall)
RUN_INSTALL_TESTS = test
endif
include Defines.mk
default: $(DEFAULT_MAKE_TARGET)
all: lib src scripts
docs: README.html
reallyall: all docs test
clean: ## Cleanup
git clean -dfx
git clean -dfx -e localconf
.PHONY: lib src test
lib: ## Build libs
$(MAKE) -C lib
$(MAKE) TAG="$(BEES_VERSION)" -C lib
src: ## Build bins
src: lib
$(MAKE) -C src
$(MAKE) BEES_VERSION="$(BEES_VERSION)" -C src
test: ## Run tests
test: lib src
$(MAKE) -C test
scripts/beesd: scripts/beesd.in
sed -e's#@LIBEXEC_PREFIX@#$(LIBEXEC_PREFIX)#' -e's#@PREFIX@#$(PREFIX)#' "$<" >"$@"
scripts/beesd@.service: scripts/beesd@.service.in
sed -e's#@LIBEXEC_PREFIX@#$(LIBEXEC_PREFIX)#' -e's#@PREFIX@#$(PREFIX)#' "$<" >"$@"
scripts/%: scripts/%.in
$(TEMPLATE_COMPILER)
scripts: scripts/beesd scripts/beesd@.service
@@ -37,16 +53,31 @@ README.html: README.md
$(MARKDOWN) README.md > README.html.new
mv -f README.html.new README.html
install: ## Install bees + libs
install: lib src test
install -Dm644 lib/libcrucible.so $(PREFIX)/usr/lib/libcrucible.so
install -Dm755 bin/bees $(LIBEXEC_PREFIX)/bees
install_libs: lib
install -Dm644 lib/libcrucible.so $(DESTDIR)$(LIB_PREFIX)/libcrucible.so
install_tools: ## Install support tools + libs
install_tools: install_libs src
install -Dm755 bin/fiemap $(DESTDIR)$(PREFIX)/bin/fiemap
install -Dm755 bin/fiewalk $(DESTDIR)$(PREFIX)/sbin/fiewalk
install_bees: ## Install bees + libs
install_bees: install_libs src $(RUN_INSTALL_TESTS)
install -Dm755 bin/bees $(DESTDIR)$(LIBEXEC_PREFIX)/bees
install_scripts: ## Install scipts
install_scripts:
install -Dm755 scripts/beesd $(PREFIX)/usr/sbin/beesd
install -Dm644 scripts/beesd.conf.sample $(PREFIX)/etc/bees/beesd.conf.sample
install -Dm644 scripts/beesd@.service $(PREFIX)/lib/systemd/system/beesd@.service
install_scripts: scripts
install -Dm755 scripts/beesd $(DESTDIR)$(PREFIX)/sbin/beesd
install -Dm644 scripts/beesd.conf.sample $(DESTDIR)/$(ETC_PREFIX)/bees/beesd.conf.sample
ifneq (SYSTEMD_SYSTEM_UNIT_DIR,)
install -Dm644 scripts/beesd@.service $(DESTDIR)$(SYSTEMD_SYSTEM_UNIT_DIR)/beesd@.service
endif
install: ## Install distribution
install: install_bees install_scripts $(OPTIONAL_INSTALL_TARGETS)
help: ## Show help
@fgrep -h "##" $(MAKEFILE_LIST) | fgrep -v fgrep | sed -e 's/\\$$//' | sed -e 's/##/\t/'
bees: reallyall
fly: install

README.md

@@ -102,7 +102,7 @@ and some metadata bits). Each entry represents a minimum of 4K on disk.
To change the size of the hash table, use 'truncate' to change the hash
table size, delete `beescrawl.dat` so that bees will start over with a
-fresh full-filesystem rescan, and restart `bees'.
+fresh full-filesystem rescan, and restart `bees`.
Things You Might Expect That Bees Doesn't Have
----------------------------------------------
@@ -152,13 +152,13 @@ Good Btrfs Feature Interactions
Bees has been tested in combination with the following:
-* btrfs compression (either method), mixtures of compressed and uncompressed extents
+* btrfs compression (zlib, lzo, zstd), mixtures of compressed and uncompressed extents
* PREALLOC extents (unconditionally replaced with holes)
* HOLE extents and btrfs no-holes feature
* Other deduplicators, reflink copies (though Bees may decide to redo their work)
-* btrfs snapshots and non-snapshot subvols (RW only)
+* btrfs snapshots and non-snapshot subvols (RW and RO)
* Concurrent file modification (e.g. PostgreSQL and sqlite databases, build daemons)
-* all btrfs RAID profiles (people ask about this, but it's irrelevant)
+* all btrfs RAID profiles (people ask about this, but it's irrelevant to bees)
* IO errors during dedup (read errors will throw exceptions, Bees will catch them and skip over the affected extent)
* Filesystems mounted *with* the flushoncommit option
* 4K filesystem data block size / clone alignment
@@ -166,25 +166,40 @@ Bees has been tested in combination with the following:
* Large (>16M) extents
* Huge files (>1TB--although Btrfs performance on such files isn't great in general)
* filesystems up to 25T bytes, 100M+ files
* btrfs read-only snapshots
* btrfs receive
* btrfs nodatacow/nodatasum inode attribute or mount option (bees skips all nodatasum files)
* open(O_DIRECT) (seems to work as well--or as poorly--with bees as with any other btrfs feature)
Bad Btrfs Feature Interactions
------------------------------
Bees has been tested in combination with the following, and various problems are known:
* bcache, lvmcache: *severe (filesystem-destroying) metadata corruption
issues* observed in testing and reported by users, apparently only when
used with bees. Plain SSD and HDD seem to be OK.
* btrfs send: sometimes aborts with an I/O error when bees changes the
data layout during a send. The send can be restarted and will work
if bees has finished processing the snapshot being sent. No data
corruption observed other than the truncated send.
* btrfs qgroups: very slow, sometimes hangs
* btrfs autodefrag mount option: hangs and high CPU usage problems
reported by users. bees cannot distinguish autodefrag activity from
normal filesystem activity and will likely try to undo the autodefrag,
so it should probably be turned off for bees in any case.
Untested Btrfs Feature Interactions
-----------------------------------
Bees has not been tested with the following, and undesirable interactions may occur:
* Non-4K filesystem data block size (should work if recompiled)
* Non-equal hash (SUM) and filesystem data block (CLONE) sizes (probably never will work)
* btrfs send/receive (receive is probably OK, but send could be confused?)
* btrfs qgroups (never tested, no idea what might happen)
* btrfs seed filesystems (does anyone even use those?)
* btrfs autodefrag mount option (never tested, could fight with Bees)
* btrfs nodatacow/nodatasum inode attribute or mount option (bees skips all nodatasum files)
* btrfs out-of-tree kernel patches (e.g. in-band dedup or encryption)
* btrfs-convert from ext2/3/4 (never tested, might run out of space or ignore significant portions of the filesystem due to sanity checks)
* btrfs mixed block groups (don't know a reason why it would *not* work, but never tested)
* open(O_DIRECT)
* Filesystems mounted *without* the flushoncommit option
* Filesystems mounted *without* the flushoncommit option (don't know the impact of crashes during dedup writes vs. ordinary writes)
Other Caveats
-------------
@@ -251,7 +266,7 @@ in the future):
Bug fixes (sometimes included in older LTS kernels):
-* Bugs fixed prior to 4.4.3 are not listed here.
+* Bugs fixed prior to 4.4.107 are not listed here.
* 4.5: hang in the `INO_PATHS` ioctl used by Bees.
* 4.5: use-after-free in the `FILE_EXTENT_SAME` ioctl used by Bees.
* 4.6: lost inodes after a rename, crash, and log tree replay
@@ -260,10 +275,30 @@ Bug fixes (sometimes included in older LTS kernels):
takes too long to resolve a block address to a root/inode/offset triple.
* 4.10: reduced CPU time cost of the LOGICAL_INO ioctl and dedup
backref processing in general.
-* 4.11: yet another dedup deadlock case is fixed.
-* 4.14: backref performance improvements make LOGICAL_INO even faster.
+* 4.11: yet another dedup deadlock case is fixed.  Alas, it is not the
+last one.
+* 4.14: backref performance improvements make LOGICAL_INO even faster
+in the worst cases (but possibly slower in the best cases?).
+* 4.14.29: WARN_ON(ref->count < 0) in fs/btrfs/backref.c triggers
+almost once per second.  The WARN_ON is incorrect and can be removed.
-Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
+Unfixed kernel bugs (as of 4.14.34) with workarounds in Bees:
* *Deadlocks* in the kernel dedup ioctl when files are modified
immediately before dedup. `BeesTempFile::make_copy` calls `fsync()`
immediately before dedup to work around this. If the `fsync()` is
removed, the filesystem hangs within a few hours, requiring a reboot
to recover. Even with the `fsync()`, it is possible to lose the
kernel race condition and encounter a deadlock within a machine-year.
VM image workloads may trigger this faster. Over the past years
several specific deadlock cases have been fixed, but at least one
remains.
* *Bad interactions* with other Linux block layers: bcache and lvmcache
can fail spectacularly, and apparently only while running bees.
This is definitely a kernel bug, either in btrfs or the lower block
layers. Avoid using bees with these tools, or test very carefully
before deployment.
* *slow backrefs* (aka toxic extents): If the number of references to a
single shared extent within a single file grows above a few thousand,
@@ -272,7 +307,8 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
measuring the time the kernel spends performing certain operations
and permanently blacklisting any extent or hash where the kernel
starts to get slow. Inside Bees, such blocks are marked as 'toxic'
-hash/block addresses.  *Needs to be retested after v4.14.*
+hash/block addresses.  Linux kernel v4.14 is better but can still
+have problems.
* `LOGICAL_INO` output is arbitrarily limited to 2730 references
even if more buffer space is provided for results. Once this number
@@ -295,75 +331,128 @@ Unfixed kernel bugs (as of 4.11.9) with workarounds in Bees:
list of all extent refs referencing a data extent (i.e. Bees wants
the compressed-extent behavior in all cases). *Fixed in v4.14.*
* `LOGICAL_INO` is only called from one thread at any time per process.
This means at most one core is irretrievably stuck in this ioctl.
* `FILE_EXTENT_SAME` is arbitrarily limited to 16MB. This is less than
128MB which is the maximum extent size that can be created by defrag
or prealloc. Bees avoids feedback loops this can generate while
attempting to replace extents over 16MB in length.
* If the `fsync()` in `BeesTempFile::make_copy` is removed, the filesystem
hangs within a few hours, requiring a reboot to recover. On the other
hand, the `fsync()` only costs about 8% of overall performance.
* **Systems with many CPU cores** may [lock up when bees runs with one
worker thread for every core](https://github.com/Zygo/bees/issues/91).
bees limits the number of threads it will try to create based on
detected CPU core count. Users may override this limit with the
[`--thread-count` option](options.md).
-Not really a bug, but a gotcha nonetheless:
+Not really bugs, but gotchas nonetheless:
* If a process holds a directory FD open, the subvol containing the
directory cannot be deleted (`btrfs sub del` will start the deletion
process, but it will not proceed past the first open directory FD).
`btrfs-cleaner` will simply skip over the directory *and all of its
children* until the FD is closed. Bees avoids this gotcha by closing
-all of the FDs in its directory FD cache every 15 minutes.
+all of the FDs in its directory FD cache every 10 btrfs transactions.
* If a file is deleted while Bees is caching an open FD to the file,
Bees continues to scan the file. For very large files (e.g. VM
images), the deletion of the file can be delayed indefinitely.
To limit this delay, Bees closes all FDs in its file FD cache every
-15 minutes.
+10 btrfs transactions.
Build
-----
* If a snapshot is deleted, bees will generate a burst of exceptions
for references to files in the snapshot that no longer exist. This
lasts until the FD caches are cleared.
Installation
============
Bees can be installed by following one these instructions:
Arch package
------------
Bees is available in Arch Linux AUR. Install with:
`$ pacaur -S bees-git`
Gentoo ebuild
-------------
Bees is available as a Gentoo ebuild. Just copy `bees-9999.ebuild` from
`contrib/gentoo` including the `files` subdirectory to your local
overlay category `sys-fs`.
You can copy the ebuild to match a Bees version number, and it will
build that tagged version. It is partly supported since v0.5,
previous versions won't work.
Build from source
-----------------
Build with `make`. The build produces `bin/bees` and `lib/libcrucible.so`,
which must be copied to somewhere in `$PATH` and `$LD_LIBRARY_PATH`
on the target system respectively.
It will also generate `scripts/beesd@.service` for systemd users. This
service makes use of a helper script `scripts/beesd` to boot the service.
Both of the latter use the filesystem UUID to mount the root subvolume
within a temporary runtime directory.
### Ubuntu 16.04 - 17.04:
`$ apt -y install build-essential btrfs-tools uuid-dev markdown && make`
### Ubuntu 14.04:
You can try to carry on the work done here: https://gist.github.com/dagelf/99ee07f5638b346adb8c058ab3d57492
Packaging
---------
See 'Dependencies' below. Package maintainers can pick ideas for building and
configuring the source package from the Gentoo ebuild in `contrib/gentoo`.
You can configure some build options by creating a file `localconf` and
adjust settings for your distribution environment there.
Please also review the Makefile for additional hints.
Dependencies
------------
-* C++11 compiler (tested with GCC 4.9 and 6.2.0)
+* C++11 compiler (tested with GCC 4.9, 6.2.0, 8.1.0)
Sorry. I really like closures and shared_ptr, so support
for earlier compiler versions is unlikely.
-* btrfs-progs (tested with 4.1..4.7)
+* btrfs-progs (tested with 4.1..4.15.1) or libbtrfs-dev
+(tested with version 4.16.1)
Needed for btrfs.h and ctree.h during compile.
Not needed at runtime.
Also needed by the service wrapper script.
* libuuid-dev
This library is only required for a feature that was removed after v0.1.
The lingering support code can be removed.
-* Linux kernel 4.4.3 or later
+* Linux kernel version: *minimum* 4.4.107, *4.14.29 or later recommended*
-Don't bother trying to make Bees work with older kernels.
-It won't end well.
+Don't bother trying to make Bees work with kernel versions older than
+4.4.107.  It may appear to work, but it won't end well: there are
+too many missing features and bugs (including data corruption bugs)
+to work around in older kernels.
+Kernel versions between 4.4.107 and 4.14.29 are usable with bees,
+but bees can trigger known performance bugs and hangs in dedup-related
+functions.
* markdown
* util-linux version that provides `blkid` command for the helper
script `scripts/beesd` to work
Setup
-----
If you don't want to use the helper script `scripts/beesd` to setup and
configure bees, here's how you manually setup bees.
Create a directory for bees state files:
export BEESHOME=/some/path
@@ -404,7 +493,7 @@ be the name of a subvol):
Configuration
-------------
-The only runtime configurable options are environment variables:
+There are some runtime configurable options using environment variables:
* BEESHOME: Directory containing Bees state files:
* beeshash.dat | persistent hash table. Must be a multiple of 16M.
@@ -420,7 +509,7 @@ The only runtime configurable options are environment variables:
watch -n1 cat $BEESSTATUS
Other options (e.g. interval between filesystem crawls) can be configured
-in src/bees.h.
+in src/bees.h or on the cmdline (see 'Command Line Options' below).
Running
-------
@@ -447,6 +536,57 @@ of information about the contents of the filesystem through the log file.
There are also some shell wrappers in the `scripts/` directory.
Command Line Options
--------------------
* --thread-count (-c) COUNT
* Specify maximum number of worker threads for scanning. Overrides
--thread-factor (-C) and default/autodetected values,
and the hardcoded thread limit.
* --thread-factor (-C) FACTOR
* Specify ratio of worker threads to CPU cores. Overridden by --thread-count (-c).
Default is 1.0, i.e. 1 worker thread per detected CPU. Use values
below 1.0 to leave some cores idle, or above 1.0 if there are more
disks than CPUs in the filesystem.
If the computed thread count is higher than `BEES_DEFAULT_THREAD_LIMIT`
(currently 8), then only that number of threads will be created.
This limit can be overridden by the `--thread-count` option; however,
be aware that there are kernel issues with systems that have many CPU
cores when users try to run bees on all of them.
* --loadavg-target (-g) LOADAVG
* Specify load average target for dynamic worker threads.
Threads will be started or stopped subject to the upper limit imposed
by thread-factor, thread-min and thread-count until the load average
is within +/- 0.5 of LOADAVG.
* --thread-min (-G) COUNT
* Specify minimum number of worker threads for scanning.
Ignored unless -g option is used to specify a target load.
* --scan-mode (-m) MODE
* Specify extent scanning algorithm. Default mode is 0.
_EXPERIMENTAL_ feature that may go away.
* Mode 0: scan extents in ascending order of (inode, subvol, offset).
Keeps shared extents between snapshots together. Reads files sequentially.
Minimizes temporary space usage.
* Mode 1: scan extents from all subvols in parallel. Good performance
on non-spinning media when subvols are unrelated.
* Mode 2: scan all extents from one subvol at a time. Good sequential
read performance for spinning media. Maximizes temporary space usage.
* --timestamps (-t)
* Enable timestamps in log output.
* --no-timestamps (-T)
* Disable timestamps in log output.
* --absolute-paths (-p)
* Paths in log output will be absolute.
* --strip-paths (-P)
* Paths in log output will have the working directory at Bees startup
stripped.
* --verbose (-v)
* Set log verbosity (0 = no output, 8 = all output, default 8).
Bug Reports and Contributions
-----------------------------

contrib/gentoo/metadata/layout.conf (new file)

@@ -0,0 +1,18 @@
# manifest-hashes specify hashes used for new/updated entries
# the current set went live on 2017-11-21, per 2017-11-12 Council meeting
# https://archives.gentoo.org/gentoo-dev/message/ba2e5d9666ebd7e1bff1143485a37856
manifest-hashes = BLAKE2B SHA512
# The following hashes are required on all Manifest entries. If any
# of them are missing, repoman will refetch and rehash old distfiles.
# Otherwise, old distfiles will keep using their current hash set.
manifest-required-hashes = BLAKE2B
# No more old ChangeLogs in Git
update-changelog = false
# Sign Git commits, and NOT Manifests
sign-commits = true
sign-manifests = false
masters = gentoo

contrib/gentoo/profiles/repo_name (new file)

@@ -0,0 +1 @@
bees

contrib/gentoo/sys-fs/bees/Manifest (new file)

@@ -0,0 +1,2 @@
EBUILD bees-9999.ebuild 2001 BLAKE2B 7fa1c9d043a4334579dfad3560d1593717e548c0d31695cf8ccf8ffe45f2347584c7da43b47cad873745f3c843207433c6b892a0469c5618f107c68f78fd5fe2 SHA512 d49266e007895c049e1c9f7e28ec2f649b386a6441eccba02ee411f14ad395925eecdaa8a747962ccc526f9e1d3aba9fd68f4452a1d276d4e5b7d48c80102cd8
MISC metadata.xml 479 BLAKE2B ef5e110ba8d88f0188dbc0d12bec2ad45c51abf707656f6fe4e0fa498d933fe9c32c5dc4c9b446402ec686084459f9f075e52f33402810962c1ac6b149fb70c8 SHA512 3fcc136ed4c55323cac4f8cf542210eb77f73e2a80f95fcce2d688bc645f6e5126404776536dedc938b18287b54abbc264610cc2f587a42a3a8e6d7bf8415aaa

contrib/gentoo/sys-fs/bees/bees-9999.ebuild (new file)

@@ -0,0 +1,66 @@
# Copyright 1999-2018 Gentoo Foundation
# Distributed under the terms of the GNU General Public License v2
EAPI=7
inherit linux-info
DESCRIPTION="Best-Effort Extent-Same, a btrfs dedup agent"
HOMEPAGE="https://github.com/Zygo/bees"
if [[ ${PV} == "9999" ]] ; then
EGIT_REPO_URI="https://github.com/Zygo/bees.git"
inherit git-r3
else
SRC_URI="https://github.com/Zygo/bees/archive/v${PV}.tar.gz -> ${P}.tar.gz"
KEYWORDS="~amd64"
fi
LICENSE="GPL-3"
SLOT="0"
IUSE="tools"
DEPEND="
>=sys-apps/util-linux-2.30.2
>=sys-fs/btrfs-progs-4.1
"
RDEPEND="${DEPEND}"
CONFIG_CHECK="~BTRFS_FS"
ERROR_BTRFS_FS="CONFIG_BTRFS_FS: bees does currently only work with btrfs"
pkg_pretend() {
if [[ ${MERGE_TYPE} != buildonly ]]; then
if kernel_is -lt 4 4 3; then
ewarn "Kernel versions below 4.4.3 lack critical features needed for bees to"
ewarn "properly operate, so it won't work. It's recommended to run at least"
ewarn "kernel version 4.11 for best performance and reliability."
ewarn
elif kernel_is -lt 4 11; then
ewarn "With kernel versions below 4.11, bees may severely degrade system performance"
ewarn "and responsiveness. Especially, the kernel may deadlock while bees is"
ewarn "running, it's recommended to run at least kernel 4.11."
ewarn
elif kernel_is -lt 4 14 29; then
ewarn "With kernel versions below 4.14.29, bees may generate a lot of bogus WARN_ON()"
ewarn "messages in the kernel log. These messages can be ignored and this is fixed"
ewarn "with more recent kernels:"
ewarn "# WARNING: CPU: 3 PID: 18172 at fs/btrfs/backref.c:1391 find_parent_nodes+0xc41/0x14e0"
ewarn
fi
elog "Bees recommends to run the latest current kernel for performance and"
elog "reliability reasons, see README.md."
fi
}
src_configure() {
cat >localconf <<-EOF || die
LIBEXEC_PREFIX=/usr/libexec
PREFIX=/usr
LIBDIR=$(get_libdir)
DEFAULT_MAKE_TARGET=all
EOF
if use tools; then
echo OPTIONAL_INSTALL_TARGETS=install_tools >>localconf || die
fi
}

contrib/gentoo/sys-fs/bees/metadata.xml (new file)

@@ -0,0 +1,15 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE pkgmetadata SYSTEM "http://www.gentoo.org/dtd/metadata.dtd">
<pkgmetadata>
<maintainer type="person">
<email>hurikhan77+bgo@gmail.com</email>
<name>Kai Krakow</name>
</maintainer>
<use>
<flag name="tools">Build extra tools useful for debugging (fiemap, feiwalk, beestop)</flag>
</use>
<upstream>
<bugs-to>https://github.com/Zygo/bees/issues</bugs-to>
<remote-id type="github">Zygo/bees</remote-id>
</upstream>
</pkgmetadata>

include/crucible/btrfs.h

@@ -23,6 +23,7 @@
#undef min
#undef max
#undef mutex
#undef swap
#ifndef BTRFS_FIRST_FREE_OBJECTID

include/crucible/cache.h

@@ -18,17 +18,27 @@ namespace crucible {
public:
using Key = tuple<Arguments...>;
using Func = function<Return(Arguments...)>;
using Time = size_t;
using Value = pair<Time, Return>;
private:
struct Value {
Value *fp = nullptr;
Value *bp = nullptr;
Key key;
Return ret;
Value(Key k, Return r) : key(k), ret(r) { }
// Crash early!
~Value() { fp = bp = nullptr; };
};
Func m_fn;
Time m_ctr;
map<Key, Value> m_map;
LockSet<Key> m_lockset;
size_t m_max_size;
mutex m_mutex;
Value *m_last = nullptr;
bool check_overflow();
void check_overflow();
void move_to_front(Value *vp);
void erase_one(Value *vp);
public:
LRUCache(Func f = Func(), size_t max_size = 100);
@@ -46,30 +56,82 @@ namespace crucible {
template <class Return, class... Arguments>
LRUCache<Return, Arguments...>::LRUCache(Func f, size_t max_size) :
m_fn(f),
m_ctr(0),
m_max_size(max_size)
{
}
template <class Return, class... Arguments>
bool
void
LRUCache<Return, Arguments...>::erase_one(Value *vp)
{
THROW_CHECK0(invalid_argument, vp);
Value *vp_bp = vp->bp;
THROW_CHECK0(runtime_error, vp_bp);
Value *vp_fp = vp->fp;
THROW_CHECK0(runtime_error, vp_fp);
vp_fp->bp = vp_bp;
vp_bp->fp = vp_fp;
// If we delete the head of the list then advance the head by one
if (vp == m_last) {
// If the head of the list is also the tail of the list then clear m_last
if (vp_fp == m_last) {
m_last = nullptr;
} else {
m_last = vp_fp;
}
}
m_map.erase(vp->key);
if (!m_last) {
THROW_CHECK0(runtime_error, m_map.empty());
} else {
THROW_CHECK0(runtime_error, !m_map.empty());
}
}
template <class Return, class... Arguments>
void
LRUCache<Return, Arguments...>::check_overflow()
{
if (m_map.size() <= m_max_size) {
return false;
while (m_map.size() >= m_max_size) {
THROW_CHECK0(runtime_error, m_last);
THROW_CHECK0(runtime_error, m_last->bp);
erase_one(m_last->bp);
}
vector<pair<Key, Time>> key_times;
key_times.reserve(m_map.size());
for (auto i : m_map) {
key_times.push_back(make_pair(i.first, i.second.first));
}
template <class Return, class... Arguments>
void
LRUCache<Return, Arguments...>::move_to_front(Value *vp)
{
if (!m_last) {
// Create new LRU list
m_last = vp->fp = vp->bp = vp;
} else if (m_last != vp) {
Value *vp_fp = vp->fp;
Value *vp_bp = vp->bp;
if (vp_fp && vp_bp) {
// There are at least two and we are removing one that isn't m_last
// Connect adjacent nodes to each other (has no effect if vp is new), removing vp from list
vp_fp->bp = vp_bp;
vp_bp->fp = vp_fp;
} else {
// New insertion, both must be null
THROW_CHECK0(runtime_error, !vp_fp);
THROW_CHECK0(runtime_error, !vp_bp);
}
// Splice new node into list
Value *last_bp = m_last->bp;
THROW_CHECK0(runtime_error, last_bp);
// New element points to both ends of list
vp->fp = m_last;
vp->bp = last_bp;
// Insert vp as fp from the end of the list
last_bp->fp = vp;
// Insert vp as bp from the second from the start of the list
m_last->bp = vp;
// Update start of list
m_last = vp;
}
sort(key_times.begin(), key_times.end(), [](const pair<Key, Time> &a, const pair<Key, Time> &b) {
return a.second < b.second;
});
for (size_t i = 0; i < key_times.size() / 2; ++i) {
m_map.erase(key_times[i].first);
}
return true;
}
template <class Return, class... Arguments>
@@ -78,6 +140,9 @@ namespace crucible {
{
unique_lock<mutex> lock(m_mutex);
m_max_size = new_max_size;
// FIXME: this really reduces the cache size to new_max_size - 1
// because every other time we call this method, it is immediately
// followed by insert.
check_overflow();
}
@@ -93,8 +158,11 @@ namespace crucible {
void
LRUCache<Return, Arguments...>::clear()
{
// Move the map onto the stack, then destroy it after we've released the lock.
decltype(m_map) new_map;
unique_lock<mutex> lock(m_mutex);
m_map.clear();
m_map.swap(new_map);
m_last = nullptr;
}
template <class Return, class... Arguments>
@@ -104,8 +172,8 @@ namespace crucible {
unique_lock<mutex> lock(m_mutex);
for (auto it = m_map.begin(); it != m_map.end(); ) {
auto next_it = ++it;
if (pred(it.second.second)) {
m_map.erase(it);
if (pred(it.second.ret)) {
erase_one(&it.second);
}
it = next_it;
}
@@ -133,12 +201,18 @@ namespace crucible {
// No, we hold key and cache locks, but item not in cache.
// Release cache lock and call function
auto ctr_copy = m_ctr++;
lock.unlock();
Value v(ctr_copy, m_fn(args...));
// Create new value
Value v(k, m_fn(args...));
// Reacquire cache lock
lock.lock();
// Make room
check_overflow();
// Reacquire cache lock and insert return value
lock.lock();
tie(found, inserted) = m_map.insert(make_pair(k, v));
// We hold a lock on this key so we are the ones to insert it
@@ -147,23 +221,17 @@ namespace crucible {
// Release key lock, keep the cache lock
key_lock.unlock();
// Check to see if we have too many items and reduce if so.
if (check_overflow()) {
// Reset iterator
found = m_map.find(k);
}
}
}
// Item should be in cache now
THROW_CHECK0(runtime_error, found != m_map.end());
// We are using this object so update the timestamp
if (!inserted) {
found->second.first = m_ctr++;
}
// (Re)insert at head of LRU
move_to_front(&(found->second));
// Make copy before releasing lock
auto rv = found->second.second;
auto rv = found->second.ret;
return rv;
}
@@ -173,7 +241,10 @@ namespace crucible {
{
Key k(args...);
unique_lock<mutex> lock(m_mutex);
m_map.erase(k);
auto found = m_map.find(k);
if (found != m_map.end()) {
erase_one(&found->second);
}
}
template<class Return, class... Arguments>
@@ -204,33 +275,24 @@ namespace crucible {
found = m_map.find(k);
if (found == m_map.end()) {
// Make room
check_overflow();
// No, we hold key and cache locks, but item not in cache.
// Release cache lock and insert the provided return value
auto ctr_copy = m_ctr++;
Value v(ctr_copy, r);
// Insert the provided return value (no need to unlock here)
Value v(k, r);
tie(found, inserted) = m_map.insert(make_pair(k, v));
// We hold a lock on this key so we are the ones to insert it
THROW_CHECK0(runtime_error, inserted);
// Release key lock and clean out overflow
key_lock.unlock();
// Check to see if we have too many items and reduce if so.
if (check_overflow()) {
// Reset iterator
found = m_map.find(k);
}
}
}
// Item should be in cache now
THROW_CHECK0(runtime_error, found != m_map.end());
// We are using this object so update the timestamp
if (!inserted) {
found->second.first = m_ctr++;
}
// (Re)insert at head of LRU
move_to_front(&(found->second));
}
}

include/crucible/chatter.h

@@ -8,6 +8,8 @@
#include <string>
#include <typeinfo>
#include <syslog.h>
/** \brief Chatter wraps a std::ostream reference with a destructor that
writes a newline, and inserts timestamp, pid, and tid prefixes on output.
@@ -33,12 +35,13 @@ namespace crucible {
using namespace std;
class Chatter {
int m_loglevel;
string m_name;
ostream &m_os;
ostringstream m_oss;
public:
Chatter(string name, ostream &os = cerr);
Chatter(int loglevel, string name, ostream &os = cerr);
Chatter(Chatter &&c);
ostream &get_os() { return m_oss; }
@@ -103,7 +106,7 @@ namespace crucible {
template <class T> Chatter operator<<(const T &t)
{
Chatter c(m_pretty_function, m_os);
Chatter c(LOG_NOTICE, m_pretty_function, m_os);
c << t;
return c;
}

include/crucible/cleanup.h (new file)

@@ -0,0 +1,18 @@
#ifndef CRUCIBLE_CLEANUP_H
#define CRUCIBLE_CLEANUP_H
#include <functional>
namespace crucible {
using namespace std;
class Cleanup {
function<void()> m_cleaner;
public:
Cleanup(function<void()> func);
~Cleanup();
};
}
#endif // CRUCIBLE_CLEANUP_H

include/crucible/error.h

@@ -81,21 +81,21 @@ namespace crucible {
// macro for throwing an error
#define THROW_ERROR(type, expr) do { \
std::ostringstream _te_oss; \
_te_oss << expr; \
_te_oss << expr << " at " << __FILE__ << ":" << __LINE__; \
throw type(_te_oss.str()); \
} while (0)
// macro for throwing a system_error with errno
#define THROW_ERRNO(expr) do { \
std::ostringstream _te_oss; \
_te_oss << expr; \
_te_oss << expr << " at " << __FILE__ << ":" << __LINE__; \
throw std::system_error(std::error_code(errno, std::system_category()), _te_oss.str()); \
} while (0)
// macro for throwing a system_error with some other variable
#define THROW_ERRNO_VALUE(value, expr) do { \
std::ostringstream _te_oss; \
_te_oss << expr; \
_te_oss << expr << " at " << __FILE__ << ":" << __LINE__; \
throw std::system_error(std::error_code((value), std::system_category()), _te_oss.str()); \
} while (0)

include/crucible/extentwalker.h

@@ -58,7 +58,7 @@ namespace crucible {
virtual Vec get_extent_map(off_t pos);
static const unsigned sc_extent_fetch_max = 64;
static const unsigned sc_extent_fetch_max = 16;
static const unsigned sc_extent_fetch_min = 4;
static const off_t sc_step_size = 0x1000 * (sc_extent_fetch_max / 2);

include/crucible/fd.h

@@ -57,6 +57,10 @@ namespace crucible {
typedef ResourceHandle<int, IOHandle> Fd;
static string __relative_path;
void set_relative_path(string path);
string relative_path();
// Functions named "foo_or_die" throw exceptions on failure.
// Attempt to open the file with the given mode

include/crucible/lockset.h

@@ -61,7 +61,6 @@ namespace crucible {
size_t size();
bool empty();
set_type copy();
void wait_unlock(double interval);
void max_size(size_t max);
@@ -118,7 +117,7 @@ namespace crucible {
while (full() || locked(name)) {
m_condvar.wait(lock);
}
auto rv = m_set.insert(make_pair(name, gettid()));
auto rv = m_set.insert(make_pair(name, crucible::gettid()));
THROW_CHECK0(runtime_error, rv.second);
}
@@ -130,7 +129,7 @@ namespace crucible {
if (full() || locked(name)) {
return false;
}
auto rv = m_set.insert(make_pair(name, gettid()));
auto rv = m_set.insert(make_pair(name, crucible::gettid()));
THROW_CHECK1(runtime_error, name, rv.second);
return true;
}
@@ -145,15 +144,6 @@ namespace crucible {
THROW_CHECK1(invalid_argument, erase_count, erase_count == 1);
}
template <class T>
void
LockSet<T>::wait_unlock(double interval)
{
unique_lock<mutex> lock(m_mutex);
if (m_set.empty()) return;
m_condvar.wait_for(lock, chrono::duration<double>(interval));
}
template <class T>
size_t
LockSet<T>::size()

include/crucible/process.h

@@ -74,5 +74,8 @@ namespace crucible {
typedef ResourceHandle<Process::id, Process> Pid;
pid_t gettid();
double getloadavg1();
double getloadavg5();
double getloadavg15();
}
#endif // CRUCIBLE_PROCESS_H

include/crucible/progress.h (new file)

@@ -0,0 +1,122 @@
#ifndef CRUCIBLE_PROGRESS_H
#define CRUCIBLE_PROGRESS_H
#include "crucible/error.h"
#include <functional>
#include <map>
#include <memory>
#include <mutex>
namespace crucible {
using namespace std;
template <class T>
class ProgressTracker {
struct ProgressTrackerState;
class ProgressHolderState;
public:
using value_type = T;
using ProgressHolder = shared_ptr<ProgressHolderState>;
ProgressTracker(const value_type &v);
value_type begin();
value_type end();
ProgressHolder hold(const value_type &v);
friend class ProgressHolderState;
private:
struct ProgressTrackerState {
using key_type = pair<value_type, ProgressHolderState *>;
mutex m_mutex;
map<key_type, bool> m_in_progress;
value_type m_begin;
value_type m_end;
};
class ProgressHolderState {
shared_ptr<ProgressTrackerState> m_state;
const value_type m_value;
public:
ProgressHolderState(shared_ptr<ProgressTrackerState> state, const value_type &v);
~ProgressHolderState();
value_type get() const;
};
shared_ptr<ProgressTrackerState> m_state;
};
template <class T>
typename ProgressTracker<T>::value_type
ProgressTracker<T>::begin()
{
unique_lock<mutex> lock(m_state->m_mutex);
return m_state->m_begin;
}
template <class T>
typename ProgressTracker<T>::value_type
ProgressTracker<T>::end()
{
unique_lock<mutex> lock(m_state->m_mutex);
return m_state->m_end;
}
template <class T>
typename ProgressTracker<T>::value_type
ProgressTracker<T>::ProgressHolderState::get() const
{
return m_value;
}
template <class T>
ProgressTracker<T>::ProgressTracker(const ProgressTracker::value_type &t) :
m_state(make_shared<ProgressTrackerState>())
{
m_state->m_begin = t;
m_state->m_end = t;
}
template <class T>
ProgressTracker<T>::ProgressHolderState::ProgressHolderState(shared_ptr<ProgressTrackerState> state, const value_type &v) :
m_state(state),
m_value(v)
{
unique_lock<mutex> lock(m_state->m_mutex);
m_state->m_in_progress[make_pair(m_value, this)] = true;
if (m_state->m_end < m_value) {
m_state->m_end = m_value;
}
}
template <class T>
ProgressTracker<T>::ProgressHolderState::~ProgressHolderState()
{
unique_lock<mutex> lock(m_state->m_mutex);
m_state->m_in_progress[make_pair(m_value, this)] = false;
auto p = m_state->m_in_progress.begin();
while (p != m_state->m_in_progress.end()) {
if (p->second) {
break;
}
if (m_state->m_begin < p->first.first) {
m_state->m_begin = p->first.first;
}
m_state->m_in_progress.erase(p);
p = m_state->m_in_progress.begin();
}
}
template <class T>
shared_ptr<typename ProgressTracker<T>::ProgressHolderState>
ProgressTracker<T>::hold(const value_type &v)
{
return make_shared<ProgressHolderState>(m_state, v);
}
}
#endif // CRUCIBLE_PROGRESS_H

include/crucible/resource.h

@@ -3,6 +3,7 @@
#include "crucible/error.h"
#include <cassert>
#include <map>
#include <memory>
#include <mutex>
@@ -52,7 +53,6 @@ namespace crucible {
// A bunch of static variables and functions
static mutex s_map_mutex;
static mutex s_ptr_mutex;
static map_type s_map;
static resource_ptr_type insert(const key_type &key);
static resource_ptr_type insert(const resource_ptr_type &res);
@@ -83,14 +83,14 @@ namespace crucible {
ResourceHandle(const resource_ptr_type &res);
ResourceHandle& operator=(const resource_ptr_type &res);
// default constructor is public and mostly harmless
// default construct/assign/move is public and mostly harmless
ResourceHandle() = default;
ResourceHandle(const ResourceHandle &that) = default;
ResourceHandle(ResourceHandle &&that) = default;
ResourceHandle& operator=(const ResourceHandle &that) = default;
ResourceHandle& operator=(ResourceHandle &&that) = default;
// copy/assign/move/move-assign - with a mutex to help shared_ptr be atomic
ResourceHandle(const ResourceHandle &that);
ResourceHandle(ResourceHandle &&that);
ResourceHandle& operator=(const ResourceHandle &that);
ResourceHandle& operator=(ResourceHandle &&that);
// Nontrivial destructor
~ResourceHandle();
// forward anything else to the Resource constructor
@@ -239,7 +239,6 @@ namespace crucible {
template <class Key, class Resource>
ResourceHandle<Key, Resource>::ResourceHandle(const key_type &key)
{
unique_lock<mutex> lock(s_ptr_mutex);
m_ptr = insert(key);
}
@@ -247,7 +246,6 @@ namespace crucible {
ResourceHandle<Key, Resource>&
ResourceHandle<Key, Resource>::operator=(const key_type &key)
{
unique_lock<mutex> lock(s_ptr_mutex);
m_ptr = insert(key);
return *this;
}
@@ -255,7 +253,6 @@ namespace crucible {
template <class Key, class Resource>
ResourceHandle<Key, Resource>::ResourceHandle(const resource_ptr_type &res)
{
unique_lock<mutex> lock(s_ptr_mutex);
m_ptr = insert(res);
}
@@ -263,65 +260,32 @@ namespace crucible {
ResourceHandle<Key, Resource>&
ResourceHandle<Key, Resource>::operator=(const resource_ptr_type &res)
{
unique_lock<mutex> lock(s_ptr_mutex);
m_ptr = insert(res);
return *this;
}
template <class Key, class Resource>
ResourceHandle<Key, Resource>::ResourceHandle(const ResourceHandle &that)
{
unique_lock<mutex> lock(s_ptr_mutex);
m_ptr = that.m_ptr;
}
template <class Key, class Resource>
ResourceHandle<Key, Resource>::ResourceHandle(ResourceHandle &&that)
{
unique_lock<mutex> lock(s_ptr_mutex);
swap(m_ptr, that.m_ptr);
}
template <class Key, class Resource>
ResourceHandle<Key, Resource> &
ResourceHandle<Key, Resource>::operator=(ResourceHandle &&that)
{
unique_lock<mutex> lock(s_ptr_mutex);
m_ptr = that.m_ptr;
that.m_ptr.reset();
return *this;
}
template <class Key, class Resource>
ResourceHandle<Key, Resource> &
ResourceHandle<Key, Resource>::operator=(const ResourceHandle &that)
{
unique_lock<mutex> lock(s_ptr_mutex);
m_ptr = that.m_ptr;
return *this;
}
template <class Key, class Resource>
ResourceHandle<Key, Resource>::~ResourceHandle()
{
unique_lock<mutex> lock_ptr(s_ptr_mutex);
// No pointer, nothing to do
if (!m_ptr) {
return;
}
// Save key so we can clean the map
auto key = s_traits.get_key(*m_ptr);
// Save pointer so we can release lock before deleting
auto ptr_copy = m_ptr;
// Save a weak_ptr so we can tell if we need to clean the map
weak_ptr_type wp = m_ptr;
// Drop shared_ptr
m_ptr.reset();
// Release lock
lock_ptr.unlock();
// Delete our (possibly last) reference to pointer
ptr_copy.reset();
// If there are still other references to the shared_ptr, we can stop now
if (!wp.expired()) {
return;
}
// Remove weak_ptr from map if it has expired
// (and not been replaced in the meantime)
unique_lock<mutex> lock_map(s_map_mutex);
auto found = s_map.find(key);
// Map entry may have been replaced, so check for expiry again
if (found != s_map.end() && found->second.expired()) {
s_map.erase(key);
}
@@ -331,23 +295,17 @@ namespace crucible {
typename ResourceHandle<Key, Resource>::resource_ptr_type
ResourceHandle<Key, Resource>::get_resource_ptr() const
{
unique_lock<mutex> lock(s_ptr_mutex);
// Make isolated copy of pointer with lock held, and return the copy
auto rv = m_ptr;
return rv;
return m_ptr;
}
template <class Key, class Resource>
typename ResourceHandle<Key, Resource>::resource_ptr_type
ResourceHandle<Key, Resource>::operator->() const
{
unique_lock<mutex> lock(s_ptr_mutex);
if (!m_ptr) {
THROW_ERROR(out_of_range, __PRETTY_FUNCTION__ << " called on null Resource");
}
// Make isolated copy of pointer with lock held, and return the copy
auto rv = m_ptr;
return rv;
return m_ptr;
}
template <class Key, class Resource>
@@ -355,7 +313,6 @@ namespace crucible {
shared_ptr<T>
ResourceHandle<Key, Resource>::cast() const
{
unique_lock<mutex> lock(s_ptr_mutex);
shared_ptr<T> dp;
if (!m_ptr) {
return dp;
@@ -371,7 +328,6 @@ namespace crucible {
typename ResourceHandle<Key, Resource>::key_type
ResourceHandle<Key, Resource>::get_key() const
{
unique_lock<mutex> lock(s_ptr_mutex);
if (!m_ptr) {
return s_traits.get_null_key();
} else {
@@ -399,13 +355,9 @@ namespace crucible {
template <class Key, class Resource>
mutex ResourceHandle<Key, Resource>::s_map_mutex;
template <class Key, class Resource>
mutex ResourceHandle<Key, Resource>::s_ptr_mutex;
template <class Key, class Resource>
typename ResourceHandle<Key, Resource>::map_type ResourceHandle<Key, Resource>::s_map;
}
#endif // RESOURCE_H
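The destructor above is careful to drop its shared_ptr before taking the map lock, then uses an expired weak_ptr to decide whether the map entry is stale. A minimal standalone sketch of that cleanup pattern (the cache, key, and function names here are illustrative, not part of crucible):

#include <iostream>
#include <map>
#include <memory>
#include <mutex>
#include <string>

static std::mutex map_mutex;
static std::map<std::string, std::weak_ptr<int>> cache;

// Mirror ResourceHandle::~ResourceHandle: release our reference first,
// then erase the map entry only if no other owner remains and the
// entry has not been replaced by a newer resource in the meantime.
void drop_handle(std::shared_ptr<int> &ptr, const std::string &key)
{
	std::weak_ptr<int> wp = ptr;
	ptr.reset();
	if (!wp.expired()) return;
	std::lock_guard<std::mutex> lock(map_mutex);
	auto found = cache.find(key);
	if (found != cache.end() && found->second.expired()) {
		cache.erase(found);
	}
}

int main()
{
	auto p = std::make_shared<int>(42);
	cache["answer"] = p;
	drop_handle(p, "answer");
	std::cout << "entries left: " << cache.size() << std::endl; // prints 0
}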

include/crucible/task.h (new file, 163 lines)

@@ -0,0 +1,163 @@
#ifndef CRUCIBLE_TASK_H
#define CRUCIBLE_TASK_H
#include <functional>
#include <memory>
#include <ostream>
#include <string>
namespace crucible {
using namespace std;
class TaskState;
using TaskId = uint64_t;
class Task {
shared_ptr<TaskState> m_task_state;
Task(shared_ptr<TaskState> pts);
public:
// create empty Task object
Task() = default;
// create Task object containing closure and description
Task(string title, function<void()> exec_fn);
// schedule Task at end of queue.
// May run Task in current thread or in other thread.
// May run Task before or after returning.
void run() const;
// schedule Task before other queued tasks
void run_earlier() const;
// describe Task as text
string title() const;
// Returns currently executing task if called from exec_fn.
// Usually used to reschedule the currently executing Task.
static Task current_task();
// Ordering for containers
bool operator<(const Task &that) const;
// Null test
operator bool() const;
// Unique non-repeating(ish) ID for task
TaskId id() const;
};
ostream &operator<<(ostream &os, const Task &task);
class TaskMaster {
public:
// Blocks until the running thread count reaches this number
static void set_thread_count(size_t threads);
// Sets minimum thread count when load average tracking enabled
static void set_thread_min_count(size_t min_threads);
// Calls set_thread_count with default
static void set_thread_count();
// Creates thread to track load average and adjust thread count dynamically
static void set_loadavg_target(double target);
// Writes the current non-executing Task queue
static ostream & print_queue(ostream &);
// Writes the current executing Task for each worker
static ostream & print_workers(ostream &);
// Gets the current number of queued Tasks
static size_t get_queue_count();
};
// Barrier executes waiting Tasks once the last BarrierLock
// is released. Multiple unique Tasks may be scheduled while
// BarrierLocks exist and all will be run() at once upon
// release. If no BarrierLocks exist, Tasks are executed
// immediately upon insertion.
class BarrierState;
class BarrierLock {
shared_ptr<BarrierState> m_barrier_state;
BarrierLock(shared_ptr<BarrierState> pbs);
friend class Barrier;
public:
// Release this Lock immediately and permanently
void release();
};
class Barrier {
shared_ptr<BarrierState> m_barrier_state;
Barrier(shared_ptr<BarrierState> pbs);
public:
Barrier();
// Prevent execution of tasks behind barrier until
// BarrierLock destructor or release() method is called.
BarrierLock lock();
// Schedule a task for execution when no Locks exist
void insert_task(Task t);
};
// Exclusion provides exclusive access to an ExclusionLock.
// One Task will be able to obtain the ExclusionLock; other Tasks
// may schedule themselves for re-execution after the ExclusionLock
// is released.
class ExclusionState;
class Exclusion;
class ExclusionLock {
shared_ptr<ExclusionState> m_exclusion_state;
ExclusionLock(shared_ptr<ExclusionState> pes);
ExclusionLock() = default;
friend class Exclusion;
public:
// Calls release()
~ExclusionLock();
// Release this Lock immediately and permanently
void release();
// Test for locked state
operator bool() const;
};
class Exclusion {
shared_ptr<ExclusionState> m_exclusion_state;
Exclusion(shared_ptr<ExclusionState> pes);
public:
Exclusion();
// Attempt to obtain a Lock. If successful, current Task
// owns the Lock until the ExclusionLock is released
// (it is the ExclusionLock that owns the lock, so it can
// be passed to other Tasks or threads, but this is not
// recommended practice).
// If not successful, current Task is expected to call
// insert_task(current_task()), release any ExclusionLock
// objects it holds, and exit its Task function.
ExclusionLock try_lock();
// Execute Task when Exclusion is unlocked (possibly immediately).
// First Task is scheduled with run_earlier(), all others are
// scheduled with run().
void insert_task(Task t);
};
}
#endif // CRUCIBLE_TASK_H
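The try_lock/insert_task contract in the comments above implies a retry pattern: a Task that loses the race re-queues itself and exits. A sketch of that calling convention, assuming libcrucible is built and linked (the Task title, counter, and lock names are illustrative):

#include "crucible/task.h"
#include <iostream>

using namespace crucible;

static Exclusion counter_lock;
static int counter = 0;

static void bump()
{
	auto lock = counter_lock.try_lock();
	if (!lock) {
		// Lost the race: reschedule this Task for when the lock frees up,
		// then exit the Task function as the comments above prescribe.
		counter_lock.insert_task(Task::current_task());
		return;
	}
	++counter;
	std::cout << "counter = " << counter << std::endl;
}

int main()
{
	TaskMaster::set_thread_count(2);
	for (int i = 0; i < 10; ++i) {
		Task("bump", bump).run();
	}
	// A real caller would wait for the queue to drain before exiting.
}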


@@ -4,6 +4,8 @@
#include "crucible/error.h"
#include <chrono>
#include <condition_variable>
#include <limits>
#include <mutex>
#include <ostream>
@@ -17,10 +19,9 @@ namespace crucible {
public:
Timer();
double age() const;
chrono::high_resolution_clock::time_point get() const;
double report(int precision = 1000) const;
void reset();
void set(const chrono::high_resolution_clock::time_point &start);
void set(double delta);
double lap();
bool operator<(double d) const;
bool operator>(double d) const;
@@ -45,6 +46,59 @@ namespace crucible {
void borrow(double cost = 1.0);
};
class RateEstimator {
mutable mutex m_mutex;
mutable condition_variable m_condvar;
Timer m_timer;
double m_num = 0.0;
double m_den = 0.0;
uint64_t m_last_count = numeric_limits<uint64_t>::max();
Timer m_last_update;
const double m_decay = 0.99;
Timer m_last_decay;
double m_min_delay;
double m_max_delay;
chrono::duration<double> duration_unlocked(uint64_t relative_count) const;
chrono::high_resolution_clock::time_point time_point_unlocked(uint64_t absolute_count) const;
double rate_unlocked() const;
pair<double, double> ratio_unlocked() const;
void update_unlocked(uint64_t new_count);
public:
RateEstimator(double min_delay = 1, double max_delay = 3600);
// Block until count reached
void wait_for(uint64_t new_count_relative) const;
void wait_until(uint64_t new_count_absolute) const;
// Computed rates and ratios
double rate() const;
pair<double, double> ratio() const;
// Inspect raw num/den
pair<double, double> raw() const;
// Write count
void update(uint64_t new_count);
// Ignore counts that go backwards
void update_monotonic(uint64_t new_count);
// Read count
uint64_t count() const;
// Convert counts to chrono types
chrono::high_resolution_clock::time_point time_point(uint64_t absolute_count) const;
chrono::duration<double> duration(uint64_t relative_count) const;
// Polling delay until count reached (limited by min/max delay)
double seconds_for(uint64_t new_count_relative) const;
double seconds_until(uint64_t new_count_absolute) const;
};
ostream &
operator<<(ostream &os, const RateEstimator &re);
}
#endif // CRUCIBLE_TIME_H
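The RateEstimator declared above is fed a monotonically increasing counter and answers "how long until count X", clamped to the min/max polling delay. A small usage sketch, assuming libcrucible is linked (the counter loop and delays are illustrative):

#include "crucible/time.h"
#include <iostream>
#include <unistd.h>

using namespace crucible;

int main()
{
	RateEstimator re(0.1, 60); // min 0.1s, max 60s polling delay
	for (uint64_t count = 0; count < 5; ++count) {
		re.update(count);
		usleep(100 * 1000); // pretend one unit arrives per 100ms
	}
	std::cout << re << std::endl;
	std::cout << "seconds until count 10: " << re.seconds_until(10) << std::endl;
}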


@@ -1,188 +0,0 @@
#ifndef CRUCIBLE_TIMEQUEUE_H
#define CRUCIBLE_TIMEQUEUE_H
#include <crucible/error.h>
#include <crucible/time.h>
#include <condition_variable>
#include <limits>
#include <list>
#include <memory>
#include <mutex>
#include <set>
namespace crucible {
using namespace std;
template <class Task>
class TimeQueue {
public:
using Timestamp = chrono::high_resolution_clock::time_point;
private:
struct Item {
Timestamp m_time;
unsigned long m_id;
Task m_task;
bool operator<(const Item &that) const {
if (m_time < that.m_time) return true;
if (that.m_time < m_time) return false;
return m_id < that.m_id;
}
static unsigned s_id;
Item(const Timestamp &time, const Task& task) :
m_time(time),
m_id(++s_id),
m_task(task)
{
}
};
set<Item> m_set;
mutable mutex m_mutex;
condition_variable m_cond_full, m_cond_empty;
size_t m_max_queue_depth;
public:
~TimeQueue();
TimeQueue(size_t max_queue_depth = numeric_limits<size_t>::max());
void push(const Task &task, double delay = 0);
void push_nowait(const Task &task, double delay = 0);
Task pop();
bool pop_nowait(Task &t);
double when() const;
size_t size() const;
bool empty() const;
list<Task> peek(size_t count) const;
};
template <class Task> unsigned TimeQueue<Task>::Item::s_id = 0;
template <class Task>
TimeQueue<Task>::~TimeQueue()
{
if (!m_set.empty()) {
cerr << "ERROR: " << m_set.size() << " locked items still in TimeQueue at destruction" << endl;
}
}
template <class Task>
void
TimeQueue<Task>::push(const Task &task, double delay)
{
Timestamp time = chrono::high_resolution_clock::now() +
chrono::duration_cast<chrono::high_resolution_clock::duration>(chrono::duration<double>(delay));
unique_lock<mutex> lock(m_mutex);
while (m_set.size() > m_max_queue_depth) {
m_cond_full.wait(lock);
}
m_set.insert(Item(time, task));
m_cond_empty.notify_all();
}
template <class Task>
void
TimeQueue<Task>::push_nowait(const Task &task, double delay)
{
Timestamp time = chrono::high_resolution_clock::now() +
chrono::duration_cast<chrono::high_resolution_clock::duration>(chrono::duration<double>(delay));
unique_lock<mutex> lock(m_mutex);
m_set.insert(Item(time, task));
m_cond_empty.notify_all();
}
template <class Task>
Task
TimeQueue<Task>::pop()
{
unique_lock<mutex> lock(m_mutex);
while (1) {
while (m_set.empty()) {
m_cond_empty.wait(lock);
}
Timestamp now = chrono::high_resolution_clock::now();
if (now > m_set.begin()->m_time) {
Task rv = m_set.begin()->m_task;
m_set.erase(m_set.begin());
m_cond_full.notify_all();
return rv;
}
m_cond_empty.wait_until(lock, m_set.begin()->m_time);
}
}
template <class Task>
bool
TimeQueue<Task>::pop_nowait(Task &t)
{
unique_lock<mutex> lock(m_mutex);
if (m_set.empty()) {
return false;
}
Timestamp now = chrono::high_resolution_clock::now();
if (now <= m_set.begin()->m_time) {
return false;
}
t = m_set.begin()->m_task;
m_set.erase(m_set.begin());
m_cond_full.notify_all();
return true;
}
template <class Task>
double
TimeQueue<Task>::when() const
{
unique_lock<mutex> lock(m_mutex);
if (m_set.empty()) {
return numeric_limits<double>::infinity();
}
return chrono::duration<double>(m_set.begin()->m_time - chrono::high_resolution_clock::now()).count();
}
template <class Task>
size_t
TimeQueue<Task>::size() const
{
unique_lock<mutex> lock(m_mutex);
return m_set.size();
}
template <class Task>
bool
TimeQueue<Task>::empty() const
{
unique_lock<mutex> lock(m_mutex);
return m_set.empty();
}
template <class Task>
list<Task>
TimeQueue<Task>::peek(size_t count) const
{
unique_lock<mutex> lock(m_mutex);
list<Task> rv;
auto it = m_set.begin();
while (count-- && it != m_set.end()) {
rv.push_back(it->m_task);
++it;
}
return rv;
}
template <class Task>
TimeQueue<Task>::TimeQueue(size_t max_depth) :
m_max_queue_depth(max_depth)
{
}
}
#endif // CRUCIBLE_TIMEQUEUE_H


@@ -1,192 +0,0 @@
#ifndef CRUCIBLE_WORKQUEUE_H
#define CRUCIBLE_WORKQUEUE_H
#include <crucible/error.h>
#include <condition_variable>
#include <limits>
#include <list>
#include <memory>
#include <mutex>
#include <set>
namespace crucible {
using namespace std;
template <class Task>
class WorkQueue {
public:
using set_type = set<Task>;
using key_type = Task;
private:
set_type m_set;
mutable mutex m_mutex;
condition_variable m_cond_full, m_cond_empty;
size_t m_max_queue_depth;
public:
~WorkQueue();
template <class... Args> WorkQueue(size_t max_queue_depth, Args... args);
template <class... Args> WorkQueue(Args... args);
void push(const key_type &name);
void push_wait(const key_type &name, size_t limit);
void push_nowait(const key_type &name);
key_type pop();
bool pop_nowait(key_type &rv);
key_type peek();
size_t size() const;
bool empty();
set_type copy();
list<Task> peek(size_t count) const;
};
template <class Task>
WorkQueue<Task>::~WorkQueue()
{
if (!m_set.empty()) {
cerr << "ERROR: " << m_set.size() << " locked items still in WorkQueue " << this << " at destruction" << endl;
}
}
template <class Task>
void
WorkQueue<Task>::push(const key_type &name)
{
unique_lock<mutex> lock(m_mutex);
while (!m_set.count(name) && m_set.size() > m_max_queue_depth) {
m_cond_full.wait(lock);
}
m_set.insert(name);
m_cond_empty.notify_all();
}
template <class Task>
void
WorkQueue<Task>::push_wait(const key_type &name, size_t limit)
{
unique_lock<mutex> lock(m_mutex);
while (!m_set.count(name) && m_set.size() >= limit) {
m_cond_full.wait(lock);
}
m_set.insert(name);
m_cond_empty.notify_all();
}
template <class Task>
void
WorkQueue<Task>::push_nowait(const key_type &name)
{
unique_lock<mutex> lock(m_mutex);
m_set.insert(name);
m_cond_empty.notify_all();
}
template <class Task>
typename WorkQueue<Task>::key_type
WorkQueue<Task>::pop()
{
unique_lock<mutex> lock(m_mutex);
while (m_set.empty()) {
m_cond_empty.wait(lock);
}
key_type rv = *m_set.begin();
m_set.erase(m_set.begin());
m_cond_full.notify_all();
return rv;
}
template <class Task>
bool
WorkQueue<Task>::pop_nowait(key_type &rv)
{
unique_lock<mutex> lock(m_mutex);
if (m_set.empty()) {
return false;
}
rv = *m_set.begin();
m_set.erase(m_set.begin());
m_cond_full.notify_all();
return true;
}
template <class Task>
typename WorkQueue<Task>::key_type
WorkQueue<Task>::peek()
{
unique_lock<mutex> lock(m_mutex);
if (m_set.empty()) {
return key_type();
} else {
// Make copy with lock held
auto rv = *m_set.begin();
return rv;
}
}
template <class Task>
size_t
WorkQueue<Task>::size() const
{
unique_lock<mutex> lock(m_mutex);
return m_set.size();
}
template <class Task>
bool
WorkQueue<Task>::empty()
{
unique_lock<mutex> lock(m_mutex);
return m_set.empty();
}
template <class Task>
typename WorkQueue<Task>::set_type
WorkQueue<Task>::copy()
{
unique_lock<mutex> lock(m_mutex);
auto rv = m_set;
return rv;
}
template <class Task>
list<Task>
WorkQueue<Task>::peek(size_t count) const
{
unique_lock<mutex> lock(m_mutex);
list<Task> rv;
for (auto i : m_set) {
if (count--) {
rv.push_back(i);
} else {
break;
}
}
return rv;
}
template <class Task>
template <class... Args>
WorkQueue<Task>::WorkQueue(Args... args) :
m_set(args...),
m_max_queue_depth(numeric_limits<size_t>::max())
{
}
template <class Task>
template <class... Args>
WorkQueue<Task>::WorkQueue(size_t max_depth, Args... args) :
m_set(args...),
m_max_queue_depth(max_depth)
{
}
}
#endif // CRUCIBLE_WORKQUEUE_H


@@ -1,8 +1,12 @@
default: libcrucible.so
TAG ?= $(shell git describe --always --dirty || echo UNKNOWN)
OBJS = \
crc64.o \
default: libcrucible.so
%.so: Makefile
CRUCIBLE_OBJS = \
chatter.o \
cleanup.o \
crc64.o \
error.o \
extentwalker.o \
fd.o \
@@ -11,24 +15,33 @@ OBJS = \
path.o \
process.o \
string.o \
task.o \
time.o \
uuid.o \
.version.o \
include ../makeflags
-include ../localconf
include ../Defines.mk
depends.mk: *.cc
for x in *.cc; do $(CXX) $(CXXFLAGS) -M "$$x"; done >> depends.mk.new
mv -fv depends.mk.new depends.mk
configure.h: configure.h.in
$(TEMPLATE_COMPILER)
.version.cc: Makefile ../makeflags *.cc ../include/crucible/*.h
echo "namespace crucible { const char *VERSION = \"$(shell git describe --always --dirty || echo UNKNOWN)\"; }" > .version.new.cc
mv -f .version.new.cc .version.cc
.depends/%.dep: %.cc configure.h Makefile
@mkdir -p .depends
$(CXX) $(CXXFLAGS) -M -MF $@ -MT $(<:.cc=.o) $<
-include depends.mk
depends.mk: $(CRUCIBLE_OBJS:%.o=.depends/%.dep)
cat $^ > $@.new
mv -f $@.new $@
%.o: %.cc ../include/crucible/%.h
$(CXX) $(CXXFLAGS) -o $@ -c $<
.version.cc: configure.h Makefile ../makeflags $(CRUCIBLE_OBJS:.o=.cc) ../include/crucible/*.h
echo "namespace crucible { const char *VERSION = \"$(TAG)\"; }" > $@.new
mv -f $@.new $@
libcrucible.so: $(OBJS) Makefile
$(CXX) $(LDFLAGS) -o $@ $(OBJS) -shared -luuid
include depends.mk
%.o: %.cc ../makeflags
$(CXX) $(CXXFLAGS) -fPIC -o $@ -c $<
libcrucible.so: $(CRUCIBLE_OBJS) .version.o
$(CXX) $(LDFLAGS) -fPIC -shared -Wl,-soname,$@ -o $@ $^ -luuid


@@ -44,8 +44,8 @@ namespace crucible {
}
}
Chatter::Chatter(string name, ostream &os)
: m_name(name), m_os(os)
Chatter::Chatter(int loglevel, string name, ostream &os)
: m_loglevel(loglevel), m_name(name), m_os(os)
{
}
@@ -69,14 +69,16 @@ namespace crucible {
DIE_IF_ZERO(strftime(buf, sizeof(buf), "%Y-%m-%d %H:%M:%S", &ltm));
header_stream << buf;
header_stream << " " << getpid() << "." << gettid();
header_stream << " " << getpid() << "." << crucible::gettid() << "<" << m_loglevel << ">";
if (!m_name.empty()) {
header_stream << " " << m_name;
}
} else {
header_stream << "tid " << gettid();
header_stream << "<" << m_loglevel << ">";
header_stream << (m_name.empty() ? "thread" : m_name);
header_stream << "[" << crucible::gettid() << "]";
}
if (!m_name.empty()) {
header_stream << " " << m_name;
}
header_stream << ": ";
string out = m_oss.str();
@@ -98,7 +100,7 @@ namespace crucible {
}
Chatter::Chatter(Chatter &&c)
: m_name(c.m_name), m_os(c.m_os), m_oss(c.m_oss.str())
: m_loglevel(c.m_loglevel), m_name(c.m_name), m_os(c.m_os), m_oss(c.m_oss.str())
{
c.m_oss.str("");
}
@@ -122,6 +124,7 @@ namespace crucible {
} else if (!chatter_names->empty()) {
cerr << "CRUCIBLE_CHATTER does not list '" << m_file << "' or '" << m_pretty_function << "'" << endl;
}
(void)m_line; // not implemented yet
// cerr << "ChatterBox " << reinterpret_cast<void*>(this) << " constructed" << endl;
}

lib/cleanup.cc (new file, 17 lines)

@@ -0,0 +1,17 @@
#include <crucible/cleanup.h>
namespace crucible {
Cleanup::Cleanup(function<void()> func) :
m_cleaner(func)
{
}
Cleanup::~Cleanup()
{
if (m_cleaner) {
m_cleaner();
}
}
}

lib/configure.h.in (new file, 6 lines)

@@ -0,0 +1,6 @@
#ifndef _CONFIGURE_H
#define ETC_PREFIX "@ETC_PREFIX@"
#define _CONFIGURE_H
#endif


@@ -32,7 +32,7 @@ namespace crucible {
// FIXME: could probably avoid some of these levels of indirection
static
function<void(string s)> current_catch_explainer = [&](string s) {
function<void(string s)> current_catch_explainer = [](string s) {
cerr << s << endl;
};


@@ -6,7 +6,6 @@
#include "crucible/limits.h"
#include "crucible/string.h"
namespace crucible {
using namespace std;
@@ -15,7 +14,6 @@ namespace crucible {
// fm_start, fm_length, fm_flags, m_extents
// fe_logical, fe_physical, fe_length, fe_flags
static const off_t MAX_OFFSET = numeric_limits<off_t>::max();
static const off_t FIEMAP_BLOCK_SIZE = 4096;
static bool __ew_do_log = getenv("EXTENTWALKER_DEBUG");
@@ -333,7 +331,9 @@ namespace crucible {
THROW_CHECK1(runtime_error, new_vec.size(), !new_vec.empty());
// Allow last extent to extend beyond desired range (e.g. at EOF)
THROW_CHECK2(runtime_error, ipos, new_vec.rbegin()->m_end, ipos <= new_vec.rbegin()->m_end);
// ...but that's not what this does
// THROW_CHECK3(runtime_error, ipos, new_vec.rbegin()->m_end, m_stat.st_size, ipos <= new_vec.rbegin()->m_end);
// If we have the last extent in the file, truncate it to the file size.
if (ipos >= m_stat.st_size) {
THROW_CHECK2(runtime_error, new_vec.rbegin()->m_begin, m_stat.st_size, m_stat.st_size > new_vec.rbegin()->m_begin);


@@ -174,11 +174,13 @@ namespace crucible {
static const struct bits_ntoa_table mmap_flags_table[] = {
NTOA_TABLE_ENTRY_BITS(MAP_SHARED),
NTOA_TABLE_ENTRY_BITS(MAP_PRIVATE),
#ifdef MAP_32BIT
NTOA_TABLE_ENTRY_BITS(MAP_32BIT),
#endif
NTOA_TABLE_ENTRY_BITS(MAP_ANONYMOUS),
NTOA_TABLE_ENTRY_BITS(MAP_DENYWRITE),
NTOA_TABLE_ENTRY_BITS(MAP_EXECUTABLE),
#if MAP_FILE
#ifdef MAP_FILE
NTOA_TABLE_ENTRY_BITS(MAP_FILE),
#endif
NTOA_TABLE_ENTRY_BITS(MAP_FIXED),
@@ -527,6 +529,22 @@ namespace crucible {
THROW_ERROR(runtime_error, "readlink: maximum buffer size exceeded");
}
string
relative_path()
{
return __relative_path;
}
void
set_relative_path(string path)
{
path = path + "/";
for (string::size_type i = path.find("//"); i != string::npos; i = path.find("//")) {
path.erase(i, 1);
}
__relative_path = path;
}
// Turn a FD into a human-recognizable filename OR an error message.
string
name_fd(int fd)
@@ -534,7 +552,12 @@ namespace crucible {
try {
ostringstream oss;
oss << "/proc/self/fd/" << fd;
return readlink_or_die(oss.str());
string path = readlink_or_die(oss.str());
if (!__relative_path.empty() && 0 == path.find(__relative_path))
{
path.erase(0, __relative_path.length());
}
return path;
} catch (exception &e) {
return string(e.what());
}
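The relative-path logic above normalizes the configured prefix (collapse "//", ensure a trailing "/") and strips it from the front of resolved names. A standalone sketch of the same string handling (function and variable names are illustrative):

#include <iostream>
#include <string>

static std::string relative_prefix;

// Normalize the prefix the same way set_relative_path does.
void set_prefix(std::string path)
{
	path += "/";
	for (auto i = path.find("//"); i != std::string::npos; i = path.find("//")) {
		path.erase(i, 1);
	}
	relative_prefix = path;
}

// Strip the prefix from a resolved name, as name_fd does.
std::string shorten(std::string path)
{
	if (!relative_prefix.empty() && path.find(relative_prefix) == 0) {
		path.erase(0, relative_prefix.length());
	}
	return path;
}

int main()
{
	set_prefix("/run/bees/mnt//UUID");
	std::cout << shorten("/run/bees/mnt/UUID/home/file.txt") << std::endl; // home/file.txt
}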


@@ -701,7 +701,7 @@ namespace crucible {
BtrfsIoctlSearchHeader::set_data(const vector<char> &v, size_t offset)
{
THROW_CHECK2(invalid_argument, offset, v.size(), offset + sizeof(btrfs_ioctl_search_header) <= v.size());
memcpy(this, &v[offset], sizeof(btrfs_ioctl_search_header));
*static_cast<btrfs_ioctl_search_header *>(this) = *reinterpret_cast<const btrfs_ioctl_search_header *>(&v[offset]);
offset += sizeof(btrfs_ioctl_search_header);
THROW_CHECK2(invalid_argument, offset + len, v.size(), offset + len <= v.size());
m_data = vector<char>(&v[offset], &v[offset + len]);
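The set_data change above replaces a memcpy over `this` with an assignment through the base-class subobject, which copies only the header fields and leaves the derived members alone. A hedged sketch of that idiom (`base` and `wrapper` are stand-ins, not the real btrfs types, and like the original this assumes the buffer is suitably aligned for the header):

#include <cstddef>
#include <vector>

struct base { int a; int b; }; // stand-in for btrfs_ioctl_search_header

struct wrapper : public base {
	std::vector<char> m_data;
	void set_data(const std::vector<char> &v, size_t offset)
	{
		// Assign only the base subobject; m_data is untouched.
		*static_cast<base *>(this) = *reinterpret_cast<const base *>(&v[offset]);
	}
};

int main()
{
	std::vector<char> buf(sizeof(base));
	wrapper w;
	w.set_data(buf, 0);
}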


@@ -3,6 +3,7 @@
#include "crucible/chatter.h"
#include "crucible/error.h"
#include <cstdlib>
#include <utility>
// for gettid()
@@ -109,13 +110,43 @@ namespace crucible {
}
}
template<>
struct ResourceHandle<Process::id, Process>;
pid_t
gettid()
{
return syscall(SYS_gettid);
}
double
getloadavg1()
{
double loadavg[1];
const int rv = ::getloadavg(loadavg, 1);
if (rv != 1) {
THROW_ERRNO("getloadavg(..., 1)");
}
return loadavg[0];
}
double
getloadavg5()
{
double loadavg[2];
const int rv = ::getloadavg(loadavg, 2);
if (rv != 2) {
THROW_ERRNO("getloadavg(..., 2)");
}
return loadavg[1];
}
double
getloadavg15()
{
double loadavg[3];
const int rv = ::getloadavg(loadavg, 3);
if (rv != 3) {
THROW_ERRNO("getloadavg(..., 3)");
}
return loadavg[2];
}
}

lib/task.cc (new file, 644 lines)

@@ -0,0 +1,644 @@
#include "crucible/task.h"
#include "crucible/cleanup.h"
#include "crucible/error.h"
#include "crucible/process.h"
#include "crucible/time.h"
#include <atomic>
#include <cmath>
#include <condition_variable>
#include <list>
#include <map>
#include <mutex>
#include <set>
#include <thread>
namespace crucible {
using namespace std;
static thread_local weak_ptr<TaskState> tl_current_task_wp;
class TaskState : public enable_shared_from_this<TaskState> {
const function<void()> m_exec_fn;
const string m_title;
TaskId m_id;
static atomic<TaskId> s_next_id;
public:
TaskState(string title, function<void()> exec_fn);
void exec();
string title() const;
TaskId id() const;
};
atomic<TaskId> TaskState::s_next_id;
class TaskConsumer;
class TaskMasterState;
class TaskMasterState : public enable_shared_from_this<TaskMasterState> {
mutex m_mutex;
condition_variable m_condvar;
list<shared_ptr<TaskState>> m_queue;
size_t m_thread_max;
size_t m_thread_min = 0;
set<shared_ptr<TaskConsumer>> m_threads;
shared_ptr<thread> m_load_tracking_thread;
double m_load_target = 0;
double m_prev_loadavg;
size_t m_configured_thread_max;
double m_thread_target;
friend class TaskConsumer;
friend class TaskMaster;
void start_threads_nolock();
void start_stop_threads();
void set_thread_count(size_t thread_max);
void set_thread_min_count(size_t thread_min);
void adjust_thread_count();
size_t calculate_thread_count_nolock();
void set_loadavg_target(double target);
void loadavg_thread_fn();
public:
~TaskMasterState();
TaskMasterState(size_t thread_max = thread::hardware_concurrency());
static void push_back(shared_ptr<TaskState> task);
static void push_front(shared_ptr<TaskState> task);
size_t get_queue_count();
};
class TaskConsumer : public enable_shared_from_this<TaskConsumer> {
weak_ptr<TaskMasterState> m_master;
thread m_thread;
shared_ptr<TaskState> m_current_task;
void consumer_thread();
shared_ptr<TaskState> current_task_locked();
public:
TaskConsumer(weak_ptr<TaskMasterState> tms);
shared_ptr<TaskState> current_task();
friend class TaskMaster;
friend class TaskMasterState;
};
static shared_ptr<TaskMasterState> s_tms = make_shared<TaskMasterState>();
TaskState::TaskState(string title, function<void()> exec_fn) :
m_exec_fn(exec_fn),
m_title(title),
m_id(++s_next_id)
{
THROW_CHECK0(invalid_argument, !m_title.empty());
}
void
TaskState::exec()
{
THROW_CHECK0(invalid_argument, m_exec_fn);
THROW_CHECK0(invalid_argument, !m_title.empty());
char buf[24];
memset(buf, '\0', sizeof(buf));
DIE_IF_MINUS_ERRNO(pthread_getname_np(pthread_self(), buf, sizeof(buf)));
Cleanup pthread_name_cleaner([&]() {
pthread_setname_np(pthread_self(), buf);
});
DIE_IF_MINUS_ERRNO(pthread_setname_np(pthread_self(), m_title.c_str()));
weak_ptr<TaskState> this_task_wp = shared_from_this();
Cleanup current_task_cleaner([&]() {
swap(this_task_wp, tl_current_task_wp);
});
swap(this_task_wp, tl_current_task_wp);
m_exec_fn();
}
string
TaskState::title() const
{
THROW_CHECK0(runtime_error, !m_title.empty());
return m_title;
}
TaskId
TaskState::id() const
{
return m_id;
}
TaskMasterState::TaskMasterState(size_t thread_max) :
m_thread_max(thread_max),
m_configured_thread_max(thread_max),
m_thread_target(thread_max)
{
}
void
TaskMasterState::start_threads_nolock()
{
while (m_threads.size() < m_thread_max) {
m_threads.insert(make_shared<TaskConsumer>(shared_from_this()));
}
}
void
TaskMasterState::start_stop_threads()
{
unique_lock<mutex> lock(m_mutex);
while (m_threads.size() != m_thread_max) {
if (m_threads.size() < m_thread_max) {
m_threads.insert(make_shared<TaskConsumer>(shared_from_this()));
} else if (m_threads.size() > m_thread_max) {
m_condvar.wait(lock);
}
}
}
void
TaskMasterState::push_back(shared_ptr<TaskState> task)
{
THROW_CHECK0(runtime_error, task);
unique_lock<mutex> lock(s_tms->m_mutex);
s_tms->m_queue.push_back(task);
s_tms->m_condvar.notify_all();
s_tms->start_threads_nolock();
}
void
TaskMasterState::push_front(shared_ptr<TaskState> task)
{
THROW_CHECK0(runtime_error, task);
unique_lock<mutex> lock(s_tms->m_mutex);
s_tms->m_queue.push_front(task);
s_tms->m_condvar.notify_all();
s_tms->start_threads_nolock();
}
TaskMasterState::~TaskMasterState()
{
set_thread_count(0);
}
size_t
TaskMaster::get_queue_count()
{
unique_lock<mutex> lock(s_tms->m_mutex);
return s_tms->m_queue.size();
}
ostream &
TaskMaster::print_queue(ostream &os)
{
unique_lock<mutex> lock(s_tms->m_mutex);
os << "Queue (size " << s_tms->m_queue.size() << "):" << endl;
size_t counter = 0;
for (auto i : s_tms->m_queue) {
os << "Queue #" << ++counter << " Task ID " << i->id() << " " << i->title() << endl;
}
return os << "Queue End" << endl;
}
ostream &
TaskMaster::print_workers(ostream &os)
{
unique_lock<mutex> lock(s_tms->m_mutex);
os << "Workers (size " << s_tms->m_threads.size() << "):" << endl;
size_t counter = 0;
for (auto i : s_tms->m_threads) {
os << "Worker #" << ++counter << " ";
auto task = i->current_task_locked();
if (task) {
os << "Task ID " << task->id() << " " << task->title();
} else {
os << "(idle)";
}
os << endl;
}
return os << "Workers End" << endl;
}
size_t
TaskMasterState::calculate_thread_count_nolock()
{
if (m_load_target == 0) {
// No limits, no stats, use configured thread count
return m_configured_thread_max;
}
if (m_configured_thread_max == 0) {
// Not a lot of choice here, and zeros break the algorithm
return 0;
}
const double loadavg = getloadavg1();
static const double load_exp = exp(-5.0 / 60.0);
// Averages are fun, but we want to know the load from the last 5 seconds.
// Invert the load average function:
// LA = LA * load_exp + N * (1 - load_exp)
// LA2 - LA1 = LA1 * load_exp + N * (1 - load_exp) - LA1
// LA2 - LA1 + LA1 = LA1 * load_exp + N * (1 - load_exp)
// LA2 - LA1 + LA1 - LA1 * load_exp = N * (1 - load_exp)
// LA2 - LA1 * load_exp = N * (1 - load_exp)
// LA2 / (1 - load_exp) - LA1 * load_exp / (1 - load_exp) = N
// (LA2 - LA1 * load_exp) / (1 - load_exp) = N
// except for rounding error which might make this just a bit below zero.
const double current_load = max(0.0, (loadavg - m_prev_loadavg * load_exp) / (1 - load_exp));
m_prev_loadavg = loadavg;
// Change the thread target based on the
// difference between current and desired load
// but don't get too close all at once due to rounding and sample error.
// If m_load_target < 1.0 then we are just doing PWM with one thread.
if (m_load_target <= 1.0) {
m_thread_target = 1.0;
} else if (m_load_target - current_load >= 1.0) {
m_thread_target += (m_load_target - current_load - 1.0) / 2.0;
} else if (m_load_target < current_load) {
m_thread_target += m_load_target - current_load;
}
// Cannot exceed configured maximum thread count or drop below zero
m_thread_target = min(max(0.0, m_thread_target), double(m_configured_thread_max));
// Convert to integer but keep within range
const size_t rv = max(m_thread_min, min(size_t(ceil(m_thread_target)), m_configured_thread_max));
return rv;
}
void
TaskMasterState::adjust_thread_count()
{
unique_lock<mutex> lock(m_mutex);
size_t new_thread_max = calculate_thread_count_nolock();
size_t old_thread_max = m_thread_max;
m_thread_max = new_thread_max;
// If we are reducing the number of threads we have to wake them up so they can exit their loops
// If we are increasing the number of threads we have to notify start_stop_threads it can stop waiting for threads to stop
if (new_thread_max != old_thread_max) {
m_condvar.notify_all();
start_threads_nolock();
}
}
void
TaskMasterState::set_thread_count(size_t thread_max)
{
unique_lock<mutex> lock(m_mutex);
m_configured_thread_max = thread_max;
lock.unlock();
adjust_thread_count();
start_stop_threads();
}
void
TaskMaster::set_thread_count(size_t thread_max)
{
s_tms->set_thread_count(thread_max);
}
void
TaskMasterState::set_thread_min_count(size_t thread_min)
{
unique_lock<mutex> lock(m_mutex);
m_thread_min = thread_min;
lock.unlock();
adjust_thread_count();
start_stop_threads();
}
void
TaskMaster::set_thread_min_count(size_t thread_min)
{
s_tms->set_thread_min_count(thread_min);
}
void
TaskMasterState::loadavg_thread_fn()
{
pthread_setname_np(pthread_self(), "load_tracker");
while (true) {
adjust_thread_count();
nanosleep(5.0);
}
}
void
TaskMasterState::set_loadavg_target(double target)
{
THROW_CHECK1(out_of_range, target, target >= 0);
unique_lock<mutex> lock(m_mutex);
m_load_target = target;
m_prev_loadavg = getloadavg1();
if (target && !m_load_tracking_thread) {
m_load_tracking_thread = make_shared<thread>([=] () { loadavg_thread_fn(); });
m_load_tracking_thread->detach();
}
}
void
TaskMaster::set_loadavg_target(double target)
{
s_tms->set_loadavg_target(target);
}
void
TaskMaster::set_thread_count()
{
set_thread_count(thread::hardware_concurrency());
}
Task::Task(shared_ptr<TaskState> pts) :
m_task_state(pts)
{
}
Task::Task(string title, function<void()> exec_fn) :
m_task_state(make_shared<TaskState>(title, exec_fn))
{
}
void
Task::run() const
{
THROW_CHECK0(runtime_error, m_task_state);
TaskMasterState::push_back(m_task_state);
}
void
Task::run_earlier() const
{
THROW_CHECK0(runtime_error, m_task_state);
TaskMasterState::push_front(m_task_state);
}
Task
Task::current_task()
{
return Task(tl_current_task_wp.lock());
}
string
Task::title() const
{
THROW_CHECK0(runtime_error, m_task_state);
return m_task_state->title();
}
ostream &
operator<<(ostream &os, const Task &task)
{
return os << task.title();
}
TaskId
Task::id() const
{
THROW_CHECK0(runtime_error, m_task_state);
return m_task_state->id();
}
bool
Task::operator<(const Task &that) const
{
return id() < that.id();
}
Task::operator bool() const
{
return !!m_task_state;
}
shared_ptr<TaskState>
TaskConsumer::current_task_locked()
{
return m_current_task;
}
shared_ptr<TaskState>
TaskConsumer::current_task()
{
auto master_locked = m_master.lock();
unique_lock<mutex> lock(master_locked->m_mutex);
return current_task_locked();
}
void
TaskConsumer::consumer_thread()
{
auto master_locked = m_master.lock();
while (true) {
unique_lock<mutex> lock(master_locked->m_mutex);
if (master_locked->m_thread_max < master_locked->m_threads.size()) {
break;
}
if (master_locked->m_queue.empty()) {
master_locked->m_condvar.wait(lock);
continue;
}
m_current_task = *master_locked->m_queue.begin();
master_locked->m_queue.pop_front();
lock.unlock();
catch_all([&]() {
m_current_task->exec();
});
lock.lock();
m_current_task.reset();
}
unique_lock<mutex> lock(master_locked->m_mutex);
m_thread.detach();
master_locked->m_threads.erase(shared_from_this());
master_locked->m_condvar.notify_all();
}
TaskConsumer::TaskConsumer(weak_ptr<TaskMasterState> tms) :
m_master(tms),
m_thread([=](){ consumer_thread(); })
{
}
class BarrierState {
mutex m_mutex;
set<Task> m_tasks;
void release();
public:
~BarrierState();
void insert_task(Task t);
};
Barrier::Barrier(shared_ptr<BarrierState> pbs) :
m_barrier_state(pbs)
{
}
Barrier::Barrier() :
m_barrier_state(make_shared<BarrierState>())
{
}
void
BarrierState::release()
{
unique_lock<mutex> lock(m_mutex);
for (auto i : m_tasks) {
i.run();
}
m_tasks.clear();
}
BarrierState::~BarrierState()
{
release();
}
BarrierLock::BarrierLock(shared_ptr<BarrierState> pbs) :
m_barrier_state(pbs)
{
}
void
BarrierLock::release()
{
m_barrier_state.reset();
}
void
BarrierState::insert_task(Task t)
{
unique_lock<mutex> lock(m_mutex);
m_tasks.insert(t);
}
void
Barrier::insert_task(Task t)
{
m_barrier_state->insert_task(t);
}
BarrierLock
Barrier::lock()
{
return BarrierLock(m_barrier_state);
}
class ExclusionState {
mutex m_mutex;
bool m_locked = false;
set<Task> m_tasks;
public:
~ExclusionState();
void release();
bool try_lock();
void insert_task(Task t);
};
Exclusion::Exclusion(shared_ptr<ExclusionState> pbs) :
m_exclusion_state(pbs)
{
}
Exclusion::Exclusion() :
m_exclusion_state(make_shared<ExclusionState>())
{
}
void
ExclusionState::release()
{
unique_lock<mutex> lock(m_mutex);
m_locked = false;
bool first = true;
for (auto i : m_tasks) {
if (first) {
i.run_earlier();
first = false;
} else {
i.run();
}
}
m_tasks.clear();
}
ExclusionState::~ExclusionState()
{
release();
}
ExclusionLock::ExclusionLock(shared_ptr<ExclusionState> pbs) :
m_exclusion_state(pbs)
{
}
void
ExclusionLock::release()
{
if (m_exclusion_state) {
m_exclusion_state->release();
m_exclusion_state.reset();
}
}
ExclusionLock::~ExclusionLock()
{
release();
}
void
ExclusionState::insert_task(Task task)
{
unique_lock<mutex> lock(m_mutex);
m_tasks.insert(task);
}
bool
ExclusionState::try_lock()
{
unique_lock<mutex> lock(m_mutex);
if (m_locked) {
return false;
} else {
m_locked = true;
return true;
}
}
void
Exclusion::insert_task(Task t)
{
m_exclusion_state->insert_task(t);
}
ExclusionLock::operator bool() const
{
return !!m_exclusion_state;
}
ExclusionLock
Exclusion::try_lock()
{
THROW_CHECK0(runtime_error, m_exclusion_state);
if (m_exclusion_state->try_lock()) {
return ExclusionLock(m_exclusion_state);
} else {
return ExclusionLock();
}
}
}
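The load-average inversion derived in the comments of calculate_thread_count_nolock() can be checked numerically. A minimal self-contained sketch: build a new 1-minute load average from a known instantaneous task count N, then recover N with the inverse formula (the sample values are illustrative):

#include <cmath>
#include <iostream>

int main()
{
	const double load_exp = std::exp(-5.0 / 60.0); // 5s sample, 1-minute average
	const double n_true = 3.0;  // pretend 3 runnable tasks
	const double la1 = 0.5;     // previous 1-minute loadavg
	// Forward model: LA2 = LA1 * load_exp + N * (1 - load_exp)
	const double la2 = la1 * load_exp + n_true * (1 - load_exp);
	// Inverse, as derived above: N = (LA2 - LA1 * load_exp) / (1 - load_exp)
	const double n_est = (la2 - la1 * load_exp) / (1 - load_exp);
	std::cout << "recovered N = " << n_est << std::endl; // prints 3
}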


@@ -1,11 +1,13 @@
#include "crucible/time.h"
#include "crucible/error.h"
#include "crucible/process.h"
#include <algorithm>
#include <thread>
#include <cmath>
#include <ctime>
#include <thread>
namespace crucible {
@@ -59,16 +61,10 @@ namespace crucible {
m_start = chrono::high_resolution_clock::now();
}
void
Timer::set(const chrono::high_resolution_clock::time_point &start)
chrono::high_resolution_clock::time_point
Timer::get() const
{
m_start = start;
}
void
Timer::set(double delta)
{
m_start += chrono::duration_cast<chrono::high_resolution_clock::duration>(chrono::duration<double>(delta));
return m_start;
}
double
@@ -155,4 +151,189 @@ namespace crucible {
m_tokens -= cost;
}
RateEstimator::RateEstimator(double min_delay, double max_delay) :
m_min_delay(min_delay),
m_max_delay(max_delay)
{
THROW_CHECK1(invalid_argument, min_delay, min_delay > 0);
THROW_CHECK1(invalid_argument, max_delay, max_delay > 0);
THROW_CHECK2(invalid_argument, min_delay, max_delay, max_delay > min_delay);
}
void
RateEstimator::update_unlocked(uint64_t new_count)
{
// Gradually reduce the effect of previous updates
if (m_last_decay.age() > 1) {
m_num *= m_decay;
m_den *= m_decay;
m_last_decay.reset();
}
// Add units over time to running totals
auto increment = new_count - min(new_count, m_last_count);
auto delta = max(0.0, m_last_update.lap());
m_num += increment;
m_den += delta;
m_last_count = new_count;
// If count increased, wake up any waiters
if (delta > 0) {
m_condvar.notify_all();
}
}
void
RateEstimator::update(uint64_t new_count)
{
unique_lock<mutex> lock(m_mutex);
return update_unlocked(new_count);
}
void
RateEstimator::update_monotonic(uint64_t new_count)
{
unique_lock<mutex> lock(m_mutex);
if (m_last_count == numeric_limits<uint64_t>::max() || new_count > m_last_count) {
return update_unlocked(new_count);
} else {
return update_unlocked(m_last_count);
}
}
uint64_t
RateEstimator::count() const
{
unique_lock<mutex> lock(m_mutex);
return m_last_count;
}
pair<double, double>
RateEstimator::ratio_unlocked() const
{
auto num = max(m_num, 1.0);
// auto den = max(m_den, 1.0);
// Rate estimation slows down if there are no new units to count
auto den = max(m_den + m_last_update.age(), 1.0);
auto sec_per_count = den / num;
if (sec_per_count < m_min_delay) {
return make_pair(1.0, m_min_delay);
}
if (sec_per_count > m_max_delay) {
return make_pair(1.0, m_max_delay);
}
return make_pair(num, den);
}
pair<double, double>
RateEstimator::ratio() const
{
unique_lock<mutex> lock(m_mutex);
return ratio_unlocked();
}
pair<double, double>
RateEstimator::raw() const
{
unique_lock<mutex> lock(m_mutex);
return make_pair(m_num, m_den);
}
double
RateEstimator::rate_unlocked() const
{
auto r = ratio_unlocked();
return r.first / r.second;
}
double
RateEstimator::rate() const
{
unique_lock<mutex> lock(m_mutex);
return rate_unlocked();
}
ostream &
operator<<(ostream &os, const RateEstimator &re)
{
os << "RateEstimator { ";
auto ratio = re.ratio();
auto raw = re.raw();
os << "count = " << re.count() << ", raw = " << raw.first << " / " << raw.second << ", ratio = " << ratio.first << " / " << ratio.second << ", rate = " << re.rate() << ", duration(1) = " << re.duration(1).count() << ", seconds_for(1) = " << re.seconds_for(1) << " }";
return os;
}
chrono::duration<double>
RateEstimator::duration_unlocked(uint64_t relative_count) const
{
auto dur = relative_count / rate_unlocked();
dur = min(m_max_delay, dur);
dur = max(m_min_delay, dur);
return chrono::duration<double>(dur);
}
chrono::duration<double>
RateEstimator::duration(uint64_t relative_count) const
{
unique_lock<mutex> lock(m_mutex);
return duration_unlocked(relative_count);
}
chrono::high_resolution_clock::time_point
RateEstimator::time_point_unlocked(uint64_t absolute_count) const
{
auto relative_count = absolute_count - min(m_last_count, absolute_count);
auto relative_duration = duration_unlocked(relative_count);
return m_last_update.get() + chrono::duration_cast<chrono::high_resolution_clock::duration>(relative_duration);
// return chrono::high_resolution_clock::now() + chrono::duration_cast<chrono::high_resolution_clock::duration>(relative_duration);
}
chrono::high_resolution_clock::time_point
RateEstimator::time_point(uint64_t absolute_count) const
{
unique_lock<mutex> lock(m_mutex);
return time_point_unlocked(absolute_count);
}
void
RateEstimator::wait_until(uint64_t new_count_absolute) const
{
unique_lock<mutex> lock(m_mutex);
auto saved_count = m_last_count;
while (saved_count <= m_last_count && m_last_count < new_count_absolute) {
// Stop waiting if clock runs backwards
saved_count = m_last_count;
m_condvar.wait(lock);
}
}
void
RateEstimator::wait_for(uint64_t new_count_relative) const
{
unique_lock<mutex> lock(m_mutex);
auto saved_count = m_last_count;
auto new_count_absolute = m_last_count + new_count_relative;
while (saved_count <= m_last_count && m_last_count < new_count_absolute) {
// Stop waiting if clock runs backwards
saved_count = m_last_count;
m_condvar.wait(lock);
}
}
double
RateEstimator::seconds_for(uint64_t new_count_relative) const
{
unique_lock<mutex> lock(m_mutex);
auto ts = time_point_unlocked(new_count_relative + m_last_count);
auto delta_dur = ts - chrono::high_resolution_clock::now();
return max(min(chrono::duration<double>(delta_dur).count(), m_max_delay), m_min_delay);
}
double
RateEstimator::seconds_until(uint64_t new_count_absolute) const
{
unique_lock<mutex> lock(m_mutex);
auto ts = time_point_unlocked(new_count_absolute);
auto delta_dur = ts - chrono::high_resolution_clock::now();
return max(min(chrono::duration<double>(delta_dur).count(), m_max_delay), m_min_delay);
}
}


@@ -1,4 +1,11 @@
CCFLAGS = -Wall -Wextra -Werror -O3 -march=native -I../include -ggdb -fpic -D_FILE_OFFSET_BITS=64
# Default:
CCFLAGS = -Wall -Wextra -Werror -I../include -fpic -D_FILE_OFFSET_BITS=64
# Optimized:
# CCFLAGS = -Wall -Wextra -Werror -O3 -march=native -I../include -fpic -D_FILE_OFFSET_BITS=64
# Debug:
# CCFLAGS = -Wall -Wextra -Werror -O0 -I../include -ggdb -fpic -D_FILE_OFFSET_BITS=64
CFLAGS = $(CCFLAGS) -std=c99
CXXFLAGS = $(CCFLAGS) -std=c++11 -Wold-style-cast
CFLAGS += $(CCFLAGS) -std=c99
CXXFLAGS += $(CCFLAGS) -std=c++11 -Wold-style-cast


@@ -15,11 +15,8 @@ UUID=xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
# BEESHOME="$MNT_DIR/.beeshome"
# BEESSTATUS="$WORK_DIR/$UUID.status"
## Make path shorter in logs
# LOG_SHORT_PATH=N
## Remove timestamp from bees output
# LOG_FILTER_TIME=N
## Options to apply, see `beesd --help` for details
# OPTIONS="--strip-paths --no-timestamps"
## Bees DB size
# Hash Table Sizing


@@ -12,7 +12,71 @@ export CONFIG_FILE
export UUID AL16M
readonly AL16M="$((16*1024*1024))"
readonly CONFIG_DIR=@PREFIX@/etc/bees/
readonly CONFIG_DIR=@ETC_PREFIX@/bees/
readonly bees_bin=$(realpath @LIBEXEC_PREFIX@/bees)
command -v "$bees_bin" &> /dev/null || ERRO "Missing 'bees' agent"
uuid_valid(){
if uuidparse -n -o VARIANT $1 | grep -i -q invalid; then
false
fi
}
help(){
echo "Usage: beesd [options] <btrfs_uuid>"
echo "- - -"
exec "$bees_bin" --help
}
get_bees_supp_opts(){
"$bees_bin" --help |& awk '/--../ { gsub( ",", "" ); print $1 " " $2}'
}
SUPPORTED_ARGS=(
$(get_bees_supp_opts)
)
NOT_SUPPORTED_ARGS=()
ARGUMENTS=()
for arg in "${@}"; do
supp=false
for supp_arg in "${SUPPORTED_ARGS[@]}"; do
if [ "$arg" == "$supp_arg" ]; then
supp=true
break
fi
done
if $supp; then
ARGUMENTS+=($arg)
else
NOT_SUPPORTED_ARGS+=($arg)
fi
done
for arg in "${ARGUMENTS[@]}"; do
case $arg in
-h) help;;
--help) help;;
esac
done
for arg in "${NOT_SUPPORTED_ARGS[@]}"; do
if uuid_valid $arg; then
[ ! -z "$UUID" ] && help
UUID=$arg
fi
done
[ -z "$UUID" ] && help
FILE_CONFIG="$(egrep -l '^[^#]*UUID\s*=\s*"?'"$UUID" "$CONFIG_DIR"/*.conf | head -1)"
[ ! -f "$FILE_CONFIG" ] && ERRO "No config for $UUID"
INFO "Find $UUID in $FILE_CONFIG, use as conf"
source "$FILE_CONFIG"
## Pre checks
{
@@ -20,51 +84,12 @@ readonly CONFIG_DIR=@PREFIX@/etc/bees/
[ "$UID" == "0" ] || ERRO "Must be run as root"
}
command -v @LIBEXEC_PREFIX@/bees &> /dev/null || ERRO "Missing 'bees' agent"
## Parse args
ARGUMENTS=()
while [ $# -gt 0 ]; do
case "$1" in
-*)
ARGUMENTS+=($1)
;;
*)
if [ -z "$UUID" ]; then
UUID="$1"
else
ERRO "Only one filesystem may be supplied"
fi
;;
esac
shift
done
case "$UUID" in
*-*-*-*-*)
FILE_CONFIG=""
for file in "$CONFIG_DIR"/*.conf; do
[ ! -f "$file" ] && continue
if grep -q "$UUID" "$file"; then
INFO "Find $UUID in $file, use as conf"
FILE_CONFIG="$file"
fi
done
[ ! -f "$FILE_CONFIG" ] && ERRO "No config for $UUID"
source "$FILE_CONFIG"
;;
*)
echo "beesd [options] <btrfs_uuid>"
exit 1
;;
esac
WORK_DIR="${WORK_DIR:-/run/bees/}"
MNT_DIR="${MNT_DIR:-$WORK_DIR/mnt/$UUID}"
BEESHOME="${BEESHOME:-$MNT_DIR/.beeshome}"
BEESSTATUS="${BEESSTATUS:-$WORK_DIR/$UUID.status}"
DB_SIZE="${DB_SIZE:-$((64*AL16M))}"
LOG_SHORT_PATH="${LOG_SHORT_PATH:-N}"
INFO "Check: Disk exists"
if [ ! -b "/dev/disk/by-uuid/$UUID" ]; then
@@ -114,16 +139,7 @@ fi
chmod 700 "$DB_PATH"
}
MNT_DIR="${MNT_DIR//\/\//\/}"
MNT_DIR="$(realpath $MNT_DIR)"
filter_path(){
if YN $LOG_SHORT_PATH; then
sed -e "s#$MNT_DIR##g"
else
cat
fi
}
@LIBEXEC_PREFIX@/bees ${ARGUMENTS[@]} $OPTIONS "$MNT_DIR" 3>&1 2>&1 | filter_path
exit 0
cd "$MNT_DIR"
"$bees_bin" "${ARGUMENTS[@]}" $OPTIONS "$MNT_DIR"


@@ -1,24 +1,24 @@
[Unit]
Description=Bees - Best-Effort Extent-Same, a btrfs deduplicator daemon: %i
After=local-fs.target
Description=Bees (%i)
Documentation=https://github.com/Zygo/bees
After=sysinit.target
[Service]
Type=simple
ExecStart=/usr/sbin/beesd %i
Nice=19
KillMode=control-group
KillSignal=SIGTERM
CPUShares=128
StartupCPUShares=256
BlockIOWeight=100
StartupBlockIOWeight=250
ExecStart=@PREFIX@/sbin/beesd --no-timestamps %i
CPUAccounting=true
CPUSchedulingPolicy=batch
CPUWeight=12
IOSchedulingClass=idle
IOSchedulingPriority=7
CPUSchedulingPolicy=batch
IOWeight=10
KillMode=control-group
KillSignal=SIGTERM
MemoryAccounting=true
Nice=19
Restart=on-abnormal
CPUAccounting=true
MemoryAccounting=true
StartupCPUWeight=25
StartupIOWeight=25
[Install]
WantedBy=local-fs.target
WantedBy=basic.target

src/.gitignore (vendored, +1 line)

@@ -1 +1,2 @@
bees-version.[ch]
bees-version.new.c


@@ -3,29 +3,13 @@ PROGRAMS = \
../bin/fiemap \
../bin/fiewalk \
all: $(PROGRAMS) depends.mk
all: $(PROGRAMS)
include ../makeflags
-include ../localconf
LIBS = -lcrucible -lpthread
LDFLAGS = -L../lib -Wl,-rpath=$(shell realpath ../lib)
depends.mk: Makefile *.cc
for x in *.cc; do $(CXX) $(CXXFLAGS) -M "$$x"; done > depends.mk.new
mv -fv depends.mk.new depends.mk
bees-version.c: Makefile *.cc *.h
echo "const char *BEES_VERSION = \"$(shell git describe --always --dirty || echo UNKNOWN)\";" > bees-version.new.c
mv -f bees-version.new.c bees-version.c
-include depends.mk
%.o: %.cc %.h
$(CXX) $(CXXFLAGS) -o "$@" -c "$<"
../bin/%: %.o
@echo Implicit bin rule "$<" '->' "$@"
$(CXX) $(CXXFLAGS) -o "$@" "$<" $(LDFLAGS) $(LIBS)
LDFLAGS = -L../lib
BEES_OBJS = \
bees.o \
@@ -35,11 +19,30 @@ BEES_OBJS = \
bees-roots.o \
bees-thread.o \
bees-types.o \
bees-version.o \
../bin/bees: $(BEES_OBJS)
$(CXX) $(CXXFLAGS) -o "$@" $(BEES_OBJS) $(LDFLAGS) $(LIBS)
bees-version.c: bees.h $(BEES_OBJS:.o=.cc) Makefile
echo "const char *BEES_VERSION = \"$(BEES_VERSION)\";" > bees-version.new.c
mv -f bees-version.new.c bees-version.c
.depends/%.dep: %.cc Makefile
@mkdir -p .depends
$(CXX) $(CXXFLAGS) -M -MF $@ -MT $(<:.cc=.o) $<
depends.mk: $(BEES_OBJS:%.o=.depends/%.dep)
cat $^ > $@.new
mv -f $@.new $@
include depends.mk
%.o: %.cc %.h
$(CXX) $(CXXFLAGS) -o $@ -c $<
../bin/%: %.o
@echo Implicit bin rule "$<" '->' "$@"
$(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $< $(LIBS)
../bin/bees: $(BEES_OBJS) bees-version.o
$(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $^ $(LIBS)
clean:
-rm -fv bees-version.h
-rm -fv *.o bees-version.c
rm -fv *.o bees-version.c


@@ -2,6 +2,7 @@
#include "crucible/limits.h"
#include "crucible/string.h"
#include "crucible/task.h"
#include <fstream>
#include <iostream>
@@ -10,17 +11,6 @@
using namespace crucible;
using namespace std;
static inline
const char *
getenv_or_die(const char *name)
{
const char *rv = getenv(name);
if (!rv) {
THROW_ERROR(runtime_error, "Environment variable " << name << " not defined");
}
return rv;
}
BeesFdCache::BeesFdCache()
{
m_root_cache.func([&](shared_ptr<BeesContext> ctx, uint64_t root) -> Fd {
@@ -29,39 +19,36 @@ BeesFdCache::BeesFdCache()
BEESCOUNTADD(open_root_ms, open_timer.age() * 1000);
return rv;
});
m_root_cache.max_size(BEES_ROOT_FD_CACHE_SIZE);
m_file_cache.func([&](shared_ptr<BeesContext> ctx, uint64_t root, uint64_t ino) -> Fd {
Timer open_timer;
auto rv = ctx->roots()->open_root_ino_nocache(root, ino);
BEESCOUNTADD(open_ino_ms, open_timer.age() * 1000);
return rv;
});
m_file_cache.max_size(BEES_FD_CACHE_SIZE);
m_file_cache.max_size(BEES_FILE_FD_CACHE_SIZE);
}
void
BeesFdCache::clear()
{
BEESNOTE("Clearing root FD cache to enable subvol delete");
m_root_cache.clear();
BEESCOUNT(root_clear);
BEESNOTE("Clearing open FD cache to enable file delete");
m_file_cache.clear();
BEESCOUNT(open_clear);
}
Fd
BeesFdCache::open_root(shared_ptr<BeesContext> ctx, uint64_t root)
{
// Don't hold root FDs open too long.
// The open FDs prevent snapshots from being deleted.
// cleaner_kthread just keeps skipping over the open dir and all its children.
if (m_root_cache_timer.age() > BEES_COMMIT_INTERVAL) {
BEESINFO("Clearing root FD cache to enable subvol delete");
m_root_cache.clear();
m_root_cache_timer.reset();
BEESCOUNT(root_clear);
}
return m_root_cache(ctx, root);
}
Fd
BeesFdCache::open_root_ino(shared_ptr<BeesContext> ctx, uint64_t root, uint64_t ino)
{
if (m_file_cache_timer.age() > BEES_COMMIT_INTERVAL) {
BEESINFO("Clearing open FD cache to enable file delete");
m_file_cache.clear();
m_file_cache_timer.reset();
BEESCOUNT(open_clear);
}
return m_file_cache(ctx, root, ino);
}
@@ -78,7 +65,7 @@ BeesContext::dump_status()
auto status_charp = getenv("BEESSTATUS");
if (!status_charp) return;
string status_file(status_charp);
BEESLOG("Writing status to file '" << status_file << "' every " << BEES_STATUS_INTERVAL << " sec");
BEESLOGINFO("Writing status to file '" << status_file << "' every " << BEES_STATUS_INTERVAL << " sec");
while (1) {
BEESNOTE("waiting " << BEES_STATUS_INTERVAL);
sleep(BEES_STATUS_INTERVAL);
@@ -93,11 +80,19 @@ BeesContext::dump_status()
ofs << "RATES:\n";
ofs << "\t" << avg_rates << "\n";
ofs << "THREADS:\n";
for (auto t : BeesNote::get_status()) {
ofs << "THREADS (work queue " << TaskMaster::get_queue_count() << " tasks):\n";
for (auto t : BeesNote::get_status()) {
ofs << "\ttid " << t.first << ": " << t.second << "\n";
}
#if 0
// Huge amount of data, not a lot of information (yet)
ofs << "WORKERS:\n";
TaskMaster::print_workers(ofs);
ofs << "QUEUE:\n";
TaskMaster::print_queue(ofs);
#endif
ofs.close();
BEESNOTE("renaming status file '" << status_file << "'");
@@ -119,24 +114,24 @@ BeesContext::show_progress()
auto thisStats = BeesStats::s_global;
auto avg_rates = lastStats / BEES_STATS_INTERVAL;
BEESLOG("TOTAL: " << thisStats);
BEESLOG("RATES: " << avg_rates);
BEESLOGINFO("TOTAL: " << thisStats);
BEESLOGINFO("RATES: " << avg_rates);
lastStats = thisStats;
}
BEESLOG("ACTIVITY:");
BEESLOGINFO("ACTIVITY:");
auto thisStats = BeesStats::s_global;
auto deltaStats = thisStats - lastProgressStats;
if (deltaStats) {
BEESLOG("\t" << deltaStats / BEES_PROGRESS_INTERVAL);
BEESLOGINFO("\t" << deltaStats / BEES_PROGRESS_INTERVAL);
};
lastProgressStats = thisStats;
BEESLOG("THREADS:");
BEESLOGINFO("THREADS:");
for (auto t : BeesNote::get_status()) {
BEESLOG("\ttid " << t.first << ": " << t.second);
for (auto t : BeesNote::get_status()) {
BEESLOGINFO("\ttid " << t.first << ": " << t.second);
}
}
}
@@ -144,6 +139,10 @@ BeesContext::show_progress()
Fd
BeesContext::home_fd()
{
if (!!m_home_fd) {
return m_home_fd;
}
const char *base_dir = getenv("BEESHOME");
if (!base_dir) {
base_dir = ".beeshome";
@@ -163,29 +162,35 @@ BeesContext::BeesContext(shared_ptr<BeesContext> parent) :
}
}
bool
BeesContext::is_root_ro(uint64_t root)
{
return roots()->is_root_ro(root);
}
bool
BeesContext::dedup(const BeesRangePair &brp)
{
// TOOLONG and NOTE can retroactively fill in the filename details, but LOG can't
BEESNOTE("dedup " << brp);
brp.first.fd(shared_from_this());
brp.second.fd(shared_from_this());
#if 0
// This avoids some sort of kernel race condition;
// however, it also doubles our dedup times.
// Is avoiding a crash every few weeks worth it?
bees_sync(brp.first.fd());
#endif
if (is_root_ro(brp.second.fid().root())) {
// BEESLOGDEBUG("WORKAROUND: dst subvol is read-only in " << name_fd(brp.second.fd()));
BEESCOUNT(dedup_workaround_btrfs_send);
return false;
}
brp.first.fd(shared_from_this());
BEESTOOLONG("dedup " << brp);
BeesAddress first_addr(brp.first.fd(), brp.first.begin());
BeesAddress second_addr(brp.second.fd(), brp.second.begin());
BEESLOG("dedup: src " << pretty(brp.first.size()) << " [" << to_hex(brp.first.begin()) << ".." << to_hex(brp.first.end()) << "] {" << first_addr << "} " << name_fd(brp.first.fd()));
BEESLOG(" dst " << pretty(brp.second.size()) << " [" << to_hex(brp.second.begin()) << ".." << to_hex(brp.second.end()) << "] {" << second_addr << "} " << name_fd(brp.second.fd()));
BEESLOGINFO("dedup: src " << pretty(brp.first.size()) << " [" << to_hex(brp.first.begin()) << ".." << to_hex(brp.first.end()) << "] {" << first_addr << "} " << name_fd(brp.first.fd()) << "\n"
<< " dst " << pretty(brp.second.size()) << " [" << to_hex(brp.second.begin()) << ".." << to_hex(brp.second.end()) << "] {" << second_addr << "} " << name_fd(brp.second.fd()));
if (first_addr.get_physical_or_zero() == second_addr.get_physical_or_zero()) {
BEESLOGTRACE("equal physical addresses in dedup");
@@ -210,7 +215,7 @@ BeesContext::dedup(const BeesRangePair &brp)
}
} else {
BEESCOUNT(dedup_miss);
BEESLOG("NO Dedup! " << brp);
BEESLOGWARN("NO Dedup! " << brp);
}
return rv;
@@ -286,7 +291,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
Extent::OBSCURED | Extent::PREALLOC
)) {
BEESCOUNT(scan_interesting);
BEESLOG("Interesting extent flags " << e << " from fd " << name_fd(bfr.fd()));
BEESLOGWARN("Interesting extent flags " << e << " from fd " << name_fd(bfr.fd()));
}
if (e.flags() & Extent::HOLE) {
@@ -298,8 +303,10 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
if (e.flags() & Extent::PREALLOC) {
// Prealloc is all zero and we replace it with a hole.
// No special handling is required here. Nuke it and move on.
BEESLOG("prealloc extent " << e);
BeesFileRange prealloc_bfr(m_ctx->tmpfile()->make_hole(e.size()));
BEESLOGINFO("prealloc extent " << e);
// Must not extend past EOF
auto extent_size = min(e.end(), bfr.file_size()) - e.begin();
BeesFileRange prealloc_bfr(m_ctx->tmpfile()->make_hole(extent_size));
BeesRangePair brp(prealloc_bfr, bfr);
// Raw dedup here - nothing else to do with this extent, nothing to merge with
if (m_ctx->dedup(brp)) {
@@ -312,7 +319,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
}
// OK we need to read extent now
posix_fadvise(bfr.fd(), bfr.begin(), bfr.size(), POSIX_FADV_WILLNEED);
readahead(bfr.fd(), bfr.begin(), bfr.size());
map<off_t, pair<BeesHash, BeesAddress>> insert_map;
set<off_t> noinsert_set;
@@ -372,7 +379,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
// Do not attempt to lookup hash of zero block
continue;
} else {
BEESLOG("zero bbd " << bbd << "\n\tin extent " << e);
BEESLOGINFO("zero bbd " << bbd << "\n\tin extent " << e);
BEESCOUNT(scan_zero_uncompressed);
rewrite_extent = true;
break;
@@ -429,7 +436,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
// Hash is toxic
if (found_addr.is_toxic()) {
BEESINFO("WORKAROUND: abandoned toxic match for hash " << hash << " addr " << found_addr);
BEESLOGWARN("WORKAROUND: abandoned toxic match for hash " << hash << " addr " << found_addr << " matching bbd " << bbd);
// Don't push these back in because we'll never delete them.
// Extents may become non-toxic so give them a chance to expire.
// hash_table->push_front_hash_addr(hash, found_addr);
@@ -446,7 +453,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
BeesResolver resolved(m_ctx, found_addr);
// Toxic extents are really toxic
if (resolved.is_toxic()) {
BEESINFO("WORKAROUND: abandoned toxic match at found_addr " << found_addr << " matching bbd " << bbd);
BEESLOGWARN("WORKAROUND: discovered toxic match at found_addr " << found_addr << " matching bbd " << bbd);
BEESCOUNT(scan_toxic_match);
// Make sure we never see this hash again.
// It has become toxic since it was inserted into the hash table.
@@ -488,7 +495,8 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
BeesAddress last_replaced_addr;
for (auto it = resolved_addrs.begin(); it != resolved_addrs.end(); ++it) {
catch_all([&]() {
// FIXME: Need to terminate this loop on replace_dst exception condition
// catch_all([&]() {
auto it_copy = *it;
BEESNOTE("finding one match (out of " << it_copy.count() << ") at " << it_copy.addr() << " for " << bbd);
BEESTRACE("finding one match (out of " << it_copy.count() << ") at " << it_copy.addr() << " for " << bbd);
@@ -500,7 +508,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
if (it_copy.found_hash()) {
BEESCOUNT(scan_hash_hit);
} else {
// BEESINFO("erase src hash " << hash << " addr " << it_copy.addr());
// BEESLOGDEBUG("erase src hash " << hash << " addr " << it_copy.addr());
BEESCOUNT(scan_hash_miss);
hash_table->erase_hash_addr(hash, it_copy.addr());
}
@@ -511,7 +519,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
// FIXME: we will thrash if we let multiple references to identical blocks
// exist in the hash table. Erase all but the last one.
if (last_replaced_addr) {
BEESLOG("Erasing redundant hash " << hash << " addr " << last_replaced_addr);
BEESLOGINFO("Erasing redundant hash " << hash << " addr " << last_replaced_addr);
hash_table->erase_hash_addr(hash, last_replaced_addr);
BEESCOUNT(scan_erase_redundant);
}
@@ -540,7 +548,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
} else {
BEESCOUNT(scan_dup_miss);
}
});
// });
}
if (last_replaced_addr) {
// If we replaced extents containing the incoming addr,
@@ -673,7 +681,7 @@ BeesContext::scan_one_extent(const BeesFileRange &bfr, const Extent &e)
// Visualize
if (bar != string(block_count, '.')) {
BEESLOG("scan: " << pretty(e.size()) << " " << to_hex(e.begin()) << " [" << bar << "] " << to_hex(e.end()) << ' ' << name_fd(bfr.fd()));
BEESLOGINFO("scan: " << pretty(e.size()) << " " << to_hex(e.begin()) << " [" << bar << "] " << to_hex(e.end()) << ' ' << name_fd(bfr.fd()));
}
return bfr;
@@ -703,14 +711,14 @@ BeesContext::scan_forward(const BeesFileRange &bfr)
// No FD? Well, that was quick.
if (!bfr.fd()) {
BEESINFO("No FD in " << root_path() << " for " << bfr);
// BEESLOGINFO("No FD in " << root_path() << " for " << bfr);
BEESCOUNT(scan_no_fd);
return bfr;
}
// Sanity check
if (bfr.begin() >= bfr.file_size()) {
BEESLOG("past EOF: " << bfr);
BEESLOGWARN("past EOF: " << bfr);
BEESCOUNT(scan_eof);
return bfr;
}
@@ -725,6 +733,9 @@ BeesContext::scan_forward(const BeesFileRange &bfr)
e = ew.current();
catch_all([&]() {
uint64_t extent_bytenr = e.bytenr();
BEESNOTE("waiting for extent bytenr " << to_hex(extent_bytenr));
auto extent_lock = m_extent_lock_set.make_lock(extent_bytenr);
Timer one_extent_timer;
return_bfr = scan_one_extent(bfr, e);
BEESCOUNTADD(scanf_extent_ms, one_extent_timer.age() * 1000);
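The make_lock() call above takes a lock keyed by the extent's bytenr, so two workers that reach the same physical extent scan it one at a time instead of racing. A minimal sketch of such a keyed lock set (hypothetical class, not crucible's actual LockSet implementation):

#include <condition_variable>
#include <cstdint>
#include <mutex>
#include <set>

// Sketch only: hold a key while working; other threads asking for
// the same key block until it is released.
class KeyedLockSet {
	std::mutex m;
	std::condition_variable cv;
	std::set<uint64_t> held;
public:
	void lock(uint64_t key) {
		std::unique_lock<std::mutex> l(m);
		cv.wait(l, [&] { return !held.count(key); });
		held.insert(key);
	}
	void unlock(uint64_t key) {
		std::unique_lock<std::mutex> l(m);
		held.erase(key);
		cv.notify_all();
	}
};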
@@ -751,11 +762,42 @@ BeesResolveAddrResult::BeesResolveAddrResult()
{
}
void
BeesContext::wait_for_balance()
{
Timer balance_timer;
BEESNOTE("WORKAROUND: waiting for balance to stop");
while (true) {
btrfs_ioctl_balance_args args;
memset_zero<btrfs_ioctl_balance_args>(&args);
const int ret = ioctl(root_fd(), BTRFS_IOC_BALANCE_PROGRESS, &args);
if (ret < 0) {
// Either can't get balance status or not running, exit either way
break;
}
if (!(args.state & BTRFS_BALANCE_STATE_RUNNING)) {
// Balance not running, doesn't matter if paused or cancelled
break;
}
BEESLOGDEBUG("WORKAROUND: Waiting " << balance_timer << "s for balance to stop");
sleep(BEES_BALANCE_POLL_INTERVAL);
}
}
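wait_for_balance() polls BTRFS_IOC_BALANCE_PROGRESS until the kernel reports that no balance is running. The same poll in isolation (hypothetical helper, assuming a filesystem-root fd; the ioctl fails, commonly with ENOTCONN, when no balance exists at all):

#include <linux/btrfs.h>
#include <sys/ioctl.h>
#include <cstring>

// Sketch only: true once no balance is running on fs_fd.
static bool balance_stopped(int fs_fd)
{
	btrfs_ioctl_balance_args args;
	std::memset(&args, 0, sizeof(args));
	if (ioctl(fs_fd, BTRFS_IOC_BALANCE_PROGRESS, &args) < 0) {
		return true;	// no status available, nothing to wait for
	}
	return !(args.state & BTRFS_BALANCE_STATE_RUNNING);
}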
BeesResolveAddrResult
BeesContext::resolve_addr_uncached(BeesAddress addr)
{
THROW_CHECK1(invalid_argument, addr, !addr.is_magic());
THROW_CHECK0(invalid_argument, !!root_fd());
// Is there a bug where resolve and balance cause a crash (BUG_ON at fs/btrfs/ctree.c:1227)?
// Apparently yes, and more than one.
// Wait for the balance to finish before we run LOGICAL_INO
wait_for_balance();
// Time how long this takes
Timer resolve_timer;
// There is no performance benefit if we restrict the buffer size.
@@ -780,7 +822,7 @@ BeesContext::resolve_addr_uncached(BeesAddress addr)
if (rt_age < BEES_TOXIC_DURATION && log_ino.m_iors.size() < BEES_MAX_EXTENT_REF_COUNT) {
rv.m_is_toxic = false;
} else {
BEESLOG("WORKAROUND: toxic address " << addr << " in " << root_path() << " with " << log_ino.m_iors.size() << " refs took " << rt_age << "s in LOGICAL_INO");
BEESLOGWARN("WORKAROUND: toxic address " << addr << " in " << root_path() << " with " << log_ino.m_iors.size() << " refs took " << rt_age << "s in LOGICAL_INO");
BEESCOUNT(resolve_toxic);
rv.m_is_toxic = true;
}
@@ -805,7 +847,7 @@ void
BeesContext::set_root_fd(Fd fd)
{
uint64_t root_fd_treeid = btrfs_get_root_id(fd);
BEESLOG("set_root_fd " << name_fd(fd));
BEESLOGINFO("set_root_fd " << name_fd(fd));
BEESTRACE("set_root_fd " << name_fd(fd));
THROW_CHECK1(invalid_argument, root_fd_treeid, root_fd_treeid == BTRFS_FS_TREE_OBJECTID);
Stat st(fd);
@@ -814,9 +856,10 @@ BeesContext::set_root_fd(Fd fd)
BtrfsIoctlFsInfoArgs fsinfo;
fsinfo.do_ioctl(fd);
m_root_uuid = fsinfo.uuid();
BEESLOG("Filesystem UUID is " << m_root_uuid);
BEESLOGINFO("Filesystem UUID is " << m_root_uuid);
// 65536 is big enough for two max-sized extents
// 65536 is big enough for two max-sized extents.
// Need enough total space in the cache for the maximum number of active threads.
m_resolve_cache.max_size(65536);
m_resolve_cache.func([&](BeesAddress addr) -> BeesResolveAddrResult {
return resolve_addr_uncached(addr);
@@ -825,13 +868,13 @@ BeesContext::set_root_fd(Fd fd)
// Start queue producers
roots();
BEESLOG("returning from set_root_fd in " << name_fd(fd));
BEESLOGINFO("returning from set_root_fd in " << name_fd(fd));
}
void
BeesContext::blacklist_add(const BeesFileId &fid)
{
BEESLOG("Adding " << fid << " to blacklist");
BEESLOGDEBUG("Adding " << fid << " to blacklist");
unique_lock<mutex> lock(m_blacklist_mutex);
m_blacklist.insert(fid);
}
@@ -900,7 +943,7 @@ BeesContext::hash_table()
void
BeesContext::set_root_path(string path)
{
BEESLOG("set_root_path " << path);
BEESLOGINFO("set_root_path " << path);
m_root_path = path;
set_root_fd(open_or_die(m_root_path, FLAGS_OPEN_DIR));
}

View File

@@ -24,14 +24,16 @@ operator<<(ostream &os, const BeesHashTable::Cell &bhte)
<< BeesAddress(bhte.e_addr) << " }";
}
#if 0
static
void
dump_bucket(BeesHashTable::Cell *p, BeesHashTable::Cell *q)
dump_bucket_locked(BeesHashTable::Cell *p, BeesHashTable::Cell *q)
{
// Must be called while holding m_bucket_mutex
for (auto i = p; i < q; ++i) {
BEESLOG("Entry " << i - p << " " << *i);
}
}
#endif
const bool VERIFY_CLEARS_BUGS = false;
@@ -44,7 +46,7 @@ verify_cell_range(BeesHashTable::Cell *p, BeesHashTable::Cell *q, bool clear_bug
for (BeesHashTable::Cell *cell = p; cell < q; ++cell) {
if (cell->e_addr && cell->e_addr < 0x1000) {
BEESCOUNT(bug_hash_magic_addr);
BEESINFO("Bad hash table address hash " << to_hex(cell->e_hash) << " addr " << to_hex(cell->e_addr));
BEESLOGDEBUG("Bad hash table address hash " << to_hex(cell->e_hash) << " addr " << to_hex(cell->e_addr));
if (clear_bugs) {
cell->e_addr = 0;
cell->e_hash = 0;
@@ -53,8 +55,8 @@ verify_cell_range(BeesHashTable::Cell *p, BeesHashTable::Cell *q, bool clear_bug
}
if (cell->e_addr && !seen_it.insert(*cell).second) {
BEESCOUNT(bug_hash_duplicate_cell);
// BEESLOG("Duplicate hash table entry:\nthis = " << *cell << "\nold = " << *seen_it.find(*cell));
BEESINFO("Duplicate hash table entry: " << *cell);
// BEESLOGDEBUG("Duplicate hash table entry:\nthis = " << *cell << "\nold = " << *seen_it.find(*cell));
BEESLOGDEBUG("Duplicate hash table entry: " << *cell);
if (clear_bugs) {
cell->e_addr = 0;
cell->e_hash = 0;
@@ -91,52 +93,74 @@ BeesHashTable::get_extent_range(HashType hash)
return make_pair(bp, ep);
}
bool
BeesHashTable::flush_dirty_extent(uint64_t extent_index)
{
BEESNOTE("flushing extent #" << extent_index << " of " << m_extents << " extents");
auto lock = lock_extent_by_index(extent_index);
// Not dirty, nothing to do
if (!m_extent_metadata.at(extent_index).m_dirty) {
return false;
}
bool wrote_extent = false;
catch_all([&]() {
uint8_t *dirty_extent = m_extent_ptr[extent_index].p_byte;
uint8_t *dirty_extent_end = m_extent_ptr[extent_index + 1].p_byte;
THROW_CHECK1(out_of_range, dirty_extent, dirty_extent >= m_byte_ptr);
THROW_CHECK1(out_of_range, dirty_extent_end, dirty_extent_end <= m_byte_ptr_end);
THROW_CHECK2(out_of_range, dirty_extent_end, dirty_extent, dirty_extent_end - dirty_extent == BLOCK_SIZE_HASHTAB_EXTENT);
BEESTOOLONG("pwrite(fd " << m_fd << " '" << name_fd(m_fd)<< "', length " << to_hex(dirty_extent_end - dirty_extent) << ", offset " << to_hex(dirty_extent - m_byte_ptr) << ")");
// Copy the extent because we might be stuck writing for a while
vector<uint8_t> extent_copy(dirty_extent, dirty_extent_end);
// Mark extent non-dirty while we still hold the lock
m_extent_metadata.at(extent_index).m_dirty = false;
// Release the lock
lock.unlock();
// Write the extent (or not)
pwrite_or_die(m_fd, extent_copy, dirty_extent - m_byte_ptr);
BEESCOUNT(hash_extent_out);
wrote_extent = true;
});
BEESNOTE("flush rate limited after extent #" << extent_index << " of " << m_extents << " extents");
m_flush_rate_limit.sleep_for(BLOCK_SIZE_HASHTAB_EXTENT);
return wrote_extent;
}
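flush_dirty_extent() snapshots the extent and clears its dirty flag while holding the per-extent lock, then performs the slow pwrite after releasing it, so hash-table inserts into the same extent are not stalled behind disk I/O. The pattern reduced to its essentials (hypothetical names):

#include <cstdint>
#include <mutex>
#include <unistd.h>
#include <vector>

// Sketch only: copy and mark clean under the lock, write unlocked.
bool flush_region(int fd, std::mutex &m, bool &dirty, const uint8_t *begin, const uint8_t *end, off_t file_offset)
{
	std::vector<uint8_t> snapshot;
	{
		std::unique_lock<std::mutex> lock(m);
		if (!dirty) {
			return false;		// nothing to do
		}
		snapshot.assign(begin, end);	// consistent copy
		dirty = false;			// clean before unlocking
	}
	// The slow part runs unlocked; as in the original, a failed write
	// leaves the flag clear until the region is dirtied again.
	return pwrite(fd, snapshot.data(), snapshot.size(), file_offset) == static_cast<ssize_t>(snapshot.size());
}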
void
BeesHashTable::flush_dirty_extents()
{
THROW_CHECK1(runtime_error, m_buckets, m_buckets > 0);
unique_lock<mutex> lock(m_extent_mutex);
auto dirty_extent_copy = m_buckets_dirty;
m_buckets_dirty.clear();
if (dirty_extent_copy.empty()) {
BEESNOTE("idle");
m_condvar.wait(lock);
return; // please call later, i.e. immediately
uint64_t wrote_extents = 0;
for (size_t extent_index = 0; extent_index < m_extents; ++extent_index) {
if (flush_dirty_extent(extent_index)) {
++wrote_extents;
}
}
lock.unlock();
size_t extent_counter = 0;
for (auto extent_number : dirty_extent_copy) {
++extent_counter;
BEESNOTE("flush extent #" << extent_number << " (" << extent_counter << " of " << dirty_extent_copy.size() << ")");
catch_all([&]() {
uint8_t *dirty_extent = m_extent_ptr[extent_number].p_byte;
uint8_t *dirty_extent_end = m_extent_ptr[extent_number + 1].p_byte;
THROW_CHECK1(out_of_range, dirty_extent, dirty_extent >= m_byte_ptr);
THROW_CHECK1(out_of_range, dirty_extent_end, dirty_extent_end <= m_byte_ptr_end);
THROW_CHECK2(out_of_range, dirty_extent_end, dirty_extent, dirty_extent_end - dirty_extent == BLOCK_SIZE_HASHTAB_EXTENT);
BEESTOOLONG("pwrite(fd " << m_fd << " '" << name_fd(m_fd)<< "', length " << to_hex(dirty_extent_end - dirty_extent) << ", offset " << to_hex(dirty_extent - m_byte_ptr) << ")");
// Page locks slow us down more than copying the data does
vector<uint8_t> extent_copy(dirty_extent, dirty_extent_end);
pwrite_or_die(m_fd, extent_copy, dirty_extent - m_byte_ptr);
BEESCOUNT(hash_extent_out);
});
BEESNOTE("flush rate limited at extent #" << extent_number << " (" << extent_counter << " of " << dirty_extent_copy.size() << ")");
m_flush_rate_limit.sleep_for(BLOCK_SIZE_HASHTAB_EXTENT);
}
BEESNOTE("idle after writing " << wrote_extents << " of " << m_extents << " extents");
unique_lock<mutex> lock(m_dirty_mutex);
m_dirty_condvar.wait(lock);
}
void
BeesHashTable::set_extent_dirty(HashType hash)
BeesHashTable::set_extent_dirty_locked(uint64_t extent_index)
{
THROW_CHECK1(runtime_error, m_buckets, m_buckets > 0);
auto pr = get_extent_range(hash);
uint64_t extent_number = reinterpret_cast<Extent *>(pr.first) - m_extent_ptr;
THROW_CHECK1(runtime_error, extent_number, extent_number < m_extents);
unique_lock<mutex> lock(m_extent_mutex);
m_buckets_dirty.insert(extent_number);
m_condvar.notify_one();
// Must already be locked
m_extent_metadata.at(extent_index).m_dirty = true;
// Signal writeback thread
unique_lock<mutex> dirty_lock(m_dirty_mutex);
m_dirty_condvar.notify_one();
}
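set_extent_dirty_locked() only flips the flag and pokes the writeback thread; flush_dirty_extents() above sleeps on the same condition variable after a pass that wrote nothing. Reduced to its essentials (hypothetical names; the wait has no predicate, so a missed or spurious wakeup merely changes when the next full pass over the extents happens):

#include <condition_variable>
#include <mutex>

std::mutex dirty_mutex;
std::condition_variable dirty_condvar;

void mark_dirty_and_wake()		// called by scanners
{
	std::unique_lock<std::mutex> lock(dirty_mutex);
	dirty_condvar.notify_one();
}

void writeback_idle_wait()		// called by the flush thread
{
	std::unique_lock<std::mutex> lock(dirty_mutex);
	dirty_condvar.wait(lock);	// any later wake resumes the flush loop
}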
void
@@ -161,14 +185,8 @@ percent(size_t num, size_t den)
void
BeesHashTable::prefetch_loop()
{
// Always do the mlock, whether shared or not
THROW_CHECK1(runtime_error, m_size, m_size > 0);
catch_all([&]() {
BEESNOTE("mlock " << pretty(m_size));
DIE_IF_NON_ZERO(mlock(m_byte_ptr, m_size));
});
while (1) {
bool not_locked = true;
while (true) {
size_t width = 64;
vector<size_t> occupancy(width, 0);
size_t occupied_count = 0;
@@ -179,13 +197,13 @@ BeesHashTable::prefetch_loop()
size_t unaligned_eof_count = 0;
for (uint64_t ext = 0; ext < m_extents; ++ext) {
BEESNOTE("prefetching hash table extent " << ext << " of " << m_extent_ptr_end - m_extent_ptr);
BEESNOTE("prefetching hash table extent #" << ext << " of " << m_extents);
catch_all([&]() {
fetch_missing_extent(ext * c_buckets_per_extent);
fetch_missing_extent_by_index(ext);
BEESNOTE("analyzing hash table extent " << ext << " of " << m_extent_ptr_end - m_extent_ptr);
BEESNOTE("analyzing hash table extent #" << ext << " of " << m_extents);
bool duplicate_bugs_found = false;
unique_lock<mutex> lock(m_bucket_mutex);
auto lock = lock_extent_by_index(ext);
for (Bucket *bucket = m_extent_ptr[ext].p_buckets; bucket < m_extent_ptr[ext + 1].p_buckets; ++bucket) {
if (verify_cell_range(bucket[0].p_cells, bucket[1].p_cells)) {
duplicate_bugs_found = true;
@@ -214,9 +232,8 @@ BeesHashTable::prefetch_loop()
// Count these instead of calculating the number so we get better stats in case of exceptions
occupied_count += this_bucket_occupied_count;
}
lock.unlock();
if (duplicate_bugs_found) {
set_extent_dirty(ext);
set_extent_dirty_locked(ext);
}
});
}
@@ -252,8 +269,7 @@ BeesHashTable::prefetch_loop()
out << "\n";
}
size_t uncompressed_count = occupied_count - compressed_count;
size_t legacy_count = compressed_count - compressed_offset_count;
size_t uncompressed_count = occupied_count - compressed_offset_count;
ostringstream graph_blob;
@@ -264,9 +280,7 @@ BeesHashTable::prefetch_loop()
graph_blob
<< "\nHash table page occupancy histogram (" << occupied_count << "/" << total_count << " cells occupied, " << (occupied_count * 100 / total_count) << "%)\n"
<< out.str() << "0% | 25% | 50% | 75% | 100% page fill\n"
<< "compressed " << compressed_count << " (" << percent(compressed_count, occupied_count) << ")"
<< " new-style " << compressed_offset_count << " (" << percent(compressed_offset_count, occupied_count) << ")"
<< " old-style " << legacy_count << " (" << percent(legacy_count, occupied_count) << ")\n"
<< "compressed " << compressed_count << " (" << percent(compressed_count, occupied_count) << ")\n"
<< "uncompressed " << uncompressed_count << " (" << percent(uncompressed_count, occupied_count) << ")"
<< " unaligned_eof " << unaligned_eof_count << " (" << percent(unaligned_eof_count, occupied_count) << ")"
<< " toxic " << toxic_count << " (" << percent(toxic_count, occupied_count) << ")";
@@ -281,62 +295,93 @@ BeesHashTable::prefetch_loop()
auto avg_rates = thisStats / m_ctx->total_timer().age();
graph_blob << "\t" << avg_rates << "\n";
BEESLOG(graph_blob.str());
BEESLOGINFO(graph_blob.str());
catch_all([&]() {
m_stats_file.write(graph_blob.str());
});
if (not_locked) {
// Always do the mlock, whether shared or not
THROW_CHECK1(runtime_error, m_size, m_size > 0);
BEESLOGINFO("mlock(" << pretty(m_size) << ")...");
Timer lock_time;
catch_all([&]() {
BEESNOTE("mlock " << pretty(m_size));
DIE_IF_NON_ZERO(mlock(m_byte_ptr, m_size));
});
BEESLOGINFO("mlock(" << pretty(m_size) << ") done in " << lock_time << " sec");
not_locked = false;
}
BEESNOTE("idle " << BEES_HASH_TABLE_ANALYZE_INTERVAL << "s");
nanosleep(BEES_HASH_TABLE_ANALYZE_INTERVAL);
}
}
void
BeesHashTable::fetch_missing_extent(HashType hash)
size_t
BeesHashTable::hash_to_extent_index(HashType hash)
{
auto pr = get_extent_range(hash);
uint64_t extent_index = reinterpret_cast<const Extent *>(pr.first) - m_extent_ptr;
THROW_CHECK2(runtime_error, extent_index, m_extents, extent_index < m_extents);
return extent_index;
}
BeesHashTable::ExtentMetaData::ExtentMetaData() :
m_mutex_ptr(make_shared<mutex>())
{
}
unique_lock<mutex>
BeesHashTable::lock_extent_by_index(uint64_t extent_index)
{
THROW_CHECK2(out_of_range, extent_index, m_extents, extent_index < m_extents);
return unique_lock<mutex>(*m_extent_metadata.at(extent_index).m_mutex_ptr);
}
unique_lock<mutex>
BeesHashTable::lock_extent_by_hash(HashType hash)
{
BEESTOOLONG("fetch_missing_extent for hash " << to_hex(hash));
THROW_CHECK1(runtime_error, m_buckets, m_buckets > 0);
auto pr = get_extent_range(hash);
uint64_t extent_number = reinterpret_cast<Extent *>(pr.first) - m_extent_ptr;
THROW_CHECK1(runtime_error, extent_number, extent_number < m_extents);
return lock_extent_by_index(hash_to_extent_index(hash));
}
unique_lock<mutex> lock(m_extent_mutex);
if (!m_buckets_missing.count(extent_number)) {
void
BeesHashTable::fetch_missing_extent_by_index(uint64_t extent_index)
{
BEESNOTE("checking hash extent #" << extent_index << " of " << m_extents << " extents");
auto lock = lock_extent_by_index(extent_index);
if (!m_extent_metadata.at(extent_index).m_missing) {
return;
}
size_t missing_buckets = m_buckets_missing.size();
lock.unlock();
BEESNOTE("waiting to fetch hash extent #" << extent_number << ", " << missing_buckets << " left to fetch");
// Acquire blocking lock on this extent only
auto extent_lock = m_extent_lock_set.make_lock(extent_number);
// Check missing again because someone else might have fetched this
// extent for us while we didn't hold any locks
lock.lock();
if (!m_buckets_missing.count(extent_number)) {
BEESCOUNT(hash_extent_in_twice);
return;
}
lock.unlock();
// OK we have to read this extent
BEESNOTE("fetching hash extent #" << extent_number << ", " << missing_buckets << " left to fetch");
BEESNOTE("fetching hash extent #" << extent_index << " of " << m_extents << " extents");
BEESTRACE("Fetching hash extent #" << extent_index << " of " << m_extents << " extents");
BEESTOOLONG("Fetching hash extent #" << extent_index << " of " << m_extents << " extents");
BEESTRACE("Fetching missing hash extent " << extent_number);
uint8_t *dirty_extent = m_extent_ptr[extent_number].p_byte;
uint8_t *dirty_extent_end = m_extent_ptr[extent_number + 1].p_byte;
uint8_t *dirty_extent = m_extent_ptr[extent_index].p_byte;
uint8_t *dirty_extent_end = m_extent_ptr[extent_index + 1].p_byte;
{
// If the read fails don't retry, just go with whatever data we have
m_extent_metadata.at(extent_index).m_missing = false;
catch_all([&]() {
BEESTOOLONG("pread(fd " << m_fd << " '" << name_fd(m_fd)<< "', length " << to_hex(dirty_extent_end - dirty_extent) << ", offset " << to_hex(dirty_extent - m_byte_ptr) << ")");
pread_or_die(m_fd, dirty_extent, dirty_extent_end - dirty_extent, dirty_extent - m_byte_ptr);
}
});
// Only count extents successfully read
BEESCOUNT(hash_extent_in);
lock.lock();
m_buckets_missing.erase(extent_number);
}
void
BeesHashTable::fetch_missing_extent_by_hash(HashType hash)
{
uint64_t extent_index = hash_to_extent_index(hash);
BEESNOTE("waiting to fetch hash extent #" << extent_index << " of " << m_extents << " extents");
fetch_missing_extent_by_index(extent_index);
}
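fetch_missing_extent_by_index() holds the per-extent mutex across the read, so only one thread pages a given extent in from disk, and m_missing is cleared before the pread so a failing read is not retried forever. The shape of that lazy load (hypothetical names):

#include <functional>
#include <memory>
#include <mutex>
#include <vector>

// Sketch only: page a slot in from disk exactly once, even with
// many threads asking for it concurrently.
struct Slot {
	std::shared_ptr<std::mutex> mtx = std::make_shared<std::mutex>();
	bool missing = true;
};

void fetch_once(std::vector<Slot> &slots, size_t i, const std::function<void()> &read_from_disk)
{
	std::unique_lock<std::mutex> lock(*slots.at(i).mtx);
	if (!slots.at(i).missing) {
		return;			// another thread already fetched it
	}
	slots.at(i).missing = false;	// clear first: failed reads are not retried
	read_from_disk();		// slow pread happens under the slot lock
}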
bool
@@ -358,10 +403,10 @@ BeesHashTable::find_cell(HashType hash)
rv.push_back(toxic_cell);
return rv;
}
fetch_missing_extent(hash);
fetch_missing_extent_by_hash(hash);
BEESTOOLONG("find_cell hash " << BeesHash(hash));
vector<Cell> rv;
unique_lock<mutex> lock(m_bucket_mutex);
auto lock = lock_extent_by_hash(hash);
auto er = get_cell_range(hash);
// FIXME: Weed out zero addresses in the table due to earlier bugs
copy_if(er.first, er.second, back_inserter(rv), [=](const Cell &ip) { return ip.e_hash == hash && ip.e_addr >= 0x1000; });
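The 0x1000 cutoff mirrors verify_cell_range() above: cell addresses below one page are assumed to be residue from earlier bugs rather than real physical addresses, so they are filtered out of lookups here as well as cleared on load.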
@@ -377,9 +422,9 @@ BeesHashTable::find_cell(HashType hash)
void
BeesHashTable::erase_hash_addr(HashType hash, AddrType addr)
{
fetch_missing_extent(hash);
fetch_missing_extent_by_hash(hash);
BEESTOOLONG("erase hash " << to_hex(hash) << " addr " << addr);
unique_lock<mutex> lock(m_bucket_mutex);
auto lock = lock_extent_by_hash(hash);
auto er = get_cell_range(hash);
Cell mv(hash, addr);
Cell *ip = find(er.first, er.second, mv);
@@ -387,11 +432,11 @@ BeesHashTable::erase_hash_addr(HashType hash, AddrType addr)
if (found) {
// Lookups on invalid addresses really hurt us. Kill it with fire!
*ip = Cell(0, 0);
set_extent_dirty(hash);
set_extent_dirty_locked(hash_to_extent_index(hash));
BEESCOUNT(hash_erase);
#if 0
if (verify_cell_range(er.first, er.second)) {
BEESINFO("while erasing hash " << hash << " addr " << addr);
BEESLOGDEBUG("while erasing hash " << hash << " addr " << addr);
}
#endif
}
@@ -405,9 +450,9 @@ BeesHashTable::erase_hash_addr(HashType hash, AddrType addr)
bool
BeesHashTable::push_front_hash_addr(HashType hash, AddrType addr)
{
fetch_missing_extent(hash);
fetch_missing_extent_by_hash(hash);
BEESTOOLONG("push_front_hash_addr hash " << BeesHash(hash) <<" addr " << BeesAddress(addr));
unique_lock<mutex> lock(m_bucket_mutex);
auto lock = lock_extent_by_hash(hash);
auto er = get_cell_range(hash);
Cell mv(hash, addr);
Cell *ip = find(er.first, er.second, mv);
@@ -437,12 +482,12 @@ BeesHashTable::push_front_hash_addr(HashType hash, AddrType addr)
// There is now a space at the front, insert there if different
if (er.first[0] != mv) {
er.first[0] = mv;
set_extent_dirty(hash);
set_extent_dirty_locked(hash_to_extent_index(hash));
BEESCOUNT(hash_front);
}
#if 0
if (verify_cell_range(er.first, er.second)) {
BEESINFO("while push_fronting hash " << hash << " addr " << addr);
BEESLOGDEBUG("while push_fronting hash " << hash << " addr " << addr);
}
#endif
return found;
@@ -456,9 +501,9 @@ BeesHashTable::push_front_hash_addr(HashType hash, AddrType addr)
bool
BeesHashTable::push_random_hash_addr(HashType hash, AddrType addr)
{
fetch_missing_extent(hash);
fetch_missing_extent_by_hash(hash);
BEESTOOLONG("push_random_hash_addr hash " << BeesHash(hash) << " addr " << BeesAddress(addr));
unique_lock<mutex> lock(m_bucket_mutex);
auto lock = lock_extent_by_hash(hash);
auto er = get_cell_range(hash);
Cell mv(hash, addr);
Cell *ip = find(er.first, er.second, mv);
@@ -521,14 +566,14 @@ BeesHashTable::push_random_hash_addr(HashType hash, AddrType addr)
case_cond = 5;
ret_dirty:
BEESCOUNT(hash_insert);
set_extent_dirty(hash);
set_extent_dirty_locked(hash_to_extent_index(hash));
ret:
#if 0
if (verify_cell_range(er.first, er.second, false)) {
BEESLOG("while push_randoming (case " << case_cond << ") pos " << pos
<< " ip " << (ip - er.first) << " " << mv);
// dump_bucket(saved.data(), saved.data() + saved.size());
// dump_bucket(er.first, er.second);
// dump_bucket_locked(saved.data(), saved.data() + saved.size());
// dump_bucket_locked(er.first, er.second);
}
#else
(void)case_cond;
@@ -543,9 +588,9 @@ BeesHashTable::try_mmap_flags(int flags)
THROW_CHECK1(out_of_range, m_size, m_size > 0);
Timer map_time;
catch_all([&]() {
BEESLOG("mapping hash table size " << m_size << " with flags " << mmap_flags_ntoa(flags));
BEESLOGINFO("mapping hash table size " << m_size << " with flags " << mmap_flags_ntoa(flags));
void *ptr = mmap_or_die(nullptr, m_size, PROT_READ | PROT_WRITE, flags, flags & MAP_ANONYMOUS ? -1 : int(m_fd), 0);
BEESLOG("mmap done in " << map_time << " sec");
BEESLOGINFO("mmap done in " << map_time << " sec");
m_cell_ptr = static_cast<Cell *>(ptr);
void *ptr_end = static_cast<uint8_t *>(ptr) + m_size;
m_cell_ptr_end = static_cast<Cell *>(ptr_end);
@@ -565,12 +610,15 @@ BeesHashTable::open_file()
// If that doesn't work, try to make a new one
if (!new_fd) {
string tmp_filename = m_filename + ".tmp";
BEESLOGNOTE("creating new hash table '" << tmp_filename << "'");
BEESNOTE("creating new hash table '" << tmp_filename << "'");
BEESLOGINFO("Creating new hash table '" << tmp_filename << "'");
unlinkat(m_ctx->home_fd(), tmp_filename.c_str(), 0);
new_fd = openat_or_die(m_ctx->home_fd(), tmp_filename, FLAGS_CREATE_FILE, 0700);
BEESLOGNOTE("truncating new hash table '" << tmp_filename << "' size " << m_size << " (" << pretty(m_size) << ")");
BEESNOTE("truncating new hash table '" << tmp_filename << "' size " << m_size << " (" << pretty(m_size) << ")");
BEESLOGINFO("Truncating new hash table '" << tmp_filename << "' size " << m_size << " (" << pretty(m_size) << ")");
ftruncate_or_die(new_fd, m_size);
BEESLOGNOTE("truncating new hash table '" << tmp_filename << "' -> '" << m_filename << "'");
BEESNOTE("truncating new hash table '" << tmp_filename << "' -> '" << m_filename << "'");
BEESLOGINFO("Truncating new hash table '" << tmp_filename << "' -> '" << m_filename << "'");
renameat_or_die(m_ctx->home_fd(), tmp_filename, m_ctx->home_fd(), m_filename);
}
@@ -614,13 +662,13 @@ BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t
BEESTRACE("hash table bucket size " << BLOCK_SIZE_HASHTAB_BUCKET);
BEESTRACE("hash table extent size " << BLOCK_SIZE_HASHTAB_EXTENT);
BEESLOG("opened hash table filename '" << filename << "' length " << m_size);
BEESLOGINFO("opened hash table filename '" << filename << "' length " << m_size);
m_buckets = m_size / BLOCK_SIZE_HASHTAB_BUCKET;
m_cells = m_buckets * c_cells_per_bucket;
m_extents = (m_size + BLOCK_SIZE_HASHTAB_EXTENT - 1) / BLOCK_SIZE_HASHTAB_EXTENT;
BEESLOG("\tcells " << m_cells << ", buckets " << m_buckets << ", extents " << m_extents);
BEESLOGINFO("\tcells " << m_cells << ", buckets " << m_buckets << ", extents " << m_extents);
BEESLOG("\tflush rate limit " << BEES_FLUSH_RATE);
BEESLOGINFO("\tflush rate limit " << BEES_FLUSH_RATE);
// Try to mmap that much memory
try_mmap_flags(MAP_PRIVATE | MAP_ANONYMOUS);
@@ -648,13 +696,11 @@ BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t
for (auto fp = madv_flags; fp->value; ++fp) {
BEESTOOLONG("madvise(" << fp->name << ")");
if (madvise(m_byte_ptr, m_size, fp->value)) {
BEESLOG("madvise(..., " << fp->name << "): " << strerror(errno) << " (ignored)");
BEESLOGWARN("madvise(..., " << fp->name << "): " << strerror(errno) << " (ignored)");
}
}
for (uint64_t i = 0; i < m_size / sizeof(Extent); ++i) {
m_buckets_missing.insert(i);
}
m_extent_metadata.resize(m_extents);
m_writeback_thread.exec([&]() {
writeback_loop();

View File

@@ -98,90 +98,77 @@ BeesResolver::adjust_offset(const BeesFileRange &haystack, const BeesBlockData &
return BeesBlockData();
}
off_t lower_offset = haystack.begin();
off_t upper_offset = haystack.end();
off_t haystack_offset = haystack.begin();
bool is_compressed_offset = false;
bool is_exact = false;
bool is_legacy = false;
if (m_addr.is_compressed()) {
BtrfsExtentWalker ew(haystack.fd(), haystack.begin(), m_ctx->root_fd());
BEESTRACE("haystack extent data " << ew);
Extent e = ew.current();
if (m_addr.has_compressed_offset()) {
off_t coff = m_addr.get_compressed_offset();
if (e.offset() > coff) {
// this extent begins after the target block
BEESCOUNT(adjust_offset_low);
return BeesBlockData();
}
coff -= e.offset();
if (e.size() <= coff) {
// this extent ends before the target block
BEESCOUNT(adjust_offset_high);
return BeesBlockData();
}
lower_offset = e.begin() + coff;
upper_offset = lower_offset + BLOCK_SIZE_CLONE;
BEESCOUNT(adjust_offset_hit);
is_compressed_offset = true;
} else {
lower_offset = e.begin();
upper_offset = e.end();
BEESCOUNT(adjust_legacy);
is_legacy = true;
THROW_CHECK1(runtime_error, m_addr, m_addr.has_compressed_offset());
off_t coff = m_addr.get_compressed_offset();
if (e.offset() > coff) {
// this extent begins after the target block
BEESCOUNT(adjust_offset_low);
return BeesBlockData();
}
coff -= e.offset();
if (e.size() <= coff) {
// this extent ends before the target block
BEESCOUNT(adjust_offset_high);
return BeesBlockData();
}
haystack_offset = e.begin() + coff;
BEESCOUNT(adjust_offset_hit);
is_compressed_offset = true;
} else {
BEESCOUNT(adjust_exact);
is_exact = true;
}
BEESTRACE("Checking haystack " << haystack << " offsets " << to_hex(lower_offset) << ".." << to_hex(upper_offset));
BEESTRACE("Checking haystack " << haystack << " offset " << to_hex(haystack_offset));
// Check all the blocks in the list
for (off_t haystack_offset = lower_offset; haystack_offset < upper_offset; haystack_offset += BLOCK_SIZE_CLONE) {
THROW_CHECK1(out_of_range, haystack_offset, (haystack_offset & BLOCK_MASK_CLONE) == 0);
THROW_CHECK1(out_of_range, haystack_offset, (haystack_offset & BLOCK_MASK_CLONE) == 0);
// Straw cannot extend beyond end of haystack
if (haystack_offset + needle.size() > haystack_size) {
BEESCOUNT(adjust_needle_too_long);
break;
}
// Read the haystack
BEESTRACE("straw " << name_fd(haystack.fd()) << ", offset " << to_hex(haystack_offset) << ", length " << needle.size());
BeesBlockData straw(haystack.fd(), haystack_offset, needle.size());
BEESTRACE("straw = " << straw);
// Stop if we find a match
if (straw.is_data_equal(needle)) {
BEESCOUNT(adjust_hit);
m_found_data = true;
m_found_hash = true;
if (is_compressed_offset) BEESCOUNT(adjust_compressed_offset_correct);
if (is_legacy) BEESCOUNT(adjust_legacy_correct);
if (is_exact) BEESCOUNT(adjust_exact_correct);
return straw;
}
if (straw.hash() != needle.hash()) {
// Not the same hash or data, try next block
BEESCOUNT(adjust_miss);
continue;
}
// Found the hash but not the data. Yay!
m_found_hash = true;
BEESLOG("HASH COLLISION\n"
<< "\tneedle " << needle << "\n"
<< "\tstraw " << straw);
BEESCOUNT(hash_collision);
// Straw cannot extend beyond end of haystack
if (haystack_offset + needle.size() > haystack_size) {
BEESCOUNT(adjust_needle_too_long);
return BeesBlockData();
}
// Read the haystack
BEESTRACE("straw " << name_fd(haystack.fd()) << ", offset " << to_hex(haystack_offset) << ", length " << needle.size());
BeesBlockData straw(haystack.fd(), haystack_offset, needle.size());
BEESTRACE("straw = " << straw);
// Stop if we find a match
if (straw.is_data_equal(needle)) {
BEESCOUNT(adjust_hit);
m_found_data = true;
m_found_hash = true;
if (is_compressed_offset) BEESCOUNT(adjust_compressed_offset_correct);
if (is_exact) BEESCOUNT(adjust_exact_correct);
return straw;
}
if (straw.hash() != needle.hash()) {
// Not the same hash or data, try next block
BEESCOUNT(adjust_miss);
return BeesBlockData();
}
// Found the hash but not the data. Yay!
m_found_hash = true;
BEESLOGINFO("HASH COLLISION\n"
<< "\tneedle " << needle << "\n"
<< "\tstraw " << straw);
BEESCOUNT(hash_collision);
// Ran out of offsets to try
BEESCOUNT(adjust_no_match);
if (is_compressed_offset) BEESCOUNT(adjust_compressed_offset_wrong);
if (is_legacy) BEESCOUNT(adjust_legacy_wrong);
if (is_exact) BEESCOUNT(adjust_exact_wrong);
m_wrong_data = true;
return BeesBlockData();
@@ -197,7 +184,7 @@ BeesResolver::chase_extent_ref(const BtrfsInodeOffsetRoot &bior, BeesBlockData &
Fd file_fd = m_ctx->roots()->open_root_ino(bior.m_root, bior.m_inum);
if (!file_fd) {
// Deleted snapshots generate craptons of these
// BEESINFO("No FD in chase_extent_ref " << bior);
// BEESLOGDEBUG("No FD in chase_extent_ref " << bior);
BEESCOUNT(chase_no_fd);
return BeesFileRange();
}
@@ -211,7 +198,7 @@ BeesResolver::chase_extent_ref(const BtrfsInodeOffsetRoot &bior, BeesBlockData &
// ...or are we?
if (file_addr.is_magic()) {
BEESINFO("file_addr is magic: file_addr = " << file_addr << " bior = " << bior << " needle_bbd = " << needle_bbd);
BEESLOGDEBUG("file_addr is magic: file_addr = " << file_addr << " bior = " << bior << " needle_bbd = " << needle_bbd);
BEESCOUNT(chase_wrong_magic);
return BeesFileRange();
}
@@ -220,7 +207,7 @@ BeesResolver::chase_extent_ref(const BtrfsInodeOffsetRoot &bior, BeesBlockData &
// Did we get the physical block we asked for? The magic bits have to match too,
// but the compressed offset bits do not.
if (file_addr.get_physical_or_zero() != m_addr.get_physical_or_zero()) {
// BEESINFO("found addr " << file_addr << " at " << name_fd(file_fd) << " offset " << to_hex(bior.m_offset) << " but looking for " << m_addr);
// BEESLOGDEBUG("found addr " << file_addr << " at " << name_fd(file_fd) << " offset " << to_hex(bior.m_offset) << " but looking for " << m_addr);
// FIEMAP/resolve are working, but the data is old.
BEESCOUNT(chase_wrong_addr);
return BeesFileRange();
@@ -243,7 +230,7 @@ BeesResolver::chase_extent_ref(const BtrfsInodeOffsetRoot &bior, BeesBlockData &
auto new_bbd = adjust_offset(haystack_bbd, needle_bbd);
if (new_bbd.empty()) {
// matching offset search failed
BEESCOUNT(chase_wrong_data);
BEESCOUNT(chase_no_data);
return BeesFileRange();
}
if (new_bbd.begin() == haystack_bbd.begin()) {
@@ -368,7 +355,8 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
}
// Look at the old data
catch_all([&]() {
// FIXME: propagate exceptions for now. Proper fix requires a rewrite.
// catch_all([&]() {
BEESTRACE("chase_extent_ref ino " << ino_off_root << " bbd " << bbd);
auto new_range = chase_extent_ref(ino_off_root, bbd);
// XXX: should we catch visitor's exceptions here?
@@ -383,7 +371,7 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
// to a different extent between them.
// stop_now = true;
}
});
// });
if (stop_now) {
break;
@@ -424,7 +412,8 @@ BeesResolver::replace_dst(const BeesFileRange &dst_bfr)
BeesBlockData src_bbd(src_bfr.fd(), src_bfr.begin(), min(BLOCK_SIZE_SUMS, src_bfr.size()));
if (bbd.addr().get_physical_or_zero() == src_bbd.addr().get_physical_or_zero()) {
BEESCOUNT(replacedst_same);
return false; // i.e. continue
// stop looping here, all the other srcs will probably fail this test too
throw runtime_error("FIXME: bailing out here, need to fix this further up the call stack");
}
// Make pair(src, dst)

File diff suppressed because it is too large

View File

@@ -13,19 +13,16 @@ void
BeesThread::exec(function<void()> func)
{
m_timer.reset();
BEESLOG("BeesThread exec " << m_name);
BEESLOGDEBUG("BeesThread exec " << m_name);
m_thread_ptr = make_shared<thread>([=]() {
BEESLOG("Starting thread " << m_name);
BeesNote::set_name(m_name);
BEESLOGDEBUG("Starting thread " << m_name);
BEESNOTE("thread function");
Timer thread_time;
catch_all([&]() {
DIE_IF_MINUS_ERRNO(pthread_setname_np(pthread_self(), m_name.c_str()));
});
catch_all([&]() {
func();
});
BEESLOG("Exiting thread " << m_name << ", " << thread_time << " sec");
BEESLOGDEBUG("Exiting thread " << m_name << ", " << thread_time << " sec");
});
}
@@ -33,7 +30,7 @@ BeesThread::BeesThread(string name, function<void()> func) :
m_name(name)
{
THROW_CHECK1(invalid_argument, name, !name.empty());
BEESLOG("BeesThread construct " << m_name);
BEESLOGDEBUG("BeesThread construct " << m_name);
exec(func);
}
@@ -41,20 +38,20 @@ void
BeesThread::join()
{
if (!m_thread_ptr) {
BEESLOG("Thread " << m_name << " no thread ptr");
BEESLOGDEBUG("Thread " << m_name << " no thread ptr");
return;
}
BEESLOG("BeesThread::join " << m_name);
BEESLOGDEBUG("BeesThread::join " << m_name);
if (m_thread_ptr->joinable()) {
BEESLOG("Joining thread " << m_name);
BEESLOGDEBUG("Joining thread " << m_name);
Timer thread_time;
m_thread_ptr->join();
BEESLOG("Waited for " << m_name << ", " << thread_time << " sec");
BEESLOGDEBUG("Waited for " << m_name << ", " << thread_time << " sec");
} else if (!m_name.empty()) {
BEESLOG("BeesThread " << m_name << " not joinable");
BEESLOGDEBUG("BeesThread " << m_name << " not joinable");
} else {
BEESLOG("BeesThread else " << m_name);
BEESLOGDEBUG("BeesThread else " << m_name);
}
}
@@ -67,25 +64,25 @@ BeesThread::set_name(const string &name)
BeesThread::~BeesThread()
{
if (!m_thread_ptr) {
BEESLOG("Thread " << m_name << " no thread ptr");
BEESLOGDEBUG("Thread " << m_name << " no thread ptr");
return;
}
BEESLOG("BeesThread destructor " << m_name);
BEESLOGDEBUG("BeesThread destructor " << m_name);
if (m_thread_ptr->joinable()) {
BEESLOG("Cancelling thread " << m_name);
BEESLOGDEBUG("Cancelling thread " << m_name);
int rv = pthread_cancel(m_thread_ptr->native_handle());
if (rv) {
BEESLOG("pthread_cancel returned " << strerror(-rv));
BEESLOGDEBUG("pthread_cancel returned " << strerror(-rv));
}
BEESLOG("Waiting for thread " << m_name);
BEESLOGDEBUG("Waiting for thread " << m_name);
Timer thread_time;
m_thread_ptr->join();
BEESLOG("Waited for " << m_name << ", " << thread_time << " sec");
BEESLOGDEBUG("Waited for " << m_name << ", " << thread_time << " sec");
} else if (!m_name.empty()) {
BEESLOG("Thread " << m_name << " not joinable");
BEESLOGDEBUG("Thread " << m_name << " not joinable");
} else {
BEESLOG("Thread destroy else " << m_name);
BEESLOGDEBUG("Thread destroy else " << m_name);
}
}

View File

@@ -160,7 +160,8 @@ BeesFileRange::file_size() const
// lost a race (e.g. a file was truncated while we were building a
// matching range pair with it). In such cases we should probably stop
// whatever we were doing and backtrack to some higher level anyway.
THROW_CHECK1(invalid_argument, m_file_size, m_file_size > 0);
// Well, OK, but we call this function from exception handlers...
THROW_CHECK1(invalid_argument, m_file_size, m_file_size >= 0);
// THROW_CHECK2(invalid_argument, m_file_size, m_end, m_end <= m_file_size || m_end == numeric_limits<off_t>::max());
}
return m_file_size;
@@ -368,6 +369,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
BEESTOOLONG("grow constrained = " << constrained << " *this = " << *this);
BEESTRACE("grow constrained = " << constrained << " *this = " << *this);
bool rv = false;
Timer grow_backward_timer;
THROW_CHECK1(invalid_argument, first.begin(), (first.begin() & BLOCK_MASK_CLONE) == 0);
THROW_CHECK1(invalid_argument, second.begin(), (second.begin() & BLOCK_MASK_CLONE) == 0);
@@ -384,8 +386,8 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
BEESTRACE("e_second " << e_second);
// Preread entire extent
posix_fadvise(second.fd(), e_second.begin(), e_second.size(), POSIX_FADV_WILLNEED);
posix_fadvise(first.fd(), e_second.begin() + first.begin() - second.begin(), e_second.size(), POSIX_FADV_WILLNEED);
readahead(second.fd(), e_second.begin(), e_second.size());
readahead(first.fd(), e_second.begin() + first.begin() - second.begin(), e_second.size());
auto hash_table = ctx->hash_table();
@@ -404,7 +406,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
BEESCOUNT(pairbackward_hole);
break;
}
posix_fadvise(second.fd(), e_second.begin(), e_second.size(), POSIX_FADV_WILLNEED);
readahead(second.fd(), e_second.begin(), e_second.size());
#else
// This tends to repeatedly process extents that were recently processed.
// We tend to catch duplicate blocks early since we scan them forwards.
@@ -428,7 +430,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
if (!first_addr.is_magic()) {
auto first_resolved = ctx->resolve_addr(first_addr);
if (first_resolved.is_toxic()) {
BEESLOG("WORKAROUND: not growing matching pair backward because src addr is toxic:\n" << *this);
BEESLOGWARN("WORKAROUND: not growing matching pair backward because src addr is toxic:\n" << *this);
BEESCOUNT(pairbackward_toxic_addr);
break;
}
@@ -484,7 +486,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
}
}
if (found_toxic) {
BEESLOG("WORKAROUND: found toxic hash in " << first_bbd << " while extending backward:\n" << *this);
BEESLOGWARN("WORKAROUND: found toxic hash in " << first_bbd << " while extending backward:\n" << *this);
BEESCOUNT(pairbackward_toxic_hash);
break;
}
@@ -496,9 +498,11 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
BEESCOUNT(pairbackward_hit);
}
BEESCOUNT(pairbackward_stop);
BEESCOUNTADD(pairbackward_ms, grow_backward_timer.age() * 1000);
// Look forward
BEESTRACE("grow_forward " << *this);
Timer grow_forward_timer;
while (first.size() < BLOCK_SIZE_MAX_EXTENT) {
if (second.end() >= e_second.end()) {
if (constrained) {
@@ -511,7 +515,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
BEESCOUNT(pairforward_hole);
break;
}
posix_fadvise(second.fd(), e_second.begin(), e_second.size(), POSIX_FADV_WILLNEED);
readahead(second.fd(), e_second.begin(), e_second.size());
}
BEESCOUNT(pairforward_try);
@@ -529,7 +533,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
if (!first_addr.is_magic()) {
auto first_resolved = ctx->resolve_addr(first_addr);
if (first_resolved.is_toxic()) {
BEESLOG("WORKAROUND: not growing matching pair forward because src is toxic:\n" << *this);
BEESLOGWARN("WORKAROUND: not growing matching pair forward because src is toxic:\n" << *this);
BEESCOUNT(pairforward_toxic);
break;
}
@@ -593,7 +597,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
}
}
if (found_toxic) {
BEESLOG("WORKAROUND: found toxic hash in " << first_bbd << " while extending forward:\n" << *this);
BEESLOGWARN("WORKAROUND: found toxic hash in " << first_bbd << " while extending forward:\n" << *this);
BEESCOUNT(pairforward_toxic_hash);
break;
}
@@ -612,6 +616,7 @@ BeesRangePair::grow(shared_ptr<BeesContext> ctx, bool constrained)
}
BEESCOUNT(pairforward_stop);
BEESCOUNTADD(pairforward_ms, grow_forward_timer.age() * 1000);
return rv;
}
@@ -872,6 +877,9 @@ operator<<(ostream &os, const BeesBlockData &bbd)
os << ", hash = " << bbd.m_hash;
}
if (!bbd.m_data.empty()) {
// Turn this on to debug BeesBlockData, but leave it off otherwise.
// It's a massive data leak that is only interesting to developers.
#if 0
os << ", data[" << bbd.m_data.size() << "] = '";
size_t max_print = 12;
@@ -888,6 +896,9 @@ operator<<(ostream &os, const BeesBlockData &bbd)
}
}
os << "...'";
#else
os << ", data[" << bbd.m_data.size() << "]";
#endif
}
return os << " }";
}
@@ -934,9 +945,9 @@ BeesBlockData::data() const
BEESTOOLONG("Reading BeesBlockData " << *this);
Timer read_timer;
Blob rv(m_length);
Blob rv(size());
pread_or_die(m_fd, rv, m_offset);
THROW_CHECK2(runtime_error, rv.size(), m_length, ranged_cast<off_t>(rv.size()) == m_length);
THROW_CHECK2(runtime_error, rv.size(), size(), ranged_cast<off_t>(rv.size()) == size());
m_data = rv;
BEESCOUNT(block_read);
BEESCOUNTADD(block_bytes, rv.size());

View File

@@ -3,12 +3,14 @@
#include "crucible/limits.h"
#include "crucible/process.h"
#include "crucible/string.h"
#include "crucible/task.h"
#include <cctype>
#include <cmath>
#include <iostream>
#include <memory>
#include <sstream>
// PRIx64
#include <inttypes.h>
@@ -19,14 +21,21 @@
#include <linux/fs.h>
#include <sys/ioctl.h>
// setrlimit
#include <sys/time.h>
#include <sys/resource.h>
#include <getopt.h>
using namespace crucible;
using namespace std;
int
int bees_log_level = 8;
void
do_cmd_help(char *argv[])
{
// 80col 01234567890123456789012345678901234567890123456789012345678901234567890123456789
cerr << "Usage: " << argv[0] << " [options] fs-root-path [fs-root-path-2...]\n"
"Performs best-effort extent-same deduplication on btrfs.\n"
"\n"
@@ -34,25 +43,40 @@ do_cmd_help(char *argv[])
"Other directories will be rejected.\n"
"\n"
"Options:\n"
"\t-h, --help\t\tShow this help\n"
"\t-t, --timestamps\tShow timestamps in log output (default)\n"
"\t-T, --notimestamps\tOmit timestamps in log output\n"
" -h, --help Show this help\n"
"\n"
"Load management options:\n"
" -c, --thread-count Worker thread count (default CPU count * factor)\n"
" -C, --thread-factor Worker thread factor (default " << BEES_DEFAULT_THREAD_FACTOR << ")\n"
" -G, --thread-min Minimum worker thread count (default 0)\n"
" -g, --loadavg-target Target load average for worker threads (default none)\n"
"\n"
"Filesystem tree traversal options:\n"
" -m, --scan-mode Scanning mode (0..2, default 0)\n"
"\n"
"Workarounds:\n"
" -a, --workaround-btrfs-send Workaround for btrfs send\n"
"\n"
"Logging options:\n"
" -t, --timestamps Show timestamps in log output (default)\n"
" -T, --no-timestamps Omit timestamps in log output\n"
" -p, --absolute-paths Show absolute paths (default)\n"
" -P, --strip-paths Strip $CWD from beginning of all paths in the log\n"
" -v, --verbose Set maximum log level (0..8, default 8)\n"
"\n"
"Optional environment variables:\n"
"\tBEESHOME\tPath to hash table and configuration files\n"
"\t\t\t(default is .beeshome/ in the root of each filesystem).\n"
" BEESHOME Path to hash table and configuration files\n"
" (default is .beeshome/ in the root of each filesystem).\n"
"\n"
"\tBEESSTATUS\tFile to write status to (tmpfs recommended, e.g. /run).\n"
"\t\t\tNo status is written if this variable is unset.\n"
" BEESSTATUS File to write status to (tmpfs recommended, e.g. /run).\n"
" No status is written if this variable is unset.\n"
"\n"
// 80col 01234567890123456789012345678901234567890123456789012345678901234567890123456789
<< endl;
return 0;
}
// tracing ----------------------------------------
RateLimiter bees_info_rate_limit(BEES_INFO_RATE, BEES_INFO_BURST);
thread_local BeesTracer *BeesTracer::tl_next_tracer = nullptr;
BeesTracer::~BeesTracer()
@@ -61,12 +85,12 @@ BeesTracer::~BeesTracer()
try {
m_func();
} catch (exception &e) {
BEESLOG("Nested exception: " << e.what());
BEESLOGERR("Nested exception: " << e.what());
} catch (...) {
BEESLOG("Nested exception ...");
BEESLOGERR("Nested exception ...");
}
if (!m_next_tracer) {
BEESLOG("--- END TRACE --- exception ---");
BEESLOGERR("--- END TRACE --- exception ---");
}
}
tl_next_tracer = m_next_tracer;
@@ -83,12 +107,12 @@ void
BeesTracer::trace_now()
{
BeesTracer *tp = tl_next_tracer;
BEESLOG("--- BEGIN TRACE ---");
BEESLOGERR("--- BEGIN TRACE ---");
while (tp) {
tp->m_func();
tp = tp->m_next_tracer;
}
BEESLOG("--- END TRACE ---");
BEESLOGERR("--- END TRACE ---");
}
thread_local BeesNote *BeesNote::tl_next = nullptr;
@@ -101,36 +125,62 @@ BeesNote::~BeesNote()
tl_next = m_prev;
unique_lock<mutex> lock(s_mutex);
if (tl_next) {
s_status[gettid()] = tl_next;
s_status[crucible::gettid()] = tl_next;
} else {
s_status.erase(gettid());
s_status.erase(crucible::gettid());
}
}
BeesNote::BeesNote(function<void(ostream &os)> f) :
m_func(f)
{
m_name = tl_name;
m_name = get_name();
m_prev = tl_next;
tl_next = this;
unique_lock<mutex> lock(s_mutex);
s_status[gettid()] = tl_next;
s_status[crucible::gettid()] = tl_next;
}
void
BeesNote::set_name(const string &name)
{
tl_name = name;
catch_all([&]() {
DIE_IF_MINUS_ERRNO(pthread_setname_np(pthread_self(), name.c_str()));
});
}
string
BeesNote::get_name()
{
if (tl_name.empty()) {
return "bees";
} else {
// Use explicit name if given
if (!tl_name.empty()) {
return tl_name;
}
// Try a Task name. If there is one, return it, but do not
// remember it. Each output message may be a different Task.
// The current task is thread_local so we don't need to worry
// about it being destroyed under us.
auto current_task = Task::current_task();
if (current_task) {
return current_task.title();
}
// OK try the pthread name next.
char buf[24];
memset(buf, '\0', sizeof(buf));
int err = pthread_getname_np(pthread_self(), buf, sizeof(buf));
if (err) {
return string("pthread_getname_np: ") + strerror(err);
}
buf[sizeof(buf) - 1] = '\0';
// pthread_getname_np returns the process name
// ...by default? ...for the main thread?
// ...except during exception handling?
// ...randomly?
return buf;
}
BeesNote::ThreadStatusMap
@@ -154,20 +204,6 @@ BeesNote::get_status()
// static inline helpers ----------------------------------------
static inline
bool
bees_addr_check(uint64_t v)
{
return !(v & (1ULL << 63));
}
static inline
bool
bees_addr_check(int64_t v)
{
return !(v & (1ULL << 63));
}
string
pretty(double d)
{
@@ -326,7 +362,7 @@ BeesTooLong::check() const
if (age() > m_limit) {
ostringstream oss;
m_func(oss);
BEESLOG("PERFORMANCE: " << *this << " sec: " << oss.str());
BEESLOGWARN("PERFORMANCE: " << *this << " sec: " << oss.str());
}
}
@@ -358,7 +394,7 @@ BeesStringFile::BeesStringFile(Fd dir_fd, string name, size_t limit) :
m_name(name),
m_limit(limit)
{
BEESLOG("BeesStringFile " << name_fd(m_dir_fd) << "/" << m_name << " max size " << pretty(m_limit));
BEESLOGINFO("BeesStringFile " << name_fd(m_dir_fd) << "/" << m_name << " max size " << pretty(m_limit));
}
void
@@ -410,6 +446,12 @@ BeesStringFile::write(string contents)
// This triggers too many btrfs bugs. I wish I was kidding.
// Forget snapshots, balance, compression, and dedup:
// the system call you have to fear on btrfs is fsync().
// Also note that when bees renames a temporary over an
// existing file, it flushes the temporary, so we get
// the right behavior if we just do nothing here
// (except when the file is first created; however,
// in that case the result is the same as if the file
// did not exist, was empty, or was filled with garbage).
BEESNOTE("fsyncing " << tmpname << " in " << name_fd(m_dir_fd));
DIE_IF_NON_ZERO(fsync(ofd));
#endif
@@ -485,7 +527,7 @@ void
BeesTempFile::realign()
{
if (m_end_offset > BLOCK_SIZE_MAX_TEMP_FILE) {
BEESLOG("temporary file size " << to_hex(m_end_offset) << " > max " << BLOCK_SIZE_MAX_TEMP_FILE);
BEESLOGINFO("temporary file size " << to_hex(m_end_offset) << " > max " << BLOCK_SIZE_MAX_TEMP_FILE);
BEESCOUNT(tmp_trunc);
return create();
}
@@ -519,7 +561,7 @@ BeesTempFile::make_hole(off_t count)
BeesFileRange
BeesTempFile::make_copy(const BeesFileRange &src)
{
BEESLOG("copy: " << src);
BEESLOGINFO("copy: " << src);
BEESNOTE("Copying " << src);
BEESTRACE("Copying " << src);
@@ -570,9 +612,11 @@ BeesTempFile::make_copy(const BeesFileRange &src)
// We seem to get lockups without this!
if (did_block_write) {
#if 1
#if 0
// Is this fixed by "Btrfs: fix deadlock between dedup on same file and starting writeback"?
// No.
// Is this fixed in kernel 4.14.34?
// No.
bees_sync(m_fd);
#endif
}
@@ -585,66 +629,181 @@ int
bees_main(int argc, char *argv[])
{
set_catch_explainer([&](string s) {
BEESLOG("\n\n*** EXCEPTION ***\n\t" << s << "\n***\n");
BEESLOGERR("\n\n*** EXCEPTION ***\n\t" << s << "\n***\n");
BEESCOUNT(exception_caught);
});
// The thread name for the main function is also what the kernel
// Oops messages call the entire process. So even though this
// thread's proper title is "main", let's call it "bees".
BeesNote::set_name("bees");
BEESNOTE("main");
BeesNote::set_name("main");
list<shared_ptr<BeesContext>> all_contexts;
shared_ptr<BeesContext> bc;
THROW_CHECK1(invalid_argument, argc, argc >= 0);
// Create a context so we can apply configuration to it
shared_ptr<BeesContext> bc = make_shared<BeesContext>();
string cwd(readlink_or_die("/proc/self/cwd"));
// Defaults
bool chatter_prefix_timestamp = true;
double thread_factor = 0;
unsigned thread_count = 0;
unsigned thread_min = 0;
double load_target = 0;
bool workaround_btrfs_send = false;
BeesRoots::ScanMode root_scan_mode = BeesRoots::SCAN_MODE_ZERO;
// Configure getopt_long
static const struct option long_options[] = {
{ "thread-factor", required_argument, NULL, 'C' },
{ "thread-min", required_argument, NULL, 'G' },
{ "strip-paths", no_argument, NULL, 'P' },
{ "no-timestamps", no_argument, NULL, 'T' },
{ "workaround-btrfs-send", no_argument, NULL, 'a' },
{ "thread-count", required_argument, NULL, 'c' },
{ "loadavg-target", required_argument, NULL, 'g' },
{ "help", no_argument, NULL, 'h' },
{ "scan-mode", required_argument, NULL, 'm' },
{ "absolute-paths", no_argument, NULL, 'p' },
{ "timestamps", no_argument, NULL, 't' },
{ "verbose", required_argument, NULL, 'v' },
{ 0, 0, 0, 0 },
};
// Build getopt_long's short option list from the long_options table.
// While we're at it, make sure we didn't duplicate any options.
string getopt_list;
set<decltype(option::val)> option_vals;
for (const struct option *op = long_options; op->val; ++op) {
THROW_CHECK1(runtime_error, op->val, !option_vals.count(op->val));
option_vals.insert(op->val);
if ((op->val & 0xff) != op->val) {
continue;
}
getopt_list += op->val;
if (op->has_arg == required_argument) {
getopt_list += ':';
}
}
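With the table above, the computed string works out to "C:G:PTac:g:hm:ptv:": each short option in table order, with a ':' appended when the option takes a required argument.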
// Parse options
int c;
while (1) {
while (true) {
int option_index = 0;
static struct option long_options[] = {
{ "timestamps", no_argument, NULL, 't' },
{ "notimestamps", no_argument, NULL, 'T' },
{ "help", no_argument, NULL, 'h' }
};
c = getopt_long(argc, argv, "Tth", long_options, &option_index);
c = getopt_long(argc, argv, getopt_list.c_str(), long_options, &option_index);
if (-1 == c) {
break;
}
switch (c) {
case 'C':
thread_factor = stod(optarg);
break;
case 'G':
thread_min = stoul(optarg);
break;
case 'P':
crucible::set_relative_path(cwd);
break;
case 'T':
chatter_prefix_timestamp = false;
break;
case 'a':
workaround_btrfs_send = true;
break;
case 'c':
thread_count = stoul(optarg);
break;
case 'g':
load_target = stod(optarg);
break;
case 'm':
root_scan_mode = static_cast<BeesRoots::ScanMode>(stoul(optarg));
break;
case 'p':
crucible::set_relative_path("");
break;
case 't':
chatter_prefix_timestamp = true;
break;
case 'v':
{
int new_log_level = stoul(optarg);
THROW_CHECK1(out_of_range, new_log_level, new_log_level <= 8);
THROW_CHECK1(out_of_range, new_log_level, new_log_level >= 0);
bees_log_level = new_log_level;
BEESLOGNOTICE("log level set to " << bees_log_level);
}
break;
case 'h':
do_cmd_help(argv); // fallthrough
default:
return 2;
do_cmd_help(argv);
return EXIT_FAILURE;
}
}
if (optind + 1 != argc) {
BEESLOGERR("Only one filesystem path per bees process");
return EXIT_FAILURE;
}
Chatter::enable_timestamp(chatter_prefix_timestamp);
// Create a context and start crawlers
bool did_subscription = false;
while (optind < argc) {
catch_all([&]() {
bc = make_shared<BeesContext>(bc);
bc->set_root_path(argv[optind++]);
did_subscription = true;
});
if (!relative_path().empty()) {
BEESLOGINFO("using relative path " << relative_path() << "\n");
}
if (!did_subscription) {
BEESLOG("WARNING: no filesystems added");
BEESLOGINFO("setting rlimit NOFILE to " << BEES_OPEN_FILE_LIMIT);
struct rlimit lim = {
.rlim_cur = BEES_OPEN_FILE_LIMIT,
.rlim_max = BEES_OPEN_FILE_LIMIT,
};
int rv = setrlimit(RLIMIT_NOFILE, &lim);
if (rv) {
BEESLOGINFO("setrlimit(RLIMIT_NOFILE, { " << lim.rlim_cur << " }): " << strerror(errno));
}
// Set up worker thread pool
THROW_CHECK1(out_of_range, thread_factor, thread_factor >= 0);
if (thread_count < 1) {
if (thread_factor == 0) {
thread_factor = BEES_DEFAULT_THREAD_FACTOR;
}
thread_count = max(1U, static_cast<unsigned>(ceil(thread::hardware_concurrency() * thread_factor)));
if (thread_count > BEES_DEFAULT_THREAD_LIMIT) {
BEESLOGNOTICE("Limiting computed thread count to " << BEES_DEFAULT_THREAD_LIMIT);
BEESLOGNOTICE("Use --thread-count to override this limit");
thread_count = BEES_DEFAULT_THREAD_LIMIT;
}
}
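For example, on a 16-core machine with the default factor of 1.0, ceil(16 * 1.0) = 16 exceeds BEES_DEFAULT_THREAD_LIMIT, so the pool is capped at 8 workers unless --thread-count overrides it.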
if (load_target != 0) {
BEESLOGNOTICE("setting load average target to " << load_target);
BEESLOGNOTICE("setting worker thread pool minimum size to " << thread_min);
TaskMaster::set_thread_min_count(thread_min);
}
TaskMaster::set_loadavg_target(load_target);
BEESLOGNOTICE("setting worker thread pool maximum size to " << thread_count);
TaskMaster::set_thread_count(thread_count);
// Set root path
string root_path = argv[optind++];
BEESLOGNOTICE("setting root path to '" << root_path << "'");
bc->set_root_path(root_path);
// Workaround for btrfs send
bc->roots()->set_workaround_btrfs_send(workaround_btrfs_send);
// Set root scan mode
bc->roots()->set_scan_mode(root_scan_mode);
BeesThread status_thread("status", [&]() {
bc->dump_status();
});
@@ -653,7 +812,7 @@ bees_main(int argc, char *argv[])
bc->show_progress();
// That is all.
return 0;
return EXIT_SUCCESS;
}
int
@@ -663,7 +822,7 @@ main(int argc, char *argv[])
if (argc < 2) {
do_cmd_help(argv);
return 2;
return EXIT_FAILURE;
}
int rv = 1;

View File

@@ -8,17 +8,18 @@
#include "crucible/fd.h"
#include "crucible/fs.h"
#include "crucible/lockset.h"
#include "crucible/progress.h"
#include "crucible/time.h"
#include "crucible/timequeue.h"
#include "crucible/workqueue.h"
#include "crucible/task.h"
#include <array>
#include <atomic>
#include <functional>
#include <list>
#include <mutex>
#include <string>
#include <thread>
#include <syslog.h>
#include <endian.h>
using namespace crucible;
@@ -60,11 +61,8 @@ const off_t BLOCK_SIZE_HASHTAB_EXTENT = 16 * 1024 * 1024;
// Bytes per second we want to flush (8GB every two hours)
const double BEES_FLUSH_RATE = 8.0 * 1024 * 1024 * 1024 / 7200.0;
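(That is roughly 1.1 MiB/s: 8 GiB is 8589934592 bytes, spread over 7200 seconds.)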
// How long we should wait for new btrfs transactions
const double BEES_COMMIT_INTERVAL = 900;
// Interval between writing non-hash-table things to disk, and starting new subvol crawlers
const int BEES_WRITEBACK_INTERVAL = BEES_COMMIT_INTERVAL;
// Interval between writing crawl state to disk
const int BEES_WRITEBACK_INTERVAL = 900;
// Statistics reports while scanning
const int BEES_STATS_INTERVAL = 3600;
@@ -75,33 +73,53 @@ const int BEES_PROGRESS_INTERVAL = BEES_STATS_INTERVAL;
// Status is output every freakin second. Use a ramdisk.
const int BEES_STATUS_INTERVAL = 1;
// Number of FDs to open (not counting 100 roots)
const size_t BEES_FD_CACHE_SIZE = 384;
// Number of file FDs to cache when not in active use
const size_t BEES_FILE_FD_CACHE_SIZE = 4096;
// Number of root FDs to cache when not in active use
const size_t BEES_ROOT_FD_CACHE_SIZE = 1024;
// Number of FDs to open (rlimit)
const size_t BEES_OPEN_FILE_LIMIT = (BEES_FILE_FD_CACHE_SIZE + BEES_ROOT_FD_CACHE_SIZE) * 2 + 100;
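(With the cache sizes above, that is (4096 + 1024) * 2 + 100 = 10340 descriptors, the value bees_main() passes to setrlimit(RLIMIT_NOFILE).)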
// Worker thread factor (multiplied by detected number of CPU cores)
const double BEES_DEFAULT_THREAD_FACTOR = 1.0;
// Don't use more than this number of threads unless explicitly configured
const size_t BEES_DEFAULT_THREAD_LIMIT = 8;
// Log warnings when an operation takes too long
const double BEES_TOO_LONG = 2.5;
const double BEES_TOO_LONG = 5.0;
// Avoid any extent where LOGICAL_INO takes this long
const double BEES_TOXIC_DURATION = 9.9;
// EXPERIMENT: Kernel v4.14+ may let us ignore toxicity
// NOPE: kernel 4.14 has the same toxicity problems as any previous kernel
// const double BEES_TOXIC_DURATION = 99.9;
// How long between hash table histograms
const double BEES_HASH_TABLE_ANALYZE_INTERVAL = BEES_STATS_INTERVAL;
// Rate limiting of informational messages
const double BEES_INFO_RATE = 10.0;
const double BEES_INFO_BURST = 1.0;
// After we have this many events queued, wait
const size_t BEES_MAX_QUEUE_SIZE = 1024;
// Stop growing the work queue after we have this many tasks queued
const size_t BEES_MAX_QUEUE_SIZE = 128;
// Read this many items at a time in SEARCHv2
const size_t BEES_MAX_CRAWL_SIZE = 4096;
const size_t BEES_MAX_CRAWL_SIZE = 1024;
// Insert this many items before switching to a new subvol
const size_t BEES_MAX_CRAWL_BATCH = 128;
// Wait this many transids between crawls
const size_t BEES_TRANSID_FACTOR = 10;
// If an extent has this many refs, pretend it does not exist
// to avoid a crippling btrfs performance bug
// The actual limit in LOGICAL_INO seems to be 2730, but let's leave a little headroom
const size_t BEES_MAX_EXTENT_REF_COUNT = 2560;
// Wait this long for a balance to stop
const double BEES_BALANCE_POLL_INTERVAL = 60.0;
// Flags
const int FLAGS_OPEN_COMMON = O_NOFOLLOW | O_NONBLOCK | O_CLOEXEC | O_NOATIME | O_LARGEFILE | O_NOCTTY;
const int FLAGS_OPEN_DIR = FLAGS_OPEN_COMMON | O_RDONLY | O_DIRECTORY;
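For reference, the arithmetic behind the tuning constants above (worked out here only; the intermediate figures are not named in the source):

// BEES_FLUSH_RATE: 8 GiB flushed every two hours
//   8 * 1024^3 bytes / 7200 s ≈ 1.14 MiB/s of sustained hash-table writeback
// BEES_OPEN_FILE_LIMIT: both FD cache sizes, doubled, plus slack
//   (4096 + 1024) * 2 + 100 = 10340 descriptors requested via rlimit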
@@ -115,21 +133,18 @@ const int FLAGS_OPEN_FANOTIFY = O_RDWR | O_NOATIME | O_CLOEXEC | O_LARGEFILE;
// macros ----------------------------------------
#define BEESLOG(x) do { Chatter c(BeesNote::get_name()); c << x; } while (0)
#define BEESLOGTRACE(x) do { BEESLOG(x); BeesTracer::trace_now(); } while (0)
#define BEESLOG(lv,x) do { if (lv < bees_log_level) { Chatter c(lv, BeesNote::get_name()); c << x; } } while (0)
#define BEESLOGTRACE(x) do { BEESLOG(LOG_DEBUG, x); BeesTracer::trace_now(); } while (0)
#define BEESTRACE(x) BeesTracer SRSLY_WTF_C(beesTracer_, __LINE__) ([&]() { BEESLOG(x); })
#define BEESTRACE(x) BeesTracer SRSLY_WTF_C(beesTracer_, __LINE__) ([&]() { BEESLOG(LOG_ERR, x); })
#define BEESTOOLONG(x) BeesTooLong SRSLY_WTF_C(beesTooLong_, __LINE__) ([&](ostream &_btl_os) { _btl_os << x; })
#define BEESNOTE(x) BeesNote SRSLY_WTF_C(beesNote_, __LINE__) ([&](ostream &_btl_os) { _btl_os << x; })
#define BEESINFO(x) do { \
if (bees_info_rate_limit.is_ready()) { \
bees_info_rate_limit.borrow(1); \
Chatter c(BeesNote::get_name()); \
c << x; \
} \
} while (0)
#define BEESLOGNOTE(x) BEESLOG(x); BEESNOTE(x)
#define BEESLOGERR(x) BEESLOG(LOG_ERR, x)
#define BEESLOGWARN(x) BEESLOG(LOG_WARNING, x)
#define BEESLOGNOTICE(x) BEESLOG(LOG_NOTICE, x)
#define BEESLOGINFO(x) BEESLOG(LOG_INFO, x)
#define BEESLOGDEBUG(x) BEESLOG(LOG_DEBUG, x)
#define BEESCOUNT(stat) do { \
BeesStats::s_global.add_count(#stat); \
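The leveled macros filter at the call site: a message is emitted only when its syslog severity is numerically below bees_log_level (lower numbers are more severe). A minimal usage sketch, assuming a local variable root purely for illustration:

// Hypothetical call site
BEESLOGINFO("crawling root " << root);
// expands roughly to:
//   if (LOG_INFO < bees_log_level) {
//           Chatter c(LOG_INFO, BeesNote::get_name());
//           c << "crawling root " << root;
//   }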
@@ -158,7 +173,7 @@ public:
T at(string idx) const;
friend ostream& operator<< <>(ostream &os, const BeesStatTmpl<T> &bs);
friend class BeesStats;
friend struct BeesStats;
};
using BeesRates = BeesStatTmpl<double>;
@@ -177,7 +192,7 @@ class BeesBlockData;
class BeesTracer {
function<void()> m_func;
BeesTracer *m_next_tracer = 0;
thread_local static BeesTracer *tl_next_tracer;
public:
BeesTracer(function<void()> f);
@@ -432,18 +447,24 @@ private:
uint64_t m_buckets;
uint64_t m_extents;
uint64_t m_cells;
set<uint64_t> m_buckets_dirty;
set<uint64_t> m_buckets_missing;
BeesThread m_writeback_thread;
BeesThread m_prefetch_thread;
RateLimiter m_flush_rate_limit;
mutex m_extent_mutex;
mutex m_bucket_mutex;
condition_variable m_condvar;
set<HashType> m_toxic_hashes;
BeesStringFile m_stats_file;
LockSet<uint64_t> m_extent_lock_set;
// Mutex/condvar for the writeback thread
mutex m_dirty_mutex;
condition_variable m_dirty_condvar;
// Per-extent structures
struct ExtentMetaData {
shared_ptr<mutex> m_mutex_ptr; // Access serializer
bool m_dirty = false; // Needs to be written back to disk
bool m_missing = true; // Needs to be read from disk
ExtentMetaData();
};
vector<ExtentMetaData> m_extent_metadata;
void open_file();
void writeback_loop();
@@ -451,11 +472,17 @@ private:
void try_mmap_flags(int flags);
pair<Cell *, Cell *> get_cell_range(HashType hash);
pair<uint8_t *, uint8_t *> get_extent_range(HashType hash);
void fetch_missing_extent(HashType hash);
void set_extent_dirty(HashType hash);
void fetch_missing_extent_by_hash(HashType hash);
void fetch_missing_extent_by_index(uint64_t extent_index);
void set_extent_dirty_locked(uint64_t extent_index);
void flush_dirty_extents();
bool flush_dirty_extent(uint64_t extent_index);
bool is_toxic_hash(HashType h) const;
size_t hash_to_extent_index(HashType ht);
unique_lock<mutex> lock_extent_by_hash(HashType ht);
unique_lock<mutex> lock_extent_by_index(uint64_t extent_index);
BeesHashTable(const BeesHashTable &) = delete;
BeesHashTable &operator=(const BeesHashTable &) = delete;
};
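The old global extent and bucket mutexes give way to one mutex per hash-table extent. A plausible shape for the new locking helpers, assuming hashes map uniformly onto extent indexes (a sketch, not the actual implementation):

size_t
BeesHashTable::hash_to_extent_index(HashType ht)
{
        // Assumed mapping: spread hashes evenly across the extents
        return ht % m_extents;
}

unique_lock<mutex>
BeesHashTable::lock_extent_by_index(uint64_t extent_index)
{
        // Each ExtentMetaData owns its mutex through a shared_ptr, so
        // m_extent_metadata is sized once at startup and never resized under lock
        return unique_lock<mutex>(*m_extent_metadata.at(extent_index).m_mutex_ptr);
}

unique_lock<mutex>
BeesHashTable::lock_extent_by_hash(HashType ht)
{
        return lock_extent_by_index(hash_to_extent_index(ht));
}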
@@ -479,9 +506,10 @@ class BeesCrawl {
mutex m_mutex;
set<BeesFileRange> m_extents;
bool m_deferred = false;
bool m_finished = false;
mutex m_state_mutex;
BeesCrawlState m_state;
ProgressTracker<BeesCrawlState> m_state;
bool fetch_extents();
void fetch_extents_harder();
@@ -491,40 +519,51 @@ public:
BeesCrawl(shared_ptr<BeesContext> ctx, BeesCrawlState initial_state);
BeesFileRange peek_front();
BeesFileRange pop_front();
BeesCrawlState get_state();
ProgressTracker<BeesCrawlState>::ProgressHolder hold_state(const BeesFileRange &bfr);
BeesCrawlState get_state_begin();
BeesCrawlState get_state_end();
void set_state(const BeesCrawlState &bcs);
void deferred(bool def_setting);
};
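BeesCrawl now records its position through a ProgressTracker, so saved state only advances past ranges whose holders have all been released. A sketch of the intended call pattern (the scan call is hypothetical):

// Hold the crawl position while the range is scanned
auto holder = crawl->hold_state(bfr);
scan_range_somehow(bfr);  // hypothetical work
holder.reset();           // get_state_begin() may now advance past bfr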
class BeesRoots {
class BeesRoots : public enable_shared_from_this<BeesRoots> {
shared_ptr<BeesContext> m_ctx;
BeesStringFile m_crawl_state_file;
BeesCrawlState m_crawl_current;
map<uint64_t, shared_ptr<BeesCrawl>> m_root_crawl_map;
mutex m_mutex;
condition_variable m_condvar;
bool m_crawl_dirty = false;
Timer m_crawl_timer;
BeesThread m_crawl_thread;
BeesThread m_writeback_thread;
RateEstimator m_transid_re;
size_t m_transid_factor = BEES_TRANSID_FACTOR;
Task m_crawl_task;
bool m_workaround_btrfs_send = false;
LRUCache<bool, uint64_t> m_root_ro_cache;
void insert_new_crawl();
void insert_root(const BeesCrawlState &bcs);
Fd open_root_nocache(uint64_t root);
Fd open_root_ino_nocache(uint64_t root, uint64_t ino);
bool is_root_ro_nocache(uint64_t root);
uint64_t transid_min();
uint64_t transid_max();
uint64_t transid_max_nocache();
void state_load();
ostream &state_to_stream(ostream &os);
void state_save();
void crawl_roots();
bool crawl_roots();
string crawl_state_filename() const;
BeesCrawlState crawl_state_get(uint64_t root);
void crawl_state_set_dirty();
void crawl_state_erase(const BeesCrawlState &bcs);
void crawl_thread();
void writeback_thread();
uint64_t next_root(uint64_t root = 0);
void current_state_set(const BeesCrawlState &bcs);
RateEstimator& transid_re();
size_t crawl_batch(shared_ptr<BeesCrawl> crawl);
void clear_caches();
friend class BeesFdCache;
friend class BeesCrawl;
@@ -534,6 +573,24 @@ public:
Fd open_root(uint64_t root);
Fd open_root_ino(uint64_t root, uint64_t ino);
Fd open_root_ino(const BeesFileId &bfi) { return open_root_ino(bfi.root(), bfi.ino()); }
bool is_root_ro(uint64_t root);
// TODO: think of better names for these.
// or TODO: do extent-tree scans instead
enum ScanMode {
SCAN_MODE_ZERO,
SCAN_MODE_ONE,
SCAN_MODE_TWO,
SCAN_MODE_COUNT, // must be last
};
void set_scan_mode(ScanMode new_mode);
void set_workaround_btrfs_send(bool do_avoid);
private:
ScanMode m_scan_mode = SCAN_MODE_ZERO;
static string scan_mode_ntoa(ScanMode new_mode);
};
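scan_mode_ntoa presumably renders the enum for log messages; a minimal sketch under that assumption (placeholder names, not the project's official mode names):

string
BeesRoots::scan_mode_ntoa(BeesRoots::ScanMode mode)
{
        switch (mode) {
                case SCAN_MODE_ZERO: return "0";
                case SCAN_MODE_ONE:  return "1";
                case SCAN_MODE_TWO:  return "2";
                default:             return "unknown";
        }
}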
struct BeesHash {
@@ -545,13 +602,13 @@ struct BeesHash {
BeesHash& operator=(const Type that) { m_hash = that; return *this; }
private:
Type m_hash;
};
ostream & operator<<(ostream &os, const BeesHash &bh);
class BeesBlockData {
using Blob = vector<char>;
using Blob = vector<uint8_t>;
mutable Fd m_fd;
off_t m_offset;
@@ -623,6 +680,7 @@ public:
Fd open_root(shared_ptr<BeesContext> ctx, uint64_t root);
Fd open_root_ino(shared_ptr<BeesContext> ctx, uint64_t root, uint64_t ino);
void insert_root_ino(shared_ptr<BeesContext> ctx, Fd fd);
void clear();
};
struct BeesResolveAddrResult {
@@ -656,9 +714,12 @@ class BeesContext : public enable_shared_from_this<BeesContext> {
Timer m_total_timer;
LockSet<uint64_t> m_extent_lock_set;
void set_root_fd(Fd fd);
BeesResolveAddrResult resolve_addr_uncached(BeesAddress addr);
void wait_for_balance();
BeesFileRange scan_one_extent(const BeesFileRange &bfr, const Extent &e);
void rewrite_file_range(const BeesFileRange &bfr);
@@ -675,6 +736,7 @@ public:
BeesFileRange scan_forward(const BeesFileRange &bfr);
bool is_root_ro(uint64_t root);
BeesRangePair dup_extent(const BeesFileRange &src);
bool dedup(const BeesRangePair &brp);
@@ -693,6 +755,7 @@ public:
shared_ptr<BeesTempFile> tmpfile();
const Timer &total_timer() const { return m_total_timer; }
LockSet<uint64_t> &extent_lock_set() { return m_extent_lock_set; }
// TODO: move the rest of the FD cache methods here
void insert_root_ino(Fd fd);
@@ -775,9 +838,9 @@ public:
};
// And now, a giant pile of extern declarations
extern int bees_log_level;
extern const char *BEES_VERSION;
string pretty(double d);
extern RateLimiter bees_info_rate_limit;
void bees_sync(int fd);
string format_time(time_t t);


@@ -5,30 +5,40 @@ PROGRAMS = \
limits \
path \
process \
progress \
task \
all: test
test: $(PROGRAMS)
set -x; for prog in $(PROGRAMS); do ./$$prog || exit 1; done
test: $(PROGRAMS:%=%.txt) Makefile
FORCE:
include ../makeflags
-include ../localconf
LIBS = -lcrucible
LIBS = -lcrucible -lpthread
LDFLAGS = -L../lib -Wl,-rpath=$(shell realpath ../lib)
depends.mk: *.cc
for x in *.cc; do $(CXX) $(CXXFLAGS) -M "$$x"; done >> depends.mk.new
mv -fv depends.mk.new depends.mk
.depends/%.dep: %.cc tests.h Makefile
@mkdir -p .depends
$(CXX) $(CXXFLAGS) -M -MF $@ -MT $(<:.cc=.o) $<
-include depends.mk
depends.mk: $(PROGRAMS:%=.depends/%.dep)
cat $^ > $@.new
mv -f $@.new $@
%.o: %.cc %.h ../makeflags
-echo "Implicit rule %.o: %.cc" >&2
$(CXX) $(CXXFLAGS) -o "$@" -c "$<"
include depends.mk
%: %.o ../makeflags
-echo "Implicit rule %: %.o" >&2
$(CXX) $(CXXFLAGS) -o "$@" "$<" $(LDFLAGS) $(LIBS)
%.o: %.cc %.h ../makeflags Makefile
@echo "Implicit rule %.o: %.cc"
$(CXX) $(CXXFLAGS) -o $@ -c $<
$(PROGRAMS): %: %.o ../makeflags Makefile
@echo "Implicit rule %: %.o"
$(CXX) $(CXXFLAGS) $(LDFLAGS) -o $@ $< $(LIBS)
%.txt: % Makefile FORCE
./$< >$@ 2>&1 || (RC=$$?; cat $@; exit $$RC)
clean:
-rm -fv *.o
rm -fv $(PROGRAMS:%=%.o) $(PROGRAMS:%=%.txt) $(PROGRAMS)
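Net effect of the Makefile rework: dependency files are generated per source under .depends/ and concatenated into depends.mk, so touching one test no longer rescans every .cc; and each test now writes its output to a .txt file, which is dumped (with the exit status preserved) only when the test fails.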


@@ -32,7 +32,7 @@ void
test_chatter_three()
{
cerr << endl;
Chatter c("tct");
Chatter c(0, "tct");
c << "More complicated";
c << "\ncase with\n";
c << "some \\ns";

test/progress.cc Normal file

@@ -0,0 +1,40 @@
#include "tests.h"
#include "crucible/progress.h"
#include <cassert>
#include <unistd.h>
using namespace crucible;
using namespace std;
void
test_progress()
{
ProgressTracker<uint64_t> pt(123);
auto hold = pt.hold(234);
auto hold2 = pt.hold(345);
// begin() is the newest position with no older hold still outstanding;
// end() is the newest position ever handed out
assert(pt.begin() == 123);
assert(pt.end() == 345);
auto hold3 = pt.hold(456);
assert(pt.begin() == 123);
assert(pt.end() == 456);
// 345 is released out of order: 234 is still held, so begin() stays put
hold2.reset();
assert(pt.begin() == 123);
assert(pt.end() == 456);
// Releasing 234 completes the prefix through 345, so begin() advances
hold.reset();
assert(pt.begin() == 345);
assert(pt.end() == 456);
// No holds left: begin() catches up to end()
hold3.reset();
assert(pt.begin() == 456);
assert(pt.end() == 456);
}
int
main(int, char**)
{
RUN_A_TEST(test_progress());
exit(EXIT_SUCCESS);
}

test/task.cc Normal file

@@ -0,0 +1,227 @@
#include "tests.h"
#include "crucible/task.h"
#include "crucible/time.h"
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <sstream>
#include <vector>
#include <unistd.h>
using namespace crucible;
using namespace std;
void
test_tasks(size_t count)
{
TaskMaster::set_thread_count();
vector<bool> task_done(count, false);
mutex mtx;
condition_variable cv;
unique_lock<mutex> lock(mtx);
// Run several tasks in parallel
for (size_t c = 0; c < count; ++c) {
ostringstream oss;
oss << "task #" << c;
Task t(
oss.str(),
[c, &task_done, &mtx, &cv]() {
unique_lock<mutex> lock(mtx);
// cerr << "Task #" << c << endl;
task_done.at(c) = true;
cv.notify_one();
}
);
t.run();
}
// Get current status
ostringstream oss;
TaskMaster::print_queue(oss);
TaskMaster::print_workers(oss);
while (true) {
size_t tasks_done = 0;
for (auto i : task_done) {
if (i) {
++tasks_done;
}
}
if (tasks_done == count) {
return;
}
// cerr << "Tasks done: " << tasks_done << endl;
cv.wait(lock);
}
}
void
test_finish()
{
ostringstream oss;
TaskMaster::print_queue(oss);
TaskMaster::print_workers(oss);
TaskMaster::set_thread_count(0);
// cerr << "finish done" << endl;
}
void
test_unfinish()
{
TaskMaster::set_thread_count();
}
void
test_barrier(size_t count)
{
vector<bool> task_done(count, false);
mutex mtx;
condition_variable cv;
unique_lock<mutex> lock(mtx);
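// Barrier: the task inserted below via insert_task runs only after
// every BarrierLock obtained from lock() has been released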
auto b = make_shared<Barrier>();
// Run several tasks in parallel
for (size_t c = 0; c < count; ++c) {
auto bl = b->lock();
ostringstream oss;
oss << "task #" << c;
Task t(
oss.str(),
[c, &task_done, &mtx, bl]() mutable {
// cerr << "Task #" << c << endl;
unique_lock<mutex> lock(mtx);
task_done.at(c) = true;
bl.release();
}
);
t.run();
}
// Get current status
ostringstream oss;
TaskMaster::print_queue(oss);
TaskMaster::print_workers(oss);
bool done_flag = false;
Task completed(
"Waiting for Barrier",
[&mtx, &cv, &done_flag]() {
unique_lock<mutex> lock(mtx);
// cerr << "Running cv notify" << endl;
done_flag = true;
cv.notify_all();
}
);
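// Queue the completion task on the barrier, then drop our own
// reference so the last released BarrierLock can trigger it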
b->insert_task(completed);
b.reset();
while (true) {
size_t tasks_done = 0;
for (auto i : task_done) {
if (i) {
++tasks_done;
}
}
// cerr << "Tasks done: " << tasks_done << " done_flag " << done_flag << endl;
if (tasks_done == count && done_flag) {
break;
}
cv.wait(lock);
}
// cerr << "test_barrier return" << endl;
}
void
test_exclusion(size_t count)
{
mutex only_one;
Exclusion excl;
mutex mtx;
condition_variable cv;
unique_lock<mutex> lock(mtx);
auto b = make_shared<Barrier>();
// Run several tasks in parallel
for (size_t c = 0; c < count; ++c) {
auto bl = b->lock();
ostringstream oss;
oss << "task #" << c;
Task t(
oss.str(),
[c, &only_one, &excl, bl]() mutable {
// cerr << "Task #" << c << endl;
(void)c;
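// Exclusion admits a single holder; a loser queues the current task
// to be re-run when the lock is released, instead of blocking a worker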
auto lock = excl.try_lock();
if (!lock) {
excl.insert_task(Task::current_task());
return;
}
bool locked = only_one.try_lock();
assert(locked);
nanosleep(0.0001);
only_one.unlock();
bl.release();
}
);
t.run();
}
bool done_flag = false;
Task completed(
"Waiting for Barrier",
[&mtx, &cv, &done_flag]() {
unique_lock<mutex> lock(mtx);
// cerr << "Running cv notify" << endl;
done_flag = true;
cv.notify_all();
}
);
b->insert_task(completed);
b.reset();
while (true) {
if (done_flag) {
break;
}
cv.wait(lock);
}
}
int
main(int, char**)
{
// in case of deadlock
alarm(9);
RUN_A_TEST(test_tasks(256));
RUN_A_TEST(test_finish());
RUN_A_TEST(test_unfinish());
RUN_A_TEST(test_barrier(256));
RUN_A_TEST(test_finish());
RUN_A_TEST(test_unfinish());
RUN_A_TEST(test_exclusion(256));
RUN_A_TEST(test_finish());
exit(EXIT_SUCCESS);
}