mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

435 Commits

Author SHA1 Message Date
Zygo Blaxell
27125b8140 README: add scan-mode 2 and expand descriptions of modes 0 and 1
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:48:06 -05:00
Zygo Blaxell
636328fdc2 roots: add scan-mode 2 "oldest crawler first"
Add a third scan mode with alternative trade-offs.

Benefits:  Good sequential read performance.  Avoids race conditions
described in https://github.com/Zygo/bees/issues/27.  Avoids diverting
scan resources into short-lived snapshots before their long-lived
origin subvols are fully scanned.

Drawbacks:  Takes the longest time of the three implemented scan-modes
to free space in extents that are shared between snapshots.  Uses the
maximum amount of temporary space.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-29 00:48:05 -05:00
Zygo Blaxell
ef44947145 roots: move common code for creating crawl Tasks into a method
Duplicated code between the different scan modes has slowly been
becoming less and less trivial.  Move the code to a method and
make both scan-modes call it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-28 22:52:17 -05:00
Zygo Blaxell
72cc9c2b60 ExtentWalker: increase efficiency for typical btrfs extent sizes
Perf was blaming more than 50% of cycles on TREE_SEARCH_V2.  strace
showed 4 TREE_SEARCH_V2 calls for every pread in grow_backward().

Fix by increasing the extent fetch batch size so it is more likely
to include the desired items in the first fetch attempt.

This removes TREE_SEARCH_V2 from the top 10 list of cycle consumers.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-28 22:52:07 -05:00
Zygo Blaxell
e74c0a9d80 scan: fix length mismatch exception for prealloc extents at EOF
Prealloc extent sizes were taken from the Extent object and did not
take the file size into account.  If a file with a non-4K-aligned
size is preallocated, the resulting dedup fails with an exception
because the sizes of the two ranges of the BeesRangePair do not match.

Limit the size of the replacement hole extent to not extend past the
end of the file.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-28 01:46:08 -05:00
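
Editorial sketch for e74c0a9d80: the clamp described above might look roughly like the following. This is an illustration only, not the bees code. The function name clamp_hole_length and its three parameters are hypothetical, and all quantities are byte counts.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical helper (not part of bees): limit the length of a replacement
// hole so that it never extends past the end of the file.
uint64_t clamp_hole_length(uint64_t extent_begin, uint64_t extent_length, uint64_t file_size)
{
    // An extent that starts at or beyond EOF contributes nothing.
    if (extent_begin >= file_size) {
        return 0;
    }
    const uint64_t clamped = std::min(extent_length, file_size - extent_begin);
    // Postcondition: the clamped range ends at or before EOF.
    assert(extent_begin + clamped <= file_size);
    return clamped;
}
```

For a 5000-byte file with a preallocated extent covering bytes 4096 to 8191, this helper returns 904 instead of 4096, so the replacement range stops at EOF.
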
Zygo Blaxell
762f833ab0 roots: poll every 10 transids
Restarting scans for each transid is a bit aggressive.  Scan every 10
transids for a polling rate close to the former BEES_COMMIT_INTERVAL.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
48e78bbe82 roots: use RateEstimator as a transid_max cache and clean up logs
transid_max is now measured at a single point in the crawl_transid thread.

Move the Crawl deferred logic into BeesRoots so it restarts all crawls
when transid_max increases.  Gets rid of some messy time arithmetic.

Change name of Crawl thread to "crawl_master" in both thread name and
log messages.

Replace "Next transid" with "Crawl started".

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
ded26ff044 FdCache: clear cache on every new transid / crawl cycle
The periodic cache age check was not protected by a lock, so multiple
threads may decide to concurrently clear the cache.  This led to
duplicate log messages.

Fix by moving the cache expiry trigger out of FdCache and into Roots,
which knows when transids change and can perform cache clears at exactly
the time they are most relevant, i.e. after something that was deleted
becomes permanently so.

This removes the last references to BEES_COMMIT_INTERVAL, so get rid
of its definition too.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
72857e84c0 crawl: combine two messages per crawl cycle into one
Now that the polling interval is up to 30 times faster,
next_transid seems too verbose again.

Make it clearer that the interval quoted in the "Deferring..."
message is the computed transaction polling interval.

Combine "Next transid" and "Restarted crawl" into a single message.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
0fdae37962 roots: use RateEstimator to track transids
Make the crawl polling interval more closely track the commit interval
on the btrfs filesystem.  In the future this will provide opportunities
to do things like clear FD caches and stop crawls on deleted subvols,
but triggered by transaction commits instead of arbitrary time intervals.

Rename the "crawl" thread so it no longer has the same name as the "crawl"
task, and repurpose it for dedicated transid polling.  Cancel the deletion
of crawl_thread and repurpose it to trigger new crawls and wake up the
main crawl Task when it runs out of data.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
4694c7d250 time: add RateEstimator, a class for optimally polling irregular external events
RateEstimator estimates the rate of external events by sampling a
counter.

Conversion functions are provided to predict the time when the
event counter will be incremented to particular values based on past
observations of the event counter.

Synchronization functions are provided to block a thread until a specific
counter value is reached.

Event polling is supported using the history of previous event counts
to determine the predicted time of the next event.  A decay function
emphasizes more recent event history.

Polling delays are bounded by minimum and maximum values in the constructor
parameters.

wait_for() and wait_until() block the calling thread until the target
event count is reached (or the counter is reset).  These functions are
not bounded by min_delay or max_delay, and require a separate thread
to call update().  wait_for() waits for the counter to be incremented
from its current value by the given count.  wait_until() waits for the
counter to reach an absolute value.

update() counts external events and unblocks threads that are blocked
in wait_for() or wait_until().  If the event counter decreases then it
is reset to the new value.

duration() and time_point() convert relative and absolute event counts
into relative and absolute C++11 time quantities based on the last update
time, last observed event count, and the observed event rate.

Convenience functions seconds_for() and seconds_until() calculate
polling delays for the desired relative and absolute event counts
respectively.  These delays are bounded by max and min delay parameters.

rate() and ratio() provide conversion factors based on the current
estimated event rate.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
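
Editorial sketch for 4694c7d250: the core idea of the commit message (sample a counter, keep a decayed history, and convert event counts into bounded polling delays) can be illustrated as below. This is a simplified stand-in, not the RateEstimator class itself. The class name RateSketch, the explicit "now" argument, and the default decay value are assumptions made for the example. It has no locking and omits wait_for(), wait_until(), duration() and time_point().

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// Hypothetical, simplified stand-in for the RateEstimator described above.
// Times are plain seconds passed in by the caller so the arithmetic is
// easy to follow; the real class works with C++11 time types.
class RateSketch {
    double m_min_delay;
    double m_max_delay;
    double m_decay;             // weight applied to older history, in (0, 1]
    double m_last_time = 0;
    uint64_t m_last_count = 0;
    double m_time_sum = 0;      // decayed sum of observed seconds
    double m_count_sum = 0;     // decayed sum of observed events
    bool m_primed = false;      // true once the first sample has been taken
public:
    RateSketch(double min_delay, double max_delay, double decay = 0.99)
        : m_min_delay(min_delay), m_max_delay(max_delay), m_decay(decay)
    {
        assert(min_delay <= max_delay);
        assert(decay > 0 && decay <= 1);
    }

    // Record a new sample of the external counter.
    void update(double now, uint64_t count)
    {
        if (m_primed && count >= m_last_count && now >= m_last_time) {
            m_time_sum = m_time_sum * m_decay + (now - m_last_time);
            m_count_sum = m_count_sum * m_decay + static_cast<double>(count - m_last_count);
        }
        // If the counter went backwards, nothing is accumulated and the
        // estimator simply restarts from the new value.
        m_last_time = now;
        m_last_count = count;
        m_primed = true;
    }

    // Estimated events per second (0 until two samples have been seen).
    double rate() const
    {
        return m_time_sum > 0 ? m_count_sum / m_time_sum : 0;
    }

    // Polling delay for the given number of future events, bounded by
    // the min and max delay parameters.
    double seconds_for(uint64_t events) const
    {
        const double r = rate();
        const double delay = r > 0 ? static_cast<double>(events) / r : m_max_delay;
        return std::max(m_min_delay, std::min(delay, m_max_delay));
    }
};
```

With samples of 0 events at t=0 and 2 events at t=8, the estimated rate is 0.25 events per second, so waiting for one more event suggests a 4 second delay, subject to the min and max bounds.
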
Zygo Blaxell
a3f02d5dec roots: comment updates and general cleanup
Fix discussion of nodatasum files, clarifying what we can and cannot do.

Get rid of some BEESNOTE and BEESTRACE calls which cannot be observed
(well, BEESNOTE can, but you have to be quick!).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
f6909dac17 bees: drop BEESINFO
Having too many "write a message to the log" primitives is confusing,
and having one that intermittently and silently discards output is even
_more_ confusing.

Replace all BEESINFO with appropriate BEESLOG*s.  Usually DEBUG.
Except for one or two that occur too often.  Just delete those.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
bd2a15733c README: update Linux kernel bugs list (v4.14)
Add the new WARN_ON bug in v4.14.

Clarify what happens when bees is run on a kernel that is too old.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:05 -05:00
Zygo Blaxell
4ecd467ca0 BeesBlockData: don't leak file contents in the log
The data field of BeesBlockData is only interesting to those who want
to debug the BeesBlockData implementation or other battle-tested parts
of bees.  Users who want to do this can modify and rebuild the source
to enable the output.

To everyone else, the data field is a huge, ongoing infoleak through
the log.

Don't bother with an option, just output the length of the data field
and nothing else.

Fixes:  https://github.com/Zygo/bees/issues/53

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
71be53eff6 types: don't throw an exception when it's likely we are already reporting an exception
Empty files are a thing that can happen.  Don't bomb out just reporting
one's existence.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
67ac537c5e time: drop unused Timer methods
Timer::set(double d) in particular seems...wrong.

Nothing uses them, so don't bother to fix them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
f64fc78e36 Task: convert print_fn to a string
Since we are now unconditionally rendering the print_fn as a static
string, there is no need for it to be a function.  We also need it to
be brief and mostly constant.

Use a string instead.  Put the string before the function in the Task
constructor arguments so that the title string appears as a heading in
code, since we are making a breaking API change already.

Drop TASK_MACRO as it is broken by this change, but there is no similar
usage of Task anywhere to make it worth fixing.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
0710208354 BeesNote: thread naming fixes
Move pthread_setname_np to the same place we do pthread_getname_np.

Detect errors in pthread_getname_np--but don't throw an exception
because we would call ourself recursively from the exception handler
when it tries to log the exception.

Fix the order of set_name and the first BEESNOTE/BEESLOG call in threads,
closing small time intervals where logs have the wrong thread name,
and that wrong name becomes persistent for the thread.

Make the main thread's name "bees" because Linux kernel stack traces use
the pthread name of the main thread instead of the name of the process.

Anonymous threads get the process name (usually "bees").  We should not
have any such threads, but we do.  This appears to occur mostly during
exception stack unwinding.  GCC/pthread bug?

Fixes:  https://github.com/Zygo/bees/issues/51

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:47:47 -05:00
Kai Krakow
c17618c371 README: Some things are simply no longer true
Environment variables are no longer the /only/ option.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:47:04 -05:00
Kai Krakow
dee6f189bb README: Fix markdown syntax error
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:47:04 -05:00
Kai Krakow
de6d7d6f25 Makefile: Get rid of test for-loop
Tests can now be run in parallel. Additionally, single tests can be
run by simply using "make testname", e.g. "make chatter" would run the
chatter test.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:44:27 -05:00
Kai Krakow
63f249f005 Makefile: force rebuilding tests when Makefile changed
Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:43:37 -05:00
Kai Krakow
ca1a3bed12 Makefile: -lXXXXX is really a filename parameter
According to gcc docs, -l is converted to a filename which makes it a
filename parameter. Let's move it to the end.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:43:21 -05:00
Kai Krakow
d6312c338b Logging: Improve text layout when discarding log timestamps
When timestamps are removed from logging, the current text layout shows
lines like

tid 12345 thread_name: Example log

Let's convert it to a more conforming layout:

thread_name[12345]: Example log

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-20 14:42:49 -05:00
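
Editorial sketch for d6312c338b: the two layouts quoted in the commit message can be compared side by side with a pair of small formatting helpers. The function names old_prefix and new_prefix are hypothetical and are not taken from the bees logging code.

```cpp
#include <cassert>
#include <string>

// Hypothetical helpers (not the bees logging code): build the prefix of a
// log line that carries no timestamp.

// Old layout: "tid 12345 thread_name: "
std::string old_prefix(const std::string &thread_name, long tid)
{
    assert(tid > 0);  // Linux thread IDs are positive
    return "tid " + std::to_string(tid) + " " + thread_name + ": ";
}

// New layout: "thread_name[12345]: "
std::string new_prefix(const std::string &thread_name, long tid)
{
    assert(tid > 0);  // Linux thread IDs are positive
    return thread_name + "[" + std::to_string(tid) + "]: ";
}
```

The new layout matches the familiar name[pid]: shape used by syslog-style log lines, which is presumably what "more conforming" refers to.
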
Zygo Blaxell
5533d09b3d Merge remote-tracking branch 'kakra/proposal/prepare-for-more-libs'
2018-01-20 14:23:55 -05:00
Zygo Blaxell
4c05c53d28 roots: update Task print functions for new usage
This restores the old "crawl" prefix in the case of Crawler log messages.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 14:00:52 -05:00
Zygo Blaxell
5063a635fc logging: get Task names for log messages
When a Task worker thread is executing a Task, the thread name is less
useful than the Task description.

Use the Task description instead of the thread name if the thread has
no BeesThread name and the thread is currently executing a task.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 14:00:51 -05:00
Zygo Blaxell
fef7aed8fa BeesNote: if thread name was not set, get it from Task or pthread_getname_np
Threads from the Task module in libcrucible don't set BeesNote::tl_name.
Even if they did, in Task context the thread name is unspecific to the point
of meaninglessness.

Use the Task::print method as the name for such threads, and be sure
that future Task print functions are designed for that usage.

The extra complexity in BeesNote::get_name() seems preferable to
bombarding pthread_setname_np hundreds or thousands of times per second.

FIXME:  we are now calling Task::print() on every BeesNote, which
is effectively unconditional.  Maybe we should have Task::print()
and get_name() return a closure, or just evaluate Task::print() once
and cache it in TaskState, or define Task's constructor with a string
argument instead of the current print_fn closure.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:57:51 -05:00
Zygo Blaxell
3f60a0efde task: allow external access to Task print function
This enables bees' thread introspection to use task descriptions in
status and log messages.

BeesNote will be calling Task::current_task() from non-Task contexts,
which means we need to allow Task's shared state pointer to be null.
Remove some asserts that will ruin our day in that case.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:51:05 -05:00
Zygo Blaxell
e970ac6c02 crawl: make logging less verbose
Silence the three(!) log messages per crawl increment and an extra one at
the end of the subvol.

The three critical messages per subvol crawl cycle are:

	Next transid in BeesCrawlState <SUBVOL>:0 offset 0x0 transid <A>..<B> started <T> (<AGO>s ago)

Subvol has been completely scanned and a new transaction range will
be created.  CrawlState is the state of the old subvol.

	Restarted crawl BeesCrawlState <SUBVOL>:0 offset 0x0 transid <B>..<C> started <T+AGO> (0s ago)

Subvol has been restarted.  CrawlState is the state of the new subvol.

	Deferring next transid in BeesCrawlState <SUBVOL>:0 offset 0x0 transid <B>..<C> started <T+AGO> (0s ago)

Subvol has been completely scanned, but it is too soon to start a
new scan.

Fix the "Restart..." message to use the correct verb tense and to use
the correct BeesCrawlState data.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:50:47 -05:00
Zygo Blaxell
38ccf5c921 counters: track pair growing time
When we find a matching block we attempt to extend ("grow") the matched
pair around the first matching block.  This function takes the IO hit of
reading the second extent from each duplicate extent pair.  It's also
very slow--too many allocations, too small reads, reads in the wrong
order, an order of magnitude too many calls to TREE_SEARCH_V2, and it
is usually in the top 3 most frequent PERFORMANCE warnings.

Start tracking the running time of grows using the pairforward_ms
and pairbackward_ms counters so that we can compare it to various
replacements.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:04:56 -05:00
Kai Krakow
826b27fde2 Makefile: Fix some dependencies
Some deps are already referenced by depends.mk, some were actually
missing.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-19 01:50:13 +01:00
Kai Krakow
8a5f790a03 Makefile: Some cleanups
Reorder and reformat some arguments so it looks more streamlined during
the build process.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-19 01:50:13 +01:00
Kai Krakow
677da5de45 Logging: Add log levels to output
This commit adds log levels to the output. In systemd, it makes colored
lines, otherwise it's probably just a number. Bees is very chatty, so
this paves the road for log level filtering.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 23:41:29 +01:00
Kai Krakow
d6b847db0d Makefile: speedup dependency generation
Dependencies can be generated in parallel which can be much faster. It
also avoids the problem that the for loop may fail multiple times in a
row, leaving behind a broken intermediate file which would be picked up
by subsequent runs.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
b8f933d360 Makefile: do not be verbose about mv
A small left-over from me fixing the same problem as Zygo did in his
merged branch.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
27b12821ee Makefile: Generalize the .version.cc target
This enables us to move the file around later.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
fdf434e8eb Makefile: fix dependency generation
Let's generalize the depends.mk target so we can easily move files
around later. While doing it, let's also fix the "gcc -M" call to use
explicit target names and not clobber it with preprocessor output.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
bc1b67fde1 Makefile: rename OBJS to CRUCIBLE_OBJS
This paves the way for building different .so libs.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
4cfd5b43da Makefile: generalize .so target
We can generalize the .so target by moving its depends into rules
without build instructions.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
4789445d7b Makefile: .o already depends on its .h file
We can remove the explicit dependency on the .h file because that is
covered by depends.mk. Let's instead depend on makeflags which makes
more sense.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Kai Krakow
c8787fecd2 Makefile: depends.mk is not an optional include
We really need depends.mk in the following Makefile reorganization.

Signed-off-by: Kai Krakow <kai@kaishome.de>
2018-01-18 22:53:00 +01:00
Zygo Blaxell
4943a07cce crucible: cache: linked-list LRU implementation
We need a better cache expiration algorithm than "make a copy of
the entire thing, sort it while holding a lock, and delete half
the items in a single burst."

Replace the Lamport clock with a double-linked list.  Each insert
or lookup operation moves the affected item to the head of the list.
Each erase operation deletes one single item at the tail of the list.

Also sort out some iterator invalidation nonsense by doing erases before
inserts instead of "insert, erase, find the inserted item again because
we invalidated the found iterator during the erase."

The new implementation adds a second word-sized member to each Value
as well as a copy of the Key.  Hopefully the enlarged size is not
a deal-breaker.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:58:44 -05:00
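
Editorial sketch for 4943a07cce: the list-based expiry scheme described above can be illustrated with a std::list plus an index map. This is a generic LRU sketch under that assumption, not the crucible cache. The class name LruSketch and its method names are hypothetical, and there is no locking.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <map>
#include <utility>

// Hypothetical LRU cache (not the crucible implementation).  The list keeps
// items in recency order, head = most recently used; the map finds an item's
// list node by key.  Each list node holds a copy of the key so that the tail
// item can be removed from the map when it is evicted.
template <class Key, class Value>
class LruSketch {
    using Item = std::pair<Key, Value>;
    std::list<Item> m_list;
    std::map<Key, typename std::list<Item>::iterator> m_map;
    size_t m_max_size;
public:
    explicit LruSketch(size_t max_size) : m_max_size(max_size)
    {
        assert(max_size > 0);
    }

    // Returns a pointer to the cached value, or nullptr on a miss.
    // A hit moves the item to the head of the list.
    Value *lookup(const Key &key)
    {
        auto found = m_map.find(key);
        if (found == m_map.end()) {
            return nullptr;
        }
        m_list.splice(m_list.begin(), m_list, found->second);
        return &found->second->second;
    }

    // Insert or overwrite; either way the item ends up at the head.
    void insert(const Key &key, const Value &value)
    {
        auto found = m_map.find(key);
        if (found != m_map.end()) {
            found->second->second = value;
            m_list.splice(m_list.begin(), m_list, found->second);
            return;
        }
        // Erase before insert: evict exactly one item from the tail, so no
        // iterator to the newly inserted item can be invalidated.
        if (m_map.size() >= m_max_size) {
            m_map.erase(m_list.back().first);
            m_list.pop_back();
        }
        m_list.emplace_front(key, value);
        m_map[key] = m_list.begin();
    }

    size_t size() const { return m_map.size(); }
};
```

std::list::splice relinks a node without copying it or invalidating iterators, which is what makes the move-to-head step cheap. Eviction touches one tail item per insert instead of deleting half the cache in a burst.
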
Zygo Blaxell
00d9b8ed76 hash: do the mlock after loading the table
The mlock runs much faster, probably because the hash fetches are
doing most of the work that mlock does.

It makes bees startup latency for testing smaller, even if it takes more
time in absolute terms.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:58:44 -05:00
Zygo Blaxell
e8b4ab54c6 README: describe the scanning mode (-m option)
Include a brief description of the two algorithms without getting
into too much detail for an ostensibly temporary feature.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:58:44 -05:00
Zygo Blaxell
56c23c4517 crawl: implement two crawler algorithms and adjust scheduling parameters
There are two subvol scan algorithms implemented so far.  The two modes
are unimaginatively named 0 and 1.

	0:  sorts extents by (inode, subvol, offset),

	1:  scans extents round-robin from all subvols.

Algorithm 0 scans references to the same extent at close to the same
time, which is good for performance; however, whenever a snapshot is
created, the scan of the entire filesystem restarts at the beginning of
the new snapshot.

Algorithm 1 makes continuous forward progress even when new snapshots
are created, but it does not benefit from caching and will force the
kernel to reread data multiple times when there are snapshots.

The algorithm can be selected at run-time using the -m or --scan-mode
option.

We can collect some field data on these before replacing them with
an extent-tree-based scanner.  Alternatively, for pre-4.14 kernels,
we can keep these two modes as non-default options.

Currently these algorithms have terrible names.  TODO:  fix that, but
also TODO: delete all that code and do scans directly from the extent
tree instead.

Augment the scan algorithms relative to their earlier implementation by
batching multiple extents to scan from each subvol before switching to
a different subvol.

Sprinkle some BEESNOTEs on the Task objects so that they don't
disappear from the thread status output.

Adjust some timing constants to deal with the increased latency from
competing threads.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:53:49 -05:00
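
Editorial sketch for 56c23c4517: the two orderings named in the commit message, plus the batching it mentions, can be illustrated on in-memory data as below. This is not the bees crawler code. The ExtentRef struct, the function names order_mode0 and order_mode1, and the batch parameter are all hypothetical, and the sketch assumes the extent references are already available in memory rather than fetched incrementally from the filesystem.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <tuple>
#include <vector>

// Hypothetical record for one extent reference found in a subvol.
struct ExtentRef {
    uint64_t subvol;
    uint64_t inode;
    uint64_t offset;
};

// Mode 0 ordering: sort by (inode, subvol, offset), so references to the
// same inode in different snapshots end up next to each other.
std::vector<ExtentRef> order_mode0(std::vector<ExtentRef> refs)
{
    std::sort(refs.begin(), refs.end(), [](const ExtentRef &a, const ExtentRef &b) {
        return std::tie(a.inode, a.subvol, a.offset) < std::tie(b.inode, b.subvol, b.offset);
    });
    return refs;
}

// Mode 1 ordering: visit the subvols round-robin, taking up to `batch`
// references from each subvol before switching to the next one.
std::vector<ExtentRef> order_mode1(std::vector<std::deque<ExtentRef>> per_subvol, size_t batch)
{
    assert(batch > 0);
    std::vector<ExtentRef> out;
    bool progress = true;
    while (progress) {
        progress = false;
        for (auto &queue : per_subvol) {
            for (size_t i = 0; i < batch && !queue.empty(); ++i) {
                out.push_back(queue.front());
                queue.pop_front();
                progress = true;
            }
        }
    }
    return out;
}
```

In this sketch a new, fully populated snapshot changes the mode 0 output everywhere, because its references interleave with the existing ones. In mode 1 it only adds one more queue to the rotation, which matches the trade-off described in the commit message.
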
Zygo Blaxell
055c8d4c75 roots: scan in parallel using Tasks
Distribute incoming extents across a thread pool for faster execution
on multi-core, multi-disk environments.

Switch extent enumeration model to scan extent refs consecutively(ish).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:52:00 -05:00
Zygo Blaxell
090d79e13b crucible: remove unused TimeQueue and WorkQueue classes
WorkQueue is superseded by Task.  TimeQueue will be replaced by
something based on Tasks.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:52:00 -05:00
Zygo Blaxell
796aaed7f8 roots: remove dead code and #if blocks
In both instances the code contained within (or the conditional
compilation surrounding it) is no longer controversial.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:52:00 -05:00