1
0
mirror of https://github.com/Zygo/bees.git synced 2025-06-17 01:56:16 +02:00

roots: quick fix for task scheduling bug leading to loss of crawl_master

The crawl_master task had a simple atomic variable that was supposed
to prevent duplicate crawl_master tasks from ending up in the queue;
however, this had a race condition that could lead to m_task_running
being set with no crawl_master task running to clear it.  This would in
turn prevent crawl_thread from scheduling any further crawl_master tasks,
and bees would eventually stop doing any more work.

A proper fix is to modify the Task class and its friends such that
Task::run() guarantees that 1) at most one instance of a Task is ever
scheduled or running at any time, and 2) if a Task is scheduled while
an instance of the Task is running, the scheduling is deferred until
after the current instance completes.  This is part of a fairly large
planned change set, but it's not ready to push now.

So instead, unconditionally push a new crawl_master Task into the queue
on every poll, then silently and quickly exit if the queue is too full
or the supply of new extents is empty.  Drop the scheduling-related
members of BeesRoots as they will not be needed when the proper fix lands.

Fixes: 4f0bc78a "crawl: don't block a Task waiting for new transids"
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
This commit is contained in:
Zygo Blaxell
2018-11-24 00:35:16 -05:00
parent f051d96d51
commit f4464c6896
2 changed files with 11 additions and 11 deletions

View File

@ -525,7 +525,6 @@ class BeesRoots : public enable_shared_from_this<BeesRoots> {
BeesThread m_writeback_thread;
RateEstimator m_transid_re;
size_t m_transid_factor = BEES_TRANSID_FACTOR;
atomic<bool> m_task_running;
Task m_crawl_task;
bool m_workaround_btrfs_send = false;
LRUCache<bool, uint64_t> m_root_ro_cache;