mirror of https://github.com/Zygo/bees.git (synced 2025-05-17 21:35:45 +02:00)
context: serialize LOGICAL_INO calls
LOGICAL_INO can trip over the btrfs slow-backrefs bug, resulting in some very long in-kernel runtimes. If too many threads are executing LOGICAL_INO then there may be no cores left on the system to run other tasks.

Toxic extent detection is done by a very rudimentary algorithm which can be confused by unrelated sources of latency within btrfs (especially commit latency). The algorithm can also be confused by other threads executing the LOGICAL_INO ioctl.

These are two good reasons to prevent any two threads in a single bees process instance from executing LOGICAL_INO at the same time, so let's do that.

It is possible to limit the number of threads executing LOGICAL_INO with the -c and -C options; however, this also limits the number of threads which can perform any operation, while only LOGICAL_INO (*) has such a profound effect on the rest of system operation.

Also make the status message clearer about exactly when LOGICAL_INO is executed, as opposed to merely waiting to acquire a lock before executing the ioctl.

(*) or maybe FILE_EXTENT_SAME. The problem function that keeps showing up in kernel stack traces is find_parent_nodes, which is called by both the LOGICAL_INO and FILE_EXTENT_SAME ioctls. We'll try this change first and see if it prevents any recurrences of forced watchdog reboots; if it does not, then we'll limit FILE_EXTENT_SAME the same way.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
parent 373b9ef038
commit 63ddbb9a4f
@@ -761,6 +761,15 @@ BeesContext::resolve_addr_uncached(BeesAddress addr)
 {
 	THROW_CHECK1(invalid_argument, addr, !addr.is_magic());
 	THROW_CHECK0(invalid_argument, !!root_fd());
+
+	// There can be only one of these running at a time, or the slow
+	// backrefs bug will kill the whole system. Also it looks like there
+	// are so many locks held while LOGICAL_INO runs that there is no
+	// point in trying to run two of them on the same filesystem.
+	BEESNOTE("waiting to resolve addr " << addr);
+	static mutex s_resolve_mutex;
+	unique_lock<mutex> lock(s_resolve_mutex);
+
 	Timer resolve_timer;
 
 	// There is no performance benefit if we restrict the buffer size.
@@ -768,6 +777,7 @@ BeesContext::resolve_addr_uncached(BeesAddress addr)
 
 	{
 		BEESTOOLONG("Resolving addr " << addr << " in " << root_path() << " refs " << log_ino.m_iors.size());
+		BEESNOTE("resolving addr " << addr << " with LOGICAL_INO");
 		if (log_ino.do_ioctl_nothrow(root_fd())) {
 			BEESCOUNT(resolve_ok);
 		} else {
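The locking pattern in this change can be tried in isolation. The following is a minimal standalone sketch, not bees code: `slow_operation`, `resolve_serialized`, `run_workers`, and the two counters are hypothetical names standing in for the LOGICAL_INO ioctl and its caller. It assumes only that a function-local static mutex is shared by every thread in the process, and it starts its clock after the lock is acquired, in the same order that the patch places `Timer resolve_timer` after the `unique_lock`.

```cpp
#include <atomic>
#include <chrono>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical counters used only to observe concurrency in this sketch.
static std::atomic<int> g_running{0};
static std::atomic<int> g_max_running{0};

// Hypothetical stand-in for the expensive ioctl; not part of bees.
static void slow_operation()
{
	const int now = ++g_running;
	// Record the highest concurrency level ever observed.
	int seen = g_max_running.load();
	while (now > seen && !g_max_running.compare_exchange_weak(seen, now)) {}
	std::this_thread::sleep_for(std::chrono::milliseconds(5));
	--g_running;
}

// Same shape as the patched function: the static mutex is shared by all
// callers, so at most one thread is inside slow_operation() at a time.
static std::chrono::steady_clock::duration resolve_serialized()
{
	static std::mutex s_mutex;
	std::unique_lock<std::mutex> lock(s_mutex);
	// Start timing only after the lock is held, so time spent waiting
	// for other threads is not charged to this call.
	const auto start = std::chrono::steady_clock::now();
	slow_operation();
	return std::chrono::steady_clock::now() - start;
}

// Launch `count` threads that all call resolve_serialized() and return
// the highest number of threads seen inside slow_operation() at once.
static int run_workers(int count)
{
	std::vector<std::thread> workers;
	for (int i = 0; i < count; ++i) {
		workers.emplace_back([] { resolve_serialized(); });
	}
	for (auto &t : workers) {
		t.join();
	}
	return g_max_running.load();
}
```

Calling `run_workers(8)` from a `main` function should report a maximum concurrency of 1 however many workers are started. Because the mutex is static to the function, it serializes callers across the whole process rather than per object or per filesystem, which is the scope the commit message asks for.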