mirror of https://github.com/Zygo/bees.git synced 2025-07-31 21:13:27 +02:00

10 Commits

Author SHA1 Message Date
Zygo Blaxell
ba11d733c0 readahead: flush the readahead cache based on time, not extent count
If the extent wasn't read in the last second, chances are high that
it was evicted from the page cache.  If the extents have been evicted
from the cache by the time we grow or dedupe them, we'll take a serious
performance hit as we read them back in, one page at a time.

Use a 5-second delay to match the default writeback interval.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-22 00:06:11 -04:00
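The time-based flush described in this commit can be sketched roughly as follows. This is a minimal illustration, not the bees implementation: `RecentReads`, `should_read`, and the `(offset, size)` key are hypothetical names, and the 5-second limit is passed in by the caller.

```cpp
#include <chrono>
#include <cstdint>
#include <mutex>
#include <set>
#include <tuple>

// Hypothetical stand-in for a "recently read" set that is flushed by age
// rather than by element count.
class RecentReads {
public:
    using Clock = std::chrono::steady_clock;
    using Key = std::tuple<std::int64_t, std::uint64_t>; // (offset, size)

    explicit RecentReads(std::chrono::duration<double> max_age) :
        m_max_age(max_age), m_last_flush(Clock::now()) {}

    // Returns true if the range has not been seen since the last flush,
    // i.e. the caller should issue the read.
    bool should_read(const Key &key, Clock::time_point now = Clock::now()) {
        std::lock_guard<std::mutex> lock(m_mutex);
        if (now - m_last_flush > m_max_age) {
            // Entries this old may already be gone from the page cache,
            // so forget everything and start over.
            m_recent.clear();
            m_last_flush = now;
        }
        return m_recent.insert(key).second;
    }

private:
    std::chrono::duration<double> m_max_age;
    Clock::time_point m_last_flush;
    std::mutex m_mutex;
    std::set<Key> m_recent;
};
```

Passing `now` explicitly keeps the sketch easy to exercise without sleeping; a real caller would normally rely on the default argument.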
Zygo Blaxell
e87f6e9649 readahead: ignore large and unproductive readahead requests
Sometimes there are absurdly large readahead requests (e.g. 32G),
which tie up a thread holding the readahead lock for a long time (not
to mention the read IO that hammers the rest of the system).

These are likely an artifact of the legacy ExtentWalker code interacting
with concurrent filesystem changes.

The maximum btrfs extent size is 128M, so cap the length of readahead
requests at that size.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
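The cap described in this commit amounts to a single clamp. A minimal sketch, assuming a hypothetical helper `clamp_readahead_size` and constant `kMaxExtentSize` (neither name appears in bees):

```cpp
#include <algorithm>
#include <cstdint>

// Maximum btrfs extent size, per the commit message: 128 MiB.
constexpr std::uint64_t kMaxExtentSize = 128ULL * 1024 * 1024;

// Hypothetical helper: limit a readahead request to one maximum-size extent.
std::uint64_t clamp_readahead_size(std::uint64_t requested)
{
    return std::min(requested, kMaxExtentSize);
}
```

Under this sketch a 32 GiB request is reduced to 128 MiB, while a request smaller than the cap passes through unchanged.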
Zygo Blaxell
fb63bd7e06 c++20: implicit capture of `this` by `[=]` is deprecated in C++20
Fix the handful of instances.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
(cherry picked from commit 4d6b21fb40174c3ecdc9e97670dae0dd22ce74a6)
2025-07-21 21:21:54 -04:00
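For readers unfamiliar with the deprecation, here is a small sketch of the pattern. `Counter` and `make_incrementer` are hypothetical names used only for illustration.

```cpp
#include <functional>

struct Counter {
    int m_count = 0;

    std::function<void()> make_incrementer()
    {
        // Writing [=] here would capture `this` implicitly, which C++20
        // deprecates. Naming `this` in the capture list is explicit and
        // behaves the same way: the lambda holds a pointer to the object.
        return [this]() { ++m_count; };
    }
};
```

Because the lambda holds a pointer rather than a copy, the object must outlive every call to the returned function.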
Zygo Blaxell
27b5b4e113 roots: filter out NODATASUM files before attempting to scan them
Add a cheap check for `FS_NOCOW_FL` when we first encounter
each extent.  In the raw btrfs inode flags, the offending flag is
`BTRFS_INODE_NODATASUM`, because the restriction that prevents reflink
between datacow and "nodatacow" files is that a single inode is allowed
to have csums or not have csums, but must apply that choice to _all_
of its extents.

This extra check is cheaper than opening a file for each individual
reference to the extent, and then discovering that the file is
`FS_NOCOW_FL`, and then closing the file, over and over again.  It will
also avoid emitting a lot of noisy log messages.

Fixes: https://github.com/Zygo/bees/issues/313
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
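The check described in this commit can be sketched as below, under stated assumptions. The constant mirrors the `BTRFS_INODE_NODATASUM` bit value from this diff; `extent_is_scannable` and the vector of optional flags are hypothetical stand-ins for the inode lookups bees performs per reference.

```cpp
#include <cstdint>
#include <optional>
#include <vector>

// Same bit values as BTRFS_INODE_NODATASUM / BTRFS_INODE_NODATACOW in the diff.
constexpr std::uint64_t kInodeNoDataSum = 1U << 0;
constexpr std::uint64_t kInodeNoDataCow = 1U << 1;

// Hypothetical helper: each element is the inode flags of one reference to
// the extent, or nullopt if the inode item could not be found.
bool extent_is_scannable(const std::vector<std::optional<std::uint64_t>> &ref_inode_flags)
{
    for (const auto &flags : ref_inode_flags) {
        if (!flags) {
            continue; // no inode item for this ref; try the next one
        }
        // The first inode found decides for the whole extent, since every
        // inode sharing an extent must agree on having csums or not.
        return (*flags & kInodeNoDataSum) == 0;
    }
    return true; // no inode found, so nothing rules the extent out
}
```

Only the first successful lookup is consulted, which is what makes the check cheap compared with opening a file for every reference.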
Zygo Blaxell
e9e6870de8 fs: add btrfs_inode_flags_ntoa
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Zygo Blaxell
16e3dd7f60 btrfs: copy BTRFS_INODE_* flags to build on linux-libc-dev < 6.2
Yet another "this will build on every environment but yours" change.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Zygo Blaxell
c658831852 btrfs-tree: add support for inode flags
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Zygo Blaxell
e852e3998a openat2: LINUX_VERSION_CODE is defined by linux-libc-dev, not libc
With new kernel headers and old libc, `SYS_openat2` can still end up
undefined. That triggers the build-time fallback code, which doesn't build:

```
openat2.cc: In function 'int openat2(int, const char*, open_how*, size_t)':
openat2.cc:35:2: error: 'errno' was not declared in this scope
   35 |  errno = ENOSYS;
      |  ^~~~~
openat2.cc:24:1: note: 'errno' is defined in header '<cerrno>'; did you forget to '#include <cerrno>'?
   23 | #include <unistd.h>
  +++ |+#include <cerrno>
   24 |
openat2.cc:35:10: error: 'ENOSYS' was not declared in this scope
   35 |  errno = ENOSYS;
      |          ^~~~~~
openat2.cc:29:19: error: unused parameter 'dirfd' [-Werror=unused-parameter]
   29 | openat2(int const dirfd, const char *const pathname, struct open_how *const how, size_t const size)
      |         ~~~~~~~~~~^~~~~
openat2.cc:29:44: error: unused parameter 'pathname' [-Werror=unused-parameter]
   29 | openat2(int const dirfd, const char *const pathname, struct open_how *const how, size_t const size)
      |                          ~~~~~~~~~~~~~~~~~~^~~~~~~~
openat2.cc:29:77: error: unused parameter 'how' [-Werror=unused-parameter]
   29 | t dirfd, const char *const pathname, struct open_how *const how, size_t const size)
      |                                      ~~~~~~~~~~~~~~~~~~~~~~~^~~

openat2.cc:29:95: error: unused parameter 'size' [-Werror=unused-parameter]
   29 | st char *const pathname, struct open_how *const how, size_t const size)
      |                                                      ~~~~~~~~~~~~~^~~~
```

Skip the kernel version check and test for the definition of `SYS_openat2`
directly.  If it's not there, plug in the constant so we can send the
call directly to the kernel, bypassing libc completely.

Fixes: https://github.com/Zygo/bees/issues/318
Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
Zygo Blaxell
5c0480ec59 progress: calculate point along the range 000000..999999 to avoid 7-digit columns
With the "idle" tag moved out of the `point` column, a `point` value of
1000000 may become visible--and push the table one column to the right.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
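The arithmetic behind this change is easy to check in isolation. A minimal sketch, where `format_point` is a hypothetical helper and the input is assumed to be a normalized position in the range 0.0 to 1.0:

```cpp
#include <cmath>
#include <cstdio>
#include <string>

// Hypothetical helper: map a normalized position onto 000000..999999.
// Scaling by 999999 instead of 1000000 keeps an input of exactly 1.0
// within six digits.
std::string format_point(double bytenr_norm)
{
    char buf[16];
    std::snprintf(buf, sizeof buf, "%06d",
                  static_cast<int>(std::floor(bytenr_norm * 999999)));
    return buf;
}
```

With the old multiplier, an input of 1.0 would format as the seven-digit string `1000000`; with this one it formats as `999999`.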
Zygo Blaxell
1b8b7557b6 progress: base progress estimates on queued extents, not completed ones
This means the progress table in the status output reflects the state of
the newest task in the queue, not the oldest.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-07-21 21:21:54 -04:00
11 changed files with 117 additions and 40 deletions

View File

@@ -49,6 +49,7 @@ namespace crucible {
/// @}
/// @{ Inode items
uint64_t inode_flags() const;
uint64_t inode_size() const;
/// @}

View File

@@ -91,7 +91,23 @@ enum btrfs_compression_type {
#define BTRFS_UUID_KEY_SUBVOL 251
#define BTRFS_UUID_KEY_RECEIVED_SUBVOL 252
#define BTRFS_STRING_ITEM_KEY 253
#endif
// BTRFS_INODE_* was added to include/uapi/btrfs_tree.h in v6.2-rc1
#ifndef BTRFS_INODE_NODATASUM
#define BTRFS_INODE_NODATASUM (1U << 0)
#define BTRFS_INODE_NODATACOW (1U << 1)
#define BTRFS_INODE_READONLY (1U << 2)
#define BTRFS_INODE_NOCOMPRESS (1U << 3)
#define BTRFS_INODE_PREALLOC (1U << 4)
#define BTRFS_INODE_SYNC (1U << 5)
#define BTRFS_INODE_IMMUTABLE (1U << 6)
#define BTRFS_INODE_APPEND (1U << 7)
#define BTRFS_INODE_NODUMP (1U << 8)
#define BTRFS_INODE_NOATIME (1U << 9)
#define BTRFS_INODE_DIRSYNC (1U << 10)
#define BTRFS_INODE_COMPRESS (1U << 11)
#define BTRFS_INODE_ROOT_ITEM_INIT (1U << 31)
#endif
#ifndef BTRFS_FREE_SPACE_INFO_KEY

View File

@@ -208,6 +208,7 @@ namespace crucible {
ostream & operator<<(ostream &os, const BtrfsIoctlSearchKey &key);
string btrfs_chunk_type_ntoa(uint64_t type);
string btrfs_inode_flags_ntoa(uint64_t inode_flags);
string btrfs_search_type_ntoa(unsigned type);
string btrfs_search_objectid_ntoa(uint64_t objectid);
string btrfs_compress_type_ntoa(uint8_t type);

View File

@@ -157,6 +157,13 @@ namespace crucible {
return btrfs_get_member(&btrfs_inode_item::size, m_data);
}
uint64_t
BtrfsTreeItem::inode_flags() const
{
THROW_CHECK1(invalid_argument, btrfs_search_type_ntoa(m_type), m_type == BTRFS_INODE_ITEM_KEY);
return btrfs_get_member(&btrfs_inode_item::flags, m_data);
}
uint64_t
BtrfsTreeItem::file_extent_logical_bytes() const
{

View File

@@ -987,6 +987,28 @@ namespace crucible {
return bits_ntoa(objectid, table);
}
string
btrfs_inode_flags_ntoa(uint64_t const inode_flags)
{
static const bits_ntoa_table table[] = {
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NODATASUM),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NODATACOW),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_READONLY),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NOCOMPRESS),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_PREALLOC),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_SYNC),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_IMMUTABLE),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_APPEND),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NODUMP),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_NOATIME),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_DIRSYNC),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_COMPRESS),
NTOA_TABLE_ENTRY_BITS(BTRFS_INODE_ROOT_ITEM_INIT),
NTOA_TABLE_ENTRY_END()
};
return bits_ntoa(inode_flags, table);
}
ostream &
operator<<(ostream &os, const btrfs_ioctl_search_key &key)
{

View File

@@ -4,9 +4,7 @@
// Compatibility for building on old libc for new kernel
#if LINUX_VERSION_CODE < KERNEL_VERSION(5, 6, 0)
// Every arch that defines this uses 437, except Alpha, where 437 is
// Every arch that defines this (so far) uses 437, except Alpha, where 437 is
// mq_getsetattr.
#ifndef SYS_openat2
@@ -17,8 +15,6 @@
#endif
#endif
#endif // Linux version >= v5.6
#include <fcntl.h>
#include <unistd.h>
@@ -29,12 +25,7 @@ __attribute__((weak))
openat2(int const dirfd, const char *const pathname, struct open_how *const how, size_t const size)
throw()
{
#ifdef SYS_openat2
return syscall(SYS_openat2, dirfd, pathname, how, size);
#else
errno = ENOSYS;
return -1;
#endif
}
};

View File

@@ -754,7 +754,7 @@ namespace crucible {
m_prev_loadavg = getloadavg1();
if (target && !m_load_tracking_thread) {
m_load_tracking_thread = make_shared<thread>([=] () { loadavg_thread_fn(); });
m_load_tracking_thread = make_shared<thread>([this] () { loadavg_thread_fn(); });
m_load_tracking_thread->detach();
}
}
@@ -944,7 +944,7 @@ namespace crucible {
TaskConsumer::TaskConsumer(const shared_ptr<TaskMasterState> &tms) :
m_master(tms)
{
m_thread = make_shared<thread>([=](){ consumer_thread(); });
m_thread = make_shared<thread>([this](){ consumer_thread(); });
}
class BarrierState {

View File

@@ -1126,15 +1126,15 @@ BeesContext::start()
m_progress_thread = make_shared<BeesThread>("progress_report");
m_progress_thread = make_shared<BeesThread>("progress_report");
m_status_thread = make_shared<BeesThread>("status_report");
m_progress_thread->exec([=]() {
m_progress_thread->exec([this]() {
show_progress();
});
m_status_thread->exec([=]() {
m_status_thread->exec([this]() {
dump_status();
});
// Set up temporary file pool
m_tmpfile_pool.generator([=]() -> shared_ptr<BeesTempFile> {
m_tmpfile_pool.generator([this]() -> shared_ptr<BeesTempFile> {
return make_shared<BeesTempFile>(shared_from_this());
});
m_logical_ino_pool.generator([]() {

View File

@@ -183,26 +183,41 @@ BeesScanModeSubvol::crawl_one_inode(const shared_ptr<BeesCrawl>& this_crawl)
}
const auto subvol = this_range.fid().root();
const auto inode = this_range.fid().ino();
ostringstream oss;
oss << "crawl_" << subvol << "_" << inode;
const auto task_title = oss.str();
const auto bfc = make_shared<BeesFileCrawl>((BeesFileCrawl) {
.m_ctx = m_ctx,
.m_crawl = this_crawl,
.m_roots = m_roots,
.m_hold = this_crawl->hold_state(this_state),
.m_state = this_state,
.m_offset = this_range.begin(),
});
BEESNOTE("Starting task " << this_range);
Task(task_title, [bfc]() {
BEESNOTE("crawl_one_inode " << bfc->m_hold->get());
if (bfc->scan_one_ref()) {
// Append the current task to itself to make
// sure we keep a worker processing this file
Task::current_task().append(Task::current_task());
bool run_the_task = false;
catch_all([&]() {
BtrfsInodeFetcher inode_btf(m_ctx->root_fd());
const auto inode_item = inode_btf.stat(subvol, inode);
if (!!inode_item) {
const auto flags = inode_item.inode_flags();
if (0 != (flags & BTRFS_INODE_NODATASUM)) {
BEESLOGDEBUG("unsupported inode flags for ref at root " << subvol << " ino " << inode << ": " << btrfs_inode_flags_ntoa(flags));
} else {
run_the_task = true;
}
}
}).run();
});
if (run_the_task) {
ostringstream oss;
oss << "crawl_" << subvol << "_" << inode;
const auto task_title = oss.str();
const auto bfc = make_shared<BeesFileCrawl>((BeesFileCrawl) {
.m_ctx = m_ctx,
.m_crawl = this_crawl,
.m_roots = m_roots,
.m_hold = this_crawl->hold_state(this_state),
.m_state = this_state,
.m_offset = this_range.begin(),
});
BEESNOTE("Starting task " << this_range);
Task(task_title, [bfc]() {
BEESNOTE("crawl_one_inode " << bfc->m_hold->get());
if (bfc->scan_one_ref()) {
// Append the current task to itself to make
// sure we keep a worker processing this file
Task::current_task().append(Task::current_task());
}
}).run();
}
auto next_state = this_state;
// Skip to EOF. Will repeat up to 16 times if there happens to be an extent at 16EB,
// which would be a neat trick given that off64_t is signed.
@@ -780,10 +795,27 @@ BeesScanModeExtent::SizeTier::create_extent_map(const uint64_t bytenr, const Pro
}
BtrfsExtentDataFetcher bedf(m_ctx->root_fd());
BtrfsInodeFetcher inode_btf(m_ctx->root_fd());
const auto refs_list = make_shared<list<ExtentRef>>();
bool found_nocow = false;
bool check_nocow = true;
for (const auto &i : log_ino.m_iors) {
catch_all([&](){
if (check_nocow) {
BEESTRACE("checking inode flags for extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum);
BEESNOTE("checking inode flags for extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum);
const auto inode_item = inode_btf.stat(i.m_root, i.m_inum);
if (!!inode_item) {
const auto flags = inode_item.inode_flags();
check_nocow = false;
if (0 != (flags & BTRFS_INODE_NODATASUM)) {
BEESLOGDEBUG("unsupported inode flags for extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum << ": " << btrfs_inode_flags_ntoa(flags));
found_nocow = true;
return; // from the catch_all
}
}
}
BEESTRACE("mapping extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum << " offset " << to_hex(i.m_offset));
BEESNOTE("mapping extent " << to_hex(bytenr) << " ref at root " << i.m_root << " ino " << i.m_inum << " offset " << to_hex(i.m_offset));
@@ -808,6 +840,11 @@ BeesScanModeExtent::SizeTier::create_extent_map(const uint64_t bytenr, const Pro
refs_list->push_back(extref);
BEESCOUNT(extent_ref_ok);
});
// Completely abandon the extent if it is nodatasum
if (found_nocow) {
BEESCOUNT(extent_nodatasum);
return;
}
}
BEESCOUNT(extent_mapped);
@@ -1305,8 +1342,8 @@ BeesScanModeExtent::next_transid()
const auto this_crawl = found->second->crawl();
THROW_CHECK1(runtime_error, subvol, this_crawl);
// Get the last _completed_ state
const auto this_state = this_crawl->get_state_begin();
// Get the last _queued_ state
const auto this_state = this_crawl->get_state_end();
auto bytenr = this_state.m_objectid;
const auto bg_found = bg_info_map.lower_bound(bytenr);
@@ -1347,7 +1384,7 @@ BeesScanModeExtent::next_transid()
}
const auto &mma = mes.m_map.at(subvol);
const auto mma_ratio = mes_sample_size_ok ? (mma.m_bytes / double(mes.m_total)) : 1.0;
const auto posn_text = Table::Text(astringprintf("%06d", int(floor(bytenr_norm * 1000000))));
const auto posn_text = Table::Text(astringprintf("%06d", int(floor(bytenr_norm * 999999))));
const auto size_text = Table::Text( mes_sample_size_ok ? pretty(fs_size * mma_ratio) : "-");
eta.insert_row(Table::endpos, vector<Table::Content> {
Table::Text(magic.m_max_size == numeric_limits<uint64_t>::max() ? "max" : pretty(magic.m_max_size)),

View File

@@ -14,7 +14,7 @@ BeesThread::exec(function<void()> func)
{
m_timer.reset();
BEESLOGDEBUG("BeesThread exec " << m_name);
m_thread_ptr = make_shared<thread>([=]() {
m_thread_ptr = make_shared<thread>([this, func]() {
BeesNote::set_name(m_name);
BEESLOGDEBUG("Starting thread " << m_name);
BEESNOTE("thread function");

View File

@@ -228,8 +228,10 @@ bees_readahead_check(int const fd, off_t const offset, size_t const size)
auto tup = make_tuple(offset, size, stat_rv.st_dev, stat_rv.st_ino);
static mutex s_recent_mutex;
static set<decltype(tup)> s_recent;
static Timer s_recent_timer;
unique_lock<mutex> lock(s_recent_mutex);
if (s_recent.size() > BEES_MAX_EXTENT_REF_COUNT) {
if (s_recent_timer.age() > 5.0) {
s_recent_timer.reset();
s_recent.clear();
BEESCOUNT(readahead_clear);
}
@@ -253,7 +255,7 @@ bees_readahead_nolock(int const fd, const off_t offset, const size_t size)
// The btrfs kernel code does readahead with lower ioprio
// and might discard the readahead request entirely.
BEESNOTE("emulating readahead " << name_fd(fd) << " offset " << to_hex(offset) << " len " << pretty(size));
auto working_size = size;
auto working_size = min(size, uint64_t(128 * 1024 * 1024));
auto working_offset = offset;
while (working_size) {
// don't care about multithreaded writes to this buffer--it is garbage anyway