mirror of
https://github.com/Zygo/bees.git
synced 2025-07-05 18:12:27 +02:00
crawl: change scan order to make forward progress at all times
Previously, the scan order processed each subvol in order. This required very large amounts of temporary disk space, as a full filesystem scan was required before any shared extents could be deduped. If the hash table RAM was underprovisioned this would mean some shared dup blocks were removed from the hash table before they could be deduped.

Currently the scan order takes the first unscanned extent from each subvol. This works well if--and only if--the subvols are either empty or children of a common ancestor. It forces the same inode/offset pairs to be read at close to the same time from each subvol.

When a new snapshot is created, this ordering diverts scanning to the new subvol until it catches up to the existing subvols. For large filesystems with frequent snapshot creation this means that the scanner never reaches the end of all subvols. Each new subvol effectively resets the current scan position for the entire filesystem to zero. This prevents bees from ever completing the first filesystem scan.

Change the order again, so that we now read one unscanned extent from each subvol in round-robin fashion. When a new subvol is created, we share scan time between old and new subvols. This ensures we eventually finish scanning initial subvols and enter the incremental scanning state.

The cost of this change is more repeated reading of shared extents at scan time with less benefit from disk-device-level caching; however, the only way to really fix this problem is to implement scanning on tree 2 (the btrfs extent tree) instead of the subvol trees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
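The difference between the two most recent orderings can be illustrated with a small model. This is only a sketch under stated assumptions, not the code bees uses: `SubvolScan`, `pick_lowest_position`, `pick_round_robin`, and `NO_SUBVOL` are hypothetical names, and scan positions are reduced to plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model of per-subvol scan state (not a bees type).
struct SubvolScan {
	uint64_t next_extent;  // first unscanned position in this subvol
	uint64_t end;          // one past the last position to scan
	bool done() const { return next_extent >= end; }
};

// Sentinel returned when every subvol has been fully scanned.
const size_t NO_SUBVOL = static_cast<size_t>(-1);

// Model of the earlier ordering: always pick the subvol with the lowest
// scan position. A new snapshot starts at zero, so it wins every pick
// until it has caught up with the older subvols.
size_t pick_lowest_position(const std::vector<SubvolScan> &subvols)
{
	size_t best = NO_SUBVOL;
	for (size_t i = 0; i < subvols.size(); ++i) {
		if (subvols[i].done()) continue;
		if (best == NO_SUBVOL || subvols[i].next_extent < subvols[best].next_extent) {
			best = i;
		}
	}
	return best;
}

// Model of the round-robin ordering: rotate through the subvols, skipping
// finished ones, so old and new subvols share scan time.
size_t pick_round_robin(const std::vector<SubvolScan> &subvols, size_t &cursor)
{
	for (size_t n = 0; n < subvols.size(); ++n) {
		const size_t i = (cursor + n) % subvols.size();
		if (!subvols[i].done()) {
			cursor = (i + 1) % subvols.size();  // resume after this subvol next time
			return i;
		}
	}
	return NO_SUBVOL;
}
```

With one subvol half scanned and a fresh snapshot at position zero, `pick_lowest_position` keeps returning the snapshot, while `pick_round_robin` alternates between the two, so the older subvol continues to make progress.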
@@ -477,11 +477,6 @@ BeesResolver::find_all_matches(BeesBlockData &bbd)
 bool
 BeesResolver::operator<(const BeesResolver &that) const
 {
-	if (that.m_bior_count < m_bior_count) {
-		return true;
-	} else if (m_bior_count < that.m_bior_count) {
-		return false;
-	}
-	return m_addr < that.m_addr;
+	// Lowest count, highest address
+	return tie(that.m_bior_count, m_addr) < tie(m_bior_count, that.m_addr);
 }
-
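The comparison in the hunk above appears to be a behavior-preserving rewrite: swapping the first element between the two `tie` calls reverses the ordering on the count, while the address still compares in the normal direction. The following sketch checks that the two forms agree. `ResolverKey`, `less_explicit`, and `less_tie` are hypothetical stand-ins for illustration, and the field types are assumptions rather than the actual `BeesResolver` member types.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <tuple>

// Hypothetical stand-in for the two members used by the comparison.
struct ResolverKey {
	size_t bior_count;  // assumed type, mirrors m_bior_count
	uint64_t addr;      // assumed type, mirrors m_addr
};

// Explicit if-chain form: a higher count orders first, and ties on count
// fall back to the lower address.
bool less_explicit(const ResolverKey &a, const ResolverKey &b)
{
	if (b.bior_count < a.bior_count) {
		return true;
	} else if (a.bior_count < b.bior_count) {
		return false;
	}
	return a.addr < b.addr;
}

// Tuple form: std::tie builds tuples of references, and operator< on
// tuples compares lexicographically. Taking the count from the opposite
// operand on each side inverts the comparison for that field only.
bool less_tie(const ResolverKey &a, const ResolverKey &b)
{
	return std::tie(b.bior_count, a.addr) < std::tie(a.bior_count, b.addr);
}
```

The tuple form is shorter and harder to get wrong when more fields are added, at the cost of the swapped operands being easy to misread.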