crawl: change scan order to make forward progress at all times

mirror of https://github.com/Zygo/bees.git synced 2025-07-05 18:12:27 +02:00

Previously, the scanner processed each subvol in turn.  This required
very large amounts of temporary disk space, since a full filesystem scan
was needed before any shared extents could be deduped.  If the hash
table RAM was underprovisioned, some duplicate blocks in shared extents
were evicted from the hash table before they could be deduped.
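To make the hash-table pressure concrete, here is a small C++ model of the old order.  It is a sketch under simplifying assumptions, not bees code: every subvol is treated as a run of extents numbered by position, two subvols that share an extent (for example a snapshot and its origin) are assumed to hold it at the same position, and the names `ScanStep`, `scan_subvols_in_order` and `gap_between_copies` are hypothetical.  The gap it reports is the number of reads between the first and second copy of a shared extent, i.e. how long the first copy's hash must survive in the hash table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: one read of the extent at position `pos` in subvol `subvol`.
struct ScanStep {
    std::size_t subvol;
    std::uint64_t pos;
};

// Old order (as modeled here): scan all of subvol 0, then all of subvol 1, and so on.
std::vector<ScanStep> scan_subvols_in_order(const std::vector<std::uint64_t> &sizes) {
    std::vector<ScanStep> order;
    for (std::size_t sv = 0; sv < sizes.size(); ++sv) {
        for (std::uint64_t pos = 0; pos < sizes[sv]; ++pos) {
            order.push_back({sv, pos});
        }
    }
    return order;
}

// Number of reads between the first and second read of position `pos`.
// Every block hashed in that window competes for hash table space with
// the first copy.  Returns 0 if `pos` is read fewer than two times.
std::size_t gap_between_copies(const std::vector<ScanStep> &order, std::uint64_t pos) {
    bool seen = false;
    std::size_t first = 0;
    for (std::size_t i = 0; i < order.size(); ++i) {
        if (order[i].pos != pos) continue;
        if (!seen) {
            seen = true;
            first = i;
        } else {
            return i - first;
        }
    }
    return 0;
}
```

For a snapshot and its origin of 1000 extents each, `gap_between_copies(scan_subvols_in_order({1000, 1000}), 0)` returns 1000: the duplicate of every shared extent is read a whole subvol later than the original.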

Currently the scan order takes the first unscanned extent from each
subvol.  This works well if--and only if--the subvols are either empty
or children of a common ancestor.  It forces the same inode/offset pairs
to be read at close to the same time from each subvol.
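One reading of the current order is "always advance the subvol whose scan position is lowest".  The sketch below models it that way; that interpretation, the tie-break toward the lower subvol number, and the names `ScanStep` and `scan_lowest_position_first` are assumptions for illustration, not the bees implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical model: one read of the extent at position `pos` in subvol `subvol`.
struct ScanStep {
    std::size_t subvol;
    std::uint64_t pos;
};

// Always read the next extent from the unfinished subvol with the lowest
// scan position (ties go to the lower subvol number), so subvols that
// start together are read in lockstep.
std::vector<ScanStep> scan_lowest_position_first(const std::vector<std::uint64_t> &sizes) {
    std::vector<std::uint64_t> next(sizes.size(), 0);
    std::vector<ScanStep> order;
    for (;;) {
        bool found = false;
        std::size_t best = 0;
        for (std::size_t sv = 0; sv < sizes.size(); ++sv) {
            if (next[sv] >= sizes[sv]) continue;  // subvol fully scanned
            if (!found || next[sv] < next[best]) {
                best = sv;
                found = true;
            }
        }
        if (!found) break;  // every subvol is done
        order.push_back({best, next[best]});
        ++next[best];
    }
    return order;
}
```

With two subvols of three extents the order is (0,0), (1,0), (0,1), (1,1), (0,2), (1,2): both copies of each position are read back to back, so a shared extent's hash only has to survive one read before its duplicate arrives.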

When a new snapshot is created, this ordering diverts scanning to the
new subvol until it catches up to the existing subvols.  For large
filesystems with frequent snapshot creation this means that the scanner
never reaches the end of all subvols.  Each new subvol effectively
resets the current scan position for the entire filesystem to zero.
This prevents bees from ever completing the first filesystem scan.
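The starvation effect can be reproduced with a toy simulation.  This is a sketch, not bees code, and rests on assumptions: all subvols have the same number of extents, a new snapshot appears every `snapshot_interval` reads with nothing scanned yet, and the scanner always advances the unfinished subvol with the lowest position.  `original_progress_lowest_first` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Start with one subvol of `size` extents.  Every `snapshot_interval` reads
// a new subvol of the same size appears at position 0.  Each read advances
// the unfinished subvol with the lowest position (ties to the oldest).
// Returns how far the original subvol has been scanned after `reads` reads.
std::uint64_t original_progress_lowest_first(std::uint64_t size,
                                             std::uint64_t snapshot_interval,
                                             std::uint64_t reads) {
    std::vector<std::uint64_t> pos{0};
    for (std::uint64_t t = 0; t < reads; ++t) {
        if (t > 0 && t % snapshot_interval == 0) {
            pos.push_back(0);  // new snapshot: nothing scanned yet
        }
        bool found = false;
        std::size_t best = 0;
        for (std::size_t sv = 0; sv < pos.size(); ++sv) {
            if (pos[sv] >= size) continue;  // already finished
            if (!found || pos[sv] < pos[best]) {
                best = sv;
                found = true;
            }
        }
        if (found) ++pos[best];
    }
    return pos[0];
}
```

`original_progress_lowest_first(300, 100, 5000)` returns 100: each new snapshot consumes exactly the 100 reads it needs to catch up before the next one appears, so the original subvol never moves past position 100.  With no snapshots during the run (`snapshot_interval` = 10000), the same call returns 300.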

Change the order again, so that we now read one unscanned extent from
each subvol in round-robin fashion.  When a new subvol is created, we
share scan time between old and new subvols.  This ensures we eventually
finish scanning the initial subvols and enter the incremental scanning state.
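Under the same kind of toy assumptions (equal-sized subvols, a new snapshot every `snapshot_interval` reads starting at position 0), round-robin scheduling can be sketched as follows.  `original_progress_round_robin` is a hypothetical name, and the cursor logic is one simple way to do round-robin, not necessarily what bees does.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same toy filesystem as a lowest-position model, but each read goes to
// the next unfinished subvol after the one read last, wrapping around.
// Returns how far the original subvol has been scanned after `reads` reads.
std::uint64_t original_progress_round_robin(std::uint64_t size,
                                            std::uint64_t snapshot_interval,
                                            std::uint64_t reads) {
    std::vector<std::uint64_t> pos{0};
    std::size_t cursor = 0;  // next subvol to visit
    for (std::uint64_t t = 0; t < reads; ++t) {
        if (t > 0 && t % snapshot_interval == 0) {
            pos.push_back(0);  // new snapshot joins the rotation
        }
        // Find the next unfinished subvol, starting at the cursor.
        for (std::size_t tries = 0; tries < pos.size(); ++tries) {
            std::size_t sv = (cursor + tries) % pos.size();
            if (pos[sv] < size) {
                ++pos[sv];
                cursor = (sv + 1) % pos.size();
                break;
            }
        }
    }
    return pos[0];
}
```

`original_progress_round_robin(300, 100, 5000)` returns 300.  With n unfinished subvols each one gets 1/n of the reads, so the original subvol slows down as snapshots accumulate but never stops; its progress grows roughly like the harmonic series, which is unbounded.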

The cost of this change is more repeated reading of shared extents at
scan time with less benefit from disk-device-level caching; however, the
only way to really fix this problem is to implement scanning on tree 2
(the btrfs extent tree) instead of the subvol trees.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>

commit c1e31004b6 (parent 7ecead1700)
Author: Zygo Blaxell
Date:   2016-12-27 13:14:22 -05:00

2 changed files with 28 additions and 12 deletions

src/bees-resolve.cc (9 lines changed)

@@ -477,11 +477,6 @@ BeesResolver::find_all_matches(BeesBlockData &bbd)
 bool
 BeesResolver::operator<(const BeesResolver &that) const
 {
-	if (that.m_bior_count < m_bior_count) {
-		return true;
-	} else if (m_bior_count < that.m_bior_count) {
-		return false;
-	}
-	return m_addr < that.m_addr;
+	// Lowest count, highest address
+	return tie(that.m_bior_count, m_addr) < tie(m_bior_count, that.m_addr);
 }
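The change to `BeesResolver::operator<` replaces the branches with a single `std::tie` comparison, and the two forms order resolvers identically.  The sketch below checks that on a stand-in struct and shows what "Lowest count, highest address" means if such an operator drives a `std::priority_queue`, whose `top()` is the greatest element.  The struct name `Resolver`, the `less_old` helper and the field types are assumptions; this hunk does not show whether bees actually keeps resolvers in a priority queue.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <queue>
#include <tuple>
#include <vector>

// Stand-in for BeesResolver with only the two fields operator< reads.
// Field types are assumptions for this sketch.
struct Resolver {
    std::size_t m_bior_count;
    std::uint64_t m_addr;

    // Old form: explicit branches.
    bool less_old(const Resolver &that) const {
        if (that.m_bior_count < m_bior_count) {
            return true;
        } else if (m_bior_count < that.m_bior_count) {
            return false;
        }
        return m_addr < that.m_addr;
    }

    // New form: one lexicographic comparison.  The count operands are
    // swapped (that's count on the left), so a larger count sorts as "less".
    bool operator<(const Resolver &that) const {
        // Lowest count, highest address
        return std::tie(that.m_bior_count, m_addr) < std::tie(m_bior_count, that.m_addr);
    }
};
```

Because the comparison is equivalent to ordering by (negated count, address), the greatest element, and therefore `std::priority_queue<Resolver>::top()`, is the resolver with the lowest count and, among those, the highest address.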
