mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

resolve: don't stop at the first physical address lookup failure

The btrfs LOGICAL_INO ioctl has no way to report references to compressed
blocks precisely, so we must always consider all references to a
compressed block, and discard those that do not have the desired offset.
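A simplified C++ sketch of that rule follows. ExtentRef, covers() and matching_refs() are hypothetical names, not bees code. The sketch assumes each returned reference records which byte range of the uncompressed extent it uses. It also treats "has the desired offset" as "its range includes the wanted offset", which is an assumption made for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical, simplified view of one reference returned when LOGICAL_INO is
// asked about a block inside a compressed extent. Every reference to the
// extent comes back, whichever part of the extent it actually uses.
struct ExtentRef {
    uint64_t inode;          // file that holds the reference
    uint64_t extent_offset;  // where in the uncompressed extent the reference starts
    uint64_t length;         // how many bytes of the extent the reference covers
};

// True if the part of the extent used by `r` includes `wanted_offset`.
bool covers(const ExtentRef& r, uint64_t wanted_offset) {
    assert(r.length > 0);
    return wanted_offset >= r.extent_offset &&
           wanted_offset - r.extent_offset < r.length;
}

// Consider all references and keep only those that can reach the wanted block.
std::vector<ExtentRef> matching_refs(const std::vector<ExtentRef>& refs,
                                     uint64_t wanted_offset) {
    std::vector<ExtentRef> out;
    for (const auto& r : refs) {
        if (covers(r, wanted_offset)) {
            out.push_back(r);
        }
    }
    return out;
}
```

Because nothing in the lookup result says which references are relevant, the filter has to look at every one of them.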

When we encounter compressed shared extents containing a mix of unique
and duplicate data, we attempt to replace all references to the mixed
extent with the same number of references to multiple extents consisting
entirely of unique or duplicate blocks.  An early exit from the loop
in BeesResolver::for_each_extent_ref was ending this operation after
replacing as few as one shared reference.  This left other shared
references to the unique data on the filesystem, effectively creating
new dup data.

The failing pattern looks like this:

    dedup: replace 0x14000..0x18000 from some other extent
    copy: 0x10000..0x14000
    dedup: replace 0x10000..0x14000 with the copy
    [may be multiple dedup lines due to multiple shared references]
    copy: 0x18000..0x1c000
    [missing dedup 0x18000..0x1c000 with the copy here]
    scan: 0x10000 [++++dddd++++] 0x1c000
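The scan line can be read as a per-block map. It covers 0x1c000 - 0x10000 = 0xc000 bytes in 12 characters, so each character stands for one 0x1000 (4 KiB) block. The sketch below assumes that '+' marks a unique block that gets copied and 'd' marks a duplicate block that gets deduped. That reading is consistent with the log lines above, but it is not a documented format. Under that assumption, the hypothetical split_runs() recovers the three ranges bees works on.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// A run of consecutive blocks with the same marker, as a byte range [begin, end).
struct Run {
    char kind;       // '+' (assumed: unique, copied) or 'd' (assumed: duplicate, deduped)
    uint64_t begin;
    uint64_t end;
};

// Split a per-block map such as "++++dddd++++" that starts at `base` into runs.
std::vector<Run> split_runs(const std::string& blocks, uint64_t base,
                            uint64_t block_size = 0x1000) {
    assert(block_size > 0);
    std::vector<Run> runs;
    for (std::size_t i = 0; i < blocks.size(); ++i) {
        const uint64_t begin = base + i * block_size;
        if (!runs.empty() && runs.back().kind == blocks[i]) {
            runs.back().end = begin + block_size;                       // extend the current run
        } else {
            runs.push_back(Run{blocks[i], begin, begin + block_size});  // start a new run
        }
    }
    return runs;
}
```

split_runs("++++dddd++++", 0x10000) yields 0x10000..0x14000 ('+'), 0x14000..0x18000 ('d') and 0x18000..0x1c000 ('+'). These match the copy and dedup ranges in the log.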

If the extent 0x10000..0x1c000 is shared and compressed, we will make
a copy of the extent at 0x18000..0x1c000.  When we try to dedup this
copy extent, LOGICAL_INO will return a mix of references to the data
at logical 0x10000 and 0x18000 (which are both references to the
original shared extent with different offsets).  If we break out
of the loop too early, we will stop as soon as a reference to 0x10000
is found, and ignore all other references to the extent we are trying
to remove.
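A toy loop can model the difference between stopping at the first failed lookup and checking every reference. count_replaced() and check_mixed_refs() are hypothetical names. Each reference is reduced to the logical address it came from. The real per-reference check in bees goes through chase_extent_ref, which this sketch does not model.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Count how many references match the wanted one. With stop_on_miss set, the
// first non-matching reference ends the loop (like the old `stop_now = true`);
// otherwise every reference is considered, as after this change.
template <typename Pred>
std::size_t count_replaced(const std::vector<uint64_t>& refs, Pred matches,
                           bool stop_on_miss) {
    std::size_t replaced = 0;
    for (uint64_t logical : refs) {
        if (matches(logical)) {
            ++replaced;   // this reference would be replaced by the copy
        } else if (stop_on_miss) {
            break;        // old behavior: give up on the first miss
        }
        // new behavior: skip the miss and keep looking
    }
    return replaced;
}

// Interleaved references to the same extent at two offsets, as LOGICAL_INO
// may return them for a compressed extent.
void check_mixed_refs() {
    const std::vector<uint64_t> refs = {0x10000, 0x18000, 0x10000, 0x18000};
    auto wants_0x18000 = [](uint64_t logical) { return logical == 0x18000; };
    assert(count_replaced(refs, wants_0x18000, true) == 0);
    assert(count_replaced(refs, wants_0x18000, false) == 2);
}
```

Suppose references to 0x10000 and 0x18000 are interleaved and the wanted address is 0x18000. Stopping at the first miss replaces nothing, while the full scan replaces both matching references.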

The copy at the beginning of the extent (0x10000..0x14000) usually
works because all references to the extent cover the entire extent.
When bees performs the dedup at 0x14000..0x18000, bees itself creates
the shared references with different offsets.

Uncompressed extents were not affected because LOGICAL_INO can locate
physical blocks precisely if they reside in uncompressed extents.

This change will hurt performance when looking up old physical addresses
that belong to new data, but that is a much less urgent problem.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
Zygo Blaxell 2016-12-27 14:30:14 -05:00
parent 6e7137f282
commit ef8d92a3cb


@@ -378,7 +378,10 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
 // We have reliable block addresses now, so we guarantee we can hit the desired block.
 // Failure in chase_extent_ref means we are done, and don't need to look up all the
 // other references.
-stop_now = true;
+// Or...not? If we have a compressed extent, some refs will not match
+// if there is are two references to the same extent with a reference
+// to a different extent between them.
+// stop_now = true;
 }
 });