diff --git a/docs/missing.md b/docs/missing.md
index 7a5066d..a35acbd 100644
--- a/docs/missing.md
+++ b/docs/missing.md
@@ -15,16 +15,9 @@ specific files (patches welcome).
 * PREALLOC extents and extents containing blocks filled with zeros will
 be replaced by holes. There is no way to turn this off.
 
-* Consecutive runs of duplicate blocks that are less than 12K in length
-can take 30% of the processing time while saving only 3% of the disk
-space. There should be an option to just not bother with those, but it's
-complicated by the btrfs requirement to always dedupe complete extents.
-
-* There is a lot of duplicate reading of blocks in snapshots. bees will
-scan all snapshots at close to the same time to try to get better
-performance by caching, but really fixing this requires rewriting the
-crawler to scan the btrfs extent tree directly instead of the subvol
-FS trees.
+* The fundamental unit of deduplication is the extent _reference_, when
+it should be the _extent_ itself. This is an architectural limitation
+that results in excess reads of extent data, even in the Extent scan mode.
 
 * Block reads are currently more allocation- and CPU-intensive than they
 should be, especially for filesystems on SSD where the IO overhead is
@@ -33,8 +26,9 @@ much smaller. This is a problem for CPU-power-constrained environments
 
 * bees can currently fragment extents when required to remove duplicate
 blocks, but has no defragmentation capability yet. When possible, bees
-will attempt to work with existing extent boundaries, but it will not
-aggregate blocks together from multiple extents to create larger ones.
+will attempt to work with existing extent boundaries and choose the
+largest fragments available, but it will not aggregate blocks together
+from multiple extents to create larger ones.
 
 * When bees fragments an extent, the copied data is compressed. There
 is currently no way (other than by modifying the source) to select a