mirror of https://github.com/Zygo/bees.git synced 2025-05-18 05:45:45 +02:00
Zygo Blaxell 8d08a3c06f readahead: inject some sanity at the foundation of an insane architecture
This solves some of the worst problems with bees reads:

1.  The kernel readahead doesn't work.  More precisely, it's much better
adapted for a very different use case:  a single thread alternating
between reading a file sequentially and processing the data that was read.
bees has multiple threads which compete for access to IO and then issue
reads in random order immediately after the call to readahead.  The kernel
uses idle ioprio scheduling for the readaheads, so the readaheads get
preempted by the random reads, or the kernel cancels the readaheads
because the data access pattern isn't sequential after the readahead
was issued.

2.  Seeking drives perform terribly with multiple competing readers,
especially with btrfs striped profiles where the iops are broken into
tiny stripe-sized pieces.  At one point I intended to read the btrfs
device map and figure out which devices can be read in parallel, but to
make that useful, the user needs to have an array with multiple drives
in single profile, or 4+ drives in raid1 profile.  In all other cases,
the elaborate calculations always return the same result:  there can be
only one reader at a time.

This commit fixes both problems:

1.  Don't use the kernel readahead.  Use normal reads into a dummy
buffer instead.

2.  Allow only one thread to readahead at any time.  Once the read is
completed, the data is in the page cache, and all the random-order small
reads that bees does will hit the page cache, not a spinning disk.
In some cases we need to read two things close together, so add a
`bees_readahead_pair` which holds one lock across both reads.
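
A minimal sketch of the two fixes above, not the code from this commit:
plain `pread` calls into a throwaway buffer stand in for the kernel
readahead, and one process-wide mutex lets only one thread do so at a
time.  All names here (`sketch_readahead`, `sketch_readahead_pair`,
`sketch_read_into_dummy`, `sketch_readahead_mutex`) and the 1 MiB chunk
size are assumptions for illustration; the real bees signatures may differ.

```cpp
#include <algorithm>
#include <cstddef>
#include <mutex>
#include <vector>
#include <sys/types.h>
#include <unistd.h>

// Hypothetical global lock: at most one thread performs readahead at a time.
static std::mutex sketch_readahead_mutex;

// Read [offset, offset + length) into a dummy buffer and discard the data.
// The only goal is to populate the page cache.  Caller must hold the lock.
// Returns the number of bytes actually read.
static size_t sketch_read_into_dummy(int fd, off_t offset, size_t length)
{
    // Chunk size is an arbitrary assumption, not a tuned value.
    std::vector<char> dummy(std::min<size_t>(length, 1 << 20));
    size_t total = 0;
    while (total < length) {
        const size_t want = std::min(dummy.size(), length - total);
        const ssize_t got = pread(fd, dummy.data(), want,
                                  offset + static_cast<off_t>(total));
        if (got <= 0) {
            break;  // EOF or error: readahead is best-effort, so just stop
        }
        total += static_cast<size_t>(got);
    }
    return total;
}

// Single-range readahead: take the lock, then read sequentially.
size_t sketch_readahead(int fd, off_t offset, size_t length)
{
    std::lock_guard<std::mutex> lock(sketch_readahead_mutex);
    return sketch_read_into_dummy(fd, offset, length);
}

// Two ranges that will be used together: hold one lock across both reads
// so no other reader can interleave between them.
size_t sketch_readahead_pair(int fd1, off_t offset1, size_t length1,
                             int fd2, off_t offset2, size_t length2)
{
    std::lock_guard<std::mutex> lock(sketch_readahead_mutex);
    const size_t first = sketch_read_into_dummy(fd1, offset1, length1);
    const size_t second = sketch_read_into_dummy(fd2, offset2, length2);
    return first + second;
}
```

Worker threads would call `sketch_readahead` (or the pair variant) before
issuing their small random-order reads on the same ranges.  Holding the
lock for the whole read trades parallelism for sequential access, which
is the point of fix 2.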

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00