1
0
mirror of https://github.com/Zygo/bees.git synced 2025-08-03 14:23:29 +02:00

7 Commits
v0.1 ... v0.2

Author SHA1 Message Date
Zygo Blaxell
dd21e6f848 crucible: add missing template specializations of pwrite helper functions
I got a little too enthusiastic when redacting the code, and removed some
overloaded functions bees was using.  C++ silently found replacements,
and the result was a bug that prevented any data from being persisted
from the hash table.

Fixes: https://github.com/Zygo/bees/issues/7
2016-12-02 00:16:51 -05:00
Zygo Blaxell
06e111c229 crawl: remove UUID from file names
Unfortunately we don't get to remove the libuuid dependency because
we still want to read a file that exists in the legacy location.
2016-12-02 00:16:03 -05:00
Zygo Blaxell
38bb70f5d0 build: OK, maybe 32-bit machines could work
I accidentally did a pre-push verification on a 32-bit build host.
There were a surprisingly small number of problems, so fix them.

Bees now builds on a 32-bit host.  Let's not update README just yet,
though:  the 32-bit ioctl support fails immediately after startup on a
64-bit kernel.
2016-11-26 02:06:28 -05:00
Zygo Blaxell
a57404442c execpipe: remove unreachable debug code
This is tripping up builds in stricter build environments.

https://github.com/Zygo/bees/issues/2
2016-11-26 01:06:44 -05:00
Zygo Blaxell
1e621cf4e7 README: Improve "about" section and update compiler dependency
"agent" is a nice generic term for the set of things that userspace
btrfs deduplicators are.  Let's call it that.

Throw out the awkward and rambling "About" text and use the announcement
from linux-btrfs instead.  Terrible English writing I at am.
2016-11-24 23:06:28 -05:00
Zygo Blaxell
1303fb9da8 build: fix FTBFS on GCC 6.2
I'm not surprised that GCC 6 doesn't let me send an ostream ref to itself,
even inside an uninstantiated template specialization.  I am a little
surprised I was trying to, and 4.9 let me get away with it.

It's 2016.  auto_ptr is deprecated now.

Some things were including vector that don't any more.

https://github.com/Zygo/bees/issues/1
2016-11-24 22:20:11 -05:00
Zygo Blaxell
876b76d761 README.md: answer some questions that came in after release 2016-11-17 15:13:47 -05:00
17 changed files with 175 additions and 65 deletions

111
README.md
View File

@@ -1,31 +1,52 @@
BEES
====
Best-Effort Extent-Same, a btrfs deduplication daemon.
Best-Effort Extent-Same, a btrfs dedup agent.
About Bees
----------
Bees is a daemon designed to run continuously on live file servers.
Bees consumes entire filesystems and deduplicates in a single pass, using
minimal RAM to store data. Bees maintains persistent state so it can be
interrupted and resumed, whether by planned upgrades or unplanned crashes.
Bees makes continuous incremental progress instead of using separate
scan and dedup phases. Bees uses the Linux kernel's `dedupe_file_range`
system call to ensure data is handled safely even if other applications
concurrently modify it.
Bees is a block-oriented userspace dedup agent designed to avoid
scalability problems on large filesystems.
Bees is intentionally btrfs-specific for performance and capability.
Bees uses the btrfs `SEARCH_V2` ioctl to scan for new data
without the overhead of repeatedly walking filesystem trees with the
POSIX API. Bees uses `LOGICAL_INO` and `INO_PATHS` to leverage btrfs's
existing metadata instead of building its own redundant data structures.
Bees can cope with Btrfs filesystem compression. Bees can reassemble
Btrfs extents to deduplicate extents that contain a mix of duplicate
and unique data blocks.
Bees is designed to degrade gracefully when underprovisioned with RAM.
Bees does not use more RAM or storage as filesystem data size increases.
The dedup hash table size is fixed at creation time and does not change.
The effective dedup block size is dynamic and adjusts automatically to
fit the hash table into the configured RAM limit. Hash table overflow
is not implemented to eliminate the IO overhead of hash table overflow.
Hash table entries are only 16 bytes per dedup block to keep the average
dedup block size small.
Bees includes a number of workarounds for Btrfs kernel bugs to (try to)
avoid ruining your day. You're welcome.
Bees does not require alignment between dedup blocks or extent boundaries
(i.e. it can handle any multiple-of-4K offset between dup block pairs).
Bees rearranges blocks into shared and unique extents if required to
work within current btrfs kernel dedup limitations.
Bees can dedup any combination of compressed and uncompressed extents.
Bees operates in a single pass which removes duplicate extents immediately
during scan. There are no separate scanning and dedup phases.
Bees uses only data-safe btrfs kernel operations, so it can dedup live
data (e.g. build servers, sqlite databases, VM disk images). It does
not modify file attributes or timestamps.
Bees does not store any information about filesystem structure, so it is
not affected by the number or size of files (except to the extent that
these cause performance problems for btrfs in general). It retrieves such
information on demand through btrfs SEARCH_V2 and LOGICAL_INO ioctls.
This eliminates the storage required to maintain the equivalents of
these functions in userspace. It's also why bees has no XFS support.
Bees is a daemon designed to run continuously and maintain its state
across crahes and reboots. Bees uses checkpoints for persistence to
eliminate the IO overhead of a transactional data store. On restart,
bees will dedup any data that was added to the filesystem since the
last checkpoint.
Bees is used to dedup filesystems ranging in size from 16GB to 35TB, with
hash tables ranging in size from 128MB to 11GB.
How Bees Works
--------------
@@ -37,7 +58,8 @@ using a weighted sampling algorithm. This allows Bees to adapt itself
to its filesystem size without forcing admins to do math at install time.
At the same time, the duplicate block alignment constraint can be as low
as 4K, allowing efficient deduplication of files with narrowly-aligned
duplicate block offsets (e.g. compiled binaries and VM/disk images).
duplicate block offsets (e.g. compiled binaries and VM/disk images)
even if the effective block size is much larger.
The Bees hash table is loaded into RAM at startup (using hugepages if
available), mlocked, and synced to persistent storage by trickle-writing
@@ -78,6 +100,12 @@ and some metadata bits). Each entry represents a minimum of 4K on disk.
1TB 16MB 1024K
64TB 1GB 1024K
It is possible to resize the hash table by changing the size of
`beeshash.dat` (e.g. with `truncate`) and restarting `bees`. This
does not preserve all the existing hash table entries, but it does
preserve more than zero of them--especially if the old and new sizes
are a power-of-two multiple of each other.
Things You Might Expect That Bees Doesn't Have
----------------------------------------------
@@ -113,6 +141,16 @@ this was removed because it made Bees too aggressive to coexist with
other applications on the same machine. It also hit the *slow backrefs*
on N CPU cores instead of just one.
* Block reads are currently more allocation- and CPU-intensive than they
should be, especially for filesystems on SSD where the IO overhead is
much smaller. This is a problem for power-constrained environments
(e.g. laptops with slow CPU).
* Bees can currently fragment extents when required to remove duplicate
blocks, but has no defragmentation capability yet. When possible, Bees
will attempt to work with existing extent boundaries, but it will not
aggregate blocks together from multiple extents to create larger ones.
Good Btrfs Feature Interactions
-------------------------------
@@ -175,7 +213,7 @@ Other Caveats
unallocated space (see `btrfs fi df`) on the filesystem before running
Bees for the first time. Use
btrfs balance start -dusage=100,limit=1 /your/filesystem
btrfs balance start -dusage=100,limit=1 /your/filesystem
If possible, raise the `limit` parameter to the current size of metadata
usage (from `btrfs fi df`) plus 1.
@@ -254,9 +292,10 @@ Not really a bug, but a gotcha nonetheless:
Requirements
------------
* C++11 compiler (tested with GCC 4.9)
* C++11 compiler (tested with GCC 4.9 and 6.2.0)
Sorry. I really like closures.
Sorry. I really like closures and shared_ptr, so support
for earlier compiler versions is unlikely.
* btrfs-progs (tested with 4.1..4.7)
@@ -295,14 +334,14 @@ Setup
Create a directory for bees state files:
export BEESHOME=/some/path
mkdir -p "$BEESHOME"
export BEESHOME=/some/path
mkdir -p "$BEESHOME"
Create an empty hash table (your choice of size, but it must be a multiple
of 16M). This example creates a 1GB hash table:
truncate -s 1g "$BEESHOME/beeshash.dat"
chmod 700 "$BEESHOME/beeshash.dat"
truncate -s 1g "$BEESHOME/beeshash.dat"
chmod 700 "$BEESHOME/beeshash.dat"
Configuration
-------------
@@ -324,11 +363,11 @@ Running
We created this directory in the previous section:
export BEESHOME=/some/path
export BEESHOME=/some/path
Use a tmpfs for BEESSTATUS, it updates once per second:
export BEESSTATUS=/run/bees.status
export BEESSTATUS=/run/bees.status
bees can only process the root subvol of a btrfs (seriously--if the
argument is not the root subvol directory, Bees will just throw an
@@ -336,20 +375,20 @@ exception and stop).
Use a bind mount, and let only bees access it:
mount -osubvol=/ /dev/<your-filesystem> /var/lib/bees/root
mount -osubvol=/ /dev/<your-filesystem> /var/lib/bees/root
Reduce CPU and IO priority to be kinder to other applications
sharing this host (or raise them for more aggressive disk space
recovery). If you use cgroups, put bees in its own cgroup, then reduce
recovery). If you use cgroups, put `bees` in its own cgroup, then reduce
the `blkio.weight` and `cpu.shares` parameters. You can also use
`schedtool` and `ionice in the shell script that launches bees:
`schedtool` and `ionice` in the shell script that launches `bees`:
schedtool -D -n20 $$
ionice -c3 -p $$
schedtool -D -n20 $$
ionice -c3 -p $$
Let the bees fly:
bees /var/lib/bees/root >> /var/log/bees.log 2>&1
bees /var/lib/bees/root >> /var/log/bees.log 2>&1
You'll probably want to arrange for /var/log/bees.log to be rotated
periodically. You may also want to set umask to 077 to prevent disclosure
@@ -363,7 +402,7 @@ Email bug reports and patches to Zygo Blaxell <bees@furryterror.org>.
You can also use Github:
https://github.com/Zygo/bees
https://github.com/Zygo/bees

View File

@@ -8,6 +8,7 @@
#include <map>
#include <mutex>
#include <tuple>
#include <vector>
namespace crucible {
using namespace std;

View File

@@ -86,16 +86,6 @@ namespace crucible {
}
};
template <>
struct ChatterTraits<ostream &> {
Chatter &
operator()(Chatter &c, ostream & arg)
{
c.get_os() << arg;
return c;
}
};
class ChatterBox {
string m_file;
int m_line;

View File

@@ -120,6 +120,9 @@ namespace crucible {
template<> void pread_or_die<string>(int fd, string& str, off_t offset);
template<> void pread_or_die<vector<char>>(int fd, vector<char>& str, off_t offset);
template<> void pread_or_die<vector<uint8_t>>(int fd, vector<uint8_t>& str, off_t offset);
template<> void pwrite_or_die<string>(int fd, const string& str, off_t offset);
template<> void pwrite_or_die<vector<char>>(int fd, const vector<char>& str, off_t offset);
template<> void pwrite_or_die<vector<uint8_t>>(int fd, const vector<uint8_t>& str, off_t offset);
// A different approach to reading a simple string
string read_string(int fd, size_t size);

View File

@@ -171,7 +171,7 @@ namespace crucible {
ostream & operator<<(ostream &os, const BtrfsIoctlSearchKey &key);
string btrfs_search_type_ntoa(unsigned type);
string btrfs_search_objectid_ntoa(unsigned objectid);
string btrfs_search_objectid_ntoa(uint64_t objectid);
uint64_t btrfs_get_root_id(int fd);
uint64_t btrfs_get_root_transid(int fd);

View File

@@ -7,12 +7,12 @@ namespace crucible {
using namespace std;
struct bits_ntoa_table {
unsigned long n;
unsigned long mask;
unsigned long long n;
unsigned long long mask;
const char *a;
};
string bits_ntoa(unsigned long n, const bits_ntoa_table *a);
string bits_ntoa(unsigned long long n, const bits_ntoa_table *a);
};

View File

@@ -23,7 +23,7 @@ namespace crucible {
private:
struct Item {
Timestamp m_time;
unsigned m_id;
unsigned long m_id;
Task m_task;
bool operator<(const Item &that) const {

View File

@@ -15,7 +15,7 @@
namespace crucible {
using namespace std;
static auto_ptr<set<string>> chatter_names;
static shared_ptr<set<string>> chatter_names;
static const char *SPACETAB = " \t";
static

View File

@@ -72,14 +72,10 @@ namespace crucible {
catch_all([&]() {
parent_fd->close();
import_fd_fn(child_fd);
// system("ls -l /proc/$$/fd/ >&2");
rv = f();
});
_exit(rv);
cerr << "PID " << getpid() << " TID " << gettid() << "STILL ALIVE" << endl;
system("ls -l /proc/$$/task/ >&2");
exit(EXIT_FAILURE);
}
}

View File

@@ -426,6 +426,27 @@ namespace crucible {
return pread_or_die(fd, text.data(), text.size(), offset);
}
template<>
void
pwrite_or_die<vector<uint8_t>>(int fd, const vector<uint8_t> &text, off_t offset)
{
return pwrite_or_die(fd, text.data(), text.size(), offset);
}
template<>
void
pwrite_or_die<vector<char>>(int fd, const vector<char> &text, off_t offset)
{
return pwrite_or_die(fd, text.data(), text.size(), offset);
}
template<>
void
pwrite_or_die<string>(int fd, const string &text, off_t offset)
{
return pwrite_or_die(fd, text.data(), text.size(), offset);
}
Stat::Stat()
{
memset_zero<stat>(this);

View File

@@ -834,7 +834,7 @@ namespace crucible {
}
string
btrfs_search_objectid_ntoa(unsigned objectid)
btrfs_search_objectid_ntoa(uint64_t objectid)
{
static const bits_ntoa_table table[] = {
NTOA_TABLE_ENTRY_ENUM(BTRFS_ROOT_TREE_OBJECTID),

View File

@@ -7,7 +7,7 @@
namespace crucible {
using namespace std;
string bits_ntoa(unsigned long n, const bits_ntoa_table *table)
string bits_ntoa(unsigned long long n, const bits_ntoa_table *table)
{
string out;
while (n && table->a) {

View File

@@ -5,6 +5,7 @@
#include <fstream>
#include <iostream>
#include <vector>
using namespace crucible;
using namespace std;

View File

@@ -50,9 +50,18 @@ string
BeesRoots::crawl_state_filename() const
{
string rv;
// Legacy filename included UUID
rv += "beescrawl.";
rv += m_ctx->root_uuid();
rv += ".dat";
struct stat buf;
if (fstatat(m_ctx->home_fd(), rv.c_str(), &buf, AT_SYMLINK_NOFOLLOW)) {
// Use new filename
rv = "beescrawl.dat";
}
return rv;
}
@@ -101,6 +110,12 @@ BeesRoots::state_save()
m_crawl_state_file.write(ofs.str());
// Renaming things is hard after release
if (m_crawl_state_file.name() != "beescrawl.dat") {
renameat(m_ctx->home_fd(), m_crawl_state_file.name().c_str(), m_ctx->home_fd(), "beescrawl.dat");
m_crawl_state_file.name("beescrawl.dat");
}
BEESNOTE("relocking crawl state");
lock.lock();
// Not really correct but probably close enough

View File

@@ -351,6 +351,18 @@ BeesStringFile::BeesStringFile(Fd dir_fd, string name, size_t limit) :
BEESLOG("BeesStringFile " << name_fd(m_dir_fd) << "/" << m_name << " max size " << pretty(m_limit));
}
void
BeesStringFile::name(const string &new_name)
{
m_name = new_name;
}
string
BeesStringFile::name() const
{
return m_name;
}
string
BeesStringFile::read()
{

View File

@@ -374,6 +374,8 @@ public:
BeesStringFile(Fd dir_fd, string name, size_t limit = 1024 * 1024);
string read();
void write(string contents);
void name(const string &new_name);
string name() const;
};
class BeesHashTable {

View File

@@ -126,7 +126,13 @@ test_cast_0x80000000_to_things()
{
auto sv = 0x80000000LL;
auto uv = 0x80000000ULL;
SHOULD_PASS(ranged_cast<off_t>(sv), sv);
if (sizeof(off_t) == 4) {
SHOULD_FAIL(ranged_cast<off_t>(sv));
} else if (sizeof(off_t) == 8) {
SHOULD_PASS(ranged_cast<off_t>(sv), sv);
} else {
assert(!"unhandled case, please add code for off_t here");
}
SHOULD_PASS(ranged_cast<uint64_t>(uv), uv);
SHOULD_PASS(ranged_cast<uint32_t>(uv), uv);
SHOULD_FAIL(ranged_cast<uint16_t>(uv));
@@ -141,7 +147,13 @@ test_cast_0x80000000_to_things()
SHOULD_FAIL(ranged_cast<unsigned short>(uv));
SHOULD_FAIL(ranged_cast<unsigned char>(uv));
SHOULD_PASS(ranged_cast<signed long long>(sv), sv);
SHOULD_PASS(ranged_cast<signed long>(sv), sv);
if (sizeof(long) == 4) {
SHOULD_FAIL(ranged_cast<signed long>(sv));
} else if (sizeof(long) == 8) {
SHOULD_PASS(ranged_cast<signed long>(sv), sv);
} else {
assert(!"unhandled case, please add code for long here");
}
SHOULD_FAIL(ranged_cast<signed short>(sv));
SHOULD_FAIL(ranged_cast<signed char>(sv));
if (sizeof(int) == 4) {
@@ -149,7 +161,7 @@ test_cast_0x80000000_to_things()
} else if (sizeof(int) == 8) {
SHOULD_PASS(ranged_cast<signed int>(sv), sv);
} else {
assert(!"unhandled case, please add code here");
assert(!"unhandled case, please add code for int here");
}
}
@@ -159,7 +171,13 @@ test_cast_0xffffffff_to_things()
{
auto sv = 0xffffffffLL;
auto uv = 0xffffffffULL;
SHOULD_PASS(ranged_cast<off_t>(sv), sv);
if (sizeof(off_t) == 4) {
SHOULD_FAIL(ranged_cast<off_t>(sv));
} else if (sizeof(off_t) == 8) {
SHOULD_PASS(ranged_cast<off_t>(sv), sv);
} else {
assert(!"unhandled case, please add code for off_t here");
}
SHOULD_PASS(ranged_cast<uint64_t>(uv), uv);
SHOULD_PASS(ranged_cast<uint32_t>(uv), uv);
SHOULD_FAIL(ranged_cast<uint16_t>(uv));
@@ -174,7 +192,13 @@ test_cast_0xffffffff_to_things()
SHOULD_FAIL(ranged_cast<unsigned short>(uv));
SHOULD_FAIL(ranged_cast<unsigned char>(uv));
SHOULD_PASS(ranged_cast<signed long long>(sv), sv);
SHOULD_PASS(ranged_cast<signed long>(sv), sv);
if (sizeof(long) == 4) {
SHOULD_FAIL(ranged_cast<signed long>(sv));
} else if (sizeof(long) == 8) {
SHOULD_PASS(ranged_cast<signed long>(sv), sv);
} else {
assert(!"unhandled case, please add code for long here");
}
SHOULD_FAIL(ranged_cast<signed short>(sv));
SHOULD_FAIL(ranged_cast<signed char>(sv));
if (sizeof(int) == 4) {
@@ -182,7 +206,7 @@ test_cast_0xffffffff_to_things()
} else if (sizeof(int) == 8) {
SHOULD_PASS(ranged_cast<signed int>(sv), sv);
} else {
assert(!"unhandled case, please add code here");
assert(!"unhandled case, please add code for int here");
}
}
@@ -192,7 +216,13 @@ test_cast_0xfffffffff_to_things()
{
auto sv = 0xfffffffffLL;
auto uv = 0xfffffffffULL;
SHOULD_PASS(ranged_cast<off_t>(sv), sv);
if (sizeof(off_t) == 4) {
SHOULD_FAIL(ranged_cast<off_t>(sv));
} else if (sizeof(off_t) == 8) {
SHOULD_PASS(ranged_cast<off_t>(sv), sv);
} else {
assert(!"unhandled case, please add code for off_t here");
}
SHOULD_PASS(ranged_cast<uint64_t>(uv), uv);
SHOULD_FAIL(ranged_cast<uint32_t>(uv));
SHOULD_FAIL(ranged_cast<uint16_t>(uv));