mirror of https://github.com/Zygo/bees.git synced 2025-08-03 14:23:29 +02:00

1 Commit

Author | SHA1 | Message | Date
Zygo Blaxell | 11f69ff6c1 | fanotify-watch: Not really part of Bees, but a useful tool nonetheless | 2016-11-18 12:48:40 -05:00
31 changed files with 335 additions and 605 deletions

README.md (179 changed lines)
View File

@@ -1,52 +1,30 @@
 BEES
 ====
-Best-Effort Extent-Same, a btrfs dedup agent.
+Best-Effort Extent-Same, a btrfs deduplication daemon.
 About Bees
 ----------
-Bees is a block-oriented userspace dedup agent designed to avoid
-scalability problems on large filesystems.
-Bees is designed to degrade gracefully when underprovisioned with RAM.
-Bees does not use more RAM or storage as filesystem data size increases.
-The dedup hash table size is fixed at creation time and does not change.
-The effective dedup block size is dynamic and adjusts automatically to
-fit the hash table into the configured RAM limit. Hash table overflow
-is not implemented to eliminate the IO overhead of hash table overflow.
-Hash table entries are only 16 bytes per dedup block to keep the average
-dedup block size small.
-Bees does not require alignment between dedup blocks or extent boundaries
-(i.e. it can handle any multiple-of-4K offset between dup block pairs).
-Bees rearranges blocks into shared and unique extents if required to
-work within current btrfs kernel dedup limitations.
-Bees can dedup any combination of compressed and uncompressed extents.
-Bees operates in a single pass which removes duplicate extents immediately
-during scan. There are no separate scanning and dedup phases.
-Bees uses only data-safe btrfs kernel operations, so it can dedup live
-data (e.g. build servers, sqlite databases, VM disk images). It does
-not modify file attributes or timestamps.
-Bees does not store any information about filesystem structure, so it is
-not affected by the number or size of files (except to the extent that
-these cause performance problems for btrfs in general). It retrieves such
-information on demand through btrfs SEARCH_V2 and LOGICAL_INO ioctls.
-This eliminates the storage required to maintain the equivalents of
-these functions in userspace. It's also why bees has no XFS support.
-Bees is a daemon designed to run continuously and maintain its state
-across crashes and reboots. Bees uses checkpoints for persistence to
-eliminate the IO overhead of a transactional data store. On restart,
-bees will dedup any data that was added to the filesystem since the
-last checkpoint.
-Bees is used to dedup filesystems ranging in size from 16GB to 35TB, with
-hash tables ranging in size from 128MB to 11GB.
+Bees is a daemon designed to run continuously on live file servers.
+Bees scans and deduplicates whole filesystems in a single pass instead
+of separate scan and dedup phases. RAM usage does _not_ depend on
+unique data size or the number of input files. Hash tables and scan
+progress are stored persistently so the daemon can resume after a reboot.
+Bees uses the Linux kernel's `dedupe_file_range` feature to ensure data
+is handled safely even if other applications concurrently modify it.
+Bees is intentionally btrfs-specific for performance and capability.
+Bees uses the btrfs `SEARCH_V2` ioctl to scan for new data without the
+overhead of repeatedly walking filesystem trees with the POSIX API.
+Bees uses `LOGICAL_INO` and `INO_PATHS` to leverage btrfs's existing
+metadata instead of building its own redundant data structures.
+Bees can cope with Btrfs filesystem compression. Bees can reassemble
+Btrfs extents to deduplicate extents that contain a mix of duplicate
+and unique data blocks.
+Bees includes a number of workarounds for Btrfs kernel bugs to (try to)
+avoid ruining your day. You're welcome.
 How Bees Works
 --------------
@@ -100,9 +78,11 @@ and some metadata bits). Each entry represents a minimum of 4K on disk.
 1TB 16MB 1024K
 64TB 1GB 1024K
-To change the size of the hash table, use 'truncate' to change the hash
-table size, delete `beescrawl.dat` so that bees will start over with a
-fresh full-filesystem rescan, and restart `bees`.
+It is possible to resize the hash table by changing the size of
+`beeshash.dat` (e.g. with `truncate`) and restarting `bees`. This
+does not preserve all the existing hash table entries, but it does
+preserve more than zero of them--especially if the old and new sizes
+are a power-of-two multiple of each other.
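The sizing table above can be sanity-checked with a little arithmetic. A sketch (the variable names are ours, not bees options):

```shell
# One 16-byte hash table entry covers one dedup block, so the average
# dedup block size is unique data size divided by the number of entries.
unique_data=$((1024 * 1024 * 1024 * 1024))   # 1TB of unique data
hash_table=$((16 * 1024 * 1024))             # 16MB hash table
entries=$((hash_table / 16))                 # 16 bytes per entry
avg_block=$((unique_data / entries))
echo "$avg_block"   # prints 1048576, i.e. the 1024K row of the table
```

Doubling the hash table halves the average dedup block size for the same amount of unique data, which is the trade-off the table expresses.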
 Things You Might Expect That Bees Doesn't Have
 ----------------------------------------------
@@ -149,9 +129,6 @@ blocks, but has no defragmentation capability yet. When possible, Bees
 will attempt to work with existing extent boundaries, but it will not
 aggregate blocks together from multiple extents to create larger ones.
-* It is possible to resize the hash table without starting over with
-a new full-filesystem scan; however, this has not been implemented yet.
 Good Btrfs Feature Interactions
 -------------------------------
@@ -167,17 +144,19 @@ Bees has been tested in combination with the following:
 * IO errors during dedup (read errors will throw exceptions, Bees will catch them and skip over the affected extent)
 * Filesystems mounted *with* the flushoncommit option
 * 4K filesystem data block size / clone alignment
-* 64-bit and 32-bit host CPUs (amd64, x86, arm)
+* 64-bit CPUs (amd64)
 * Large (>16M) extents
 * Huge files (>1TB--although Btrfs performance on such files isn't great in general)
 * filesystems up to 25T bytes, 100M+ files
 Bad Btrfs Feature Interactions
 ------------------------------
 Bees has not been tested with the following, and undesirable interactions may occur:
 * Non-4K filesystem data block size (should work if recompiled)
+* 32-bit CPUs (x86, arm)
 * Non-equal hash (SUM) and filesystem data block (CLONE) sizes (probably never will work)
 * btrfs read-only snapshots (never tested, probably wouldn't work well)
 * btrfs send/receive (receive is probably OK, but send requires RO snapshots. See above)
@@ -221,26 +200,16 @@ Other Caveats
 A Brief List Of Btrfs Kernel Bugs
 ---------------------------------
-Missing features (usually not available in older LTS kernels):
+Fixed bugs:
 * 3.13: `FILE_EXTENT_SAME` ioctl added. No way to reliably dedup with
 concurrent modifications before this.
 * 3.16: `SEARCH_V2` ioctl added. Bees could use `SEARCH` instead.
 * 4.2: `FILE_EXTENT_SAME` no longer updates mtime, can be used at EOF.
-Bug fixes (sometimes included in older LTS kernels):
-* 4.5: hang in the `INO_PATHS` ioctl used by Bees.
-* 4.5: use-after-free in the `FILE_EXTENT_SAME` ioctl used by Bees.
+Kernel deadlock bugs fixed.
 * 4.7: *slow backref* bug no longer triggers a softlockup panic. It still
 takes too long to resolve a block address to a root/inode/offset triple.
-Fixed bugs not yet integrated in mainline Linux:
-* 7f8e406 ("btrfs: improve delayed refs iterations"): significantly
-reduces the CPU time cost of the LOGICAL_INO ioctl (from 30-70% of
-bees running time to under 5%).
 Unfixed kernel bugs (as of 4.5.7) with workarounds in Bees:
 * *slow backref*: If the number of references to a single shared extent
@@ -274,7 +243,7 @@ Unfixed kernel bugs (as of 4.5.7) with workarounds in Bees:
 precisely the specified range of offending fragmented blocks.
 * When writing BeesStringFile, a crash can cause the directory entry
-`beescrawl.dat.tmp` to exist without a corresponding inode.
+`beescrawl.UUID.dat.tmp` to exist without a corresponding inode.
 This directory entry cannot be renamed or removed; however, it does
 not prevent the creation of a second directory entry with the same
 name that functions normally, so it doesn't prevent Bees operation.
@@ -282,13 +251,10 @@ Unfixed kernel bugs (as of 4.5.7) with workarounds in Bees:
 The orphan directory entry can be removed by deleting its subvol,
 so place BEESHOME on a separate subvol so you can delete these orphan
 directory entries when they occur (or use btrfs zero-log before mounting
-the filesystem after a crash). Alternatively, place BEESHOME on a
-non-btrfs filesystem.
-* If the `fsync()` in `BeesTempFile::make_copy` is removed, the filesystem
-hangs within a few hours, requiring a reboot to recover. On the other
-hand, there may be net performance benefits to calling `fsync()` before
-or after each dedup. This needs further investigation.
+the filesystem after a crash).
+* If the fsync() in BeesTempFile::make_copy is removed, the filesystem
+hangs within a few hours, requiring a reboot to recover.
 Not really a bug, but a gotcha nonetheless:
@@ -304,10 +270,9 @@ Not really a bug, but a gotcha nonetheless:
 Requirements
 ------------
-* C++11 compiler (tested with GCC 4.9 and 6.2.0)
-Sorry. I really like closures and shared_ptr, so support
-for earlier compiler versions is unlikely.
+* C++11 compiler (tested with GCC 4.9)
+Sorry. I really like closures.
 * btrfs-progs (tested with 4.1..4.7)
@@ -319,7 +284,7 @@ Requirements
 TODO: remove the one function used from this library.
 It supports a feature Bees no longer implements.
-* Linux kernel 4.4.3 or later
+* Linux kernel 4.2 or later
 Don't bother trying to make Bees work with older kernels.
 It won't end well.
@@ -355,49 +320,17 @@ of 16M). This example creates a 1GB hash table:
 truncate -s 1g "$BEESHOME/beeshash.dat"
 chmod 700 "$BEESHOME/beeshash.dat"
-bees can only process the root subvol of a btrfs (seriously--if the
-argument is not the root subvol directory, Bees will just throw an
-exception and stop).
-Use a bind mount, and let only bees access it:
-UUID=3399e413-695a-4b0b-9384-1b0ef8f6c4cd
-mkdir -p /var/lib/bees/$UUID
-mount /dev/disk/by-uuid/$UUID /var/lib/bees/$UUID -osubvol=/
-If you don't set BEESHOME, the path ".beeshome" will be used relative
-to the root subvol of the filesystem. For example:
-btrfs sub create /var/lib/bees/$UUID/.beeshome
-truncate -s 1g /var/lib/bees/$UUID/.beeshome/beeshash.dat
-chmod 700 /var/lib/bees/$UUID/.beeshome/beeshash.dat
-You can use any relative path in BEESHOME. The path will be taken
-relative to the root of the deduped filesystem (in other words it can
-be the name of a subvol):
-export BEESHOME=@my-beeshome
-btrfs sub create /var/lib/bees/$UUID/$BEESHOME
-truncate -s 1g /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
-chmod 700 /var/lib/bees/$UUID/$BEESHOME/beeshash.dat
 Configuration
 -------------
 The only runtime configurable options are environment variables:
 * BEESHOME: Directory containing Bees state files:
-* beeshash.dat | persistent hash table. Must be a multiple of 16M.
-This contains 16-byte records: 8 bytes for CRC64,
-8 bytes for physical address and some metadata bits.
-* beescrawl.dat | state of SEARCH_V2 crawlers. ASCII text.
-* beesstats.txt | statistics and performance counters. ASCII text.
-* BEESSTATUS: File containing a snapshot of current Bees state: performance
-counters and current status of each thread. The file is meant to be
-human readable, but understanding it probably requires reading the source.
-You can watch bees run in realtime with a command like:
-watch -n1 cat $BEESSTATUS
+* beeshash.dat | persistent hash table (must be a multiple of 16M)
+* beescrawl.`UUID`.dat | state of SEARCH_V2 crawlers
+* beesstats.txt | statistics and performance counters
+* BEESSTATUS: File containing a snapshot of current Bees state (performance
+counters and current status of each thread).
 Other options (e.g. interval between filesystem crawls) can be configured
 in src/bees.h.
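The environment variables above can be set up along these lines. A sketch with illustrative paths only (a temporary directory stands in for a real BEESHOME on the target filesystem):

```shell
# BEESHOME holds beeshash.dat, beescrawl.*.dat and beesstats.txt.
export BEESHOME="$(mktemp -d)"
# BEESSTATUS is rewritten about once a second, so tmpfs is a good home.
export BEESSTATUS=/run/bees.status
# The hash table must be sized in multiples of 16M; create a 128M table.
truncate -s 128m "$BEESHOME/beeshash.dat"
chmod 700 "$BEESHOME/beeshash.dat"
stat -c %s "$BEESHOME/beeshash.dat"   # prints 134217728
```

Once bees is running, `watch -n1 cat $BEESSTATUS` gives a live view of the counters.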
@@ -405,27 +338,39 @@ in src/bees.h.
 Running
 -------
-Reduce CPU and IO priority to be kinder to other applications sharing
-this host (or raise them for more aggressive disk space recovery). If you
-use cgroups, put `bees` in its own cgroup, then reduce the `blkio.weight`
-and `cpu.shares` parameters. You can also use `schedtool` and `ionice`
-in the shell script that launches `bees`:
+We created this directory in the previous section:
+export BEESHOME=/some/path
+Use a tmpfs for BEESSTATUS, it updates once per second:
+export BEESSTATUS=/run/bees.status
+bees can only process the root subvol of a btrfs (seriously--if the
+argument is not the root subvol directory, Bees will just throw an
+exception and stop).
+Use a bind mount, and let only bees access it:
+mount -osubvol=/ /dev/<your-filesystem> /var/lib/bees/root
+Reduce CPU and IO priority to be kinder to other applications
+sharing this host (or raise them for more aggressive disk space
+recovery). If you use cgroups, put `bees` in its own cgroup, then reduce
+the `blkio.weight` and `cpu.shares` parameters. You can also use
+`schedtool` and `ionice` in the shell script that launches `bees`:
 schedtool -D -n20 $$
 ionice -c3 -p $$
 Let the bees fly:
-for fs in /var/lib/bees/*-*-*-*-*/; do
-bees "$fs" >> "$fs/.beeshome/bees.log" 2>&1 &
-done
+bees /var/lib/bees/root >> /var/log/bees.log 2>&1
 You'll probably want to arrange for /var/log/bees.log to be rotated
 periodically. You may also want to set umask to 077 to prevent disclosure
 of information about the contents of the filesystem through the log file.
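The umask advice is easy to verify in isolation. A sketch using a throwaway directory rather than the real log path:

```shell
umask 077
log="$(mktemp -d)/bees.log"
: >> "$log"         # create the log file the way a shell redirection would
stat -c %a "$log"   # prints 600: owner-only, nothing disclosed to other users
```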
-There are also some shell wrappers in the `scripts/` directory.
 Bug Reports and Contributions
 -----------------------------

View File

@@ -8,7 +8,6 @@
 #include <map>
 #include <mutex>
 #include <tuple>
-#include <vector>
 namespace crucible {
 using namespace std;

View File

@@ -86,6 +86,16 @@ namespace crucible {
 }
 };
+template <>
+struct ChatterTraits<ostream &> {
+Chatter &
+operator()(Chatter &c, ostream & arg)
+{
+c.get_os() << arg;
+return c;
+}
+};
 class ChatterBox {
 string m_file;
 int m_line;

View File

@@ -3,11 +3,11 @@
 #include <cstdint>
 #include <cstdlib>
-#include <cstring>
 namespace crucible {
 namespace Digest {
 namespace CRC {
+uint64_t crc64(const char *s);
 uint64_t crc64(const void *p, size_t len);
 };
 };

View File

@@ -70,11 +70,10 @@ namespace crucible {
 string mmap_flags_ntoa(int flags);
 // Unlink, rename
-void unlink_or_die(const string &file);
 void rename_or_die(const string &from, const string &to);
 void renameat_or_die(int fromfd, const string &frompath, int tofd, const string &topath);
-void ftruncate_or_die(int fd, off_t size);
 // Read or write structs:
 // There is a template specialization to read or write strings
 // Three-arg version of read_or_die/write_or_die throws an error on incomplete read/writes
@@ -121,9 +120,6 @@
 template<> void pread_or_die<string>(int fd, string& str, off_t offset);
 template<> void pread_or_die<vector<char>>(int fd, vector<char>& str, off_t offset);
 template<> void pread_or_die<vector<uint8_t>>(int fd, vector<uint8_t>& str, off_t offset);
-template<> void pwrite_or_die<string>(int fd, const string& str, off_t offset);
-template<> void pwrite_or_die<vector<char>>(int fd, const vector<char>& str, off_t offset);
-template<> void pwrite_or_die<vector<uint8_t>>(int fd, const vector<uint8_t>& str, off_t offset);
 // A different approach to reading a simple string
 string read_string(int fd, size_t size);

View File

@@ -13,7 +13,6 @@
 #include <cstdint>
 #include <iosfwd>
-#include <set>
 #include <vector>
 #include <fcntl.h>
@@ -151,14 +150,13 @@ namespace crucible {
 BtrfsIoctlSearchHeader();
 vector<char> m_data;
 size_t set_data(const vector<char> &v, size_t offset);
-bool operator<(const BtrfsIoctlSearchHeader &that) const;
 };
 ostream & operator<<(ostream &os, const btrfs_ioctl_search_header &hdr);
 ostream & operator<<(ostream &os, const BtrfsIoctlSearchHeader &hdr);
 struct BtrfsIoctlSearchKey : public btrfs_ioctl_search_key {
-BtrfsIoctlSearchKey(size_t buf_size = 4096);
+BtrfsIoctlSearchKey(size_t buf_size = 1024 * 1024);
 virtual bool do_ioctl_nothrow(int fd);
 virtual void do_ioctl(int fd);
@@ -166,15 +164,14 @@ namespace crucible {
 void next_min(const BtrfsIoctlSearchHeader& ref);
 size_t m_buf_size;
-set<BtrfsIoctlSearchHeader> m_result;
+vector<BtrfsIoctlSearchHeader> m_result;
 };
 ostream & operator<<(ostream &os, const btrfs_ioctl_search_key &key);
 ostream & operator<<(ostream &os, const BtrfsIoctlSearchKey &key);
 string btrfs_search_type_ntoa(unsigned type);
-string btrfs_search_objectid_ntoa(uint64_t objectid);
+string btrfs_search_objectid_ntoa(unsigned objectid);
 uint64_t btrfs_get_root_id(int fd);
 uint64_t btrfs_get_root_transid(int fd);

View File

@@ -7,12 +7,12 @@ namespace crucible {
 using namespace std;
 struct bits_ntoa_table {
-unsigned long long n;
-unsigned long long mask;
+unsigned long n;
+unsigned long mask;
 const char *a;
 };
-string bits_ntoa(unsigned long long n, const bits_ntoa_table *a);
+string bits_ntoa(unsigned long n, const bits_ntoa_table *a);
 };

View File

@@ -23,7 +23,7 @@ namespace crucible {
 private:
 struct Item {
 Timestamp m_time;
-unsigned long m_id;
+unsigned m_id;
 Task m_task;
 bool operator<(const Item &that) const {

View File

@@ -18,6 +18,8 @@ OBJS = \
 include ../makeflags
+LDFLAGS = -shared -luuid
 depends.mk: *.c *.cc
 for x in *.c; do $(CC) $(CFLAGS) -M "$$x"; done > depends.mk.new
 for x in *.cc; do $(CXX) $(CXXFLAGS) -M "$$x"; done >> depends.mk.new
@@ -32,4 +34,4 @@ depends.mk: *.c *.cc
 $(CXX) $(CXXFLAGS) -o $@ -c $<
 libcrucible.so: $(OBJS) Makefile
-$(CXX) $(LDFLAGS) -o $@ $(OBJS) -shared -luuid
+$(CXX) $(LDFLAGS) -o $@ $(OBJS)

View File

@@ -15,7 +15,7 @@
 namespace crucible {
 using namespace std;
-static shared_ptr<set<string>> chatter_names;
+static auto_ptr<set<string>> chatter_names;
 static const char *SPACETAB = " \t";
 static

View File

@@ -1,31 +1,3 @@
-/* crc64.c -- compute CRC-64
- * Copyright (C) 2013 Mark Adler
- * Version 1.4 16 Dec 2013 Mark Adler
- */
-/*
- This software is provided 'as-is', without any express or implied
- warranty. In no event will the author be held liable for any damages
- arising from the use of this software.
- Permission is granted to anyone to use this software for any purpose,
- including commercial applications, and to alter it and redistribute it
- freely, subject to the following restrictions:
- 1. The origin of this software must not be misrepresented; you must not
- claim that you wrote the original software. If you use this software
- in a product, an acknowledgment in the product documentation would be
- appreciated but is not required.
- 2. Altered source versions must be plainly marked as such, and must not be
- misrepresented as being the original software.
- 3. This notice may not be removed or altered from any source distribution.
- Mark Adler
- madler@alumni.caltech.edu
- */
-/* Substantially modified by Paul Jones for usage in bees */
 #include "crucible/crc64.h"
 #define POLY64REV 0xd800000000000000ULL
@@ -33,16 +5,13 @@
 namespace crucible {
 static bool init = false;
-static uint64_t CRCTable[8][256];
+static uint64_t CRCTable[256];
 static void init_crc64_table()
 {
 if (!init) {
-uint64_t crc;
-// Generate CRCs for all single byte sequences
-for (int n = 0; n < 256; n++) {
-uint64_t part = n;
+for (int i = 0; i <= 255; i++) {
+uint64_t part = i;
 for (int j = 0; j < 8; j++) {
 if (part & 1) {
 part = (part >> 1) ^ POLY64REV;
@@ -50,53 +19,37 @@ namespace crucible {
 part >>= 1;
 }
 }
-CRCTable[0][n] = part;
-}
-// Generate nested CRC table for slice-by-8 lookup
-for (int n = 0; n < 256; n++) {
-crc = CRCTable[0][n];
-for (int k = 1; k < 8; k++) {
-crc = CRCTable[0][crc & 0xff] ^ (crc >> 8);
-CRCTable[k][n] = crc;
-}
+CRCTable[i] = part;
 }
 init = true;
 }
 }
+uint64_t
+Digest::CRC::crc64(const char *s)
+{
+init_crc64_table();
+uint64_t crc = 0;
+for (; *s; s++) {
+uint64_t temp1 = crc >> 8;
+uint64_t temp2 = CRCTable[(crc ^ static_cast<uint64_t>(*s)) & 0xff];
+crc = temp1 ^ temp2;
+}
+return crc;
+}
 uint64_t
 Digest::CRC::crc64(const void *p, size_t len)
 {
 init_crc64_table();
-const unsigned char *next = static_cast<const unsigned char *>(p);
 uint64_t crc = 0;
-// Process individual bytes until we reach an 8-byte aligned pointer
-while (len && (reinterpret_cast<uintptr_t>(next) & 7) != 0) {
-crc = CRCTable[0][(crc ^ *next++) & 0xff] ^ (crc >> 8);
-len--;
-}
-// Fast middle processing, 8 bytes (aligned!) per loop
-while (len >= 8) {
-crc ^= *(reinterpret_cast<const uint64_t *>(next));
-crc = CRCTable[7][crc & 0xff] ^
-CRCTable[6][(crc >> 8) & 0xff] ^
-CRCTable[5][(crc >> 16) & 0xff] ^
-CRCTable[4][(crc >> 24) & 0xff] ^
-CRCTable[3][(crc >> 32) & 0xff] ^
-CRCTable[2][(crc >> 40) & 0xff] ^
-CRCTable[1][(crc >> 48) & 0xff] ^
-CRCTable[0][crc >> 56];
-next += 8;
-len -= 8;
-}
-// Process remaining bytes (can't be larger than 8)
-while (len) {
-crc = CRCTable[0][(crc ^ *next++) & 0xff] ^ (crc >> 8);
-len--;
-}
+for (const unsigned char *s = static_cast<const unsigned char *>(p); len; --len) {
+uint64_t temp1 = crc >> 8;
+uint64_t temp2 = CRCTable[(crc ^ *s++) & 0xff];
+crc = temp1 ^ temp2;
+}
 return crc;

View File

@@ -72,10 +72,14 @@ namespace crucible {
 catch_all([&]() {
 parent_fd->close();
 import_fd_fn(child_fd);
+// system("ls -l /proc/$$/fd/ >&2");
 rv = f();
 });
 _exit(rv);
+cerr << "PID " << getpid() << " TID " << gettid() << "STILL ALIVE" << endl;
+system("ls -l /proc/$$/task/ >&2");
+exit(EXIT_FAILURE);
 }
 }

View File

@@ -468,7 +468,7 @@ namespace crucible {
 BtrfsExtentWalker::Vec
 BtrfsExtentWalker::get_extent_map(off_t pos)
 {
-BtrfsIoctlSearchKey sk(sc_extent_fetch_max * (sizeof(btrfs_file_extent_item) + sizeof(btrfs_ioctl_search_header)));
+BtrfsIoctlSearchKey sk;
 if (!m_root_fd) {
 m_root_fd = m_fd;
 }

View File

@@ -230,14 +230,6 @@ namespace crucible {
 }
 }
-void
-ftruncate_or_die(int fd, off_t size)
-{
-if (::ftruncate(fd, size)) {
-THROW_ERRNO("ftruncate: " << name_fd(fd) << " size " << size);
-}
-}
 string
 socket_domain_ntoa(int domain)
 {
@@ -434,27 +426,6 @@ namespace crucible {
 return pread_or_die(fd, text.data(), text.size(), offset);
 }
-template<>
-void
-pwrite_or_die<vector<uint8_t>>(int fd, const vector<uint8_t> &text, off_t offset)
-{
-return pwrite_or_die(fd, text.data(), text.size(), offset);
-}
-template<>
-void
-pwrite_or_die<vector<char>>(int fd, const vector<char> &text, off_t offset)
-{
-return pwrite_or_die(fd, text.data(), text.size(), offset);
-}
-template<>
-void
-pwrite_or_die<string>(int fd, const string &text, off_t offset)
-{
-return pwrite_or_die(fd, text.data(), text.size(), offset);
-}
 Stat::Stat()
 {
 memset_zero<stat>(this);

View File

@@ -707,19 +707,11 @@ namespace crucible {
 return offset + len;
 }
-bool
-BtrfsIoctlSearchHeader::operator<(const BtrfsIoctlSearchHeader &that) const
-{
-return tie(objectid, type, offset, len, transid) < tie(that.objectid, that.type, that.offset, that.len, that.transid);
-}
 bool
 BtrfsIoctlSearchKey::do_ioctl_nothrow(int fd)
 {
 vector<char> ioctl_arg = vector_copy_struct<btrfs_ioctl_search_key>(this);
-// Normally we like to be paranoid and fill empty bytes with zero,
-// but these buffers can be huge. 80% of a 4GHz CPU huge.
-ioctl_arg.resize(sizeof(btrfs_ioctl_search_args_v2) + m_buf_size);
+ioctl_arg.resize(sizeof(btrfs_ioctl_search_args_v2) + m_buf_size, 0);
 btrfs_ioctl_search_args_v2 *ioctl_ptr = reinterpret_cast<btrfs_ioctl_search_args_v2 *>(ioctl_arg.data());
 ioctl_ptr->buf_size = m_buf_size;
@@ -733,12 +725,13 @@ namespace crucible {
 static_cast<btrfs_ioctl_search_key&>(*this) = ioctl_ptr->key;
 m_result.clear();
+m_result.reserve(nr_items);
 size_t offset = pointer_distance(ioctl_ptr->buf, ioctl_ptr);
 for (decltype(nr_items) i = 0; i < nr_items; ++i) {
 BtrfsIoctlSearchHeader item;
 offset = item.set_data(ioctl_arg, offset);
-m_result.insert(item);
+m_result.push_back(item);
 }
 return true;
@@ -841,7 +834,7 @@ namespace crucible {
 }
 string
-btrfs_search_objectid_ntoa(uint64_t objectid)
+btrfs_search_objectid_ntoa(unsigned objectid)
 {
 static const bits_ntoa_table table[] = {
 NTOA_TABLE_ENTRY_ENUM(BTRFS_ROOT_TREE_OBJECTID),

View File

@@ -7,7 +7,7 @@
 namespace crucible {
 using namespace std;
-string bits_ntoa(unsigned long long n, const bits_ntoa_table *table)
+string bits_ntoa(unsigned long n, const bits_ntoa_table *table)
 {
 string out;
 while (n && table->a) {

View File

@@ -1,4 +1,4 @@
-CCFLAGS = -Wall -Wextra -Werror -O3 -march=native -I../include -ggdb -fpic -D_FILE_OFFSET_BITS=64
+CCFLAGS = -Wall -Wextra -Werror -O3 -I../include -ggdb -fpic
 # CCFLAGS = -Wall -Wextra -Werror -O0 -I../include -ggdb -fpic
 CFLAGS = $(CCFLAGS) -std=c99
 CXXFLAGS = $(CCFLAGS) -std=c++11 -Wold-style-cast

View File

@@ -1,106 +0,0 @@
#!/bin/bash
# /usr/bin/beesd
## Helpful functions
INFO(){ echo "INFO:" "$@"; }
ERRO(){ echo "ERROR:" "$@"; exit 1; }
YN(){ [[ "$1" =~ (1|Y|y) ]]; }
## Global vars
export BEESHOME BEESSTATUS
export WORK_DIR CONFIG_DIR
export CONFIG_FILE
export UUID AL16M
readonly AL16M="$((16*1024*1024))"
readonly CONFIG_DIR=/etc/bees/
## Pre checks
{
[ ! -d "$CONFIG_DIR" ] && ERRO "Missing: $CONFIG_DIR"
[ "$UID" == "0" ] || ERRO "Must be runned as root"
}
command -v bees &> /dev/null || ERRO "Missing 'bees' command"
## Parse args
UUID="$1"
case "$UUID" in
*-*-*-*-*)
FILE_CONFIG=""
for file in "$CONFIG_DIR"/*.conf; do
[ ! -f "$file" ] && continue
if grep -q "$UUID" "$file"; then
INFO "Find $UUID in $file, use as conf"
FILE_CONFIG="$file"
fi
done
[ ! -f "$FILE_CONFIG" ] && ERRO "No config for $UUID"
source "$FILE_CONFIG"
;;
*)
echo "beesd <btrfs_uuid>"
exit 1
;;
esac
WORK_DIR="${WORK_DIR:-/run/bees/}"
MNT_DIR="${MNT_DIR:-$WORK_DIR/mnt/$UUID}"
BEESHOME="${BEESHOME:-$MNT_DIR/.beeshome}"
BEESSTATUS="${BEESSTATUS:-$WORK_DIR/$UUID.status}"
DB_SIZE="${DB_SIZE:-$((64*AL16M))}"
LOG_SHORT_PATH="${LOG_SHORT_PATH:-N}"
INFO "Check: BTRFS UUID exists"
if [ ! -d "/sys/fs/btrfs/$UUID" ]; then
ERRO "Can't find BTRFS UUID: $UUID"
fi
INFO "Check: Disk exists"
if [ ! -b "/dev/disk/by-uuid/$UUID" ]; then
ERRO "Missing disk: /dev/disk/by-uuid/$UUID"
fi
INFO "WORK DIR: $WORK_DIR"
mkdir -p "$WORK_DIR" || exit 1
INFO "MOUNT DIR: $MNT_DIR"
mkdir -p "$MNT_DIR" || exit 1
umount_w(){ mountpoint -q "$1" && umount -l "$1"; }
force_umount(){ umount_w "$MNT_DIR"; }
trap force_umount SIGINT SIGTERM EXIT
mount -osubvolid=5 /dev/disk/by-uuid/$UUID "$MNT_DIR" || exit 1
if [ ! -d "$BEESHOME" ]; then
INFO "Creating subvol $BEESHOME to store bees data"
btrfs sub cre "$BEESHOME"
else
btrfs sub show "$BEESHOME" &> /dev/null || ERRO "$BEESHOME MUST BE A SUBVOL!"
fi
# Check DB size
{
DB_PATH="$BEESHOME/beeshash.dat"
touch "$DB_PATH"
OLD_SIZE="$(du -b "$DB_PATH" | sed 's/\t/ /g' | cut -d' ' -f1)"
NEW_SIZE="$DB_SIZE"
if (( "$NEW_SIZE"%AL16M > 0 )); then
ERRO "DB_SIZE Must be multiple of 16M"
fi
if (( "$OLD_SIZE" != "$NEW_SIZE" )); then
INFO "Resize db: $OLD_SIZE -> $NEW_SIZE"
[ -f "$BEESHOME/beescrawl.$UUID.dat" ] && rm "$BEESHOME/beescrawl.$UUID.dat"
truncate -s $NEW_SIZE $DB_PATH
fi
chmod 700 "$DB_PATH"
}
if YN "$LOG_SHORT_PATH"; then
cd "$MNT_DIR" || exit 1
bees .
else
bees "$MNT_DIR"
fi
exit 0
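The DB-size check in the script above requires `DB_SIZE` to be a multiple of 16 MiB before truncating the hash file. An illustrative Python model of the same rule (not part of bees, just the arithmetic):

```python
AL16M = 16 * 1024 * 1024  # the script's 16 MiB alignment unit

def check_db_size(size):
    # Mirrors the script's guard: DB_SIZE must be a positive multiple of 16M
    if size <= 0 or size % AL16M:
        raise ValueError("DB_SIZE must be a multiple of 16M")
    return size

# The sample config's default, 64 * 16 MiB, is exactly 1 GiB.
assert check_db_size(64 * AL16M) == 1024 ** 3
```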

View File

@@ -1,31 +0,0 @@
## Config for Bees: /etc/bees/beesd.conf.sample
## https://github.com/Zygo/bees
## These are default values; change them if needed
# Which FS will be used
UUID=5d3c0ad5-bedf-463d-8235-b4d4f6f99476
## System Vars
# Change carefully
# WORK_DIR=/run/bees/
# MNT_DIR="$WORK_DIR/mnt/$UUID"
# BEESHOME="$MNT_DIR/.beeshome"
# BEESSTATUS="$WORK_DIR/$UUID.status"
## Make path shorter in logs
# LOG_SHORT_PATH=N
## Bees DB size
# Hash Table Sizing
# Hash table entries are 16 bytes each
# (64-bit hash, 52-bit block number, and some metadata bits)
# Each entry represents a minimum of 4K on disk.
# unique data size hash table size average dedup block size
# 1TB 4GB 4K
# 1TB 1GB 16K
# 1TB 256MB 64K
# 1TB 16MB 1024K
# 64TB 1GB 1024K
#
# Size MUST be a multiple of 16M
# DB_SIZE=$((64*$AL16M)) # 1G in bytes
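The sizing table above is straightforward arithmetic: with 16-byte entries, the average dedup block size is unique data size divided by the number of entries the table holds. A quick illustrative check of the table's rows (not part of bees):

```python
ENTRY_BYTES = 16  # 64-bit hash, 52-bit block number, and metadata bits

def avg_dedup_block_size(unique_data, hash_table):
    # Each 16-byte entry covers one dedup block of unique data
    return unique_data // (hash_table // ENTRY_BYTES)

TiB, GiB, MiB = 1024 ** 4, 1024 ** 3, 1024 ** 2
assert avg_dedup_block_size(1 * TiB, 4 * GiB) == 4 * 1024        # 4K
assert avg_dedup_block_size(1 * TiB, 1 * GiB) == 16 * 1024       # 16K
assert avg_dedup_block_size(1 * TiB, 256 * MiB) == 64 * 1024     # 64K
assert avg_dedup_block_size(1 * TiB, 16 * MiB) == 1024 * 1024    # 1024K
assert avg_dedup_block_size(64 * TiB, 1 * GiB) == 1024 * 1024    # 1024K
```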

View File

@@ -1,14 +0,0 @@
[Unit]
Description=Bees - Best-Effort Extent-Same, a btrfs deduplicator daemon: %i
After=local-fs.target
[Service]
ExecStart=/usr/bin/beesd %i
Nice=19
IOSchedulingClass=idle
CPUAccounting=true
MemoryAccounting=true
# CPUQuota=95%
[Install]
WantedBy=local-fs.target

src/.gitignore vendored
View File

@@ -1 +0,0 @@
bees-version.h

View File

@@ -1,5 +1,6 @@
@@ -1,5 +1,6 @@
 PROGRAMS = \
 	../bin/bees \
+	../bin/fanotify-watch \
 	../bin/fiemap \
 	../bin/fiewalk \
@@ -11,8 +12,6 @@ LIBS = -lcrucible -lpthread
 LDFLAGS = -L../lib -Wl,-rpath=$(shell realpath ../lib)
 depends.mk: Makefile *.cc
-	echo "#define BEES_VERSION \"$(shell git describe --always --dirty || echo UNKNOWN)\"" > bees-version.new.h
-	mv -f bees-version.new.h bees-version.h
	for x in *.cc; do $(CXX) $(CXXFLAGS) -M "$$x"; done > depends.mk.new
	mv -fv depends.mk.new depends.mk
@@ -38,4 +37,4 @@ BEES_OBJS = \
 	$(CXX) $(CXXFLAGS) -o "$@" $(BEES_OBJS) $(LDFLAGS) $(LIBS)
 clean:
-	-rm -fv *.o bees-version.h
+	-rm -fv *.o
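The version-header rule shown in the hunk above embeds `git describe` output in a C header, falling back to `UNKNOWN` outside a git checkout. A standalone sketch of the same recipe as plain shell (illustrative; the real rule lives in the Makefile):

```shell
# Generate bees-version.h with the current git description, or UNKNOWN
# if git is unavailable; write to a temp name and rename into place.
ver="$(git describe --always --dirty 2>/dev/null || echo UNKNOWN)"
printf '#define BEES_VERSION "%s"\n' "$ver" > bees-version.new.h
mv -f bees-version.new.h bees-version.h
cat bees-version.h
```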

View File

@@ -5,7 +5,6 @@
 #include <fstream>
 #include <iostream>
-#include <vector>
 using namespace crucible;
 using namespace std;
@@ -24,16 +23,10 @@ getenv_or_die(const char *name)
 BeesFdCache::BeesFdCache()
 {
 	m_root_cache.func([&](shared_ptr<BeesContext> ctx, uint64_t root) -> Fd {
-		Timer open_timer;
-		auto rv = ctx->roots()->open_root_nocache(root);
-		BEESCOUNTADD(open_root_ms, open_timer.age() * 1000);
-		return rv;
+		return ctx->roots()->open_root_nocache(root);
 	});
 	m_file_cache.func([&](shared_ptr<BeesContext> ctx, uint64_t root, uint64_t ino) -> Fd {
-		Timer open_timer;
-		auto rv = ctx->roots()->open_root_ino_nocache(root, ino);
-		BEESCOUNTADD(open_ino_ms, open_timer.age() * 1000);
-		return rv;
+		return ctx->roots()->open_root_ino_nocache(root, ino);
 	});
 }
@@ -235,24 +228,15 @@ BeesContext::show_progress()
 	}
 }
-Fd
-BeesContext::home_fd()
-{
-	const char *base_dir = getenv("BEESHOME");
-	if (!base_dir) {
-		base_dir = ".beeshome";
-	}
-	m_home_fd = openat(root_fd(), base_dir, FLAGS_OPEN_DIR);
-	if (!m_home_fd) {
-		THROW_ERRNO("openat: " << name_fd(root_fd()) << " / " << base_dir);
-	}
-	return m_home_fd;
-}
 BeesContext::BeesContext(shared_ptr<BeesContext> parent) :
 	m_parent_ctx(parent)
 {
+	auto base_dir = getenv_or_die("BEESHOME");
+	BEESLOG("BEESHOME = " << base_dir);
+	m_home_fd = open_or_die(base_dir, FLAGS_OPEN_DIR);
 	if (m_parent_ctx) {
+		m_hash_table = m_parent_ctx->hash_table();
+		m_hash_table->set_shared(true);
 		m_fd_cache = m_parent_ctx->fd_cache();
 	}
 }

View File

@@ -1,4 +1,3 @@
-#include "bees-version.h"
 #include "bees.h"
 #include "crucible/crc64.h"
@@ -12,6 +11,13 @@
 using namespace crucible;
 using namespace std;
+static inline
+bool
+using_any_madvise()
+{
+	return true;
+}
 ostream &
 operator<<(ostream &os, const BeesHash &bh)
 {
@@ -95,6 +101,8 @@ BeesHashTable::get_extent_range(HashType hash)
 void
 BeesHashTable::flush_dirty_extents()
 {
+	if (using_shared_map()) return;
 	THROW_CHECK1(runtime_error, m_buckets, m_buckets > 0);
 	unique_lock<mutex> lock(m_extent_mutex);
@@ -116,12 +124,16 @@ BeesHashTable::flush_dirty_extents()
 		uint8_t *dirty_extent_end = m_extent_ptr[extent_number + 1].p_byte;
 		THROW_CHECK1(out_of_range, dirty_extent, dirty_extent >= m_byte_ptr);
 		THROW_CHECK1(out_of_range, dirty_extent_end, dirty_extent_end <= m_byte_ptr_end);
-		THROW_CHECK2(out_of_range, dirty_extent_end, dirty_extent, dirty_extent_end - dirty_extent == BLOCK_SIZE_HASHTAB_EXTENT);
-		BEESTOOLONG("pwrite(fd " << m_fd << " '" << name_fd(m_fd)<< "', length " << to_hex(dirty_extent_end - dirty_extent) << ", offset " << to_hex(dirty_extent - m_byte_ptr) << ")");
-		// Page locks slow us down more than copying the data does
-		vector<uint8_t> extent_copy(dirty_extent, dirty_extent_end);
-		pwrite_or_die(m_fd, extent_copy, dirty_extent - m_byte_ptr);
-		BEESCOUNT(hash_extent_out);
+		if (using_shared_map()) {
+			BEESTOOLONG("flush extent " << extent_number);
+			copy(dirty_extent, dirty_extent_end, dirty_extent);
+		} else {
+			BEESTOOLONG("pwrite(fd " << m_fd << " '" << name_fd(m_fd)<< "', length " << to_hex(dirty_extent_end - dirty_extent) << ", offset " << to_hex(dirty_extent - m_byte_ptr) << ")");
+			// Page locks slow us down more than copying the data does
+			vector<uint8_t> extent_copy(dirty_extent, dirty_extent_end);
+			pwrite_or_die(m_fd, extent_copy, dirty_extent - m_byte_ptr);
+			BEESCOUNT(hash_extent_out);
+		}
 	});
 	BEESNOTE("flush rate limited at extent #" << extent_number << " (" << extent_counter << " of " << dirty_extent_copy.size() << ")");
 	m_flush_rate_limit.sleep_for(BLOCK_SIZE_HASHTAB_EXTENT);
@@ -131,6 +143,7 @@ BeesHashTable::flush_dirty_extents()
 void
 BeesHashTable::set_extent_dirty(HashType hash)
 {
+	if (using_shared_map()) return;
 	THROW_CHECK1(runtime_error, m_buckets, m_buckets > 0);
 	auto pr = get_extent_range(hash);
 	uint64_t extent_number = reinterpret_cast<Extent *>(pr.first) - m_extent_ptr;
@@ -143,8 +156,10 @@ BeesHashTable::set_extent_dirty(HashType hash)
 void
 BeesHashTable::writeback_loop()
 {
-	while (true) {
-		flush_dirty_extents();
+	if (!using_shared_map()) {
+		while (1) {
+			flush_dirty_extents();
+		}
 	}
 }
@@ -260,7 +275,6 @@ BeesHashTable::prefetch_loop()
 	graph_blob << "Now: " << format_time(time(NULL)) << "\n";
 	graph_blob << "Uptime: " << m_ctx->total_timer().age() << " seconds\n";
-	graph_blob << "Version: " << BEES_VERSION << "\n";
 	graph_blob
 		<< "\nHash table page occupancy histogram (" << occupied_count << "/" << total_count << " cells occupied, " << (occupied_count * 100 / total_count) << "%)\n"
@@ -296,6 +310,7 @@ void
 BeesHashTable::fetch_missing_extent(HashType hash)
 {
 	BEESTOOLONG("fetch_missing_extent for hash " << to_hex(hash));
+	if (using_shared_map()) return;
 	THROW_CHECK1(runtime_error, m_buckets, m_buckets > 0);
 	auto pr = get_extent_range(hash);
 	uint64_t extent_number = reinterpret_cast<Extent *>(pr.first) - m_extent_ptr;
@@ -381,6 +396,7 @@ BeesHashTable::find_cell(HashType hash)
 void
 BeesHashTable::erase_hash_addr(HashType hash, AddrType addr)
 {
+	// if (m_shared) return;
 	fetch_missing_extent(hash);
 	BEESTOOLONG("erase hash " << to_hex(hash) << " addr " << addr);
 	unique_lock<mutex> lock(m_bucket_mutex);
@@ -558,36 +574,12 @@ BeesHashTable::try_mmap_flags(int flags)
 }
 void
-BeesHashTable::open_file()
+BeesHashTable::set_shared(bool shared)
 {
-	// OK open hash table
-	BEESNOTE("opening hash table '" << m_filename << "' target size " << m_size << " (" << pretty(m_size) << ")");
-	// Try to open existing hash table
-	Fd new_fd = openat(m_ctx->home_fd(), m_filename.c_str(), FLAGS_OPEN_FILE_RW, 0700);
-	// If that doesn't work, try to make a new one
-	if (!new_fd) {
-		string tmp_filename = m_filename + ".tmp";
-		BEESLOGNOTE("creating new hash table '" << tmp_filename << "'");
-		unlinkat(m_ctx->home_fd(), tmp_filename.c_str(), 0);
-		new_fd = openat_or_die(m_ctx->home_fd(), tmp_filename, FLAGS_CREATE_FILE, 0700);
-		BEESLOGNOTE("truncating new hash table '" << tmp_filename << "' size " << m_size << " (" << pretty(m_size) << ")");
-		ftruncate_or_die(new_fd, m_size);
-		BEESLOGNOTE("renaming new hash table '" << tmp_filename << "' -> '" << m_filename << "'");
-		renameat_or_die(m_ctx->home_fd(), tmp_filename, m_ctx->home_fd(), m_filename);
-	}
-	Stat st(new_fd);
-	off_t new_size = st.st_size;
-	THROW_CHECK1(invalid_argument, new_size, new_size > 0);
-	THROW_CHECK1(invalid_argument, new_size, (new_size % BLOCK_SIZE_HASHTAB_EXTENT) == 0);
-	m_size = new_size;
-	m_fd = new_fd;
+	m_shared = shared;
 }
-BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t size) :
+BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename) :
 	m_ctx(ctx),
 	m_size(0),
 	m_void_ptr(nullptr),
@@ -595,30 +587,35 @@ BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t
 	m_buckets(0),
 	m_cells(0),
 	m_writeback_thread("hash_writeback"),
-	m_prefetch_thread("hash_prefetch"),
+	m_prefetch_thread("hash_prefetch " + m_ctx->root_path()),
 	m_flush_rate_limit(BEES_FLUSH_RATE),
 	m_prefetch_rate_limit(BEES_FLUSH_RATE),
 	m_stats_file(m_ctx->home_fd(), "beesstats.txt")
 {
-	// Sanity checks to protect the implementation from its weaknesses
+	BEESNOTE("opening hash table " << filename);
+	m_fd = openat_or_die(m_ctx->home_fd(), filename, FLAGS_OPEN_FILE_RW, 0700);
+	Stat st(m_fd);
+	m_size = st.st_size;
+	BEESTRACE("hash table size " << m_size);
+	BEESTRACE("hash table bucket size " << BLOCK_SIZE_HASHTAB_BUCKET);
+	BEESTRACE("hash table extent size " << BLOCK_SIZE_HASHTAB_EXTENT);
 	THROW_CHECK2(invalid_argument, BLOCK_SIZE_HASHTAB_BUCKET, BLOCK_SIZE_HASHTAB_EXTENT, (BLOCK_SIZE_HASHTAB_EXTENT % BLOCK_SIZE_HASHTAB_BUCKET) == 0);
-	// Does the union work?
-	THROW_CHECK2(runtime_error, m_void_ptr, m_cell_ptr, m_void_ptr == m_cell_ptr);
-	THROW_CHECK2(runtime_error, m_void_ptr, m_byte_ptr, m_void_ptr == m_byte_ptr);
-	THROW_CHECK2(runtime_error, m_void_ptr, m_bucket_ptr, m_void_ptr == m_bucket_ptr);
-	THROW_CHECK2(runtime_error, m_void_ptr, m_extent_ptr, m_void_ptr == m_extent_ptr);
 	// There's more than one union
 	THROW_CHECK2(runtime_error, sizeof(Bucket), BLOCK_SIZE_HASHTAB_BUCKET, BLOCK_SIZE_HASHTAB_BUCKET == sizeof(Bucket));
 	THROW_CHECK2(runtime_error, sizeof(Bucket::p_byte), BLOCK_SIZE_HASHTAB_BUCKET, BLOCK_SIZE_HASHTAB_BUCKET == sizeof(Bucket::p_byte));
 	THROW_CHECK2(runtime_error, sizeof(Extent), BLOCK_SIZE_HASHTAB_EXTENT, BLOCK_SIZE_HASHTAB_EXTENT == sizeof(Extent));
 	THROW_CHECK2(runtime_error, sizeof(Extent::p_byte), BLOCK_SIZE_HASHTAB_EXTENT, BLOCK_SIZE_HASHTAB_EXTENT == sizeof(Extent::p_byte));
-	m_filename = filename;
-	m_size = size;
-	open_file();
-	// Now we know size we can compute stuff
-	BEESTRACE("hash table size " << m_size);
-	BEESTRACE("hash table bucket size " << BLOCK_SIZE_HASHTAB_BUCKET);
-	BEESTRACE("hash table extent size " << BLOCK_SIZE_HASHTAB_EXTENT);
 	BEESLOG("opened hash table filename '" << filename << "' length " << m_size);
 	m_buckets = m_size / BLOCK_SIZE_HASHTAB_BUCKET;
 	m_cells = m_buckets * c_cells_per_bucket;
@@ -627,30 +624,27 @@ BeesHashTable::BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t
 	BEESLOG("\tflush rate limit " << BEES_FLUSH_RATE);
-	// Try to mmap that much memory
-	try_mmap_flags(MAP_PRIVATE | MAP_ANONYMOUS);
+	if (using_shared_map()) {
+		try_mmap_flags(MAP_SHARED);
+	} else {
+		try_mmap_flags(MAP_PRIVATE | MAP_ANONYMOUS);
+	}
 	if (!m_cell_ptr) {
-		THROW_ERRNO("unable to mmap " << filename);
+		THROW_ERROR(runtime_error, "unable to mmap " << filename);
 	}
-	// Do unions work the way we think (and rely on)?
-	THROW_CHECK2(runtime_error, m_void_ptr, m_cell_ptr, m_void_ptr == m_cell_ptr);
-	THROW_CHECK2(runtime_error, m_void_ptr, m_byte_ptr, m_void_ptr == m_byte_ptr);
-	THROW_CHECK2(runtime_error, m_void_ptr, m_bucket_ptr, m_void_ptr == m_bucket_ptr);
-	THROW_CHECK2(runtime_error, m_void_ptr, m_extent_ptr, m_void_ptr == m_extent_ptr);
-	{
-		// It's OK if this fails (e.g. kernel not built with CONFIG_TRANSPARENT_HUGEPAGE)
-		// We don't fork any more so DONTFORK isn't really needed
-		BEESTOOLONG("madvise(MADV_HUGEPAGE | MADV_DONTFORK)");
-		if (madvise(m_byte_ptr, m_size, MADV_HUGEPAGE | MADV_DONTFORK)) {
-			BEESLOG("mostly harmless: madvise(MADV_HUGEPAGE | MADV_DONTFORK) failed: " << strerror(errno));
-		}
-	}
-	for (uint64_t i = 0; i < m_size / sizeof(Extent); ++i) {
-		m_buckets_missing.insert(i);
-	}
+	if (!using_shared_map()) {
+		// madvise fails if MAP_SHARED
+		if (using_any_madvise()) {
+			// DONTFORK because we sometimes do fork,
+			// but the child doesn't touch any of the many, many pages
+			BEESTOOLONG("madvise(MADV_HUGEPAGE | MADV_DONTFORK)");
+			DIE_IF_NON_ZERO(madvise(m_byte_ptr, m_size, MADV_HUGEPAGE | MADV_DONTFORK));
+		}
+		for (uint64_t i = 0; i < m_size / sizeof(Extent); ++i) {
+			m_buckets_missing.insert(i);
+		}
+	}
 	m_writeback_thread.exec([&]() {
View File

@@ -196,7 +196,7 @@ BeesResolver::chase_extent_ref(const BtrfsInodeOffsetRoot &bior, BeesBlockData &
 	Fd file_fd = m_ctx->roots()->open_root_ino(bior.m_root, bior.m_inum);
 	if (!file_fd) {
-		// Deleted snapshots generate craptons of these
+		// Delete snapshots generate craptons of these
 		// BEESINFO("No FD in chase_extent_ref " << bior);
 		BEESCOUNT(chase_no_fd);
 		return BeesFileRange();
@@ -378,10 +378,7 @@ BeesResolver::for_each_extent_ref(BeesBlockData bbd, function<bool(const BeesFil
 		// We have reliable block addresses now, so we guarantee we can hit the desired block.
 		// Failure in chase_extent_ref means we are done, and don't need to look up all the
 		// other references.
-		// Or...not?  If we have a compressed extent, some refs will not match
-		// if there are two references to the same extent with a reference
-		// to a different extent between them.
-		// stop_now = true;
+		stop_now = true;
 	}
 });
@@ -480,6 +477,11 @@ BeesResolver::find_all_matches(BeesBlockData &bbd)
 bool
 BeesResolver::operator<(const BeesResolver &that) const
 {
-	// Lowest count, highest address
-	return tie(that.m_bior_count, m_addr) < tie(m_bior_count, that.m_addr);
+	if (that.m_bior_count < m_bior_count) {
+		return true;
+	} else if (m_bior_count < that.m_bior_count) {
+		return false;
+	}
+	return m_addr < that.m_addr;
 }

View File

@@ -42,26 +42,17 @@ BeesCrawlState::BeesCrawlState() :
 bool
 BeesCrawlState::operator<(const BeesCrawlState &that) const
 {
-	return tie(m_objectid, m_offset, m_root, m_min_transid, m_max_transid)
-		< tie(that.m_objectid, that.m_offset, that.m_root, that.m_min_transid, that.m_max_transid);
+	return tie(m_root, m_objectid, m_offset, m_min_transid, m_max_transid)
+		< tie(that.m_root, that.m_objectid, that.m_offset, that.m_min_transid, that.m_max_transid);
 }
 string
 BeesRoots::crawl_state_filename() const
 {
 	string rv;
-	// Legacy filename included UUID
 	rv += "beescrawl.";
 	rv += m_ctx->root_uuid();
 	rv += ".dat";
-	struct stat buf;
-	if (fstatat(m_ctx->home_fd(), rv.c_str(), &buf, AT_SYMLINK_NOFOLLOW)) {
-		// Use new filename
-		rv = "beescrawl.dat";
-	}
 	return rv;
 }
@@ -110,12 +101,6 @@ BeesRoots::state_save()
 	m_crawl_state_file.write(ofs.str());
-	// Renaming things is hard after release
-	if (m_crawl_state_file.name() != "beescrawl.dat") {
-		renameat(m_ctx->home_fd(), m_crawl_state_file.name().c_str(), m_ctx->home_fd(), "beescrawl.dat");
-		m_crawl_state_file.name("beescrawl.dat");
-	}
 	BEESNOTE("relocking crawl state");
 	lock.lock();
 	// Not really correct but probably close enough
@@ -208,15 +193,15 @@ BeesRoots::crawl_roots()
 	auto crawl_map_copy = m_root_crawl_map;
 	lock.unlock();
-#if 0
-	// Scan the same inode/offset tuple in each subvol (good for snapshots)
 	BeesFileRange first_range;
 	shared_ptr<BeesCrawl> first_crawl;
 	for (auto i : crawl_map_copy) {
 		auto this_crawl = i.second;
 		auto this_range = this_crawl->peek_front();
 		if (this_range) {
-			if (!first_range || this_range < first_range) {
+			auto tuple_this = make_tuple(this_range.fid().ino(), this_range.fid().root(), this_range.begin());
+			auto tuple_first = make_tuple(first_range.fid().ino(), first_range.fid().root(), first_range.begin());
+			if (!first_range || tuple_this < tuple_first) {
 				first_crawl = this_crawl;
 				first_range = this_range;
 			}
@@ -234,27 +219,6 @@ BeesRoots::crawl_roots()
 	THROW_CHECK2(runtime_error, first_range, first_range_popped, first_range == first_range_popped);
 	return;
 	}
-#else
-	// Scan each subvol one extent at a time (good for continuous forward progress)
-	bool crawled = false;
-	for (auto i : crawl_map_copy) {
-		auto this_crawl = i.second;
-		auto this_range = this_crawl->peek_front();
-		if (this_range) {
-			catch_all([&]() {
-				// BEESINFO("scan_forward " << this_range);
-				m_ctx->scan_forward(this_range);
-			});
-			crawled = true;
-			BEESCOUNT(crawl_scan);
-			m_crawl_current = this_crawl->get_state();
-			auto this_range_popped = this_crawl->pop_front();
-			THROW_CHECK2(runtime_error, this_range, this_range_popped, this_range == this_range_popped);
-		}
-	}
-	if (crawled) return;
-#endif
 	BEESLOG("Crawl ran out of data after " << m_crawl_timer.lap() << "s, waiting for more...");
 	BEESCOUNT(crawl_done);
@@ -379,8 +343,8 @@ BeesRoots::state_load()
 BeesRoots::BeesRoots(shared_ptr<BeesContext> ctx) :
 	m_ctx(ctx),
 	m_crawl_state_file(ctx->home_fd(), crawl_state_filename()),
-	m_crawl_thread("crawl"),
-	m_writeback_thread("crawl_writeback")
+	m_crawl_thread("crawl " + ctx->root_path()),
+	m_writeback_thread("crawl_writeback " + ctx->root_path())
 {
 	m_crawl_thread.exec([&]() {
 		catch_all([&]() {
@@ -665,7 +629,7 @@ BeesCrawl::fetch_extents()
 	Timer crawl_timer;
-	BtrfsIoctlSearchKey sk(BEES_MAX_CRAWL_SIZE * (sizeof(btrfs_file_extent_item) + sizeof(btrfs_ioctl_search_header)));
+	BtrfsIoctlSearchKey sk;
 	sk.tree_id = old_state.m_root;
 	sk.min_objectid = old_state.m_objectid;
 	sk.min_type = sk.max_type = BTRFS_EXTENT_DATA_KEY;
@@ -682,9 +646,7 @@ BeesCrawl::fetch_extents()
 	{
 		BEESNOTE("searching crawl sk " << static_cast<btrfs_ioctl_search_key&>(sk));
 		BEESTOOLONG("Searching crawl sk " << static_cast<btrfs_ioctl_search_key&>(sk));
-		Timer crawl_timer;
 		ioctl_ok = sk.do_ioctl_nothrow(m_ctx->root_fd());
-		BEESCOUNTADD(crawl_ms, crawl_timer.age() * 1000);
 	}
 	if (ioctl_ok) {
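The snapshot-friendly variant of `crawl_roots()` above picks the next range by comparing (ino, root, begin) tuples across subvol crawlers, so the same inode in different snapshots is scanned together. A toy illustration of why that tuple order groups snapshots (values are made up):

```python
# Candidate front ranges as (ino, root, begin) tuples, one per subvol crawl.
candidates = [
    (258, 5, 0),    # a different file in the top-level subvol (root 5)
    (257, 256, 0),  # inode 257 in a snapshot subvol (root 256)
    (257, 5, 0),    # inode 257 in the top-level subvol
]

# Sorting by (ino, root, begin) keeps both copies of inode 257 adjacent,
# ahead of inode 258, regardless of which subvol each copy lives in.
assert sorted(candidates) == [(257, 5, 0), (257, 256, 0), (258, 5, 0)]
```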

View File

@@ -1,4 +1,3 @@
-#include "bees-version.h"
 #include "bees.h"
 #include "crucible/interp.h"
@@ -33,12 +32,15 @@ do_cmd_help(const ArgList &argv)
 		"fs-root-path MUST be the root of a btrfs filesystem tree (id 5).\n"
 		"Other directories will be rejected.\n"
 		"\n"
-		"Optional environment variables:\n"
-		"\tBEESHOME\tPath to hash table and configuration files\n"
-		"\t\t\t(default is .beeshome/ in the root of each filesystem).\n"
-		"\n"
-		"\tBEESSTATUS\tFile to write status to (tmpfs recommended, e.g. /run).\n"
-		"\t\t\tNo status is written if this variable is unset.\n"
+		"Multiple filesystems can share a single hash table (BEESHOME)\n"
+		"but this only works well if the content of each filesystem\n"
+		"is distinct from all the others.\n"
+		"\n"
+		"Required environment variables:\n"
+		"\tBEESHOME\tPath to hash table and configuration files\n"
+		"\n"
+		"Optional environment variables:\n"
+		"\tBEESSTATUS\tFile to write status to (tmpfs recommended, e.g. /run)\n"
 		"\n"
 		<< endl;
 	return 0;
@@ -349,18 +351,6 @@ BeesStringFile::BeesStringFile(Fd dir_fd, string name, size_t limit) :
 	BEESLOG("BeesStringFile " << name_fd(m_dir_fd) << "/" << m_name << " max size " << pretty(m_limit));
 }
-void
-BeesStringFile::name(const string &new_name)
-{
-	m_name = new_name;
-}
-string
-BeesStringFile::name() const
-{
-	return m_name;
-}
 string
 BeesStringFile::read()
 {
@@ -394,13 +384,8 @@ BeesStringFile::write(string contents)
 		Fd ofd = openat_or_die(m_dir_fd, tmpname, FLAGS_CREATE_FILE, S_IRUSR | S_IWUSR);
 		BEESNOTE("writing " << tmpname << " in " << name_fd(m_dir_fd));
 		write_or_die(ofd, contents);
-#if 0
-		// This triggers too many btrfs bugs.  I wish I was kidding.
-		// Forget snapshots, balance, compression, and dedup:
-		// the system call you have to fear on btrfs is fsync().
 		BEESNOTE("fsyncing " << tmpname << " in " << name_fd(m_dir_fd));
 		DIE_IF_NON_ZERO(fsync(ofd));
-#endif
 	}
 	BEESNOTE("renaming " << tmpname << " to " << m_name << " in FD " << name_fd(m_dir_fd));
 	BEESTRACE("renaming " << tmpname << " to " << m_name << " in FD " << name_fd(m_dir_fd));
@@ -504,13 +489,8 @@ BeesTempFile::make_copy(const BeesFileRange &src)
 	THROW_CHECK1(invalid_argument, src, src.size() > 0);
-	// FIEMAP used to give us garbage data, e.g. distinct adjacent
-	// extents merged into a single entry in the FIEMAP output.
-	// FIEMAP didn't stop giving us garbage data, we just stopped
-	// using FIEMAP.
-	// We shouldn't get absurdly large extents any more; however,
-	// it's still a problem if we do, so bail out and leave a trace
-	// in the log.
+	// FIXME: don't know where these come from, but we can't handle them.
+	// Grab a trace for the log.
 	THROW_CHECK1(invalid_argument, src, src.size() < BLOCK_SIZE_MAX_TEMP_FILE);
 	realign();
@@ -568,7 +548,7 @@ bees_main(ArgList args)
 	list<shared_ptr<BeesContext>> all_contexts;
 	shared_ptr<BeesContext> bc;
-	// Create a context and start crawlers
+	// Subscribe to fanotify events
 	bool did_subscription = false;
 	for (string arg : args) {
 		catch_all([&]() {
@@ -596,8 +576,6 @@ bees_main(ArgList args)
 int
 main(int argc, const char **argv)
 {
-	cerr << "bees version " << BEES_VERSION << endl;
 	if (argc < 2) {
 		do_cmd_help(argv);
 		return 2;
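`BeesStringFile::write` above uses the classic write-to-tmpfile-then-rename pattern; the `#if 0` hunk shows the fsync step being disabled. A generic Python sketch of the same pattern, with the fsync kept in (illustrative, not bees code):

```python
import os
import tempfile

def atomic_write(dirpath, name, contents):
    # Write "<name>.tmp" in the same directory, flush it to disk, then
    # rename over the target: readers only ever see the old file or the
    # complete new one, never a partial write.
    tmp = os.path.join(dirpath, name + ".tmp")
    with open(tmp, "w") as f:
        f.write(contents)
        f.flush()
        os.fsync(f.fileno())  # the step bees disables behind "#if 0"
    os.rename(tmp, os.path.join(dirpath, name))

workdir = tempfile.mkdtemp()
atomic_write(workdir, "beesstats.txt", "hash_extent_out=42\n")
```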

View File

@@ -136,8 +136,6 @@ const int FLAGS_OPEN_FANOTIFY = O_RDWR | O_NOATIME | O_CLOEXEC | O_LARGEFILE;
 	} \
 } while (0)
-#define BEESLOGNOTE(x) BEESLOG(x); BEESNOTE(x)
 #define BEESCOUNT(stat) do { \
 	BeesStats::s_global.add_count(#stat); \
 } while (0)
@@ -376,8 +374,6 @@ public:
 	BeesStringFile(Fd dir_fd, string name, size_t limit = 1024 * 1024);
 	string read();
 	void write(string contents);
-	void name(const string &new_name);
-	string name() const;
 };
 class BeesHashTable {
@@ -411,7 +407,7 @@ public:
 		uint8_t p_byte[BLOCK_SIZE_HASHTAB_EXTENT];
 	} __attribute__((packed));
-	BeesHashTable(shared_ptr<BeesContext> ctx, string filename, off_t size = BLOCK_SIZE_HASHTAB_EXTENT);
+	BeesHashTable(shared_ptr<BeesContext> ctx, string filename);
 	~BeesHashTable();
 	vector<Cell> find_cell(HashType hash);
@@ -419,6 +415,8 @@ public:
 	void erase_hash_addr(HashType hash, AddrType addr);
 	bool push_front_hash_addr(HashType hash, AddrType addr);
+	void set_shared(bool shared);
 private:
 	string m_filename;
 	Fd m_fd;
@@ -454,7 +452,8 @@ private:
 	LockSet<uint64_t> m_extent_lock_set;
-	void open_file();
+	DefaultBool m_shared;
 	void writeback_loop();
 	void prefetch_loop();
 	void try_mmap_flags(int flags);
@@ -465,6 +464,8 @@ private:
 	void flush_dirty_extents();
 	bool is_toxic_hash(HashType h) const;
+	bool using_shared_map() const { return false; }
 	BeesHashTable(const BeesHashTable &) = delete;
 	BeesHashTable &operator=(const BeesHashTable &) = delete;
 };
@@ -713,7 +714,7 @@ public:
 	void set_root_path(string path);
 	Fd root_fd() const { return m_root_fd; }
-	Fd home_fd();
+	Fd home_fd() const { return m_home_fd; }
 	string root_path() const { return m_root_path; }
 	string root_uuid() const { return m_root_uuid; }

src/fanotify-watch.cc Normal file
View File

@@ -0,0 +1,91 @@
#include <crucible/error.h>
#include <crucible/fd.h>
#include <crucible/ntoa.h>

#include <iostream>
#include <iomanip>
#include <sstream>
#include <string>

#include <unistd.h>

#include <sys/fanotify.h>

using namespace crucible;
using namespace std;

static
void
usage(const char *name)
{
	cerr << "Usage: " << name << " directory..." << endl;
	cerr << "Reports fanotify events from each directory's mount" << endl;
}

struct fan_read_block {
	struct fanotify_event_metadata fem;
	// more here in the future. Maybe.
};

static inline
string
fan_flag_ntoa(uint64_t ui)
{
	static const bits_ntoa_table flag_names[] = {
		NTOA_TABLE_ENTRY_BITS(FAN_ACCESS),
		NTOA_TABLE_ENTRY_BITS(FAN_OPEN),
		NTOA_TABLE_ENTRY_BITS(FAN_MODIFY),
		NTOA_TABLE_ENTRY_BITS(FAN_CLOSE),
		NTOA_TABLE_ENTRY_BITS(FAN_CLOSE_WRITE),
		NTOA_TABLE_ENTRY_BITS(FAN_CLOSE_NOWRITE),
		NTOA_TABLE_ENTRY_BITS(FAN_Q_OVERFLOW),
		NTOA_TABLE_ENTRY_BITS(FAN_ACCESS_PERM),
		NTOA_TABLE_ENTRY_BITS(FAN_OPEN_PERM),
		NTOA_TABLE_ENTRY_END()
	};
	return bits_ntoa(ui, flag_names);
}

int
main(int argc, char **argv)
{
	// Need at least one directory argument
	if (argc < 2) {
		usage(argv[0]);
		exit(EXIT_FAILURE);
	}

	Fd fd;
	DIE_IF_MINUS_ONE(fd = fanotify_init(FAN_CLASS_NOTIF, O_RDONLY | O_LARGEFILE | O_CLOEXEC | O_NOATIME));

	for (char **argvp = argv + 1; *argvp; ++argvp) {
		cerr << "fanotify_mark(" << *argvp << ")..." << flush;
		DIE_IF_MINUS_ONE(fanotify_mark(fd, FAN_MARK_ADD | FAN_MARK_MOUNT, FAN_CLOSE_WRITE | FAN_CLOSE_NOWRITE | FAN_OPEN, FAN_NOFD, *argvp));
		cerr << endl;
	}

	while (1) {
		struct fan_read_block frb;
		read_or_die(fd, frb);
#if 0
		cout << "event_len\t= " << frb.fem.event_len << endl;
		cout << "vers\t= " << static_cast<int>(frb.fem.vers) << endl;
		cout << "reserved\t= " << static_cast<int>(frb.fem.reserved) << endl;
		cout << "metadata_len\t= " << frb.fem.metadata_len << endl;
		cout << "mask\t= " << hex << frb.fem.mask << dec << "\t" << fan_flag_ntoa(frb.fem.mask) << endl;
		cout << "fd\t= " << frb.fem.fd << endl;
		cout << "pid\t= " << frb.fem.pid << endl;
#endif
		cout << "flags " << fan_flag_ntoa(frb.fem.mask) << " pid " << frb.fem.pid << ' ' << flush;

		// Take ownership of the event fd so it is closed after use
		Fd event_fd(frb.fem.fd);

		// Resolve the event fd to a path via /proc
		ostringstream oss;
		oss << "/proc/self/fd/" << event_fd;
		cout << "file " << readlink_or_die(oss.str()) << endl;
	}
	return EXIT_SUCCESS;
}


@@ -5,6 +5,18 @@
 using namespace crucible;
 
+static
+void
+test_getcrc64_strings()
+{
+	assert(Digest::CRC::crc64("John") == 5942451273432301568);
+	assert(Digest::CRC::crc64("Paul") == 5838402100630913024);
+	assert(Digest::CRC::crc64("George") == 6714394476893704192);
+	assert(Digest::CRC::crc64("Ringo") == 6038837226071130112);
+	assert(Digest::CRC::crc64("") == 0);
+	assert(Digest::CRC::crc64("\377\277\300\200") == 15615382887346470912ULL);
+}
+
 static
 void
 test_getcrc64_byte_arrays()
@@ -20,6 +32,7 @@ test_getcrc64_byte_arrays()
 int
 main(int, char**)
 {
+	RUN_A_TEST(test_getcrc64_strings());
 	RUN_A_TEST(test_getcrc64_byte_arrays());
 
 	exit(EXIT_SUCCESS);


@@ -141,13 +141,7 @@ test_cast_0x80000000_to_things()
 	SHOULD_FAIL(ranged_cast<unsigned short>(uv));
 	SHOULD_FAIL(ranged_cast<unsigned char>(uv));
 	SHOULD_PASS(ranged_cast<signed long long>(sv), sv);
-	if (sizeof(long) == 4) {
-		SHOULD_FAIL(ranged_cast<signed long>(sv));
-	} else if (sizeof(long) == 8) {
-		SHOULD_PASS(ranged_cast<signed long>(sv), sv);
-	} else {
-		assert(!"unhandled case, please add code for long here");
-	}
+	SHOULD_PASS(ranged_cast<signed long>(sv), sv);
 	SHOULD_FAIL(ranged_cast<signed short>(sv));
 	SHOULD_FAIL(ranged_cast<signed char>(sv));
 	if (sizeof(int) == 4) {
@@ -155,7 +149,7 @@ test_cast_0x80000000_to_things()
 	} else if (sizeof(int) == 8) {
 		SHOULD_PASS(ranged_cast<signed int>(sv), sv);
 	} else {
-		assert(!"unhandled case, please add code for int here");
+		assert(!"unhandled case, please add code here");
 	}
 }
@@ -180,13 +174,7 @@ test_cast_0xffffffff_to_things()
 	SHOULD_FAIL(ranged_cast<unsigned short>(uv));
 	SHOULD_FAIL(ranged_cast<unsigned char>(uv));
 	SHOULD_PASS(ranged_cast<signed long long>(sv), sv);
-	if (sizeof(long) == 4) {
-		SHOULD_FAIL(ranged_cast<signed long>(sv));
-	} else if (sizeof(long) == 8) {
-		SHOULD_PASS(ranged_cast<signed long>(sv), sv);
-	} else {
-		assert(!"unhandled case, please add code for long here");
-	}
+	SHOULD_PASS(ranged_cast<signed long>(sv), sv);
 	SHOULD_FAIL(ranged_cast<signed short>(sv));
 	SHOULD_FAIL(ranged_cast<signed char>(sv));
 	if (sizeof(int) == 4) {
@@ -194,7 +182,7 @@ test_cast_0xffffffff_to_things()
 	} else if (sizeof(int) == 8) {
 		SHOULD_PASS(ranged_cast<signed int>(sv), sv);
 	} else {
-		assert(!"unhandled case, please add code for int here");
+		assert(!"unhandled case, please add code here");
 	}
 }