mirror of https://github.com/Zygo/bees.git synced 2025-05-17 21:35:45 +02:00

38 Commits

Zygo Blaxell
c53fa04a2f task: fixes for priority and idle Tasks
Tasks are not allowed to be queued more than once, but it is allowed
to queue a Task while it's already running, which means a Task can be
executed on two threads in parallel.  Tasks detect this and handle it
by queueing the Task on its own post-exec queue.  That in turn leads
to Workers which continually execute the same Task if that Task doesn't
create any new Tasks, while other Tasks sit on the Master queue waiting
for a Worker to dequeue them.

For idle Tasks, we don't want the Task to be rescheduled immediately.
We want the idle Task to execute again after every available Task on
both the main and idle queues has been executed.

Fix these by having each Task reschedule itself on the appropriate
queue when it finishes executing.

Priority-queued Tasks should be executed in priority order across not
just one Task's post-exec queue, but the entire local queue of the
TaskConsumer.

Fix this by moving the sort into either the TaskConsumer that receives
a post-exec queue, if there is one, or into the Task that is created
to insert the post-exec queue into a TaskConsumer when one becomes
available in the future.
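
The reschedule-on-finish logic, as a minimal sketch (hypothetical names
and simplified locking, not the actual bees classes):

        #include <deque>
        #include <functional>
        #include <memory>
        #include <mutex>

        struct TaskSketch;

        // Stand-ins for the real TaskMaster queues.
        static std::mutex queue_mutex;
        static std::deque<std::shared_ptr<TaskSketch>> main_queue;
        static std::deque<std::shared_ptr<TaskSketch>> idle_queue;

        // Tasks are assumed to be owned by shared_ptr (e.g. make_shared).
        struct TaskSketch : std::enable_shared_from_this<TaskSketch> {
                std::function<void()> fn;
                bool idle = false;          // true for idle-class Tasks
                bool running = false;
                bool run_requested = false;

                // Request execution.  If the Task is already running,
                // remember the request instead of queueing a second copy.
                void run() {
                        std::unique_lock<std::mutex> lock(queue_mutex);
                        if (running) { run_requested = true; return; }
                        (idle ? idle_queue : main_queue).push_back(shared_from_this());
                }

                // Called by a worker.  On finish, the Task reschedules itself
                // on the appropriate queue instead of spinning on the worker's
                // own post-exec queue.
                void exec() {
                        { std::unique_lock<std::mutex> lock(queue_mutex); running = true; }
                        fn();
                        std::unique_lock<std::mutex> lock(queue_mutex);
                        running = false;
                        if (run_requested) {
                                run_requested = false;
                                (idle ? idle_queue : main_queue).push_back(shared_from_this());
                        }
                }
        };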

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-15 00:43:25 -05:00
Zygo Blaxell
a819d623f7 task: do not allow queue loops in priority queueing mode
Tasks using non-priority FIFO dependency tracking can insert themselves
into their own queue, to run the Task again immediately after it exits.

For priority queues, this attempts to splice the post-exec queue into
itself, which doesn't seem like a good idea.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-12 15:28:26 -05:00
Zygo Blaxell
de9d72da80 task: flatten queues of dependent Tasks
Suppose Tasks A, B, and C are created in that order, and are currently
running.  Task T acquires Exclusion E.  Tasks B, A, and C attempt to
acquire the same Exclusion, in that order, but fail because Task T holds it.

The result is Task T with a post-exec queue:

        T, [ B, A, C ]  sort_requested

Now suppose Task U acquires Exclusion F, then Task T attempts to acquire
Exclusion F.  Task T fails to acquire F, so T is inserted into U's
post-exec queue.  The result at the end of the execution of T is a tree:

        U, [ T ]  sort_requested
             \-> [ B, A, C ] sort_requested

Task T exits after failing to acquire a lock.  When T exits, T will
sort its post-exec queue and submit the post-exec queue for execution
immediately:

        Worker 1: U, [ T ]  sort_requested
        Worker 2: A, B, C

This isn't ideal because T, A, B, and C all depend on at least one
common Exclusion, so they are likely to immediately conflict with T
when U exits and T runs again.

Ideally, A, B, and C would at least remain in a common queue with T,
and ideally that queue is sorted.

Instead of inserting T into U's post-exec queue, insert T and all
of T's post-exec queue, which creates a single flattened Task list:

        U, [ T, B, A, C ]   sort_requested

Then when U exits, it will sort [ T, B, A, C ] into [ A, B, C, T ],
and run all of the queued Tasks in age priority order:

        U exited, [ T, B, A, C ]   sort_requested

        U exited, [ A, B, C, T ]

        [ A, B, C, T ] on TaskConsumer queue
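
As a sketch, the flattening insert looks something like this (TaskSketch
and the field names are illustrative, not the real bees code):

        #include <list>
        #include <memory>

        struct TaskSketch {
                std::list<std::shared_ptr<TaskSketch>> post_exec;
                bool sort_requested = false;
        };

        // Sketch:  insert "blocked" and everything already queued behind it
        // after "owner", producing one flat queue:  owner, [ blocked, ... ]
        void append_flattened(TaskSketch &owner,
                              const std::shared_ptr<TaskSketch> &blocked)
        {
                owner.post_exec.push_back(blocked);
                // splice() is O(1) on std::list and leaves blocked->post_exec
                // empty, so the blocked Task's followers now wait on owner.
                owner.post_exec.splice(owner.post_exec.end(), blocked->post_exec);
                owner.sort_requested = true;
        }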

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-12 14:05:44 -05:00
Zygo Blaxell
74d8bdd60f task: add an insert method for priority-queueing Tasks by age
Task started out as a self-organizing parallel-make algorithm, but ended
up becoming a half-broken wait-die algorithm.  When a contended object
is already locked, Tasks enter a FIFO queue to restart and acquire the
lock.  This is the "die" part of wait-die (all locks on an Exclusion are
non-blocking, so no Task ever does "wait").  The lock queue is FIFO wrt
_lock acquisition order_, not _Task age_ as required by the wait-die
algorithm.

Make it a 25%-broken wait-die algorithm by sorting the Tasks on lock
queues in order of Task ID, i.e. oldest-first, or FIFO wrt Task age.
This ensures the oldest Task waiting for an object is the one to get
it when it becomes available, as expected from the wait-die algorithm.

This should reduce the amount of time Tasks spend on the execution queue,
and reduce memory usage by avoiding the accumulation of Tasks that cannot
make forward progress.

Note that turning `TaskQueue` into an ordered container would have
undesirable side-effects:

 * `std::list` has some useful properties wrt stability of object
 location and cost of splicing.  Other containers may not have these,
 and `std::list` does have a `sort` method.

 * Some Task objects are created at the beginning and reused continually,
 but we really do want those Tasks to be executed in FIFO order wrt
 submission, not Task ID.  We can exclude these Tasks by only doing the
 sorting when a Task is queued for an Exclusion object.
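
A sketch of the oldest-first sort (the container and lambda here are
illustrative; bees' own TaskQueue and Task ID field may be named
differently):

        #include <cstdint>
        #include <list>
        #include <memory>

        struct TaskSketch {
                uint64_t id = 0;   // assigned monotonically at Task creation
        };

        using TaskQueueSketch = std::list<std::shared_ptr<TaskSketch>>;

        // Sort a lock queue oldest-first (lowest Task ID first), as the
        // wait-die algorithm expects.  std::list::sort moves nodes without
        // copying elements and keeps iterators valid.
        void sort_oldest_first(TaskQueueSketch &queue)
        {
                queue.sort([](const std::shared_ptr<TaskSketch> &a,
                              const std::shared_ptr<TaskSketch> &b) {
                        return a->id < b->id;
                });
        }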

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-12 00:35:37 -05:00
Zygo Blaxell
8bc90b743b task: get rid of the insert_task method
Nothing calls it (not even tests), and there's significant functional
overlap with `try_lock`.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2025-01-11 23:39:55 -05:00
Zygo Blaxell
9beb602b16 task: ignore paused status while calculating dynamic thread count
bees might be unpaused at any time, so make sure that the dynamic load
calculation is ready with a non-zero thread count.

This avoids a delay of up to 5 seconds when responding to SIGUSR2
when loadavg tracking is enabled.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 23:10:15 -05:00
Zygo Blaxell
1cbc894e6f task: start up more worker threads when unpausing
When paused, TaskConsumer threads will eventually notice the paused
condition and exit; however, there's nothing to restart threads when
exiting the paused state.

When unpausing, and while the lock is already held, create TaskConsumer
threads as needed to reach the target thread count.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-12-12 22:53:00 -05:00
Zygo Blaxell
b99d80b40f task: add an idle queue
Add a second level queue which is only serviced when the local and global
queues are empty.

At some point there might be a need to implement a full priority queue,
but for now two classes are sufficient.
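
The service order, as a sketch (the queue types and names are
illustrative):

        #include <deque>
        #include <functional>
        #include <optional>

        using Work = std::function<void()>;

        // A worker takes from its local queue first, then the global queue,
        // and only touches the idle queue when both of those are empty.
        std::optional<Work> next_task(std::deque<Work> &local_queue,
                                      std::deque<Work> &global_queue,
                                      std::deque<Work> &idle_queue)
        {
                for (auto *q : { &local_queue, &global_queue, &idle_queue }) {
                        if (!q->empty()) {
                                Work w = std::move(q->front());
                                q->pop_front();
                                return w;
                        }
                }
                return std::nullopt;   // nothing to do; the worker can sleep
        }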

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2024-11-30 23:30:33 -05:00
Zygo Blaxell
563e584da4 task: use pthread_setname_np correctly
It turns out I've been using pthread_setname_np wrong the whole time:

 * on Linux, the thread name length is 15 characters.
   TASK_COMM_LEN is 16 bytes, and the last one is always 0.
   This is now hardcoded in many places and cannot be changed.

 * pthread_setname_np doesn't return -errno, so DIE_IF_MINUS_ERRNO
   was the wrong macro.  On the other hand, we never want to do anything
   differently when pthread_setname_np fails, so we never needed to
   check the return value.

Also, libc silently ignores attempts to set the thread name when it is too
long.  That's almost certainly a libc bug, but libc probably suppresses
the error result for the same reasons I ignore the error result.

Wrap the pthread_setname function with a C++ std::string overload that
truncates the argument at 15 characters, so we at least get the first
part of the task name in the thread name field.  Later commits can deal
with making the bees thread names shorter.

Also wrap pthread_getname for symmetry.
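
The wrappers are roughly this shape (function names here are
illustrative; the real ones live in crucible):

        #include <pthread.h>
        #include <string>

        // Linux limits thread names to TASK_COMM_LEN (16 bytes including the
        // trailing NUL), so keep at most 15 characters.  The return value is
        // deliberately ignored:  there is nothing useful to do on failure.
        void pthread_setname(const std::string &name)
        {
                (void)pthread_setname_np(pthread_self(), name.substr(0, 15).c_str());
        }

        std::string pthread_getname()
        {
                char buf[16] = { 0 };
                if (pthread_getname_np(pthread_self(), buf, sizeof(buf))) {
                        return std::string();   // don't throw from logging paths
                }
                return buf;
        }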

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2023-01-05 01:10:17 -05:00
Zygo Blaxell
b143664747 task: use exponential backoff algorithm to set thread count
Tasks are often running longer than 5 seconds (especially extents with
multiple references requiring copy operations), so the load tracking
algorithm needs to average several samples over a longer period of time
than 5 seconds.  If the sample period is 60 seconds, we end up recomputing
the original load average from current_load, so skip the rounding error
and use the original load average value.

Arguably the real fix is to break up the more complex extent operations
over several downstream Task objects, but that's a more significant
design change.

Tweak the attack and decay rates so that threads are started a little
more slowly, but still stopped rapidly when load spikes up.

Remove the hysteresis to provide support for load average targets
below 1, or with fractional components, with a PWM-like effect.
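
A sketch of the asymmetric smoothing (the constants and the mapping from
headroom to thread count below are illustrative, not the values bees
uses):

        #include <algorithm>
        #include <cmath>
        #include <cstddef>

        struct LoadTrackerSketch {
                double smoothed_load = 0;
                double rise_rate = 0.5;    // follow load spikes quickly: stop threads fast
                double fall_rate = 0.05;   // follow falling load slowly: start threads gently

                size_t update(double current_load, double loadavg_target, size_t max_threads)
                {
                        const double rate = (current_load > smoothed_load) ? rise_rate : fall_rate;
                        smoothed_load += rate * (current_load - smoothed_load);

                        // No hysteresis:  with a target below 1 (or with a
                        // fractional part) the thread count flips between
                        // adjacent values, giving a PWM-like duty cycle.
                        // Illustrative mapping from headroom to thread count:
                        const double threads = std::round(loadavg_target - smoothed_load + 1);
                        const double clamped = std::max(0.0, std::min(double(max_threads), threads));
                        return size_t(clamped);
                }
        };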

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00
Zygo Blaxell
a85ada3a49 task: export load tracking statistics
Provide an interface so that programs can monitor the Task load
average calculations.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00
Zygo Blaxell
46a38fe016 task: rescue post-exec queue on Task destruction
task1.append(task2) is supposed to run task2 after task1 is executed;
however, if task1 was just executed, and its last reference was owned by
a TaskConsumer, then task2 will be appended to a Task that will never
run again.

A similar problem arises in Exclusion, which can cause blocked tasks
to occasionally be dropped without executing them.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00
Zygo Blaxell
2aafa802a9 task: increase saved thread name length to 64
24 bytes seems a little low.  64 is a rounder (and more square) number.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00
Zygo Blaxell
cdef59e2f3 task: add more Doxygen comments for PairLock
I need to remind myself why it's there, and not just std::lock.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00
Zygo Blaxell
dc2dc8d08a task: delete the queue after deleting all of its children
This was resulting in an assertion failure later on if a queue was
being rescued from a deleted task with only one post-exec queue.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00
Zygo Blaxell
7873988dac task: add a pause() method as an alternative to cancel()
pause(true) stops the TaskMaster from processing any more Tasks,
but does not destroy any queued Tasks.

pause(false) re-enables Task processing.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:57 -05:00
Zygo Blaxell
3f740d6b2d task: simplify clear_queue
Simplify the loop in clear_queue because we can't be modifying a
queue while we are clearing it.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:56 -05:00
Zygo Blaxell
c0a7533dd4 task: use const for current_consumer
The const version of this code has much more testing, but any
effect at run time is unlikely.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:56 -05:00
Zygo Blaxell
090fa39995 task: don't hold the mutex while disposing of pending Tasks
In the event that someday Barrier allows users to force execution of
its pending tasks prior to the destruction of the BarrierState object,
we'll be ready to submit those Tasks for execution without waiting for
the BarrierState mutex lock.
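
The pattern, as a sketch (BarrierStateSketch and the method name are
illustrative):

        #include <functional>
        #include <list>
        #include <mutex>

        struct BarrierStateSketch {
                std::mutex mutex;
                std::list<std::function<void()>> pending;

                // Detach the pending Tasks while the lock is held, but run
                // them only after the lock is released, so whatever they do
                // (including touching this object again) cannot deadlock.
                void release_pending()
                {
                        std::list<std::function<void()>> local;
                        {
                                std::unique_lock<std::mutex> lock(mutex);
                                local.swap(pending);
                        }
                        for (auto &fn : local) {
                                fn();
                        }
                }
        };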

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:56 -05:00
Zygo Blaxell
2f25f89067 task: get rid of separate Exclusion and ExclusionState
Exclusion was generating a new Task every time a lock was contended.
That resulted in thousands of empty Task objects, each containing a
single Task item.

Get rid of ExclusionState.  Exclusion is now a simple weak_ptr to a Task.
If the weak_ptr is expired, the Exclusion is unlocked.  If the weak_ptr
is not expired, it points to the Task which owns the Exclusion.

try_lock now appends the Task attempting to lock the Exclusion directly
to the owning Task, eliminating the need for Exclusion to keep a Task
of its own.  This also removes the need to call insert_task separately,
though insert_task remains for other use cases.

With no ExclusionState there is no need for a string argument to
Exclusion's constructor, so get rid of that too.
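
A sketch of the simplified Exclusion (the class and member names are
illustrative):

        #include <list>
        #include <memory>
        #include <mutex>

        struct TaskSketch {
                std::list<std::shared_ptr<TaskSketch>> post_exec;

                // The real append() also handles locking and dedup; this is
                // only here so the sketch is self-contained.
                void append(const std::shared_ptr<TaskSketch> &later)
                {
                        post_exec.push_back(later);
                }
        };

        struct ExclusionSketch {
                std::mutex mutex;
                std::weak_ptr<TaskSketch> owner;

                // Returns true if "task" now owns the Exclusion.  Otherwise,
                // "task" is appended directly to the owning Task so it runs
                // again after the owner finishes.
                bool try_lock(const std::shared_ptr<TaskSketch> &task)
                {
                        std::unique_lock<std::mutex> lock(mutex);
                        auto current = owner.lock();
                        if (!current) {
                                owner = task;        // expired weak_ptr == unlocked
                                return true;
                        }
                        current->append(task);
                        return false;
                }
        };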

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:56 -05:00
Zygo Blaxell
7fdb87143c task: get rid of the separate Barrier and BarrierLock
Make one Barrier class which is copyable, so users don't have to create
shared Barrier objects all the time.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2022-12-20 20:50:55 -05:00
Zygo Blaxell
0103a04ca0 task: concurrency cleanups
Update thread_local task state pointers while locked.  This avoids
potential concurrent access of the pointers while making copies of them.

Verify that the queue is really empty after splicing lists, and the
current consumer is really gone after swapping the empty one.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-11-29 21:27:48 -05:00
Zygo Blaxell
5e346beb2d task: delete the move constructor for TaskState
Move-constructing isn't good for that class either.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-11-29 21:27:48 -05:00
Zygo Blaxell
b828f14dd1 task: optimize for common case of single following Task
If there is only one Task in the post-exec queue, we can
simply insert that Task instead of creating a Task to hold
a post-exec queue of one item.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-11-29 21:27:48 -05:00
Zygo Blaxell
955b8ae459 task: set the name of consumer threads so it is not "load_tracker"
The default name of a newly constructed thread is apparently the name
of the thread that created it.  That's very misleading when there are
a lot of TaskConsumer threads and they have nothing to do, so set the
name of each TaskConsumer thread as soon as it is created.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 21:02:00 -04:00
Zygo Blaxell
5f763f6d41 task: handle thread lifecycle more strictly
Testing sometimes crashes during exec of the first Task object, which
triggers construction of TaskConsumer threads.  Manage the life cycle
of the thread more strictly--don't access any methods of TaskConsumer
or std::thread until the constructor's caller's lock on TaskMaster
is released.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
0928362aab task: replace waiting state with run/exec counter
Task::run() would schedule a new execution of Task, unless it was waiting
on a queue for execution.  This cannot be implemented with a bool,
since a Task might be included in multiple queues, and should still be
in waiting state even when executed in that case.

Replace the bool with a counter.  run() and append() (but not
append_nolock) increment the counter, exec() decrements the counter.
If the counter is non-zero when run() or append() is called, the Task
is not scheduled.
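
The counting rule, as a sketch (names are illustrative; the real
TaskState does this under its own lock alongside queue management):

        #include <cstddef>
        #include <mutex>

        struct RunCounterSketch {
                std::mutex mutex;
                size_t run_count = 0;   // pending run()/append() requests

                // run() and append() call this; the Task is queued only on
                // the 0 -> 1 transition, so it sits in at most one queue.
                bool request_run()
                {
                        std::unique_lock<std::mutex> lock(mutex);
                        return ++run_count == 1;
                }

                // exec() calls this when the Task finishes; a non-zero result
                // means run() was called again while executing, so the caller
                // should queue the Task once more.
                bool finish_exec()
                {
                        std::unique_lock<std::mutex> lock(mutex);
                        return --run_count > 0;
                }
        };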

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
d5ff35eacf task: track number of Task objects in program and provide report
This is a simple lightweight counter that tracks the number of Task
objects that exist.  Useful for leak detection.
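
Something like this (a sketch; the real counter lives in the Task
implementation):

        #include <atomic>
        #include <cstddef>

        struct TaskStateSketch {
                static std::atomic<size_t> instance_count;

                TaskStateSketch()  { ++instance_count; }
                ~TaskStateSketch() { --instance_count; }
        };

        std::atomic<size_t> TaskStateSketch::instance_count { 0 };

        // A status report can read instance_count.load() at any time; a value
        // that grows without bound under a steady workload suggests a leak.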

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
b7f9ce3f08 task: serialize Task execution when Tasks block due to mutex contention
Quite often we want to execute task B after task A finishes executing,
especially if tasks A and B attempt to acquire locks on the same objects.

Implement that capability in Task directly:  each Task holds a queue
of Tasks which will be executed strictly after this Task has finished
executing, or if the Task is destroyed.

Add a local queue to each TaskConsumer.  This queue contains a list
of Tasks which are to be executed by a single thread in sequential
order.  These tasks are executed before fetching any tasks from
TaskMaster.

Each time a Task finishes executing, the list of Tasks appended to the
recently executed Task is spliced at the beginning of the thread's
TaskConsumer local queue.  These Tasks will be executed in the same
thread, in the same order they were appended to the recently executed Task.

If a Task is destroyed with a post-execution queue, that queue is
also inserted at the front of the current TaskConsumer's local queue.

If a Task is destroyed or somehow executed outside of a TaskConsumer
thread, or a TaskConsumer thread is destroyed, the local queue of Tasks
is wrapped in a "rescue_task" Task, and spliced before the head of the
global queue.  This preserves the sequential ordering of tasks.

In all cases the order of sequential execution of Tasks that are
appended to another Task is preserved.

The unused queue insertion functions are removed.

Exclusion is now simply a mutex, a bool, and a Task with an empty
function.  Tasks that queue up waiting for the mutex are stored in
Exclusion's Task, and Exclusion simply runs that task when the
ExclusionState is released.
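
The splice on Task completion, as a sketch (names are illustrative):

        #include <list>
        #include <memory>

        struct TaskSketch {
                std::list<std::shared_ptr<TaskSketch>> post_exec;
        };

        struct TaskConsumerSketch {
                std::list<std::shared_ptr<TaskSketch>> local_queue;

                // After a Task finishes, its post-exec queue moves to the
                // front of this thread's local queue, so the appended Tasks
                // run next, on this thread, in the order they were appended.
                void task_finished(TaskSketch &just_ran)
                {
                        local_queue.splice(local_queue.begin(), just_ran.post_exec);
                }
        };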

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2021-06-11 20:49:15 -04:00
Zygo Blaxell
978c577412 status: report number of active worker threads in status output
This is especially useful when dynamic load management allocates more
worker threads than active tasks, so the extra threads are effectively
invisible.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-01-07 22:52:12 -05:00
Zygo Blaxell
4021dd42ca task: queue and run exactly once per instance
Enable much simpler Task management:  each time a Task needs to be done
at least once in the future, simply invoke the run() method on the Task.
The Task will ensure that it only runs once, only appears in a queue
once, and will run again if a run request is made while the Task is
already running.

Make the queue policy a member of the Task rather than a method.  This
enables Tasks to reschedule themselves, possibly on the appropriate queue
if we have more than one of those some day.

This happens to make Tasks more similar to Linux kernel workers.
This similarity is coincidental, but not undesirable.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2019-01-07 22:48:15 -05:00
Zygo Blaxell
4e962172a7 task: add cancel method
Add a method to have TaskMaster discard any entries in its queue, terminate
all worker threads, and prevent any new Tasks from being queued.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-12-09 01:15:24 -05:00
Zygo Blaxell
9dbe2d6fee bees: add -G/--thread-min option for minimum thread count
The -g option limits the number of worker threads when the target load
average is exceeded.  On some systems the load normally runs high, and
continuous bees operation is required to avoid running out of disk space.

Add a -G/--thread-min option to force at least some threads to continue
running.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:07 -04:00
Zygo Blaxell
e66086516f bees: dynamic thread pool size based on system load average
Add -g / --loadavg-target parameter to track system load and add or
remove bees worker threads dynamically to keep system load close to the
loadavg target.  Thread count may vary from zero to the maximum
specified by -c or -C, and is adjusted every 5 seconds.

This is better than implementing a similar load average scheme from
outside of the process (though that is still possible) because the
in-process load tracker does not disrupt the performance timing feedback
mechanisms as a freezer cgroup or SIGSTOP would when controlling bees
from outside.  The internal load average tracker can also adjust the
number of active threads while an external tracker can only choose from
the maximum or zero.

Also fix a bug where a Task could deadlock waiting for itself to exit
if it tries to insert a new Task after the number of worker threads has
been set to zero.

Also correct usage message for --scan-mode (values are 0..2) since
we are touching adjacent lines anyway.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-09-14 23:50:03 -04:00
Zygo Blaxell
f64fc78e36 Task: convert print_fn to a string
Since we are now unconditionally rendering the print_fn as a static
string, there is no need for it to be a function.  We also need it to
be brief and mostly constant.

Use a string instead.  Put the string before the function in the Task
constructor arguments so that the title string appears as a heading in
code, since we are making a breaking API change already.

Drop TASK_MACRO as it is broken by this change, but there is no similar
usage of Task anywhere to make it worth fixing.
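
Typical use after this change looks roughly like the following (the
header path and namespace are assumed to follow the usual crucible
layout):

        #include "crucible/task.h"   // assumed header location

        using crucible::Task;        // assumed namespace

        int main()
        {
                // Title string first, so it reads like a heading, then the closure.
                Task hello("hello_task", [] {
                        // one unit of work goes here
                });
                hello.run();
                return 0;
        }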

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:48:04 -05:00
Zygo Blaxell
0710208354 BeesNote: thread naming fixes
Move pthread_setname_np to the same place we do pthread_getname_np.

Detect errors in pthread_getname_np--but don't throw an exception
because we would call ourself recursively from the exception handler
when it tries to log the exception.

Fix the order of set_name and the first BEESNOTE/BEESLOG call in threads,
closing small time intervals where logs have the wrong thread name,
and that wrong name becomes persistent for the thread.

Make the main thread's name "bees" because Linux kernel stack traces use
the pthread name of the main thread instead of the name of the process.

Anonymous threads get the process name (usually "bees").  We should not
have any such threads, but we do.  This appears to occur mostly during
exception stack unwinding.  GCC/pthread bug?

Fixes:  https://github.com/Zygo/bees/issues/51

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-26 23:47:47 -05:00
Zygo Blaxell
3f60a0efde task: allow external access to Task print function
This enables bees' thread introspection to use task descriptions in
status and log messages.

BeesNote will be calling Task::current_task() from non-Task contexts,
which means we need to allow Task's shared state pointer to be null.
Remove some asserts that will ruin our day in that case.

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-20 13:51:05 -05:00
Zygo Blaxell
8849e57bf0 crucible: add Task class
We need a mechanism for distributing work across processor cores and
disks.

Task implements a simple FIFO/LIFO queue model for executing closures.
Some locking primitives are included (mutex and barrier).

Signed-off-by: Zygo Blaxell <bees@furryterror.org>
2018-01-17 22:51:59 -05:00