I’ve written a few scripts to create an initrd from Alpine Linux
packages (based on my experience working for the
FRμIT testbed project), with a custom
init script that installs the system onto a (virtual) hard disk and
configures it for booting.
The second-stage init is just syndicate-server, configured to start system services that
self-assemble into
the correct startup ordering as a result of monitoring their
dependencies using assertions in the dataspace.
I’ve also been studying the various pieces of existing system layers
(like s6, systemd, SysV init, inetd and so on) to try to map the
design landscape a bit.
In addition, I’ve explored use of PF_NETLINK sockets to subscribe to
kernel-originated device plug-and-play events (like udev does). It
seems straightforward.
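For a sense of how little is involved, here is a minimal, self-contained Python sketch (illustrative only, not code from the project) that joins the kernel uevent multicast group and prints each hotplug event as it arrives:

import socket

NETLINK_KOBJECT_UEVENT = 15   # protocol number from <linux/netlink.h>
KERNEL_EVENT_GROUP = 1        # multicast group carrying kernel-originated uevents

# Raw PF_NETLINK socket subscribed to kernel uevents (the same events udev consumes).
sock = socket.socket(socket.AF_NETLINK, socket.SOCK_RAW, NETLINK_KOBJECT_UEVENT)
sock.bind((0, KERNEL_EVENT_GROUP))   # pid 0 lets the kernel pick a local address

while True:
    datagram = sock.recv(65536)
    # Each event is "ACTION@DEVPATH" followed by NUL-separated KEY=VALUE pairs.
    fields = datagram.split(b'\0')
    print(fields[0].decode(errors='replace'))
    for kv in fields[1:]:
        if b'=' in kv:
            print('    ' + kv.decode(errors='replace'))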
The things I’ve written as a result of all this aren’t quite ready for
public viewing yet. I’ll post again when the repository is available.
Implementing a quality message broker involves quite a bit of
subtlety. Dealing with high-speed message publication, where the
server, downstream consumers, or both can become overloaded, can be
especially challenging.
These were problems that faced me when I was putting together the
first versions of the RabbitMQ server, and the same issues crop up in
servers that speak the Syndicate network protocol, too.
I recently came up with a really interesting and apparently effective
approach to internal flow control while working on
syndicate-server.
The idea is to associate all activity with an account that records
how much outstanding work the server has left to do to fully complete
processing of that activity.
Actors reading from e.g. network sockets call an
ensure_clear_funds
function on their associated
Accounts.
This suspends the reader until enough of the account’s “debt”
has been “cleared”, which in turn causes the TCP socket’s read
buffers to fill up and the TCP window to close, throttling
upstream senders.
Every time an actor (in particular, actors triggered by data from
sockets) sends an event to some other actor, a
LoanedItem
is constructed which “borrows” some credit from the sending actor’s
nominated account.
Every time a LoanedItem is completely processed (in the Rust
implementation, when it is
dropped),
its cost is “repaid” to its associated account.
But crucially, when an actor is responding to an event by
sending more events, it charges the sent events to the account
associated with the triggering event. This lets the server
automatically account for fan-out of events.1
Example. Imagine an actor A receiving publications from a TCP/IP
socket. If it ever “owes” more than, say, 5 units of cost on its
account, it stops reading from its socket until its debt decreases.
Each message it forwards on to another actor costs it 1 unit. Say a
given incoming message M is routed to a dataspace actor D (thereby
charging A’s account 1 unit), where it results in nine outbound
events M′ to peer actors O1···O9.
Then, when D receives M, 1 unit is repaid to A’s account. When
D sends M′ on to each of O1···O9, 1 unit is charged to A’s
account, resulting in a total of 9 units charged. At this point in
time, A’s account has had net +1−1+9 = 9 units withdrawn from it as
a result of M’s processing.
Imagine now that all of O1···O9 are busy with other work. Then,
next time around A’s main loop, A notices that its outstanding
debt is higher than its configured threshold, and stops reading from
its socket. As each of O1···O9 eventually gets around to
processing its copy of M′, it repays the associated 1 unit to A’s
account.2 Eventually, A’s account drops
below the threshold, A is woken up, and it resumes reading from its
socket. □
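To make the bookkeeping concrete, here is a small, self-contained Python/asyncio sketch of the same accounting idea. It is purely illustrative: the class and method names mirror the description above rather than the actual Rust syndicate-server API, and the threshold of 5 units is the arbitrary figure from the example.

import asyncio

class Account:
    # Tracks the outstanding "debt" attributable to one ultimate cause,
    # for example a single TCP connection's reader actor.
    def __init__(self, limit=5):
        self.debt = 0
        self.limit = limit
        self._changed = asyncio.Condition()

    async def ensure_clear_funds(self):
        # Suspend the caller until debt drops below the configured threshold.
        async with self._changed:
            await self._changed.wait_for(lambda: self.debt < self.limit)

    def charge(self, cost=1):
        # "Borrow" cost units against this account, returning the loan record.
        self.debt += cost
        return LoanedItem(self, cost)

    async def repay(self, cost):
        async with self._changed:
            self.debt -= cost
            self._changed.notify_all()

class LoanedItem:
    # One unit of queued work. The Rust implementation repays on drop;
    # here we repay explicitly once the receiving actor has processed it.
    def __init__(self, account, cost):
        self.account = account
        self.cost = cost

    async def complete(self):
        await self.account.repay(self.cost)

async def socket_reader(account, incoming, dataspace_inbox):
    # Actor A: reads messages, charging each forwarded event to its own account.
    while True:
        await account.ensure_clear_funds()    # the backpressure point
        message = await incoming.get()
        dataspace_inbox.put_nowait((account.charge(1), message))

async def dataspace(dataspace_inbox, peer_inboxes):
    # Actor D: charges the fan-out of M' to the *same* account that paid for M,
    # then repays the original loan, so the reader is throttled in proportion
    # to how much downstream work it has caused.
    while True:
        loan, message = await dataspace_inbox.get()
        for peer in peer_inboxes:
            peer.put_nowait((loan.account.charge(1), message))
        await loan.complete()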
Anecdotally, this appears to work well. In experiments with producers
sending as quickly as they can, the server throttles the producers,
and it remains stable even though its consumers are not able to keep
up with the unthrottled send rate of each producer.
It’ll be interesting to see how it does with other workloads!
A corollary to this is that, for each event internal to
the system, you can potentially identify the “ultimate cause” of
the event: namely, the actor owning the associated account. ↩
Of course, if O1, say, sends more
events internally as a result of receiving M′, more units will
be charged to A’s account! ↩
Because there are a number of different Syndicate implementations, and they all need to interoperate, I’ve used
Preserves Schema1 to define message formats
that all the implementations can share.
You can find the common protocol definitions in a new git repository:
I used
git-subtree
to carve out a
portion of the novy-syndicate source tree
to form the new repository. It seems to be working well. In the
projects that depend on the protocol definitions, I run the following
every now and then to sync up with the syndicate-protocols
repository:
git subtree pull -P protocols \
-m 'Merge latest changes from the syndicate-protocols repository' \
git@git.syndicate-lang.org:syndicate-lang/syndicate-protocols \
main
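Changes made locally under the protocols prefix can travel in the opposite direction with the matching git subtree push -P protocols invocation against the same repository and branch.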
Previously, the
mini-syndicate
package for Python 3 implemented an older version of the Syndicate
network protocol.
A couple of weeks ago, I dusted it off, updated it to the new
capability-oriented Syndicate protocol, and fleshed out its nascent
Syndicated Actor Model code to be a full implementation of the model,
including capabilities, object references, actors, facets, assertions
and so on.
The new implementation makes heavy use of Python decorators to work
around Python’s limited lambda forms and its poor support for
syntactic extensibility. The result is surprisingly not terrible!
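As a generic illustration of that style (the names below are invented for this sketch and are not syndicate-py's actual API), a decorator lets a full, multi-statement function body be registered as a handler in a place where Python's single-expression lambdas would not suffice:

# Hypothetical sketch of decorator-based handler registration; `Dataspace`
# and `on_asserted` are invented names, not the syndicate-py API.

class Dataspace:
    def __init__(self):
        self.handlers = []

    def subscribe(self, pattern, handler):
        self.handlers.append((pattern, handler))

    def assert_value(self, value):
        for pattern, handler in self.handlers:
            if pattern(value):
                handler(value)

def on_asserted(dataspace, pattern):
    # Register the decorated function as a handler for matching assertions.
    def register(handler):
        dataspace.subscribe(pattern, handler)
        return handler
    return register

ds = Dataspace()

@on_asserted(ds, lambda v: isinstance(v, dict) and v.get('type') == 'Present')
def greet(v):
    # A multi-line handler body: exactly what a lambda could not express.
    print('Hello,', v['who'])

ds.assert_value({'type': 'Present', 'who': 'world'})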
The revised codebase is different enough to the previous one that it
deserves its own new git repository:
It’s also available on pypi.org, as package
syndicate-py.
Updated Preserves for Python
As part of the work, I updated the Python Preserves implementation
(for both python 2 and python 3) to include the text-based Preserves
syntax as well as the binary syntax, plus implementations of Preserves
Schema and Preserves Path. Version 0.7.0 or newer of the
preserves package on pypi.org
has the new features.
For my system layer project, I need a
fast, low-RAM-usage, flexible, extensible daemon that speaks the
Syndicate network protocol and
exposes dataspace and system management services to other processes.
That daemon is syndicate-server; it uses the Syndicated Actor Model
internally to structure its
concurrent activities. The actor model implementation is split out
into a crate of its own,
syndicate, that can be used by
other programs. There is also a crate of macros,
syndicate-macros, that
makes working with dataspace patterns over Preserves values a bit
easier.1
Among the services it provides are a gatekeeper that resolves
long-lived, “sturdy” references
like <ref "syndicate" [] #[acowDB2/oI+6aSEC3YIxGg==]>
to on-the-wire usable object capability references, and an
inotify-based configuration watcher service that monitors a
directory full of Preserves text files for changes.
Future work for syndicate-macros is to
add syntactic constructs for easily establishing handlers for
responding to assertions and messages, cutting out the
boilerplate, in the same way that
Racket’s syndicate macros
do. In particular, having a during! macro would be very useful. ↩
As part of my other implementation efforts, I made enough improvements
to the Rust Preserves implementation to warrant releasing version
1.0.0.
This release supports the Preserves data model and the binary and text
codecs. It also includes a Preserves Schema compiler for Rust (for use
e.g. in build.rs) and an implementation of Preserves Path.
There are four crates:
preserves, Rust
representations of
Values
plus utilities and binary and text codecs;
At the beginning of August, I designed a query language for Preserves
documents inspired by
XPath.
Here’s the draft specification.
To give just a taste of the language, here are a couple of example
selectors:
🙂 Really! The Syndicate network protocol is simple, easily within reach of a
shell script plus e.g. netcat or openssl.
Here’s the code so far.
The heart of it is about 90 SLOC, showing how easy it can be to
interoperate with a Syndicate ecosystem.
Instructions are in
the README,
if you’d like to try it out!
The code so far contains functions to interact with a Syndicate server
along with a small demo, which implements an interactive chat program
with presence notifications.
Because Syndicate’s network protocol is polyglot, and the bash demo
uses
generic chat assertions,
the demo automatically interoperates with other implementations, such
as the analogous python chat demo.
The next step would be to factor out the protocol implementation from
the demo and come up with a simple make install step to make it
available for system scripting.
I actually have a real use for this: it’ll be convenient for
implementing simple system monitoring services as part of a
Syndicate-based system layer. Little bash
script services could easily publish the battery charge level, screen
brightness level, WiFi connection status, etc. etc., by reading files
from /sys and publishing them to the system-wide Syndicate server
using this library.
One promising application of dataspaces is dependency tracking for
orderly service startup.
The problem of service startup appears at all scales. It could be
services cooperating within a single program and process; services
cooperating as separate processes on a single machine; containers
running in a distributed system; or some combination of them all.
Syndicate programs are composed of multiple services running together,
with dependencies on each other, so it makes sense to express service
dependency tracking and startup within the programming language.
In the following, I’ll sketch service dependency support for
cooperating modules within a single program and process. The same
pattern can be used in larger systems; the only essential differences
are the service names and the procedures for loading and starting
services.
A scenario
Let’s imagine we have the following situation:
A program we are writing depends on the “tcp” service, which in turn
depends on the “stream” service. Separately, the top-level program
depends on the “timer” service.
An asserted RequireService record indicates demand for a running
instance of the named service; an asserted ServiceRunning record
indicates presence of the same; and interest in a ServiceRunning
implies assertion of a RequireService.
A library “service manager” process, started alongside the top-level
program, translates observed interest in ServiceRunning into
RequireService, and then translates observed RequireService
assertions into service startup actions and provision of matching
ServiceRunning assertions.
(during (Observe (:pattern (ServiceRunning ,(DLit $service-name))) _)
  (assert (RequireService service-name)))

(during/spawn (RequireService $service-name)
  ;; ... code to load and start the named service ...
  )
Putting these pieces together, we can write a program that waits for a
service called 'some-service-name to be running as follows:
(during (ServiceRunning 'some-service-name) ...)
When the service appears, the facet in the ellipsis will be started,
and if the service crashes, the facet will be stopped (and restarted
if the service is restarted).
Services can wait for their own dependencies, of course. This
automatically gives a topologically sorted startup order.
Modules as services, and macros for declaring dependencies
First, services can be required using a with-services macro:
(with-services [syndicate/drivers/tcp
                syndicate/drivers/timer]
  ;; ... body expressions ...
  )
Second, each Racket module can offer a service named after the
module by using a provide-service macro at module toplevel. For
example, in the syndicate/drivers/tcp Racket module, we find the
following form:
(provide-service [ds]
  (with-services [syndicate/drivers/stream]
    (at ds
      ;; ... set up tcp driver subscriptions ...
      )))
Finally, the main entry point to a Syndicate/rkt program can use a
standard-actor-system macro to arrange for the startup of the
“service manager” process and a few of the most frequently-used
library services:
(standard-actor-system [ds]
  ;; ... code making use of a pre-made dataspace (ds) and
  ;; preloaded standard services ...
  )
This past week I have been dusting off my old implementation of the SSH protocol in order to
exercise the new Syndicated Actor design and implementations. You can
find the new SSH code here.
(You’ll need the latest Syndicate/rkt
implementation to run it.)
The old SSH code used a 2014 dataspace-like language dialect called
“Marketplace”, which shared some concepts with modern Syndicate but
relied much more heavily on a pure-functional programming style within
each actor. The new code uses mutability within each actor where it
makes sense, and takes advantage of Syndicate DSL features like facets
and dataflow variables that I didn’t have back in 2014 to make the
code more modular and easier to read and work with.
The work also prompted a couple of small Syndicate/rkt syntax changes
(renaming when to on, making the notion of “event expander” actually
useful, and a new once form for convenient definition of
state-machine-like behaviour).