Journal entries

An Atom feed Atom feed of these posts is also available.

Major progress on capability-based syndicate-rkt implementation

I’ve been working on the novy branch of syndicate-rkt (Update: this is now the main branch), following the new design I developed for the novy-syndicate TypeScript prototype, driving the design further and working out new syntax ideas.

Syndicate/rkt example

Here’s an example program, box-and-client.rkt in the new Syndicate/rkt language:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#lang syndicate

(message-struct set-box (new-value))
(assertion-struct box-state (value))

(module+ main
  (actor-system/dataspace (ds)
    (spawn #:name 'box
           (define-field current-value 0)
           (at ds
             (assert (box-state (current-value)))
             (on (message (set-box $new-value))
               (log-info "box: taking on new-value ~v" new-value)
               (current-value new-value)))
           (stop-on-true (= (current-value) 10)
             (log-info "box: terminating")))

    (spawn #:name 'client
           (at ds
             (stop-on (retracted (Observe (:pattern (set-box ,_)) _))
               (log-info "client: box has gone"))
             (on (asserted (box-state $v))
               (log-info "client: learned that box's value is now ~v" v)
               (send! ds (set-box (+ v 1))))
             (on (retracted (box-state _))
               (log-info "client: box state disappeared"))))))

The program consists of two actors, 'box and 'client. The box actor publishes the value of its current-value field, wrapped in a box-state record constructor, to the dataspace (line 11). It reacts to set-box messages sent by peers (lines 12–14); in this case, the client actor, which sends set-box to increment the value each time it learns of an updated value from the box (lines 22–24).

The box actor terminates once current-value reaches 10. The client notices the termination of the box actor in two ways (just to show them off): first, by noticing that the box-state record was unpublished from the dataspace (lines 25–26); and second, by noticing that all subscribers to set-box messages have vanished (lines 20–21).

What’s different? What’s new?

Explicit object references

The most notable change from previous dataspace programs is the explicit reference to the dataspace, ds. Assertions and subscriptions are now located at a specific (possibly remote) object, usually but not always a dataspace.

Capability-based security

Related is support for macaroon-style “sturdy references” (analogous to the SturdyRef concept from E). Here’s an example from a secure* chat demo app:

1
2
3
4
5
6
7
<ref "syndicate" [[<or [
  <rewrite <bind p <compound <rec Present 1> {0: <lit "tonyg">}>> <ref p>>,
  <rewrite <bind p <compound <rec Says 2> {
    0: <lit "tonyg">,
    1: String
  }>> <ref p>>
]>]] #[oHFy7B4NPVqhD6zJmNPbhg==]>

The oid ("syndicate" on line 1) identifies the target object. The patterns (lines 2–6) attenuate the authority of the capability to only permit transmission of Present and Says records. The signature (line 7) proves to the target object that the capability is genuine and untampered-with.

I’ve implemented most of the necessary plumbing for these, but have yet to complete the client/server portion of the system that actually makes use of them. For an example of their use, see novy-syndicate.

Schema support

Another interesting change is support for (the relatively new) Preserves Schema. You can use assertion-struct and message-struct as in previous dialects, or you can use Schema-defined types to establish subscriptions and place assertions with a peer.

Full pattern-matching dataspace implementation

Unlike the novy-syndicate prototype, this implementation is the first capability-based design to have a proper “skeleton”-based dataspace that supports the full range of dataspace patterns. This allows us to write, for example, subscriptions like

1
(on (retracted (box-state _)) ...)

which only fires when all box-state assertions are withdrawn.

Patterns over hash-tables

Previous implementations could only match fields in records (with constant labels) and elements of arrays/lists. This new implementation is also able to express and match patterns over named-key elements in Dictionary Values.

This lets actors express patterns over JSON-like Preserves documents, for example.

Pattern quasiquotation

One of the issues I hoped the new architecture would shed light on is pattern quotation. In order to express interest in interest expressed by some other party, you need to be able to describe the subscriptions that are of interest to you. That means you must be able to write patterns over patterns.

Previous implementations didn’t get this right. It was not possible to precisely express interest in subscriptions that bound (or did not bind) certain portions of their input; and it was not possible to precisely express the difference between being interested in a binding or binding a portion of the pattern to be matched itself.

The new design solves these issues with a quasiquote-like facility. Here’s a pattern that matches “subscriptions to unary set-box records”:

1
(Observe (:pattern (set-box ,_)) _)

The :pattern wrapper introduces a quoted pattern, and unquote-discard (“,_”) pops back out a level to say that we don’t care what the subscriber has put in their pattern at that position. For example, they may have elected to bind the value inside the set-box, or they may have elected to ignore it, or they may have elected to match only certain values of it, and so on. By discarding that portion of the pattern, we ignore the specific choice the matching subscriber made.

If instead we use unquote-bind (“,$id”), we extract a portion of the pattern each subscriber placed in the dataspace:

1
(Observe (:pattern (set-box ,$value-pat)) _)

For example, if some subscriber is binding the value in the set-box to an identifier new-value, but otherwise placing no constraints on it, we will be given the following value for value-pat:

1
<bind new-value <_>>

If, on the other hand, a subscriber is completely ignoring the value in the set-box, caring only about the set-box wrapper itself, we will be given <_>, the “discard” pattern.

Crucially, we are now able to distinguish between binding-a-portion-of-the-matched-pattern and matching-a-portion-that-is-a-binding. We’ve seen the former already with unquote-bind; the latter is accomplished by using unquote in the structured syntax for a binding:

1
(Observe (:pattern (set-box ($ ,$their-id ,$further-constraint))) _)

Here we unquote twice. The ($ ...) constructor itself specifies that we require matching subscriptions to have a binding at this position. The first unquote extracts the name in the binding, and the second extracts the subpattern for the binding. For the example above, we would end up with their-id bound to the symbol new-value and further-constraint bound to the subpattern <_>.

Finally, let’s examine a couple of alternatives that don’t work. This one is missing the :pattern wrapper, meaning that instead of asking about patterns over set-box records, it is asking about observers that (mistakenly?) specified an actual set-box record instead of a pattern!

1
(Observe (set-box ,_) _)

The compiler won’t actually let you use this version, because the unquote-discard is out of place. There’s no quasiquotation to escape from, so this is a syntax error.

We might try repairing this by simply removing the unquote:

1
(Observe (set-box _) _)

But this is still asking the wrong question, and will never receive any interesting matches from other subscribers in the system.

How does it perform?

Very well! At present it is roughly twice as fast as the previous Racket implementation. Running a benchmark based on the example program above yields the following on one thread of my Ryzen 3960X:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
syndicate/actor: #<actor:0:dataspace> booting
syndicate/task: #<engine:0> starting
syndicate/actor: #<actor:3:box> booting
syndicate/actor: #<actor:6:client> booting
Box got 100000 (66222.3817422824 Hz)
Box got 200000 (68740.1257393843 Hz)
Box got 300000 (68724.52087884837 Hz)
Box got 400000 (68670.19562623161 Hz)
Box got 500000 (68786.55545277878 Hz)
syndicate/actor: #<actor:3:box> terminated OK
Client detected box termination
syndicate/actor: #<actor:6:client> terminated OK
syndicate/task: #<engine:0> stopping
cpu time: 7330 real time: 7330 gc time: 68

It means that the program is able to do ~68,000 complete round-trips per second of update and signalling between the box and client actors.

Preserves and Preserves Schema

The new implementation depends heavily on Preserves and Preserves Schema, so I’ve ended up doing a fair bit of work on those in order to get things working in Syndicate/rkt. (Among other things, fixing the raco pkg install process for the preserves and syndicate Racket packages!)

First, one nice bit of news is a new Preserves implementation, preserves-nim by Emery Hemingway, for the Nim programming language. I’ve linked the various implementations of Preserves and Syrup on the main Preserves webpage.

There have also been changes to the Schema language and tooling. The main change to the Schema language is a reappraisal of the role of Embedded values in schemas. Previously, they were treated as black boxes - given just enough machinery to parse them out of and serialize them back into a Value, but nothing more. Now, they’re given both a (de)serializer and an “interface type”; the idea is that an Embedded represents a capability to some behavioural object - a closure, an object pointer, an actor reference, a web service, that kind of thing - and so there may be an associated API that can be usefully schematized. This makes schematization of Embedded values something closely related to the higher-order contracts of Dimoulas; see the bit on future work in the spec for some additional thoughts along these lines, as well as a little example.

The main change to the Schema tooling is support for plugins in the Schema compiler, allowing Syndicate/rkt to supply a plugin for generating dataspace patterns from parsed Schema values. The #lang preserves-schema support has been likewise extended so you can supply plugins in the #lang line.

Last (and probably least), here’s a fun little schema example:

1
2
3
4
5
6
7
8
9
10
version 1 .
JSON =
     / @string string
     / @integer int
     / @double double
     / @boolean JSONBoolean
     / @null =null
     / @array [JSON ...]
     / @object { string: JSON ...:... } .
JSONBoolean = =true / =false .

It recognises the JSON-interoperable subset of Preserves Values!

Licensing

A note about licensing: I’ve chosen LGPL 3.0+ as the license for Syndicate/rkt. Many thanks to Massimo Zaniboni for pointing out the lack of license, discussing various options with me, and helping sort out the per-file license headers.

Tools for working with Preserves

I’ve added new documentation for a few useful tools for working with Preserves and Preserves Schema. Find an overview here.

More progress on Preserves Schemas

Since my last post, I’ve been working some more on Preserves Schemas.

Schema language documented

Metaschema improvements

  • I also made a major change to the metaschema, moving setof and dictof from CompoundPattern to SimplePattern, and splitting seqof out from tuple*.

    Placing seqof, setof and dictof in SimplePattern rather than CompoundPattern makes both SimplePattern and CompoundPattern actually meaningful, and this leads to significant code simplification, which in turn gives me more confidence in the design of the language.

    Patterns in SimplePattern now denote single host-language values without interesting substructure; that is, they are the values of individual fields within a generated record type. Patterns in CompoundPattern consistently now denote collections of fields rather than individual values.

Reader and checker improvements

Schema implementation improvements

Preserves Schemas for Racket

Today has mostly been spent working on Preserves Schemas.

I made a few changes to the Schema language itself, ancillary changes to the TypeScript/JavaScript implementation, and built a first pass at a Racket implementation of a Schema compiler.

Schema language changes

  • It’s now forbidden to use both “&” (intersection) and “/” (alternation) operators in a single rule. You have to pick one or the other. If you want to use both, you have to lift the inner one out to a separate rule.

        # Not allowed:
        BadRule = A / B & C / D .
    
        # Allowed:
        GoodRule = A / BC / D .
        BC = B & C .
    

    Rationale: It was confusing that there was precedence there at all; and if both operators are available at the same time, there’s an ambiguous case with respect to the names chosen for the branches vs the names chosen for the branches of an intersection. Here’s what the code used to say about this:

    1
    2
    3
    4
    
    // TODO: deal with situation where there's an or of ands, where
    // the branches of the and are named. The parsing is ambiguous, and
    // with the current code I think (?) you end up with the same name
    // attached to the or-branch as to the leftmost and-branch.
    

Metaschema changes

Implementation and tooling changes

  • Some “type checking” of Schemas is now performed at “read” time, not just at code-generation time. This means that all toolchains that use readSchema from reader.ts automatically benefit from checkSchema.

    Concretely, at the moment checkSchema does duplicate-binding checks and also ensures that a Schema specifies a bijection between plain Preserves content and the parsed data structures specified by the Schema.

  • I built an initial Preserves Schema compiler/code-generator for Racket. So far, it is able to read and generate code for working with the metaschema.

    Here’s an example. This definition from the metaschema:

    Schema = <schema {
      version: Version
      embeddedType: EmbeddedTypeName
      definitions: Definitions
    }>.
    

    generates the following code (as part of a complete module for all the definitions in the metaschema):

    1
    2
    3
    4
    5
    6
    7
    8
    9
    10
    11
    12
    13
    14
    15
    16
    17
    18
    19
    20
    21
    22
    23
    24
    
    (struct Schema (version embeddedType definitions) #:prefab)
    
    (define (parse-Schema input)
      (match input
        [(and dest
              (record 'schema
                      (list
                       (hash-table
                        ('version (app parse-Version (and $version (not (== eof)))))
                        ('embeddedType (app parse-EmbeddedTypeName (and $embeddedType (not (== eof)))))
                        ('definitions (app parse-Definitions (and $definitions (not (== eof)))))
                        (_ _) ...))))
         (Schema $version $embeddedType $definitions)]
        [_ eof]))
    
    (define (Schema->preserves input)
      (match input
        [(Schema $version $embeddedType $definitions)
         (record 'schema
                 (list
                  (hash
                   'version (Version->preserves $version)
                   'embeddedType (EmbeddedTypeName->preserves $embeddedType)
                   'definitions (Definitions->preserves $definitions))))]))
    

Demo of capabilities securing access to a dataspace in novy-syndicate

Here’s a screencast of checking out, building, and running one of the novy-syndicate demos, namely simple-chat.ts.

  • After the project is checked out and built, a generic dataspace server is started.

  • Then, two clients are connected, using unattenuated capabilities to the server’s dataspace to enact a “chat” protocol via the dataspace. (Source code for the clients: here)

  • Then, the “root” capability to the server is attenuated, limiting access only to a user calling themself tonyg. One of the clients reconnects with the attenuated capability. Its messages are not transmitted unless it uses the nick tonyg.

  • Because the attenuated capability didn’t allow Observe assertions at all, the attenuated client can only send presence and messages!

  • So the attenuation is redone to allow observation of others’ presence and utterances, and the client is again reconnected.

Just click play on the embedded recording below, or visit the recording’s page on asciinema.org.

Alternatively, you can follow along with the instructions in the README on your own machine.

 

Project Announcement: Structuring the System Layer with Dataspaces

I’m delighted to be able to announce that the next phase of work on the Syndicate project has been selected for an NLnet Foundation grant.

The funded project is investigating the following question as part of the NGI Zero PET programme:

Could dataspaces be a suitable system layer foundation, perhaps replacing software like systemd and D-Bus?

The project involves placing dataspaces at the heart of a system layer implementation, initially running on a cellphone using some of the code and ideas from last year’s Squeak-on-a-cellphone work.

Please see the system-layer project page and this blog for news about the project.

History of Syndicate, 2012–

I’ve just finished a first draft of a page on the history of Syndicate and Dataspaces. It’s a narrative connecting the earliest dataspace-like ideas I came up with to the most recent designs I’m working on now. Feedback and questions welcome!

OnScreenKeyboardMorph: Smalltalk keyboard on a phone

I finally managed to write a little bit about my OnScreenKeyboardMorph on-screen keyboard for Squeak. It’s part of the foundation of the UI I’ll be using as part of this year’s project on exploring Syndicate for system layers.

Progress on Secur(abl)e Syndication

Besides working on the new website, over the last couple of days I’ve been consolidating the theory and implementation of secure/securable syndicated actors.

I decided to implement a syndicated-actor chat service similar to the one @dustyweb implemented for Spritely Goblins (see blog post and code), to help draw out the requirements and implementation constraints.

The code is here:

It went well. Using Preserves Schemas to define a protocol, and then using capability attenuation to enforce conformant use of the shared dataspace led to a system that prevented users1 from forging messages from other users, and prevented nick collisions from happening.

Here’s the schema defining the protocol:

    version 1 .
    embeddedType Actor.Ref .

    UserId = int .

    Join = <joinedUser @uid UserId @handle ref>.
    NickClaim = <claimNick @uid UserId @name string @k ref>.
    UserInfo = <user @uid UserId @name string>.
    Says = <says @who UserId @what string>.
    NickConflict = <nickConflict>.

The system has two moving pieces (besides the dataspace server itself): a single “moderator”, and zero or more “clients”.

A client asserts interest in a Join record. In response to their interest, the moderator allocates a user ID for the new connection and a new session handle, and asserts a Join record back to the client giving them access to their session.

The session handle is just a reference to the dataspace, attenuated to allow access only using the uid assigned to the connection:

1
2
3
4
5
    attenuate(ds, rfilter(
        pRec($Observe, pLit($user), pEmbedded()),
        pRec($Observe, pLit($says), pEmbedded()),
        pRec($claimNick, pLit(uid), pString(), pEmbedded()),
        pRec($says, pLit(uid), pString()))),

Using this reference and trying to assert anything or send any message not matching one of those patterns results in the assertion or message being silently dropped.

  1. There’s one key difference between the sketch I built and the system @dustyweb implemented: Goblins has sealers for making unforgeable signed secret envelopes containing values. My implementation doesn’t yet have sealers. Sealers make it possible to prove that the chatroom hasn’t forged any messages, as well as that users can’t forge each other’s messages. My sketch can only prove that attached users haven’t forged messages: you have to trust the chatroom more than in @dustyweb’s system. 

New website

Over the past couple of weeks I’ve been working on a new website for the Syndicate project.

A big part of this has been to widen its focus from just Syndicate per se, in order to avoid the confusion caused by treating “Syndicate” as an umbrella term for a novel Actor model, for Dataspaces, for Conversational Concurrency, for the Domain-Specific Language (DSL) features supporting syndicated-actor programming, and so on.

The new site splits out a few concepts to be treated independently:

  • The Syndicated Actor model is the name for the theory that I started to develop in my dissertation and that I’m still working on.

  • A dataspace is a special kind of syndicated actor whose job is to relay and replicate syndicated state from peer to peer, decoupling information flow from actor identity.

  • The Syndicate DSL is a bundle of language features that make programming with syndicated actors and dataspaces more ergonomic. These features are layered atop base languages like Java, Racket, TypeScript/JavaScript and so on.

  • Conversational concurrency is a term I introduced in my dissertation that captures the style of design of syndicated-actor programs.

The site’s not finished yet, but I think it’s rapidly approaching publishable shape.