Journal entries

An Atom feed of these posts is also available.

Fixing up protocol mismatches on-the-fly

I’ve been fleshing out the syndicate-rkt Racket implementation based on the novy-syndicate TypeScript sketch. I just reached a milestone of TCP-based interoperability between the two implementations (yay!), but there’s an interesting little side track involved that I thought I’d write about.

The novy-syndicate code had a placeholder “dataspace” implementation that had extremely limited pattern-matching. It was only able to offer subscribers the ability to select (1) record assertions having (2) a user-selected, constant label.

For example, a subscriber could elect to receive all records labelled Present, or all records labelled Says. Subscribers could not even specify the arity of matched records. It really was a placeholder for a proper implementation to come later (ported across from a previous syndicate/js implementation).
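To make the limitation concrete, here is a minimal sketch of a label-only dataspace. It is not the novy-syndicate code: the `Rec` model, the `LabelOnlyDataspace` class and its method names are all hypothetical, and retraction is left out entirely.

```typescript
// Hypothetical minimal model: a record is a label plus positional fields.
interface Rec { label: string; fields: unknown[] }
type Handler = (assertion: Rec) => void;

class LabelOnlyDataspace {
  // Subscribers are indexed by the one thing they may choose: a constant label.
  private readonly byLabel: { [label: string]: Handler[] } = Object.create(null);

  subscribe(label: string, handler: Handler): void {
    (this.byLabel[label] = this.byLabel[label] || []).push(handler);
  }

  // Deliver a record to everyone who selected its label; arity is never inspected.
  assert(assertion: Rec): void {
    for (const handler of this.byLabel[assertion.label] || []) {
      handler(assertion);
    }
  }
}

const seen: Rec[] = [];
const space = new LabelOnlyDataspace();
space.subscribe("Present", (a) => { seen.push(a); });
space.assert({ label: "Present", fields: ["Tony"] });
space.assert({ label: "Present", fields: ["Tony", "extra"] }); // also delivered: arity is not checked
space.assert({ label: "Says", fields: ["Tony", "hello"] });    // nobody subscribed to Says
console.log(seen.length); // 2
```

The second `Present` record shows the problem: with only a label to go on, the subscriber has no way to say it wants unary records only.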

By contrast, the syndicate-rkt code has a full-fledged dataspace able to index assertions according to quite sophisticated patterns:

# Dataspace patterns: a sublanguage of attenuation patterns.
Pattern = DDiscard / DBind / DLit / DCompound .

DDiscard = <_>.
DBind = <bind @name symbol @pattern Pattern>.
DLit = <lit @value any>.
DCompound = @rec  <compound @ctor CRec  @members { int: Pattern ...:... }>
          / @arr  <compound @ctor CArr  @members { int: Pattern ...:... }>
          / @dict <compound @ctor CDict @members { any: Pattern ...:... }> .

CRec = <rec @label any @arity int>.
CArr = <arr @arity int>.
CDict = <dict>.
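As an illustration of what these patterns express, here is a small sketch of a matcher over a simplified version of the schema. It is not the syndicate-rkt implementation (which indexes patterns rather than interpreting them one at a time). The `Value` model, the flattening of `CRec`/`CArr`/`CDict` into the compound variants, and the restriction of literals to atoms are all assumptions made to keep the example short.

```typescript
// Simplified stand-ins for Preserves values. Assumption: symbols and strings are
// both modelled as JS strings, and dictionary keys are strings.
interface RecValue { kind: "rec"; label: string; fields: Value[] }
interface DictValue { kind: "dict"; entries: { [key: string]: Value } }
type Value = string | number | Value[] | RecValue | DictValue;

// One variant per alternative of the Pattern schema.
type Members = { [key: string]: Pattern };
type Pattern =
  | { type: "discard" }
  | { type: "bind"; name: string; pattern: Pattern }
  | { type: "lit"; value: string | number } // assumption: literals are atoms here
  | { type: "rec"; label: string; arity: number; members: Members }
  | { type: "arr"; arity: number; members: Members }
  | { type: "dict"; members: Members };

type Bindings = { [name: string]: Value };

// Every listed member must be present in the value and match its subpattern.
function matchMembers(members: Members, lookup: (key: string) => Value | undefined, out: Bindings): boolean {
  for (const key of Object.keys(members)) {
    const item = lookup(key);
    if (item === undefined || !matchInto(members[key], item, out)) return false;
  }
  return true;
}

function matchInto(p: Pattern, v: Value, out: Bindings): boolean {
  switch (p.type) {
    case "discard":
      return true;
    case "bind":
      if (!matchInto(p.pattern, v, out)) return false;
      out[p.name] = v;
      return true;
    case "lit":
      return p.value === v;
    case "rec": {
      if (typeof v !== "object" || Array.isArray(v) || v.kind !== "rec") return false;
      if (v.label !== p.label || v.fields.length !== p.arity) return false;
      const fields = v.fields;
      return matchMembers(p.members, (k) => fields[Number(k)], out);
    }
    case "arr": {
      if (!Array.isArray(v) || v.length !== p.arity) return false;
      const items = v;
      return matchMembers(p.members, (k) => items[Number(k)], out);
    }
    case "dict": {
      if (typeof v !== "object" || Array.isArray(v) || v.kind !== "dict") return false;
      const entries = v.entries;
      return matchMembers(p.members, (k) =>
        Object.prototype.hasOwnProperty.call(entries, k) ? entries[k] : undefined, out);
    }
  }
}

// Returns the captured bindings on success, or null if the value does not match.
function match(p: Pattern, v: Value): Bindings | null {
  const out: Bindings = {};
  return matchInto(p, v, out) ? out : null;
}

// A binary Says record whose first field is "Tony"; the second field is captured.
const saysFromTony: Pattern = {
  type: "rec", label: "Says", arity: 2,
  members: {
    0: { type: "lit", value: "Tony" },
    1: { type: "bind", name: "what", pattern: { type: "discard" } },
  },
};
const hit = match(saysFromTony, { kind: "rec", label: "Says", fields: ["Tony", "hello"] });
const miss = match(saysFromTony, { kind: "rec", label: "Says", fields: ["Alice", "hi"] });
console.log(hit, miss);
```

Note that, unlike the label-only placeholder, the `rec` case checks arity as well as label, and members left out of the dictionary are simply unconstrained.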

Now, I managed to get the novy-syndicate example programs to talk to the full syndicate/rkt dataspace - without changing the code!

The way I did it was to rewrite assertions travelling between the programs on the fly.

And the way I did that was to include “rewrite” statements in the capability I gave to the novy-syndicate client to allow it to connect to the syndicate/rkt server.

The idea was to rewrite assertions-of-interest (subscriptions) from the simple label-only pattern of novy-syndicate to the equivalent full-dataspace pattern of syndicate/rkt, and to rewrite the responses from the dataspace from the arbitrary-arity responses of syndicate/rkt to the simple unary responses of novy-syndicate.

Here’s the rewrite specification,[1] which ultimately appears embedded as a “caveat” inside the Macaroon-style capabilities that Syndicate uses:[2]

[ <or [
    <rewrite

     # Step 1:
     <compound <rec Observe 2> {0: <bind label Symbol>, 1: <bind observer Embedded>}>
     <compound <rec Observe 2> {

       # Step 1(a):
       0: <compound <rec bind 2> {
         0: <lit assertion>
         1: <compound <rec compound 2> {
           0: <compound <rec rec 2> {0: <ref label>, 1: <lit 1>}>,
           1: <compound <dict> {}>
         }>
       }>

       # Step 1(b):
       1: <attenuate <ref observer> [
         <rewrite
          <compound <arr 1> {0: <bind v <_>>}>
          <ref v>>
       ]>

     }>>

    # Step 2:
    <rewrite <bind n <_>> <ref n>>

  ]> ]

It reads:

  1. try matching <Observe L C>, where L is a symbol and C an embedded capability; if it does not match, skip the remainder of this step; otherwise, rewrite it into <Observe <bind assertion ⌜P⌝> f(C)>, where

    1. P is a pattern matching records of the form <L _>, and the quotation operator ⌜·⌝ quotes a pattern over assertions into a term conforming to the Pattern schema above; and

    2. f(C) “attenuates” C by attaching rewrites to it. Any assertion sent to C is required to be of the form [V], and is rewritten into just V.

  2. if the rewrite in step 1 didn’t apply then match anything; call it n; and rewrite it to itself.

The net effect is that when the simple chat example from novy-syndicate asserts

<Observe Present #!C>

the syndicate-rkt server actually sees

<Observe <bind assertion ⌜<Present _>⌝> #!f(C)>

and when syndicate-rkt replies with an actual concrete presence record, for example[3]

[<Present "Tony">]

the novy-syndicate client will actually receive just

<Present "Tony">

Cool huh?
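For readers who prefer code to caveat syntax, the two steps of the rewrite can be sketched as an ordinary function. This is only a model of the effect, not how Syndicate evaluates caveats: the `Term` type, the `rec` helper, and the idea of representing an embedded capability as a callback are assumptions, and symbols and strings are both plain JS strings here.

```typescript
// Assumptions: symbols and strings are both JS strings; an embedded capability
// is modelled as a plain callback that receives assertions.
type Cap = (assertion: Term) => void;
interface Rec { label: string; fields: Term[] }
interface Dict { entries: { [key: string]: Term } }
type Term = string | number | Term[] | Rec | Dict | Cap;

const rec = (label: string, ...fields: Term[]): Rec => ({ label, fields });
const isRec = (t: Term): t is Rec =>
  typeof t === "object" && !Array.isArray(t) && "label" in t;

// Step 1(b): wrap the observer so that only [V] gets through, delivered as V.
function unwrapSingleton(observer: Cap): Cap {
  return (assertion) => {
    if (Array.isArray(assertion) && assertion.length === 1) observer(assertion[0]);
  };
}

function rewriteOutbound(assertion: Term): Term {
  // Step 1: does it look like <Observe L C>, with L a symbol and C a capability?
  if (isRec(assertion) && assertion.label === "Observe" && assertion.fields.length === 2) {
    const label = assertion.fields[0];
    const observer = assertion.fields[1];
    if (typeof label === "string" && typeof observer === "function") {
      // Step 1(a): the quoted pattern <bind assertion <compound <rec L 1> {}>>.
      const quoted = rec("bind", "assertion",
        rec("compound", rec("rec", label, 1), { entries: {} }));
      return rec("Observe", quoted, unwrapSingleton(observer));
    }
  }
  // Step 2: everything else passes through unchanged.
  return assertion;
}

// The client subscribes with the simple label-only form.
const received: Term[] = [];
const sent = rewriteOutbound(rec("Observe", "Present", (v: Term) => { received.push(v); }));

// The server replies through the attenuated capability it was given.
if (isRec(sent)) {
  const attenuated = sent.fields[1];
  if (typeof attenuated === "function") {
    attenuated([rec("Present", "Tony")]); // arrives at the client as <Present "Tony">
    attenuated(rec("Present", "Tony"));   // not of the form [V], so it is dropped
  }
}
console.log(received.length); // 1
```

The hard-coded `1` in step 1(a) is the arity baked into the rewritten pattern, which is why the trick only works for unary records.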

Now, this works great for Present, which is unary, but not so well for the client’s subscription to Says, which is binary: <Says who what>. So our interoperability is limited here: the client only sees presence information from its peers, and the actual utterances sent get dropped on the floor for lack of an appropriate pattern at the syndicate-rkt dataspace. To fix this, we could include a more complex rewrite specification that treated Present and Says subscriptions separately and explicitly, with the correct arity for each. But I’m done for now, and will focus on getting a proper dataspace implementation into novy-syndicate instead.

  1. We need a DSL for these rewrite specifications! I’m working on it. It’ll probably look like the existing Syndicate DSL syntax for patterns. 

  2. Here’s the whole capability, including an “oid” identifying the service to be accessed, the sequence of “caveats” rewriting and attenuating information flowing through the capability, and the signature proving the capability’s validity:

    
    <ref "syndicate" [[<or [
      <rewrite <compound <rec Observe 2> {
        0: <bind label Symbol>,
        1: <bind observer Embedded>
      }> <compound <rec Observe 2> {
        0: <compound <rec bind 2> {
          0: <lit assertion>,
          1: <compound <rec compound 2> {
            0: <compound <rec rec 2> {
              0: <ref label>,
              1: <lit 1>
            }>,
            1: <compound <dict> {}>
          }>
        }>,
        1: <attenuate <ref observer> [<rewrite <compound <arr 1> {0: <bind v <_>>}> <ref v>>]>
      }>>,
      <rewrite <bind n <_>> <ref n>>
    ]>]] #[1oCXyvdXylgpWRhgg0w+iw==]>
    

  3. The single-element list is there because the rewritten pattern included a single binding named assertion, so there’s a single value in the list of potentially-many values sent back to the subscriber. The simplified novy-syndicate patterns included exactly one implicit whole-assertion binding, and so the list wrapper is also implicit in the novy-syndicate variation, which is why it has to be explicitly removed to get interoperability here. 

Major progress on capability-based syndicate-rkt implementation

I’ve been working on the novy branch of syndicate-rkt (Update: this is now the main branch), following the new design I developed for the novy-syndicate TypeScript prototype, driving the design further and working out new syntax ideas.

Syndicate/rkt example

Here’s an example program, box-and-client.rkt, in the new Syndicate/rkt language:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
#lang syndicate

(message-struct set-box (new-value))
(assertion-struct box-state (value))

(module+ main
  (actor-system/dataspace (ds)
    (spawn #:name 'box
           (define-field current-value 0)
           (at ds
             (assert (box-state (current-value)))
             (on (message (set-box $new-value))
               (log-info "box: taking on new-value ~v" new-value)
               (current-value new-value)))
           (stop-on-true (= (current-value) 10)
             (log-info "box: terminating")))

    (spawn #:name 'client
           (at ds
             (stop-on (retracted (Observe (:pattern (set-box ,_)) _))
               (log-info "client: box has gone"))
             (on (asserted (box-state $v))
               (log-info "client: learned that box's value is now ~v" v)
               (send! ds (set-box (+ v 1))))
             (on (retracted (box-state _))
               (log-info "client: box state disappeared"))))))

The program consists of two actors, 'box and 'client. The box actor publishes the value of its current-value field, wrapped in a box-state record constructor, to the dataspace (line 11). It reacts to set-box messages sent by peers (lines 12–14). In this case the only such peer is the client actor, which sends set-box to increment the value each time it learns of an updated value from the box (lines 22–24).

The box actor terminates once current-value reaches 10. The client notices the termination of the box actor in two ways (just to show them off): first, by noticing that the box-state record was unpublished from the dataspace (lines 25–26); and second, by noticing that all subscribers to set-box messages have vanished (lines 20–21).

What’s different? What’s new?

Explicit object references

The most notable change from previous dataspace programs is the explicit reference to the dataspace, ds. Assertions and subscriptions are now located at a specific (possibly remote) object, usually but not always a dataspace.

Capability-based security

Related is support for macaroon-style “sturdy references” (analogous to the SturdyRef concept from E). Here’s an example from a secure* chat demo app:

1
2
3
4
5
6
7
<ref "syndicate" [[<or [
  <rewrite <bind p <compound <rec Present 1> {0: <lit "tonyg">}>> <ref p>>,
  <rewrite <bind p <compound <rec Says 2> {
    0: <lit "tonyg">,
    1: String
  }>> <ref p>>
]>]] #[oHFy7B4NPVqhD6zJmNPbhg==]>

The oid ("syndicate" on line 1) identifies the target object. The patterns (lines 2–6) attenuate the authority of the capability to only permit transmission of Present and Says records. The signature (line 7) proves to the target object that the capability is genuine and untampered-with.

I’ve implemented most of the necessary plumbing for these, but have yet to complete the client/server portion of the system that actually makes use of them. For an example of their use, see novy-syndicate.

Schema support

Another interesting change is support for (the relatively new) Preserves Schema. You can use assertion-struct and message-struct as in previous dialects, or you can use Schema-defined types to establish subscriptions and place assertions with a peer.

Full pattern-matching dataspace implementation

Unlike the novy-syndicate prototype, this implementation is the first capability-based design to have a proper “skeleton”-based dataspace that supports the full range of dataspace patterns. This allows us to write, for example, subscriptions like

(on (retracted (box-state _)) ...)

which only fires when all box-state assertions are withdrawn.

Patterns over hash-tables

Previous implementations could only match fields in records (with constant labels) and elements of arrays/lists. This new implementation is also able to express and match patterns over named-key elements in Dictionary Values.

This lets actors express patterns over JSON-like Preserves documents, for example.

Pattern quasiquotation

One of the issues I hoped the new architecture would shed light on is pattern quotation. In order to express interest in interest expressed by some other party, you need to be able to describe the subscriptions that are of interest to you. That means you must be able to write patterns over patterns.

Previous implementations didn’t get this right. It was not possible to precisely express interest in subscriptions that bound (or did not bind) certain portions of their input; and it was not possible to precisely express the difference between being interested in a binding or binding a portion of the pattern to be matched itself.

The new design solves these issues with a quasiquote-like facility. Here’s a pattern that matches “subscriptions to unary set-box records”:

(Observe (:pattern (set-box ,_)) _)

The :pattern wrapper introduces a quoted pattern, and unquote-discard (“,_”) pops back out a level to say that we don’t care what the subscriber has put in their pattern at that position. For example, they may have elected to bind the value inside the set-box, or they may have elected to ignore it, or they may have elected to match only certain values of it, and so on. By discarding that portion of the pattern, we ignore the specific choice the matching subscriber made.

If instead we use unquote-bind (“,$id”), we extract a portion of the pattern each subscriber placed in the dataspace:

(Observe (:pattern (set-box ,$value-pat)) _)

For example, if some subscriber is binding the value in the set-box to an identifier new-value, but otherwise placing no constraints on it, we will be given the following value for value-pat:

<bind new-value <_>>

If, on the other hand, a subscriber is completely ignoring the value in the set-box, caring only about the set-box wrapper itself, we will be given <_>, the “discard” pattern.

Crucially, we are now able to distinguish between binding-a-portion-of-the-matched-pattern and matching-a-portion-that-is-a-binding. We’ve seen the former already with unquote-bind; the latter is accomplished by using unquote in the structured syntax for a binding:

(Observe (:pattern (set-box ($ ,$their-id ,$further-constraint))) _)

Here we unquote twice. The ($ ...) constructor itself specifies that we require matching subscriptions to have a binding at this position. The first unquote extracts the name in the binding, and the second extracts the subpattern for the binding. For the example above, we would end up with their-id bound to the symbol new-value and further-constraint bound to the subpattern <_>.
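The difference between the two forms can be modelled outside Racket as two functions over pattern terms. This is a sketch of the observable behaviour only, under simplifying assumptions: the `Pat` type is a cut-down version of the dataspace pattern schema with members held in an array, and the function names `unquoteBind` and `unquoteInsideBinding` are made up for the example.

```typescript
// Simplified pattern terms, loosely following the dataspace Pattern schema.
type Pat =
  | { type: "discard" }
  | { type: "bind"; name: string; pattern: Pat }
  | { type: "lit"; value: string | number }
  | { type: "rec"; label: string; members: Pat[] };

// Roughly (Observe (:pattern (set-box ,$value-pat)) _): accept any unary set-box
// pattern and hand back whatever the subscriber put in its only position.
function unquoteBind(sub: Pat): Pat | null {
  if (sub.type !== "rec" || sub.label !== "set-box" || sub.members.length !== 1) return null;
  return sub.members[0];
}

// Roughly (Observe (:pattern (set-box ($ ,$their-id ,$further-constraint))) _):
// additionally require that position to hold a binding, and take it apart.
function unquoteInsideBinding(sub: Pat): { theirId: string; furtherConstraint: Pat } | null {
  const inner = unquoteBind(sub);
  if (inner === null || inner.type !== "bind") return null;
  return { theirId: inner.name, furtherConstraint: inner.pattern };
}

// One subscriber binds the value as new-value; another ignores it.
const binds: Pat = { type: "rec", label: "set-box",
  members: [{ type: "bind", name: "new-value", pattern: { type: "discard" } }] };
const ignores: Pat = { type: "rec", label: "set-box", members: [{ type: "discard" }] };

const valuePatA = unquoteBind(binds);            // the term <bind new-value <_>>
const valuePatB = unquoteBind(ignores);          // the term <_>
const partsA = unquoteInsideBinding(binds);      // theirId "new-value", furtherConstraint <_>
const partsB = unquoteInsideBinding(ignores);    // null: there is no binding at that position
console.log(valuePatA, valuePatB, partsA, partsB);
```

The first function treats the subscriber's subpattern as an opaque value to capture; the second insists on its shape and so rejects the subscriber that discards the field.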

Finally, let’s examine a couple of alternatives that don’t work. This one is missing the :pattern wrapper, meaning that instead of asking about patterns over set-box records, it is asking about observers that (mistakenly?) specified an actual set-box record instead of a pattern!

(Observe (set-box ,_) _)

The compiler won’t actually let you use this version, because the unquote-discard is out of place. There’s no quasiquotation to escape from, so this is a syntax error.

We might try repairing this by simply removing the unquote:

(Observe (set-box _) _)

But this is still asking the wrong question, and will never receive any interesting matches from other subscribers in the system.

How does it perform?

Very well! At present it is roughly twice as fast as the previous Racket implementation. Running a benchmark based on the example program above yields the following on one thread of my Ryzen 3960X:

syndicate/actor: #<actor:0:dataspace> booting
syndicate/task: #<engine:0> starting
syndicate/actor: #<actor:3:box> booting
syndicate/actor: #<actor:6:client> booting
Box got 100000 (66222.3817422824 Hz)
Box got 200000 (68740.1257393843 Hz)
Box got 300000 (68724.52087884837 Hz)
Box got 400000 (68670.19562623161 Hz)
Box got 500000 (68786.55545277878 Hz)
syndicate/actor: #<actor:3:box> terminated OK
Client detected box termination
syndicate/actor: #<actor:6:client> terminated OK
syndicate/task: #<engine:0> stopping
cpu time: 7330 real time: 7330 gc time: 68

This means the program completes ~68,000 full round-trips per second of update and signalling between the box and client actors.
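As a sanity check on that figure, the overall average can be recomputed from the last two kinds of line in the log. The only assumption is that the real-time figure covers the whole run, including actor boot and shutdown, so the result should come out slightly below the steady-state rates printed by the box.

```typescript
// Figures copied from the log above; Racket's `time` form reports milliseconds.
const roundTrips = 500000; // last "Box got" line
const realTimeMs = 7330;   // "real time" for the whole run, including boot and shutdown
const overallHz = roundTrips / (realTimeMs / 1000);
console.log(Math.round(overallHz)); // 68213
```

That is consistent with the per-100,000 rates of roughly 68,700 Hz once the slower first batch is averaged in.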

Preserves and Preserves Schema

The new implementation depends heavily on Preserves and Preserves Schema, so I’ve ended up doing a fair bit of work on those in order to get things working in Syndicate/rkt. (Among other things, fixing the raco pkg install process for the preserves and syndicate Racket packages!)

First, one nice bit of news is a new Preserves implementation, preserves-nim by Emery Hemingway, for the Nim programming language. I’ve linked the various implementations of Preserves and Syrup on the main Preserves webpage.

There have also been changes to the Schema language and tooling. The main change to the Schema language is a reappraisal of the role of Embedded values in schemas. Previously, they were treated as black boxes - given just enough machinery to parse them out of and serialize them back into a Value, but nothing more. Now, they’re given both a (de)serializer and an “interface type”; the idea is that an Embedded represents a capability to some behavioural object - a closure, an object pointer, an actor reference, a web service, that kind of thing - and so there may be an associated API that can be usefully schematized. This makes schematization of Embedded values something closely related to the higher-order contracts of Dimoulas; see the bit on future work in the spec for some additional thoughts along these lines, as well as a little example.

The main change to the Schema tooling is support for plugins in the Schema compiler, allowing Syndicate/rkt to supply a plugin for generating dataspace patterns from parsed Schema values. The #lang preserves-schema support has been likewise extended so you can supply plugins in the #lang line.

Last (and probably least), here’s a fun little schema example:

version 1 .
JSON =
     / @string string
     / @integer int
     / @double double
     / @boolean JSONBoolean
     / @null =null
     / @array [JSON ...]
     / @object { string: JSON ...:... } .
JSONBoolean = =true / =false .

It recognises the JSON-interoperable subset of Preserves Values!
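To show what "recognises" means operationally, here is a hand-written recogniser for the same subset. It is not output from the Schema compiler: the `PValue` type is a deliberately small stand-in for Preserves values, and it assumes that `=true`, `=false` and `=null` in the schema denote the literal symbols of those names.

```typescript
// Simplified stand-in for Preserves values; only the cases needed here.
type PValue =
  | { kind: "string"; value: string }
  | { kind: "int"; value: number }
  | { kind: "double"; value: number }
  | { kind: "symbol"; name: string }
  | { kind: "bytes"; value: number[] }
  | { kind: "seq"; items: PValue[] }
  | { kind: "dict"; entries: Array<[PValue, PValue]> };

function isJSON(v: PValue): boolean {
  switch (v.kind) {
    case "string":
    case "int":
    case "double":
      return true;
    case "symbol": // =true, =false and =null, read here as literal symbols
      return v.name === "true" || v.name === "false" || v.name === "null";
    case "seq":
      return v.items.every((item) => isJSON(item));
    case "dict": // keys must be strings, values must themselves be JSON
      return v.entries.every(([k, val]) => k.kind === "string" && isJSON(val));
    default:
      return false; // byte strings have no JSON counterpart
  }
}

const doc: PValue = { kind: "dict", entries: [
  [{ kind: "string", value: "name" }, { kind: "string", value: "Tony" }],
  [{ kind: "string", value: "tags" }, { kind: "seq", items: [{ kind: "symbol", name: "null" }] }],
] };
const notJson: PValue = { kind: "seq", items: [{ kind: "bytes", value: [1, 2, 3] }] };
console.log(isJSON(doc), isJSON(notJson)); // true false
```

Each `case` lines up with one named alternative of the schema, which is the correspondence a generated parser would also follow.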

Licensing

A note about licensing: I’ve chosen LGPL 3.0+ as the license for Syndicate/rkt. Many thanks to Massimo Zaniboni for pointing out the lack of license, discussing various options with me, and helping sort out the per-file license headers.

Tools for working with Preserves

I’ve added new documentation for a few useful tools for working with Preserves and Preserves Schema. Find an overview here.

More progress on Preserves Schemas

Since my last post, I’ve been working some more on Preserves Schemas.

Schema language documented

Metaschema improvements

  • I also made a major change to the metaschema, moving setof and dictof from CompoundPattern to SimplePattern, and splitting seqof out from tuple*.

    Placing seqof, setof and dictof in SimplePattern rather than CompoundPattern makes both SimplePattern and CompoundPattern actually meaningful, and this leads to significant code simplification, which in turn gives me more confidence in the design of the language.

    Patterns in SimplePattern now denote single host-language values without interesting substructure; that is, they are the values of individual fields within a generated record type. Patterns in CompoundPattern consistently now denote collections of fields rather than individual values.

Reader and checker improvements

Schema implementation improvements

Preserves Schemas for Racket

Today has mostly been spent working on Preserves Schemas.

I made a few changes to the Schema language itself, ancillary changes to the TypeScript/JavaScript implementation, and built a first pass at a Racket implementation of a Schema compiler.

Schema language changes

  • It’s now forbidden to use both “&” (intersection) and “/” (alternation) operators in a single rule. You have to pick one or the other. If you want to use both, you have to lift the inner one out to a separate rule.

        # Not allowed:
        BadRule = A / B & C / D .
    
        # Allowed:
        GoodRule = A / BC / D .
        BC = B & C .
    

    Rationale: It was confusing that there was precedence there at all; and if both operators are available at the same time, there’s an ambiguous case with respect to the names chosen for the branches vs the names chosen for the branches of an intersection. Here’s what the code used to say about this:

    
    // TODO: deal with situation where there's an or of ands, where
    // the branches of the and are named. The parsing is ambiguous, and
    // with the current code I think (?) you end up with the same name
    // attached to the or-branch as to the leftmost and-branch.
    

Metaschema changes

Implementation and tooling changes

  • Some “type checking” of Schemas is now performed at “read” time, not just at code-generation time. This means that all toolchains that use readSchema from reader.ts automatically benefit from checkSchema.

    Concretely, at the moment checkSchema does duplicate-binding checks and also ensures that a Schema specifies a bijection between plain Preserves content and the parsed data structures specified by the Schema.

  • I built an initial Preserves Schema compiler/code-generator for Racket. So far, it is able to read and generate code for working with the metaschema.

    Here’s an example. This definition from the metaschema:

    Schema = <schema {
      version: Version
      embeddedType: EmbeddedTypeName
      definitions: Definitions
    }>.
    

    generates the following code (as part of a complete module for all the definitions in the metaschema):

    
    (struct Schema (version embeddedType definitions) #:prefab)
    
    (define (parse-Schema input)
      (match input
        [(and dest
              (record 'schema
                      (list
                       (hash-table
                        ('version (app parse-Version (and $version (not (== eof)))))
                        ('embeddedType (app parse-EmbeddedTypeName (and $embeddedType (not (== eof)))))
                        ('definitions (app parse-Definitions (and $definitions (not (== eof)))))
                        (_ _) ...))))
         (Schema $version $embeddedType $definitions)]
        [_ eof]))
    
    (define (Schema->preserves input)
      (match input
        [(Schema $version $embeddedType $definitions)
         (record 'schema
                 (list
                  (hash
                   'version (Version->preserves $version)
                   'embeddedType (EmbeddedTypeName->preserves $embeddedType)
                   'definitions (Definitions->preserves $definitions))))]))
    

Demo of capabilities securing access to a dataspace in novy-syndicate

Here’s a screencast of checking out, building, and running one of the novy-syndicate demos, namely simple-chat.ts.

  • After the project is checked out and built, a generic dataspace server is started.

  • Then, two clients are connected, using unattenuated capabilities to the server’s dataspace to enact a “chat” protocol via the dataspace. (Source code for the clients: here)

  • Then, the “root” capability to the server is attenuated, limiting access only to a user calling themself tonyg. One of the clients reconnects with the attenuated capability. Its messages are not transmitted unless it uses the nick tonyg.

  • Because the attenuated capability didn’t allow Observe assertions at all, the attenuated client can only send presence and messages; it cannot observe anyone else’s!

  • So the attenuation is redone to allow observation of others’ presence and utterances, and the client is again reconnected.

Just click play on the embedded recording below, or visit the recording’s page on asciinema.org.

Alternatively, you can follow along with the instructions in the README on your own machine.

 

Project Announcement: Structuring the System Layer with Dataspaces

I’m delighted to be able to announce that the next phase of work on the Syndicate project has been selected for an NLnet Foundation grant.

The funded project is investigating the following question as part of the NGI Zero PET programme:

Could dataspaces be a suitable system layer foundation, perhaps replacing software like systemd and D-Bus?

The project involves placing dataspaces at the heart of a system layer implementation, initially running on a cellphone using some of the code and ideas from last year’s Squeak-on-a-cellphone work.

Please see the system-layer project page and this blog for news about the project.

History of Syndicate, 2012–

I’ve just finished a first draft of a page on the history of Syndicate and Dataspaces. It’s a narrative connecting the earliest dataspace-like ideas I came up with to the most recent designs I’m working on now. Feedback and questions welcome!

OnScreenKeyboardMorph: Smalltalk keyboard on a phone

I finally managed to write a little bit about my OnScreenKeyboardMorph on-screen keyboard for Squeak. It’s part of the foundation of the UI I’ll be using as part of this year’s project on exploring Syndicate for system layers.

Progress on Secur(abl)e Syndication

Besides working on the new website, over the last couple of days I’ve been consolidating the theory and implementation of secure/securable syndicated actors.

I decided to implement a syndicated-actor chat service similar to the one @dustyweb implemented for Spritely Goblins (see blog post and code), to help draw out the requirements and implementation constraints.

The code is here:

It went well. Using Preserves Schemas to define a protocol, and then using capability attenuation to enforce conformant use of the shared dataspace, led to a system that prevented users[1] from forging messages from other users, and prevented nick collisions from happening.

Here’s the schema defining the protocol:

    version 1 .
    embeddedType Actor.Ref .

    UserId = int .

    Join = <joinedUser @uid UserId @handle ref>.
    NickClaim = <claimNick @uid UserId @name string @k ref>.
    UserInfo = <user @uid UserId @name string>.
    Says = <says @who UserId @what string>.
    NickConflict = <nickConflict>.

The system has two moving pieces (besides the dataspace server itself): a single “moderator”, and zero or more “clients”.

A client asserts interest in a Join record. In response to their interest, the moderator allocates a user ID for the new connection and a new session handle, and asserts a Join record back to the client giving them access to their session.

The session handle is just a reference to the dataspace, attenuated to allow access only using the uid assigned to the connection:

    attenuate(ds, rfilter(
        pRec($Observe, pLit($user), pEmbedded()),
        pRec($Observe, pLit($says), pEmbedded()),
        pRec($claimNick, pLit(uid), pString(), pEmbedded()),
        pRec($says, pLit(uid), pString()))),

Using this reference and trying to assert anything or send any message not matching one of those patterns results in the assertion or message being silently dropped.
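The dropping behaviour can be sketched with a few lines of filtering code. This is a stand-alone model, not the novy-syndicate `attenuate`/`rfilter` implementation: the `RecPattern` type and the helpers `lit`, `anyString`, `anyEmbedded` and `attenuated` are hypothetical names that loosely mirror `pLit`, `pString`, `pEmbedded` and `attenuate` above, and embedded references are modelled as arbitrary objects.

```typescript
// Assumption: a field is a string, a number, or an embedded reference (any object).
type Field = string | number | object;
interface Rec { label: string; fields: Field[] }
type FieldTest = (field: Field) => boolean;
interface RecPattern { label: string; fields: FieldTest[] }

const lit = (expected: string | number): FieldTest => (f) => f === expected;
const anyString: FieldTest = (f) => typeof f === "string";
const anyEmbedded: FieldTest = (f) => typeof f === "object";

// A record matches when label, arity and every field test agree.
function matches(p: RecPattern, r: Rec): boolean {
  return p.label === r.label
    && p.fields.length === r.fields.length
    && p.fields.every((test, i) => test(r.fields[i]));
}

// Wrap a target so that anything matching none of the patterns is silently dropped.
function attenuated(target: (r: Rec) => void, allowed: RecPattern[]): (r: Rec) => void {
  return (r) => {
    if (allowed.some((p) => matches(p, r))) target(r);
  };
}

const uid = 7; // the user ID the moderator allocated for this connection
const delivered: Rec[] = [];
const sessionHandle = attenuated((r) => { delivered.push(r); }, [
  { label: "Observe", fields: [lit("user"), anyEmbedded] },
  { label: "Observe", fields: [lit("says"), anyEmbedded] },
  { label: "claimNick", fields: [lit(uid), anyString, anyEmbedded] },
  { label: "says", fields: [lit(uid), anyString] },
]);

sessionHandle({ label: "says", fields: [uid, "hello"] });  // allowed
sessionHandle({ label: "says", fields: [8, "forged"] });   // dropped: someone else's uid
sessionHandle({ label: "joinedUser", fields: [uid, {}] }); // dropped: no pattern for this label
console.log(delivered.length); // 1
```

Because the uid is fixed inside the patterns when the handle is created, a client holding the handle cannot speak as anyone else.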

  1. There’s one key difference between the sketch I built and the system @dustyweb implemented: Goblins has sealers for making unforgeable signed secret envelopes containing values. My implementation doesn’t yet have sealers. Sealers make it possible to prove that the chatroom hasn’t forged any messages, as well as that users can’t forge each other’s messages. My sketch can only prove that attached users haven’t forged messages: you have to trust the chatroom more than in @dustyweb’s system.