Abstract
Concurrent computations resemble conversations. In a conversation, participants direct utterances at others and, as the conversation evolves, exploit the known common context to advance the conversation. Similarly, collaborating software components share knowledge with each other in order to make progress as a group towards a common goal.
This dissertation studies concurrency from the perspective of cooperative knowledge-sharing, taking the conversational exchange of knowledge as a central concern in the design of concurrent programming languages. In doing so, it makes five contributions:
- It develops the idea of a common dataspace as a medium for knowledge exchange among concurrent components, enabling a new approach to concurrent programming.
While dataspaces loosely resemble both “fact spaces” from the world of Linda-style languages and Erlang's collaborative model, they differ significantly in many details.
- It offers the first crisp formulation of cooperative, conversational knowledge-exchange as a mathematical model.
- It describes two faithful implementations of the model, one for each of two quite different host languages.
- It proposes a completely novel suite of linguistic constructs for organizing the internal structure of individual actors in a conversational setting.
The combination of dataspaces with these constructs is dubbed Syndicate.
- It presents and analyzes evidence suggesting that the proposed techniques and constructs combine to simplify concurrent programming.
The dataspace concept stands alone in its focus on representation and manipulation of conversational frames and conversational state and in its integral use of explicit epistemic knowledge. The design is particularly suited to integration of general-purpose I/O with otherwise-functional languages, but also applies to actor-like settings more generally.
Acknowledgments
Networking is interprocess communication.
—Robert Metcalfe, 1972, quoted in Day (2008)
I am deeply grateful to the many, many people who have supported, taught, and encouraged me over the past seven years.
My heartfelt thanks to my advisor, Matthias Felleisen. Matthias, it has been an absolute privilege to be your student. Without your patience, insight and willingness to let me get the crazy ideas out of my system, this work would not have been possible. My gratitude also to the members of my thesis committee, Mitch Wand, Sam Tobin-Hochstadt, and Jan Vitek. Sam in particular helped me convince Matthias that there might be something worth looking into in this concurrency business. I would also like to thank Olin Shivers for providing early guidance during my studies.
Thanks also to my friends and colleagues from the Programming Research Lab, including Claire Alvis, Leif Andersen, William Bowman, Dan Brown, Sam Caldwell, Stephen Chang, Ben Chung, Andrew Cobb, Ryan Culpepper, Christos Dimoulas, Carl Eastlund, Spencer Florence, Oli Flückiger, Dee Glaze, Ben Greenman, Brian LaChance, Ben Lerner, Paley Li, Max New, Jamie Perconti, Gabriel Scherer, Jonathan Schuster, Justin Slepak, Vincent St-Amour, Paul Stansifer, Stevie Strickland, Asumu Takikawa, Jesse Tov, and Aaron Turon. Sam Caldwell deserves particular thanks for being the second ever Syndicate programmer and for being willing to pick up the ideas of Syndicate and run with them.
Many thanks to Alex Warth and Yoshiki Ohshima, who invited me to intern at CDG Labs with a wonderful research group during summer and fall 2014, and to John Day, whose book helped motivate me to return to academia. Thanks also to the DARPA CRASH program and to several NSF grants that helped to fund my PhD research.
I wouldn't have made it here without crucial interventions over the past few decades from a wide range of people. Nigel Bree hooked me on Scheme in the early '90s, igniting a lifelong interest in functional programming. A decade later, while working at a company called LShift, my education as a computer scientist truly began when Matthias Radestock and Greg Meredith introduced me to the π-calculus and many related ideas. Andy Wilson broadened my mind with music, philosophy and political ideas both new and old. A few years later, Alexis Richardson showed me the depth and importance of distributed systems as we developed new ideas about messaging middleware and programming languages while working together on RabbitMQ. My colleagues at LShift were instrumental to the development of the ideas that ultimately led to this work. My thanks to all of you. In particular, I owe an enormous debt of gratitude to my good friend Michael Bridgen. Michael, the discussions we have had over the years contributed to this work in so many ways that I'm still figuring some of them out.
Life in Boston wouldn't have been the same without the friendship and hospitality of Scott and Megs Stevens. Thank you both.
Finally, I'm grateful to my family. The depth of my feeling prevents me from adequately conveying quite how grateful I am. Thank you Mum, Dad, Karly, Casey, Sabrina, and Blyss. Each of you has made an essential contribution to the person I've become, and I love you all. Thank you to the Yates family and to Warren, Holden and Felix for much-needed distraction and moments of zen in the midst of the write-up. But most of all, thank you to Donna. You're my person.
Tony Garnock-Jones
Boston, Massachusetts
December 2017
Contents
- I Background
- 1 Introduction
- 2 Philosophy and Overview of the Syndicate Design
- 2.1 Cooperating by sharing knowledge
- 2.2 Knowledge types and knowledge flow
- 2.3 Unpredictability at run-time
- 2.4 Unpredictability in the design process
- 2.5 Syndicate's approach to concurrency
- 2.6 Syndicate design principles
- 2.7 On the name “Syndicate”
- 3 Approaches to Coordination
- 3.1 A concurrency design landscape
- 3.2 Shared memory
- 3.3 Message-passing
- 3.4 Tuplespaces and databases
- 3.5 The fact space model
- 3.6 Surveying the landscape
- II Theory
- 4 Computational Model I: The Dataspace Model
- 4.1 Abstract dataspace model syntax and informal semantics
- 4.2 Formal semantics of the dataspace model
- 4.3 Cross-layer communication
- 4.4 Messages versus assertions
- 4.5 Properties
- 4.6 Incremental assertion-set maintenance
- 4.7 Programming with the incremental protocol
- 4.8 Styles of interaction
- 5 Computational Model II: Syndicate
- 5.1 Abstract Syndicate/λ syntax and informal semantics
- 5.2 Formal semantics of Syndicate/λ
- 5.3 Interpretation of events
- 5.4 Interfacing Syndicate/λ to the dataspace model
- 5.5 Well-formedness and Errors
- 5.6 Atomicity and isolation
- 5.7 Derived forms: and
- 5.8 Properties
- III Practice
- 6 Syndicate/rkt Tutorial
- 6.1 Installation and brief example
- 6.2 The structure of a running program: ground dataspace, driver actors
- 6.3 Expressions, values, mutability, and data types
- 6.4 Core forms
- 6.5 Derived and additional forms
- 6.6 Ad-hoc assertions
- 7 Implementation
- 7.1 Representing Assertion Sets
- 7.1.1 Background
- 7.1.2 Semi-structured assertions & wildcards
- 7.1.3 Assertion trie syntax
- 7.1.4 Compiling patterns to tries
- 7.1.5 Representing Syndicate data structures with assertion tries
- 7.1.6 Searching
- 7.1.7 Set operations
- 7.1.8 Projection
- 7.1.9 Iteration
- 7.1.10 Implementation considerations
- 7.1.11 Evaluation of assertion tries
- 7.1.12 Work related to assertion tries
- 7.2 Implementing the dataspace model
- 7.2.1 Assertions
- 7.2.2 Patches and multiplexors
- 7.2.3 Processes and behavior functions
- 7.2.4 Dataspaces
- 7.2.5 Relays
- 7.3 Implementing the full Syndicate design
- 7.3.1 Runtime
- 7.3.2 Syntax
- 7.3.3 Dataflow
- 7.4 Programming tools
- 7.4.1 Sequence diagrams
- 7.4.2 Live program display
- 8 Idiomatic Syndicate
- 8.1 Protocols and Protocol Design
- 8.2 Built-in protocols
- 8.3 Shared, mutable state
- 8.4 I/O, time, timers and timeouts
- 8.5 Logic, deduction, databases, and elaboration
- 8.5.1 Forward-chaining
- 8.5.2 Backward-chaining and Hewitt's “Turing” Syllogism
- 8.5.3 External knowledge sources: The file-system driver
- 8.5.4 Procedural knowledge and Elaboration: “Make”
- 8.5.5 Incremental truth-maintenance and Aggregation: All-pairs shortest paths
- 8.5.6 Modal reasoning: Advertisement
- 8.6 Dependency resolution and lazy startup: Service presence
- 8.7 Transactions: RPC, Streams, Memoization
- 8.8 Dataflow and reactive programming
- IV Reflection
- 9 Evaluation: Patterns
- 9.1 Patterns
- 9.2 Eliminating and simplifying patterns
- 9.3 Simplification as key quality attribute
- 9.4 Event broadcast, the observer pattern and state replication
- 9.5 The state pattern
- 9.6 The cancellation pattern
- 9.7 The demand-matcher pattern
- 9.8 Actor-language patterns
- 10 Evaluation: Performance
- 10.1 Reasoning about routing time and delivery time
- 10.2 Measuring abstract Syndicate performance
- 10.3 Concrete Syndicate performance
- 11 Discussion
- 11.1 Placing Syndicate on the map
- 11.2 Placing Syndicate in a wider context
- 11.2.1 Functional I/O
- 11.2.2 Functional operating systems
- 11.2.3 Process calculi
- 11.2.4 Formal actor models
- 11.2.5 Messaging middleware
- 11.3 Limitations and challenges
- 12 Conclusion
- 12.1 Review
- 12.2 Next steps
- A Syndicate/js Syntax
- B Case study: IRC server
- C Polyglot Syndicate
- D Racket Dataflow Library
I Background
1 Introduction
Concurrency and its constant companions, communication and coordination, are ubiquitous in computing. From warehouse-sized datacenters through multi-processor operating systems to interactive or multi-threaded programs, coroutines, and even the humble function, every computation exists in some context and must exchange information with that context in a prescribed manner at a prescribed time. Functions receive inputs from and transmit outputs to their callers; impure functions may access or update a mutable store; threads update shared memory and transfer control via locks; and network services send and receive messages to and from their peers.
Each of these acts of communication contributes to a shared understanding of the relevant knowledge required to undertake some task common to the involved parties. That is, the purpose of communication is to share state: to replicate information from peer to peer. After all, a communication that does not affect a receiver's view of the world literally has no effect. Put differently, each task shared by a group of components entails various acts of communication in the frame of an overall conversation, each of which conveys knowledge to components that need it. Each act of communication contributes to the overall conversational state involved in the shared task. Some of this conversational state relates to what must be or has been done; some relates to when it must be done. Traditionally, the “what” corresponds closely to “communication,” and the “when” to “coordination.”
The central challenge in programming for a concurrent world is the unpredictability of a component's interactions with its context. Pure, total functions are the only computations whose interactions are completely predictable: a single value in leads to a terminating computation which yields a single value out. Introduction of effects such as non-termination, exceptions, or mutability makes function output unpredictable. Broadening our perspective to coroutines makes even the inputs to a component unpredictable: an input may arrive at an unexpected time or may not arrive at all. Threads may observe shared memory in an unexpected state, or may manipulate locks in an unexpected order. Networks may corrupt, discard, duplicate, or reorder messages; network services may delegate tasks to third parties, transmit out-of-date information, or simply never reply to a request.
This seeming chaos is intrinsic: unpredictability is a defining characteristic of concurrency. To remove the one would eliminate the other. However, we shall not declare defeat. If we cannot eliminate harmful unpredictability, we may try to minimize it on one hand, and to cope with it on the other. We may seek a model of computation that helps programmers eliminate some forms of unpredictability and understand those that remain.
To this end, I have developed a new programming language design, Syndicate, which rests on a new model of concurrent computation, the dataspace model. In this dissertation I will defend the thesis that:
This claim must be broken down before it can be understood.
- Mechanism for sharing state.
- The dataspace model is, at heart, a mechanism for sharing state among neighboring concurrent components. The design focuses on mechanisms for sharing state because effective mechanisms for communication and coordination follow as special cases. Chapter 2 motivates the Syndicate design, and chapter 3 surveys a number of existing linguistic approaches to coordination and communication, outlining the multi-dimensional design space which results. Chapter 4 then presents a vocabulary for and formal model of dataspaces along with basic correctness theorems.
- Linguistic mechanism.
- The dataspace model, taken alone, explains communication and coordination among components but does not offer the programmer any assistance in structuring the internals of components. The full Syndicate design presents the primitives of the dataspace model to the programmer by way of new language constructs. These constructs extend the underlying programming language used to write a component, bridging between the language's own computational model and the style of interaction offered by the dataspace model. Chapter 5 presents these new constructs along with an example of their application to a simple programming language.
- Realizability.
- A design that cannot be implemented is useless; likewise an implementation that cannot be made performant enough to be fit-for-purpose. Chapter 6 examines an example of the integration of the Syndicate design with an existing host language. Chapter 7 discusses the key data structures, algorithms, and implementation techniques that allowed construction of the two Syndicate prototypes, Syndicate/rkt and Syndicate/js.
- Effectiveness.
- Chapter 8 argues informally for the effectiveness of the programming model by explaining idiomatic Syndicate style through dissection of example protocols and programs. Chapter 9 goes further, arguing that Syndicate eliminates various patterns prevalent in concurrent programming, thereby simplifying programming tasks. Chapter 10 discusses the performance of the Syndicate design, first in terms of the needs of the programmer and second in terms of the actual measured characteristics of the prototype implementations.
- Novelty.
- Chapter 11 places Syndicate within the map sketched in chapter 3, showing that it occupies a point in design space not covered by other models of concurrency.
Concurrency is ubiquitous in computing, from the very smallest scales to the very largest. This dissertation presents Syndicate as an approach to concurrency within a non-distributed program. However, the design has consequences that may be of use in broader settings such as distributed systems, network architecture, or even operating system design. Chapter 12 concludes the dissertation, sketching possible connections between Syndicate and these areas that may be examined more closely in future work.
2 Philosophy and Overview of the Syndicate Design
Computer Scientists don't do philosophy.
—Mitch Wand
Taking seriously the idea that concurrency is fundamentally about knowledge-sharing has consequences for programming language design. In this chapter I will explore the ramifications of the idea and outline a mechanism for communication among and coordination of concurrent components that stems directly from it.
Concurrency demands special support from our programming languages. Often specific communication mechanisms like message-passing or shared memory are baked into a language. Sometimes additional coordination mechanisms such as locks, condition variables, or transactions are provided; in other cases, such as in the actor model, the chosen communication mechanisms double as coordination mechanisms. In some situations, the provided coordination mechanisms are even disguised: the event handlers of browser-based JavaScript programs are carefully sequenced by the system, showing that even sequential programming languages exhibit internal concurrency and must face issues arising from the unpredictability of the outside world.
Let us step back from consideration of specific conversational mechanisms, and take a broader viewpoint. Seen from a distance, all these approaches to communication and coordination appear to be means to an end: namely, they are means by which relevant knowledge is shared among cooperating components. Knowledge-sharing is then simply the means by which they cooperate in performing their common task.
Focusing on knowledge-sharing allows us to ask high-level questions that are unavailable to us when we consider specific communication and coordination mechanisms alone:
- K1What does it mean to cooperate by sharing knowledge?
- K2What general sorts of facts do components know?
- K3What do they need to know to do their jobs?
It also allows us to frame the inherent unpredictability of concurrent systems in terms of knowledge. Unpredictability arises in many different ways. Components may crash, or suffer errors or exceptions during their operation. They may freeze, deadlock, enter unintentional infinite loops, or merely take an unreasonable length of time to reply. Their actions may interleave arbitrarily. New components may join and existing components may leave the group without warning. Connections to the outside world may fail. Demand for shared resources may wax and wane. Considering all these issues in terms of knowledge-sharing allows us to ask:
- K4Which forms of knowledge-sharing are robust in the face of such unpredictability?
- K5What knowledge helps the programmer mitigate such unpredictability?
Beyond the unpredictability of the operation of a concurrent system, the task the system is intended to perform can itself change in unpredictable ways. Unforeseen program change requests may arrive. New features may be invented, demanding new components, new knowledge, and new connections and relationships between existing components. Existing relationships between components may be altered. Again, our knowledge-sharing perspective allows us to raise the question:
- K6Which forms of knowledge-sharing are robust to and help mitigate the impact of changes in the goals of a program?
In the remainder of this chapter, I will examine these questions generally and will outline Syndicate's position on them in particular, concluding with an overview of the Syndicate approach to concurrency. We will revisit these questions in chapter 3 when we make a detailed examination of and comparison with other forms of knowledge-sharing embodied in various programming languages and systems.
2.1Cooperating by sharing knowledge
We have identified conversation among concurrent components abstractly as a mechanism for knowledge-sharing, which itself is the means by which components work together on a common task. However, taken alone, the mere exchange of knowledge is insufficient to judge whether an interaction is cooperative, neutral, or perhaps even malicious. As programmers, we will frequently wish to orchestrate multiple components, all of which are under our control, to cooperate with each other. From time to time, we must equip our programs with the means for responding to non-cooperative, possibly-malicious interactions with components that are not under our control. To achieve these goals, an understanding of what it is to be cooperative is required.
H. Paul Grice, a philosopher of language, proposed the cooperative principle of conversation in order to make sense of the meanings people derive from utterances they hear:
- Cooperative Principle (CP).
- Make your conversational contribution such as is required, at the stage at which it occurs, by the accepted purpose or direction of the talk exchange in which you are engaged. (Grice 1975)
- Quantity.
- Make your contribution as informative as required (for the current purposes of the exchange).
- Do not make your contribution more informative than is required.
- Quality.
- Try to make your contribution one that is true.
- Do not say what you believe to be false.
- Do not say that for which you lack adequate evidence.
- Relation.
- Be relevant.
- Manner.
- Be perspicuous.
- Avoid obscurity of expression.
- Avoid ambiguity.
- Be brief (avoid unnecessary prolixity).
- Be orderly.
He further proposed four conversational maxims as corollaries to the CP, presented in figure 1. It is important to note the character of these maxims:
They are not sociological generalizations about speech, nor are they moral prescriptions or proscriptions on what to say or communicate. Although Grice presented them in the form of guidelines for how to communicate successfully, I think they are better construed as presumptions about utterances, presumptions that we as listeners rely on and as speakers exploit. (Bach 2005)
Grice's principle and maxims can help us tackle question K1 in two ways. First, they can be read directly as constructive advice for designing conversational protocols for cooperative interchange of information. Second, they can attune us to particular families of design mistakes in such protocols that result from cases in which these “presumptions” are invalid. This can in turn help us come up with guidelines for protocol design that help us avoid such mistakes. Thus, we may use these maxims to judge a given protocol among concurrent components, asking ourselves whether each communication that a component makes lives up to the demands of each maxim.
Grice introduces various ways of failing to fulfill a maxim, and their consequences:
- Unostentatious violation of a maxim, which can mislead peers.
- Explicit opting-out of participation in a maxim or even the Cooperative Principle in general, making plain a deliberate lack of cooperation.
- Conflict between maxims: for example, there may be tension between speaking some necessary (Quantity(1)) truth (Quality(1)), and a lack of evidence in support of it (Quality(2)), which may lead to shaky conclusions down the line.
- Flouting of a maxim: blatant, obviously deliberate violation of a conversational maxim, which “exploits” the maxim, with the intent to force a hearer out of the usual frame of the conversation and into an analysis of some higher-order conversational context.
Many, but not all, of these can be connected to analogous features of computer communication protocols. In this dissertation, I am primarily assuming a setting involving components that deliberately aim to cooperate. We will not dwell on deliberate violation of conversational maxims. However, we will from time to time see that consideration of accidental violation of conversational maxims is relevant to the design and analysis of computer protocols. For example, Grice writes that
[the] second maxim [of Quantity] is disputable; it might be said that to be overinformative is not a transgression of the [Cooperative Principle] but merely a waste of time. However, it might be answered that such overinformativeness may be confusing in that it is liable to raise side issues; and there may also be an indirect effect, in that the hearers may be misled as a result of thinking that there is some particular point in the provision of the excess of information. (Grice 1975)
This directly connects to (perhaps accidental) excessive bandwidth use (“waste of time”) as well as programmer errors arising from exactly the misunderstanding that Grice describes.
It may seem surprising to bring ideas from philosophy of language to bear in the setting of cooperating concurrent computerized components. However, Grice himself makes the connection between his specific conversational maxims and “their analogues in the sphere of transactions that are not talk exchanges,” drawing on examples of shared tasks such as cooking and car repair, so it does not seem out of place to apply them to the design and analysis of our conversational computer protocols. This is particularly the case in light of Grice's ambition to explain the Cooperative Principle as “something that it is reasonable for us to follow, that we should not abandon.” (Grice 1975, p. 48; emphasis in original)
The CP makes mention of the “purpose or direction” of a given conversation. We may view the fulfillment of the task shared by the group of collaborating components as the purpose of the conversation. Each individual component in the group has its own role to play and, therefore, its own “personal” goals in working toward successful completion of the shared task. Kitcher (1990), writing in the context of the social structure of scientific collaboration, introduces the notions of personal and impersonal epistemic intention. We may adapt these ideas to our setting, explicitly drawing out the notion of a role within a conversational protocol. A cooperative component “wishes” for the group as a whole to succeed: this is its “impersonal” epistemic intention. It also has goals for itself, “personal” epistemic intentions, namely to successfully perform its roles within the group.
Finally, the CP is a specific example of the general idea of epistemic reasoning, logical reasoning incorporating knowledge and beliefs about one's own knowledge and beliefs, and about the knowledge and beliefs of other parties (Fagin et al. 2004; Hendricks and Symons 2015; van Ditmarsch, van der Hoek and Kooi 2017). However, epistemic reasoning has further applications in the design of conversational protocols among concurrent components, which brings us to our next topic.
2.2Knowledge types and knowledge flow
The conversational state that accumulates as part of a collaboration among components can be thought of as a collection of facts. First, there are those facts that define the frame of a conversation. These are exactly the facts that identify the task at hand; we label them “framing knowledge”, and taken together, they are the “conversational frame” for the conversation whose purpose is completion of a particular shared task. Just as tasks can be broken down into more finely-focused subtasks, so can conversations be broken down into sub-conversations. In these cases, part of the conversational state of an overarching interaction will describe a frame for each sub-conversation, within which corresponding sub-conversational state exists. The knowledge framing a conversation acts as a bridge between it and its wider context, defining its “purpose” in the sense of the CP. Figure 2 schematically depicts these relationships.
Some facts define conversational frames, but every shared fact is contextualized within some conversational frame. Within a frame, then, some facts will pertain directly to the task at hand. These, we label “domain knowledge”. Generally, such facts describe global aspects of the common problem that remain valid as we shift our perspective from participant to participant. Other facts describe the knowledge or beliefs of particular components. These, we label “epistemic knowledge”.
For example, as a file transfer progresses, the actual content of the file does not change: it remains a global fact that byte number 300 (say) has value 255, no matter whether the transfer has reached that position or not. The content of the file is thus “domain knowledge”. However, as the transfer proceeds and acknowledgements of receipt stream from the recipient to the transmitter, the transmitter's beliefs about the receiver's knowledge change. Each successive acknowledgement leads the transmitter to believe that the receiver has learned a little more of the file's content. Information on the progress of the transfer is thus “epistemic knowledge”.
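The distinction can be made concrete with a small sketch. This is not Syndicate's actual API; it is plain JavaScript with hypothetical names, scaled down to a five-byte “file”: the file's content is domain knowledge that never changes during the transfer, while the transmitter's estimate of the receiver's progress is epistemic knowledge revised by each acknowledgement.

```javascript
// Domain knowledge: the file's content, a global fact. Here byte 2 has
// value 255 regardless of how far the transfer has progressed.
const fileContent = [72, 105, 255, 0, 42];

// Epistemic knowledge: the transmitter's *belief* about how much of the
// file the receiver has learned, updated as acknowledgements arrive.
class Transmitter {
  constructor(content) {
    this.content = content;
    this.believedReceived = 0; // bytes we believe the receiver knows
  }
  nextChunk(size) {
    // Send from the position we believe the receiver has reached.
    return this.content.slice(this.believedReceived,
                              this.believedReceived + size);
  }
  onAck(position) {
    // Each acknowledgement leads us to believe the receiver knows more.
    this.believedReceived = Math.max(this.believedReceived, position);
  }
}

const tx = new Transmitter(fileContent);
tx.onAck(2);               // receiver confirms the first two bytes
console.log(tx.nextChunk(2)); // transmission resumes from the believed position
```

Note that `fileContent` is never mutated: only `believedReceived`, the epistemic component, changes as the conversation proceeds.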
If domain knowledge is “what is true in the world”, and epistemic knowledge is “who knows what”, the third piece of the puzzle is “who needs to know what” in order to effectively make a contribution to the shared task at hand. We will use the term “interests” as a name for those facts that describe knowledge that a component needs to learn. Knowledge of the various interests in a group allows collaborators to plan their communication acts according to the needs of individual components and the group as a whole. In conversations among people, interests are expressed as questions; in a computational setting, they are conveyed by requests, queries, or subscriptions.
The interests of components in a concurrent system thus direct the flow of knowledge within the system. The interests of a group may be constant, or may vary with time.
When interest is fixed, remaining the same for a certain class of shared task, the programmer can plan paths for communication up front. For example, in the context of a single TCP connection, the interests of the two parties involved are always the same: each peer wishes to learn what the other has to say. As a consequence, libraries implementing TCP can bake in the assumption that clients will wish to access received data. As another example, a programmer charged with implementing a request counter in a web server may choose to use a simple global integer variable, safe in the knowledge that the only possible item of interest is the current value of the counter.
A changing, dynamic set of interests, however, demands development of a vocabulary for communicating changes in interest during a conversation. For example, the query language of a SQL database is just such a vocabulary. The server's initial interest is in what the client is interested in, and is static, but the client's own interests vary with each request, and must be conveyed anew in the context of each separate interaction. Knowledge about dynamically-varying interests allows a group of collaborating components to change its interaction patterns on the fly.
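A minimal sketch can illustrate how dynamically-varying interests direct knowledge flow. Again this is hypothetical plain JavaScript, not the dataspace model's real interface: components declare and retract interest in facts matching a predicate, and shared facts flow only to currently-interested parties.

```javascript
// Hypothetical sketch: interests direct the flow of knowledge.
class Dataspace {
  constructor() { this.interests = new Map(); } // name -> { match, onFact }
  assertInterest(name, match, onFact) {
    this.interests.set(name, { match, onFact });
  }
  retractInterest(name) { this.interests.delete(name); }
  shareFact(fact) {
    // Knowledge flows exactly where interest currently exists.
    for (const { match, onFact } of this.interests.values()) {
      if (match(fact)) onFact(fact);
    }
  }
}

const ds = new Dataspace();
const seen = [];
// The client's interests vary over time: first temperatures, then nothing.
ds.assertInterest("client", f => f.kind === "temperature",
                  f => seen.push(f.value));
ds.shareFact({ kind: "temperature", value: 21 }); // delivered
ds.shareFact({ kind: "humidity", value: 40 });    // no interest: not delivered
ds.retractInterest("client");
ds.shareFact({ kind: "temperature", value: 25 }); // interest withdrawn
console.log(seen); // only the facts that matched a live interest
```

The `assertInterest`/`retractInterest` pair is the vocabulary for communicating changes of interest mid-conversation; a static-interest setting would need only the initial registration.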
With this ontology in hand, we may answer questions K2 and K3. Each task is delimited by a conversational frame. Within that frame, components share knowledge related to the domain of the task at hand, and knowledge related to the knowledge, beliefs, needs, and interests of the various participants in the collaborative group. Conversations are recursively structured by shared knowledge of (sub-)conversational frames, defined in terms of any or all of the types of knowledge we have discussed. Some conversations take place at different levels within a larger frame, bridging between tasks and their subtasks. Components are frequently engaged in multiple tasks, and thus often participate in multiple conversations at once. The knowledge a component needs to do its job is provided to it when it is created, or later supplied to it in response to its interests.
2.3Unpredictability at run-time
A full answer to question K4 must wait until the survey of communication and coordination mechanisms of chapter 3. However, this dissertation will show that at least one form of knowledge-sharing, the Syndicate design, encourages robust handling of many kinds of concurrency-related unpredictability.
The epistemological approach we have taken to questions K1–K3 suggests some initial steps toward an answer to question K5. In order for a program to be robust in the face of unpredictable events, it must first be able to detect these events, and second be able to muster an appropriate response to them. Certain kinds of events can be reliably detected and signaled, such as component crashes and exceptions, and arrivals and departures of components in the group. Others cannot easily be detected reliably, such as nontermination, excessive slowness, or certain kinds of deadlock and datalock. Half-measures such as use of timeouts must suffice for the latter sort. Still other kinds of unpredictability such as memory races or message races may be explicitly worked around via careful protocol design, perhaps including information tracking causality or provenance of a piece of knowledge or arranging for extra coordination to serialize certain sensitive operations.
No matter the source of the unpredictability, once detected it must be signaled to interested parties. Our epistemic, knowledge-sharing focus allows us to treat the facts of an unpredictable event as knowledge within the system. Often, such a fact will have an epistemic consequence. For example, learning that a component has crashed will allow us to discount any partial results we may have learned from it, and to discard any records we may have been keeping of the state of the failed component itself. Generally speaking, an epistemological perspective can help each component untangle intact from damaged or potentially untrustworthy pieces of knowledge. Having classified its records into “salvageable” and “unrecoverable”, it may discard items as necessary and engage with the remaining portion of the group in actions to repair the damage and continue toward the ultimate goal.
One particular strategy is to retry a failed action. Consideration of the roles involved in a shared task can help determine the scope of the action to retry. For example, the idea of supervision that features so prominently in Erlang programming (Armstrong 2003) is to restart entire failing components from a specification of their roles. Here, consideration of the epistemic intentions of components can be seen to help the programmer design a system robust to certain forms of unpredictable failure.
2.4 Unpredictability in the design process
Programs are seldom “finished”. Change must be accommodated at every stage of a program's life cycle, from the earliest phases of development to, in many cases, long after a program is deployed. When concurrency is involved, such change often involves emendations to protocol definitions and shifts in the roles and relationships within a group of components. Just as with question K4, a full examination of question K6 must wait for chapter 3. However, approaching the question in the abstract, we may identify a few desirable characteristics of linguistic support for concurrent programming.
First, debugging of concurrent programs can be extremely difficult. A language should have tools for helping programmers gain insight into the intricacies of the interactions among each program's components. Such tools depend on information gleaned from the knowledge-sharing mechanism of the language. As such, a mechanism that generates trace information that matches the mental model of the programmer is desirable.
Second, changes to programs often introduce new interactions among existing components. A knowledge-sharing mechanism should allow for straightforward composition of pieces of program code describing (sub)conversations that a component is to engage in. It should be possible to introduce an existing component to a new conversation without heavy revision of the code implementing the conversations the component already supports.
Finally, service programs must often run for long periods of time without interruption. In cases where new features or important bug-fixes must be introduced, it is desirable to be able to replace or upgrade program components without interrupting service availability. Similar concerns arise even for user-facing graphical applications, where upgrades to program code must preserve various aspects of program state and configuration across the change.
2.5 Syndicate's approach to concurrency
Syndicate places knowledge front and center in its design in the form of assertions. An assertion is a representation of an item of knowledge that one component wishes to communicate to another. Assertions may represent framing knowledge, domain knowledge, and epistemic knowledge, as a component sees fit. Each component in a group exists within a dataspace which both keeps track of the group's current set of assertions and schedules execution of its constituent components. Components add and remove assertions from the dataspace freely, and the dataspace ensures that components are kept informed of relevant assertions according to their declared interests.
In order to perform this task, Syndicate dataspaces place just one constraint on the interpretation of assertions: there must exist, in a dataspace implementation, a distinct piece of syntax for constructing assertions that will mean interest in some other assertion. For example, if “the color of the boat is blue” is an assertion, then so is “there exists some interest in the color of the boat being blue”. A component that asserts interest in a set of other assertions will be kept informed as members of that set appear and disappear in the dataspace through the actions of the component or its peers.
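As a rough illustration of these mechanics, the following Python sketch models a dataspace that tracks a set of assertions and notifies components that have declared interest in them. All names here (Dataspace, observe, assert_, retract) are invented for this sketch and do not reflect any actual Syndicate API; real dataspaces also support wildcard interests and actor scheduling, omitted for brevity.

```python
from collections import defaultdict

class Dataspace:
    """Invented sketch: a shared store of assertions with interest-driven
    notification. Asserting interest via observe() is itself modeled as
    registering an assertion about assertions."""

    def __init__(self):
        self.assertions = set()
        self.observers = defaultdict(list)   # assertion -> interested callbacks

    def observe(self, fact, callback):
        """Assert interest in `fact`; be told as it appears and disappears."""
        self.observers[fact].append(callback)
        if fact in self.assertions:          # already present: inform at once
            callback("asserted", fact)

    def assert_(self, fact):
        if fact not in self.assertions:      # making an assertion is idempotent
            self.assertions.add(fact)
            for cb in self.observers[fact]:
                cb("asserted", fact)

    def retract(self, fact):
        if fact in self.assertions:
            self.assertions.discard(fact)
            for cb in self.observers[fact]:
                cb("retracted", fact)

ds = Dataspace()
events = []
ds.observe(("color", "boat", "blue"), lambda ev, f: events.append(ev))
ds.assert_(("color", "boat", "blue"))
ds.retract(("color", "boat", "blue"))
assert events == ["asserted", "retracted"]
```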
Syndicate makes extensive use of wildcards for generating large—in fact, often infinite—sets of assertions. For example, "interest in the color of the boat being anything at all" is a valid and useful set of assertions, generated from a piece of syntax with a wildcard marker in the position where a specific color would usually reside. Concretely, we might write (color boat ★), which generates the set of assertions (color boat c), with c ranging over the entire universe of assertions.
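A minimal sketch of such wildcard matching, under the assumption that assertions are encoded as nested tuples; STAR and matches are invented names for this illustration:

```python
STAR = object()  # wildcard marker: stands for "anything at all"

def matches(pattern, assertion):
    """Does `assertion` belong to the (possibly infinite) set of
    assertions that `pattern` generates?"""
    if pattern is STAR:
        return True
    if isinstance(pattern, tuple) and isinstance(assertion, tuple):
        return (len(pattern) == len(assertion)
                and all(matches(p, a) for p, a in zip(pattern, assertion)))
    return pattern == assertion

# ("color", "boat", STAR) generates {("color", "boat", c) | c any value}
assert matches(("color", "boat", STAR), ("color", "boat", "blue"))
assert not matches(("color", "boat", STAR), ("size", "boat", "large"))
```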
The design of the dataspace model thus far seems similar to the tuplespace model (Gelernter 1985; Gelernter and Carriero 1992; Carriero et al. 1994). There are two vital distinctions. The first is that tuples in the tuplespace model are "generative", taking on independent existence once placed in the shared space, whereas assertions in the dataspace model are not. Assertions in a dataspace never outlive the component that is currently asserting them; when a component terminates, all its assertions are retracted from the shared space. This occurs whether termination was normal or the result of a crash or an exception. The second key difference is that multiple copies of a particular tuple may exist in a tuplespace, while redundant assertions in a dataspace cannot be distinguished by observers. If two components separately place an assertion a into their common dataspace, a peer that has previously asserted interest in a is informed merely that a has been asserted, not how many times it has been asserted. If one redundant assertion of a is subsequently withdrawn, the observer will not be notified; only when every assertion of a is retracted is the observer notified that a is no longer present in the dataspace. Observers are shown only a set view on an underlying bag of assertions. In other words, producing a tuple is non-idempotent, while making an assertion is idempotent.
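This set-view-over-a-bag behavior can be illustrated with a small Python sketch. AssertionBag and its method names are invented for illustration; a real dataspace additionally tracks which component made each assertion.

```python
from collections import Counter

class AssertionBag:
    """Invented sketch: observers see only presence and absence of each
    assertion, never the number of redundant copies."""

    def __init__(self):
        self.counts = Counter()
        self.events = []   # (event, assertion) notifications to observers

    def assert_(self, a):
        self.counts[a] += 1
        if self.counts[a] == 1:        # first copy: becomes visible
            self.events.append(("appeared", a))
        # further copies are indistinguishable to observers

    def retract(self, a):
        self.counts[a] -= 1
        if self.counts[a] == 0:        # last copy gone: disappears
            del self.counts[a]
            self.events.append(("disappeared", a))

bag = AssertionBag()
bag.assert_("ready")   # component 1 asserts
bag.assert_("ready")   # component 2 asserts redundantly: no new event
bag.retract("ready")   # one copy withdrawn: still present, no event
bag.retract("ready")   # last copy withdrawn: only now is the observer told
assert bag.events == [("appeared", "ready"), ("disappeared", "ready")]
```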
Even more closely related is the fact space model (Mostinckx et al. 2007; Mostinckx, Lombide Carreton and De Meuter 2008), an approach to middleware for connecting programs in mobile networks. The model is based on an underlying tuplespace, interpreting tuples as logical facts by working around the generativity and poor fault-tolerance properties of the tuplespace mechanism in two ways. First, tuples are recorded alongside the identity of the program that produced them. This provenance information allows tuples to be removed when their producer crashes or is otherwise disconnected from the network. Second, tuples can be interpreted in an idempotent way by programs. This allows programs to ignore redundant tuples, recovering a set view from the bag of tuples they observe. While the motivations and foundations of the two works differ, in many ways the dataspace and fact space models address similar concerns. Conceptually, the dataspace model can be viewed as an adaptation and integration of the fact space model into a programming language setting. The fact space model focuses on scaling up to distributed systems, while our focus is instead on a mechanism that scales down to concurrency in the small. In addition, the dataspace model separates itself from the fact space model in its explicit, central epistemic constructions and its emphasis on conversational frames.
The dataspace model maintains a strict isolation between components in a dataspace, forcing all interactions between peers through the shared dataspace. Components access and update the dataspace solely via message passing. Shared memory in the sense of multi-threaded models is ruled out. In this way, the dataspace model seems similar to the actor model (Hewitt, Bishop and Steiger 1973; Agha 1986; Agha et al. 1997; De Koster et al. 2016). The core distinction between the models is that components in the dataspace model communicate indirectly by making and retracting assertions in the shared store which are observed by other components, while actors in the actor model communicate directly by exchange of messages which are addressed to other actors. Assertions in a dataspace are routed according to the intersection between sets of assertions and sets of asserted interests in assertions, while messages in the actor model are each routed to an explicitly-named target actor.
The similarities between the dataspace model and the actor, tuplespace, and fact space models are strong enough that we borrow terminology from them to describe concepts in Syndicate. Specifically, we borrow the term “actor” to denote a Syndicate component. What the actor model calls a “configuration” we fold into our idea of a “dataspace”, a term which also denotes the shared knowledge store common to a group of actors. The term “dataspace” itself was chosen to highlight this latter denotation, making a connection to fact spaces and tuplespaces.
We will touch again on the similarities and differences among these models in chapter 3, examining details in chapter 11. In the remainder of this subsection, let us consider Syndicate's relationship to questions K1–K6.
Cooperation, knowledge & conversation.
The Syndicate design takes questions K1–K3 to heart, placing them at the core of its choice of sharing mechanism and the concomitant approach to protocol design. Actors exchange knowledge encoded as assertions via a shared dataspace. All shared state in a Syndicate program is represented as assertions: this includes domain knowledge, epistemic knowledge, and frame knowledge. Key to Syndicate's functioning is the use of a special form of epistemic knowledge, namely assertions of interest. It is these assertions that drive knowledge flow in a program from parties asserting some fact to parties asserting interest in that fact.
Mey (2001) defines pragmatics as the subfield of linguistics which "studies the use of language in human communication as determined by the conditions of society". Broadening its scope to include computer languages in software communication as determined by the conditions of the system as a whole takes us into a somewhat speculative area:

    Pragmatics is sometimes characterized as dealing with the effects of context [...] if one collectively refers to all the facts that can vary from utterance to utterance as 'context.' (Korta and Perry 2015)
Viewing an interaction among actors as a conversation and shared assertions as conversational state allows programmers to employ the linguistic tools discussed in section 2.1, taking steps toward a pragmatics of computer protocols. Syndicate encourages programmers to design conversational protocols directly in terms of roles and to map conversational contributions onto the assertion and retraction of assertions in the shared space. Grice's maxims offer high-level guidance for defining the meaning of each assertion: the maxims of quantity guide the design of the individual records included in each assertion; those of quality and relevance help determine the criteria for when an assertion should be made and when it should be retracted; and those of manner shape a vocabulary of primitive assertions with precisely-defined meanings that compose when simultaneously expressed to yield complex derived meanings.
Syndicate's assertions of interest determine the movement of knowledge in a system. They define, in effect, the set of facts an actor is "listening" for. All communication mechanisms must have some equivalent feature, used to route information from place to place. Unusually, however, Syndicate allows actors to react to these assertions of interest, in that assertions of interest are ordinary assertions like any other. Actors may act based on their knowledge of the way knowledge moves in a system by expressing interest in interest and deducing implicatures from the discovered facts. Mey (2001) defines a conversational implicature as "something which is implied in conversation, that is, something which is left implicit in actual language use." Grice (1975) makes three statements helpful in pinning down the idea of conversational implicature:
1. "To assume the presence of a conversational implicature, we have to assume that at least the Cooperative Principle is being observed."
2. "Conversational implicata are not part of the meaning of the expressions to the employment of which they attach." This is what distinguishes implicature from implication.
3. "To calculate a conversational implicature is to calculate what has to be supposed in order to preserve the supposition that the Cooperative Principle is being observed."
For example, imagine an actor A responsible for answering questions about factorials. The assertion (factorial n a) means that the factorial of n is a. If A learns that some peer has asserted (observe (factorial 4 ★)), which is to be interpreted as interest in the set of facts describing all potential answers to the question "what is the factorial of 4?," it can act on this knowledge to compute a suitable answer and can then assert (factorial 4 24) in response. Once it learns that interest in the factorial of 4 is no longer present in the group, it can retract its own assertion and release the corresponding storage resources. Knowledge of interest in a topic acts as a signal of demand for some resource: here, computation (directly) and storage (indirectly). The raw fact of the interest itself has the direct semantic meaning "please convey to me any assertions matching this pattern", but has an indirect, unspoken, pragmatic meaning—an implicature—in our imagined protocol of "please compute the answer to this question."
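To make this concrete, here is a hypothetical Python sketch of the responder's side of such a protocol. All names (FactorialActor, the tuple encoding of assertions, the event-handler methods) are invented for illustration; a real Syndicate implementation would drive the handlers from dataspace notifications rather than direct calls.

```python
import math

# Invented sketch: actor A answers factorial questions. Detecting interest
# in (factorial n _) carries the implicature "please compute n!"; the
# disappearance of that interest releases the corresponding answer.

class FactorialActor:
    def __init__(self, dataspace):
        self.dataspace = dataspace   # modeled here as a plain set of tuples

    def on_interest_appeared(self, n):
        # someone asserted interest in ("factorial", n, <anything>):
        # compute a suitable answer and assert it in response
        self.dataspace.add(("factorial", n, math.factorial(n)))

    def on_interest_retracted(self, n):
        # interest withdrawn: retract the answer, freeing its storage
        self.dataspace.discard(("factorial", n, math.factorial(n)))

ds = set()
actor_a = FactorialActor(ds)
actor_a.on_interest_appeared(4)
assert ("factorial", 4, 24) in ds
actor_a.on_interest_retracted(4)
assert ds == set()
```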
The idea of implicature finds use beyond assertions of interest. For example, the process of deducing an implicature may be used to reconstruct temporarily- or permanently-unavailable information “from context,” based on the underlying assumption that the parties involved are following the Cooperative Principle. For example, a message describing successful fulfillment of an order carries an implicature of the existence of the order. A hearer of the message may infer the order's existence on this basis. Similarly, a reply implicates the existence of a request.
Finally, the mechanism that Syndicate provides for conveying assertions from actor to actor via the dataspace allows reasoning about common knowledge (Fagin et al. 2004). An actor placing some assertion into the dataspace knows both that all interested peers will automatically learn of the assertion and that each such peer knows that all others will learn of the assertion. Providing this guarantee at the language level encourages the use of epistemic reasoning in protocol design while avoiding the risks of implementing the necessary state-management substrate by hand.
Run-time unpredictability.
Recall from section 2.3 that robust treatment of unpredictability requires that we be able either to detect and respond to the various unpredictable situations inherent to concurrent programming, or to forestall their occurrence. The dataspace model is the foundation of Syndicate's approach to questions K4 and K5, offering a means for signaling and detection of such events. However, by itself the dataspace model is not enough. The picture is completed with linguistic features for structuring state and control flow within each individual actor. These features allow programmers to concisely express appropriate responses to unexpected events. Finally, Syndicate's knowledge-based approach suggests techniques for protocol design which can help avoid certain forms of unpredictability by construction.
The dataspace model constrains the means by which Syndicate programs may communicate events within a group, including communication of unpredictable events. All communication must be expressed as changes in the set of assertions in the dataspace. Therefore, an obvious approach is to use assertions to express such ideas as demand for some service, membership of some group, presence in some context, availability of some resource, and so on. Actors expressing interest in such assertions will receive notifications as matching assertions come and go, including when they vanish unexpectedly. Combining this approach with the guarantee that the dataspace removes all assertions of a failing actor from the dataspace yields a form of exception propagation.
For example, consider a protocol where actors assert (show-dialog m), where m is a message for the user, in order to cause a user interface element to appear on the user's display. The actor responsible for reacting to such assertions, creating and destroying graphical user interface elements, will react to retraction of a (show-dialog m) assertion by removing the associated graphical element. The actor that asserts some (show-dialog m) may deliberately retract it when it is no longer relevant for the user. However, it may also crash. If it does, the dataspace model ensures that its assertions are all retracted. Since this includes the (show-dialog m) assertion, the actor managing the display learns automatically that its services are no longer required.
Another example may be seen in the factorial protocol discussed above. The client asserting (observe (factorial 4 ★)) may "lose interest" before it receives an answer, or of course may crash unexpectedly. From the perspective of actor A, the two situations are identical: A is informed of the retraction, concludes that no interest in the factorial of 4 remains, and may then choose to abandon the computation. The request implicated by assertion of (observe (factorial 4 ★)) is effectively canceled by retraction, whether this is caused by some active decision on the part of the requestor or is an automatic consequence of its unexpected failure.
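The automatic-retraction behavior at the heart of both examples can be sketched in Python as follows. The Dataspace class, the actor names, and the (show-dialog ...) encoding are all invented for this illustration; the point is only that crash and orderly termination are indistinguishable to observers.

```python
# Invented sketch: a dataspace retracts every assertion of a terminated
# actor, whether termination was normal or the result of a crash.

class Dataspace:
    def __init__(self):
        self.by_actor = {}   # actor id -> that actor's current assertions
        self.log = []        # retraction notifications seen by observers

    def spawn(self, actor_id):
        self.by_actor[actor_id] = set()

    def assert_(self, actor_id, fact):
        self.by_actor[actor_id].add(fact)

    def terminate(self, actor_id):
        # crash and orderly exit look the same from the outside:
        # all of the actor's assertions are withdrawn
        for fact in sorted(self.by_actor.pop(actor_id)):
            self.log.append(("retracted", fact))

ds = Dataspace()
ds.spawn("client")
ds.assert_("client", ("show-dialog", "Low battery"))
ds.terminate("client")   # e.g. the client crashed
# the display-managing actor observes the retraction and removes the dialog
assert ds.log == [("retracted", ("show-dialog", "Low battery"))]
```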
The dataspace model thus offers a mechanism for using changes in assertions to express changes in demand for some resource, including both expected and unpredictable changes. Building on this mechanism, Syndicate offers linguistic tools for responding appropriately to such changes. Assertions describing a demand or a request act as framing knowledge and thus delimit a conversation about the specific demand or request concerned. For example, the presence of (show-dialog m) for each particular m corresponds to one particular "topic of conversation". Likewise, the assertion (observe (factorial 4 ★)) corresponds to a particular "call frame" invoking the services of actor A. Actors need tools for describing such conversational frames, associating local conversational state, relevant event handlers, and any conversation-specific assertions that need to be made with each conversational frame created.
Syndicate introduces a language construct called a facet for this purpose. Each actor is composed of multiple facets; each facet represents a particular conversation that the actor is engaged in. A facet both scopes and specifies conversational responses to incoming events. Each facet includes private state variables related to the conversation concerned, as well as a bundle of assertions and event handlers. Each event handler has a pattern over assertions associated with it. Each of these patterns is translated into an assertion of interest and combined with the other assertions of the facet to form the overall contribution that the facet makes to the shared dataspace. An analogy to objects in object-oriented languages can be drawn. Like an object, a facet has private state. Its event handlers are akin to an object's methods. Unique to facets, though, is their contribution to the shared state in the dataspace: objects lack a means to automatically convey changes in their local state to interested peers.
Facets may be nested. This can be used to reflect nested sub-conversations via nested facets. When a containing facet is terminated, its contained facets are also terminated, and when an actor has no facets left, the actor itself terminates. Of course, if the actor crashes or is explicitly shut down, all its facets are removed along with it. These termination-related aspects correspond to the idea that a thread of conversation that logically depends on some overarching discussion context clearly becomes irrelevant when the broader discussion is abandoned.
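A small sketch of this termination discipline follows. The Facet class and its methods are invented for illustration; real Syndicate facets also carry state variables, assertions, and event handlers, omitted here.

```python
# Invented sketch: stopping a facet stops its nested sub-facets, just as
# abandoning a discussion renders its sub-threads irrelevant.

class Facet:
    def __init__(self, parent=None):
        self.children = []
        self.alive = True
        if parent is not None:
            parent.children.append(self)

    def stop(self):
        for child in self.children:
            child.stop()      # sub-conversations end with their parent
        self.alive = False

outer = Facet()               # overarching conversation
inner = Facet(parent=outer)   # nested sub-conversation
outer.stop()
assert not inner.alive        # the sub-conversation was abandoned too
```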
The combination of Syndicate's facets and its assertion-centric approach to state replication yields a mechanism for robustly detecting and responding to certain kinds of unpredictable event. However, not all forms of unpredictability lend themselves to explicit modeling as shared assertions. For these, we require an alternative approach.
Consider unpredictable interleavings of events: for example, UDP datagrams may be reordered arbitrarily by the network. If some datagram d2 can only be interpreted after datagram d1 has been interpreted, a datagram receiver must arrange to buffer packets when they are received out of order, reconstructing an appropriate order to perform its task. The same applies to messages passed between actors in the actor model. The observation that datagram d1 establishes necessary context for the subsequent message d2 suggests an approach we may take in Syndicate. If instead of messages we model d1 and d2 as assertions, then we may write our program as follows:
- Express interest in d1. Wait until notified that d1 has been asserted.
- Express interest in d2. Wait until notified that d2 has been asserted.
- Process d1 and d2 as usual.
- Withdraw the previously-asserted interests in d1 and d2.
This program will function correctly no matter whether d1 is asserted before d2 or vice versa. The structure of the program reflects the observation that d1 supplies a frame within which d2 is to be understood: the program pays attention to d2 only after having learned d1. Use of assertions instead of messages allows an interpreter of knowledge to decouple itself from the precise order of events in which knowledge is acquired and shared, concentrating instead on the logical dependency ordering among items of knowledge.
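The steps above can be sketched as follows. The Receiver class and the string encoding of d1 and d2 are invented for illustration; the essential point is that both arrival orders produce the same result.

```python
# Invented sketch: interpret datagrams d1 and d2 as assertions and react to
# their logical dependency order, not their arrival order.

class Receiver:
    def __init__(self):
        self.seen = set()
        self.processed = None

    def on_asserted(self, d):
        self.seen.add(d)
        if {"d1", "d2"} <= self.seen and self.processed is None:
            # pay attention to d2 only after having learned d1
            self.processed = ("d1", "d2")

r1 = Receiver()
r1.on_asserted("d1"); r1.on_asserted("d2")   # in-order delivery
r2 = Receiver()
r2.on_asserted("d2"); r2.on_asserted("d1")   # reordered by the network
assert r1.processed == r2.processed == ("d1", "d2")
```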
Finally, certain forms of unpredictability cannot be effectively detected or forestalled. For example, no system can distinguish nontermination from mere slowness in practice. In cases such as these, timeouts can be used in Syndicate just as in other languages. Modeling time as a protocol involving assertions in the dataspace allows us to smoothly incorporate time with other protocols, treating it as just like any other kind of knowledge about the world.
Unpredictability in the design process.
Section 2.4, expanding on question K6, introduced the challenges of debuggability, flexibility, and upgradeability. The dataspace model contributes to debuggability, while facets and hierarchical layering of dataspaces contribute to flexibility. While this dissertation does not offer more than a cursory investigation of upgradeability, the limited exploration of the topic completed so far does suggest that it could be smoothly integrated with the Syndicate design.
The dataspace model leads the programmer to reason about the group of collaborating actors as a whole in terms of two kinds of change: actions that alter the set of assertions in the dataspace, and events delivered to individual actors as a consequence of such actions. This suggests a natural tracing mechanism. There is nothing to the model other than events and actions, so capturing and displaying the sequence of actions and events not only accurately reflects the operation of a dataspace program, but directly connects to the programmer's mental model as well.
Facets can be seen as atomic units of interaction. They allow decomposition of an actor's relationships and conversations into small, self-contained pieces with well-defined boundaries. As the overall goals of the system change, its actors can be evolved to match by making alterations to groups of related facets in related actors. Altering, adding, or removing one facet while leaving an actor's other facets alone makes perfect sense.
The dataspace model is hierarchical. Each dataspace is modeled as a component in some wider context: as an actor in another, outer dataspace. This applies recursively. Certain assertions in the dataspace may be marked with a special constructor that causes them to be relayed to the next containing dataspace in the hierarchy, yielding cross-dataspace interaction. Peers in a particular dataspace are given no means of detecting whether their collaborators are simple actors or entire nested dataspaces with rich internal structure. This frees the program designer to decompose an actor into a nested dataspace with multiple contained actors, without affecting other actors in the system at large. This recursive, hierarchical (dis)aggregation of actors also contributes to the flexibility of a Syndicate program as time goes by and requirements change.
Code upgrade is a challenging problem for any system. Replacing a unit of code involves the old code marshaling its state into a bundle of information to be delivered to the new code. In other words, the actor involved sends a message to its “future self”. Systems like Erlang (Armstrong 2003) incorporate sophisticated language- and library-level mechanisms for supporting such code replacement. Syndicate shares with Erlang some common ideas from the actor model. The strong isolation between actors allows each to be treated separately when it comes to code replacement. Logically, each is running an independent codebase. By casting all interactions among actors in terms of a protocol, both Erlang and Syndicate offer the possibility of protocol-mediated upgrades and reboots affecting anything from a small part to the entirety of a running system.
2.6 Syndicate design principles
In upcoming chapters, we will see concrete details of the Syndicate design and its implementation and use. Before we leave the high-level perspective on concurrency, however, a few words on general principles of the design of concurrent and distributed systems are in order. I have taken these guidelines as principles to be encouraged in Syndicate and in Syndicate programs. To be clear, they are my own conjectures about what makes good software. I developed them through my experiences with early Syndicate prototypes and through my experience developing large-scale commercial software in my career before beginning this project. In some cases, the guidelines influenced the Syndicate design, having an indirect but universal effect on Syndicate programs. In others, they form a set of background assumptions intended to directly shape the protocols designed by Syndicate programmers.
Exclude implementation concepts from domain ontologies.
When working with a Syndicate implementation, programmers must design conversational protocols that capture relevant aspects of the domain each program is intended to address. The most important overarching principle is that Syndicate programs and protocols should make their domain manifest, and hide implementation constructs. Generally, each domain will include an ontology of its own, relating to concepts largely internal to the domain. Such an ontology will seldom or never include concepts from the host language or even Syndicate-specific ideas.
Following this principle, Syndicate takes care to avoid polluting a programmer's domain models with implementation- and programming-language-level concepts. As far as possible, the structure and meaning of each assertion is left to the programmer. Syndicate implementations reserve the contents of a dataspace for domain-level concepts. Access to information in the domain of programs, relevant to debugging, tracing and otherwise reflecting on the operation of a running program, is offered by other (non-dataspace, non-assertion) means. This separation of domain from implementation mechanism manifests in several specific corollaries:
- Do not propagate host-language exception values across a dataspace.
An actor that raises an uncaught exception is terminated and removed from the dataspace, but the details of the exception (stack traces, error messages, error codes etc.) are not made available to peers via the dataspace. After all, exceptions describe some aspect of a running computer program, and do not in general relate to the program's domain.
Instead, a special reflective mechanism is made available for host-language programs to access such information for debugging and other similar purposes. Actors in a dataspace do not use this mechanism when operating normally. As a rule, they instead depend on domain-level signaling of failures in terms of the (automatic) removal of domain-level assertions on failure, and do not depend on host-language exceptions to signal domain-level exceptional situations. (Syndicate distinguishes itself from Erlang here: Erlang's failure-signaling primitives, links and monitors, necessarily operate in terms of actor IDs, so it is no great step to include stack traces and error messages alongside an actor ID in a failure description record.)
- Make internal actor identifiers completely invisible.
The notion of a (programming-language) actor is almost never part of the application domain; this goes double for the notion of an actor's internal identifier (a.k.a. pointer, “pid”, or similar). Where identity of specific parties is relevant to a domain, Syndicate requires the protocol to explicitly specify and manage such identities, and they remain distinct from the internal identities of actors in a running Syndicate program. Again, during debugging, the identities of specific actors are relevant to the programmer, but this is because the programmer is operating in a different domain from that of the program under study.
Explicit treatment of identity unlocks two desirable abilities:
- One (implementation-level) actor can transparently perform multiple (domain-level) roles. Having decoupled implementation-level identity from domain-level information, we are free to choose arbitrary relations connecting them.
- One actor can transparently delegate portions of its responsibilities to others. Explicit management of identity allows actors to share a domain-level identity without needing to share an implementation-level identity. Peers interacting with such actors remain unaware of the particulars of any delegation being employed.
- Multicast communication should be the norm; point-to-point, a special case.
Conversational interactions can involve any number of participants. In languages where the implementation-provided medium of conversation always involves exactly two participants, programmers have to encode n-party domain-level conversations using the two-party mechanism. Because of this, messages between components have to mention implementation-level conversation endpoints such as channel or actor IDs, polluting otherwise domain-specific ontologies with implementation-level constructs. In order to keep implementation ideas out of domain ontologies, Syndicate does not define any kind of value-level representation of a conversation. Instead, it leaves the choice of scheme for naming conversations up to the programmer.
- Equivalences on messages, assertions and other forms of shared state should be in terms of the domain, not in terms of implementation constructs.
For example, consider deduplication of received messages. In some protocols, in order to make message receipt idempotent, a table of previously-seen messages must be maintained. To decide membership of this table, a particular equivalence must be chosen. Forcing this equivalence to involve implementation-level constructs entails a need for the programmer to explicitly normalize messages to ensure that the implementation-level equivalence reflects the desired domain-level equivalence. To be even more specific:
- If a transport includes message sequence numbers, message identifiers, timestamps etc., then these items of information from the transport should not form part of the equivalence used.
- Sender identity should not form part of the equivalence used. If a particular protocol needs to know the identity of the sender of a message, it should explicitly include a definition of the relevant notion of identity (not necessarily the implementation-level identity of the sender) and explicitly include it in message type definitions.
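To make the distinction concrete, the following sketch (in Python; the field names `order_id`, `amount`, `seq` and `sender` are hypothetical, not from any particular protocol) deduplicates received messages by projecting out only their domain-level content, so that transport metadata and sender identity play no part in the equivalence:

```python
def domain_key(message):
    """Project a message onto its domain-level content, discarding
    transport-level fields such as sequence numbers and sender identity."""
    return (message["order_id"], message["amount"])

class IdempotentReceiver:
    def __init__(self, handler):
        self.seen = set()       # table of previously-seen domain keys
        self.handler = handler

    def receive(self, message):
        key = domain_key(message)
        if key in self.seen:
            return              # duplicate at the domain level: ignore it
        self.seen.add(key)
        self.handler(message)

# The same logical message, delivered twice with different transport
# metadata, is handled only once:
handled = []
r = IdempotentReceiver(handled.append)
r.receive({"seq": 1, "sender": "a", "order_id": 42, "amount": 10})
r.receive({"seq": 2, "sender": "b", "order_id": 42, "amount": 10})
assert len(handled) == 1
```

Because `domain_key` never consults `seq` or `sender`, no explicit normalization of messages is needed before the membership test.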
Support resource management decisions.
Concurrent programs in all their forms rely on being able to scope the size and lifetime of allocations of internal resources made in response to external demand. “Demand” and “resource” are extremely general ideas. As a result, resource management decisions appear in many different guises, and give rise to a number of related principles:
- Demand-matching should be well-supported.
Demand-matching is the process of automatic allocation and release of some resource in response to detected need elsewhere in a program. The concept applies in many different places.
For example, in response to the demand of an incoming TCP connection, a server may allocate resources including a pair of memory buffers and a new thread. The buffers, combined with TCP back-pressure, give control over memory usage, and the thread gives control over compute resources as well as offering a convenient language construct to attach other kinds of resource-allocation and -release decisions to. When the connection closes, the server may terminate the thread, release other associated resources, and finalize its state.
Another example can be found in graphical user interfaces, where various widgets manifest in response to the needs of the program. An entry in a “buddy list” in a chat program may be added in response to presence of a contact, making the “demand” the presence of the contact and the “resource” the resulting list entry widget. When the contact disconnects, the “demand” for the “resource” vanishes, and the list entry widget should be removed.
- Service presence (Konieczny et al. 2009) and presence information generally should be well-supported.
Consider linking multiple independent services together to form a concurrent application. A web-server may depend on a database: it “demands” the services of the database, which acts as a “resource”. The web-server and database may in turn depend upon a logging service. Each service cannot start its work before its dependencies are ready: it observes the presence of its dependencies as part of its initialization.
Similarly, in a publish-subscribe system, it may be expensive to collect and broadcast a certain statistic. A publisher may use the availability of subscriber information to decide whether or not the statistic needs to be maintained. Consumers of the statistic act as “demand”, and the resource is the entirety of the activity of producing the statistic, along with the statistic itself. Presence of consumers is used to manage resource commitment.
Finally, the AMQP messaging middleware protocol (The AMQP Working Group 2008) includes special flags named “immediate” and “mandatory” on each published message. They cause a special “return to sender” feature to be activated, triggering a notification to the sender only when no receiver is present for the message at the time of its publication. This form of presence allows a sender to take alternative action in case no peer is available to attend to its urgent message.
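The demand-matching and presence ideas above can be sketched as a small reconciliation loop. This is a hypothetical Python illustration, not a Syndicate construct: resources are allocated for newly-appeared demands and released when their demands vanish.

```python
class DemandMatcher:
    """Allocate a resource for each item of demand; release it when
    the demand vanishes."""
    def __init__(self, allocate, release):
        self.allocate = allocate     # demand -> resource
        self.release = release       # resource -> None
        self.resources = {}          # currently-allocated resources, by demand

    def update(self, demands):
        """Reconcile allocated resources against the current demand set."""
        demands = set(demands)
        for d in demands - set(self.resources):
            self.resources[d] = self.allocate(d)    # new demand: allocate
        for d in set(self.resources) - demands:
            self.release(self.resources.pop(d))     # demand gone: release

# Buddy-list example: presence of a contact is the demand; the
# corresponding list-entry widget is the resource.
events = []
roster = DemandMatcher(lambda who: events.append(("add entry", who)) or who,
                       lambda who: events.append(("remove entry", who)))
roster.update({"alice", "bob"})   # two contacts present: two entries created
roster.update({"alice"})          # bob disconnects: his entry is removed
```

The same shape serves the statistics-publisher example: the demand set is the set of present consumers, and the allocated "resource" is the activity of maintaining the statistic.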
Support direct communication of public aspects of component state.
This is a generalization of the notion of presence, which is just one portion of overall state.
Avoid dependence on timeouts.
In a distributed system, a failed component is indistinguishable from a slow one and from a network failure. Timeouts are a pragmatic solution to the problem in a distributed setting. Here, however, we have the luxury of a non-distributed design, and we may make use of specific forms of “demand” information or presence in order to communicate failure. Timeouts are still required for inter-operation with external systems, but are seldom needed as a normal part of greenfield Syndicate protocol design.
Reduce dependence on order-of-operations.
The language should be designed to make programs robust by default to reordering of signals. As part of this, idempotent signals should be the default where possible.
- Event-handlers should be written as if they were to be run in a (pseudo-) random order, even if a particular implementation does not rearrange them randomly. This is similar to the thinking behind the random event selection in CML's choice mechanism (Reppy 1992, page 131).
- Questions of deduplication, equivalence, and identity must be placed at the heart of each Syndicate protocol design, even if only at an abstract level.
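A minimal illustration of these two points (hypothetical Python, not a Syndicate construct): when signals are idempotent assertions and retractions over a set, handlers tolerate duplication, and independent assertions tolerate arbitrary arrival order.

```python
import random

def run(events):
    """Interpret a batch of idempotent signals, in whatever order they arrive."""
    present = set()
    for op, who in events:
        if op == "assert":
            present.add(who)       # idempotent: re-assertion changes nothing
        elif op == "retract":
            present.discard(who)   # idempotent: re-retraction changes nothing
    return present

events = [("assert", "alice"), ("assert", "bob"), ("assert", "alice")]
random.shuffle(events)             # handlers must not depend on arrival order
assert run(events) == {"alice", "bob"}
```

Had the signals been non-idempotent increments and decrements instead, both the duplicate and the shuffle could corrupt the final state.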
Eschew transfer of higher-order data.
Mathematical and computational structures enjoy an enormous amount of freedom not available to structures that must be realized in the physical world. Similarly, patterns of interaction that can be realized in a non-distributed setting are often inappropriate, unworkable, or impossible to translate to a distributed setting. One example of this concerns higher-order data, by which I mean certain kinds of closure, mutable data structures, and any other stateful kind of entity.
Syndicate is not a distributed programming language, but was heavily inspired by my experience of distributed programming and by limitations of existing programming languages employed in a distributed setting. Furthermore, certain features of the design suggest that it may lead to a useful distributed programming model in future. With this in mind, certain principles relate to a form of physical realizability; chief among them, the idea of limiting information exchange to first-order data wherever possible. The language should encourage programmers to act as if transfer of higher-order data between peers in a dataspace were impossible. While non-distributed implementations of Syndicate can offer support for transfer of functions, objects containing mutable references, and so on, stepping to a distributed setting limits programs to exchange of first-order data only, since real physical communication networks are necessarily first-order. Transfer of higher-order data involves a hidden use/mention distinction. Higher-order data may be encoded, but cannot directly be transmitted.
With that said, however, notions of stateful location or place are important to certain domains, and the ontologies of such domains may well naturally include references to such domain-relevant location information. It is host-language higher-order data that Syndicate discourages, not domain-level references to location and located state.
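A small illustration of the first-order restriction, using JSON as a stand-in for a real wire format:

```python
import json

# A first-order message crosses the "wire" without trouble...
message = {"user": "alice", "text": "hello"}
wire = json.dumps(message)
assert json.loads(wire) == message

# ...but higher-order data such as a closure cannot be transmitted
# directly; it could only be encoded (mentioned), never sent (used).
try:
    json.dumps({"callback": lambda line: line.upper()})
    transmitted = True
except TypeError:
    transmitted = False
assert transmitted is False
```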
Arrange actors hierarchically.
Many experiments in structuring groups of (actor model) actors have been performed over the past few decades. Some employ hierarchies of actors, that is, the overall system is structured as a tree, with each actor or group existing in exactly one group (e.g. Varela and Agha 1999). Others allow actors to be placed in more than one group at once, yielding a graph of actors (e.g. Callsen and Agha 1994).
Syndicate limits actor composition to tree-shaped hierarchies of actors, again inspired by physical realizability. Graph-like connectivity is encoded in terms of protocols layered atop the hierarchical medium provided. Recursive groupings of computational entities in real systems tend to be hierarchical: threads within processes within containers managed by a kernel running under a hypervisor on a core within a CPU within a machine in a datacenter.
2.7 On the name “Syndicate”
Now that we have seen an outline of the Syndicate design, the following definitions may shed light on the choice of the name “Syndicate”:
A syndicate is a self-organizing group of individuals, companies, corporations or entities formed to transact some specific business, to pursue or promote a shared interest.
— Wikipedia
Syndicate, n.
1. A group of individuals or organizations combined to promote a common interest.
1.1 An association or agency supplying material simultaneously to a number of newspapers or periodicals.
Syndicate, v.tr.
...
1.1 Publish or broadcast (material) simultaneously in a number of newspapers, television stations, etc.
— Oxford Dictionary
An additional relevant observation is that a syndicate can be a group of companies, and a company can be a group of actors.
3 Approaches to Coordination
Our analysis of communication and coordination so far has yielded a high-level, abstract view on concurrency, taking knowledge-sharing as the linchpin of cooperation among components. The previous chapter raised several questions, answering some in general terms, and leaving others for investigation in the context of specific mechanisms for sharing knowledge. In this chapter, we explore these remaining questions. To do so, we survey the paradigmatic approaches to communication and coordination. Our focus is on the needs of programmers and the operational issues that arise in concurrent programming. That is, we look at ways in which an approach helps or hinders achievement of a program's goals in a way that is robust to unpredictability and change.
3.1 A concurrency design landscape
The outstanding questions from chapter 2 define a multi-dimensional landscape within which we place different approaches to concurrency. A given concurrency model can be assigned to a point in this landscape based on its properties as seen through the lens of these questions. Each point represents a particular set of trade-offs with respect to the needs of programmers.
To recap, the questions left for later discussion were:
- K4 Which forms of knowledge-sharing are robust in the face of the unpredictability intrinsic to concurrency?
- K6 Which forms of knowledge-sharing are robust to and help mitigate the impact of changes in the goals of a program?
In addition, the investigation of question K3 (“what do concurrent components need to know to do their jobs?”) concluded with a picture of domain knowledge, epistemic knowledge, framing knowledge, and knowledge flow within a group of components. However, it left unaddressed the question of mechanism, giving rise to a follow-up question:
- K3bis By which mechanisms should concurrent components share knowledge with one another?
In short, the three questions relate to robustness, operability and mechanism, respectively. The rest of the chapter is structured around an informal investigation of characteristics refining these categories.
- Mechanism (K3bis).
- A central characteristic of a given concurrency model is its mechanism for exchange of knowledge among program components. Each mechanism yields a different set of possibilities for how concurrent conversations evolve. First, a conversation may have arbitrarily many participants, and a participant may engage in multiple conversations at once. Hence, models and language designs must be examined as to
- C1 how they support conversations among many parties and
- C2 how a component may engage in several conversations at once.
Second, conversations come with associated state. Each participating component must find out about changes to this state and must integrate those changes with its local view. The component may also wish to change conversational state; such changes must be signaled to relevant peers. A mechanism can thus be analyzed in terms of
- C3 how components learn of changes to conversational state and
- C4 how they signal such changes to their peers.
- Robustness (K4).
- Each concurrency model offers a different level of support to the programmer for addressing the unpredictability intrinsic to concurrent programming. Programs rely on the integrity of each participant's view of overall conversational state; this may entail consideration of consistency among different views of the shared state in the presence of unpredictable latency in change propagation. These concerns lead to investigation of
- C5 how a model helps maintain integrity of conversational state and
- C6 how it helps ensure consistency of state as a program executes.
- Operability (K6).
- The notion of operability is broad, including attributes pertaining to the ease of working with the model at design, development, debugging and deployment time. We will focus on the ability of a model to support
- C10 comprehension and debugging of the interactions among components,
- C11 modification of programs as conversations and components evolve, and
- C12 orthogonal persistence and live upgrade of running code.
Characteristics C1–C12 in figure 3 will act as a lens through which we will examine three broad families of concurrency: shared memory models, message-passing models, and tuplespaces and external databases. In addition, we will analyze the fact space model briefly mentioned in the previous chapter.
We illustrate our points throughout with a chat server that connects an arbitrary number of participants. It relays text typed by a user to all others and generates announcements about the arrival and departure of peers. A client may thus display a list of active users. The chat server involves chat-room state—the membership of the room—and demands many-to-many communication among the concurrent agents representing connected users. Each such agent receives events from two sources: its peers in the chat-room and the TCP connection to its user. If a user disconnects or a programming error causes a failure in the agent code, resources such as TCP sockets must be cleaned up correctly, and appropriate notifications must be sent to the remaining agents and users.
3.2 Shared memory
Shared memory languages are those where threads communicate via modifications to shared memory, usually synchronized via constructs such as monitors (Gosling et al. 2014; IEEE 2009; ISO 2014). Figure 4 sketches the heart of a chat room implementation using a monitor (Brinch Hansen 1993) to protect the shared members variable.
(C1; C3; C4) Mutable memory tracks shared state and also acts as a communications mechanism. Buffers and routing information for messages between threads are explicitly encoded as part of the conversational state, which naturally accommodates the multi-party conversations of our chat server. However, announcing changes in conversational state to peers—a connection or disconnection, for example—requires construction of a broadcast mechanism out of low-level primitives.
(C2) To engage in multiple conversations at once, a thread must monitor multiple regions of memory for changes. Languages with powerful memory transactions make this easy; the combination of “retry” and “orelse” gives the requisite power (Harris et al. 2005). Absent such transactions, and ruling out polling, threads must explicitly signal each other when making changes. If a thread must wait for any one of several possible events, it is necessary to reinvent multiplexing based on condition variables and write code to perform associated book-keeping.
class Chatroom
private Map<String, (String->())> members
public synchronized connect(user, callback)
for (existingUser, _) in members
callback(existingUser + " arrived")
members.put(user, callback)
announce(user + " arrived")
public synchronized speak(user, text)
announce(user + ": " + text)
public synchronized disconnect(user)
if (!members.containsKey(user)) { return }
members.remove(user)
announce(user + " left")
private announce(what)
for (user, callback) in members.clone()
try { callback(what) }
catch (exn) { disconnect(user) }
(C5) Maintaining the integrity of shared state is famously difficult. The burden of correctly placing transaction boundaries or locks and correctly ordering updates falls squarely on the programmer. It is reflected in figure 4 not only in the use of the monitor concept itself, but also in the careful ordering of events in the connect and disconnect methods. In particular, the call to announce (line 13) must follow the removal of user (line 12), because otherwise, the system may invoke callback for the disconnected user. Similarly, cloning the members map (line 15) is necessary so that a disconnecting user (line 17) does not change the collection mid-iteration. Moreover, even with transactions and correct locking discipline, care must be taken to maintain logical invariants of an application. For example, if a chat user's thread terminates unexpectedly without calling disconnect, the system continues to send output to the associated TCP socket indefinitely, even though input from the socket is no longer being handled, meaning members has become logically corrupted. Conversely, a seemingly-correct program may call disconnect twice in corner cases, which explains the check (line 11) for preventing double departure announcements.
(C7; C8) Memory transactions with “retry” allow control flow to follow directly from changes to shared data; otherwise, however, data flow is completely decoupled from inter-thread control flow. The latter is provided via synchronization primitives, which are only coincidentally associated with changes to the shared store. Coming from the opposite direction, control flow is also decoupled from data flow. For example, exceptions do not automatically trigger a clean-up of shared state or signal the termination of the thread to the relevant group of peers. Determining responsibility for a failure and deciding on appropriate recovery actions is challenging. Consider an action by user A that leads to a call to announce. If the callback associated with user B (line 16) throws an exception, the handler on line 17 catches it. To deal with this situation, the developer must reason in terms of three separate, stateful entities with non-trivial responsibilities: the agents for A and B plus the chat room itself. If the exception propagates, it may not only damage the monitor’s state but terminate the thread representing A, even though it is the fault of B’s callback. Contrast the problems seen in this situation with the call to the callback in connect (line 5); it does not need an exception handler, because the data flow resulting from the natural control flow of exception propagation is appropriate.
(C9) The thread model also demands the manual management of resources for a given conversation. For example, disposal of unwanted or broken TCP sockets must be coded explicitly in every program.
(C6) On the bright side, because it is common to have a single copy of any given piece of information, with all threads sharing access to that copy, explicit consideration of consistency among replicas is seldom necessary.
The many interlocking problems described above are difficult to discover in realistic programs, either through testing or formal verification. To reach line 17, a callback must fail mid-way through an announcement caused by a different user. The need for the .clone() on line 15 is not directly obvious. To truly gain confidence in the implementation, one must consider cases where multiple failures occur during one announcement, including the scenario where a failure during speak causes disconnect and another failure occurs during the resulting announcement. The interactions between the various locks, loops, callbacks, exception handlers, and pieces of mutable state are manifold and non-obvious.
(C10; C11; C12) Because shared memory languages allow unconstrained access to shared memory, not connected to any kind of scoping construct or protocol description, recovering a clear picture of the relationships and interactions among threads is extremely challenging. Similarly, as discussed for characteristic C2, modifying a component to engage in multiple conversations at once or expanding the scope of a conversation to include multiple components is in general invasive. Finally, the lack of a clear linguistic specification of the structure of the shared memory and its relationship to a program's threads largely precludes automated support for orthogonal persistence and code upgrade.
An important variation on shared memory is the single-threaded, event-based style of JavaScript (ECMA 2015). While use of explicit locking is reduced in such cases, most of the analysis of the threaded approach continues to hold.
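To illustrate the multiplexing burden noted for characteristic C2, the following hypothetical Python sketch shows the book-keeping a thread must hand-roll with a lock and condition variable in order to wait for whichever of several sources fires first:

```python
import threading

class Multiplexer:
    """Merge events from several producer threads so that a consumer can
    wait for whichever source fires first."""
    def __init__(self):
        self.lock = threading.Lock()
        self.ready = threading.Condition(self.lock)
        self.events = []                      # merged (source, event) pairs

    def post(self, source, event):
        with self.lock:
            self.events.append((source, event))
            self.ready.notify()               # wake a waiting consumer

    def wait_any(self):
        with self.lock:
            while not self.events:            # guard against spurious wakeups
                self.ready.wait()
            return self.events.pop(0)

mux = Multiplexer()
t = threading.Thread(target=lambda: mux.post("tcp", "line arrived"))
t.start()
source, event = mux.wait_any()                # blocks until some source posts
t.join()
```

Every producer must remember to post through the multiplexer; a source that mutates shared state without signaling is silently missed, which is precisely the fragility the discussion above identifies.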
3.3 Message-passing
Message-passing models of concurrency include languages using Hoare’s CSP channels (Hoare 1985) or channels from the π-calculus (Milner 1999), and those based on the actor model (Hewitt, Bishop and Steiger 1973; Agha 1986; Agha et al. 1997; De Koster et al. 2016). Channel languages include CML (Donnelly and Fluet 2008; Reppy 1991), Go, and Rust, which all use channels in a shared-memory setting, and the Join Calculus (Fournet and Gonthier 2000), which assumes an isolated-process setting. This section concentrates on isolated processes because channel-based systems using shared memory are like those discussed in section 3.2. Actor languages include Erlang (Armstrong 2003), Scala (Haller and Odersky 2009), AmbientTalk (Van Cutsem et al. 2014), and E (Miller, Tribble and Shapiro 2005).
Channel- and actor-based models are closely related (Fowler, Lindley and Wadler 2016). An actor receives input exclusively via a mailbox (Agha 1986), and messages are explicitly addressed by the sending actor to a specific recipient. In channel-based languages, messages are explicitly addressed to particular channels; each message goes to a single recipient, even when a channel’s receive capability is shared among a group of threads.
(C1) Both actor- and channel-based languages force an encoding of the chat room’s one-to-many medium in terms of built-in point-to-point communication constructs. Compare figure 5, which expresses the chat room as a process-style actor, with figure 6, which presents pseudo-code for a channel-based implementation. In figure 5, the actor embodying the chat room’s broadcast medium responds to Speak messages (line 15) by sending ChatOutput messages to actors representing users in the room. In figure 6, the thread running the chatroom() procedure responds similarly to Speak instructions received on its control channel (line 13).
def chatroom()
members = new Hashtable()
while True
match receiveMessage()
case Connect(user, PID)
monitor(PID) // Erlang-style "link"
for peer in members.keys
send(PID, ChatOutput(peer + " arrived"))
members.put(user, PID)
announce(members, user + " arrived")
case EXIT_SIGNAL(PID)
user = members.findKeyForValue(PID)
members.remove(user)
announce(members, user + " left")
case Speak(user, text)
announce(members, user + ": " + text)
def announce(members, what)
for PID in members.values
send(PID, ChatOutput(what))
(C2) Languages with channels often provide a “select” construct, so that programs can wait for events on any of a group of channels. Such constructs implement automatic demultiplexing by channel identity. For example, a thread acting as a user agent might await input from the chat room or the thread’s TCP connection (figure 7a). The language runtime takes care to atomically resolve the transaction. In these languages, a channel reference can stand directly for a specific conversational context. By contrast, actor languages lack such a direct representation of a conversation. Actors retrieve messages from their own private mailbox and then demultiplex manually by inspecting received messages for correlation identifiers (figure 7b). While the channel-based approach forces use of an implementation-level correlator—the channel reference—explicit pattern-based demultiplexing allows domain-level information in each received message to determine the relevant conversational context. The E language (Miller 2006; De Koster, Van Cutsem and De Meuter 2016) is a hybrid of the two approaches, offering object references to denote specific conversations within the heap of a given actor, and employs method dispatch as a limited pattern matcher over received messages.
def chatroom(ch)
members = new Hashtable()
while True
match ch.get()
case Connect(user, callbackCh)
for peer in members.keys
callbackCh <- peer + " arrived"
members.put(user, callbackCh)
announce(members, user + " arrived")
case Disconnect(user)
members.remove(user)
announce(members, user + " left")
case Speak(user, text)
announce(members, user + ": " + text)
def announce(members, what)
for callbackCh in members.values
callbackCh <- what
select {
case line <- callbackCh:
tcpOutputCh <- line
case line <- tcpInputCh:
chatroomCh <- Speak(myName, line)
}
(a) Channel-style
match receiveMessage() {
case ChatOutput(line):
socket.write(line)
case TcpInput(_, line):
send(ChatroomPID, Speak(myName, line))
}
(b) Actor-style
(C3; C4; C5) With either actors or channels, only a small amount of conversational state is managed by the language runtime. In actor systems, it is the routing table mapping actor IDs to mailbox addresses; in channel-based systems, the implementation of channel references and buffers performs the analogous role. Developers implement other kinds of shared state using message passing. This approach to conversational state demands explicit programming of updates to a local replica of the state based on received messages. Conversely, when an agent decides that a change to conversational state is needed, it must broadcast the change to the relevant parties. Correct notification of changes is crucial to maintaining integrity of conversational state. Most other aspects of integrity maintenance become local problems due to the isolation of individual replicas. In particular, a crashing agent cannot corrupt peers.
(C5) Still, the programmer is not freed from having to consider execution order when it comes to maintaining local state. Consider the initial announcement of already-present peers to an arriving user in figure 5 (lines 7–8). Many subtle variations on this code arise from moving the addition of the new user (line 9) elsewhere in the Connect handler clause; some omit self-announcement or announce the user’s appearance twice.
(C7; C8) Both models make it impossible to have data flow between agents without associated control flow. As Hewitt, Bishop and Steiger (1973) write, “control flow and data flow are inseparable” in the actor model. However, control flow within an agent may not coincide with an appropriate flow of data to peers, especially when an exception is raised and crashes an agent. Channel references are not exclusively owned by threads, meaning we cannot generally close channels in case of a crashing thread. Furthermore, most channel-based languages are synchronous, meaning a send blocks if no recipient is ready to receive. If a thread servicing a channel crashes, then the next send to that channel may never complete. In our chat server, a crashed user agent thread can deadlock the whole system: the chatroom thread may get stuck during callbacks (lines 7 and 17 in figure 6). In general, synchronous channel languages preclude local reasoning about potential deadlocks; interaction with some party can lead to deadlock via a long chain of dependencies. Global, synchronous thinking has to be brought to bear in protocol design for such languages: the programmer must consider scheduling in addition to data flow. Actors can do better. Sends are asynchronous, introducing latency and buffering but avoiding deadlock, and mailboxes are owned by exactly one actor. If that actor crashes, further communication to or from that actor is hopeless. Hence, Erlang offers monitors and exit signals, i.e., an actor may subscribe to a peer’s lifecycle events (line 6 in figure 5). Such subscriptions allow the chat room to combine error handling with normal disconnection. No matter whether a user agent actor terminates normally or abnormally, the EXIT_SIGNAL handler (lines 12–14) runs, announcing the departure to the remaining peers.
The E language allows references to remote objects to break when the associated remote vat exits, crashes, or disconnects, providing a hybrid of channel-style demultiplexing with Erlang-style exit signaling.
(C6) Where many replicas of a piece of state exist alongside communications delays, the problem of maintaining consistency among replicas arises. Neither channels nor actors have any support to offer here. Channels, and synchronous communication in general, seem to prioritize (without guaranteeing) consistency at the expense of deadlock-proneness; asynchronous communication avoids deadlock, but risks inconsistency through the introduction of latency.
(C9) Exit signals are a step toward automatically managing resource deallocation. While actors must manually allocate resources, the exit signal mechanism may be used to tie the lifetime of a resource, such as a TCP socket, to the lifetime of an actor. If fine-grained control is needed, it must be programmed manually. Additionally, in asynchronous (buffered) communication, problems with resource control arise in a different way: it is easy to overload a component, causing its input buffer or mailbox to grow potentially without bound.
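The overload problem can be made concrete with a bounded buffer (a Python sketch; classic actor mailboxes are unbounded): imposing a bound converts silent, unbounded growth into an explicit signal the sender can act on.

```python
import queue

mailbox = queue.Queue(maxsize=2)        # bounded, unlike a classic mailbox

mailbox.put("msg-1", block=False)
mailbox.put("msg-2", block=False)
try:
    mailbox.put("msg-3", block=False)   # consumer is lagging; buffer is full
    overloaded = False
except queue.Full:
    overloaded = True                   # the overload becomes visible here
assert overloaded
```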
(C10) Enforced isolation between components, and forcing all communication to occur via message-passing, makes the provision of tooling for visualizing execution traces possible. Languages such as Erlang include debug trace facilities in the core runtime, and make good use of them for lightweight capturing of traces even in production. However, the possibility of message races complicates reasoning and debugging; programmers are often left to analyze the live behavior of their programs, if tooling is unavailable or inadequate. Modification of programs to capture ad-hoc trace information frequently causes problematic races to disappear, further complicating such analysis.
(C11) As figure 7 makes clear, modifying a component to engage in multiple simultaneous conversations can be straightforward, if all I/O goes through a single syntactic location. However, if communication is hidden away in calls to library routines, such modifications demand non-local program transformations. Similarly, adding a new participant to an existing conversation can require non-local changes. In instances where a two-party conversation must now include three or more participants, this often results in reification of the communications medium into a program component in its own right.
(C12) Erlang encourages adherence to a “tail-call to next I/O action” convention allowing easy upgrade of running code. Strictly-immutable local data and functional programming combine with this convention to allow a module backing a process to be upgraded across such tail-calls, seamlessly transitioning to a new version of the code. In effect, all actor state is held in accumulator data structures explicitly threaded through actor implementations. Other actor languages without such strong conventions cannot offer such a smooth path to live code upgrade. Channel-based languages could include similar conventions; in practice, I am not aware of any that do so.
3.4 Tuplespaces and databases
Finally, hybrid models exist, where a shared, mutable store is the medium of communication, but the store itself is accessed and components are synchronized via message passing. These models are database-like in nature. Languages employing such models include tuplespace-based languages such as Linda (Gelernter 1985; Carriero et al. 1994), Lime (Murphy, Picco and Roman 2006), and TOTAM (Scholliers, González Boix and De Meuter 2009; Scholliers et al. 2010; González Boix 2012; González Boix et al. 2014), as well as languages that depend solely on an external DBMS for inter-agent communication, such as PHP (Tatroe, MacIntyre and Lerdorf 2013).
Tuplespace languages have in common the notion of a “blackboard” data structure, a tuplespace, shared among a group of agents. Data items, called tuples, are written to the shared area and retrieved by pattern matching. Once published to the space, tuples take on independent existence. Similarly, reading a tuple from the space may move it from the shared area to an agent’s private store.
The original tuplespace model provided three essential primitives: out, in, and rd. The first writes tuples to the store; the other two move and copy tuples from the store to an agent, respectively. Both in and rd are blocking operations; if multiple tuples match an operation’s pattern, an arbitrary single matching tuple is moved or copied. Later work extended this austere model with, for example, copy-collect (Rowstron and Wood 1996), which allows copying of all matching tuples rather than the arbitrary single match yielded by rd. Such extensions add essential expressiveness to the system (Busi and Zavattaro 2001; Felleisen 1991). Lime goes further yet, offering not only non-blocking operations inp and rdp, but also reactions, which are effectively callbacks, executed once per matching tuple. Upon creation of a reaction, existing tuples trigger execution of the callback. When subsequent tuples are inserted, any matches to the reaction’s pattern cause additional callback invocations. This moves tuplespace programming toward programming with publish/subscribe middleware (Eugster et al. 2003). TOTAM takes Lime's reactions even further, allowing reaction to removal of a previously-seen tuple.
External DBMS systems share many characteristics with tuplespaces: they allow storage of relations; stored items are persistent; retrieval by pattern-matching is common; and many modern systems can be extended with triggers, code to be executed upon insertion, update, or removal of matching data. One difference is the notion of transactionality, standard in DBMS settings but far from settled in tuplespaces (Bakken and Schlichting 1995; Papadopoulos and Arbab 1998). Another is the decoupling of notions of process from the DBMS itself, where tuplespace systems integrate process control with other aspects of the coordination mechanism.
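DBMS triggers of the kind just described can be demonstrated with SQLite via the Python standard library (the table and trigger names here are invented for the example):

```python
import sqlite3

# In-memory database standing in for an external DBMS.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE messages(who TEXT, what TEXT);
    CREATE TABLE audit(event TEXT);

    -- Code to be executed automatically upon insertion of matching data:
    CREATE TRIGGER log_insert AFTER INSERT ON messages
    BEGIN
        INSERT INTO audit VALUES ('insert:' || NEW.who);
    END;
""")

conn.execute("INSERT INTO messages VALUES ('alice', 'hello')")
events = [row[0] for row in conn.execute("SELECT event FROM audit")]
print(events)  # the trigger fired once for the inserted row
```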
class UserAgent
  public run(name, socket)
    new Reaction(Present(_), fn(who) { socket.println(who + " arrived") })
    new Reaction(Absent(_), fn(who) { socket.println(who + " left") })
    new Reaction(Message(_,_), fn(who, what) { socket.println(who + ": " + what) })
    previousLine = null
    try
      inp(Absent(name))
      out(Present(name))
      while (line = socket.readLine()) != null
        if previousLine != null: in(Message(name, previousLine))
        out(Message(name, line))
        previousLine = line
    finally
      if previousLine != null: in(Message(name, previousLine))
      in(Present(name))
      out(Absent(name))
Figure 8 presents a pseudo-code tuplespace implementation of a user agent, combining Java-like constructs with Lime-like reactions. Previous sketches have concentrated on appropriate implementation of the shared medium connecting user agents; here, we concentrate on the agents themselves, because tuplespaces are already sufficiently expressive to support broadcasting.
(C1; C2; C3) Tuplespaces naturally yield multi-party communication. All communication happens indirectly through manipulation of shared state. Inserted tuples are visible to all participants. With reactions, programmers may directly express the relationship between appearance of tuples matching a pattern and execution of a code fragment, allowing a richer kind of demultiplexing of conversations than channel-based models. For example, the reactions in figure 8 (lines 3–5) manifestly associate conversations about presence, absence and utterances with specific responses, respectively; the tuplespace automatically selects the correct code to execute as events are received. By contrast, in tuplespace languages without reactions, the blocking nature of in and rd leads to multiplexing problems similar to those seen with shared memory and monitors.
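Lime-style reactions can be modelled with a simple callback registry (an illustrative toy, not Lime's API; note how a freshly registered reaction first fires for already-present tuples, then once per subsequent matching insertion):

```python
class ReactiveSpace:
    """Toy tuplespace supporting Lime-style reactions."""
    def __init__(self):
        self._tuples = []
        self._reactions = []  # (pattern, callback) pairs

    @staticmethod
    def _matches(pattern, tup):
        # None acts as a wildcard field.
        return len(pattern) == len(tup) and all(
            p is None or p == v for p, v in zip(pattern, tup))

    def reaction(self, pattern, callback):
        """Register a callback, firing immediately for existing matches."""
        self._reactions.append((pattern, callback))
        for t in list(self._tuples):
            if self._matches(pattern, t):
                callback(t)

    def out(self, tup):
        """Insert a tuple; each matching reaction fires once for it."""
        self._tuples.append(tup)
        for pattern, callback in self._reactions:
            if self._matches(pattern, tup):
                callback(tup)

seen = []
space = ReactiveSpace()
space.out(("Present", "alice"))
space.reaction(("Present", None), lambda t: seen.append(t[1] + " arrived"))
space.out(("Present", "bob"))
```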
(C1) Tuples are persistent, hence the need to retract each message before inserting the next (line 11). An unfortunate side effect is that if a new participant joins mid-conversation, it receives the most recent utterance from each existing peer, even though that utterance may have been made a long time ago.
(C7; C8; C4; C5) Data flow usually occurs concomitantly with control flow in a tuplespace; in and rd are blocking operations, and reactions trigger code execution in response to a received event. Control flow, however, does not always trigger associated data flow. Because manipulation of the tuplespace is imperative, no mechanism exists within the core tuplespace model to connect the lifetime of tuples in the space with the lifetime of the agent responsible for them. This can lead to difficulty maintaining application-level invariants, even though the system ensures data-structure-level integrity of the tuplespace itself. For an example, see the explicit clean-up action as the process prepares to exit (lines 15–17). In addition, the effect of exceptions inside reactions remains unclear in all tuplespace languages. Turning to external DBMSs, the situation is worse: setting aside the possibility of abusing triggers for the purpose, changes in state have no direct effect on the flow of control in the system. Connections between programs and the DBMS are viewed as entirely transient, and inserted records are viewed as sacrosanct once committed.
(C8) Tuplespaces take a wide variety of approaches to failure-handling (Bakken and Schlichting 1995; Rowstron 2000). In Lime, in particular, tuples are localized to tuplespace fragments associated with individual agents. These fragments automatically combine when agents find themselves in a common context. Agent failure or disconnection removes its tuplespace fragment from the aggregate whole. While Lime does not offer the ability to react to removal of individual tuples, it can be configured to insert _host_gone tuples into the space when it detects a disconnection. By reacting to appearance of _host_gone tuples, applications can perform coarse-grained cleaning of the knowledgebase after disconnection or failure. Separately, TOTAM's per-tuple leases (González Boix et al. 2014) give an upper bound on tuple lifetime. Our example chat room is written in an imaginary tuplespace dialect lacking fine-grained reactions to tuple withdrawal, and thus inserts Absent records upon termination (lines 4, 8, and 17 in figure 8) to maintain its invariants.
(C6) Reactions and copy-collect allow maintenance of eventually-consistent views and production of consistent snapshots of the contents of a tuplespace, respectively. However, operations like rd are not only non-deterministic but non-atomic in the sense that by the time the existence of a particular tuple is signaled, that tuple may have been removed by a third party. Tuplespaces, then, offer some mechanisms by which the consistency of the various local replicas of tuplespace contents may be maintained and reasoned about. In contrast, most DBMS systems do not offer such mechanisms for reasoning about and maintaining a client’s local copy of data authoritatively stored at a server. Instead, a common approach is to use transactions to atomically query and then alter information. The effect of this is to bound the lifetime of local views on global state, ensuring that while they exist, they fit into the transactional framework on offer, and that after their containing transaction is over, they cannot escape to directly influence further computation.
(C9) Detection of demand for some resource can be done using tuples indicating demand and corresponding reactions. The associated callback can allocate and offer access to the demanded resource. In systems like TOTAM, retraction of a demand tuple can be interpreted as the end of the need for the resource it describes; in less advanced tuplespaces, release of resources must be arranged by other means.
(C10) Both tuplespaces and external databases give excellent visibility into application state, on the condition that the tuplespace or database is the sole locus of such state. In cases where this assumption holds, the entirety of the state of the group is visible as the current contents of the shared store. This unlocks the possibility of rich tooling for querying and modifying this state. Such tooling is a well-integrated part of existing DBMS ecosystems. In principle, traces of interactions with the shared store could also be recorded and used for visualization or debugging.
(C11) The original tuplespace model of Linda lacked non-blocking operations, leading it to suffer from usability flaws well-known from the context of synchronous IPC. As Elphinstone and Heiser write,
While certainly minimal, and simple conceptually and in implementation, experience taught us significant drawbacks of [the model of synchronous IPC as the only mechanism]: it forces a multi-threaded design onto otherwise simple systems, with the resulting synchronisation complexities. (Elphinstone and Heiser 2013)
These problems are significantly mitigated by the addition of Lime's reactions and the later developments of TOTAM's context-aware tuplespace programming. Generally speaking, tuplespace-based designs have moved from synchronous early approaches toward asynchronous operations, and this has had benefits for extending the interactions of a given component as well as extending the scope of a given conversation. External DBMS systems are generally neutral when it comes to programming APIs, but many popular client libraries offer only synchronous query facilities, no support for asynchronous operations, and only limited support for triggers.
(C12) External DBMS systems offer outstanding support for long-lived application state, making partial restarts and partial code upgrades a normal part of life with a DBMS application. Transactionality helps ensure that application restarts do not corrupt shared state. Tuplespaces in principle offer similarly good support.
Finally, the two models, viewed abstractly, suffer from a lack of proper integration with host languages. The original presentation of tuplespaces positions the idea as a complete, independent language design; in reality, tuplespaces tend to show up as libraries for existing languages. Databases are also almost always accessed via a library. As a result, developers must often follow design patterns to close the gap between the linguistic capabilities of the language and their programming needs. Worse, they also have to deploy several different coordination mechanisms, without support from their chosen language and without a clear way of resolving any incompatibilities.
3.5 The fact space model
The fact space model (Mostinckx et al. 2007) synthesizes rule-based systems and a rich tuplespace model with actor-based programming in a mobile, ad-hoc networking setting to yield a powerful form of context-aware programming. The initial implementation of the model, dubbed Crime, integrates a RETE-based rule engine (Forgy 1982) with the TOTAM tuplespace and a functional reactive programming (FRP) library (Elliott and Hudak 1997; Bainomugisha et al. 2013) atop AmbientTalk, an object-oriented actor language in the style of E (Van Cutsem et al. 2014). AmbientTalk is unusual among actor languages for its consideration of multicast communication and coordination. In its role as “language laboratory”, it has incorporated ideas from many other programming paradigms. AmbientTalk adds distributed service discovery, error handling, anycast and multicast to an actor-style core language intended for a mobile, ad-hoc network context; TOTAM supplements this with a distributed database, and the rule engine brings logic programming to the table.
In the words of Mostinckx et al.,
The Fact Space model is a coordination model which provides applications with a federated fact space: a distributed knowledge base containing logic facts which are implicitly made available for all devices within reach. [...] [T]he Fact Space model combines the notion of a federated fact space with a logic coordination language. (Mostinckx, Lombide Carreton and De Meuter 2008)
Tuples placed within the TOTAM tuplespace are interpreted as ground facts in the Prolog logic-programming sense. Insertions correspond to Prolog's assert; removals to retract. TOTAM's reactions, which, unlike Lime's, may be triggered on either insertion or removal of tuples, allow changes in the tuplespace to be connected to the inputs of the RETE-based rule engine, yielding forward-chaining logic programming driven by activity in the common space.
def userAgent(name, socket)
  whenever: [Present, ?who] read: { socket.println(who + " arrived") }
            outOfContext: { socket.println(who + " left") }
  whenever: [Message, ?who, ?what] read: { socket.println(who + ": " + what) }
  publish: [Present, name]
  previousLine = nil
  while (line = socket.readLine()) != nil
    if previousLine != nil: inp([Message, name, previousLine])
    publish: [Message, name, line]
    previousLine = line
Figure 9 sketches a pseudo-code user agent program. An actor running userAgent is created for each connecting user. As it starts up, it registers two reactions. The first (lines 2–3) reacts to appearance and disappearance of Present tuples. The second (line 4) reacts to each Message tuple appearing in the space. Line 5 places a Present tuple representing the current user in the tuplespace, where it will be detected by peers. Lines 6–10 enter a loop, waiting for user input and replacing the user's previous Message, if any, with a new one.
(C1; C2; C7) The TOTAM tuplespace offers multi-party communication, and the rule engine allows installation of pattern-based reactions to events, resulting in automatic demultiplexing and making for a natural connection from data flow to associated control flow. Where an interaction serves to open a conversational frame for a sub-conversation, additional reactions may be installed; however, there is no linguistic representation of such conversational frames, meaning that any logical association between conversations must be manually expressed and maintained.
(C3; C4) AmbientTalk's reactive context-aware collections (Mostinckx, Lombide Carreton and De Meuter 2008) allow automatic integration of conclusions drawn by the rule engine with collection objects such as sets and hash tables. Each collection object is manifested as a behavior in FRP terminology, meaning that changes to the collection can in turn trigger downstream reactions depending on the collection's value. However, achieving the effect of propagating changes in local variables as changes to tuples in the shared space is left to programmers.
(C5; C6; C8) The Fact Space model removes tuples upon component failure. Conclusions drawn from rules depending on removed facts are withdrawn in turn. Programs thereby enjoy logical consistency after partial failure. However, automatic retraction of tuples is performed only in cases of disconnection. When a running component is engaged in multiple conversations, and one of them comes to a close, there is no mechanism provided by which facts relating to the terminated conversation may be automatically cleaned up. Programmers manually delete obsolescent facts or turn to a strategy borrowed from the E language, namely creation of a separate actor for each sub-conversation. If they choose this option, however, the interactions among the resulting plethora of actors may increase overall system complexity.
def userAgent(name, socket)
  presentUsers = set()
  whenever: [Present, ?who, ?status]
    read: {
      if who not in presentUsers
        presentUsers.add(who)
        socket.println(who + " arrived")
      socket.println(who + " status: " + status)
    }
    outOfContext: {
      if rd([Present, who, ?anyStatus]) == nil
        presentUsers.remove(who)
        socket.println(who + " left")
    }
  ...
(C9) The ability to react to removal as well as insertion of tuples allows programs to match supply of some service to demand, by interpreting particular assertions as demand for some resource. This can, in principle, allow automatic resource management; however, this is only true if all allocation of and interaction with such resources is done via the tuplespace. For example, if the actor sketched in figure 9 were to crash, then absent explicit exception-handling code, the connected socket would leak, remaining open. Additionally, in situations where tuples may be interpreted simultaneously at a coarse-grained and fine-grained level, some care must be taken in interpreting tuple arrival and departure events. For example, imagine a slight enhancement of our example program, where we include a user-chosen status message in our Present tuples. In order to react both to appearance and disappearance of a user as well as a change in a user's status, we must interpret Present tuples as sketched in figure 10. There, Present tuples are aggregated by their who fields, ignoring their status fields, in addition to being interpreted in their entirety. The presentUsers collection serves as intermediate state for a kind of SELECT DISTINCT operation, indicating whether any Present tuples for a particular user exist at all in the tuplespace. In the retraction handler (lines 10–14) we explicitly check whether any Present tuples for the user concerned remain in the space, only updating presentUsers if none are left. This avoids incorrectly claiming that a user has left the chat room when they have merely altered their status message.
An alternative approach to the problem is to make use of a feature of Crime not yet described. The Crime implementation of the fact space model exposes the surface syntax of the included rule engine to the programmer, allowing logic program fragments to be written using a Prolog-like syntax and integrated with a main program written in AmbientTalk. This could allow a small program
UserPresent(?who) :- Present(?who,?status).
to augment the tuplespace with UserPresent tuples whenever any Present tuple for a given user exists at all. On the AmbientTalk side, programs would then react separately to appearance and disappearance of UserPresent and Present tuples.
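The effect of such a rule can be approximated in a few lines of Python (a toy re-implementation, not Crime's API: we simply recompute the derived UserPresent facts after every assertion or retraction and invoke callbacks on the difference):

```python
class FactSpace:
    """Toy stand-in for the rule  UserPresent(?who) :- Present(?who, ?status)."""
    def __init__(self, on_user_appears, on_user_disappears):
        self.facts = set()
        self.on_user_appears = on_user_appears
        self.on_user_disappears = on_user_disappears

    def _users_present(self):
        # A user is "present" iff any Present fact for them exists at all.
        return {who for (tag, who, *_) in self.facts if tag == "Present"}

    def assert_fact(self, fact):
        before = self._users_present()
        self.facts.add(fact)
        for who in self._users_present() - before:
            self.on_user_appears(who)

    def retract_fact(self, fact):
        before = self._users_present()
        self.facts.discard(fact)
        for who in before - self._users_present():
            self.on_user_disappears(who)
```

A status change then replaces one Present fact with another without ever disturbing the derived presence of the user, which is exactly the behavior the figure 10 workaround achieves by hand.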
(C10) Like tuplespaces generally, the fact space model has great potential for tool support and system state visualization. However, only those aspects of a program communicating via the underlying tuplespace benefit from its invariants. In the case of the Crime implementation based on AmbientTalk, only selected inter-component interactions travel via the tuplespace and rule engine, leaving other interactions out of reach of potential fact-space-based tools. Programmers must carefully combine reasoning based on the invariants of the fact space model with the properties of the other mechanisms available for programmer use, such as AmbientTalk's own inter-actor message delivery, service discovery and broadcast facilities.
(C11) Extending a conversation to new components and introducing an existing component to an additional conversation are both readily supported by the fact space model as implemented in Crime. However, because no automatic support for release of conversation-associated state exists (other than outright termination of an entire actor), programmers must carefully consider the interactions among individual components. When one of an actor's conversations comes to a close but other conversations remain active, the programmer must make sure to release local conversational state and remove associated shared tuples, but only when they are provably inaccessible to the remaining conversations.
(C12) Crime's AmbientTalk foundation is inspired by E, and can benefit directly from research done on persistence and object upgrade in E-like settings (Yoo et al. 2012; Miller, Van Cutsem and Tulloh 2013).
3.6 Surveying the landscape
K3bis Mechanism | Shared memory | Message-passing | Tuplespaces | Fact spaces | Ideal |
C1 Conversation group size | arbitrary | point-to-point | arbitrary | arbitrary | arbitrary |
C2 Correlation/demultiplexing | manual | semi-automatic | semi-automatic | semi-automatic | automatic |
C3 Integration of state change | automatic | manual | semi-automatic | automatic | automatic |
C4 Signaling of state change | manual | manual | manual | manual | automatic |
K4 Robustness | Shared memory | Message-passing | Tuplespaces | Fact spaces | Ideal |
C5 Maintain state integrity | manual | manual | manual | semi-automatic | automatic |
C6 Ensure replica consistency | trivial | manual | semi-automatic | semi-automatic | automatic |
C7 Data control flow | no | yes | yes | yes | yes |
C8 Control data flow | no | partial | no | coarse-grained | fine-grained |
C9 Resource management | manual | manual | manual | coarse-grained | fine-grained |
K6 Operability | Shared memory | Message-passing | Tuplespaces | Fact spaces | Ideal |
C10 Debuggability/visualizability | poor | wide range | potentially good | potentially good | good |
C11 Evolvability | poor | moderate | moderate | good | good |
C12 Durability | poor | good | moderate/good | good | good |
Figure 11 summarizes this chapter's analysis. Each of the first four columns in the table shows, from the programmer's point of view, the support they can expect from a programming language taking the corresponding approach to concurrency. Each row corresponds to one of the properties of concurrency models introduced in figure 3. A few terms used in the table require explanation. An entry of “manual” indicates that the programmer is offered no special support for the property. An entry of “semi-automatic” indicates that some form of support for the property is available, at least for specialized cases, but that general support is again left to the programmer. For example, channel-based languages can automatically demultiplex conversations, but only so long as channels correspond one-to-one to conversations, and the fact space model automatically preserves integrity of conversational state, but only where the end of an actor's participation in a conversation is marked by disconnection from the shared space. Finally, an entry of “automatic” indicates that an approach to concurrency offers strong, general support for the property. An example is the fact space model's ability to integrate changes in the shared space with local variables via its reactive context-aware collections.
While the first four columns address the properties of existing models of concurrency, the final column of the table identifies an “ideal” point in design space for us to aim towards in the design of new models.
(C1; C2; C3; C4) We would like a flexible communications mechanism accommodating many-to-many as well as one-to-one conversations. A component should be able to engage in multiple conversations, without having to jump through hoops to do so. Events should map to event handlers directly in terms of their domain-level meaning. Since conversations come with conversational frames, and conversational frames scope state and behavior, such frames and their interrelationships should be explicit in program code. As conversations proceed, the associated conversational state evolves. Changes to that state should automatically be integrated with local views on it, and changes in local state should be able to be straightforwardly shared with peers. Agents should be offered the opportunity to react to all kinds of state changes.
(C5; C6) We would like to automatically enforce application-level invariants regarding shared, conversational state. In case of partial failure, we should be able to identify and discard damaged portions of conversational state. Where replicas of a piece of conversational state exist, we would like to be able to reason about their mutual consistency. (C7; C8) Hewitt’s criterion that “control and data flow are inseparable” should hold as far as possible, both in terms of control flow being manifestly influenced by data flow and in terms of translation of control effects such as exceptions into visible changes in the common conversational context. (C9) Since conversations often involve associated resources, we would like to be able to connect allocation and release of resources with the lifecycles of conversational frames.
(C10; C11; C12) Given the complexity of concurrent programming, we would like the ability to build tools to gain insight into system state and to visualize both correct and abnormal behavior for debugging and development purposes. Modification of our programs should easily accommodate changes in the scope of a given conversation among components, as well as changes to the set of interactions a given component is engaged in. Finally, robustness involves tolerance of partial failure and partial restarts; where long-lived application state exists, support for code upgrades should also be offered.
II Theory
Overview
Syndicate is a design in two parts. The first part is called the dataspace model. This model offers a mechanism for communication and coordination within groups of concurrent components, plus a mechanism for organizing such groups and relating them to each other in hierarchical assemblies. The second part is called the facet model. This model introduces new language features to address the challenges of describing an actor's participation in multiple simultaneous conversations.
Chapter 4 fleshes out the informal description of the dataspace model of section 2.5 with a formal semantics. The semantics describes a hierarchical structure of components in the shape of a tree. Intermediate nodes in the tree are called dataspaces. From the perspective of the dataspace model, leaf nodes in the tree are modeled as (pure) event-transducer functions; their internal structure is abstracted away.
Chapter 5 describes the facet model part of the Syndicate design, addressing the internal structure of the leaf actors of the dataspace model. Several possible means of interfacing a programming language to a dataspace exist. The simplest approach is to directly encode the primitives of the model in the language of choice, but this forces the programmer to attend to much detail that can be handled automatically by a suitable set of linguistic constructs. The chapter proposes such constructs, augments a generic imperative language model with them, and gives a formal semantics for the result. Together, the dataspace and facet models form a complete design for extending a non-concurrent host language with concurrency.
4 Computational Model I: The Dataspace Model
This chapter describes the dataspace model using mathematical syntax and semantics, including theorems about the model's key properties. The goal of the model presented here is to articulate a language design idea. We wish to show how to construct a concurrent language from a generic base language via the addition of a fixed communication layer. The details of the base language are not important, and are thus largely abstracted away. We demand that the language be able to interpret dataspace events, encode dataspace actions, map between its internal data representations and the assertion values of the dataspace model, and confine its computational behavior to that expressible with a total mathematical function. We make the state of each actor programmed in the (extended) base language explicit, require that its behavior be specified as a state-transition function, and demand that it interact with its peers exclusively via the exchange of immutable messages—not by way of effects. This strict enforcement of message-passing discipline does not prevent us from using an imperative base language, as long as its effects do not leak. In other words, the base could be a purely functional language such as Haskell, a higher-order imperative language such as Racket, or an object-oriented language such as JavaScript.
The dataspace model began life under the moniker “Network Calculus” (NC) (Garnock-Jones, Tobin-Hochstadt and Felleisen 2014), a formal model of publish-subscribe networking incorporating elements of presence as such, rather than the more general state-replication system described in the follow-up paper (Garnock-Jones and Felleisen 2016) and refined in this dissertation. The presentation in this chapter draws heavily on that of the latter paper, amending it in certain areas to address issues that were not evident at the time.
4.1 Abstract dataspace model syntax and informal semantics
Figure 12 displays the syntax of dataspace model programs. Each program is an instruction to create a single actor: either a leaf actor or a dataspace actor. A leaf actor has the shape . Its initial assertions are described by the set , while its boot function embodies the first few computations the actor will perform. The boot function usually yields an record specifying a sequence of initial actions along with an existentially-quantified package . This latter specifies the type of the actor's private state, the initial private state value , and the actor's permanent event-transducing behavior function . Alternatively, the boot function may decide that the actor should immediately terminate, in which case it yields an record bearing a sequence of final actions for the short-lived actor to perform before it becomes permanently inert. A dataspace actor has the shape and creates a group of communicating actors sharing a new assertion store. Each in the sequence of programs contained in the definition of a dataspace actor will form one of the initial actors placed in the group as it starts its existence.
Each leaf actor behavior function consumes an event plus its actor's current state. The function computes either a record, namely a sequence of desired actions plus an updated state value, or an record carrying a sequence of desired final actions alone in case the actor decides to request its own termination. We require that such behavior functions be total. If the base language supports exceptions, any uncaught exceptions or similar must be translated into an explicit termination request. If this happens, we say that the actor has crashed, even though it returned a valid termination request in an orderly way.
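In a host language with exceptions, the totality requirement can be met by wrapping each behavior function so that an uncaught exception becomes an explicit termination request. Here is a sketch; the ('quit', ...) and ('transition', ...) tagging is our own invention, not the model's concrete syntax:

```python
def totalize(behavior):
    """Wrap a possibly-raising behavior function into a total one.

    The wrapped function never raises: an uncaught exception is translated
    into a termination request carrying no final actions. The actor has
    "crashed", yet returns a valid termination request in an orderly way.
    """
    def wrapped(event, state):
        try:
            return behavior(event, state)
        except Exception:
            return ("quit", [])   # hypothetical encoding of a final-actions record
    return wrapped

def fragile(event, state):
    # A behavior with a bug triggered by a particular event (actions elided).
    if event == "boom":
        raise RuntimeError("bug!")
    return ("transition", state + [event])
```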
In the λ-calculus, a program is usually a combination of an inert part—a function value—and an input value. In the dataspace model, delivering an event to an actor is analogous to such an application. However, the pure λ-calculus has no analogue of the actions produced by dataspace model actors.
A dataspace model actor may produce actions like those in the traditional actor model, namely sending messages and spawning new actors , but it may also produce state change notifications (SCNs) . These convey sets of assertions an actor wishes to publish to its containing dataspace.
As a dataspace interprets an SCN action, it updates its assertion store. It tracks every assertion made by each contained actor. It not only maps each actor to its current assertions, but each active assertion to the set of actors asserting it. The assertions of each actor, when combined with the assertions of its peers, form the overall set of assertions present in the dataspace.
When an actor issues an SCN action, the new assertion set completely replaces all previous assertions made by that actor. To retract an assertion, the actor issues a state change notification action lacking the assertion concerned. For example, imagine an actor whose most-recently-issued SCN action conveyed the assertion set . By issuing an SCN action , the actor would achieve the effect of retracting the assertion . Alternatively, issuing an SCN would augment the actor's assertion set in the assertion store with a new assertion . Finally, the SCN describes assertion of simultaneous with retraction of .
We take the liberty of using wildcard as a form of assertion set comprehension. For now, when we write expressions such as , we mean the set of all pairs having the atom on the left. In addition, we use three syntactic shorthands as constructors for commonly-used structures: , and are abbreviations for tuples of the atoms observe, outbound and inbound, respectively, with the value . Thus, means .
When an actor issues an assertion of shape , it expresses an interest in being informed of all assertions . In other words, an assertion acts as a subscription to . Similarly, specifies interest in being informed about assertions of shape , and so on. The dataspace sends a state change notification event to an actor each time the set of assertions matching the actor's interests changes.
An actor's subscriptions are assertions like any other. State change notifications thus give an actor control over its subscriptions as well as over any other information it wishes to make available to its peers or acquire from them.
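The role of observe assertions can be sketched as a matching function over the dataspace (hypothetical names; WILD plays the part of the wildcard, and nested tuples are matched structurally):

```python
WILD = object()   # stands in for the wildcard of the model

def match(pattern, value):
    """Structural match: WILD matches anything; tuples match pointwise."""
    if pattern is WILD:
        return True
    if isinstance(pattern, tuple) and isinstance(value, tuple):
        return len(pattern) == len(value) and all(map(match, pattern, value))
    return pattern == value

def visible_to(actor_assertions, dataspace_assertions):
    """The subset of the dataspace matching the actor's observe assertions."""
    interests = [a[1] for a in actor_assertions
                 if isinstance(a, tuple) and len(a) == 2 and a[0] == "observe"]
    return {v for v in dataspace_assertions
            if any(match(p, v) for p in interests)}
```

Because subscriptions are themselves assertions, an actor asserting ("observe", ("observe", WILD)) would, by the same mechanism, learn which of its peers are subscribed to what.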
Dataspace “ISWIM”.
The examples in this chapter use a mathematical notation to highlight the essential aspects of the coordination abilities of the dataspace model without dwelling on base language details. While the notation used is not a real language (if you see what I mean (Landin 1966)), it does have implemented counterparts in the prototypes of the dataspace model that incorporate Racket and JavaScript as base languages. These implementations were used to write programs which in turn helped build intuition and serve as a foundation for the full Syndicate design.
We use text to denote Dataspace ISWIM variables and to denote literal atoms and strings. In places where the model demands a sequence of values, for example the actions returned from a behavior function, our language supplies a single list value . We include list comprehensions because actors frequently need to construct, filter, and transform sequences of values. Similarly, we add syntax for sets , including set comprehensions , and for tuples , to represent the sets and tuples needed by the model.
We define functions using patterns over the language's values; for example, a leaf behavior function can be written as a series of clauses, each matching a particular shape of event and private state.
4.1 Suppose we wish to create an actor X with an interest in the price of milk. It might be written as an actor whose behavior function ignores every event and whose initial assertion set is {?⟨price, milk, ★⟩}.
If some peer Y previously asserted ⟨price, milk, p⟩ for some price p, this assertion is immediately delivered to X in a state change notification event. Infinite sets of interests thus act as query patterns over the shared dataspace.
Redundant assertions do not cause change notifications. If actor Z subsequently also asserts ⟨price, milk, p⟩, no notification is sent to X, since X has already been informed that ⟨price, milk, p⟩ has been asserted. However, if Z instead asserts ⟨price, milk, q⟩ for some q ≠ p, then a change notification is sent to X containing both asserted prices.
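The reason redundant assertions are invisible is that notifications are computed from the aggregate set of assertions, and a set ignores duplicates. A minimal Python sketch of this projection (not the actual prototype code; WILD and the store representation are our own simplifications):

```python
WILD = "*"  # stands in for the wildcard ★

def matches(pattern, value):
    """Structural matching: WILD matches anything; tuples match pointwise."""
    if pattern == WILD:
        return True
    if isinstance(pattern, tuple) and isinstance(value, tuple) \
            and len(pattern) == len(value):
        return all(matches(p, v) for p, v in zip(pattern, value))
    return pattern == value

def visible(store, interests):
    """The assertions (regardless of who made them) matched by the interests."""
    return {a for (_who, a) in store if any(matches(p, a) for p in interests)}

interests = {("price", "milk", WILD)}
s1 = {("Y", ("price", "milk", 1.17))}
s2 = s1 | {("Z", ("price", "milk", 1.17))}  # redundant re-assertion
s3 = s1 | {("Z", ("price", "milk", 1.25))}  # genuinely new fact

visible(s1, interests) == visible(s2, interests)  # True: no notification due
len(visible(s3, interests))                       # 2: both prices delivered
```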
Symmetrically, it is not until the last assertion of shape ⟨price, milk, p⟩ for some particular p is retracted from the dataspace that X is sent a notification about the lack of assertions of that shape.
When an actor crashes, all its assertions are automatically retracted. By implication, if no other actor is making the same assertions at the time, then peers interested in the crashing actor's assertions are sent a state change notification event informing them of the retraction(s).
4.2 For a different example, consider an actor representing a shared mutable reference cell. A new box (initially containing the value 0) is created by choosing a name id and launching an actor that asserts its current contents and updates that assertion in response to set-box messages.
To read the value of the box, clients either include an appropriate assertion of interest, ?⟨box-state, id, ★⟩, in their initially declared interests or issue it later.
The behavior of the box and client actors, when run together in a dataspace, is to repeatedly increment the number held in the box.
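The interplay can be simulated in a few lines. Below is a hypothetical Python sketch (the actual prototypes are in Racket and JavaScript): each behavior function takes an event and a private state and returns a new state plus a list of actions, with a hand-written loop standing in for the dataspace. The protocol names "box-state" and "set-box" follow the example:

```python
def box(event, value):
    kind, payload = event
    if kind == "message" and payload[0] == "set-box":
        new_value = payload[1]
        return new_value, [("assert", ("box-state", new_value))]
    return value, []

def client(event, limit):
    kind, payload = event
    if kind == "assertion" and payload[0] == "box-state" and payload[1] < limit:
        return limit, [("message", ("set-box", payload[1] + 1))]
    return limit, []

def run(limit):
    """Drive the two actors by hand, standing in for the dataspace."""
    trace = []
    pending = [("assertion", ("box-state", 0))]  # box's initial assertion
    value = 0
    while pending:
        _, actions = client(pending.pop(0), limit)
        for _, body in actions:
            value, box_actions = box(("message", body), value)
            trace.append(value)
            for kind, payload in box_actions:
                if kind == "assert":
                    pending.append(("assertion", payload))
    return trace

run(3)  # → [1, 2, 3]
```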
4.3 Our next example demonstrates demand matching. The need to measure demand for some service and allocate resources in response appears in different guises in a wide variety of concurrent systems. Here, we imagine a client, c, beginning a conversation with some service by adding an assertion such as ⟨hello, c⟩ to the shared dataspace. In response, the service should create a worker actor to talk to c.
The “listening” part of the service is an actor that subscribes to such assertions with ?⟨hello, ★⟩ and spawns a fresh worker for each new conversation it observes.
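The matcher's core logic is a comparison between observed demand and the workers already supplied. A hypothetical Python sketch, with illustrative names:

```python
def demand_matcher(demand, supplied):
    """Spawn one worker per newly demanded conversation; never duplicate."""
    new = demand - supplied
    return supplied | new, [("spawn-worker", c) for c in sorted(new)]

supplied, actions = demand_matcher({"c1", "c2"}, set())
# actions == [("spawn-worker", "c1"), ("spawn-worker", "c2")]
supplied, actions = demand_matcher({"c1", "c2"}, supplied)
# actions == []  (demand unchanged, so no duplicate workers)
```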
4.4 Our final example demonstrates an architectural pattern seen in operating systems, web browsers, and cloud computing. Figure 13 sketches the architecture of a program implementing a word processing application with multiple open documents, alongside other applications and a file server actor. The “Kernel” dataspace is at the bottom of this tree-like representation of containment.
The hierarchical nature of the dataspace model means that each dataspace has a containing dataspace in turn. Actors may interrogate and augment assertions held in containing dataspaces by prefixing assertions relating to the nth relative dataspace layer with n “outbound” markers ↓. Dataspaces relay ↓-labeled assertions outward. Some of these assertions may describe interest in assertions existing at an outer layer. Any assertions matching such interests are relayed back in by the dataspace, which prefixes them with an “inbound” marker ↑ to distinguish them from local assertions.
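The relaying in each direction is mechanical. A minimal Python sketch of crossing one dataspace boundary, under our own simplified representation of marked assertions:

```python
def outward(assertions):
    """Cross the boundary outward: keep only "outbound"-marked assertions,
    stripping one marker as they go."""
    return {a[1] for a in assertions
            if isinstance(a, tuple) and len(a) == 2 and a[0] == "outbound"}

def inward(parent_facts, relayed_interests):
    """Facts at the outer layer matching previously relayed interests are
    sent back in, wrapped with an "inbound" marker."""
    return {("inbound", f) for f in parent_facts if f in relayed_interests}

local = {("outbound", ("save", "novel.txt")), ("x", "local-only")}
outward(local)  # → {("save", "novel.txt")}: the local assertion stays put
```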
In this example, actors representing open documents communicate directly with each other via a local dataspace scoped to the word processor, but only indirectly with other actors in the system. When the actor for a document decides that it is time to save its content to the file system, it issues a message such as ↓⟨save, name, content⟩, whose outbound marker causes it to be relayed to the layer where the file server resides.
The file server responds to two protocols, one for writing files and one for reading file contents and broadcasting changes to files as they happen. These protocols are articulated as two subscriptions: ?⟨save, ★, ★⟩ and ??⟨file, ★, ★⟩.
The second subscription, ??⟨file, ★, ★⟩, indicates interest in subscriptions in the shared dataspace: an interest in interest in file contents. This is how the server learns that peers wish to be kept informed of the contents of files under its control. The file server is told each time some peer asserts interest in the contents of a file. In response, it asserts facts of the form ⟨file, name, content⟩.
Our examples illustrate the key properties of the dataspace model and their unique combination. Firstly, the box and demand-matcher examples show that conversations may naturally involve many parties, generalizing the actor model's point-to-point conversations. At the same time, the file server example shows that conversations are more precisely bounded than those of traditional actors. Each of its dataspaces crisply delimits its contained conversations, each of which may therefore use a task-appropriate language of discourse.
Secondly, all three examples demonstrate the shared-dataspace aspect of the model. Assertions made by one actor can influence other actors, but cannot directly alter or remove assertions made by others. The box's content is made visible through an assertion in the dataspace, and any actor that knows the box's name id can retrieve the assertion. The demand-matcher responds to changes in the dataspace that denote the existence of new conversations. The file server makes file contents available through assertions in the (outer) dataspace, in response to clients placing subscriptions in that dataspace.
Finally, the model places an upper bound on the lifetimes of entries in each shared space. Items may be asserted and retracted by actors at will in response to incoming events, but when an actor crashes, all of its assertions are automatically retracted. If the box actor were to crash during a computation, the assertion describing its content would be visibly withdrawn, and peers could take some compensating action. The demand matcher can be enhanced to monitor supply as well as demand and to take corrective action if some worker instance exits unexpectedly. The combination of this temporal bound on assertions with the model's state change notifications gives good failure-signaling and fault-tolerance properties, improving on those seen in Erlang (Armstrong 2003).
4.2 Formal semantics of the dataspace model
The semantics of the dataspace model is most easily understood via an abstract machine. Figure 14 shows the syntax of machine configurations, plus a metafunction that loads programs into starting machine states, and an inert behavior function that ignores all events and produces no actions.
The reduction relation operates on actor states, which are triples of a queue of events destined for the actor, the actor's behavior and internal state, and a queue of actions issued by the actor and destined for processing by its containing dataspace. An actor's behavior and state can take one of two forms. For a leaf actor, behavior and state are kept together in an existential package that hides the type of the actor's private state value. For a dataspace actor, behavior is determined by the reduction rules of the model, and its state is a configuration.
Dataspace configurations comprise three registers: a queue of actions to be performed, each labeled with an identifier denoting the origin of the action; the current contents of the assertion store; and a sequence of actors residing within the configuration. Each actor is assigned a local label ℓ, also called a location, scoped strictly to the configuration and meaningless outside. Labels must be locally unique within a given configuration. They are never made visible to leaf actors: labels are an internal matter, used solely as part of the behavior of dataspace actors. The identifier marking each queued action in the configuration is either the label of some contained actor or a special identifier denoting an action resulting from some external force, such as an event arriving from the configuration's containing configuration.
Reduction relation.
The reduction relation drives actors toward quiescent and even inert states. Figure 14 defines these syntactic classes, which are roughly analogous to values in the call-by-value λ-calculus. A state is quiescent when its sequence of actions is empty; it is inert when, besides being quiescent, it has no more events to process and cannot take any further internal reductions.
The reductions of the dataspace model are defined by the rules that follow. For convenient reference, the rules are also shown together in figure 15. Two rules deliver an event to a leaf actor and update its state based on the results; another delivers an event to a dataspace actor. A fourth collects actions produced by contained actors in a dataspace into a central queue, and three more interpret previously-gathered state-change, message, and spawn actions. Finally, a scheduling rule allows contained actors to take a step if they are not already inert.
4.5 Rule for leaf actor transitions. A leaf actor's behavior function, given an event e and private state value u, may yield an updated state u′ together with a queue of newly-produced actions. In this case, the actor's state is updated in place and the new actions are enqueued for processing.
4.6 Rule for leaf actor termination. Alternatively, a leaf actor's behavior function may yield an exit instruction in response to event e. In this case, the terminating actor is replaced with the inert behavior and its final few actions are enqueued.
4.7 Rule for dataspace event delivery. When an event arrives for a dataspace, it is labeled with the special external location and enqueued for subsequent interpretation.
4.8 Inbound event transformation. A metafunction transforms each such incoming event by prepending an “inbound” marker ↑ to each assertion contained in the event. This marks the assertions as pertaining to the next outermost dataspace, rather than to the local dataspace.
4.9 Rule for gathering actions. This rule reads from the queue of actions produced by a particular actor for interpretation by its dataspace. It marks each action with the label of the actor before enqueueing it in the dataspace's pending action queue for processing.
Now that we have considered event delivery and action production and collection, we may turn to action interpretation. The rules interpreting state change notifications and messages are central. Both depend on metafunctions, one (short for “broadcast”) transforming queued actions into pending events for local actors, and another producing events for the containing dataspace. Before we examine the supporting metafunctions, we will examine the two rules themselves.
4.10 Dataspace update. The assertions of a party labeled ℓ are replaced in a dataspace's contents R by an assertion set π using an update operator, written R[ℓ := π], which discards every assertion previously associated with ℓ and installs each assertion in π labeled with ℓ.
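The update operator can be modeled directly. A minimal Python sketch (our own representation: the store is a set of (label, assertion) pairs):

```python
def update(store, label, pi):
    """R[label := pi]: replace every assertion owned by `label` with pi."""
    return {(l, a) for (l, a) in store if l != label} | {(label, a) for a in pi}

store = {("1", "x"), ("1", "y"), ("2", "x")}
update(store, "1", {"z"})  # → {("1", "z"), ("2", "x")}
update(store, "1", set())  # the empty set retracts everything "1" asserted
```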
4.11 Rule for state change notifications. A queued state change notification action from party ℓ not only completely replaces the assertions associated with ℓ in the shared dataspace but also inserts a state change notification event into the event queues of interested local actors via the broadcast metafunction. Because ℓ may have made “outbound” assertions labeled with ↓, the rule also prepares a state change notification for the wider environment.
This is the only rule to update a dataspace's assertion store. In addition, because ℓ's assertion set is completely replaced, it is here that retraction of previously-asserted items takes effect.
4.12 Rule for messages. This rule interprets send-message actions. The broadcast metafunction is again used to deliver the message to interested peers, and the rule relays the message on to the containing dataspace if it happens to be labeled “outbound” with ↓.
4.13 Event broadcast. The broadcast metafunction computes the consequences for an actor labeled k of an action performed by another party labeled ℓ. When it deals with a state change notification action, the entire aggregate shared dataspace is projected according to the asserted interests of k. The results of the projection are assembled into a state change notification event, but are enqueued only if the event would convey new information to k. When the metafunction deals with a message action, a corresponding message event is enqueued for k only if k has previously asserted interest in the message body.
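The message half of the metafunction amounts to a routing question: which labels have asserted a matching interest? A hypothetical Python sketch, with WILD and the store representation our own simplifications:

```python
WILD = "*"

def matches(pattern, value):
    if pattern == WILD:
        return True
    if isinstance(pattern, tuple) and isinstance(value, tuple) \
            and len(pattern) == len(value):
        return all(matches(p, v) for p, v in zip(pattern, value))
    return pattern == value

def recipients(store, body):
    """Labels whose ("observe", pattern) assertions match the message body."""
    return {l for (l, a) in store
            if isinstance(a, tuple) and len(a) == 2 and a[0] == "observe"
            and matches(a[1], body)}

store = {("1", ("observe", ("ping", WILD))),
         ("2", ("observe", ("pong", WILD)))}
recipients(store, ("ping", 42))  # → {"1"}: only the interested actor hears it
```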
4.14 Outbound action transformation. A second metafunction is analogous to the broadcast metafunction, but determines the information to be relayed to a containing dataspace as a consequence of a local action.
4.15 Rule for spawning. This rule allocates a fresh label ℓ and places a newly-spawned actor into the collection of local actors, alongside its siblings. The new label is chosen to be distinct from every label already in use in the configuration; any deterministic allocation strategy will do. The new actor's initial state and initial assertions are computed from the actor specification.
The spawn rule takes care to ensure that a new actor's initial assertions are processed ahead of other queued actions, even though the new actor's initial actions will be placed at the end of the queue and processed in order as usual. This allows a spawning actor to atomically delegate responsibility to a new actor by issuing a state change notification immediately following the spawn action. Assertions indicating to the world that the spawning party has “taken responsibility” for some task may be placed in the new actor's initial assertion set and omitted from the subsequent state change notification. This eliminates any possibility of an intervening moment in which a peer might see a retraction of the assertions concerned. Furthermore, even if the new actor crashes during boot, there will be a guaranteed moment in time before its termination when its initial assertion set was visible to peers. Because the computation of the initial assertion set happens in the execution context of the spawning actor, an uncaught exception raised during that computation correctly blames the spawning actor for the failure. However, the computation of the initial actions is performed in the context of the spawned actor, and an exception at that moment correctly blames the spawned actor.
4.16 Rule for scheduling. Finally, this rule allows quiescent, non-inert contained actors to take a step. It rotates the sequence of actors as it does so.
4.3 Cross-layer communication
Actors label assertions and message bodies with ↓ to address them to the dataspace's own containing dataspace, but there is no corresponding means of addressing an assertion or message to a contained dataspace or actor. Actors may reach out, but not in. Because there is always a unique containing dataspace, reserving specific marks for referring to it, namely ↓ and ↑, is reasonable. These two reserved constructors bootstrap arbitrary cross-layer communication arrangements. Actors draw communications inward by reaching out. They establish subscriptions at outer layers which cause relevant messages and assertions to be relayed towards the inner requesting layer. In effect, they “pull” rather than having peers “push” information.
Directing communications to specific siblings requires a name for each actor, but actor IDs are, as a matter of principle, not made available to the programmer. In cases where “pushing” information inward is desired and useful, and where the resulting sensitive dependence on the topological structure of the overall configuration is acceptable, the dataspace model leaves the specific naming scheme up to the programmer, offering a mechanism (the ↓ and ↑ markers) but remaining neutral on policy.
4.4 Messages versus assertions
We have included message-sending actions as primitive operations. However, message transmission can be usefully viewed as a derived construct, a special case of assertion signaling. We may achieve substantially the same effect as sending a message m by asserting m, holding the assertion for “long enough” for it to register with interested peers, and then retracting m again. A message, then, can be imagined as a transient assertion.
There are two interesting corner cases to consider when thinking about messages in this way. The reduction rules as written have no trouble delivering messages of the form ?x, despite the effect that an assertion of ?x would have; and a message m will be delivered to interested recipients even if some neighboring actor is asserting the value m at the same time. In a variation on the dataspace model lacking primitive message-sending actions, neither situation works quite as expected.
First, consider the assertion-based analogue of the message ?x. The sender would briefly assert ?x before retracting it again. However, ?x asserts interest in x. For the duration of the assertion, it would have the effect of drawing matching assertions toward the sending actor. Primitive support for messages, by contrast, imagines that the “assertion” of the message lasts for an infinitesimal duration. This applies equally to “assertions” of messages that appear to denote interest in other assertions. By the time the events triggered by the message are to be delivered, it is as if the assertion of interest has already been retracted, so no events describing assertions make their way toward the sender.
Second, consider sending a message m when m is already being asserted by some other peer. The assertion-based analogue is to briefly assert m and then to retract it. However, redundant assertions do not cause perceptible changes in state: the net effect of the fleeting assertion of m is zero, and no events are delivered. Again, by incorporating messages primitively, we side-step the problem. Strictly speaking, the message rule should have a side condition forbidding its application (or perhaps making it a no-op) when m is already present in the assertion store. This would imply that sending certain messages at certain times would lead reduction to become stuck. Certain data would be reserved for use in message-sending; others, for more long-lived assertions and retractions. Were this to be elaborated into a type system, each dataspace would have a type representing its protocol. This type would classify values as either message-like or assertion-like. Judgments would connect with the type system of the base language to ensure that the classifications were respected by produced actions.
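This second corner case can be demonstrated concretely. A hypothetical Python sketch (an observer with an exact-match interest sees only changes to the visible set, so the transient assertion of m is invisible whenever a peer already asserts m):

```python
def seen_by_observer(contents, interest):
    return {a for a in contents if a == interest}  # exact-match interest

def transient_assert(contents, m, interest):
    """Model 'send m' as assert-then-retract; return notifications observed."""
    notifications = []
    seen = seen_by_observer(contents, interest)
    for snapshot in (contents | {m}, contents):  # assert m, then retract it
        now = seen_by_observer(snapshot, interest)
        if now != seen:
            notifications.append(now)
            seen = now
    return notifications

transient_assert({("ping",)}, ("ping",), ("ping",))  # → []: m already asserted
transient_assert(set(), ("ping",), ("ping",))        # → [{("ping",)}, set()]
```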
4.5 Properties
A handful of theorems capture invariants that support the design of and reasoning about effective protocols for dataspace model programs. Theorem 4.17 ensures that the dataspace does not get stuck, even though individual actors within the dataspace may behave unpredictably. Theorem 4.20 ensures deterministic reduction of the system. Theorem 4.23 assures programmers that the dataspace does not reorder an actor's actions or any of the resulting events. Theorem 4.35 makes a causal connection between the actions of an actor and the events it subsequently receives. It expresses the purpose of the dataspace: to keep actors informed of exactly the assertions and messages relevant to their interests as those interests change. Tests constructed in Redex (Felleisen, Findler and Flatt 2009) and proofs written for Coq (Coq development team 2004) confirm theorems 4.17 and 4.20.
4.17 Soundness. The machine never becomes stuck: every well-formed, non-inert actor state can take a step. Proof (sketch): we employ the Wright/Felleisen technique (Wright and Felleisen 1994) with the progress lemma below. The proof makes use of the fact that all leaf actor behavior functions are total.
4.18 Height. The height of a behavior measures the depth of dataspace nesting: a leaf behavior has height 0, while a dataspace behavior has height one greater than the maximum height of the actors it contains.
4.19 Progress. Every actor state whose behavior has bounded height is either inert or can take a reduction step.
Proof (sketch). By nested induction on the height bound and the structure of the state.
4.20 The reduction relation is structured to ensure that at most one rule applies in any situation. Either
- an event is queued for delivery, in which case it is consumed by one of the event-delivery rules; or
- some contained actor has pending actions, in which case they are gathered onto the dataspace's central queue; or
- an action is queued for interpretation, in which case it is interpreted by the state-change, message, or spawn rule; or
- no actions are queued and some contained actor is not inert, in which case that actor takes a step via the scheduling rule.
Observe that the cases are disjoint: the first demands an inert underlying configuration, but in the others the configuration is not inert; the second demands some non-quiescent actor; the third demands a queued action and only quiescent actors; the fourth demands no queued actions and only quiescent actors. Therefore, assume there exist distinct states Σ′ and Σ″ such that Σ → Σ′ and Σ → Σ″. We may then show a contradiction by nested induction on the two instances of the reduction relation and systematic elimination of possible sources of difference between Σ′ and Σ″.
4.21 Concurrency and determinism. Despite appearances, theorem 4.20 does not sacrifice concurrency; recall from chapter 2 the argument that sequential programs frequently include internal concurrency. Concurrency does not entail nondeterminism. Even with the deterministic reduction rules as written, many sources of unpredictability remain. For example, programs might interact with the outside world, including external clocks of various kinds, leading to fine variation in the timing of events; code written by one person might make use of “black box” library code written by another, without precisely-documented timing specifications; or fine details of the implementation of some component could change, leading to subtly different interleavings. Introducing nondeterminism by, say, varying the scheduling rule or relaxing some of the quiescence or inertness constraints in the other rules would merely introduce another source of unpredictability. The essential properties of the dataspace model survive such changes unharmed.
4.22 Dataspace reliability. While individual leaf actors may exit at any time, dataspace actors cannot terminate at all: no means of voluntary exit is provided, and theorem 4.17 assures us that a dataspace will not crash. In a correct implementation of the dataspace model, dataspace actors will likewise not crash. If the implementation is buggy enough that a dataspace does in fact crash, but not so buggy that it takes its containing dataspace down with it, the usual removal of an actor's assertions allows peers of the failed dataspace actor to observe the consequences of its termination. Abrupt failure of a dataspace is analogous to a crash of an entire computer: there is no opportunity for a clean shutdown of the programs the computer is running; instead, the entire computer simply vanishes offline from the perspective of its peers.
4.23 Order preservation. If an actor produces action A before action B, then A is interpreted by the dataspace before B. Events are enqueued atomically with interpretation of the action that causes them. If event C for a given actor is enqueued before event D for the same actor, then C is delivered before D.
Proof (sketch). The reduction rules consistently move items one at a time from the front of one queue to the back of another, and events are only enqueued during action interpretation.
Our final theorem (4.35) guarantees the programmer that each actor receives “the truth, the whole truth, and nothing but the truth” from the dataspace, according to the declared interests of the actor, keeping in mind that there may be updates to the actor's interest set pending in the pipeline. It ensures that the dataspace conveys every relevant assertion and only relevant assertions, and shows that the dataspace is being cooperative in the sense of Grice's Cooperative Principle and Conversational Maxims (section 2.1 and figure 1). The theorem directly addresses the maxims of Quantity, Quality, and Relation.
Before we are able to formally state the theorem, we must define several concepts.
4.24 Paths. A path is a possibly-empty sequence of locations. A path resolves to an actor state by a partial recursive lookup function that starts at the outermost state and follows each location in turn into the corresponding contained actor.
This definition makes it clear that locations in a path are ordered leftmost-outermost and rightmost-innermost with respect to a nested dataspace configuration. When the lookup is defined for a path, we say the path is in the state; otherwise, it is not.
4.25 Dataspace contents for a path. The contents of the shared dataspace immediately surrounding the actor denoted by a nonempty path are the assertion store of the dataspace reached by following all but the last location in the path.
4.26 Current interest set. The current interest set of the actor denoted by a nonempty path in a given state is the set of assertions c for which the actor has ?c recorded in its surrounding dataspace's store.
4.27 Syllabus. The syllabus of an actor with a nonempty path at a given state is the projection of its surrounding dataspace's contents according to its current interest set: the set of assertions both present and of interest to the actor.
4.28 Reduction sequences. We use the notation Σ⃗ to denote a finite sequence of states corresponding to a prefix of the sequence of reductions of a program. We write Σ⃗ᵢ to denote the ith element of the sequence. A sequence starts with the state obtained by loading the program. Subsequent states in the sequence are pairwise related by the reduction relation; that is, Σ⃗ᵢ → Σ⃗ᵢ₊₁. We say that a path is in Σ⃗ if it is in Σ⃗ᵢ for some i.
4.29 Enqueued events. We say an event e is enqueued for an actor at a given path in a reduction sequence at step i when the transition from the ith state to the (i+1)th places e in that actor's event queue.
4.30 Truthfulness. An assertion set is called truthful with respect to a dataspace whose contents are R if it contains only assertions actually present in R.
4.31 Relevance. An assertion set is called relevant to an actor in a dataspace if it contains only assertions of interest to that actor.
4.32 Soundness. An assertion set is called sound for an actor in a dataspace if it is both truthful with respect to the dataspace's contents and relevant to the actor.
4.33 Completeness. An assertion set is called complete for an actor in a dataspace if it contains every assertion both actually present in the dataspace and of interest to the actor.
4.34 Most recent SCN event. Let a path denote an actor, Σ⃗ be a reduction sequence, and i index a state in Σ⃗. The most recent SCN event enqueued for the actor as of step i is the latest state change notification event enqueued for it at or before that step, if any.
4.35 Conversational soundness and completeness. Let Σ⃗ be a reduction sequence. For every actor denoted by a nonempty path in Σ⃗, at every step i, the following four properties hold.
That is: (1) the dataspace's understanding of the interests of the actor (which shape its syllabus) is solely determined by the actor's own actions; (2) every time the syllabus of the actor changes, an SCN event is enqueued for it, and every SCN event enqueued for it results from a change in its syllabus; (3) every SCN event for the actor is sound and complete with respect to its interests and the contents of its surrounding dataspace; and (4) every message action of interest to the actor results in a message event for it, and no other message events are produced for it.
Proof (of theorem 4.35).
(1) By lemma 4.36, the state-change rule, and theorem 4.23.
(2) Forward direction: by lemma 4.40, the most recent SCN event for the actor agrees with its syllabus; the conclusion then follows from lemma 4.37. Reverse direction: we are given an SCN event enqueued at this step. By definition, it differs from the previously most recent SCN event; by lemma 4.42, an SCN event is enqueued only when it conveys new information. Combining these facts, the syllabus must have changed; now apply lemma 4.40 and we are done.
(3) By definition, in conjunction with our premises, the enqueued event coincides with the actor's syllabus; lemma 4.41 yields our result.
(4) Forward direction: the message rule is the only applicable rule; the conclusion follows from the definition of the broadcast metafunction for message routing. Reverse direction: likewise, because the message rule is the only rule that enqueues message events.
4.36 Direct from the facts that the state-change rule is the only rule that can update the containing dataspace's store on the actor's behalf, and that it replaces the actor's assertions wholesale.
4.37 Straightforward consequence of definition 4.34.
4.39 1. By lemma 4.37, an SCN event must be enqueued for the actor at this step; no rule other than the state-change rule enqueues SCN events. 2. The broadcast metafunction is the source of the new SCN event, which by definition equals the actor's syllabus. The first case of the metafunction must apply in order for an event to be enqueued at all; following the definitions and the metafunction's use in the state-change rule gives us our result.
4.40 By induction on the index i.
- Base case, i = 0. Vacuously true: in the initial state no events have yet been enqueued, so the most recent SCN event is undefined for every path.
- Inductive case. If an SCN event is enqueued for the actor at this step, the result is immediate by lemma 4.39. Otherwise, the most recent SCN event is unchanged; combining this with the induction hypothesis, we learn that it agrees with the syllabus at the previous step.
There are two cases to consider: either the dataspace containing the actor steps by the state-change rule, or some other kind of reduction takes place.
- If some party's state-change action is interpreted at this step, then the actor is an immediate child of the dataspace configuration before the step. Furthermore, it must also be an immediate child of the dataspace configuration after the step, because otherwise its path would no longer resolve. It follows that the second case of the broadcast metafunction must apply for the actor in this reduction step, so no event is enqueued for it and its syllabus is unchanged; the induction hypothesis completes the case.
- Otherwise, the contents of the dataspace containing the actor are unchanged, because no other kind of reduction step can affect that dataspace's store. The syllabus is therefore also unchanged, and the induction hypothesis gives our result.
4.41 Let a nonempty path denote an actor in a reduction sequence. Every SCN event enqueued for that actor is both sound and complete with respect to the actor's interests and the contents of its surrounding dataspace.
By lemma 4.40, the event's assertion set is the projection of the dataspace contents by the actor's current interest set. We must show:
- Soundness demands truthfulness and relevance. Both properties are immediate from the definition of the projection, which intersects the dataspace contents with the actor's interests.
- Completeness demands that every assertion both present and of interest be included; this is also immediate from the definition of the projection.
4.42 Only the state-change rule can enqueue an SCN event for an actor, and it does so only if the event would convey new information; that is, only if the newly projected assertion set differs from the most recently delivered one.
The “soundness” properties of theorem 4.35 forbid overapproximation of the interests of the actor; communicated assertions and messages must be genuinely relevant. However, when taken alone, they permit omission of information. The “completeness” properties ensure timely communication of all relevant assertions and messages, but taken alone permit inclusion of irrelevancies. It is only when both kinds of property are taken together that we obtain a practical result.
It is interesting to consider variations on the model that weaken these properties. A dataspace allowing inclusion of assertions not in (violation of truthfulness) would be harmful: it would violate Grice's maxims of Quality, and hence risk being branded uncooperative. Likewise, a dataspace omitting assertions in it knows to be of interest (violation of completeness) would also be harmful: this violates the first maxim of Quantity and the maxim of Relation. By contrast, allowing inclusion of assertions not in the interest set of a given actor (violation of relevance) would not be harmful, and may even be useful, even though strictly this overinformativeness would be a violation of the second maxim of Quantity. For example, it may be more convenient or more efficient for a dataspace to convey “all sizes are available” than the collection of separate facts “size 4 is available”, “size 6 is available” and “size 7 is available” to some actor expressing interest only in the specific sizes 4, 6 and 7. As another example, use of a narrow probabilistic overapproximation of an actor's interest (e.g. a Bloom filter (Bloom 1970)) could save significant memory and CPU resources in a dataspace implementation while placing only the modest burden of discarding irrelevant assertions on each individual actor.
All this is true only in situations where secrecy is not a concern. If it is important that actors be forbidden from learning the contents of certain assertions, then the relevance aspect of soundness suddenly becomes crucial. For example, consider a system using unguessable IDs as capabilities. Clearly, it would be wrong to send an actor spurious assertions mentioning capabilities that it does not legitimately hold. Secrecy is further discussed in section 11.3.
4.6 Incremental assertion-set maintenance
Taking section 4.2 literally implies that dataspaces convey entire sets of assertions back and forth every time some assertion changes. While wholesale transmission is a convenient illusion, it is intractable as an implementation strategy. Because the change in state from one moment to the next is usually small, actors and dataspaces transmit redundant information with each action and event. In short, the model needs an incremental semantics. Relatedly, while many actors find natural expression in terms of whole sets of assertions, some are best expressed in terms of reactions to changes in state. Supporting a change-oriented interface between leaf actors and their dataspaces simplifies the programmer's task in these cases.
Starting from the definitions of section 4.1, we replace assertion-set state-change notification events with patches. Patches allow incremental maintenance of the shared dataspace without materially changing the semantics in other respects. When extended to code in leaf actors, they permit incremental computation in response to changes. We will call the syntax and semantics already presented the monolithic dataspace model, and the altered syntax and semantics introduced in this section the incremental dataspace model.
The required changes to program syntax are small: where events and actions previously carried whole assertion sets, they now carry patches.
Patches denote changes in assertion sets. They are intended to be applied to some existing set of assertions. The notation is chosen to resemble a substitution, with elements to be added to the set written above the line and those to be removed below. We require that a patch's two sets be disjoint.
4.43 Patch rule. To match the exchange of patches for assertion sets, we replace the monolithic state-change reduction rule (definition 4.11 and figure 15) with a rule for applying patches.
The effect of the definition is to render harmless any attempt by an actor to add an assertion it has already added or to retract an assertion it has not asserted.
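This "render harmless" behavior can be sketched in a few lines of Python (our own function names; a patch is a pair of added and removed sets):

```python
def trim(patch, current):
    """Drop additions already present and retractions of absent assertions."""
    added, removed = patch
    return added - current, removed & current

def apply_patch(current, patch):
    added, removed = trim(patch, current)
    return (current | added) - removed

apply_patch({"a", "b"}, ({"c"}, {"b", "z"}))  # → {"a", "c"} ("z" was absent)
trim(({"a"}, {"q"}), {"a"})                   # → (set(), set()): a no-op patch
```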
4.44 Dataspace patching. The update operator, defined above for wholesale assertion-set updates (definition 4.10), is straightforwardly adapted to patches: the added assertions are inserted under the acting party's label, and the removed assertions are deleted from it.
4.46Outbound patch transformationIt is the metafunction that requires deep surgery. We must take care not only to correctly relabel assertions in the resulting patch but to signal only true changes to the aggregate set of assertions of the entire dataspace:
4.47Patch event broadcastThe metafunction , used in the rule, constructs a state change notification patch event tailored to the interests of actor . The notification describes the net change to the shared dataspace caused by actor 's patch action—as far as that change is relevant to the interests of .
The final changes adjust the and rules to produce patches instead of assertion set state change notifications in case of process termination and startup.
4.48 Incremental rule. The rule becomes
4.49 Incremental rule. The rule becomes
Equivalence between monolithic and incremental models.
Programs using the incremental protocol and semantics are not directly comparable to those using the monolithic semantics, since each variation uses a distinct language for communication between dataspaces and actors. However, any two assertion sets π and π′ can be equivalently represented by π together with a patch whose added set is π′ ∖ π and whose removed set is π ∖ π′, because these two sets are disjoint and π′ = (π ∪ (π′ ∖ π)) ∖ (π ∖ π′).
This idea suggests a technique for embedding an actor communicating via the monolithic protocol into a dataspace that uses the incremental protocol. Specifically, the actor integrates the series of incoming patches to obtain knowledge about the state of the world, and differentiates its outgoing assertion sets with respect to previous assertion sets.
Every monolithic leaf actor can be translated into an equivalent incremental actor by composing its behavior function with a wrapper that performs this on-the-fly integration and differentiation. The reduction rules ensure that, if every monolithic leaf actor in a program is translated into an incremental actor in this way, each underlying monolithic-protocol behavior function receives events and emits actions identical to those seen in the run of the unmodified program using the monolithic semantics.
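The wrapping technique can be sketched directly in Python (class, field, and method names are ours): the wrapper integrates incoming patches into a whole-set view for the inner behavior, and differentiates the inner behavior's whole-set output into an outgoing patch.

```python
class MonolithicWrapper:
    """Wrap a monolithic-protocol behavior (assertion set in, assertion
    set out) so it can live in an incremental-protocol dataspace (patch
    in, patch out). Illustrative sketch, not the dissertation's code."""
    def __init__(self, behavior):
        self.behavior = behavior      # maps a full assertion set to a full assertion set
        self.delivered = frozenset()  # most recently delivered assertion set
        self.produced = frozenset()   # most recently produced assertion set

    def on_patch(self, added, removed):
        # Integrate: reconstruct the whole-set notification from the patch.
        self.delivered = (self.delivered | added) - removed
        new_out = self.behavior(self.delivered)
        # Differentiate: emit only the change in the produced set.
        patch = (new_out - self.produced, self.produced - new_out)
        self.produced = new_out
        return patch

w = MonolithicWrapper(lambda seen: seen)   # a behavior mirroring what it observes
first = w.on_patch(frozenset({"a"}), frozenset())
```

The two registers correspond to the extra state the translation maintains alongside each leaf actor's private state.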
4.50 We write to denote the translation of a monolithic-protocol program into the incremental-protocol language using this wrapping technique, and use and subscripts for monolithic and incremental constructs generally.
The translation maintains additional state with each leaf actor in order to compute patches from assertion sets and vice versa and to expose information required for judging equivalence between the two kinds of machine state. Where a leaf actor has private state in an untranslated program, it has state in the translated program. The new registers and are the actor's most recently delivered and produced assertion sets, respectively.
4.51 We write to denote equivalence between monolithic and incremental actor states. To see what this means, let us imagine hierarchical configurations as trees like the one in figure 13. Each actor and each dataspace becomes a node, and each edge represents the pair of queues connecting an actor to its container. For a monolithic-protocol configuration to be equivalent to an incremental-protocol configuration, it must have the same tree shape and equivalent leaf actors with identical private states. Furthermore, at each internal monolithic node (i.e., at each dataspace), the assertion store must be identical to that in the corresponding incremental node. Finally, events and actions queued along a given edge on the monolithic side must have the same effects as those queued on the corresponding incremental edge.
The effects of monolithic and incremental action queues are the same when corresponding slots in the queues contain either identical message-send actions, spawn actions that result in equivalent actors, or state change notifications that have the same effect on the assertion store in the containing dataspace. Comparing event queues is similar, except that instead of requiring state change notifications to have identical effects on the shared dataspace, we require that they instead identically modify the perspective on the shared dataspace that the actor they are destined for has been accumulating.
4.52 If the conditions for establishing are satisfied, then reduction of proceeds in lockstep with reduction of the equivalent , and equivalence is preserved at each step.
Proof (Sketch). Conclusion 1 follows trivially from the definition of and the fact that the translation process does not alter an actor's initial assertion set. The bulk of the proof is devoted to establishing conclusion 2. We first define to mean augmentation of the monolithic program with the same additional registers as provided by . Second, we define an equivalence between -translated and untranslated monolithic machine states that ignores the extra registers, and prove that reduction respects . Third, we prove that and reduce in lockstep, and that an equivalence between translated monolithic and incremental states is preserved by reduction. Finally, we prove that the two notions of equivalence and together imply the desired equivalence . The full proof takes the form of a Coq script.
4.7 Programming with the incremental protocol
The incremental protocol occasionally simplifies programs for leaf actors. This applies not only to examples in Dataspace ISWIM, but also to large programs written for the Racket or JavaScript dataspace model implementations. Occasional simplification is not the only advantage of incrementality: the incremental protocol often improves the efficiency of programs. Theorem 4.52 allows programmers to choose on an actor-by-actor basis which protocol is most appropriate for a given task.
For example, the demand-matcher example (numbered 4.3 above) can be implemented in a locally-stateless manner using patch-based state change notifications. It is no longer forced to maintain a record of the most recent set of active conversations, and thus no set subtraction is required. Instead, it can rely upon the added and removed sets in patch events it receives from its dataspace. The revised behavior function takes as its actor-private state value, since each event it receives conveys all the information it needs:
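A rough executable rendering of this patch-based behavior, in Python with invented names, shows why the private state becomes trivial:

```python
def demand_matcher_behavior(patch_added, spawn_worker):
    """Stateless demand matcher: each patch event already isolates the
    newly-appeared demand assertions, so no record of active
    conversations and no set subtraction is needed (names are ours)."""
    for demand in patch_added:
        spawn_worker(demand)

spawned = []
demand_matcher_behavior(frozenset({("hello", "world")}), spawned.append)
```

Theorem 4.53 below is what justifies this: the dataspace never re-reports a demand that has already been signaled, so every element of the added set is genuinely new.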
More generally, theorem 4.53 can free actors written using the incremental protocol from maintaining sets of assertions they have “seen before”; they may rely on the dataspace to unambiguously signal (dis)appearance of assertions.
4.53 Concision. For all pairs of events and delivered to an actor, only if some event was delivered between and , where . Symmetrically, cannot be retracted twice without being asserted in the interim.
Proof (Sketch). The rule prunes patch actions against to ensure that only real changes are passed on in events. itself is then updated to incorporate the patch so that subsequent patches can be accurately pruned in turn.
4.8 Styles of interaction
                     | Short-lived observables | Long-lived observables
Short-lived interest | —                       | Query-like behavior
Long-lived interest  | Publish-subscribe       | State replication, streaming queries
The dataspace model offers a selection of different styles of interaction. In order for communication to occur at all, some actors must assert items of knowledge , and others must simultaneously assert interest in such knowledge, . (Here, we may treat message-sending as fleeting assertions of , as discussed in section 4.4.) Varying the lifetimes of assertions placed in the dataspace gives rise to patterns of information exchange reminiscent of publish/subscribe messaging, state replication, streaming queries, and instantaneous queries.
Figure 16 summarizes the situation. Of the four regions shown, only three yield interesting patterns of interaction: if both assertions of interest and assertions of knowledge are very short-lived, no communication can occur, because there is no moment at which the two kinds of assertion exist simultaneously.
When assertions of interest tend to be long-lived and assertions of the items of interest themselves tend to be brief in duration, a publish/subscribe pattern of interaction results. The assertions of interest can be thought of as subscriptions in this case. Publish/subscribe communication is naturally multi-party; however, point-to-point, channel-like messaging is readily available via a convention for assignment and use of channel names.
As the lifetimes of assertions representing knowledge increase, the pattern of interaction takes on a different character. It begins to resemble a streaming query style of knowledge transfer, where long-lived queries over a changing set of rows yield incrementally-maintained result sets. The resemblance is particularly strong when cast in terms of the incremental patch actions introduced in section 4.6. Seen from a different perspective, this pattern of interaction appears similar to state replication, where spatially-distinct replicas of a set of information are maintained by exchange of messages. The monolithic state change notifications first introduced in section 4.1 most clearly capture the intuition backing this perspective.
Finally, if we consider long-lived assertions of knowledge in combination with briefer and briefer assertions of interest in this knowledge, the style of interaction approaches that of clients making SELECT queries against a shared SQL database. Here, the assertions of interest can usefully be thought of as queries. An important consideration in this style of interaction is the length of time that each query is maintained.
There is no general answer to the question of how long an assertion of interest should be maintained in order to effectively act as a query over matching assertions. It varies from protocol to protocol. In some protocols, it is certain that the assertions of interest will be present at the moment the query is established, in which case an immediate retraction of interest is sound. In other protocols, queries must be held in place for some time to allow them to be detected and responded to. The specific duration depends on the mechanism by which such responses are to be produced: a local actor may be able to compute a result in one round-trip of control transfer, on demand; an actor communicating with a remote system over a network link may require queries to be held for a certain number of seconds.
An actor maintaining an assertion of interest for any non-trivial length of time at all runs the risk of the result set changing during the lifetime of the query. The longer the query is maintained, the more the style of interaction begins to resemble a streaming query and the less it has in common with SQL-style point-in-time queries of a snapshot of system state.
5 Computational Model II: Syndicate
With the dataspace model, we have a flexible facility for communicating changes in conversational state among a group of actors. We are able to express styles of interaction ranging from unicast, multicast and broadcast messaging through streaming queries and state replication to shared-database-like protocols. The model's emphasis on structured exchange of public aspects of component state allows us to express a wide range of effects including service presence, fate sharing, and demand matching. These effects in turn serve as mechanisms by which a range of resource-allocation, -management, and -release policies may be expressed.
The dataspace model brings actors together into a conversational group, but says nothing about the internal structure of each leaf actor. Such actors are not only stateful, but internally concurrent. Each leaf actor is frequently engaged in more than one simultaneous conversation. Ordinary programming languages offer no assistance to the programmer for managing intra-actor control and state, even when (like Dataspace ISWIM) extended with dataspace-model-specific data types and functions. However, to simply discard such languages would be a mistake: practicality demands interoperability. If we design a new language specifically for leaf actor programming, we forfeit the benefits of the enormous quantity of useful software written in already-existing languages. Instead, we seek tools for integrating the dataspace model not only with existing programs and libraries but with existing ways of thinking.
We will need new control structures reflecting the conversation-related concepts the dataspace model introduces. Programmers are asked to think in terms of asynchronous (nested sub-)conversations, but given ordinary sequential control flow. They are asked to maintain connections between actor-private state and published assertions in a shared space, but given ordinary variables and heaps. They are asked to respond to conversational implicatures expressing peers' needs, but offered no support for turning such demands into manageable units of programming. Conversely, they are asked to respond to signals indicating abandonment of a conversation by releasing local related resources, but given no means of precisely delimiting such resources. Finally, when a local control decision is made to end an interaction, programmers are left to manually ensure that this is communicated to affected peers.
The second part of the Syndicate design therefore builds on the dataspace model by proposing new language features to address these challenges. The new features are intended for incorporation into base languages used to express leaf actor behaviors. The central novelty is an explicit representation of a (sub-)conversation named a facet. Facets nest, forming a tree that mirrors the nested conversational structure of the actor's interactions. Each actor's private state is held in fields; each field is associated with a particular facet. Special declarations called endpoints allow the programmer to connect assertions in the dataspace with values held in local fields in a bidirectional manner. Endpoints describing interest in assertions—that is, endpoints that publish assertions of the form into the dataspace—offer a convenient syntactic location for the specification of responses to the appearance and disappearance of matching assertions.
Facets, fields, and endpoints together allow the programmer to write programs in terms of conversations, conversational state, and conversational interactions. They connect local to shared state. They offer a unit of resource management that can come and go with changes in expressed demand. Finally, because the connection between a facet and the surrounding dataspace is bidirectional, adding or removing a facet automatically adds or removes its endpoints' assertions, allowing peers to detect and respond to the change. In the extreme case of an actor crash, all its facets are removed, automatically (if abruptly) ending all of its conversations.
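The automatic-retraction property can be illustrated with a toy facet tree in Python (all names are ours): terminating a facet yields, in one patch, the retraction of every assertion contributed by the facet and its nested subfacets.

```python
class Facet:
    """Toy facet-tree node: a set of assertions contributed by this
    facet's endpoints, plus nested child facets (illustrative sketch)."""
    def __init__(self, assertions=frozenset(), children=()):
        self.assertions = frozenset(assertions)
        self.children = list(children)

    def subtree_assertions(self):
        out = set(self.assertions)
        for child in self.children:
            out |= child.subtree_assertions()
        return frozenset(out)

def stop_facet(facet):
    """Removing a facet automatically retracts its whole subtree's
    assertions: the patch adds nothing and removes everything below it."""
    return (frozenset(), facet.subtree_assertions())

inner = Facet({"answer(q, 42)"})
outer = Facet({"interest(q)"}, [inner])
result = stop_facet(outer)
```

An actor crash is simply the extreme case: every facet in the tree is stopped, so every conversation ends at once.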
Syndicate/λ.
Chapter 4 used an informal quasi-language, Dataspace ISWIM, to illustrate the formal system underpinning the dataspace model. Here, we take a slightly different tack, illustrating new language features by presenting them as part of an otherwise-minimal, mathematical, λ-calculus-inspired base language with just enough structure to act as a backdrop. We call this language Syndicate/λ, by analogy with the full prototype implementations Syndicate/rkt and Syndicate/js. In our formal presentation, we abstract away from concrete details of base value types and specific built-in operations; where needed for examples, we reuse the notation and concepts sketched for Dataspace ISWIM.
5.1 Abstract Syndicate/λ syntax and informal semantics
Figure 17 displays the syntax of Syndicate/λ. It is stratified into expressions and reactive, imperative programs . Expressions are both terminating and pure up to exceptions caused by partial primitive functions. Programs describe the interesting features of the language. While expressions yield values, programs are evaluated solely for their side effects.
The empty or inert program is written . A semicolon is used to denote a form of sequential composition, . The inert program is both a left and a right identity for this form of composition. In this chapter, we identify terms up to arbitrary composition with . This avoids spurious nondeterminism in reduction.
The usual λ-calculus syntax for application, , is only available to programs, because the language includes only procedure values instead of the function values familiar from the λ-calculus. Each in a procedure value is a branch of a pattern-match construct. When the procedure is called, the supplied argument is tested against each in left-to-right order, and the entire call reduces to the corresponding , substituted appropriately.
It is not only Syndicate/λ syntax that is stratified. Syndicate/λ bindings come in three flavors: immutable variables (“variables”), mutable fields (“fields”), and names for facets (“facet names”). The first two are introduced by the two forms of , and the third is introduced as an automatic consequence of creating a facet. Variables may include values containing procedures, but fields must not. While not strictly required, this restriction captures some of the spirit of programming in Syndicate; recall from section 2.6 the desire to eschew sharing of higher-order data. Field update, , naturally applies only to fields, not variables, and the value to be stored in the field must not directly or indirectly contain a procedure value.
The command emits a dataspace model action of the form , where is the result of evaluating . Similarly, the command spawns a sibling actor in the dataspace, and spawns a sibling dataspace initially containing a lone actor with behavior . Spawned programs may refer to arbitrary variables and fields of their spawning actor; at the moment of the spawn, the store is effectively duplicated, meaning that mutations to fields subsequently performed affect only the actor performing them.
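The store-duplication semantics of spawning can be sketched in a few lines of Python (function and field names are ours): the child receives a snapshot of the parent's fields, so later mutations on either side stay private.

```python
def spawn_actor(parent_fields, boot):
    """At the moment of the spawn, the store is effectively duplicated:
    the child gets a snapshot copy, not a shared reference, so field
    mutations affect only the actor performing them (sketch)."""
    child_fields = dict(parent_fields)  # snapshot, not sharing
    boot(child_fields)
    return child_fields

parent = {"counter": 0}
child = spawn_actor(parent, lambda fields: fields.update(counter=41))
```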
The final two syntactic forms create and destroy facets. The form specifies a facet template which is instantiated at the moment the form is interpreted. Once instantiated, the new facet's endpoints—the assertion endpoint and the event-handling endpoints —become active and contribute assertions to the aggregate of assertions published by the actor as a whole.
Each assertion endpoint is written using syntax chosen to connote set construction. The meaning of such an endpoint is exactly a set of assertions, the union of the sets denoted by the assertion templates embedded in the syntax of the assertion endpoint. Changing a field that is referred to by an assertion endpoint automatically changes the assertions published by that endpoint. In this way, Syndicate/λ programs are able to publish assertions that track changes in local state.
Similarly, event-handling endpoints contribute assertions of interest derived from the event pattern into the dataspace, as well as specifying a subprogram to run when any event relating to is delivered. Event patterns may select the appearance () or disappearance () of assertions matching some pattern, the arrival of a message () matching some pattern, or the synthetic events and which relate to facet lifecycle. Patterns that contain binders capture portions of assertions in matching events, making available in subprograms . As with assertion endpoints, every pattern automatically tracks changes in fields it refers to.
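The bidirectional connection between fields and published assertions can be sketched in Python (the `("observe", ...)` rendering of an assertion of interest and all other names are our invention): recomputing an endpoint's assertion set after a field update yields exactly the patch the semantics emits on the program's behalf.

```python
def endpoint_assertions(fields):
    """The assertions an example facet contributes: one assertion derived
    from a field, plus an assertion of interest derived from an
    event-handling endpoint's pattern (illustrative sketch)."""
    return frozenset({("temperature", fields["temp"]), ("observe", "set-temp")})

def set_field(fields, name, value, published):
    """Updating a field recomputes the endpoint assertions and returns
    the patch describing the difference from what was published before."""
    fields[name] = value
    new = endpoint_assertions(fields)
    return new, (new - published, published - new)

fields = {"temp": 20}
published = endpoint_assertions(fields)
published, patch = set_field(fields, "temp", 25, published)
```

The assertion of interest is unaffected by the field change, so only the field-derived assertion appears in the patch.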
The form , only legal when surrounded by a facet named , causes that facet—and all its nested subfacets—to terminate cleanly, executing any event handlers they might have. Once a terminating facet becomes inert, after its handlers have completed their tasks, its assertions are removed from the shared dataspace and the facet itself is then deleted. The program in is then scheduled to execute alongside the terminating facet, so that any facets that creates will exist in the actor's facet tree as siblings of the just-stopped facet .
Despite being layered atop the dataspace model, the events and actions of that model are not directly exposed to the Syndicate/λ programmer the way that they are in Dataspace ISWIM. Instead of yielding values describing actions to perform in a functional style, programs perform side-effecting operations like and . Instead of functional state transduction, programs imperatively update fields. Instead of describing changes to published assertion sets, programs create facets with embedded endpoints. Finally, instead of manually directing control flow by analyzing and interpreting received events, programs declare event-handling endpoints, which are activated as appropriate.
5.1 For our first example, let us revisit the shared mutable reference cell actors of example 4.2. First, we spawn an actor implementing the cell itself:
This actor first creates a new field , initialized to zero. It then creates a single facet named , which has an assertion endpoint that places the assertion into the shared dataspace. The semantics of Syndicate/λ automatically update this published assertion as the value of changes in response to subsequent events. The facet also has a single event-handling endpoint. In response to an incoming message, the endpoint updates the field to contain the new value specified in the received message.
The client actor from example 4.2 can be written as follows:
This actor is stateless, having no fields. It creates a single facet, , which makes no assertions but contains a single event-handling endpoint which responds to patch events. If such a patch event describes the appearance of an assertion matching the pattern , the endpoint sends a message via the dataspace. (We imagine here that includes functions for arithmetic and assume a convenient infix syntax.) Of course, the actor responds to such messages by updating its assertion, which triggers again. This cycle repeats ad infinitum.
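The cell-and-client cycle can be simulated in plain Python (names and protocol shapes are illustrative): the cell's field update republishes the assertion, which re-triggers the client, and so on. We cut the otherwise-infinite loop off after three turns.

```python
# The cell's field and message handler, and the client's reaction to each
# updated (value v) assertion; an illustrative rendering, not the
# dissertation's Syndicate/λ code.
cell_value = 0

def cell_on_set_value(v):
    global cell_value
    cell_value = v   # the published (value v) assertion tracks the field

def client_on_value_asserted(v):
    return v + 1     # the client answers each observation with (set-value v+1)

# Three turns of the otherwise-endless cycle:
for _ in range(3):
    cell_on_set_value(client_on_value_asserted(cell_value))
```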
5.2 Next, we translate the demand-matcher from example 4.3 to Syndicate/λ:
The single event-handling endpoint in facet responds to each asserted tuple by spawning a new actor, which begins its existence by calling the procedure , passing it the from the assertion that led to its creation. In turn, creates a facet which monitors retraction of in addition to performing whichever startup actions a worker should perform. When the last peer to assert retracts its assertion, the worker terminates itself by performing a command on its top-level facet, supplying to replace it.
The concision of Syndicate/λ has allowed us to show how a worker terminates itself once demand for its existence disappears. The Dataspace ISWIM version of example 4.3 omits this functionality: it is possible but verbose to express in Dataspace ISWIM.
5.2 Formal semantics of Syndicate/λ
Figure 18 introduces syntax for describing evaluation states of trees of facets, as well as a specification of what it means for such a tree to be inert, a definition of evaluation contexts (), field stores, and reducible and inert machine states ( and ).
A tree of facets may include unreduced commands drawn from . Reduction interprets these commands, applying any side effects they entail to the machine state. A tree may also include an exception marker, , which arises as a result of various run-time error conditions and leads to abrupt actor termination. The composition operator on facet trees loses much of the flavor of sequentiality that it enjoys in programs, and acts instead primarily to separate (and order) adjacent sibling facets in the tree. However, evaluation contexts prefer redexes in the left-hand side of a composition to those in the right-hand side, thus preserving the intuitive ordering of effects.
The form describes an instantiated, running facet, with active endpoints. It serves as an interior node in a facet tree. Any facets contained in are considered nested children of . If is later stopped, all facets in are stopped as well.
The final two syntactic forms describing facet trees relate to shutdown of facets. First, describes a facet that is marked as terminating. The facet cannot be deleted until has reached inertness, but it will no longer react to incoming events, as can be seen from the lack of event handlers associated with each . Second, marks a contour within the tree. Contained facets and subfacets of will transition to terminating state as soon as they become inert. An explicit contour is necessary because a facet may create a sibling or child facet as a response to being terminated, and such “hail mary” facets must not be allowed to escape termination.
The reduction relation operates on a machine state containing five registers:
- is the store, mapping field identifiers in to field values in . Higher-order values such as procedures may not be placed in the store.
- is the actor's record of the assertions it has learned from the dataspace. As patch events arrive from the dataspace, is updated.
- is the actor's record of the assertions it has placed into the dataspace. As fields are updated and facets are created and destroyed, the actor issues patch actions and updates to account for the changes.
- is an accumulator of dataspace model actions produced. As messages are sent, actors are spawned, and changes are made to published assertions, actions are appended to this register.
- is the tree of facets, the actor's behavior and control state. Reduction drives this tree of facets toward inertness.
Evaluation of expressions and patterns.
The semantics of Syndicate/λ depends on evaluation of expressions in a number of places. Evaluation of expressions is straightforward, since no function or procedure calls (other than to primitives) are allowed. In addition, because Syndicate/λ patterns include calls to primitive functions and references to field values, the semantics requires a means of “evaluating” a pattern.
5.3 Evaluation of expressions. The partial metafunction evaluates an to a , resolving field references using a .
5.4 Primitive functions. The partial metafunction interprets applications of primitive functions , and is to as is to . We do not specify a fixed here, and so escape the need to fix in any detail.
5.5 “Evaluation” of patterns. The metafunction “evaluates” a pattern by computing the results of any embedded calls to primitive operations or references to field values from the store. This “evaluation” process may fail with an exception; however, if it succeeds, the resulting pattern does not include any primitive operations or field references, and therefore is guaranteed not to signal an exception when used.
The active assertion set.
As facets come and go and fields change their values, the set of assertions to be placed into the surrounding dataspace by a Syndicate/λ actor changes. The set must be tracked and, as it changes, corresponding patch actions must be computed and emitted.
5.6 The metafunction, which extracts the current set of assertions from a tree of facets, is defined in figure 19. It is a pedestrian structural traversal of syntax except when processing an event pattern . In that case, it specially adds the assertion-of-interest constructor to each assertion arising from the pattern inside .
5.7 In situations where an actor's assertion set may have changed, the metafunction is used to compute an updated register as well as a patch to be appended to the pending action accumulator.
5.8 The metafunction takes care of combining a patch action (often resulting from ) with an existing action queue. Any adjacent enqueued patch actions are coalesced using a patch composition operator. By contrast, no such coalescing is desired (or possible) when enqueueing message or actor-creation actions.
5.9 Patch composition. The patch composition operator is defined as follows:
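A standard formulation of the operator can be sketched in Python, with patches as (added, removed) pairs of sets; this is our rendering, and the notation in the text may differ. The defining property is that applying the composed patch equals applying the two patches in sequence.

```python
def compose(p2, p1):
    """Compose two patches: applying compose(p2, p1) to a set is
    equivalent to applying p1 and then p2. Patches are (added, removed)
    pairs of disjoint sets (illustrative formulation)."""
    a1, r1 = p1
    a2, r2 = p2
    return ((a1 - r2) | a2, (r1 - a2) | r2)

def apply_patch(s, p):
    return (s | p[0]) - p[1]

p1 = (frozenset({"y"}), frozenset({"x"}))
p2 = (frozenset({"x"}), frozenset({"y"}))
s = frozenset({"x"})
composed = apply_patch(s, compose(p2, p1))
sequential = apply_patch(apply_patch(s, p1), p2)
```

This is exactly the property the action-queue coalescing of definition 5.8 relies on: merging two adjacent patch actions must not change what peers observe.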
Pattern matching.
The Syndicate/λ semantics also makes use of pattern matching in a number of places. Occasionally, a suite of patterns with matching continuations must be searched for the first match for some value; at other times, matching of a single pattern with a single value is required.
5.10 The metafunction searches a collection of branches, often extracted from a procedure value, to find the first that matches the argument value given. If none of the branches match, an exception is signaled.
5.11 The partial metafunction is defined when the given matches the given , and is otherwise undefined. The result of is a program fragment that, when interpreted, uses to bind pattern variables before continuing with the given to .
Reduction relation.
The reduction relation is defined by fourteen rules, shown in full in figures 20 and 21. The rule implements procedure call, and rule allows introduction of immutable variables. The and rules manipulate fields, while rules , and produce actions for interpretation by an actor's surrounding dataspace. The remainder of the rules relate to facet startup and shutdown: instantiates a facet, while the two rules, three rules, and rule combine to handle the process of facet termination.
5.12 Rule. The rule interprets procedure calls :
5.13 Rule. The first kind of construct allows programmers to give names to values drawn from . Machine states do not include an environment, and so our presentation makes use of hygienic substitution to replace references to a bound variable with its -computed value while respecting the notion of free names captured by the metafunction (figure 22).
5.14 Rule. The second kind of construct creates a new field, allocating a fresh name for the field and substituting for references to the field in the body of the . The store in the machine state is updated with the initial value of the field, which is constrained to be drawn from .
5.15 Rule. In rule , we see the first production of an action for transmission to the surrounding dataspace. Updating a field affects any assertions depending on the field, and a patch action must be issued to communicate any changed assertions to the actor's peers.
5.16 Rule. The rule is entirely straightforward:
5.17 Rules. The and rules are also uncomplicated, but depend on the metafunction, which we will not discuss until section 5.4.
The remaining reduction rules (figure 21) all relate to various stages of a facet's lifecycle.
5.18 Rule. The rule interprets a facet template , renaming it, transforming it to an interior node in the facet tree and delivering two synthetic events to it.
5.19 Rules. Rules and handle explicit facet termination requests:
5.20 Rules. Termination boundaries are moved leafward through a facet tree by rules , , and .
5.21 Rule. The final tear-down of a terminated facet does not take place until all of its children have not only become inert but have actually reduced to a literal . The rule takes care of this case. It is here that we finally see a patch action issued to remove the assertions of the terminating facet from the actor's aggregate assertion set.
5.3 Interpretation of events
Several of the reduction rules appeal to a metafunction to compute the consequences of a reaction to an event by a collection of event-handling endpoints. As we will see in section 5.4, the same metafunction is used to distribute events arriving from the containing dataspace among the facets in an actor's facet tree.
5.22 The function itself is straightforward:
- if and , then ; otherwise,
- if and , then ; otherwise,
- if and , then ; otherwise,
- if and , then ; otherwise,
- if and , then ; otherwise,
- .
The sequence of event-handling endpoints becomes a composition of programs. Each endpoint becomes if the given event does not apply. Patch events apply to and endpoints; message events to endpoints; and and events to and endpoints. The interesting cases are message delivery and patch handling. Message delivery delegates to with the event-handler's pattern and continuation augmented with a catch-all clause to handle the case where the incoming message does not match . Patch processing delegates to the metafunction .
5.23 The metafunction extracts a finite sequence of assertions matching pattern from an assertion set carried in a patch. Each relevant assertion should generate one instance of the event handler program . It is clearly an error to attempt to iterate over an infinite set; therefore, yields an exception in cases where the assertion set being projected contains an infinite number of individual assertions that happen to match the pattern .
The first step in 's operation is to filter the set using metafunction , retaining only those assertions that match .
5.24 The partial function is similar to (definition 5.11), in that it is defined only where the structure of the pattern matches the assertion; however, it differs in that its result includes detail only where it is relevant to the supplied pattern.
Returning to the operation of , the next step after filtering and partial transformation of the input set is to take each drawn from the set of results and use the arguments and given to to decide whether is novel or not.
The set denotes the set of known assertions just prior to the arrival of the event that is processing. The set denotes the result of updating with the contents of the arriving event. That is, is “what the actor knew before”, and is “what the actor knows now.”
If a particular corresponds to some assertion in , but not to any in , or conversely corresponds to some assertion in but none in , then the actor has learned something new, and the handler program should be instantiated for this . However, if corresponds to some assertion in both or neither of and , then nothing relevant has changed for the actor, and should not be instantiated.
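The novelty determination can be sketched in Python. This is an informal model only: the function names, the tuple encoding of assertions, and the use of `None` as a wildcard position are inventions of this sketch, not the dissertation's notation, and projection here is restricted to finite sets.

```python
def project(assertions, pattern):
    """Keep only assertions matching `pattern`. A None in the pattern is a
    wildcard position (a convention invented for this sketch)."""
    def matches(a, p):
        if p is None:
            return True
        if isinstance(p, tuple) and isinstance(a, tuple) and len(p) == len(a):
            return all(matches(x, y) for x, y in zip(a, p))
        return a == p
    return {a for a in assertions if matches(a, pattern)}

def novel_events(pi_old, pi_new, pattern):
    """Return (appeared, disappeared): the projected assertions for which an
    'asserted' or 'retracted' handler, respectively, should be instantiated.
    Assertions present in both sets -- or absent from both -- trigger nothing."""
    old = project(pi_old, pattern)
    new = project(pi_new, pattern)
    return new - old, old - new
```

An assertion matching the pattern in both `pi_old` and `pi_new` lands in neither result set, capturing the rule that the handler runs only when the actor's knowledge actually changes.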
5.25 Consider the presence-management portion of a chat service with multiple rooms. Assertions denote presence of the named user in the named room. Rooms are said to “exist” only when inhabited by at least one user. Users joining the system are presented with a list of currently-extant rooms to choose from. A program for calculating this list might be written (assuming suitable data structures and primitive operations for sets):
Imagine now that two users, Alice and Bob, arrive and join the room Lobby simultaneously. This results in delivery of a patch event where to our list-management actor. Ultimately, a call to takes place:
5.26 Imagine now that Alice leaves the room, while Bob stays on. This results in a patch event where . At the time of the event, the total knowledge of the actor is . Updating with the patch yields . This time, the endpoint has nothing to do, but the endpoint triggers:
5.27 Finally, Bob leaves the room. The patch event is again but with this time. At the time of the event, , and so . The endpoint triggers again, as before; and, as before, leaves us a single value for , namely . This time, however, because is empty, and so is instantiated with , and the actor removes Lobby from .
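The whole Alice/Bob scenario can be traced with a small Python model. The encoding is this sketch's own (presence assertions as tuples, patches as added/removed sets); it is not the dissertation's notation, but it follows the same old-versus-new novelty discipline.

```python
rooms = set()   # the actor's set-valued field: currently-extant rooms
known = set()   # assertions the actor currently knows

def handle_patch(added, removed):
    """Model of the room-list actor's reaction to a presence patch."""
    global known
    old, new = known, (known | added) - removed
    # 'asserted' endpoint: any newly-learned presence makes its room extant.
    for (_, user, room) in new - old:
        rooms.add(room)
    # 'retracted' endpoint: a room vanishes only with its last occupant.
    for (_, user, room) in old - new:
        if not any(r == room for (_, _, r) in new):
            rooms.discard(room)
    known = new

# Alice and Bob join Lobby simultaneously: one patch, two assertions.
handle_patch({("present", "Alice", "Lobby"), ("present", "Bob", "Lobby")}, set())
assert rooms == {"Lobby"}
# Alice leaves; Bob remains, so the retracted handler keeps Lobby.
handle_patch(set(), {("present", "Alice", "Lobby")})
assert rooms == {"Lobby"}
# Bob leaves; Lobby is now uninhabited and is removed.
handle_patch(set(), {("present", "Bob", "Lobby")})
assert rooms == set()
```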
5.4 Interfacing Syndicate/λ to the dataspace model
Thus far, we have discussed the internal operation of Syndicate/λ actors, but have not addressed the question of their interface to the wider world. The path to an answer begins with the way Syndicate/λ constructs actions. To start an actor with store and code , Syndicate/λ issues the dataspace model action . This term appears in rules and , as well.
5.28 The function produces a boot function of type (figure 12) which in turn describes the behavior function and initial state of a new actor. Every Syndicate/λ actor has a behavior function and a state value drawn from set (figure 18).
5.29 The operator incorporates changes described by an incoming event to a previous record of the contents of the surrounding dataspace. When given a patch event, it updates the assertion set. By contrast, a message event is treated as an infinitesimally-brief assertion of its carried value, as discussed in section 4.4, and the assertion set remains unchanged.
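This update operator is small enough to model directly. In the Python sketch below, the event encoding (a tagged pair) is an invention of this sketch; what it preserves is the asymmetry described above: patches change the recorded assertion set, messages never do.

```python
def integrate(known, event):
    """Fold an incoming event into the actor's record of the dataspace."""
    kind, payload = event
    if kind == "patch":
        added, removed = payload
        return (known | added) - removed
    if kind == "message":
        # A message is an infinitesimally-brief assertion: handlers may react
        # to it, but the recorded assertion set is left unchanged.
        return known
    raise ValueError(f"unknown event kind: {kind}")
```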
5.30 The function traverses a facet tree, using to deliver an incoming event to the event-handler endpoints of every running facet.
5.31 The behavior function integrates an event arriving from the dataspace with the machine state held in the actor's private state value, reduces the result, and returns. If the actor terminates all its facets or if reduction yields an exception, instructs the dataspace to terminate the actor.
5.31 Syndicate/λ is an untyped language and can express nontermination:
5.5 Well-formedness and errors
Reduction of Syndicate/λ programs can stop for many reasons. First of all, as in practically all interesting uses of λ-calculus-like machinery, certain primitive operations may be partial functions. The classic example is arithmetic division, undefined at a zero denominator. This partiality manifests via and yielding no answer. In turn, this affects most of the other core metafunctions as well as the lion's share of the reduction rules.
More interesting are type errors. Certain errors, such as attempts to call a non-procedure or to invoke an arithmetic primitive with a non-numeric value, may be prevented by developing a conventional type system (Pierce 2002). Standard techniques also exist for enforcing exhaustive pattern-matching in procedures. Other errors are peculiar to Syndicate/λ. Figures 23 and 24 sketch a “well-formedness” judgment intended to catch three kinds of scope error: reference to an unbound variable, field, or facet; update to a name that is non-existent or not a field; and inappropriate use of a facet name in a command. For an example of the last of these, consider the two programs
Going beyond simple scope errors, Syndicate/λ programs can fail in two important ways relating to the assertions they exchange with peers via the shared dataspace. First, programs may make simple data-type errors in their assertions and subscriptions. For example, a particular protocol may require that peers interact by asserting and expressing interest in tuples , where and . It is an error, then, for a program to assert , to misspell , or to assert a tuple such as . Second, the metafunction signals an exception when the set of relevant matches to a given pattern is infinite. Consider the following program, which computes and asserts squares in response to detected interest:
In order to use our well-formedness judgment to work towards a statement of overall soundness, we need to account for the way an actor transmits actions to its environment and receives events in reply. Figure 25 illustrates the alternation between the reduction relation explored in this chapter (upper row) and the exchange of information with an actor's surrounding dataspace, as explored in chapter 4 (lower row). Setting aside cases where an actor exits because of a signaled exception or termination of all of its facets, we can abbreviate the excursions to the lower row of the figure as a pseudo-reduction-rule. From the actor's perspective, it is as if an oracle supplies a fresh (relevant) event at just the right moment, as the actor achieves an inert configuration:
5.32 Pseudo-rule
5.33 Extended reduction relation. We will write to mean the reduction relation extended with the pseudo-rule.
That is, at every step in a reduction chain, one of three conditions holds. First, the facet tree may be inert, in which case the actor terminates () or yields to its dataspace (). Second, the facet tree may have a signaled exception as its selected redex, in which case it is terminated abruptly. Third, the tree may be neither inert nor in an exception state, in which case there is always another non- reduction step that may be taken.
Examination of the reduction rules and metafunctions shows that is signaled in two situations: when use of a primitive function yields no result, either due to intrinsic partiality or a type error, and when encounters an infinite set of matching assertions. The well-formedness judgment rules out stuckness from misuse of names. If a program makes a simple data-type error in the content of an assertion, a number of consequences may unfold: the actor may simply sit inert forever, having failed to solicit events from peers; the actor may later receive events containing information it is not prepared to handle, resulting in an exception; or the actor may unintentionally trigger crashes in its peers, having supplied them with incoherent information.
5.6 Atomicity and isolation
With fields, we have introduced mutable state, opening the door to potential unpredictability. Syndicate mitigates this unpredictability by limiting the scope of mutability to individual actors. In addition, Syndicate's facet model enforces three kinds of atomicity that together help the programmer in reasoning about field updates. First, an actor's behavior function is never preempted. As a result, events internal to an actor occur “infinitely quickly” from the perspective of the surrounding dataspace. This yields “synchrony” similar to that of languages such as Esterel (Berry and Gonthier 1992) and Céu (Sant'Anna, Ierusalimschy and Rodriguez 2015). Second, each actor's private state is not only isolated but threaded through its behavior function in a linear fashion. This yields a natural boundary within which private state may safely be updated via mutation. Third, exceptions during event processing tear down the entire actor at once, including its private state. The same happens for deliberate termination of an actor. Termination is again instantaneous, and damaged private state cannot affect peers. This yields a form of “fail-stop” programming (Schlichting and Schneider 1983). Together, these forms of atomicity allow facet fields to be mutable, while events continue to be handled with sequential code, resolving all questions of internal consistency in the face of imperative updates.
By coalescing adjacent patch actions with during reduction, the Syndicate/λ semantics hides momentary “glitches” from observers. This allows actors to stop one facet and start another publishing the same assertion(s) without observers ever detecting the change, and without being forced to explicitly indicate that a smooth changeover is desired. Contrast this automatic ability to seamlessly delegate responsibility to a new facet with the equivalent ability for glitch-free handover of assertions to a new actor. In the latter case, programmers must explicitly make use of the “initial assertion set” field in the action describing the new actor as described in section 4.2.
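Coalescing can be sketched as an operation on pairs of patches. The added/removed-set representation below is this sketch's invention; the point it illustrates is that when one patch retracts an assertion and the next immediately re-asserts it, the combined patch carries no net retraction, so observers never see the assertion flicker.

```python
def coalesce(first, second):
    """Combine two adjacent patches (each an (added, removed) pair of sets)
    into one patch with the same net effect."""
    add1, rem1 = first
    add2, rem2 = second
    added = (add1 - rem2) | add2       # survives unless re-removed later
    removed = (rem1 - add2) | rem2     # a later re-assertion cancels a removal
    return added, removed
```

For instance, coalescing the stopping facet's retraction of an assertion with the new facet's assertion of the same value yields a patch whose removal set is empty, which is exactly the glitch-free handover described above.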
Events are dispatched to facets all at once: the metafunction matches each event against all endpoints of an actor's facets simultaneously. Actors thus make all decisions about which event-handlers are to run before any particular event-handler begins execution. This separation of a planning phase from an execution phase helps reduce dependence on order-of-operations (cf. section 2.6) by ensuring no field updates can occur during planning.
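The separation of planning from execution can be made concrete with a short Python sketch. The `Endpoint` class and predicate/handler split are inventions of this sketch; the behavior it demonstrates is the one described above: because every applicable handler is selected before any handler runs, a field update performed by one handler cannot change which handlers were chosen.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Endpoint:
    wants: Callable    # event predicate, consulted only during planning
    handler: Callable  # script to run during execution

def dispatch(endpoints, event):
    # Phase 1: planning -- decide, against one consistent snapshot of the
    # actor's state, which handlers apply to this event.
    plan = [ep.handler for ep in endpoints if ep.wants(event)]
    # Phase 2: execution -- side effects happen only now, too late to
    # influence the selection made above.
    for handler in plan:
        handler(event)
```

A handler that flips a flag consulted by a later endpoint's predicate does not prevent that endpoint from running, since both predicates were evaluated during planning.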
5.7 Derived forms: and
Examples 5.2 and 5.25 highlight two common idioms in Syndicate programming worthy of promotion to language feature. In example 5.2, we saw a scenario in which appearance of an assertion led to creation of a resource—in this case, a separate actor—and disappearance of the same assertion led to the resource's release. In example 5.25, we saw a scenario in which an actor aggregated specific information from a set of assertions into a local set data structure held in a field. Syndicate offers support for the former scenario via a new form of endpoint called , and support for the latter via a family of forms called .
“” endpoints.
In a facet template , each declares a single event-handling endpoint. We add to the language by extending the class of event patterns:
We interpret the new construct in terms of existing syntax. An endpoint is interpreted as if the programmer had written
Concrete syntax aside, is reminiscent of a form of logical implication
A related derived event pattern, , is able to help us with our demand-matcher example 5.2, where is not directly applicable. In contrast to , the event pattern does not create a facet but instead spawns an entire sibling actor to handle each assertion matching . The critical difference concerns failure. While a failure in a endpoint tears down the current actor, a failure in a endpoint terminates the separate actor, leaving its siblings and parent intact. Equipped with this new construct, we may reformulate example 5.2 as just
“” expressions.
The ability of facets to automatically update published assertions in response to changes in fields provides a unidirectional link from the local state of an actor to the shared state held in its dataspace. To establish a bidirectional link, we require a construct describing a local data structure to maintain in response to changes in observed assertions:
Like , the new construct is interpreted in terms of existing syntax. A program is interpreted as if it were written
The new construct allows us to recast example 5.25 as just
We may usefully generalize from maintenance of fields containing sets to fields containing hash-tables, counts of matching rows, sums of matching rows, and other forms of aggregate summary of a set of assertions:
Finally, while event-handling endpoints in Syndicate/λ allow a program to react to changes in shared assertions, there is no general symmetric ability in this minimal language for a program to directly react to changes in local fields. The only such reaction available is the automatic republication of assertions depending on field values. We will see in chapter 6 a more general form of reaction to field changes that allows the programmer to express dataflow-style dependencies on and among fields. In our example here, this ability might find use in reacting to changes in the set-valued field , updating a graphical display of the room list.
5.8 Properties
While Syndicate/λ is a mathematical fiction used to explain a language design, it highlights a number of properties that a Syndicate implementation must enjoy in order to satisfy a programmer's expectations. First of all, when extending a host language with Syndicate features, care must be taken to reconcile the host language's own soundness property with the invariants demanded by constructs such as facet creation and termination, field allocation, reference and update, and so on. If the errors discussed in section 5.5 cannot be ruled out statically, they should be checked for dynamically. Of particular importance is the check for a finite result set in ; experience writing Syndicate programs thus far suggests that programmer errors of this kind are not uncommon while designing a new protocol.
Second, with the introduction of mutable state and sequencing comes the obligation to offer programmers a comprehensible model of order-of-evaluation and of visibility of intra-actor side-effects from the perspective of an actor's peers. As section 5.6 explains, Syndicate/λ supports reasoning about various kinds of atomicity preserved during evaluation. Whichever guarantees about order-of-evaluation and transactionality a host language offers should be extended to Syndicate features so as to preserve these forms of atomicity.
Finally, programmers rely on theorem 4.35's promise of the conversational cooperation of the dataspace connecting a group of actors. In the same way, they rely on an extension of this promise to include Syndicate/λ's endpoints. An implementation of Syndicate must ensure that the event-handling code associated with an endpoint runs only for relevant events, for every relevant event, and never redundantly for the same piece of knowledge. In particular, the notion of necessity developed in lemma 4.42 must be adapted in the setting of facets and endpoints to account for the way in which elides irrelevant detail from incoming patch events.
III Practice
Overview
In order to evaluate the Syndicate design, we must be able to write programs using it. In order to write programs, we need three things: algorithms, data structures and implementation techniques allowing us to realize the language design; a concrete instance of integration of the design with a host language; and a number of illustrative examples.
Chapter 6 builds on the formal models of chapters 4 and 5, presenting Syndicate/rkt, an extension of the Racket programming language with Syndicate features.
Chapter 7 then discusses general issues related to Syndicate implementation. First, it presents a new data structure, the assertion trie, important for efficient representation and manipulation of the sets of assertions ubiquitous to the dataspace model. Second, it turns to techniques for implementation and integration of Syndicate with a host language. Third, it describes some experimental tools for visualization and debugging of Syndicate programs.
Finally, chapter 8 presents a large number of examples demonstrating Syndicate idioms.
6 Syndicate/rkt Tutorial
Now that we have explored the details of the Syndicate design in the abstract, it is time to apply the design ideas to a concrete language. This chapter introduces Syndicate/rkt, a language which extends Racket (Flatt and PLT 2010) with Syndicate language features. The aim of the chapter is to explain Syndicate/rkt in enough detail to allow the reader to appreciate the implementation ideas of chapter 7 and engage with the examples of chapter 8.
6.1 Installation and brief example
The Racket-based Syndicate implementation is supplied as a Racket package. After installing Racket itself, use the command-line or DrRacket-based interactive package management tool to install the syndicate package. For example, on a Unix-like system, run the command
raco pkg install syndicate
The implementation uses Racket's #lang facility to provide a custom language dialect with Syndicate language features built-in. A Racket source file starting with
#lang syndicate
declares itself to be a Syndicate program. Before we examine details of the language, a brief example demonstrates the big picture.
6.1 Figure 26 shows a complete Syndicate/rkt program analogous to the box-and-client programs shown in previous chapters (examples 4.2 and 5.1). Typing it into a file and loading it into the DrRacket IDE or running it from the command line produces an unbounded stream of output that begins
client: learned that box's value is now 0
box: taking on new-value 1
client: learned that box's value is now 1
box: taking on new-value 2
...
#lang syndicate
(message-struct set-box (new-value))
(assertion-struct box-state (value))
(spawn (field [current-value 0])
(assert (box-state (current-value)))
(on (message (set-box $new-value))
(printf "box: taking on new-value ~v\n" new-value)
(current-value new-value)))
(spawn (on (asserted (box-state $v))
(printf "client: learned that box's value is now ~v\n" v)
(send! (set-box (+ v 1)))))
Line 1 declares that the module is written using Syndicate/rkt. Lines 2 and 3 declare Racket structures: set-box is declared as a structure to be used as a message, and box-state to be used as an assertion. Lines 4–8 and 9–11 start two actors together in the same dataspace. The first actor provides a mutable reference cell service, and the second accesses the cell.
The cell initially contains the value 0 (line 4). It publishes its value as a box-state record in the shared dataspace (line 5). When it hears a set-box message (line 6), it prints a message to the console and updates its current-value field. This leads to automatic update of the assertion of line 5. The cell actor is the only party able to alter the box-state assertion in the dataspace: peers may submit requests to change the assertion, but cannot themselves change the value.
The client actor begins its life waiting to hear about the assertion of box-state records. When it learns that a new record has been asserted, it prints a message to the console and sends a set-box message, which causes the cell to update itself, closing the loop.
6.2 The structure of a running program: ground dataspace, driver actors
Figure 27 shows a schematic of a running Syndicate/rkt program. The main thread drives execution of the Syndicate world, dispatching events to actors and collecting and interpreting the resulting actions. All actors and dataspaces are gathered into a single, special ground dataspace which connects Syndicate to the “outside world” of plain Racket and hence, indirectly, to the underlying operating system.
The figure shows actors relating to the program itself—some of which are running in a nested dataspace—as well as actors supplying services offered by library modules. Two driver actors are shown alongside these. The role of a driver actor in Syndicate/rkt is to offer an assertion- and message-based Syndicate perspective on some external service—the “hardware” to the actor's “driver”. Such driver actors call ordinary Racket library code, spawning Racket-level threads to perform long-lived, CPU-intensive or I/O-heavy tasks. Such threads may inject events—“hardware interrupts”—to the ground dataspace as if they were peers in some surrounding dataspace.
For example, the Syndicate/rkt TCP driver interacts with peers via a protocol of assertions and messages describing TCP/IP sockets and transmitted and received TCP segments. When a socket is requested, the driver spawns not only a Syndicate/rkt actor but also a Racket-level thread. The thread uses Racket's native event libraries to wait for activity on the TCP/IP socket, and sends Syndicate messages describing received packets, errors, or changes in socket state. These messages are delivered to the Syndicate/rkt actor corresponding to the socket, which translates them and forwards them on to the driver's peers.
Similarly, the Timer driver responds to Syndicate/rkt messages requesting its services by updating a priority-queue of pending timers which it shares with a Racket-level thread. The thread interfaces with Racket's native timer mechanisms. Each time it is signaled by Racket, it delivers an appropriate event to the ground dataspace, which is picked up by the Timer driver and forwarded to the original requesting actor.
Syndicate/rkt's driver model is inspired by Erlang's “ports” model for I/O (Erlang/OTP Design Principles 2012). The layer of indirection that a driver actor introduces between a user program and some external facility serves not only to isolate the external facility from user program failures and vice versa but also to separate concerns. The driver actor's responsibility is to implement the access protocol for the external service, no matter how complex and stateful, exposing its features in terms of a Syndicate protocol. The user program may thus concentrate on its own responsibilities, delegating management of the external service to the driver. If either party should fail, the other may gracefully shut down or take some compensating action.
6.3 Expressions, values, mutability, and data types
Expressions in Syndicate/rkt are ordinary Racket expressions. While Syndicate/λ maintains a strict separation between commands and expressions, Syndicate/rkt inherits Racket's expression-oriented approach. Racket's functions replace Syndicate/λ's procedures. Ordinary Racket side-effects are available, and Racket's sequencing and order-of-evaluation are used unchanged.
Values in Syndicate/rkt are ordinary Racket values. This includes values used as assertions and message bodies. While Syndicate/λ forbids higher-order and mutable values in fields and assertions, Syndicate/rkt makes no such restriction, trusting the programmer to avoid problematic situations. Actors may exchange mutable data or use Racket's mutable variables as required, though programmers are encouraged to design protocols that honor the spirit of Syndicate by eschewing mutable structures.
The Syndicate/rkt implementation of the dataspace model must be able to inspect the elements of compound data types such as lists, vectors and records in order to fulfill its pattern-matching obligations. Racket's struct record facility defaults to creation of “opaque” records which cannot be inspected in the necessary way. While Syndicate/rkt does not forbid use of such struct definitions—in fact, their opacity is beneficial in certain circumstances (see section 7.2.1)—it is almost always better to use Racket's “prefab” structures, which allow the right kind of introspection.
The special dataspace model observation constructor and the cross-layer constructors and are represented in Syndicate/rkt as instances of structs named observe, outbound and inbound, respectively.
Mathematical notation (figure 12) | |||
Syndicate/rkt notation | (observe ) | (outbound ) | (inbound ) |
6.4 Core forms
Each of the constructs of the formal model in chapter 5 maps to a feature of the implementation. In some cases, a built-in Racket language feature corresponds well to a Syndicate feature, and is used directly. In others, a feature is provided by way of a Racket library exposing new functions and data structures. In yet others, new syntax is required, and Racket's syntax-parse facility (Culpepper and Felleisen 2010) is brought to bear. Figure 28 summarizes the core forms added to Racket to yield Syndicate/rkt; figure 29 sketches a rough rubric allowing interpretation of the syntax of Syndicate/λ in terms of Syndicate/rkt.
module-level-form := ... | (require/activate require-spec ...) | struct-declaration | spawn

struct-declaration := ... | (message-struct name (field ...)) | (assertion-struct name (field ...))

spawn := (spawn {#:name expr} facet-setup-expr ...)
       | (spawn* {#:name expr} script-expr ...)
       | (dataspace {#:name expr} script-expr ...)

facet-setup-expr := expr | field-declaration | endpoint-expr

field-declaration := (field [field-name initial-value] ...)

expr := ... | (current-facet-id) | (observe expr) | (outbound expr) | (inbound expr)

script-expr := expr | (react facet-setup-expr ...) | (stop-facet expr script-expr ...)
             | (stop-current-facet) | field-declaration | spawn | (send! script-expr)

endpoint-expr := (assert {#:when test-expr} pattern)
               | (on-start script-expr ...)
               | (on-stop script-expr ...)
               | (on {#:when test-expr} event-pattern script-expr ...)

event-pattern := (asserted pattern) | (retracted pattern) | (message pattern)
(void) | |
(begin ) | |
( ) | |
(let (( )) ) | |
(begin (field [ ]) ) | |
( ) | |
(send! ) | |
(spawn* ) | |
(dataspace ) | |
(react (define (current-facet-id)) (assert ) (on-start ) (on-stop ) (on ) ) | |
(stop-facet ) |
Programs and modules.
A module written in the syndicate dialect not only provides constants, functions, and structure type definitions to its clients, as ordinary Racket modules do, but also offers services in the form of actors to be started when the module is activated. Thus, each module does double duty, serving as either or both of a unit of program composition and a unit of system composition. In order to start a Syndicate/rkt program running, a user specifies a module to serve as the entry point. That module is activated in a fresh, empty dataspace, along with any actors created during activation of service modules it depends upon.
The nonterminal module-level-form in figure 28 specifies the Syndicate/rkt extensions to Racket's module-level language. A client may require a module, as usual, or may require and activate it by using the require/activate form. Activation is idempotent within a given program: a particular module's services are only started once.
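The idempotence of activation amounts to a once-per-name registry. The Python sketch below is an informal model of that discipline only (the registry, names, and callback are inventions of this sketch, not Syndicate/rkt's implementation):

```python
_activated = set()  # names of modules whose services have already started

def activate(module_name, start_services):
    """Start a module's service actors at most once per program run; later
    activations of the same module are no-ops."""
    if module_name in _activated:
        return
    _activated.add(module_name)
    start_services()
```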
The module-level language is also extended with two new structure-definition forms (nonterminal struct-declaration), message-struct and assertion-struct. The former is intended to declare structures for use in messages, while the latter declares structures for assertions. Each is a thin veneer over Racket's “prefab” structure definition facility.
Abstraction facilities.
In order to remain minimal, Syndicate/λ includes little in the way of abstraction facilities. However, in Syndicate/rkt, we wish to permit abstraction over field declaration, assertion- and event-handling endpoint installation, and facet creation and tear-down, as well as the usual forms of abstraction common to Racket programming. Therefore, we make abstraction facilities like define, let, define-syntax and let-syntax available throughout the Syndicate/rkt language.
However, not all Syndicate constructs make sense in all contexts. For example, it is nonsensical to attempt to declare an endpoint outside a facet. Syndicate/λ includes special syntactic positions for declaration of endpoints, keeping them clearly (and statically) distinct from positions for specifying commands. This approach conflicts with the desire to reuse Racket's abstraction facilities in all such syntactic positions. Syndicate/rkt therefore brings most Syndicate constructs into a single syntactic class—that of expressions—and relies on a dynamic mechanism to rule out inappropriate usage of Syndicate constructs. An internal flag keeps track of whether the program is in “script” or “facet setup” context.
Figure 28 reflects this dynamic context in its use of nonterminals script-expr and facet-setup-expr. Script expressions may only be used within event-handlers and in ordinary straight-line Racket code. They include expressions which perform side effects such as spawning other actors or sending messages. Facet setup expressions may only be used in contexts where a new facet is being configured. They include expressions which construct both assertion and event-handling endpoints.
Sending messages.
The send! form broadcasts its argument to peers via the dataspace. That is, a message action is enqueued for transmission to the dataspace when the actor's behavior function eventually returns. Message sending, like all other actions, is thus asynchronous from the perspective of the Syndicate/rkt programmer.
Spawning actors and dataspaces.
The nonterminal spawn in figure 28 is available not only at the module level but also anywhere a script-expr is permitted within a running actor. The three forms spawn, spawn*, and dataspace correspond to the Syndicate/λ commands and . Each of the first two, like Syndicate/λ's , constructs a sibling actor in the current dataspace; the third, a nested dataspace whose initial actor runs the script-exprs specified. The two variations spawn and spawn* relate to each other as follows:
(spawn facet-setup-expr ...)  ≡  (spawn* (react facet-setup-expr ...))
Initially, Syndicate/rkt included only spawn* (written, at the time, “spawn”); a survey of programs showed that the overwhelming majority of uses of spawn* were of the form that the current spawn abbreviates, namely an actor with a single initial facet.
If a spawn, spawn*, or dataspace is supplied with a #:name clause, the result of the corresponding expr is attached to the created actor as its name for debugging and tracing purposes. The name is never made available to peers via assertions or messages in the dataspace.
Facet creation and termination.
The react form causes addition of a new facet to the currently-running actor, nested beneath the currently-active facet, or as the root of the actor's facet tree if used immediately within spawn*. The body of the react form is in “facet setup” context, and declares the new facet's endpoints. Unlike in Syndicate/λ, the facet's name is not manifest in the syntax. Instead, a facet may retrieve its system-generated name with a call to the procedure current-facet-id. Facet IDs may be freely stored in variables, passed as procedure arguments, and so on.
A facet ID must be supplied as the first argument to a stop-facet form, which is the Syndicate/rkt analogue of Syndicate/λ's . For example, the following program starts an actor whose root facet immediately terminates itself:
(spawn (on-start (stop-facet (current-facet-id))))
The program is analogous to the Syndicate/λ program
(stop-current-facet script-expr ...)  ≡  (stop-facet (current-facet-id) script-expr ...)

This abbreviation captures a common form of use of stop-facet and current-facet-id.
A stop-facet form includes an optional sequence of script-exprs. These are executed outside the stopping facet, once its subscriptions and assertions have been completely withdrawn, in the context of the facet's own containing facet (if any). That is, an expression such as
(react (on (message 'x) (stop-facet (current-facet-id) (react ...))))
(1) creates a facet, which upon receipt of a message 'x (2) terminates itself, and (3) effectively replaces itself with another facet, whose body is shown as an ellipsis. Compare to
(react (on (message 'x) (react ...)))
which upon receipt of each 'x creates an additional, nested subfacet; and
(react (on (message 'x) (stop-current-facet) (react ...)))
which not only terminates itself when it receives an 'x message but also creates a nested subfacet, which is shortly thereafter destroyed as a consequence of the termination of its parent.
Field declaration, access and update.
Syndicate/rkt allows declaration of fields in both “script” and “facet setup” contexts. The field form borrows its syntactic shape from Racket's support for object-oriented programming; it acts as a define of the supplied field-names, initializing each with its initial-value.
Fields are represented as procedures in Syndicate/rkt. When called with no arguments, a field procedure returns the field's current value; when called with a single argument, the field procedure updates the field's value to the given argument value.
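The zero-or-one-argument read/write protocol can be modeled with a callable object. The following is a hypothetical Python sketch of that protocol only (the class name `Field` is an invention for illustration; the real implementation is in Racket and additionally triggers dependency re-evaluation on writes):

```python
class Field:
    """Model of a Syndicate/rkt field: a callable that reads when
    applied to zero arguments and writes when applied to one.
    (Sketch only; the real implementation also notifies dependents.)"""
    def __init__(self, initial_value):
        self.value = initial_value

    def __call__(self, *args):
        if not args:
            return self.value       # (counter)   -> read current value
        (self.value,) = args        # (counter 5) -> update to 5

counter = Field(0)
counter(counter() + 1)              # mirrors (counter (+ (counter) 1))
```

In this model, `counter(counter() + 1)` corresponds to the Syndicate/rkt update expression (counter (+ (counter) 1)).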
Endpoint declaration.
In facet setup context, the forms assert, on-start, on-stop, and on are available for creation of assertion and event-handling endpoints.
The assert form allows a facet to place assertions into the shared dataspace. For example, given the following structure definitions,
(assertion-struct user-present (user-name))
(message-struct say-to (user-name utterance))
the endpoint
(assert (user-present 'Alice))
asserts a user-present record, keeping it there until the endpoint's facet is terminated.
The on form allows facets to react to events described by the event-pattern nonterminal. Each possibility corresponds to the analogous event-pattern in Syndicate/λ. For example,
(on (asserted (user-present $name))
(send! (say-to name "Hello!")))
reacts to the assertion of (user-present 'Alice) with a message, (say-to 'Alice "Hello!").
Both assert and on forms take a pattern, within which most Racket expressions are permitted. Use of the discard operator (_) in a pattern corresponds to Syndicate/λ's wildcard; that is, to the universe of assertion values. In an on form, it acts to accept and ignore arbitrary structure in matched assertions or messages. Variables x introduced via binders $x embedded in an on form's pattern are available in the form's script-exprs; it is an error to include a binder in a pattern in an assert form. Patterns may make reference to current field values, and any fields so accessed are called the endpoint's dependent fields.
A pattern is evaluated to yield a set of assertions, both initially and every time a dependent field is updated. For on forms, an observe structure constructor is added to each assertion in the denoted set. This process of assertion-set extraction is analogous to Syndicate/λ's use of the corresponding metafunction of figure 19.
Whenever an endpoint's pattern is re-evaluated, the resulting assertions are placed in the surrounding dataspace by way of a state-change notification action. However, if a #:when clause is present in the assert or on form, the corresponding test-expr is evaluated just before actually issuing the action. If test-expr yields a false value, no action is produced. This allows conditional assertion and conditional subscription. In particular, a #:when clause test-expr may depend on field values; if it does, the fields are considered part of the dependent fields.
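This re-evaluation discipline can be sketched as follows. The sketch below is a hypothetical Python model (the function and parameter names are inventions for illustration), not Syndicate's actual Racket mechanism: an endpoint's pattern is a function of the current field values, and a #:when-style guard suppresses the state-change notification when it yields false.

```python
def reevaluate(pattern_fn, fields, when_fn=None):
    """Model of endpoint re-evaluation (hypothetical sketch): compute
    the endpoint's assertion set from the current field values; if a
    #:when-style guard is present and yields false, produce no action."""
    assertions = pattern_fn(fields)          # re-run on each dependent-field update
    if when_fn is not None and not when_fn(fields):
        return None                          # guard false: no action issued
    return assertions                        # becomes a state-change notification

fields = {"level": 3}
pattern = lambda f: {("alarm", f["level"])}
reevaluate(pattern, fields)                           # -> {("alarm", 3)}
reevaluate(pattern, fields, lambda f: f["level"] > 5) # -> None (suppressed)
```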
The on-start and on-stop forms introduce facet-startup and -shutdown event handlers. The former are executed once the block of facet-setup-exprs has finished configuring the endpoints of the facet and after the facet's new assertions (including its subscriptions) have been sent to the surrounding dataspace. The latter are executed just prior to withdrawal of the facet's endpoint subscriptions during the facet shutdown process.
An on-start form may be used to send a message in a context where a corresponding reply-listener is guaranteed to be active and listening; for example, in
(react (on-start (send! 'request))
(on (message 'reply) ...))
the 'request is guaranteed to be sent only after the subscription to the 'reply has been established, ensuring that the requesting party will receive the reply even if the replying party responds immediately.
An on-stop form may be used to perform cleanup actions just prior to the end of the conversational context modeled by a facet; for example, in
(react (on (retracted 'connection) (stop-current-facet))
(on-stop (send! 'goodbye)))
the 'goodbye message is guaranteed to be sent before the subscription to 'connection assertions is withdrawn. Any number of on-start and on-stop forms may be added during facet setup.
6.5 Derived and additional forms
Figure 30 summarizes derived forms that build upon the core forms to allow concise expression of frequently-employed concepts.
script-expr := ... | blocking-facet-expr
facet-setup-expr := ... | derived-endpoint-expr | dataflow-expr
derived-endpoint-expr := (stop-when {#:when expr} event-pattern script-expr ...)
                       | (stop-when-true expr script-expr ...)
                       | (during pattern facet-setup-expr ...)
                       | (during/spawn pattern {#:name expr} script-expr ...)
                       | query-endpoint-expr
query-endpoint-expr := (define/query-value field-name expr pattern script-expr add/remove)
                     | (define/query-set field-name pattern script-expr add/remove)
                     | (define/query-hash field-name pattern script-expr script-expr add/remove)
                     | (define/query-hash-set field-name pattern script-expr script-expr add/remove)
                     | (define/query-count field-name pattern script-expr add/remove)
add/remove := {#:on-add script-expr} {#:on-remove script-expr}
blocking-facet-expr := (react/suspend (id) facet-setup-expr ...)
                     | (until event-pattern facet-setup-expr ...)
                     | (flush!)
                     | immediate-query
dataflow-expr := (begin/dataflow script-expr ...)
               | (define/dataflow field-name script-expr {#:default expr})
immediate-query := (immediate-query query-spec ...)
query-spec := (query-value expr pattern script-expr)
            | (query-set pattern script-expr)
            | (query-hash pattern script-expr script-expr)
            | (query-hash-set pattern script-expr script-expr)
            | (query-count pattern script-expr)
Facet termination.
A common idiom is to terminate a facet in response to an event. The abbreviation stop-when is intended for this case:
(stop-when P E ...) ≡ (on P (stop-facet (current-facet-id) E ...))
The script-exprs (E ...) are placed inside the stop-facet command, and so are executed outside the stopping facet. The example from above could be written
(react (stop-when (message 'x) (react ...)))
This style of use of stop-when gives something of the flavor of a state-transition in a state machine, since the script-exprs are in a kind of “tail position” with respect to the stopping facet.
Sub-conversations and subfacets.
A second, even more common idiom is that of the analogous Syndicate/λ construct (section 5.7), which introduces a nested subfacet to an actor for the duration of each assertion matching a given pattern. The triggering assertion acts as a conversational frame, delimiting a sub-conversation. The Syndicate/rkt during form corresponds to a stereotypical usage of core forms:
(during P facet-setup-expr ...) ≡ (on (asserted P) (react (stop-when (retracted P′)) facet-setup-expr ...))
where P′ is derived from P by replacing every binder $x in P with the corresponding x.
Just as the Syndicate/λ construct had a spawning variant, so Syndicate/rkt has during/spawn. The variant form spawns an actor in the dataspace instead of creating a subfacet, confining the scope of failure to an individual sub-conversation rather than allowing a crashing sub-conversation to terminate the actor as a whole.
Streaming queries.
The assert form allows actors to construct shared assertions from values of local fields. To operate in the other direction, updating a field based on some aggregate function over a set of assertions, Syndicate/rkt offers a suite of define/query-* forms. For example, given a structure definition
(assertion-struct mood (user-name description))
representing a user's mood, we may declare a field that tracks the set of all user names that have an associated mood via
(define/query-set moody-users (mood $n _) n)
or a field that collects all available mood descriptions into a local hash table via
(define/query-hash all-moods (mood $n $m) n m)
The resulting fields contain ordinary Racket sets and hash tables. In cases where only a single assertion matching a given pattern is expected, the define/query-value form extracts that single value, falling back on a default during periods when no matching assertion exists in the shared dataspace:
(define/query-value alice-mood 'unknown (mood 'Alice $m) m)
If an #:on-add or #:on-remove clause is supplied to a define/query-* form, the corresponding expressions are evaluated immediately prior to updating the form's associated field upon receiving a relevant change notification.
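The behavior of a define/query-set-style endpoint, including the hook ordering just described, can be modeled roughly as follows. This is a hypothetical Python sketch (function and parameter names are inventions), not the actual Racket implementation:

```python
def apply_query_event(current, event, value, on_add=None, on_remove=None):
    """Model of a streaming set query (sketch): maintain a local set
    in response to asserted/retracted notifications, running any
    #:on-add / #:on-remove hook immediately before the field update."""
    if event == "asserted":
        if on_add is not None:
            on_add(value)                # hook runs before the update
        return current | {value}
    if event == "retracted":
        if on_remove is not None:
            on_remove(value)
        return current - {value}
    return current

moody_users = set()
moody_users = apply_query_event(moody_users, "asserted", "Alice")
moody_users = apply_query_event(moody_users, "asserted", "Bob")
moody_users = apply_query_event(moody_users, "retracted", "Alice")
# moody_users is now {"Bob"}
```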
General-purpose field dependencies.
The Syndicate/rkt implementation uses a simple dependency-tracking mechanism to determine which endpoint patterns depend on which fields, and exposes this mechanism to the programmer. Each begin/dataflow form in a facet setup context introduces a block of code that may depend on zero or more fields. Like an endpoint's pattern, it is executed once at startup and then every time one of its dependent fields is changed. For example, the following prints a message each time Alice's mood changes in the dataspace:
(react (define/query-value alice-mood 'unknown (mood 'Alice $m) m)
(begin/dataflow
(printf "Alice's mood is now: ~a\n" (alice-mood))))
Of course, for a simple example like this there are many alternative approaches, including use of during with an on-start handler:
(react (during (mood 'Alice $m)
(on-start (printf "Alice's mood is now: ~a\n" (alice-mood)))))
or use of an #:on-add clause in define/query-value:
(react (define/query-value alice-mood 'unknown (mood 'Alice $m) m
#:on-add (printf "Alice's mood is now: ~a\n" (alice-mood))))
An important difference between the latter two and the first variation is that only the first variation prints a message if some other piece of code updates the alice-mood field. It is this ability to react to field changes specifically, rather than to dataspace assertion changes generally, that makes begin/dataflow useful.
The form define/dataflow is an abbreviation for definition of a field whose value directly depends on the values of other fields and that should be updated whenever its dependents change:
(define/dataflow F E) ≡ (begin (field [F #f]) (begin/dataflow (F E)))
and the form stop-when-true is an abbreviation useful for terminating a facet in response to a predicate over a collection of fields becoming true:
(stop-when-true test-expr E ...) ≡ (begin/dataflow (when test-expr (stop-facet (current-facet-id) E ...)))
Blocking expressions.
The uniformly event-driven style of Syndicate can make it difficult to express certain patterns of control-flow involving blocking requests. Absent special support, the programmer must fall back on manual conversion to continuation-passing style followed by defunctionalization of the resulting continuations, yielding an event-driven state machine (Felleisen et al. 1988). Fortunately, Racket includes a rich library supporting delimited continuations (Felleisen 1988; Flatt et al. 2007). Syndicate/rkt allows event-handlers to capture their continuation up to invocation of the event-handler body and no further, replacing it by a nested subfacet. The subfacet may, in response to a later event, restore the suspended continuation, resuming its computation. This allows the programmer to use a blocking style to describe remote-procedure-call-like interactions with an actor's peers, reminiscent of the style made available by a similar use of continuations in the context of web applications (Queinnec 2000; Graunke et al. 2001).
The react/suspend form suspends the active event-handler, binds the suspended continuation to identifier id, and adds a new subfacet configured with the form's facet-setup-exprs. If one of the new subfacet's endpoints later invokes the continuation in id, the subfacet is terminated, and the arguments to id are returned as the result of the react/suspend form. For example,
(printf "Received: ~a" (react/suspend (k) (on (message (say-to 'Alice $v)) (k v))))
waits for the next message sent to 'Alice, and when one arrives, prints it out.
The until form is built on react/suspend, and allows temporary establishment of a set of endpoints until some event occurs. It is defined as
(until E F ...) ≡ (react/suspend (k) (stop-when E (k (void))) F ...)
As an example of its use, the following Syndicate/rkt library procedure interacts with a timer service and causes a running event-handler to pause for a set number of seconds:
(define (sleep sec)
(define timer-id (gensym 'sleep))
(until (message (timer-expired timer-id _))
(on-start (send! (set-timer timer-id (* sec 1000.0) 'relative)))))
An important consideration when programming with react/suspend and its derivatives is that the world may change during the time that an event-handler is “blocked”. For example, the following actor has no guarantee that the two messages it prints display the same value:
(message-struct request-printout ())
(message-struct increment-counter ())
(spawn (field [counter 0])
(on (message (increment-counter))
(counter (+ (counter) 1)))
(on (message (request-printout))
(printf "Counter (before sleep): ~a\n" (counter))
(sleep 1)
(printf "Counter (after sleep): ~a\n" (counter))))
Point-in-time queries.
The define/query-* forms allow an actor to reflect a set of assertions into a local field on an ongoing basis. However, some programs call for sampling of the set of assertions present at a given moment in time, rather than establishment of a long-running streaming query. For these programs, Syndicate/rkt offers the immediate-query form, built atop a construct called flush!. The latter is approximately defined as
(flush!) ≡ (let ((x (gensym))) (until (message x) (on-start (send! x))))
and acts to force all preceding actions to the dataspace and allow them to take effect before proceeding. In particular, if new endpoints have established subscriptions to some set of assertions, then flush! allows the dataspace a chance to transmit matching assertions to those endpoints before execution of the continuation of flush! proceeds.
The immediate-query form makes use of flush! by establishing a temporary subfacet using react/suspend, creating temporary fields tracking the requested information, and performing a single flush! round in an on-start handler before releasing its suspended continuation. For example,
(immediate-query [query-value 'unknown (mood 'Alice $m) m])
= (react/suspend (k)
(define/query-value v 'unknown (mood 'Alice $m) m)
(on-start (flush!) (k (v))))
retrieves the current mood of 'Alice without setting up a facet to track it over the long term. Likewise,
(immediate-query [query-set (mood $n _) n])
= (react/suspend (k)
(define/query-set v (mood $n _) n)
(on-start (flush!) (k (v))))
retrieves the names of all users with some mood recorded at the time of the query.
Users of immediate-query should remain aware that the world may change during the time the query is executing, since it is based on react/suspend. Because immediate-query yields to the dataspace, other events may arrive between the moment the query is issued and the moment it completes.
Figure 31 demonstrates a number of the language features just introduced. The program starts four actors: a printer (line 6), which displays messages on the standard output; a flip-flop (line 9), which transitions back and forth between active and inactive states in response to a toggle message, and which asserts an active record only when active; a monitor-flip-flop actor (line 18), which displays a message (via the printer) every time the flip-flop changes state; and a periodic-toggle actor (line 21), which interfaces to the system timer driver and arranges for delivery of a toggle message every second. Figure 32 shows the structure of the running program.
#lang syndicate
(require/activate syndicate/drivers/timestate)
(assertion-struct active ())
(message-struct toggle ())
(message-struct stdout-message (body))
(spawn #:name 'printer
(on (message (stdout-message $body))
(displayln body)))
(spawn* #:name 'flip-flop
(define (active-state)
(react (assert (active))
(stop-when (message (toggle))
(inactive-state))))
(define (inactive-state)
(react (stop-when (message (toggle))
(active-state))))
(inactive-state))
(spawn #:name 'monitor-flip-flop
(on (asserted (active)) (send! (stdout-message "Flip-flop is active")))
(on (retracted (active)) (send! (stdout-message "Flip-flop is inactive"))))
(spawn #:name 'periodic-toggle
(field [next-toggle-time (current-inexact-milliseconds)])
(on (asserted (later-than (next-toggle-time)))
(send! (toggle))
(next-toggle-time (+ (next-toggle-time) 1000))))
The flip-flop actor makes use of Syndicate/rkt's abstraction facility to define two local procedures, active-state and inactive-state (lines 10 and 14, respectively). When called, active-state constructs a facet that asserts the active record (line 11) and waits for a toggle message (line 12). When such a message arrives, the facet terminates itself using stop-when and performs the action of calling the inactive-state procedure. In effect, when toggled, an active facet replaces itself with an inactive facet, demonstrating the state-transition-like nature of stop-when endpoints. The inactive-state procedure is similar, omitting only the assertion of the active record.
Despite its lack of explicit fields, the flip-flop actor is stateful. Its state is implicit in its facet structure. Each time it transitions from inactive to active state, or vice versa, the facet tree that forms part of the actor's implicit control state is updated.
Only one of the four actors, periodic-toggle, maintains explicit state. Its next-toggle-time field keeps track of the next moment that a toggle message should be transmitted. Each time the field is updated (line 25), Syndicate/rkt's change-propagation mechanism ensures that the assertion resulting from the subscription of line 23, namely
(observe (later-than (next-toggle-time)))
is updated in the dataspace. The service actor started by the timestate driver, activated by the require/activate form on line 2, is observing observers of later-than records, and coordinates with the underlying Racket timer mechanism to ensure that an appropriate record is asserted once the moment of interest has passed.
6.6 Ad-hoc assertions
From time to time, it is convenient to augment the set of assertions currently expressed by an actor without constructing an endpoint, or even a whole facet, to maintain the new assertions. Syndicate/rkt provides two imperative commands, assert! and retract!, which allow ad-hoc addition and removal of assertions. While most programs never use these commands, they occasionally greatly simplify certain tasks.
(assertion-struct file (name content))
(message-struct save (name content))
(message-struct delete (name))
(spawn (field [files (hash)])
(during (observe (file $name _))
(assert (file name (hash-ref (files) name #f))))
(on (message (save $name $content))
(files (hash-set (files) name content)))
(on (message (delete $name))
(files (hash-remove (files) name))))
(spawn (field [files (hash)] [monitored (set)])
(on (asserted (observe (file $name _)))
(assert! (file name (hash-ref (files) name #f)))
(monitored (set-add (monitored) name)))
(on (retracted (observe (file $name _)))
(retract! (file name (hash-ref (files) name #f)))
(monitored (set-remove (monitored) name)))
(on (message (save $name $content))
(when (set-member? (monitored) name)
(retract! (file name (hash-ref (files) name #f)))
(assert! (file name content)))
(files (hash-set (files) name content)))
(on (message (delete $name))
(when (set-member? (monitored) name)
(retract! (file name (hash-ref (files) name #f)))
(assert! (file name #f)))
(files (hash-remove (files) name))))
Consider figure 33, which implements a simple “file system” abstraction using the protocol structures defined on lines 1–3. Clients assert interest in file records, and in response the server examines its database (the hash table held in its files field) and supplies the record of interest. Because the server uses during (line 5), each distinct file name results in a distinct sub-conversation responsible for maintaining a single assertion (line 6). Subsequent save or delete requests (lines 7–10) update the files table, which automatically causes recomputation of the assertions resulting from different instances of line 6.
An alternative approach is shown in figure 34. Here, conversational state is explicitly maintained in a new field, monitored, which holds a set of the file names known to be of current interest. In response to a newly-appeared assertion of interest (line 2), the server updates monitored (line 4) but also uses assert! to publish an initial response record in the dataspace. Subsequent save or delete requests (lines 8–17) replace this record by using retract! followed by assert!, but only do so if the modified file is known to be of interest.
(assertion-struct lease-request (resource-id request-id))
(assertion-struct lease-assignment (resource-id request-id))
(define (spawn-resource resource-id total-available-leases)
(spawn (field [waiters (make-queue)]
[free-lease-count total-available-leases])
(on (asserted (lease-request resource-id $w))
(cond [(positive? (free-lease-count))
(assert! (lease-assignment resource-id w))
(free-lease-count (- (free-lease-count) 1))]
[else
(waiters (enqueue (waiters) w))]))
(on (retracted (lease-request resource-id $w))
(waiters (queue-filter (lambda (x) (not (equal? w x)))
(waiters)))
(retract! (lease-assignment resource-id w)))
(on (retracted (lease-assignment resource-id $w))
(cond [(queue-empty? (waiters))
(free-lease-count (+ (free-lease-count) 1))]
[else
(define-values (w remainder) (dequeue (waiters)))
(assert! (lease-assignment resource-id w))
(waiters remainder)]))))
A second example of the use of assert! and retract! is shown in figure 35. The program implements something like a counting semaphore. Here, assert! and retract! are used to maintain up to total-available-leases separate lease-assignment records describing parties requesting use of the semaphore. The first-in, first-out nature of the lease assignment process does not naturally correspond to a nested facet structure; no obvious solution using during springs to mind. A remarkable aspect of this program is the use of retract! on line 15, in response to a withdrawn lease request. The party withdrawing its request may or may not currently hold one of the available resources. If it does, the retract! corresponds to a previous assert! (on either line 8 or line 21) and so results in a patch transmitted to the dataspace and a corresponding triggering of the endpoint on line 16; if it does not, then the retract! has no effect on the actor's assertion set, since set subtraction is idempotent, and therefore the lease request vanishes without disturbing the rest of the state of the system.
The assert! and retract! commands manipulate a “virtual” endpoint which is considered to exist alongside the real endpoints within the actor's facets. The effects of the commands are therefore only visible to the extent they do not interfere with assertions made by other facets in the actor. For example, if an actor has an existing facet that has an endpoint (assert 1), and it subsequently performs (assert! 1) and (assert! 2) in response to some event, the patch action sent to the dataspace contains only 2, since 1 was previously asserted by the assertion endpoint. If, later still, it performs (retract! 1) and (retract! 2), the resulting patch action will again only mention 2, since 1 remains asserted by the assertion endpoint.
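This net-effect bookkeeping amounts to simple set arithmetic. The following hypothetical Python sketch (names invented for illustration; not the actual Racket implementation) computes the patch visible to the dataspace, given the virtual endpoint's current assertions and the assertions maintained by the actor's real endpoints:

```python
def patch_for(change, virtual, endpoint_assertions):
    """Model (sketch) of assert!/retract! bookkeeping: the actor's
    visible assertion set is the union of its real endpoints' assertions
    and the 'virtual' endpoint; a patch mentions only net changes.
    Returns (patch, updated_virtual_set)."""
    kind, items = change
    covered = set().union(*endpoint_assertions) if endpoint_assertions else set()
    if kind == "assert":
        # additions already covered by real endpoints are invisible
        return ("add", items - virtual - covered), virtual | items
    # retractions: only items the virtual endpoint holds, and no real
    # endpoint still asserts, actually disappear from the dataspace
    return ("remove", (items & virtual) - covered), virtual - items

# The text's example: a real endpoint asserts 1; then assert! 1 and 2.
patch, virtual = patch_for(("assert", {1, 2}), set(), [{1}])
# patch is ("add", {2}); retracting both later removes only 2 as well.
```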
Very few programs make use of this feature; it is not implemented at all in Syndicate/js. Usually, given freedom to design a protocol appropriate for Syndicate, pervasive use of assertions over messages allows during and nested facets in general to be used instead of assert! and retract!. Setting aside the unusual case of the “semaphore” of figure 35, there are two general areas in which the two commands are helpful:
- They can be used to react to the absence of particular assertions in the dataspace (as seen in the “loading indicator” example discussed in section 9.6).
- They are useful in cases where Syndicate implements a protocol not designed with dataspaces in mind, where messages in the foreign protocol update conversational state that corresponds to assertions on the Syndicate side.
An example of the latter is the IRC client driver. When a user joins an IRC channel, the IRC protocol requires that the server send a list of existing members of the channel to the new user using special syntax—zero or more 353 messages containing nicknames, followed by a 366 message to signal the end of the list—before transitioning into an incremental group-maintenance phase, where separate JOIN and PART messages signal the appearance and disappearance of channel members. The Syndicate protocol for participating in an IRC channel, however, maintains assertions describing the membership of each channel. The mismatch between the IRC protocol's use of messages and the Syndicate protocol's use of assertions is addressed by trusting the server to send appropriate messages, and reflecting each 353, JOIN, and PART into appropriate assert! and retract! actions.
7 Implementation
So far, our discussion of Syndicate has been abstract in order to concentrate on the essence of the programming model, with just enough concrete detail provided to allow us to examine small, realistic examples. Now it is time to turn to techniques for making implementations of Syndicate suitable for exploration of the model using practical programs.
The notion of sets of assertions is at the heart of the dataspace model. The formal models presented in chapters 4 and 5 rely on representations of potentially-infinite assertion sets that are amenable to efficient implementations of set union, intersection, complement, and subtraction. These operations are the foundations of the set comprehensions used extensively in the metafunctions of chapter 4. For these reasons, I will begin by examining a trie-based data structure I have developed, which I call an assertion trie (section 7.1), and which represents and allows efficient operations on assertion sets. After presenting the data structure in the abstract, I will turn to concrete details of its implementation.
With a suitable data structure in hand, producing an implementation of the dataspace model can be as simple as following the formal model of chapter 4. Section 7.2 gives an overview of the core of an implementation of the dataspace model.
In addition, Syndicate offers not only a means of structuring interactions among groups of components, but also features for structuring state and reactions within a single component, as presented in chapter 5. Section 7.3 describes implementation of the full design.
Finally, by directly representing structures from the formal model in an implementation, we unlock the possibility of reflecting on running programs in terms of the concepts of the formal model. Section 7.4 describes initial steps toward programming tools that exploit this connection to assist in visualization and debugging of Syndicate programs.
While it is both intuitively apparent and borne out by experience that assertion tries are able to faithfully reflect the semantics of a certain kind of sets, and that the prototype Syndicate implementations as a whole live up to the formal models of previous chapters, this chapter will not go beyond making informal claims about these issues. The level of rigor of the implementation work thus far is the usual informal connection to what is proposed to be an underlying formal model. To get to the level of rigor of, say, VLISP (Oliva, Ramsdell and Wand 1995), we would have to formalize the claims and prove them; to get to the level of CompCert (Leroy 2009), we would have to mechanize the claims and proofs.
7.1 Representing Assertion Sets
As the model shows, evaluation of Syndicate programs involves frequent and complex manipulation of sets of assertions. While the grammar and reduction rules of the calculus itself depend on only a few elements of the syntax of assertions, namely its handful of unary constructors, much of the power of the system comes from the ability to use a wildcard in user programs specifying sets of assertions to be placed in a dataspace. The wildcard symbol is interpreted as the infinite set of all possible assertions. This leads us to a central challenge: the choice of representation for assertion sets in implementations of Syndicate.
There are a number of requirements that must be satisfied by our representation.
- Efficient computation of metafunctions. The metafunctions of section 4.6 must have efficient and accurate implementations in order to deliver correct SCN events to the correct subset of actors in a dataspace in response to an SCN action, without wasting time examining assertions made by actors that are not interested in the change represented by the SCN action.
- Efficient message routing. Some Syndicate programs make heavy use of message sending. For these programs, it is important to be able to quickly discover the set of actors interested in a given message.
- Compactness. The representation of the dataspace must not grow unreasonably large as time goes by. In particular, many Syndicate programs assert and retract the same assertions over and over as they proceed, and it is important that the representation of the dataspace not grow without bound in this case.
- Generality. Assertions are the data structures of Syndicate programming. The representation of assertion sets must handle structures rich enough to support the protocols designed by Syndicate programmers. Likewise, it must support embedding wildcards in assertions in a general enough way that common uses of wildcard are not ruled out.
In this section, I will present assertion tries: data structures based on tries that satisfy these requirements, offering support for semi-structured S-expression assertions with embedded wildcards, sufficient for the examples and case studies presented in this dissertation. In an implementation of Syndicate, these tries are used in many different contexts. First and foremost, they are used to index dataspaces, mapping assertions to actor identities; but they are also used to represent sets of assertions alone, mapping assertions to the unit value, and therefore to represent both monolithic and incremental SCNs (patches).
7.1.1 Background
A trie (de la Briandais 1959; Fredkin 1960), also known as a radix tree or prefix tree, is a kind of search tree keyed by sequences of symbols. Each edge in the trie is labeled with a particular symbol, and searches proceed from the root down toward a leaf, following each symbol in turn from the sequence comprising the key being searched for. Tries are commonly used in applications such as phone switches for routing telephone calls, and in publish/subscribe messaging middleware (Ionescu 2010; Ionescu 2011) for routing messages to subscribers. In the former case, the phone number is used as a key, and each digit is one symbol; in the latter, some property of the message itself, commonly its “topic”, serves as the key. In both cases, tries are a good fit because they permit rapid discarding of irrelevant portions of the search space. Standard data structures texts give a good overview of the basics (for example, Cormen et al. 2009, section 2-12).
Each trie node logically has a finite number of edges emanating from it, making tries directly applicable to tasks such as phone call routing, where the set of symbols at each step is finite and small. They also work well for cases of message routing where, while the set of possible symbols at each step is not finite, each subscription involves a specific sequence of symbols without wildcards.
Given two tries, interpreting each as the set of keys that it matches, it is straightforward and efficient to compute the trie corresponding to the set union, set intersection, or set difference of the inputs. However, set complement poses a problem: tries cannot represent cofinite sets of edges emanating from a node. This complicates support for wildcards, since a wildcard is supposed to correspond to the set of all possible symbols, a special case of a cofinite set.
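The set-union operation mentioned above can be sketched as follows, again assuming a hypothetical nested-dictionary representation in which a `None` edge marks a leaf; leaf values are taken to be sets here so that colliding leaves can themselves be unioned.

```python
def trie_union(a, b):
    """Union of two dict-based tries. Edges present in only one input
    are kept as-is; shared edges are unioned recursively; colliding
    leaf values (under the None edge) are combined with set union."""
    out = dict(a)
    for edge, child in b.items():
        if edge not in out:
            out[edge] = child
        elif edge is None:
            out[None] = out[None] | child  # combine leaf values
        else:
            out[edge] = trie_union(out[edge], child)
    return out

left = {"a": {None: {1}}}
right = {"a": {None: {2}}, "b": {None: {3}}}
print(trie_union(left, right))
```

Intersection and difference follow the same shape, differing only in how absent edges and colliding leaves are treated.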
Finally, tries work well where edges are labeled with unstructured symbols. Tries cannot easily represent patterns over semi-structured data such as S-expressions.
The data structure must be adapted in order to properly handle both semi-structured keys and wildcards.
7.1.2 Semi-structured assertions & wildcards
While a trie must use sequences of tokens as keys, we wish to key on trees. Hence, we must map our tree-shaped assertions, which have both hierarchical and linear structure, to sequences of tokens that encode both forms of structure. To this end, we reinterpret assertions as sequences of tokens by reading them left to right and inserting a distinct tuple-marker token labeled with the arity of the tuple it introduces to mark entry to a nested subsequence.
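This reinterpretation can be sketched in Python. In this sketch, nested tuples model S-expressions, and a `('tuple', n)` pair stands in for the arity-labeled tuple-marker token; both are representation choices made for illustration, not the dissertation's notation.

```python
def encode(sexp):
    """Flatten a nested-tuple S-expression into a token sequence,
    reading left to right. Each tuple contributes an arity-labelled
    marker token ('tuple', n) followed by the encodings of its n
    elements; atoms stand for themselves."""
    if isinstance(sexp, tuple):
        tokens = [("tuple", len(sexp))]
        for element in sexp:
            tokens.extend(encode(element))
        return tokens
    return [sexp]

# ("a", ("b", "c")) reads as: marker-2, a, marker-2, b, c
print(encode(("a", ("b", "c"))))
```

Because each marker carries its arity, no closing "pop" token is needed: the extent of each tuple is implicit in the sequence.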
Let us limit our attention to S-expressions over atoms, sometimes extended with a wildcard marker:
Wildcards are not the only option for matching multiple values. In principle, any (decidable) predicate can be used, so long as it can be indexed suitably efficiently. As examples of something narrower than wildcard, but more general than matching specific values, consider range queries over integers and type predicates. Range queries can be used in protocols involving contiguous message identifiers, both for bulk acknowledgement and for flow control. Type predicates such as Racket's number? and string? can extend our language of assertion patterns with something reminiscent of occurrence typing (Tobin-Hochstadt and Felleisen 2008).
7.1 Meaning of wild S-expressions. Each wild S-expression has a straightforward interpretation as a set of S-expressions:
The alphabet of tokens we will use consists of the atoms, plus a family of tuple-markers (not themselves atoms), each subscripted with its arity: one marker introduces a 0-ary tuple, another a unary tuple, another a pair, and so on. Matching “pop” tokens are not included: they follow implicitly from the arities on tuple-markers in sequences of tokens. Each token has an arity: for atoms it is zero, while for a tuple-marker it is the subscripted arity. We will sometimes want to extend the set of tokens with our wildcard marker.
7.2 Serialization of S-expressions. We now have the ingredients we need to read S-expressions as sequences of tokens using the following definition, along with the analogous function extended to wild S-expressions and wild tokens:
7.3 The S-expression translates to the following token sequence:
The correctness of some of our operations on assertion tries depends on the idea of a well-formed sequence of tokens; namely, one that corresponds to some S-expression.
7.4 Parsing of token sequences. We define a (partial) parsing function (and its obvious extension to wild tokens and wild S-expressions) to turn a sequence of tokens into an S-expression and an unconsumed portion of the input:
We extend both serialization and parsing to sequences of tokens representing an n-tuple of S-expressions.
7.5 Well-formed token sequences. Exactly those token sequences for which parsing is defined and yields an empty remainder are the n-well-formed token sequences:
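Parsing, and the resulting well-formedness check, can be sketched in Python under the same illustrative encoding as before, with `('tuple', n)` pairs standing in for arity-labeled tuple-markers. The sketch checks the 1-tuple case: a sequence is well-formed when it parses as exactly one S-expression with nothing left over.

```python
def parse(tokens):
    """Parse one S-expression from a token list; return (value, rest).
    A ('tuple', n) marker must be followed by n parseable values;
    any other token is an atom. Raises ValueError on malformed input."""
    if not tokens:
        raise ValueError("unexpected end of input")
    head, rest = tokens[0], tokens[1:]
    if isinstance(head, tuple) and head[0] == "tuple":
        elements = []
        for _ in range(head[1]):
            element, rest = parse(rest)
            elements.append(element)
        return tuple(elements), rest
    return head, rest

def well_formed(tokens):
    """A token sequence is well-formed iff parsing consumes it exactly."""
    try:
        _, rest = parse(tokens)
        return rest == []
    except ValueError:
        return False

print(well_formed([("tuple", 2), "a", ("tuple", 2), "b", "c"]))  # True
print(well_formed([("tuple", 2), "a"]))                          # False
```

Note that `parse` is the left inverse of the serialization sketched earlier, mirroring the round-trip properties stated below.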
7.6 Serializing an S-expression and then parsing the resulting token sequence yields the original S-expression with an empty remainder; likewise for wild S-expressions, mutatis mutandis.
Proof of 7.6. By induction on the structure of the S-expression (respectively, the wild S-expression).
7.7 If parsing a token sequence yields an S-expression together with a remainder, then serializing that S-expression and appending the remainder reconstructs the original token sequence; likewise for wild S-expressions, mutatis mutandis.
Proof of 7.7. By induction on the structure of the input (respectively, its wild counterpart).
7.1.3 Assertion trie syntax
Tries themselves are polymorphic in the values carried at their leaves, and consist of the following recursive definitions:
The notion of well-formedness developed for token sequences generalizes to assertion tries by reading token sequences along the edges in a trie stretching from the root to each leaf. The intuition is that if a path in an n-well-formed trie ends at a leaf node, then the tokens labeling that path denote exactly n S-expressions. In addition, all paths in an n-well-formed trie should be no longer than necessary. That is, if we traverse n values' worth of edges from the root of an n-well-formed trie, we will either arrive “early” at an empty node, or “exactly on time” at an empty or a leaf node, but will never end up at a branch node.
7.8 Well-formed tries. We reuse the trie syntax introduced above. The n-well-formedness predicate is defined by structural induction on the trie by three cases:
- the empty trie is n-well-formed for all n;
- a leaf trie is only ever 0-well-formed;
- a branch node is n-well-formed, for n at least 1, if its wildcard child, when present, is (n − 1)-well-formed, and the child reached along each token-labeled edge is well-formed at the index obtained by subtracting one and adding the arity of that token.
This definition deserves an illustration; the intuition is best seen by tracing the well-formedness conditions through a small example trie.
7.9 Meaning of tries. Each n-well-formed trie has an interpretation as a set of pairs of n-tuples of S-expressions and leaf values, defined in figure 36. Intuitively, the interpretation function traverses the trie, accumulating not only token sequences along paths but also a set of S-expressions that are “handled elsewhere”; when it reaches a leaf node, it interprets the accumulated sequence, and then rejects any S-expressions in the “handled elsewhere” set. When leaves carry only the unit value, a trie simply represents a set of n-tuples of S-expressions.
7.1.4 Compiling patterns to tries
Equipped with syntax for tries, we may define a function which translates wild S-expressions to tries (figure 37).
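The translation can be sketched in Python using the same illustrative encoding as before: `('tuple', n)` pairs stand in for arity-labeled tuple-markers, and a `"*"` atom stands in for the wildcard marker; both names are assumptions of this sketch, not the dissertation's notation.

```python
WILD = "*"  # assumed stand-in for the wildcard marker

def encode(pattern):
    """Flatten a (possibly wild) S-expression into a token sequence;
    atoms and WILD stand for themselves, tuples get an arity marker."""
    if isinstance(pattern, tuple):
        tokens = [("tuple", len(pattern))]
        for element in pattern:
            tokens.extend(encode(element))
        return tokens
    return [pattern]

def compile_pattern(pattern, leaf_value):
    """Build a dict-based trie from one wild S-expression. Each token
    becomes an edge; the None edge marks the leaf carrying leaf_value."""
    trie = {}
    node = trie
    for token in encode(pattern):
        node = node.setdefault(token, {})
    node[None] = leaf_value
    return trie

# A pattern matching every pair whose first element is "a":
print(compile_pattern(("a", WILD), "subscriber-1"))
```

Tries compiled from several patterns can then be combined with the union operation to form a single index.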
7.10 Compiling a wild S-expression yields a well-formed trie whose interpretation coincides with the set of S-expressions denoted by the pattern.
7.11 Consider a wild S-expression denoting the infinite set of all 4-ary tuple S-expressions with particular first and second elements and wildcards in the remaining positions. To translate this into an equivalent trie, also representing a simple set of assertions, we instantiate the leaf values with the unit value:
7.1.5 Representing Syndicate data structures with assertion tries
Syndicate implementations use assertion tries in two ways. The first application is to represent a set of assertions; here, trie leaves are unit-value placeholders. For example, such tries represent assertion sets in patch events and actions. The second application is to represent the contents of a dataspace; namely, a set of pairs of assertions and actor IDs, or equivalently a map from assertions to sets of actor IDs. Here, leaves carry sets of actor IDs.
A common operation in Syndicate implementations is relabeling, used among other things to convert back and forth between these two kinds of trie:
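A relabeling pass can be sketched as follows, again assuming the hypothetical dict-based representation in which a `None` edge holds each leaf value; the example forgets actor identities, turning a dataspace-style trie into an assertion-set-style trie.

```python
def relabel(trie, f):
    """Rebuild a dict-based trie, applying f to each leaf value
    (stored under the None edge) while leaving the edge structure
    untouched."""
    out = {}
    for edge, child in trie.items():
        out[edge] = f(child) if edge is None else relabel(child, f)
    return out

dataspace = {"ready": {None: {"actor-1", "actor-2"}}}
# Forget who asserted what, keeping only the set of assertions:
print(relabel(dataspace, lambda actor_ids: ()))
```

A production relabeling would additionally prune branches whose relabeled leaves become empty, so that the result stays in canonical form.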
7.12 Relabeling is sound: the interpretation of a relabeled trie is the interpretation of the original trie with the relabeling function applied to each associated value.
7.1.6 Searching
A straightforward adaptation of the usual trie-searching algorithm to take wildcards into account (figure 38) allows discovery of the set of actors interested in receiving a copy of a given message. Given a candidate message and a trie representing a dataspace, we first convert the message to an equivalent sequence of tokens, and then walk the trie using the token-sequence as a key:
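The walk can be sketched in Python under the illustrative encoding used throughout these sketches (`('tuple', n)` arity markers, a `"*"` wildcard edge, `None` marking leaves; all assumed names). The interesting case is a wildcard edge in the trie, which must swallow exactly one complete value's worth of tokens; the sketch does this by arity counting.

```python
WILD = "*"  # assumed wildcard edge label

def arity(token):
    """Tuple-markers carry their arity; atoms have arity zero."""
    return token[1] if isinstance(token, tuple) and token[0] == "tuple" else 0

def encode(sexp):
    if isinstance(sexp, tuple):
        tokens = [("tuple", len(sexp))]
        for element in sexp:
            tokens.extend(encode(element))
        return tokens
    return [sexp]

def compile_pattern(pattern, leaf_value):
    trie = {}
    node = trie
    for token in encode(pattern):
        node = node.setdefault(token, {})
    node[None] = leaf_value
    return trie

def search(node, tokens, i=0):
    """Collect the leaf values reachable by matching tokens against
    the trie. Both the specific edge and the WILD edge are explored;
    a WILD edge skips one whole value, tracked by arity counting."""
    if i == len(tokens):
        return {node[None]} if None in node else set()
    results = set()
    if tokens[i] in node:
        results |= search(node[tokens[i]], tokens, i + 1)
    if WILD in node:
        pending, j = 1, i  # outstanding subterms still to skip
        while pending:
            pending += arity(tokens[j]) - 1
            j += 1
        results |= search(node[WILD], tokens, j)
    return results

trie = compile_pattern(("chat", WILD), "chat-subscriber")
print(search(trie, encode(("chat", ("hello", "world")))))
```

Returning a set of leaf values directly models delivering the message to every interested actor at once.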
7.13 Sound searching. If searching a well-formed trie with the token sequence of a given message is defined, then (a) the search fails iff no entry for the message appears in the trie's interpretation, and (b) the search yields a leaf value iff that value is the unique value associated with the message in the trie's interpretation.
The algorithm can be further adapted to support wildcards embedded in messages, representing simultaneous searching for a particular infinite set of keys. This extended version of the search routine not only allows wildcards in the input token sequence but also demands a function used to combine leaf values when a wildcard input matches more than one branch of the trie. Allowing wildcards in messages gives a flavor of broadcast messaging dual to the normal type: where usually a pattern declaring interest in messages admits multiple possible messages, and each delivery includes a single message, here we may use a group of patterns each declaring interest in a single message, while each delivery includes multiple messages, by way of the wildcard mechanism.
Hybrids are also possible and useful. For example, consider an instant-messaging system where each connected user has asserted interest in the pair of their own name and the wildcard. Sending a wildcard message delivers a greeting to all connected users, while sending a specifically-addressed message delivers the greeting to a single participant.
The prototype Syndicate implementations use the search routine in a few different ways. First, as discussed, it routes a message to a set of actors; here, the extension to wildcard-carrying messages is natural. Second, in Syndicate facets, it is used in the implementation of definition 5.23 to interrogate the actor's memory when a patch arrives, to see whether a given assertion of interest was “already known” or whether it is new to the actor concerned. Finally, it finds use in filtering of messages by “firewall” proxies; see section 11.3.