Sunday, June 7, 2015

How to Make View-Independent Program Models

In Part 1 of this two part series, I made an argument that the standard way of structuring program authoring tools involves a peculiar and unnecessary model/view coupling that makes a number of problems in the programming tools domain more difficult than they need to be. Here I'll be describing a simple, general way of constructing generic 'program models,' which have much in common with ASTs, but improve on them in a couple of critical ways. I'll be referring to this method as the 'path formulation' of program models.

Why this structure?

Short answer: because it's simple and general, and bears a very close relation to the essential activity in high-level program construction.

The 'path formulation,' a particular way of creating program models, comes from asking the question, "what are we really doing when writing source code?" and finding the answer, "selecting and configuring abstractions provided by a programming language." In that case, writing source code is just one way of doing a more general and essential activity: selecting and configuring abstract 'language constructs.' Here's a short dialogue illustrating the point a bit more.

So, let's say your programming language provides an abstract language construct called 'function_declaration'; it is comprised of a few parts: a 'name,' a 'return_type,' and an 'argument_list.' A configuration for this construct would be a particular 'name,' 'return_type,' and 'argument_list.' What makes it 'abstract' is that none of these entities are tied to a representation. In other words, while it has become reflex to think of these sorts of constructs in terms of character sequences, here we intentionally leave open the question of how they should look.

Making Program Models using the Path Formulation

The idea behind the path formulation is: if you have a model of a programming language in the form of a graph, then individual program models are just paths through the graph.

Here's a high-level, three step recipe for making these language/program models:

(1) Represent your programming language 'abstractly,' using a formal grammar that has no lexical section (now called an 'abstract grammar'): its fundamental units are the fundamental units from your language in the abstract, rather than character sequences. Only consider which abstractions your language should include and the rules for composing them; worry about how to represent them visually elsewhere. For example, we can say that a 'class' is made up of a 'name,' a set of 'variables,' and a set of 'methods,' without any assumptions about how these things are going to look. (I talk about this in the first part, too.)

(2) Convert your abstract grammar into a graph, which will serve as the 'model' for your language. (I have some Java code that will do this for ANTLR grammars, btw, which I can clean up and share on Github if there's interest.)

(3) Represent individual programs as specific paths through the language graph.

So, if your grammar contains a subsection like this ('class_reference' is for inheritance, indicating a parent class):

: class+
: name
           : etc. etc. etc.

Your graph-based language model will have a subsection like this (black nodes are 'language constructs,' orange nodes are 'language atoms'):

And a particular program in your language, consisting of one simple class might look like this:

These paths could be represented by just listing the edges taken—though of course you have to number the edges:

Taking that approach, the model of our simple program looks like this:

 (2 2 0 1 3 0 0 3)

If we would like a little structure in our model representation, we can distinguish between 'language constructs' and 'language atoms' (black nodes and orange nodes), by using opening and closing parentheses to mark the start and end of language constructs.

(2 (2 0 1 (0... 1... 2... 3) 0 0 3))

Note: the ellipses are where we followed some hypothetical edges in 'function declaration.'

Language constructs are just composite abstractions, made up of more than one part; language atoms cannot be broken into smaller units. Actually, 'language constructs' and 'atoms' have a relationship that mirrors S-expressions in Lisp, so it's no surprise that the notation for program models resembles Lisp (here, however, we aren't tied to using this as the visual interface for the programmer).

Using Insight from the Language Model

A big advantage of the path formulation of program models is that it maintains a strong connection between elements of a user's program, and the programming language itself (since programs are paths in the language graph). This connection can be taken advantage of by programming tools to guide programmers in using the language.

As an example, let's say a programmer has just selected a 'class' construct (maybe by typing out the keyword 'class'—or in some new way); the program model now contains a node for that 'class' construct, and the editor, just by examining the 'class' node in the language model, knows all the legal options for proceeding, because each option corresponds to an edge going out of the 'class' node:

To make this as concrete as possible, I'll show one possible UI that takes advantage of this connection to the language graph. Keep in mind, though: you could still render it like a traditional text editor, using the UI contemporary IDEs use for 'auto complete' to display alternatives. It's like automatic auto complete for all aspects of the language—at the least, this would be tremendously useful to new users of a language.

In this hypothetical editor, the UI is split into two main sections: the top is our document, which is just a rendering of the program model; and the bottom contains controls for selecting and configuring language constructs.

Let's say we've just instantiated a class definition by supplying all the necessary parameters; it has the name 'InputHandler,' and our editor has 'collapsed' it, so we just see the name and type:

Since our editor is following along in the language graph, it knows we're back in the 'program' node, from which point we can begin specifying either a class or an interface. Let's say we select 'class.' Our editor now looks like this:

Notice the whole 'class' construct and the 'name' section have red borders; this is to indicate that 'class' hasn't been fully instantiated: it still has 'free' parameters that must be bound to something (in this case 'name' must be bound to something). Also notice that in the bottom section, the options appearing in the grid are just the neighbors of the 'class' node in the language graph. 

I imagine that in using a system like this, the cells in the bottom area would map to keys on your keyboard: this way you could accomplish a task like creating the skeleton of a new method declaration with a single keystroke. Something along these lines would also be much better than text editors for programming with virtual/augmented reality systems and mobile devices. Anyway, this UI is a just a quick sketch of one possible approach. The document region could also be something like this (in that video, I'm rendering the AST in manner identical to how I'm suggesting we render program models).


It's been a long time since we laid down the character sequence and parsing-based architecture of program authoring tools, and contemporary work on programming languages is deeply invested in that established approach. It seems like the program model approach could be an improvement—but who knows what lethal oversights might still be lurking. What's especially needed at this point is a concrete implementation. I'm working on it, slowly, in my free time—but my hope is that others will read the ideas here, and if the they prove to be generally interesting after all, expand and solidify them into serious tools that will improve the experience of programming for the upcoming years. If you'd like to hire me to work on something related, I can be contacted at 'westoncb[at google's mail service]' (or even if you just want to talk about it—though the comment section is probably best for that).



Identifiers etc.
Let's take a look at the abstract program graph one last time:

Notice that 'class reference' and 'name' are language atoms, but these things need structure of their own, so where does that come from? First I'll point out the reason we don't include identifiers in the language definition is that they are mnemonics for humans, not part of the abstract structure of a language; all the language needs is a unique identifier, so we just generate a random one. As for the mnemonic, maybe it should be a string, maybe something else—accordingly we leave it outside of the language spec. In the program model, we just attach the random ID (discussed earlier, too) which can be associated to some representation specified by the programmer (probably totally without their notice, by just typing in their editor as always). A program model with IDs might look like:

(2 (3 (#49843345) 0 0 (#95728745) 0  1 (... ... ... 3) 4))

There's an external file that maps these IDs to representations (often it's just a string) for editors to use.

The following section discusses how to handle the 'class reference' node and others like it.

Referencing the Program Model
There is an aspect of programming languages that isn't captured by the 'abstract grammar' that I've described so far. The abstract grammar only allows us to describe 'free' language constructs which, when supplied with specific parameters, are 'instantiated'; program models contain only instantiated language constructs. However, the 'abstract grammar' should describe the full capabilities of the language, and programming languages always contain mechanisms for referencing already instantiated language constructs: e.g., I have instantiated a 'function_declaration,' which had its 'name' parameter bound to the value 'testFunction'; other parts of my program should be able to reference this specific, instantiated 'function_declaration' by using it's 'name,' 'testFunction,' as a reference.

To be honest, I'm very curious to hear other people's ideas on how to go about doing this, though I do have an approach that seems like it would work well: extend the notation of our 'abstract grammar' (which is just some variation of BNF at the moment) to express 'queries' on program models: i.e. "select all the nodes from the program model of type 'class'." More concretely, let's say our 'class' construct is defined as follows:

: name

(The 'class_reference?' component is used to reference a parent class.)

'class_reference' would be defined in our grammar as follows (except using some appropriate notation, not English):

           : "select nodes on program model of type 'class' in same 'package'"

So, in order to instantiate a 'class_reference' the programmer would have to select a node from the program model that meets the criteria in the query. Ideally, the programmer's IDE would parse the query, run it on the program model, and offer up a selection of valid nodes. Present IDEs do this sort of thing of course, but including the necessary information in a unified, abstract language specification would be beneficial.


  1. Fascinating appealing ideas. I've programmed for decades (my original background was electronics engineering), but I'm "grokking" what you're saying here.

    I wonder if "blowing open" the notion of having to use text editors like vim and emacs to being able to express "programming" in more abstract levels would attract more smart people to programming.

    Lastly, it's interesting to see attempts a metaprogramming applied to create "domain specific language" extensions (Ruby pushes this hard) as a substitute for what you're advocating here.

    1. Thanks Scott! That's an interesting question whether operating at a higher level of abstraction would attract more smart people... Seems like there's a reasonable chance of that: there's too much of dealing with these little issues that aren't really related to the problem you're trying to solve, but instead arise from how you're trying to solve it... Not exactly the most enjoyable aspect of programming.

      I don't really see metaprogramming language extensions as an alternate solution. Extending a language adds to the set of available language constructs that programmers select from and configure when coding; what I'm describing here is largely a way of making it easier for tools to convey to programmers what the available constructs/configurations are at any given moment. I do think if we had tools like this that domain-specific language extensions would become more popular, because the burden of learning/remembering them would be lessened.

      Interesting related excerpt from an old interview with James Gosling (and Ritchie and Stroustrup—was on Hacker News today):

      "It just feels like there's so much territory out there that's beyond the bounds of ASCII text that's just line after line, roughly 80 characters wide, mostly 7-bit ASCII, something that you could type in on a teletype. That's still where programming basically is these days, and it's proven to be a very hard thing to get anything beyond that."

      I'm all for the merits of textual visual representations, but the way we interact with code by managing character sequences does indeed seem old-fashioned, the result of the historical origins of our tools, and unnecessary in the present day.

  2. Very interesting ideas. A large part of the reason that people like the textual representation is that it is incredibly information dense, and universally easy to read and display, so I think it would be hard to move away from that entirely.

    It would be cool to apply this as a sort of preprocessing step - generate real code to be parsed by a compiler, but utilize some kind of highly efficient graph editor. Due to the fact that the editor would fully understand the abstract representation of the language, it could offer very high level tools fairly easily.

    For example, meta-programming could be accomplished via a generative graph algorithm, without having any intermediate serialization steps. Querying the codebase would be very easy programatically, and developers could perform very powerful graph matching searches without the horrible pain of regex.

    A special purpose editor that could facilitate that kind of thing would be fantastic, but is no small endeavour and isn't going to become usable overnight. Many existing editors could benefit from some of these ideas through plugins and extensions. We humans have gotten pretty good at understanding and manipulating textual representations, so why not just augment that with amazing tooling that makes writing code happen language-construct by language-construct, rather than letter by letter.

    First off, you could take the text in the editor, compute its graph representation and find what point in the graph contained the cursor. From there, many very useful developer aids could be implemented. As a basic example, you expose corrolate graph traversals as set of keyboard commands that move the cursor between nodes. I.e. go to parent, cycle through siblings, cycle through descendants.

    Going on step further... Every time the cursor moves in the graph, recalculate the set of options that they have and display them as tab completion options (same idea as the bottom menu in your UI.) When the user tab completes something like an empty class, the editor could guide them through all of the parameters of that empty class via keyboard shortcuts. A very similar concept is referred to as "snippets" in Sublime Text land. This could extend to both generating code, and referencing code. When the user is at the point in the graph where a reference could be made, you could query the graph for all applicable references and offer them as tab completion options.

    I think it would be really interesting to see some of this functionality implemented, at least in a simple case. This option could effectively utilize the highly information dense and easy to transfer format of text, while still offering the tooling of the graph approach. This could significantly speed up the editing process by taking most of the formatting and text creating work out of the hands of the developer. Spelling errors and syntax mistakes could be reduced, while still allowing developers to do that text manipulations themselves if that is faster than the graph approach.

    Of course, this sort of behavior is highly dependent on the language being written. For example, some languages have type systems or scope conventions that would allow you to significantly reduce the number of tab completion options for references. Ideally, the plugin/extension itself would be as language agnostic as possible, relying on language grammars and semantics defined in an easily-modifyable way.

    1. Hey Corey—thanks for the comment! Just to clarify, my intention with the system I describe here is still to work with primarily text-based languages (it's just a different way of working with text that doesn't require parsing). Here's my project that inspired this, as an example of the kind of interface I'm talking about:

      It's nice to see some enthusiasm for what you call a 'graph editor.' I get the feeling there were some old attempts at these which didn't work out so well, and the majority of academics working on programming tools now are of the opinion that it's been tried and failed. My thinking is that the basic idea is still good, but implementations have been troublesome. Additionally, I think the the language representation scheme presented here is part of the solution to better implementations: it would make developing a graph editor way simpler. I've done a lot of work on building a graph editor in a more traditional fashion (I made something like a 'generator generator' that takes in an ANTLR grammar for some language and generates a graph like you describe, which can be walked to generate code), and I've seen some of the difficulties there, which mostly come down to going back and forth between text and models (language graph or AST), whereas this is a simple way of staying in the model realm the whole time except when it's requested that a program model be rendered (which then works like rendering the DOM tree in a web browser).

      I've found one person trying to implement something like what I describe in these articles: —dunno how serious they are about it, though. I may work on something related one day, but I'm cutting out extracurricular programming work for personal reasons for the near future :/

  3. The problem is that necessarily only the models that the humans habitually observe and maintain the simplicity of will remain simple. If you have another way of grabbing the code, working naturally with things that are simple from that perspective will introduce what appears like complexities from the textual view. Unless you continually return to the textual view and maintain its simplicity, the textual simplicity will be traded for simplicity in the other view.

    A slightly more concrete example: We're often viewing things graphically, so we add a control where you can change a color associated with each function (or whatever). The coloring is natural and useful in the graphical view-- but now if you view the program as text it has noisy notes everywhere about what colors things are.

    I think we should experiment with new ways of viewing programs, but I don't think we can have our cake and eat it too, I think that changing how we view programs also changes what those programs are and we have to abandon old views to find new ones.

  4. Hey Brett—thanks for reading!

    When you say "... I think that changing how we view programs also changes what those programs are and we have to abandon old views to find new ones." —that's definitely true of classical programming language architectures, but the reason for it is exactly what I'm attempting to address with this idea of 'program models'.

    The trick is to contain the meaning of programs inside 'program models' which are more abstract than particular visualizations of programs. When I say abstract there, there's a very specific thing I mean: whenever you have two representations of the same thing where one contains less information than the other, that one is more abstract. For example, let's say we have two models of a Person:

    Person(name, height, birthdate, weight, voice, eye_color, hometown)


    Person(name, height)

    —the second model is more abstract.

    So, what I'm proposing here is an abstract model of computer programs:

    ComputerProgram(types, semantics)

    When it comes time to visualize a particular program, we just make that model more concrete—we just add information:

    ComputerProgram(types, semantics, appearance)

    'appearance' is interchangeable; you can keep everything meaningful about how the program operates while replacing the appearance. Additionally, the structures of separate appearance models are totally independent; so, for your example where we add color to each function in one view, that information would disappear completely once you swapped in a text view.

    There's more information on this aspect in the first post (this is the second in a two part series):

    Lemme know if I can clarify more—I'm still trying to figure out the best way of explaining this stuff...