Project Mage: The Elevator Pitch

(Published: 17 January 2023)

Here I am going to explain why structural editing may be useful and give you an idea of why it matters for complex applications. If you would rather see the designs and the details of the applications themselves and their features, please proceed to the next article instead: The Power of Structure.

If we are to have efficient, usable and powerful textual interfaces, we must build a way to work with information structurally. Structure allows reuse, composition and efficient programmatic access. We need to be able to build highly specialized editors that compose. And if we define a common textual interface specification that each editor implements, we can get a seamless editing experience. Such editors may edit any kind of object or structure: there are no limitations or assumptions. Contemporary text editors (such as Emacs) present a document to the world as a string, and that's a source of much grief and jankiness. In order to have specialized structural editors, it's important that we build a highly interactive, highly flexible foundation capable of graphical handling. These kinds of editors will allow us to construct applications that are more efficient, highly reusable and customizable, with features typically not even expected of their non-structural counterparts.

The project (within the limits of the campaign) aims to provide alternatives to, or a gateway to replace:

advanced text editors (e.g. Emacs),
specialized text editors (e.g. HEX editors, CSV editors etc.),
command lines (such as those of terminal emulators),
Sly/Slime (Common Lisp IDEs for Emacs),
computational notebooks (e.g. Jupyter),
note-taking applications (e.g. Roam Research, Org-mode),
version-control systems (e.g. Git, Mercurial),
bug trackers (e.g. GitHub, just without the browser).

And not just replace: improve upon. This will be especially evident for the Lisp IDE experience, for the computational notebooks and for knowledge-keeping.

A Brief Intro

First of all: forget all you know about your editor for a second, whatever you are using.

And let's just consider what you are missing instead.

And before we get into that, let's just quickly accept a few simple facts:

(a) information has structure, and
(b) structure tends to be compound.

Now, let's start with a minimal example:

"Hello, world!"

Well, does this sentence have a structure? Sure, you could imagine it being a list of words with punctuation:

("Hello" "," " " "world" "!")

Does this have further structure? Sure, the words could be grouped with punctuation like this:

(("Hello" ",") " " ("world" "!"))

Now, the way you would edit that is by building a tree of editors:

Hello , world !

In this example, the editor tree corresponds directly to the word tree, but it absolutely doesn't have to: any structure or object may be interfaced, and the fact that editors form a tree simply reflects the compound/embedded nature of most structures and how we interact with them.

If you manage to implement certain behaviors for each of those editors, you can probably see that you can work with that structure the way you would in an ordinary editor, except now you also have direct access to the underlying objects that make it up.
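To make the editor tree a little more concrete, here is a minimal sketch in plain Common Lisp. It is not the project's actual API: the names `leaf-editor`, `group-editor`, `make-editor-tree` and `render` are hypothetical, chosen only for illustration. It builds one editor per node of the nested word list, renders the text back, and then changes one underlying object directly.

```lisp
;; Hypothetical sketch: one editor per node of a nested structure.
(defclass editor () ())

(defclass leaf-editor (editor)
  ((text :initarg :text :accessor editor-text)))

(defclass group-editor (editor)
  ((children :initarg :children :accessor editor-children)))

(defun make-editor-tree (structure)
  "Strings get leaf editors; lists get group editors holding child editors."
  (if (stringp structure)
      (make-instance 'leaf-editor :text structure)
      (make-instance 'group-editor
                     :children (mapcar #'make-editor-tree structure))))

(defgeneric render (editor)
  (:documentation "Return the text an editor would display."))

(defmethod render ((editor leaf-editor))
  (editor-text editor))

(defmethod render ((editor group-editor))
  (apply #'concatenate 'string
         (mapcar #'render (editor-children editor))))

(defparameter *tree*
  (make-editor-tree '(("Hello" ",") " " ("world" "!"))))

(render *tree*) ; => "Hello, world!"

;; Direct access to an underlying object: replace the second word.
(setf (editor-text (first (editor-children (third (editor-children *tree*)))))
      "Mage")

(render *tree*) ; => "Hello, Mage!"
```

The point of the sketch is only that the displayed text and the object tree stay in step: changing a node changes what is rendered, without any string parsing.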

Going Deeper

Hey, let's build a note-taker. Take a look at the following particular note:

(create-schema power-of-structure
  (:is-a kr-note)
  (:title "The Power of Structure")
  (:tags '("seamlessly-structural editing" "power"))
  (:introduction '("Welcome to Project Mage!"
                   "The purpose of this article is [...]"))
  (:table-of-contents nil)
  (:sections (create-schema))
  (:related-links nil)
  (:footnotes nil))

Don't read into it. Just skim it over with your eyes. It doesn't matter if you understand programming, or Lisp, or anything.

This note is just an object, and its slots contain some information. One of those slots contains a title; another one holds a list of tags, etc.

Now, take a step back. Just have a look at that object again. And ask yourself: What if I could edit that?

What if this were presented to me as a textual document, which I could walk through and edit, with a freely-running point, just like in a regular editor?

Don't think of the mechanics. Think of the possibilities. Because you also now have programmatic access to the object, to the information, and you can change it however you like.

So, alright, you make an editor for that. A note editor. But there's clearly some complex structure. In fact, most structures are compound: they contain identifiable structures within.

That means we can't get away with just one uniform editor for one simple structure. This means we need editors which also may be compound and may consist of other editors.

For example: every tag belongs to the list of tags, i.e. to ("seamlessly-structural editing" "power"). So, how many editors is that? Well, the topmost one in that arrangement is the list editor (because it's a list of tags). That list editor will contain within itself an editor for each of its tags. So, here, that would yield us three specialized editors.

That's right: each tag has its own editor dedicated just to itself. We can be creating all these editors on the fly, on demand.

Each tag has its own editor?

Don't think of the performance (it will be good). Think of the possibilities. Think about embedding.
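As a rough sketch of the tag example, assuming plain CLOS classes rather than the project's own object system, the list editor below creates one child editor per tag the first time they are needed. The names `tag-editor`, `tag-list-editor`, `ensure-children` and `count-editors` are hypothetical.

```lisp
;; Hypothetical sketch: a list editor that creates one child editor per tag.
(defclass tag-editor ()
  ((tag :initarg :tag :accessor editor-tag)))

(defclass tag-list-editor ()
  ((tags :initarg :tags :reader editor-tags)
   (children :initform nil :accessor editor-children)))

(defun ensure-children (list-editor)
  "Create the per-tag editors on demand, the first time they are needed."
  (or (editor-children list-editor)
      (setf (editor-children list-editor)
            (mapcar (lambda (tag) (make-instance 'tag-editor :tag tag))
                    (editor-tags list-editor)))))

(defun count-editors (list-editor)
  "The list editor itself plus one editor per tag."
  (+ 1 (length (ensure-children list-editor))))

(defparameter *tags-editor*
  (make-instance 'tag-list-editor
                 :tags '("seamlessly-structural editing" "power")))

(count-editors *tags-editor*) ; => 3
```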

Let me give you an example for Lisp-editing.

;; This is a `Collatz function`.

(defun 3n+1 (n) ; Don't read into this either!
  (if (= n 1)
      1
      (3n+1 (if (oddp n)
                (+ (* 3 n) 1)
                (/ n 2)))))

Do you see that comment about the piece of code being a Collatz function?

Well, that whole comment could have its own little embedded editor. That editor could be that of Markdown. Or Org-mode. Or the kind of kr-note object shown previously, just with a different structure.

Hey, want a computational notebook? Shove all that Collatz code into a section in your notes:

(create-schema a-section-within-a-note
  (:is-a kr-section)
  (:introduction "Here's an example of a Lisp function:")
  (:code-block
   (create-schema
     (:code
      (list "This is a `Collatz function`."
            (create-schema
              (:is-a vertical-space)
              (:size 1))
            (convert-to-lisp-code '(defun 3n+1 (n) #|...|#))))
     (:execution-result
      (create-schema
        (:is-a execution-result)
        #|...|#)))))

Your note has a code block now. That code block contains the code object and the execution result object.

Do you see how easy this is? This is what reuse looks like. Your note-taker now has code-blocks.

How will we work with those code blocks?

We will just reuse the Lisp IDE we wrote 30 seconds ago. It is also just an editor.

So, that's right, in your note-keeping application, you have just shoved in a code-block, which will be lensed via a Lisp IDE. It won't be some fake-IDE, it won't be some trick, it will be an actual IDE taking care of your code within your textual document.

Structural Editing

Lensed? Well, I call this lensing.

Lenses: editors that embed and interoperate with each other.

Lenses are objects which are completely responsible for what they draw, for the input handling, for dealing with the structures they hold or represent or reference. They are editors that live within and interoperate with each other.

The best part? Everything forms a cohesive whole. Why? Because you could define a specification which would require lenses to implement things like:

Navigation. For instance: move the cursor to the next semantically-significant object (like a word).
Modification. Example: remove a range represented by a certain area on the screen.
Querying. Example: search for something within your objects/structures.

If you implement enough of these, and if you think through your UI, then you can have a really comfortable editing workflow (aka editing with a cursor, like in any ordinary editor), to a point where the structural cohesion will seem seamless.
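One way to picture such a specification is as a set of generic functions that every lens must implement. The following is only a sketch under that assumption: the protocol names (`next-word-start`, `remove-range`, `search-lens`) and the `text-lens` class are hypothetical, and the real specification is yet to be written.

```lisp
;; Hypothetical protocol: every lens would implement these generic functions.
(defgeneric next-word-start (lens pos)
  (:documentation "Navigation: start of the next word after POS, or NIL."))
(defgeneric remove-range (lens start end)
  (:documentation "Modification: delete the content between START and END."))
(defgeneric search-lens (lens query)
  (:documentation "Querying: position of the first match of QUERY, or NIL."))

;; One possible lens: a plain run of text.
(defclass text-lens ()
  ((text :initarg :text :accessor lens-text)))

(defmethod next-word-start ((lens text-lens) pos)
  (let* ((text (lens-text lens))
         ;; Find the end of the current word, then skip the spaces after it.
         (gap (position #\Space text :start pos)))
    (and gap
         (position-if (lambda (c) (char/= c #\Space)) text :start gap))))

(defmethod remove-range ((lens text-lens) start end)
  (let ((text (lens-text lens)))
    (setf (lens-text lens)
          (concatenate 'string (subseq text 0 start) (subseq text end)))))

(defmethod search-lens ((lens text-lens) query)
  (search query (lens-text lens)))

(defparameter *lens* (make-instance 'text-lens :text "Hello, world!"))

(next-word-start *lens* 0)   ; => 7
(search-lens *lens* "world") ; => 7
(remove-range *lens* 5 12)   ; => "Hello!"
```

A lens for a table or an image would implement the same three functions in its own way, which is what would let a cursor move across different lenses without noticing the seams.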

For instance, you could start a selection in the middle of your comment and end it in the middle of the code, then press the removal keybinding, and this will remove just what's necessary (i.e. leaving the comment and the code objects separate, in this case). Each respective lens will take care of what it's responsible for, and for the lenses it contains.
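The cross-lens deletion just described could be sketched as follows. This is an illustration only: `region-lens`, `comment-lens`, `code-lens`, `trim-range` and `remove-selection` are hypothetical names, and both lenses hold plain strings here purely to keep the example short.

```lisp
;; Hypothetical sketch: one selection is split between two sibling lenses.
(defclass region-lens ()
  ((text :initarg :text :accessor region-text)))

(defclass comment-lens (region-lens) ())
(defclass code-lens (region-lens) ())

(defun trim-range (lens start end)
  "Delete the characters between START and END inside a single lens."
  (let ((text (region-text lens)))
    (setf (region-text lens)
          (concatenate 'string (subseq text 0 start) (subseq text end)))))

(defun remove-selection (first-lens start second-lens end)
  "The selection runs from START in FIRST-LENS to END in SECOND-LENS.
Each lens trims only its own part, so the two objects stay separate."
  (trim-range first-lens start (length (region-text first-lens)))
  (trim-range second-lens 0 end)
  (list first-lens second-lens))

(defparameter *comment*
  (make-instance 'comment-lens :text "This is a Collatz function."))
(defparameter *code*
  (make-instance 'code-lens :text "(defun 3n+1 (n) ...)"))

;; Select from the middle of the comment to the middle of the code, then delete.
(remove-selection *comment* 9 *code* 7)

(region-text *comment*) ; => "This is a"
(region-text *code*)    ; => "3n+1 (n) ...)"
```

After the deletion there are still two objects, a comment and a piece of code, rather than one merged string.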

Of course, a lot of this depends on how well you can design UIs and workflows. Exactly how hard that is depends on the structure and the application. But for many, many practical applications, it should be absolutely doable. If you know how something should behave, the system will be powerful enough to allow you to program such behavior.

And look: this kind of editing isn't limited to some special objects. You can teach a lens to edit anything: a data structure, some container, some database, whatever. You can edit any sort of objects, in the general sense.

And those objects can have complex behaviors, because they can keep state. You could have a table with many layers. You could attach metadata to objects, like tag a specific function in your code with some information. That information doesn't even have to be a part of your editing workflow. And those objects will have unique identity.

See, the problem with traditional string-based editing is that you can't treat anything like an object or a specialized structure. If you can't do that, then you don't have any power whatsoever. You can't discriminate between things. You can't persistently identify unique objects. And you have barely any control over how you can enforce structure. If you don't have those powers, then you end up with the inefficiencies of interaction and performance. Everything simply becomes hard if you don't have the powers of object identification and structural enforcement. And the string-based editors don't. They assume that a general data structure that fits all purposes is fine. It's not.

String-based editors never accept the simple fact of reality that information has structure. Instead, the tooling has to take on that responsibility, post-factum.

If you are wondering whether structural editing imposes particular limitations with respect to string-based editing, there's a pretty simple point to be made: structural editing subsumes string-based editing. You can just write a specialized editor for general strings. And it would even be useful: you could use those for things you don't know how to structure yet. Or for very small things. And yet, even those string-based editors can often be specialized further, as some semantic units like words, expressions and even characters are often immediately apparent, as is their behavior with respect to the normally expected text-editing operations.

Most of the complexity of modern text editors comes from their having to deal with enormous amounts of structure. They simply can't do it, they weren't designed for it, it's not in their DNA.

So, we need a different breed of editors. The kind that can specialize, that can embed, that can interoperate. The kind over which we can define a common interface for seamless textual interaction, for comfort.

These lenses, these editors, they will be completely responsible for their graphic display, their size, the motion of the cursor within them, everything. Yes, even graphical display. One interesting example I can give you is this: imagine you have a lens for editing file-system paths. If your path leads to an image, you could then, at runtime, inherit your lens from an image editor, yielding an image editor within your otherwise textual document (without breaking the editing workflow!). Or you could display matrices or tables in a custom manner, etc.
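The path example can be approximated in standard CLOS with `change-class`, which switches the class of a live object while keeping its slots. This is only an analogy under stated assumptions: the project is described as prototype-based, so its actual mechanism may differ, and `path-lens`, `image-path-lens`, `display`, `image-path-p` and `specialize-lens` are hypothetical names.

```lisp
;; Hypothetical sketch: a path lens that becomes an image lens at runtime.
(defclass path-lens ()
  ((path :initarg :path :accessor lens-path)))

(defclass image-path-lens (path-lens) ())

(defgeneric display (lens)
  (:documentation "Describe what the lens would draw."))

(defmethod display ((lens path-lens))
  (format nil "text: ~A" (lens-path lens)))

(defmethod display ((lens image-path-lens))
  (format nil "image preview of ~A" (lens-path lens)))

(defun image-path-p (path)
  "True when the string PATH ends in a known image extension."
  (let ((dot (position #\. path :from-end t)))
    (and dot
         (member (subseq path (1+ dot)) '("png" "jpg" "jpeg" "gif")
                 :test #'string-equal)
         t)))

(defun specialize-lens (lens)
  "Switch LENS to the image variant when its path points to an image."
  (when (and (image-path-p (lens-path lens))
             (not (typep lens 'image-path-lens)))
    (change-class lens 'image-path-lens))
  lens)

(defparameter *path-lens*
  (make-instance 'path-lens :path "notes/photo.png"))

(display *path-lens*)                   ; => "text: notes/photo.png"
(display (specialize-lens *path-lens*)) ; => "image preview of notes/photo.png"
```

The object keeps its identity and its path throughout; only the way it displays itself changes.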

It is all too often presumed that structural editing is just about editing. But it's really about structure. And the capability to operate directly on structure is instrumental to your ability to wield power.

It's also often presumed that structural editing has to be clunky. Absolutely not. It's all up to your design. And the system simply has to provide the flexibility to implement that design. That design then comes down to your ability to answer one simple question: What do I want to happen here? And for most practical purposes, such answers are often immediately clear.

What Is This All About?

The applications.

I want to build

a Common Lisp IDE,
a knowledge-representation platform (flexible enough for any kind of note-taking or computational notebook functionality),
a knowledge tracker for software project-management that builds upon the knowledge platform. This will have decentralized version control. Version control will be reused with the computational notebooks.

You might be wondering: why didn't the previous structural editor efforts do this?

Well, maybe they didn't have enough scope. Maybe they concentrated too much on some language (especially the kind with very complex syntax). Maybe no one wanted to write a good application in them. Maybe they would have taken too much time. Maybe they didn't have a flexible enough foundation. Maybe making a truly seamless experience requires special design and is a bit harder than writing just a basic structural editor.

But it doesn't matter. I know and I can see that this can be done, and that the result will be very much worth it.

I wish I had a demo to show you, so that all of this wouldn't look like just some castle in the sky. Currently, there's only some Code, only a start.

Finally, there are two things that have to be done to achieve what's described here:

we have to build a flexible foundation, a GUI toolkit concentrating on tree manipulation, and
we have to define a specification for the embedded editors, so that they provide enough functionality to allow seamless editing.

I have some ideas for the specification, but it will have to be written with, and informed by, the design of actual, practical editors. For the designs of the GUI toolkit, for the information about the applications, and for a more involved discussion about this way of editing, please see the next article.

FAQ

Doesn't Tree-Sitter/LSP do this for us?

The simple response is: tree-sitter is a fixture upon a string. It's not structural, and its power and integration capabilities are too limited. For a better explanation, see this.

A universal data structure or format for storage and editing?

Nope, nope, nope. The objects that you edit can be absolutely anything, they may be edited/saved however you like, and you can build any interface to them. (The lenses/editors, on the other hand, indeed form a dynamic tree-like structure of prototype objects. But that doesn't mean your data has to be tree-like: lenses are just objects manipulating some other objects (in the general sense of the word), whatever they are.)

Performance?

As a rule, it will be better. I have seen concerns that representing, say, words as objects could hurt the memory footprint or performance. You don't have to represent them as objects! In a structural editor you choose how you want to store something: you can granularize your data and dynamically decide what the suitable structure is for any such grain, just as long as you have an editor for that grain. So performance and memory footprint may be optimized, dynamically if need be. It's all up to the programmer.
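To illustrate the point about granularity, here is a small sketch in which the text stays a single string and a word editor is created only for the word under the cursor. The names `word-editor`, `word-editor-at` and `word-text` are hypothetical, and treating a single space as the only word separator is a simplifying assumption.

```lisp
;; Hypothetical sketch: words are not stored as objects; an editor for one
;; word is created only when that word is actually needed.
(defclass word-editor ()
  ((start :initarg :start :reader word-start)
   (end :initarg :end :reader word-end)))

(defun word-editor-at (text pos)
  "Create an editor for the word around POS in TEXT, or NIL if POS is on a space."
  (unless (char= (char text pos) #\Space)
    (let ((start (let ((gap (position #\Space text :end pos :from-end t)))
                   (if gap (1+ gap) 0)))
          (end (or (position #\Space text :start pos) (length text))))
      (make-instance 'word-editor :start start :end end))))

(defun word-text (text editor)
  "The word covered by EDITOR, read straight from the underlying string."
  (subseq text (word-start editor) (word-end editor)))

(defparameter *text* "structure tends to be compound")

(word-text *text* (word-editor-at *text* 12)) ; => "tends"
```

Here the memory cost is one small object for the word being edited, not one object per word in the document.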

Do I really need all this power?

Some have also wondered if they really need such power, whether they should care, because they have been getting by. Well, it's really simple: the people who write the extensions and plugins DO care. That, of course, reflects directly onto the casual users.