Posts tagged Author: disconcision

Quasiquotation Part 6: Quasiquotation in the context of Racket/Match

:: racket

By: disconcision

A brief introduction to match:

1
2
(match <target>
    [<pattern> <template>] ...)

An individual match clause consists of a pattern and a template. The pattern is ‘matched’ against the template, either comparing pattern sub-constituents literal-to-literal, or binding literals to ‘wildcard’ variables. The template is just regular racket code. If we use quasiquote/unquote in our template, it’s going to work exactly as I’ve described in the previous part. One common use for pattern matching in Racket is term rewriting and reduction; that is, parsing and evaluating code in domain-specific languages which we ourselves create. This very often involves ‘recognizing’ some pattern of code, and transforming it into another according to a template.

For code rewriting, we want to do two things: destructure nested lists representing our code according to patterns we specify, and then restructure those pieces according to templates we specify. Quasiquotation as described above provides a rich way to restructure; wouldn’t it be nice if we could destucture our code in a way that is syntactically symmetric? That is, instead of imperatively specifying how to take apart and put together expressions using list operations, we descriptively illustrate the transformations themselves.

Using Quasiquotation to Destructure:

Say we wanted to rewrite an if form into an equivalent cond. If we already captured the three components of our original if as variables a, b, and c, we could use quasiquotation to restructure our new cond using `(cond [,a ,c] [else ,b]).

So what we’d like to write is something like this:

1
2
(match '(if #t 1 2)'
[`(if ,a ,b ,c) `(cond [,a ,c] [else ,b])])

While it’s not hard to intuit what’s going on in the left-hand side, let’s be totally explicit about what’s going on. Remember, ‘destructuring’ is something new here; just because we want to do it using the same syntax as regular quasiquotation doesn’t mean the semantics is going to be the same, although as we’ll see it’s very close.

First, we should be explicit about the way the match form treats regular, non-quoted data. Remember that in regular racket code, a symbol (say, a) appearing outside a quote-type form and outside a binding form is treated as a(n implicit) variable reference form (var-ref a). In the left-hand side of a match template, on the other hand, a ‘bare symbol’ is treated as an in-line variable binding form, (var-bind a). Remember that var-ref and var-bind are my shorthands here; they won’t work within Racket. However, if you want to be explicit about in-line binding forms in your patterns, you can use the var form. That is, both these matches are equivalent:

1
2
(match '1 [a "yes"])
(match '1 [(var a) "yes"])

So just like how in the right-hand-side (template) of a match clause, being at quasiquote level 0 means that ‘bare symbols’ are considered variable reference forms, on the left-hand-side (pattern) of a match clause, being at quasiquote level 0 means bare symbols are considered variable binding forms. Similar, at non-zero quasiquote levels, symbols are in both cases considered as symbol-literals, and lists are in both cases considered as list-literals:

1
2
3
4
(match '(3 4) [`(a ,a) `(a ,a)])
> match error
(match '(a 4) [`(a ,a) `(a ,a)])
> '(a 4)'

In the first case, the first a is at quasiquote level 1, so it is interpreted as the symbol-literal ’a. This is not the same as the literal 3, so the match fails. The second a, contained in the unquote, is at quasiquote level 0, so it would be interpreted as a binding form, if it hadn’t been so rudely interrupted by the previous mismatch. The second example finally gives it its chance to shine. On the template side, the first a is at quasiquote level 1, and so is treated as a literal; the second is at level 0 and so is treated as variable reference form.

This example is nearly fully illustrative; aside from the fact that the pattern matches and binds (destructures) and the template constructs and references (restructures), the semantics of quasiquote are almost fully symmetric, in a similar sense as the symmetry as between var-bind and var-ref.

Levels of quasiquotation within match

The only real difference is that pattern-side quasiquotation doesn’t care about higher levels of quasiquote. That is, we don’t need to (and in fact, can’t) use multiple unquotes to ’escape from’ multiple quasiquotations. An unquote always resets the quasiquotation level to 0:

1
2
3
(match '(1 `(,2))
    [`(,a `(,,b)) `(,a ,b)])
> match syntax error

I’m not completely cetain why this behavior was chosen, but it seems likely because there’s no obvious use case for multiple levels of nesting in a pattern. Since the pattern itself is not returned, after matching succeeds or fails, all that matters is the bindings that were made. Here’s my (failed) attempt to find a use for multi-level quasiquotation in patterns:

1
2
3
(match '(1 `(,2))
    [`(,a `(,b)) b])
> ',2

What if we wanted to rewrite the above to extract just the 2, without the leading unquote? Why can’t we just:

1
2
3
(match '(1 `(,2))
    [`(,a `(,,b)) b])
> error

But even if multiple levels of quasiquote were implemented, my attempted solution still doesn’t make sense. Note that in the match’s target, the unquote represents a symbol-literal, whereas in the pattern template, both unquotes represent unquote forms. It just ain’t gonna match!

More to the point, we can already accomplish what we want without modifying match, we just need to be slightly clever:

1
2
3
(match '(1 `(,2))
    [`(,a `(,(list 'unquote b))) b])
> 2

And with that, we’re done! If you’ve made it this far, you now know just about everything you need to implement quasiquotation yourself. All that stands in your way is the Fear of Macros.

Quasiquotation Part 5: Quasiquote!

:: racket

By: disconcision

A Review: Parsing Quote

First, let’s review how to (mentally) parse code when there are (simple) quote forms in the wild.

When encountering a symbol: Let’s say you see the symbol “a”. Are you inside a quote form? Then you’re looking at (quote a), a symbol literal. Otherwise, are you in a binding form like define? If so, you’re at a binding site for the identifier a. Otherwise, you’re looking at a(n implicit) variable reference form, (var-ref a).

For lists: Let’s say you see the list "(a b c)". Are you inside a quote form? If so, you’re looking at (list ’a ’b ’c), a list literal. Otherwise, does the expression denote a special syntactic form (like if)? If not, then you’re looking at a function application.

Introducing Quasiquote

Now, we introduce a new form, quasiquote, sugared as "`". The first and most important thing to note about quasiquote is that, if a quasiquote form doesn’t contain any unquotes, it behaves exactly like quote:

1
2
(quasiquote (if a b c))
> '(if a b c)
1
2
`(if a b c)
> '(if a b c)

If we use an unquote within quasiquote, the effect is precisely what we tried to achieve before: unquoted portions within a quasiquotation are evaluated normally. BUT within a quote form, both unquote AND quasiquote are just treated as symbol literals, and have no effect. Note that this is not a special case for quote; quasiquote and unquote are just treated like everything else!

1
2
3
(define a #t)
`(if ,a ,a ,a)
> (if #t #t #t)
1
2
'(if ,a ,a ,a)
> '(if ,a ,a ,a)
1
2
'`(if ,a ,a ,a)
> '`(if ,a ,a ,a)

There is one additional complexity with quasiquote, which you probably don’t have to worry for a while, but is worth thinking about for the sake of completeness: What happens if we use a quasiquote inside a quasiquote?

1
2
3
(define a 'b)
`(if `(if ,a #t #f) 1 2)
> ????

There are two ways we might think about implementing this. The first (NOT THE WAY IT ACTUALLY WORKS IN RACKET) is simply to ignore nested quasiquotes:

1
2
3
(define a 'b)
`(if `(if ,a #t #f) 1 2)
> '(if `(if b #t #f) 1 2)

After all, why would we want to deal with nested levels of quasiquotation? What would it actually even mean? However, this is NOT the implementation used in scheme/racket. The reason (I think??) is that we want to be able use quasiquotation for macros - for rewriting code - and what’s even better than rewriting code? Rewriting code that rewrites code! That is, we want the ability to use quasiquoted templates which, when evaluated, themselves return quasiquoted templates.

Thus, we have the notion of levels of quasiquotation; a parameter we must keep track of when parsing, expressed as a non-negative integer. In the example above, the unquote containing “a” is under two levels of quasiquotation. Therefore it is not evaluated:

1
2
3
(define a 'b)
`(if `(if ,a #t #f) 1 2)
> '(if (if ,a #t #f) 1 2)

If instead we unquoted a twice, we’d get the following:

1
2
3
(define a 'b)
`(if `(if ,,a #t #f) 1 2)
> '(if `(if ,b #t #f) 1 2)

Note that we’ve preserved both the nested quasiquote AND one level of unquote around “b”. So what we’re left with could itself be used as a template for further processing.

When you’re parsing code in your head, think about it like this: Regular code (not inside a quote or quasiquote form) is at quasiquote level 0. This means it is evaluated like…. wait for it… regular code.

If we’re reading code from the outside-in, every time we encounter a quasiquote, we increment the quasiquote level by 1, and every time we encounter an unquote, we decrement by one. If the quasiquote level is greater than 0, we interpret what we see (including symbols, lists, quotes, and quasiquotes) as literals. If the level reaches 0, we evaluate normally. If it goes under 0, like if we found an unquote without any surrounding quasiquote, we get an error.

That’s basically the full picture of how Racket treats quasiquotation! But like I said, for use at a single level of ’meta’ you only need to worry about quasiquote levels 0 (business as usual) and 1 (literalize until unquoted).

(Side note: There is an additional form of unquote called unquote-splicing (sugared as ,@), which is used to ’splice’ a variable representing a list into a parent list. You can read more about it here.)

Next up: The stunning conclusion: Quasiquotation Part 6: Quasiquotation in the context of Racket/Match

Quasiquotation Part 4: Unquote?

:: racket

By: disconcision

The ability to represent code literally, as data, is all well-and-good, but what if we want to actually do something with it? What if we want to build up code from smaller parts, or break it down into subcomponents?

Technically, we already have everything we need in Racket’s basic list-handling functions. We can use list-reference functions like first and rest to extract components, and list-builder functions like list, list*, and append to build up composites. For deeply nested lists though this can start to feel obscure and indirect.

To be concrete, let’s consider the case of code templates. For example, the if form is an expression which contains three other expressions, denoted below by lower-case letters:

1
'(if a b c)

Let’s say we had values for these three component expressions and wanted to use the abstract quoted form gven above as a template to build a concrete if statement:

1
2
(define bindings '((a 1) (b 2) (c 3))')
(define if-template '(if a b c))

Now, it wouldn’t be too hard to map or iterate through our template, replacing each instance of a particular symbol literal with the corresponding value given by the bindings. But notice that what we’re really doing is just manual variable substitution, something the compiler already knows how to do! Plus, if we started picking it apart by hand, the resulting code wouldn’t be directly portable to new templates. Is there a more general approach here?

We’d need some way of telling the compiler to take some part of a quoted form ‘less literally’; that is, take it as a variable reference (or function application) instead of a symbol- or list-literal. Let’s call this new form unquote, which has the sugared representation , (comma).

What we want:

1
2
3
4
5
6
(define a 1)
(define b 2)
(define c 3)
(define if-template '(if ,a ,b ,c))
if-template
> '(if 1 2 3)

Now, if you try this, you’ll see it doesn’t work. Why didn’t the language designers choose to make unquote work this way? First, remember that quote is a very core part of the language. Changing it for this kind of usage might have unintended consequences elsewhere. But also note that it limits our expressiveness. We’ve added a new form - unquote - to our language; that is, Racket code has a new bound symbol. But code is data; if unquote is code, we’d like to be able to represent code containing ’unquote as a symbol-literal. But if every time we encounter an unquote we jump out of the quote form, this just doesn’t work; a quoted form will never evaluate to a form containing an unquote. That is, there is nothing we can do to produce the following as output:

1
> '(if (unquote a) (unquote b) (unquote c))

or more tersely:

1
> '(if ,a ,b ,c)

The unquoted segments would be evaluated normally, producing either a value or an error. So we’ll have to something just slightly more complicated…

Next up: Quasiquotation Part 5: Quasiquote!

Quasiquotation Part 3: Implicit Operations in a Lispy Context

:: racket

By: disconcision

In standard Racket code, symbols are most typically employed as identifiers. The first use of an identifier is to bind a value as a variable:

1
2
3
(define a 1)
a
> 1

Note that, like variable binding, variable reference is itself a syntatic form. It is used so frequently in most languages that it is almost always implicit, but if we wanted to be explicit about what we mean when we just state a variable we might instead write:

1
2
3
(define a 1)
(var-ref a)
> 1

Note that var-ref is not actually a form in Racket; it’s just my own shorthand. So now, in context, the atomic case of quote is what allows us to refer to a symbol without it being implictly interpreted as a variable reference:

1
2
(quote a)
> 'a

Note again that although it looks like the quote form is being transformed into something, this is purely do to the fact that the Racket pretty-printer uses a syntactic sugar for quote. It’s really just giving what we put in back to us. We can confirm this by using the sugar ourselves:

1
2
'a
> 'a

Or by checking directly:

1
2
(equal? 'a (quote a))
> #t

And a quoted symbol is a literal. Specifically, a symbol-literal. At the risk of repetition: Quotation literalizes its input.

The non-atomic case for quote might seem a shorthand to express list-literals. And this is certainly a worthy feature! But there is another implicit operation linking here; our humble friend, function application. Consider that there are two central uses for lists in Racket. First, there are lists of data, constructed using the list constructor:

1
2
(list 1 2 3)
> '(1 2 3)

Let’s think carefully about why the quote is necessary. Why can’t (1 2 3) itself represent a simple list of three literal elements?

1
2
(define a-list (1 2 3))
> error

In Racket source files, bare unquoted lists are interpreted as follows: First, a bare list is checked against a list of the base syntactic forms of the language; define, if, etc. If the bare list doesn’t match any of those forms, it is expanded into a function application:

1
(define a-list (app 1 2 3))

And 1 isn’t a function, so you get an error.

To be clear, I’m not arguing that having these implicit operations for commonly used forms is in any way a bad thing. It’s a virtual necessity for a readable language. The default context for understanding the meaning of code, the semantics, is provided by the process of evaluation. It makes perfect sense, then, that we’d syntactically streamline the most frequently used tools of evaluation: variable reference and function application.

But it’s also important that we remember that, in an evaluative context, these forms are there, lurking behind the scene. Thus the quote form is necessary in order to be able to refer to source code, or really any list-structured data, in-and-of-itself, in a form which closely tracks with its appearance in the evaluative context.

In other words, the cost of implicit evaluation forms is explicit literalization forms for symbols and lists, and the quote form is how we cheat our way out of paying this cost.

We want to be able to write:

1
2
(define (f x)
    (add1 (add1 x)))

instead of:

1
2
(define (f x)
    (app add1 (app add1 (var-ref x))))

but that means we have to literalize the code as:

1
2
3
4
5
(list
    (list (lit define) (lit x))
    (list
        (list add1)
        (list (lit add1) (lit x))))

unless we use quote as a recursive shorthand for both symbol and list literals:

1
(quote (define (f x) (add1 (add1 x))))

The lesson here is that while implicit operations enhance readability, which operations we want to make implicit depends on context. Quote allows us to open windows into a mirror world, embedding literal interludes into an evaluative context. Negotiating between the two requires that we are canny mental parsers, carefully considering the context to determine the meaning of a given symbol.

Next time we’ll look at expanding our notion of quotation to abstract source code through templates.

Next up: Quasiquotation Part 4: Unquote?

Quasiquotation Part 2: An interlude on implicit operations

:: racket

By: disconcision

You see implicit operations all the time in math notation, so often that it’s easy to forget they’re there. In fact, the fact that we can (usually) forget them is kind of the point. Encapsulation, abstraction, information hiding, etc. etc. Consider the following inoffensive term (in standard mathematical notation):

f(10x)

In words, this denotes the application of some function f to ten times the value of a variable x. Straightforward, but I assert that in this term, white space - the space between the symbols - is overloaded, not once, but thrice, to represent three different implicit operations:

  1. The gap between the f and the first parenthesis represents function application (let’s call this app). Although this is perhaps the single most frequently used operation in mathematics, its simple existence is hardly even remarked on; at least outside of a programming languages class.

  2. The gap between the 1 and the 0 represents an operation which I’ll call dec, the contructor for decimal notation. This is a left-associative binary operation which means “multiply the left argument (digit) by 10 and add it to the value of the right argument”.

  3. Finally, the gap between the 0 and the x represents regular multiplication.

Even though we rarely talk about these operations, especially the first two, they are essential for understanding the semantics behind the syntax. If we fully expanded the above notation, this time using Racket prefix-order s-expression syntax, we’d get the following ‘desugared form’:

1
(app f (multiply (dec 1 0) x))

Ugly and inconvenient, but at least it’s clear exactly what we mean.

Next up: Quasiquotation Part 3: Implicit Operations in a Lispy Context

Quasiquotation Part 1: Quote

:: racket

By: disconcision

Before there was quasiquote, there was quote. Quote is a fundamental part of every lispy language, including Racket. Quote allows us to concisely represent s-expression-based data; in particular, source code. Specifically, it allows us to literalize source code by contextualizing what we mean when we use symbols and lists.

Our goal is to be able to represent source code within source code. The simplest way to represent a thing is just to present the thing itself. Unfortunately, this representation is already taken! Say we simply provide a piece of code:

1
(add1 (add1 1))

The standard semantics of stuff in a source file is to interpret that stuff as something to be executed or evaluated. In other words, the representation above typically carries with it the implicit meaning of “do this thing”; source code is usually read in the context of evaluation. By contrast, if we wish to inhabit a more literal context - to talk about the code syntax in-itself - the most traditional approach is to consider the source as a string; in other words, to “quote” it:

1
"(add1 (add1 1))"

Although almost all extant languages ultimately use strings of characters for their stored representations, this way of literalizing source code is pretty bare-bones. Especially in the context of lispy languages, where programs explictly tree-structured via nested lists. In Racket, lists are most prosaically represented via the list constructor. Let’s rewrite the above string representation of the source as a list-constructor based representation:

1
2
(list "add1"
    (list "add1" "1"))

This is… not much of an improvement, to say the least. It is far more verbose, and it still requires string representations for the atomic elements. In the case of “1”, which is a literal (bare value), it’s safe just to drop the quotes. Since literals (including 1) are distinguished by the fact that they evaluate to themselves, we are reasonably safe equivocating between their evaluative and literal representations.

1
2
> (list "add1"
    (list "add1" 1))

But we can’t just write add1 without the quotes; Racket typically treat ‘bare’ symbols as variable references. Here, add1 is an identifier representing a unary function, and in the above literal representation we just want to represent a reference to that function, not have it expanded in-line. So at minimum, we need a way to distinguish symbols-as-variable-references from symbols-as literals. To this effect, we introduce a syntactic form called quote:

1
2

Now, at this point it’s at least clear what we mean, though at the cost of almost completely sacrificing readability. Let’s try to recover it! First, we introduce the following abbreviation: ’x, which expands to (quote x):

1
2
(list 'add1
    (list 'add1 1))

But we’ve still got those pesky list constructors. I mean, ideally, we’d just want to write:

1
'(add1 (add1 1))

But what exactly would that mean? We need to decide what it means to quote a list; that is, to make a list-literal. Luckily, it’s simple to define quote in a way that makes this work. Quoting a list will imply two things: First, a bracketed structure under a quote is in fact a list: it expands to a list constructor. Second, the elements of that list are themselves quoted. For sublists, this is applied recursively. For symbols and literals, we’ve already shown what this means above. To be explicit, the above expression expands as follows:

1
2
3
4
5
'(add1 (add1 1))
(quote (add1 (add1 1)))
(list (quote add1) (quote (add1 1)))
(list (quote add1) (list (quote add1) (quote 1)))
(list (quote add1) (list (quote add1) 1)

And that’s really all there is to quote. Now it likely seems like we’ve gone a long way to get almost nothing at all. We can summarize what’s gone before as “if you want to refer to the code itself instead of its result under evaluation, stick a ’ in front of it”. This is almost a fair assessment, but let’s take a moment to carefully consider what we’ve gained.

Next up: Quasiquotation Part 2: An interlude on implicit operations

Using Quasiquotation and Pattern Matching for Syntax Rewriting

:: racket

By: disconcision

This is a miniseries of six entries introducing quotation and pattern matching as mechanisms to represent and transform syntax. Some elementary Racket/Scheme experience is assumed. The focus is on quasiquotation; part 6 will assume you have a vague grasp of what pattern matching is, although I might fill in this gap later through a tutorial on implementing pattern matching.

  1. Quasiquotation Part 1: Quote
  2. Quasiquotation Part 2: An interlude on implicit operations
  3. Quasiquotation Part 3: Implicit Operations in a Lispy Context
  4. Quasiquotation Part 4: Unquote?
  5. Quasiquotation Part 5: Quasiquote!
  6. Quasiquotation Part 6: Quasiquotation in the context of Racket/Match