A stand-alone surface syntax definition for Quarto
Requirements
Our current reader is too slow, buggy, and hard to extend. If we commit to maintaining our own code path that translates Markdown to Pandoc’s AST, we get performance and we no longer have to worry about syntax incompatibility.
Specifically, Quarto adds shortcodes, {{< like this >}}, and executable code blocks:
```{python}
print("like this")
```
These need to be handled before Pandoc sees the input stream.
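To make "handled before Pandoc sees the input" concrete, here is a minimal sketch of a pre-pass that locates shortcodes in raw .qmd text. The names `Shortcode` and `find_shortcodes` are hypothetical, and the scan is deliberately naive: it assumes shortcodes never nest and it ignores escaping and shortcodes that appear inside code spans or code blocks, all of which a real grammar would have to deal with.

```rust
/// One shortcode occurrence: byte range in the input plus the trimmed inner text.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, PartialEq)]
pub struct Shortcode<'a> {
    pub start: usize, // byte offset of the opening `{{<`
    pub end: usize,   // byte offset just past the closing `>}}`
    pub body: &'a str,
}

/// Scans `input` left to right for `{{< ... >}}` pairs.
/// Assumption: no nesting, no escapes, no awareness of code spans.
pub fn find_shortcodes(input: &str) -> Vec<Shortcode<'_>> {
    const OPEN: &str = "{{<";
    const CLOSE: &str = ">}}";
    let mut found = Vec::new();
    let mut pos = 0;
    while let Some(rel_open) = input[pos..].find(OPEN) {
        let start = pos + rel_open;
        let body_start = start + OPEN.len();
        match input[body_start..].find(CLOSE) {
            Some(rel_close) => {
                let body_end = body_start + rel_close;
                let end = body_end + CLOSE.len();
                found.push(Shortcode {
                    start,
                    end,
                    body: input[body_start..body_end].trim(),
                });
                pos = end; // continue scanning after this shortcode
            }
            // An unterminated opener is left alone as ordinary text.
            None => break,
        }
    }
    found
}
```

Returning byte ranges rather than rewritten text keeps the pre-pass side-effect free, so the caller decides whether to replace each shortcode with a placeholder or to hand it to a dedicated node in the output AST.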
Current plan
We’re going to fork https://github.com/tree-sitter-grammars/tree-sitter-markdown/.
We’ll use this to create a standalone Rust binary that takes .qmd input and produces either Pandoc JSON or Pandoc native format.
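As a rough illustration of the output side, the sketch below serializes a single heading into Pandoc's JSON representation using only the standard library. Everything here is an assumption for illustration: the `Attr` struct and the function names are hypothetical, the `pandoc-api-version` value of 1.23.1 must match whatever Pandoc version consumes the output, and the heading text is emitted as a single `Str` inline rather than being split into words and `Space` elements. A real implementation would more likely lean on a JSON library such as serde_json than on hand-built strings.

```rust
/// Pandoc attribute triple: identifier, classes, key-value pairs.
/// (Hypothetical struct mirroring Pandoc's `Attr`.)
pub struct Attr {
    pub id: String,
    pub classes: Vec<String>,
    pub attributes: Vec<(String, String)>,
}

/// Escapes `s` as a JSON string literal, including the surrounding quotes.
pub fn json_string(s: &str) -> String {
    let mut out = String::with_capacity(s.len() + 2);
    out.push('"');
    for c in s.chars() {
        match c {
            '"' => out.push_str("\\\""),
            '\\' => out.push_str("\\\\"),
            '\n' => out.push_str("\\n"),
            '\r' => out.push_str("\\r"),
            '\t' => out.push_str("\\t"),
            // Remaining control characters need a \uXXXX escape.
            c if (c as u32) < 0x20 => out.push_str(&format!("\\u{:04x}", c as u32)),
            c => out.push(c),
        }
    }
    out.push('"');
    out
}

/// Serializes an attribute as `[id, [classes], [[key, value], ...]]`.
pub fn attr_json(attr: &Attr) -> String {
    let classes: Vec<String> = attr.classes.iter().map(|c| json_string(c)).collect();
    let kvs: Vec<String> = attr
        .attributes
        .iter()
        .map(|(k, v)| format!("[{},{}]", json_string(k), json_string(v)))
        .collect();
    format!("[{},[{}],[{}]]", json_string(&attr.id), classes.join(","), kvs.join(","))
}

/// Serializes a heading whose content is one `Str` inline (a simplification).
pub fn header_json(level: u8, attr: &Attr, text: &str) -> String {
    format!(
        r#"{{"t":"Header","c":[{},{},[{{"t":"Str","c":{}}}]]}}"#,
        level,
        attr_json(attr),
        json_string(text)
    )
}

/// Wraps already-serialized blocks in a top-level Pandoc document object.
/// Assumption: API version 1.23.1; adjust to the target Pandoc release.
pub fn document_json(blocks: &[String]) -> String {
    format!(
        r#"{{"pandoc-api-version":[1,23,1],"meta":{{}},"blocks":[{}]}}"#,
        blocks.join(",")
    )
}
```

The string returned by `document_json` is the kind of payload that could be piped into `pandoc -f json`, which would let the binary be tested against Pandoc's own writers without any Lua filters in the loop.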
Opportunities
Quarto has relied (and still relies) in large part on Pandoc Lua filters for custom syntax. That limits the kinds of syntax that can be easily supported, and as a consequence our current syntax is inconsistent in a number of ways.
For example, code blocks have attributes declared on the opening fence:
```{#id .class key=value}
content
```
But equation blocks have attributes declared at the end:
$$
e = mc^2
$$ {#eq-special-relativity}
Inline equations do not support attributes at all.
Similarly, inline executable code cells have the language attribute declared inside the backticks, while non-executable inline code elements have attributes declared outside the backticks. We could improve all of this.
Notes
2:04PM
$ pandoc -f markdown -t native
## Hello {#id .class key=value}
^D
[ Header
2
( "id" , [ "class" ] , [ ( "key" , "value" ) ] )
[ Str "Hello" ]
]
Pandoc supports attributes at the end of ATX heading lines, as in the example above. This seems hard to do given the way tree-sitter-markdown is structured, because the entire line starting with ## is sent to the inline grammar.
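One possible workaround, sketched here purely as an idea, is to peel the trailing attribute block off the heading line before the remaining text reaches the inline grammar. This is not how tree-sitter-markdown works today; `split_atx_heading`, `parse_attr`, and `Attr` are hypothetical names, and the parser only accepts unquoted `key=value` pairs, requires a space after the `#` run, and ignores leading indentation and closing `#` sequences.

```rust
/// Identifier, classes, and key-value pairs from a `{...}` attribute block.
/// (Hypothetical struct mirroring Pandoc's `Attr`.)
#[derive(Debug, Default, PartialEq)]
pub struct Attr {
    pub id: String,
    pub classes: Vec<String>,
    pub attributes: Vec<(String, String)>,
}

/// Parses the text between the braces, e.g. `#id .class key=value`.
/// Returns None on any unrecognized token, so that braces which are not
/// an attribute block stay part of the heading text.
/// Assumption: values are unquoted and contain no whitespace.
fn parse_attr(inner: &str) -> Option<Attr> {
    let mut attr = Attr::default();
    for tok in inner.split_whitespace() {
        if let Some(id) = tok.strip_prefix('#') {
            attr.id = id.to_string();
        } else if let Some(class) = tok.strip_prefix('.') {
            attr.classes.push(class.to_string());
        } else if let Some((k, v)) = tok.split_once('=') {
            attr.attributes.push((k.to_string(), v.to_string()));
        } else {
            return None;
        }
    }
    Some(attr)
}

/// Splits an ATX heading line into (level, inline text, attributes).
/// Returns None when the line is not a heading under this simplified rule.
pub fn split_atx_heading(line: &str) -> Option<(usize, &str, Attr)> {
    let trimmed = line.trim_end();
    // '#' is ASCII, so the char count doubles as a byte offset.
    let level = trimmed.chars().take_while(|&c| c == '#').count();
    if level == 0 || level > 6 {
        return None;
    }
    let rest = trimmed[level..].strip_prefix(' ')?.trim();
    if rest.ends_with('}') {
        if let Some(open) = rest.rfind('{') {
            if let Some(attr) = parse_attr(&rest[open + 1..rest.len() - 1]) {
                // Only the text before the brace goes to the inline grammar.
                return Some((level, rest[..open].trim_end(), attr));
            }
        }
    }
    Some((level, rest, Attr::default()))
}
```

For the example above, `split_atx_heading("## Hello {#id .class key=value}")` yields level 2, the text `Hello`, and the parsed attribute triple. In a tree-sitter setting the same split would presumably have to happen in the block grammar or an external scanner, which is exactly the part that looks difficult.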