Package details

ast-generator

nvie · 4.3k · 0.6.0

Helper to generate a TypeScript or JavaScript module for an arbitrary AST definition from a specification.

ast, abstract, syntax, tree

README


AST generator is a command-line tool that helps you generate TypeScript code for arbitrary ASTs.

flowchart LR
    A["ast.grammar"] --> B["Run generate-ast"]
    B --> C["generated-ast.ts"]

    A@{ shape: doc }
    B@{ shape: proc }
    C@{ shape: doc }

It’s recommended to create the following standard file structure:

mylang/
  ast.grammar        // The input grammar
  generated-ast.ts   // The generated TypeScript module
  index.ts           // You define semantics here

Example grammar

Let’s define an example AST for a simple drawing program.

The following grammar definition (in a file called ast.grammar) describes three nodes (Document, Circle, Rect) with various properties, and one union (Shape).

// In ast.grammar

Document {
  version?: 1 | 2
  shapes: Shape*
}

Shape =
  | Circle
  | Rect

Circle {
  cx: number
  cy: number
  r: number
}

Rect {
  x: number
  y: number
  width: number
  height: number
}

What will be generated?

This definition will generate a TypeScript module with the following things in it.

Types for nodes and unions

export type Node = Document | Circle | Rect

export type Document = {
  type: "Document"
  version: 1 | 2 | null
  shapes: Shape[]
}

export type Shape = Circle | Rect

export type Circle = {
  type: "Circle"
  cx: number
  cy: number
  r: number
}

export type Rect = {
  type: "Rect"
  x: number
  y: number
  width: number
  height: number
}

Constructors for nodes

Each node type gets a constructor function, named after the node in lowercase, that builds a node of that type.

export function document(version: 1 | 2 | null, shapes: Shape[]): Document {}
export function circle(cx: number, cy: number, r: number): Circle {}
export function rect(x: number, y: number, width: number, height: number): Rect {}

[!NOTE]
There is no constructor for Shape: a shape is always either a Circle or a Rect.

Predicates for nodes and unions

// Predicates for all nodes
export function isDocument(value: unknown): value is Document {}
export function isCircle(value: unknown): value is Circle {}
export function isRect(value: unknown): value is Rect {}

// Predicates for all unions
export function isNode(value: unknown): value is Node {}
export function isShape(value: unknown): value is Shape {}
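To make the generated API more concrete, the sketch below hand-writes what the Circle and Rect constructors and the predicates might look like. It is an illustration under assumptions, not the actual output of generate-ast: the real module also covers Document, adds properties such as .children and .descendants, and wires in the semantics, and its internals may differ. The hasType helper is hypothetical.

```typescript
// Hypothetical, hand-written approximation of the generated code for the
// Circle and Rect nodes; NOT the actual output of generate-ast.
type Circle = { type: "Circle"; cx: number; cy: number; r: number }
type Rect = { type: "Rect"; x: number; y: number; width: number; height: number }
type Shape = Circle | Rect

// Constructors: the lowercased node name builds a node and fills in the discriminator
function circle(cx: number, cy: number, r: number): Circle {
  return { type: "Circle", cx, cy, r }
}

function rect(x: number, y: number, width: number, height: number): Rect {
  return { type: "Rect", x, y, width, height }
}

// Hypothetical helper: is this a non-null object carrying the given discriminator value?
function hasType(value: unknown, type: string): boolean {
  return (
    typeof value === "object" &&
    value !== null &&
    (value as { type?: unknown }).type === type
  )
}

// Node predicates check the discriminator field
function isCircle(value: unknown): value is Circle {
  return hasType(value, "Circle")
}

function isRect(value: unknown): value is Rect {
  return hasType(value, "Rect")
}

// Union predicates are the disjunction of their members' predicates
function isShape(value: unknown): value is Shape {
  return isCircle(value) || isRect(value)
}

const shapes: Shape[] = [circle(10, 10, 5), rect(0, 0, 10, 10)]
console.log(shapes.map(isCircle)) // [ true, false ]
console.log(isShape({ type: "Document" })) // false
```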

Usage

This definition will generate a TypeScript module you can use as follows in your index.ts:

import type { Document, Shape, Rect, Circle } from "./generated-ast"
import { document, rect, circle } from "./generated-ast"
import { isShape } from "./generated-ast"

Alternatively, you can import the whole module as a namespace with import * as:

import * as G from "./generated-ast"

A full example:

import * as G from "./generated-ast"

const mydoc = G.document(1, [
  G.circle(10, 10, 5),
  G.rect(0, 0, 10, 10),
  G.circle(20, 20, 10),
])

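const first = mydoc.shapes[0]
console.log(first.type) // "Circle"
if (G.isCircle(first)) {
  // isCircle narrows first from Shape to Circle, so .cx type-checks
  console.log(first.cx) // 10
}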

console.log(G.isShape(mydoc)) // false
console.log(G.isShape(mydoc.shapes[0])) // true
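Because every node carries a literal discriminator (type by default), TypeScript can also narrow a Shape with a plain switch. The sketch below redeclares the Circle, Rect and Shape types by hand so it runs without the generated module; describeShape is a hypothetical helper, and the never check makes the compiler flag any Shape member you forget to handle.

```typescript
// Hand-written copies of the generated types, so this sketch is self-contained
type Circle = { type: "Circle"; cx: number; cy: number; r: number }
type Rect = { type: "Rect"; x: number; y: number; width: number; height: number }
type Shape = Circle | Rect

// Hypothetical helper: switching on the discriminator narrows the union
function describeShape(shape: Shape): string {
  switch (shape.type) {
    case "Circle":
      // shape is narrowed to Circle here, so cx, cy and r are accessible
      return `circle at (${shape.cx}, ${shape.cy}) with radius ${shape.r}`
    case "Rect":
      // shape is narrowed to Rect here
      return `rect at (${shape.x}, ${shape.y}), ${shape.width}x${shape.height}`
    default: {
      // If a new member is added to Shape, this assignment stops compiling
      // until the switch handles it
      const unhandled: never = shape
      throw new Error(`Unhandled shape: ${JSON.stringify(unhandled)}`)
    }
  }
}

const first: Shape = { type: "Circle", cx: 10, cy: 10, r: 5 }
console.log(describeShape(first)) // circle at (10, 10) with radius 5
```

With a custom discriminator such as _kind, the same switch would be written over shape._kind instead.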

Settings

To change the default discriminator field on all nodes:

// In ast.grammar
settings {
  discriminator = "_kind"
}

This produces node types like the following:

export type Document = {
  _kind: "Document" // 👈
  version: 1 | 2 | null
  shapes: Shape[]
}

export type Circle = {
  _kind: "Circle" // 👈
  cx: number
  cy: number
  r: number
}

You can use the following settings to configure the generated output:

Setting         Default value        Description
output          "generated-ast.ts"   Where to write the generated output (relative to the grammar file)
discriminator   "type"               The discriminating field added to every node to identify its node type

Assigning semantic meaning to nodes

An abstract syntax tree represents something you want to give meaning to. To give it meaning, you can define custom properties and methods that become available on every node.

For example:

// In ast.grammar

semantic property area
semantic method prettify()
semantic method check()

Document {
  version?: 1 | 2
  shapes: Shape*
}

// etc

[!NOTE]
Don't forget to re-run the code generator after changing the grammar.

After this, it is as if every node (Document, Circle, Rect) has an area property and prettify and check methods.

But what does the area property return? And what do prettify or check do? That’s completely up to you!

Defining a semantic property

In your index.ts, let’s define the area property:

// index.ts
import * as G from "./generated-ast"

declare module "./generated-ast" {
  interface Semantics {
    area: number // 1️⃣
  }
}

// 2️⃣
G.defineProperty("area", {
  Circle: (node) => Math.PI * node.r * node.r,
  Rect: (node) => node.width * node.height,
})

const mydoc = G.document(1, [
  G.circle(10, 10, 5),
  G.rect(0, 0, 10, 10),
  G.circle(20, 20, 10),
])

console.log(mydoc.shapes[0].area) // 78.54
console.log(mydoc.shapes[1].area) // 100
console.log(mydoc.area) // Error: Semantic property 'area' is only partially defined and missing definition for 'Document'

Step 1️⃣ is to augment the Semantics interface. This will make TypeScript understand that every node in the AST will have an area property that will be a number.
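The augmentation works through TypeScript's declaration merging: separate declarations of the same interface combine into one. The self-contained sketch below shows merging inside a single file, whereas the README augments the interface of the "./generated-ast" module. The line type Circle = ... & Semantics is a hypothetical guess at how node types pick up the merged members, not taken from the generated code.

```typescript
// Two declarations of the same interface merge into a single Semantics
// interface. Module augmentation (declare module "./generated-ast") relies
// on this same declaration-merging mechanism.
interface Semantics {
  area: number
}

interface Semantics {
  prettify(): string
}

// Hypothetical: a node type that carries the merged semantics. This only
// illustrates the typing idea; the generated module may be wired differently.
type Circle = { type: "Circle"; cx: number; cy: number; r: number } & Semantics

const c: Circle = {
  type: "Circle",
  cx: 0,
  cy: 0,
  r: 2,
  area: Math.PI * 2 * 2, // required by the first declaration
  prettify() {
    // required by the second declaration
    return `Circle(r=${this.r})`
  },
}

console.log(c.area.toFixed(2)) // 12.57
console.log(c.prettify()) // Circle(r=2)
```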

Step 2️⃣ is to define how the area property is computed for each node type you specify. The return types must match the type declared in the Semantics augmentation.

Note that in this case, we defined the property partially. An area is not defined on the Document node type. This is a choice. If it makes sense, we could also choose to implement it there, for example, by summing the areas of all the shapes inside it.
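Summing the shapes, as suggested above, can be sketched with a plain recursive function instead of defineProperty so that it runs without the generated module. The types are hand-written copies; with the real library you would instead add a Document entry next to Circle and Rect in the defineProperty call. Overlapping shapes are simply counted twice.

```typescript
// Make this file a module so the local Node/Document types do not clash
// with the DOM globals of the same name
export {}

// Hand-written copies of the generated types
type Circle = { type: "Circle"; cx: number; cy: number; r: number }
type Rect = { type: "Rect"; x: number; y: number; width: number; height: number }
type Shape = Circle | Rect
type Document = { type: "Document"; version: 1 | 2 | null; shapes: Shape[] }
type Node = Document | Circle | Rect

// Area for every node type: a document's area is the sum of its shapes' areas
function area(node: Node): number {
  switch (node.type) {
    case "Circle":
      return Math.PI * node.r * node.r
    case "Rect":
      return node.width * node.height
    case "Document":
      return node.shapes.reduce((sum, shape) => sum + area(shape), 0)
  }
}

const mydoc: Document = {
  type: "Document",
  version: 1,
  shapes: [
    { type: "Circle", cx: 10, cy: 10, r: 5 },
    { type: "Rect", x: 0, y: 0, width: 10, height: 10 },
    { type: "Circle", cx: 20, cy: 20, r: 10 },
  ],
}

console.log(area(mydoc)) // 100 + 125 * Math.PI, roughly 492.7
```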

Defining semantic methods

// index.ts
import * as G from "./generated-ast"

declare module "./generated-ast" {
  interface Semantics {
    area: number

    // 1️⃣ Add these
    prettify(): string
    check(): void
  }
}

// 2️⃣
G.defineMethod("prettify", {
  Node: (node) => JSON.stringify(node, null, 2),
})

// 2️⃣
G.defineMethod("check", {
  Circle: (node) => {
    if (node.r < 0) {
      throw new Error("Radius must be positive")
    }
  },

  Rect: (node) => {
    if (node.width < 0 || node.height < 0) {
      throw new Error("Width and height must be positive")
    }
  },
})

const mydoc = G.document(1, [
  G.circle(10, 10, 5),
  G.rect(0, 0, 10, 10),
  G.circle(20, 20, 10),
])

console.log(mydoc.prettify()) // the whole document, pretty-printed as JSON
mydoc.shapes.forEach((shape) => shape.check()) // passes: no negative radius, width, or height

Should I use a property or method?

It depends on what you want. Both are evaluated lazily, but a property is evaluated at most once per node and then cached, whereas a method is re-evaluated every time you call it.

To clarify the difference, suppose you define a random semantic once as a method and once as a property, with the same implementation.

G.defineMethod("random", {
  Node: (node) => Math.random(),
})

mynode.random() // 0.168729
mynode.random() // 0.782916

Versus:

G.defineProperty("random", {
  Node: (node) => Math.random(),
})

mynode.random // 0.437826
mynode.random // 0.437826 (cached!)
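The observable difference can be mimicked without the library. In this sketch, cachedProperty is a hypothetical helper that memoizes a computation per node in a WeakMap, which is one common way to get "at most once per node" behavior; it is not necessarily how ast-generator caches properties. A call counter replaces the random numbers so the effect is easy to check.

```typescript
type AnyNode = { type: string }

// Hypothetical helper: wraps a computation so it runs at most once per node
function cachedProperty<N extends object, T>(compute: (node: N) => T): (node: N) => T {
  const cache = new WeakMap<N, T>()
  return (node) => {
    if (!cache.has(node)) {
      cache.set(node, compute(node))
    }
    return cache.get(node) as T
  }
}

let calls = 0
const compute = (_node: AnyNode): number => {
  calls++
  return Math.random()
}

const randomProp = cachedProperty(compute) // behaves like a semantic property
const randomMethod = compute // behaves like a semantic method

const mynode: AnyNode = { type: "Circle" }
console.log(randomProp(mynode) === randomProp(mynode)) // true: computed once, then cached
randomMethod(mynode)
randomMethod(mynode)
console.log(calls) // 3: once for the property, twice for the method
```

A WeakMap is used so that the cache does not keep nodes alive after the rest of the program drops them.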

Cross-calling

Both methods and properties can use other semantic properties or methods in their definitions, which makes them very powerful. As long as the definitions do not depend on each other in a cycle (which would loop forever), you’re free to combine them however you like.

For example, in the definition of check, we could choose to rely on the area property:

G.defineMethod("check", {
  Circle: (node) => {
    if (node.area < 0) {
      throw new Error("Area must be positive")
    }
  },
  Rect: (node) => {
    if (node.area < 0) {
      throw new Error("Area must be positive")
    }
  },
})
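Cross-calling works because each definition is evaluated lazily, on demand. The sketch below shows the idea with a hypothetical lazyProperty helper: check relies on area just like the definition above, and a deliberately circular pair of properties is caught by tracking which nodes are currently being computed. Whether ast-generator itself detects such cycles, or simply recurses until the stack overflows, is not stated in this README.

```typescript
// Hypothetical helper: a lazily computed, cached property that throws on a
// circular definition instead of recursing until the stack overflows.
function lazyProperty<N extends object, T>(
  name: string,
  compute: (node: N) => T,
): (node: N) => T {
  const cache = new WeakMap<N, T>()
  const inProgress = new WeakSet<N>() // nodes currently being computed
  return (node) => {
    if (cache.has(node)) return cache.get(node) as T
    if (inProgress.has(node)) {
      throw new Error(`Circular definition while computing '${name}'`)
    }
    inProgress.add(node)
    try {
      const value = compute(node)
      cache.set(node, value)
      return value
    } finally {
      inProgress.delete(node)
    }
  }
}

type Circle = { type: "Circle"; r: number }

// Cross-calling: check relies on area, like the check definition above
const area = lazyProperty("area", (node: Circle) => Math.PI * node.r * node.r)
function check(node: Circle): void {
  if (area(node) < 0) throw new Error("Area must be positive")
}

// A deliberately circular pair: a depends on b, and b depends on a
const a: (node: Circle) => number = lazyProperty("a", (node: Circle) => b(node) + 1)
const b: (node: Circle) => number = lazyProperty("b", (node: Circle) => a(node) + 1)

const unit: Circle = { type: "Circle", r: 1 }
check(unit) // passes: the area is computed on demand and cached
try {
  a(unit)
} catch (e) {
  console.log((e as Error).message) // Circular definition while computing 'a'
}
```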

Partial or exhaustive?

When authoring semantic properties or methods, you can choose to define them partially (e.g. not all node types necessarily have an area) or to define them exhaustively (e.g. all nodes should have a prettify() output defined). This depends on the use case.

When defining the semantics, you can pick between:

  • defineProperty() allows partial definitions
  • definePropertyExhaustively() will require a definition for every node type

The benefit of using definePropertyExhaustively is that if you add a new node to the grammar, TypeScript will help you remember to also define the semantics for it.

Similarly:

  • defineMethod()
  • defineMethodExhaustively()
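The difference between the partial and exhaustive variants comes down to the type of the handler object. The sketch below is a hypothetical, self-contained approximation (defineExhaustively and definePartially are illustrative names, not the library's exports): a mapped type over the discriminator values requires one handler per node type, while wrapping it in Partial makes every handler optional, so a missing case only shows up as a runtime error.

```typescript
// Make this file a module so the local Node/Document types do not clash
// with the DOM globals of the same name
export {}

type Circle = { type: "Circle"; cx: number; cy: number; r: number }
type Rect = { type: "Rect"; x: number; y: number; width: number; height: number }
type Document = { type: "Document"; version: 1 | 2 | null; shapes: (Circle | Rect)[] }
type Node = Document | Circle | Rect

// One handler per node type, keyed by the discriminator value
type Handlers<R> = {
  [K in Node["type"]]: (node: Extract<Node, { type: K }>) => R
}

// Exhaustive: every key is required, so a missing node type is a compile error
function defineExhaustively<R>(handlers: Handlers<R>): (node: Node) => R {
  return (node) => (handlers[node.type] as (node: Node) => R)(node)
}

// Partial: every key is optional, so a missing node type fails at run time
function definePartially<R>(handlers: Partial<Handlers<R>>): (node: Node) => R {
  return (node) => {
    const handler = handlers[node.type] as ((node: Node) => R) | undefined
    if (!handler) {
      throw new Error(`Missing definition for '${node.type}'`)
    }
    return handler(node)
  }
}

const area = definePartially<number>({
  Circle: (node) => Math.PI * node.r * node.r,
  Rect: (node) => node.width * node.height,
})

const label = defineExhaustively<string>({
  Document: (node) => `document with ${node.shapes.length} shapes`,
  Circle: (node) => `circle r=${node.r}`,
  Rect: (node) => `rect ${node.width}x${node.height}`,
})

const square: Rect = { type: "Rect", x: 0, y: 0, width: 10, height: 10 }
console.log(area(square)) // 100
console.log(label(square)) // rect 10x10
```

Adding a new member to Node makes every defineExhaustively call fail to compile until it gets a handler, which is the benefit described above.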

Changelog

[Unreleased]

  • Breaking: Put settings in a new settings block, e.g.:
    settings {
      output = "../gen-here-plz.ts"
    }
    instead of the old (no longer supported):
    set output "../gen-here-plz.ts"
  • Supported settings for now are discriminator and output. More settings will be added later.
  • Generate to generated-ast.ts by default; a different location can be set with output = "../somewhere-else.ts"
  • No longer support passing output file as a CLI argument

[0.5.0] - 2025-01-30

  • Breaking: Node unions no longer have to be written using @MyUnion syntax; it is now "just" MyUnion. The definition itself determines whether it's a union or a basic node.
  • Added support for literal types, e.g., op: ">" | "<" | ">=" | "<=" (previously the closest alternative was op: string).

[0.4.0] - 2025-01-29

  • Every Node now has generated .children and .descendants iterator properties, which enable you to iterate over all of its children in a type-safe manner.
  • Add support for externally defined semantic properties and methods.
  • Add support for changing the discriminator field, using set discriminator "_kind" in the grammar.
  • Change the default discriminator field to type.
  • Breaking: No longer generates a visit() method. You can now use the built-in .children and .descendants properties (available on every Node) or defineMethod() to implement your own custom visitors.

[0.3.0] - 2025-01-20

  • New definition language
  • Internals rewritten in Ohm

[0.2.4] - 2025-01-09

  • Add support for null inside literal field types

[0.2.3] - 2025-01-08

  • Fix bug when using CLI: The data argument must be of type string or an instance of Buffer, TypedArray, or DataView.

[0.2.2] - 2025-01-08

  • Make typescript a peer dependency
  • Upgrade some (dev) dependencies

[0.2.1] - 2024-02-28

  • Made _kind enumerable again.

[0.2.0] - 2024-02-28

  • Made the first-defined node the start node for the grammar. It no longer has to be named "Document" per se.
  • Made _kind and range fields non-enumerable.

[0.1.0]

Modernize repo, rewrite in TypeScript.

[0.0.x]

Did not keep a changelog yet.