The Babel Parser

Calling the Babel Parser with a Plugin: Example

See a simple example of how to use the parser at ULL-ESIT-PL/babel-learning/src/parser/example.cjs.

src/parser/example.cjs

const code = `const element : React.Element = <a href="https://www.google.com">Hello, world!</a>;`;
let parse = require("@babel/parser").parse;
let ast = parse(code, {
  sourceType: "module",   // parse in strict mode and allow module declarations
  plugins: [     // enable jsx and flow syntax
    "jsx",
    "flow",
  ],
});
 
const skip = (key, value) => {
  if (key === "loc" || key === "start" || key === "end" || key === "directives" || key === "comments") {
    return undefined;
  }
  return value;
}
console.log(JSON.stringify(ast, skip, 2));

Babel Parser Docs

The docs for the parser are at https://babeljs.io/docs/babel-parser

Article “Creating custom JavaScript syntax with Babel”

See the article “Creating custom JavaScript syntax with Babel” (September 25, 2019) by Svelte maintainer Tan Li Hau (陈立豪), available at https://lihautan.com/creating-custom-javascript-syntax-with-babel, where the author creates a curry function syntax @@:

// '@@' makes the function `foo` curried
function @@ foo(a, b, c) {
  return a + b + c;
}
console.log(foo(1, 2)(3)); // 6
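
To make the intended semantics concrete, here is a minimal sketch in plain JavaScript of how a curried foo behaves. The curry helper is a hypothetical illustration written for these notes; it is not the code from the article nor what Babel would emit.

```javascript
// Hypothetical helper: returns a curried version of fn that keeps
// collecting arguments until fn.length of them have been supplied.
function curry(fn) {
  return function curried(...args) {
    if (args.length >= fn.length) {
      return fn(...args);
    }
    return (...rest) => curried(...args, ...rest);
  };
}

const foo = curry(function (a, b, c) {
  return a + b + c;
});

console.log(foo(1, 2)(3)); // 6
console.log(foo(1)(2)(3)); // 6
console.log(foo(1, 2, 3)); // 6
```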

The parser is a recursive descent parser.

The author first modifies the lexer in packages/babel-parser/src/tokenizer/types.js. We need typescript for that.
The author then looks for "FunctionDeclaration" and finds a function called parseFunction in packages/babel-parser/src/parser/statement.js, where a line sets the generator attribute.

See tan-liu-article.md for the summary of my experience reproducing Tan Li Hau’s article.

Babel Parser Plugins

At the end of Nicolò Ribaudo’s talk @babel/howto, the interviewer asks him how to add syntax to Babel.

… The Babel parser does not support plugins. That parsing option I was using in the code is just a list of features which the parser already supports but are disabled by default. If you want to test your custom syntax, we don’t yet at least provide an API to do so and we suggest you to just fork the parser (the Babel mono repo) and then you can provide your custom parser as a Babel option.

See

section On Babel Parser Plugins in these notes
section /doc/parser/optional-chaining-in-the-parser.md in the babel-learning repo

Parser Output: The Babel AST

The Babel AST specification is in the file spec.md of the Babel repo: https://github.com/babel/babel/blob/master/packages/babel-parser/ast/spec.md

The Babel parser generates an AST according to the Babel AST format. It is based on the ESTree spec with the following deviations:

Literal token is replaced with StringLiteral, NumericLiteral, BigIntLiteral, BooleanLiteral, NullLiteral, RegExpLiteral
Property token is replaced with ObjectProperty and ObjectMethod
MethodDefinition is replaced with ClassMethod and ClassPrivateMethod
PropertyDefinition is replaced with ClassProperty and ClassPrivateProperty
PrivateIdentifier is replaced with PrivateName
Program and BlockStatement contain additional directives field with Directive and DirectiveLiteral
For ClassMethod, ClassPrivateMethod, ObjectProperty, and ObjectMethod, the properties of the value property’s FunctionExpression are coerced/brought into the main method node.
ChainExpression is the kind of node produced by espree for expressions like obj?.aaa?.bbb. It will be replaced with OptionalMemberExpression and OptionalCallExpression
ImportExpression is replaced with a CallExpression whose callee is an Import node. This change will be reversed in Babel 8.
ExportAllDeclaration with exported field is replaced with an ExportNamedDeclaration containing an ExportNamespaceSpecifier node.
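
As an illustration of the first deviation, the following sketch maps an ESTree-style Literal node to the name of the Babel node type that replaces it. The helper babelLiteralType is hypothetical and written for these notes. It assumes the ESTree convention that regular expression literals carry a regex field and BigInt literals carry a bigint field.

```javascript
// Hypothetical helper: given an ESTree-style Literal node, return the name
// of the Babel node type that replaces it according to the list above.
function babelLiteralType(node) {
  if (node.type !== "Literal") return node.type;
  if (node.regex) return "RegExpLiteral"; // ESTree marks regexes with `regex`
  if (typeof node.bigint === "string") return "BigIntLiteral"; // and BigInts with `bigint`
  if (node.value === null) return "NullLiteral";
  switch (typeof node.value) {
    case "string":
      return "StringLiteral";
    case "number":
      return "NumericLiteral";
    case "boolean":
      return "BooleanLiteral";
    default:
      throw new Error("Unexpected literal value");
  }
}

console.log(babelLiteralType({ type: "Literal", value: 4, raw: "4" })); // NumericLiteral
console.log(babelLiteralType({ type: "Literal", value: "hi", raw: '"hi"' })); // StringLiteral
console.log(babelLiteralType({ type: "Literal", value: null, raw: "null" })); // NullLiteral
```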

Producing an ESTree-compatible AST with the Babel parser

The example /src/parser/estree-example.js shows how to produce an ESTree-compatible AST with the Babel parser by enabling the estree plugin:

➜ babel-learning git:(main) ✗ cat src/parser/estree-example.js

// This example shows how to produce a estree compatible AST using the babel parser.
const babel = require('@babel/core');
const source = '4';
const options = {
  parserOpts: {
    // https://babeljs.io/docs/en/babel-parser#options
    plugins: ['estree']
  }
};
const ast = babel.parseSync(source, options);
console.log(JSON.stringify(ast, function skip(key, value) {
  if (['loc', 'start', 'end', 'directives', 'comments'].includes(key)) {
    return undefined;
  }
  return value;
}, 2));
//const generate = require("@babel/generator").default;
//console.log(generate(ast).code); // throws an error
const recast = require('recast');
console.log(recast.print(ast).code); // '4;'

The parseSync method has the signature babel.parseSync(code: string, options?: Object) and returns an AST. Here options is a @babel/core options object; its parserOpts field is forwarded to @babel/parser, whose options are described at https://babeljs.io/docs/en/babel-parser#options. Referenced presets and plugins will be loaded such that optional syntax plugins are automatically enabled.

The execution shows that the type field is now Literal instead of NumericLiteral:

➜ babel-learning git:(main) ✗ node src/parser/estree-example.js

{
  "type": "File",
  "errors": [],
  "program": {
    "type": "Program",
    "sourceType": "module",
    "interpreter": null,
    "body": [
      {
        "type": "ExpressionStatement",
        "expression": {
          "type": "Literal",
          "value": 4,
          "raw": "4"
        }
      }
    ]
  }
}

4;

See Tan Li Hau’s YouTube video [Q&A] Is there specs for babel AST?, recorded in 2021.

AST for JSX code

AST for JSX code is based on Facebook JSX AST.

Error codes

Error codes are useful for handling the errors thrown by @babel/parser.

There are two error codes, code and reasonCode.

code
- Rough classification of errors (e.g. BABEL_PARSER_SYNTAX_ERROR, BABEL_PARSER_SOURCETYPE_MODULE_REQUIRED).
reasonCode
- Detailed classification of errors (e.g. MissingSemicolon, VarRedeclaration).

See example at /src/parser/error-example.cjs:

const { parse } = require("@babel/parser");
 
const ast = parse(`a b`, { errorRecovery: true });
 
console.log(ast.errors[0].code); // BABEL_PARSER_SYNTAX_ERROR
console.log(ast.errors[0].reasonCode); // MissingSemicolon

Notice how some AST is still generated despite the syntax error:

{
  "type": "File",
  "errors": [
    {
      "code": "BABEL_PARSER_SYNTAX_ERROR",
      "reasonCode": "MissingSemicolon",
      "pos": 1
    }
  ],
  "program": {
    "type": "Program",
    "sourceType": "script",
    "interpreter": null,
    "body": [
      {
        "type": "ExpressionStatement",
        "expression": {
          "type": "Identifier",
          "name": "a"
        }
      },
      {
        "type": "ExpressionStatement",
        "expression": {
          "type": "Identifier",
          "name": "b"
        }
      }
    ]
  }
}
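
Because both codes are plain strings, they can drive error handling directly. The sketch below is a hypothetical helper (describeParserError is not part of @babel/parser) that dispatches on code and reasonCode, using only the values mentioned above. The sample object is hand-written to match the shape of the entry in the errors array shown in the output.

```javascript
// Hypothetical helper: turn an error object carrying `code` and `reasonCode`
// into a short message. Only the codes mentioned in these notes are handled.
function describeParserError(error) {
  if (error.code === "BABEL_PARSER_SOURCETYPE_MODULE_REQUIRED") {
    return 'Retry with sourceType: "module"';
  }
  if (error.code === "BABEL_PARSER_SYNTAX_ERROR") {
    switch (error.reasonCode) {
      case "MissingSemicolon":
        return `Missing semicolon at position ${error.pos}`;
      case "VarRedeclaration":
        return `Redeclared variable at position ${error.pos}`;
      default:
        return `Syntax error (${error.reasonCode})`;
    }
  }
  return "Unknown error";
}

// Hand-written sample shaped like the errors entry in the output above.
const sample = {
  code: "BABEL_PARSER_SYNTAX_ERROR",
  reasonCode: "MissingSemicolon",
  pos: 1,
};
console.log(describeParserError(sample)); // Missing semicolon at position 1
```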

The Parser: Files and organization

See /doc/parser/organization.md.

Babel Parser File Hierarchy

Currently

➜  babel-parser git:(learning) ✗ jq '.version' package.json 
"7.10.2"

it has 6 directories and 33 files in the src folder:

➜  babel-parser git:(learning) ✗ tree src -I node_modules     
src
├── index.js
├── options.js
├── parser
│   ├── base.js
│   ├── comments.js
│   ├── error-message.js
│   ├── error.js
│   ├── expression.js
│   ├── index.js
│   ├── lval.js
│   ├── node.js
│   ├── statement.js
│   └── util.js
├── plugin-utils.js
├── plugins
│   ├── estree.js
│   ├── flow.js
│   ├── jsx
│   │   ├── index.js
│   │   └── xhtml.js
│   ├── placeholders.js
│   ├── typescript
│   │   ├── index.js
│   │   └── scope.js
│   └── v8intrinsic.js
├── tokenizer
│   ├── context.js
│   ├── index.js
│   ├── state.js
│   └── types.js
├── types.js
└── util
    ├── class-scope.js
    ├── identifier.js
    ├── location.js
    ├── production-parameter.js
    ├── scope.js
    ├── scopeflags.js
    └── whitespace.js

Parsing functions

The parsing functions are in the parser folder. It is a recursive descent parser. Let us start with the statement.js file. The parsing functions usually start with the prefix parse followed by the name of the syntactic variable they parse:

➜  parser git:(learning) ✗ egrep -i '^\s*parse\w+\(' statement.js | cat -n
     1    parseTopLevel(file: N.File, program: N.Program): N.File {
     2    parseInterpreterDirective(): N.InterpreterDirective | null {
     3    parseStatement(context: ?string, topLevel?: boolean): N.Statement {
     4    parseStatementContent(context: ?string, topLevel: ?boolean): N.Statement {
     5    parseDecorators(allowExport?: boolean): void {
     6    parseDecorator(): N.Decorator {
     7    parseMaybeDecoratorArguments(expr: N.Expression): N.Expression {
     8    parseBreakContinueStatement(
     9    parseDebuggerStatement(node: N.DebuggerStatement): N.DebuggerStatement {
    10    parseHeaderExpression(): N.Expression {
    11    parseDoStatement(node: N.DoWhileStatement): N.DoWhileStatement {
    12    parseForStatement(node: N.Node): N.ForLike {
    13    parseFunctionStatement(
    14    parseIfStatement(node: N.IfStatement): N.IfStatement {
    15    parseReturnStatement(node: N.ReturnStatement): N.ReturnStatement {
    16    parseSwitchStatement(node: N.SwitchStatement): N.SwitchStatement {
    17    parseThrowStatement(node: N.ThrowStatement): N.ThrowStatement {
    18    parseTryStatement(node: N.TryStatement): N.TryStatement {
    19    parseVarStatement(
    20    parseWhileStatement(node: N.WhileStatement): N.WhileStatement {
    21    parseWithStatement(node: N.WithStatement): N.WithStatement {
    22    parseEmptyStatement(node: N.EmptyStatement): N.EmptyStatement {
    23    parseLabeledStatement(
    24    parseExpressionStatement(
    25    parseBlock(
    26    parseBlockBody(
    27    parseBlockOrModuleBlockBody(
    28    parseFor(
    29    parseForIn(
    30    parseVar(
    31    parseVarId(decl: N.VariableDeclarator, kind: "var" | "let" | "const"): void {
    32    parseFunctionId(requireId?: boolean): ?N.Identifier {
    33    parseFunctionParams(node: N.Function, allowModifiers?: boolean): void {
    34    parseClassBody(
    35    parseClassMemberFromModifier(
    36    parseClassMember(
    37    parseClassMemberWithIsStatic(
    38    parseClassPropertyName(member: N.ClassMember): N.Expression | N.Identifier {
    39    parsePostMemberNameModifiers(
    40    parseAccessModifier(): ?N.Accessibility {
    41    parseClassPrivateProperty(
    42    parseClassProperty(node: N.ClassProperty): N.ClassProperty {
    43    parseClassId(
    44    parseClassSuper(node: N.Class): void {
    45    parseExport(node: N.Node): N.AnyExport {
    46    parseExportDefaultExpression(): N.Expression | N.Declaration {
    47    parseExportDeclaration(node: N.ExportNamedDeclaration): ?N.Declaration {
    48    parseExportFrom(node: N.ExportNamedDeclaration, expect?: boolean): void {
    49    parseExportSpecifiers(): Array<N.ExportSpecifier> {
    50    parseImport(node: N.Node): N.AnyImport {
    51    parseImportSource(): N.StringLiteral {
    52    parseImportSpecifierLocal(
    53    parseNamedImportSpecifiers(node: N.ImportDeclaration) {
    54    parseImportSpecifier(node: N.ImportDeclaration): void {

Top Level and parseBlockBody

See /doc/parser/top-level.md.

src/parser/index.js: Parser Class

The class Parser is declared in the file src/parser/index.js.

At this version, Babel is written in JavaScript with Flow type annotations (the codebase was later migrated to TypeScript). This choice allowed the Babel team to leverage Flow’s type-checking capabilities while maintaining a JavaScript codebase. Babel also supports TypeScript through its plugins and is a popular tool for TypeScript users.

Here are the first lines in the file. The // @flow initial comment indicates that the file is being type-checked by Flow, and the type annotations specify the expected types.

// @flow
 
import type { Options } from "../options";
import type { File } from "../types";
import type { PluginList } from "../plugin-utils";

It starts by importing the Options, File, and PluginList types from the following modules:

src/options.js (parser options like sourceType, strictMode or tokens)
src/types.js. It defines the types of nodes like NodeBase, Node, Expression, Declaration, Literal, StringLiteral, Program, Comment, Token, etc. The type File corresponds to the root of the AST and is defined as follows:
```
export type File = NodeBase & { // & is the intersection type
  type: "File",
  program: Program,
  comments: $ReadOnlyArray<Comment>,
  tokens: $ReadOnlyArray<Token | Comment>,
};
```
$ReadOnlyArray is Flow’s type for a read-only array. The equivalent in TypeScript is ReadonlyArray.
src/plugin-utils.js (the type Plugin can be a string or a tuple of a string and an object. PluginList is a read-only array of Plugin elements).

The file src/parser/index.js continues like this:

import { getOptions } from "../options";
import StatementParser from "./statement";
import { SCOPE_PROGRAM } from "../util/scopeflags"; // const SCOPE_PROGRAM = 0b00000001
import ScopeHandler from "../util/scope";
import ClassScopeHandler from "../util/class-scope";
import ProductionParameterHandler, { PARAM_AWAIT, PARAM, } from "../util/production-parameter";
 
export type PluginsMap = Map<string, { [string]: any }>;
 
export default class Parser extends StatementParser { ... }

It imports the class StatementParser, which implements statement parsing and is defined in the file src/parser/statement.js.

The ScopeHandler class is imported from src/util/scope.

The ClassScopeHandler class is imported from src/util/class-scope.

The class Parser extends StatementParser and defines a constructor, the getScopeHandler method, and the parse method. The pluginsMap function creates a map of plugins from a list of plugins.

export default class Parser extends StatementParser {
  constructor(options: ?Options, input: string) { ... }
 
  getScopeHandler(): Class<ScopeHandler<*>> { return ScopeHandler; }
 
  parse(): File { ... }
}
 
function pluginsMap(plugins: PluginList): PluginsMap {
  const pluginMap: PluginsMap = new Map();
  for (const plugin of plugins) {
    const [name, options] = Array.isArray(plugin) ? plugin : [plugin, {}];
    if (!pluginMap.has(name)) pluginMap.set(name, options || {});
  }
  return pluginMap;
}
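
The following sketch runs the same logic without the Flow annotations so that the normalization is easy to observe. The plugin list passed in is only illustrative, and the ignored option is made up for the example.

```javascript
// Same logic as pluginsMap above, with the Flow annotations removed.
function pluginsMap(plugins) {
  const pluginMap = new Map();
  for (const plugin of plugins) {
    // A plugin is either "name" or ["name", options].
    const [name, options] = Array.isArray(plugin) ? plugin : [plugin, {}];
    if (!pluginMap.has(name)) pluginMap.set(name, options || {});
  }
  return pluginMap;
}

// Illustrative input: the second "jsx" entry carries a made-up option.
const map = pluginsMap(["jsx", ["flow", { all: true }], ["jsx", { ignored: true }]]);

console.log(map.get("jsx")); // {}  (the first "jsx" entry wins)
console.log(map.get("flow").all); // true
console.log(map.has("estree")); // false
```

Note that a plugin listed twice keeps the options of its first occurrence, and a plugin given as a bare string is stored with an empty options object.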

In the following sections, we will describe the Parser class and its methods.

Constructor of the `Parser` class

The constructor of the Parser class is defined as follows:

  constructor(options: ?Options, input: string) {
    options = getOptions(options);
    super(options, input);
 
    const ScopeHandler = this.getScopeHandler();
 
    this.options = options;
    this.inModule = this.options.sourceType === "module";
    this.scope = new ScopeHandler(this.raise.bind(this), this.inModule);
    this.prodParam = new ProductionParameterHandler();
    this.classScope = new ClassScopeHandler(this.raise.bind(this));
    this.plugins = pluginsMap(this.options.plugins);
    this.filename = options.sourceFilename;
  }

The inheritance hierarchy

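The class inherits from StatementParser, which inherits from ExpressionParser, and so on through a chain of intermediate classes (LValParser, NodeUtils, UtilParser, Tokenizer, the error class and CommentsParser) down to BaseParser.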

export default class StatementParser extends ExpressionParser { ... }

export default class ExpressionParser extends LValParser { ... }

src/parser/lval.js

export default class LValParser extends NodeUtils { ... }

src/parser/node.js

export class NodeUtils extends UtilParser { ... }

src/parser/util.js

export default class UtilParser extends Tokenizer { ... }

export default class Tokenizer extends ParserErrors {
 
  isLookahead: boolean;
 
  // Token store.
  tokens: Array<Token | N.Comment> = [];
 
  constructor(options: Options, input: string) {
    super();
    this.state = new State();
    this.state.init(options);
    this.input = input;
    this.length = input.length;
    this.isLookahead = false;
  }
  ...
}

We can see that Parser objects have a state object that contains, among other things, information from the options. They also have properties containing the input string and the length of the input string.

src/parser/error.js

export default class ParserError extends CommentsParser { ... }

src/parser/comments.js

export default class CommentsParser extends BaseParser { ... }

And at last we reach the BaseParser class:

src/parser/base.js

export default class BaseParser {
  // Properties set by constructor in index.js
  options: Options;
  inModule: boolean;
  scope: ScopeHandler<*>; // In Flow, the * symbol asks Flow itself to infer the generic type argument
  classScope: ClassScopeHandler;
  prodParam: ProductionParameterHandler;
  plugins: PluginsMap;
  filename: ?string;
  sawUnambiguousESM: boolean = false;
  ambiguousScriptDifferentAst: boolean = false;
 
  // Initialized by Tokenizer
  state: State;
  // input and length are not in state as they are constant and we do
  // not want to ever copy them, which happens if state gets cloned
  input: string;
  length: number;
 
  hasPlugin(name: string): boolean {
    return this.plugins.has(name);
  }
 
  getPluginOption(plugin: string, name: string) {
    // $FlowIssue
    if (this.hasPlugin(plugin)) return this.plugins.get(plugin)[name];
  }
}
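
The whole chain can be checked with a small experiment. The sketch below uses empty stub classes that only reproduce the extends relationships quoted above (they are not the real Babel classes) and walks the prototype chain of a Parser instance. The Tokenizer stub keeps the two assignments to input and length to show that properties set deep in the chain are visible on the final object.

```javascript
// Stub classes that only reproduce the extends chain quoted above.
class BaseParser {}
class CommentsParser extends BaseParser {}
class ParserError extends CommentsParser {}
class Tokenizer extends ParserError {
  constructor(options, input) {
    super();
    this.input = input;
    this.length = input.length;
  }
}
class UtilParser extends Tokenizer {}
class NodeUtils extends UtilParser {}
class LValParser extends NodeUtils {}
class ExpressionParser extends LValParser {}
class StatementParser extends ExpressionParser {}
class Parser extends StatementParser {}

// Walk the prototype chain and collect the class names.
function classChain(instance) {
  const names = [];
  let proto = Object.getPrototypeOf(instance);
  while (proto && proto !== Object.prototype) {
    names.push(proto.constructor.name);
    proto = Object.getPrototypeOf(proto);
  }
  return names;
}

const parser = new Parser({}, "a b");
console.log(classChain(parser).join(" -> "));
console.log(parser.length); // 3
```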

`parse` method

The call this.scope.enter(SCOPE_PROGRAM) enters the program scope.

The call to this.parseTopLevel(file, program); starts the parsing at the top level.

  parse(): File {
    let paramFlags = PARAM;
    if (this.hasPlugin("topLevelAwait") && this.inModule) {
      paramFlags |= PARAM_AWAIT;
    }
    this.scope.enter(SCOPE_PROGRAM); 
    this.prodParam.enter(paramFlags);
    const file = this.startNode();
    const program = this.startNode();
    this.nextToken();
    file.errors = null;
    this.parseTopLevel(file, program);
    file.errors = this.state.errors;
    return file;
  }
}

Since the Parser class extends the StatementParser class, it inherits all the methods of the StatementParser class and thus it has access to the parseTopLevel method.
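
The paramFlags variable in parse is a bit set: PARAM is the starting value and PARAM_AWAIT is switched on with the |= operator. The following sketch shows the idea in isolation. The numeric values are assumptions chosen for illustration (the real constants live in src/util/production-parameter.js and may differ), and initialParamFlags is a hypothetical helper.

```javascript
// Assumed flag values for illustration only; the real constants are defined
// in src/util/production-parameter.js and may differ.
const PARAM = 0b0000; // no flags set
const PARAM_AWAIT = 0b0010; // hypothetical bit meaning "await is allowed"

// Hypothetical helper mirroring the first lines of parse().
function initialParamFlags(hasTopLevelAwait, inModule) {
  let paramFlags = PARAM;
  if (hasTopLevelAwait && inModule) {
    paramFlags |= PARAM_AWAIT; // switch the bit on
  }
  return paramFlags;
}

// A flag is tested with a bitwise AND.
console.log((initialParamFlags(true, true) & PARAM_AWAIT) !== 0); // true
console.log((initialParamFlags(true, false) & PARAM_AWAIT) !== 0); // false
```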

The Class StatementParser and the parseTopLevel function

The class StatementParser implements the Statement parsing. It is defined in the file src/parser/statement.js

The function parseTopLevel parses a program. It receives a file node and a program node as parameters. The program parameter is a N.Program node that is going to represent the top-level structure of the program. It will contain the interpreter directive (if any) and the body AST of the program. The function returns a file node that contains the program node, the comments and optionally the tokens of the program. Here is the definition of the Program node:

Program <: Node {
  type: "Program";
  interpreter: InterpreterDirective | null; // #!/usr/bin/env node
  sourceType: "script" | "module";
  body: [ Statement | ModuleDeclaration ];  // Array of expressions or statements or import or export declarations
  directives: [ Directive ]; // "use strict" directives
}

This is the code of the parseTopLevel function:

export default class StatementParser extends ExpressionParser {
 
  parseTopLevel(file: N.File, program: N.Program): N.File {
    program.sourceType = this.options.sourceType;
 
    program.interpreter = this.parseInterpreterDirective();
 
    this.parseBlockBody(program, true, true, tt.eof);
 
    if ( // check for undefined exports if it is a module
      this.inModule &&
      !this.options.allowUndeclaredExports &&
      this.scope.undefinedExports.size > 0
    ) { // raise an error if there are undefined exports
      for (const [name] of Array.from(this.scope.undefinedExports)) {
        const pos = this.scope.undefinedExports.get(name);
        // $FlowIssue
        this.raise(pos, Errors.ModuleExportUndefined, name);
      }
    }
 
    file.program = this.finishNode(program, "Program");
    file.comments = this.state.comments;
 
    if (this.options.tokens) file.tokens = this.tokens;
 
    return this.finishNode(file, "File");
  }
  ...
} // End of class StatementParser

The assignment program.interpreter = this.parseInterpreterDirective(); parses the InterpreterDirective /#!.*/ if any.

The call this.parseBlockBody(program, true, true, tt.eof); parses the body of the program.

The first argument is the node to which the body will be attached.
The first true argument, allowDirectives, indicates that directives are allowed in the body. Directives are special instructions or statements that are processed differently by the JavaScript engine compared to regular code, like "use strict" and "use asm".
The second true argument indicates that the body is top level.
The last argument, tt.eof, is the token type that signals the end of the body.
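
To illustrate what allowing directives means, here is a simplified sketch that separates the leading directives of a statement list from the rest of the body. It is an assumption-laden illustration, not Babel's code: splitDirectives and isDirectiveCandidate are hypothetical helpers, and the check is reduced to "an expression statement made of a single string literal".

```javascript
// Simplified, hypothetical check: a directive candidate is an expression
// statement that consists of a single string literal.
function isDirectiveCandidate(stmt) {
  return (
    stmt.type === "ExpressionStatement" &&
    stmt.expression.type === "StringLiteral"
  );
}

// Split a statement list into the leading directives and the remaining body.
function splitDirectives(statements, allowDirectives) {
  const directives = [];
  const body = [];
  let sawNonDirective = false;
  for (const stmt of statements) {
    if (allowDirectives && !sawNonDirective && isDirectiveCandidate(stmt)) {
      directives.push(stmt.expression.value);
    } else {
      sawNonDirective = true;
      body.push(stmt);
    }
  }
  return { directives, body };
}

// Hand-built statements standing in for parser output.
const str = (value) => ({
  type: "ExpressionStatement",
  expression: { type: "StringLiteral", value },
});
const stmts = [str("use strict"), { type: "EmptyStatement" }, str("not a directive")];

const result = splitDirectives(stmts, true);
console.log(result.directives); // [ 'use strict' ]
console.log(result.body.length); // 2
```

Only string statements at the very start of the body count: the third statement is a string too, but it comes after a regular statement and therefore stays in the body.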

Scope Analysis

See

Scope Analysis in these notes
File parser/babel-scope.md in the babel learning repo

Checking for undefined exports if it is a module

In the section:

 if ( // check for undefined exports if it is a module
      this.inModule && !this.options.allowUndeclaredExports && this.scope.undefinedExports.size > 0
    ) { // raise an error if there are undefined exports
      for (const [name] of Array.from(this.scope.undefinedExports)) {
        const pos = this.scope.undefinedExports.get(name);
        // $FlowIssue
        this.raise(pos, Errors.ModuleExportUndefined, name);
      }
    }

We check whether we are in a module and whether there are undefined exports. This ensures that every exported name is actually declared in the module, as required by the spec: https://www.ecma-international.org/ecma-262/9.0/index.html#sec-module-semantics-static-semantics-early-errors. See pull request 9589.

The [name] part in the expression for (const [name] of Array.from(this.scope.undefinedExports)) uses array destructuring. Since the code calls this.scope.undefinedExports.get(name), undefinedExports behaves like a Map, so Array.from yields [key, value] pairs and [name] extracts the key of each pair.
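
The iteration pattern can be tried on its own. In this sketch the Map contents are made up for the example; only the loop shape is taken from the code above.

```javascript
// A Map from export name to source position, shaped like undefinedExports.
// The names and positions are made up for the example.
const undefinedExports = new Map([
  ["foo", 10],
  ["bar", 25],
]);

const reported = [];
// Array.from(map) yields [key, value] pairs; [name] keeps only the key.
for (const [name] of Array.from(undefinedExports)) {
  const pos = undefinedExports.get(name);
  reported.push(`${name}@${pos}`);
}
console.log(reported); // [ 'foo@10', 'bar@25' ]
```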

finishNode and finishNodeAt

The finishNode function is responsible for finishing the construction of an AST node, assigning it its final properties before it is considered complete. The function accepts a generic node T, which must be a node type (NodeType), along with a type that indicates the type of node to finish:

finishNode<T: NodeType>(node: T, type: string): T {
  return this.finishNodeAt(node, type, this.state.lastTokEnd, this.state.lastTokEndLoc);
}

The helper function finishNodeAt is responsible for the actual finishing. The function accepts a generic node T, which must be a node type (NodeType), along with

a type that indicates the type of node to end,
a pos position that represents the end of the node in the source code, and
a loc object which contains location information, specifically the end of the node location.

The first step within the function is a safety check, skipped in production builds, that raises an error if an attempt is made to finish a node that has already been finished, which is indicated by node.end > 0.

  finishNodeAt<T: NodeType>(node: T, type: string, pos: number, loc: Position): T {
    if (process.env.NODE_ENV !== "production" && node.end > 0) {
      throw new Error(
        "Do not call finishNode*() twice on the same node." +
          " Instead use resetEndLocation() or change type directly.",
      );
    }
    node.type = type;
    node.end = pos;
    node.loc.end = loc;
    if (this.options.ranges) node.range[1] = pos;
    this.processComment(node);
    return node;
  }
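
The effect of the guard can be observed with a stripped-down version. In this sketch startNode and finishNodeAt are minimal stand-ins written for these notes: comment processing and the ranges option are left out, and the node shape is an assumption.

```javascript
// Minimal stand-in for a freshly started node (assumed shape).
function startNode(pos) {
  return { type: "", start: pos, end: 0, loc: { start: pos, end: null } };
}

// Minimal stand-in for finishNodeAt, keeping only the guard and assignments.
function finishNodeAt(node, type, pos, loc) {
  if (process.env.NODE_ENV !== "production" && node.end > 0) {
    throw new Error("Do not call finishNode*() twice on the same node.");
  }
  node.type = type;
  node.end = pos;
  node.loc.end = loc;
  return node;
}

const node = finishNodeAt(startNode(0), "Identifier", 1, { line: 1, column: 1 });
console.log(node.type, node.end); // Identifier 1

// Finishing the same node again trips the guard (outside production).
let message = null;
try {
  finishNodeAt(node, "Identifier", 1, { line: 1, column: 1 });
} catch (err) {
  message = err.message;
}
console.log(message);
```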

parseBlockBody

Here is the parseBlockBody function that is called by parseTopLevel:

parseBlockBody(
    node: N.BlockStatementLike,
    allowDirectives: ?boolean, topLevel: boolean, end: TokenType,
    afterBlockParse?: (hasStrictModeDirective: boolean) => void,
  ): void {
    const body = (node.body = []);
    const directives = (node.directives = []);
    this.parseBlockOrModuleBlockBody(
      body,
      allowDirectives ? directives : undefined,
      topLevel,
      end,
      afterBlockParse,
    );
}

parseIfStatement

The structure of all the parseSomething functions is similar. They start by calling this.next() to consume the keyword token and move to the next token, then they continue following the grammar rules, using the current token when needed. Finally, they call this.finishNode to complete the AST node. Here is the case of the parseIfStatement function, which follows the IfStatement grammar rule:

  parseIfStatement(node: N.IfStatement): N.IfStatement {
    this.next(); // eat `if`
    node.test = this.parseHeaderExpression();    // parse the test expression
    node.consequent = this.parseStatement("if"); // parse the consequent statement
    node.alternate = this.eat(tt._else) ? this.parseStatement("if") : null; // eat `else` and parse the alternate statement if any
    return this.finishNode(node, "IfStatement");
  }

Methods consuming tokens

We can see the difference between this.eat(tt._else) and this.next(). The former consumes the current token only if it is an else token and returns whether it did so, while the latter unconditionally consumes the current token and advances to the next one. There is also this.expect(tt._else), which consumes the current token if it is an else token and raises an error otherwise:

  expect(type: TokenType, pos?: ?number): void {
    this.eat(type) || this.unexpected(pos, type);
  }

Another method is this.match, which returns true if the current token is of the given type, without consuming it:

  match(type: TokenType): boolean {
    return this.state.type === type;
  }
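
The interplay of next, match, eat and expect can be seen in a toy recursive descent parser. This is a sketch under simplifying assumptions, not Babel's implementation: ToyParser is hypothetical, it reads a hand-built token array instead of tokenizing source text, token types are plain strings rather than TokenType objects, and the only expressions are identifiers.

```javascript
// Toy parser over a pre-built token array. Token types are plain strings
// here; Babel uses TokenType objects (tt.*) instead.
class ToyParser {
  constructor(tokens) {
    this.tokens = tokens;
    this.pos = 0;
    this.state = { type: tokens[0].type, value: tokens[0].value };
  }

  // Unconditionally consume the current token and load the following one.
  next() {
    this.pos++;
    const tok = this.tokens[this.pos] || { type: "eof" };
    this.state = { type: tok.type, value: tok.value };
  }

  // Test the current token without consuming it.
  match(type) {
    return this.state.type === type;
  }

  // Consume the current token only if it has the given type.
  eat(type) {
    if (this.match(type)) {
      this.next();
      return true;
    }
    return false;
  }

  // Like eat, but a mismatch is an error.
  expect(type) {
    if (!this.eat(type)) {
      throw new SyntaxError(`Expected ${type} but found ${this.state.type}`);
    }
  }

  parseName() {
    const node = { type: "Identifier", name: this.state.value };
    this.expect("name");
    return node;
  }

  parseStatement() {
    if (this.match("if")) return this.parseIfStatement();
    const expression = this.parseName();
    this.expect(";");
    return { type: "ExpressionStatement", expression };
  }

  parseIfStatement() {
    const node = { type: "IfStatement" };
    this.next(); // eat `if`
    this.expect("(");
    node.test = this.parseName();
    this.expect(")");
    node.consequent = this.parseStatement();
    node.alternate = this.eat("else") ? this.parseStatement() : null;
    return node;
  }
}

// Tokens for: if (a) b; else c;
const tokens = [
  { type: "if" }, { type: "(" }, { type: "name", value: "a" }, { type: ")" },
  { type: "name", value: "b" }, { type: ";" }, { type: "else" },
  { type: "name", value: "c" }, { type: ";" }, { type: "eof" },
];
const ifNode = new ToyParser(tokens).parseStatement();
console.log(ifNode.test.name, ifNode.consequent.expression.name, ifNode.alternate.expression.name); // a b c
```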

tc39

Ecma International’s TC39 is a group of JavaScript developers, implementers, and academics collaborating with the community to maintain and evolve the definition of JavaScript.

See tc39.md

You can see examples of how Babel is used to study new syntactic and semantic proposals in sections

Tokenizer

See

section Tokenizer in these notes
section /doc/parser/tokenizer.md in the babel-learning repo.

References

See

section References in these notes
section References in the babel-learning repo.

Optional Chaining in the Parser Index Context