The Babel Parser
Calling the Babel Parser with a Plugin: Example
See a simple example of how to use the parser at ULL-ESIT-PL/babel-learning/src/parser/example.cjs.
const code = `const element : React.Element = <a href="https://www.google.com">Hello, world!</a>;`;
let parse = require("@babel/parser").parse;
let ast = parse(code, {
sourceType: "module", // parse in strict mode and allow module declarations
plugins: [ // enable jsx and flow syntax
"jsx",
"flow",
],
});
const skip = (key, value) => {
if (key === "loc" || key === "start" || key === "end" || key === "directives" || key === "comments") {
return undefined;
}
return value;
}
console.log(JSON.stringify(ast, skip, 2));
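The skip replacer can be tried without running the parser. The following is a minimal sketch that applies the same idea to a small hand-written object shaped like an AST node. The object fakeNode and the names hiddenKeys and prune are illustrative assumptions, not parser output or part of the original example. It relies only on the standard behavior of JSON.stringify, which omits a property when the replacer returns undefined.

```javascript
// Hand-written stand-in for a parser node; NOT actual @babel/parser output.
const fakeNode = {
  type: "Identifier",
  start: 0,
  end: 1,
  loc: { start: { line: 1, column: 0 }, end: { line: 1, column: 1 } },
  name: "a",
};

// Keys to hide, the same ones the skip function above filters out.
const hiddenKeys = new Set(["loc", "start", "end", "directives", "comments"]);

// JSON.stringify omits a property when the replacer returns undefined.
const prune = (key, value) => (hiddenKeys.has(key) ? undefined : value);

const pruned = JSON.stringify(fakeNode, prune, 2);
console.log(pruned); // only "type" and "name" survive
```

Using a Set keeps the list of hidden keys in one place, which is convenient when more positional fields need to be filtered.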
Babel Parser Docs
The docs for the parser are at https://babeljs.io/docs/babel-parser
Article “Creating custom JavaScript syntax with Babel”
See the article “Creating custom JavaScript syntax with Babel” (September 25, 2019) by Svelte maintainer Tan Li Hau (陈立豪), available at https://lihautan.com/creating-custom-javascript-syntax-with-babel, where the author creates a curry function syntax, @@:
// '@@' makes the function `foo` curried
function @@ foo(a, b, c) {
return a + b + c;
}
console.log(foo(1, 2)(3)); // 6
The parser is a recursive descent parser.
- The author modifies the lexer in packages/babel-parser/src/tokenizer/types.js. We need typescript for that.
- The author looks for "FunctionDeclaration" and finds a function called parseFunction in packages/babel-parser/src/parser/statement.js, and there finds a line that sets the generator attribute.
See tan-liu-article.md for the summary of my experience reproducing Tan Li Hau’s article.
Babel Parser Plugins
At the end of Nicolo Ribaudo’s talk @babel/howto the interviewer asks him about how to add syntax to Babel.
… The Babel parser does not support plugins. That parsing option I was using in the code is just a list of features which the parser already supports but are disabled by default. If you want to test your custom syntax, we don’t yet at least provide an API to do so and we suggest you to just fork the parser (the Babel mono repo) and then you can provide your custom parser as a Babel option.
See
- section On Babel Parser Plugins in these notes
- section /doc/parser/optional-chaining-in-the-parser.md at the Babel learning repo
Parser Output: The Babel AST
The Babel AST specification is in the file spec.md of the Babel repo: https://github.com/babel/babel/blob/master/packages/babel-parser/ast/spec.md
The Babel parser generates AST according to Babel AST format. It is based on ESTree spec with the following deviations:
- Literal token is replaced with StringLiteral, NumericLiteral, BigIntLiteral, BooleanLiteral, NullLiteral, RegExpLiteral
- Property token is replaced with ObjectProperty and ObjectMethod
- MethodDefinition is replaced with ClassMethod and ClassPrivateMethod
- PropertyDefinition is replaced with ClassProperty and ClassPrivateProperty
- PrivateIdentifier is replaced with PrivateName
- Program and BlockStatement contain additional directives field with Directive and DirectiveLiteral
- ClassMethod, ClassPrivateMethod, ObjectProperty, and ObjectMethod value property’s properties in FunctionExpression is coerced/brought into the main method node.
- ChainExpression is the kind of node produced by espree for expressions like obj?.aaa?.bbb. It will be replaced with OptionalMemberExpression and OptionalCallExpression
- ImportExpression is replaced with a CallExpression whose callee is an Import node. This change will be reversed in Babel 8.
- ExportAllDeclaration with exported field is replaced with an ExportNamedDeclaration containing an ExportNamespaceSpecifier node.
Producing an ESTree-compatible AST with the Babel parser
The example /src/parser/estree-example.js shows how to produce an ESTree-compatible AST with the Babel parser by enabling the estree plugin:
➜ babel-learning git:(main) ✗ cat src/parser/estree-example.js
// This example shows how to produce a estree compatible AST using the babel parser.
const babel = require('@babel/core');
const source = '4';
const options = {
parserOpts: {
// https://babeljs.io/docs/en/babel-parser#options
plugins: ['estree']
}
};
const ast = babel.parseSync(source, options);
console.log(JSON.stringify(ast, function skip(key, value) {
if (['loc', 'start', 'end', 'directives', 'comments'].includes(key)) {
return undefined;
}
return value;
}, 2));
//const generate = require("@babel/generator").default;
//console.log(generate(ast).code); // throws an error
const recast = require('recast');
console.log(recast.print(ast).code); // '4;'
The parseSync method receives the source code and options, babel.parseSync(code: string, options?: Object), and returns an AST.
The parser options passed under parserOpts are described at https://babeljs.io/docs/en/babel-parser#options.
Referenced presets and plugins will be loaded such that optional syntax plugins are automatically enabled.
The execution shows that the type field is now Literal instead of NumericLiteral:
➜ babel-learning git:(main) ✗ node src/parser/estree-example.js
{
"type": "File",
"errors": [],
"program": {
"type": "Program",
"sourceType": "module",
"interpreter": null,
"body": [
{
"type": "ExpressionStatement",
"expression": {
"type": "Literal",
"value": 4,
"raw": "4"
}
}
]
}
}
4;
See Tan Li Hau’s YouTube video “[Q&A] Is there specs for babel AST?”, recorded in 2021.
AST for JSX code
AST for JSX code is based on Facebook JSX AST.
Error codes
Error codes are useful for handling the errors thrown by @babel/parser.
There are two error codes, code and reasonCode.
- code: Rough classification of errors (e.g. BABEL_PARSER_SYNTAX_ERROR, BABEL_PARSER_SOURCETYPE_MODULE_REQUIRED).
- reasonCode: Detailed classification of errors (e.g. MissingSemicolon, VarRedeclaration).
See example at /src/parser/error-example.cjs:
const { parse } = require("@babel/parser");
const ast = parse(`a b`, { errorRecovery: true });
console.log(ast.errors[0].code); // BABEL_PARSER_SYNTAX_ERROR
console.log(ast.errors[0].reasonCode); // MissingSemicolon
Notice how some AST is still generated despite the syntax error:
{
"type": "File",
"errors": [
{
"code": "BABEL_PARSER_SYNTAX_ERROR",
"reasonCode": "MissingSemicolon",
"pos": 1
}
],
"program": {
"type": "Program",
"sourceType": "script",
"interpreter": null,
"body": [
{
"type": "ExpressionStatement",
"expression": {
"type": "Identifier",
"name": "a"
}
},
{
"type": "ExpressionStatement",
"expression": {
"type": "Identifier",
"name": "b"
}
}
]
}
}
The Parser: Files and organization
See /doc/parser/organization.md.
Babel Parser File Hierarchy
Currently
➜ babel-parser git:(learning) ✗ jq '.version' package.json
"7.10.2"
it has 6 directories, 33 files in the src folder:
➜ babel-parser git:(learning) ✗ tree src -I node_modules
src
├── index.js
├── options.js
├── parser
│ ├── base.js
│ ├── comments.js
│ ├── error-message.js
│ ├── error.js
│ ├── expression.js
│ ├── index.js
│ ├── lval.js
│ ├── node.js
│ ├── statement.js
│ └── util.js
├── plugin-utils.js
├── plugins
│ ├── estree.js
│ ├── flow.js
│ ├── jsx
│ │ ├── index.js
│ │ └── xhtml.js
│ ├── placeholders.js
│ ├── typescript
│ │ ├── index.js
│ │ └── scope.js
│ └── v8intrinsic.js
├── tokenizer
│ ├── context.js
│ ├── index.js
│ ├── state.js
│ └── types.js
├── types.js
└── util
├── class-scope.js
├── identifier.js
├── location.js
├── production-parameter.js
├── scope.js
├── scopeflags.js
└── whitespace.js
Parsing functions
The parsing functions are in the parser folder. It is a recursive descent parser. Let us start with the statement.js file.
The parsing functions usually start with the prefix parse followed by the name of the syntactic variable they parse:
➜ parser git:(learning) ✗ egrep -i '^\s*parse\w+\(' statement.js | cat -n
1 parseTopLevel(file: N.File, program: N.Program): N.File {
2 parseInterpreterDirective(): N.InterpreterDirective | null {
3 parseStatement(context: ?string, topLevel?: boolean): N.Statement {
4 parseStatementContent(context: ?string, topLevel: ?boolean): N.Statement {
5 parseDecorators(allowExport?: boolean): void {
6 parseDecorator(): N.Decorator {
7 parseMaybeDecoratorArguments(expr: N.Expression): N.Expression {
8 parseBreakContinueStatement(
9 parseDebuggerStatement(node: N.DebuggerStatement): N.DebuggerStatement {
10 parseHeaderExpression(): N.Expression {
11 parseDoStatement(node: N.DoWhileStatement): N.DoWhileStatement {
12 parseForStatement(node: N.Node): N.ForLike {
13 parseFunctionStatement(
14 parseIfStatement(node: N.IfStatement): N.IfStatement {
15 parseReturnStatement(node: N.ReturnStatement): N.ReturnStatement {
16 parseSwitchStatement(node: N.SwitchStatement): N.SwitchStatement {
17 parseThrowStatement(node: N.ThrowStatement): N.ThrowStatement {
18 parseTryStatement(node: N.TryStatement): N.TryStatement {
19 parseVarStatement(
20 parseWhileStatement(node: N.WhileStatement): N.WhileStatement {
21 parseWithStatement(node: N.WithStatement): N.WithStatement {
22 parseEmptyStatement(node: N.EmptyStatement): N.EmptyStatement {
23 parseLabeledStatement(
24 parseExpressionStatement(
25 parseBlock(
26 parseBlockBody(
27 parseBlockOrModuleBlockBody(
28 parseFor(
29 parseForIn(
30 parseVar(
31 parseVarId(decl: N.VariableDeclarator, kind: "var" | "let" | "const"): void {
32 parseFunctionId(requireId?: boolean): ?N.Identifier {
33 parseFunctionParams(node: N.Function, allowModifiers?: boolean): void {
34 parseClassBody(
35 parseClassMemberFromModifier(
36 parseClassMember(
37 parseClassMemberWithIsStatic(
38 parseClassPropertyName(member: N.ClassMember): N.Expression | N.Identifier {
39 parsePostMemberNameModifiers(
40 parseAccessModifier(): ?N.Accessibility {
41 parseClassPrivateProperty(
42 parseClassProperty(node: N.ClassProperty): N.ClassProperty {
43 parseClassId(
44 parseClassSuper(node: N.Class): void {
45 parseExport(node: N.Node): N.AnyExport {
46 parseExportDefaultExpression(): N.Expression | N.Declaration {
47 parseExportDeclaration(node: N.ExportNamedDeclaration): ?N.Declaration {
48 parseExportFrom(node: N.ExportNamedDeclaration, expect?: boolean): void {
49 parseExportSpecifiers(): Array<N.ExportSpecifier> {
50 parseImport(node: N.Node): N.AnyImport {
51 parseImportSource(): N.StringLiteral {
52 parseImportSpecifierLocal(
53 parseNamedImportSpecifiers(node: N.ImportDeclaration) {
54 parseImportSpecifier(node: N.ImportDeclaration): void {
Top Level and parseBlockBody
src/parser/index.js: Parser Class
The class Parser is declared in the file src/parser/index.js.
At this version, the Babel source is JavaScript annotated with Flow types (later releases of Babel migrated the codebase to TypeScript). Flow gives the Babel team static type checking while the code stays JavaScript. Independently of that, Babel supports TypeScript syntax through its plugins and is a popular tool for TypeScript users.
Here are the first lines in the file.
The initial // @flow comment indicates that the file is type-checked by Flow, and the type annotations specify the expected types.
// @flow
import type { Options } from "../options";
import type { File } from "../types";
import type { PluginList } from "../plugin-utils";
It starts by importing types: the Options type, the File type, and the PluginList type, from the following modules:
- src/options.js (parser options like sourceType, strictMode or tokens)
- src/types.js. It defines the types of nodes like NodeBase, Node, Expression, Declaration, Literal, StringLiteral, Program, Comment, Token, etc. The type File corresponds to the root of the AST and is defined as follows:
export type File = NodeBase & { // & is the intersection type
  type: "File",
  program: Program,
  comments: $ReadOnlyArray<Comment>,
  tokens: $ReadOnlyArray<Token | Comment>,
};
$ReadOnlyArray is Flow's type for a read-only array. The equivalent in TypeScript is ReadonlyArray.
- src/plugin-utils.js (the type Plugin can be a string or a tuple of a string and an object. PluginList is a read-only array of Plugin elements).
The file src/parser/index.js continues like this:
import { getOptions } from "../options";
import StatementParser from "./statement";
import { SCOPE_PROGRAM } from "../util/scopeflags"; // const SCOPE_PROGRAM = 0b00000001
import ScopeHandler from "../util/scope";
import ClassScopeHandler from "../util/class-scope";
import ProductionParameterHandler, { PARAM_AWAIT, PARAM, } from "../util/production-parameter";
export type PluginsMap = Map<string, { [string]: any }>;
export default class Parser extends StatementParser { ... }
It imports the class StatementParser, which implements the Statement parsing. This class is defined in the file src/parser/statement.js.
The ScopeHandler class is imported in the babel-parser/src/parser/index.js module from src/util/scope.
The ClassScopeHandler class is imported by the Parser class in the babel-parser/src/parser/index.js module from src/util/class-scope.
The class Parser extends StatementParser and has the constructor, the getScopeHandler method, and the parse method.
The pluginsMap function is used to create a map of plugins from a list of plugins.
export default class Parser extends StatementParser {
constructor(options: ?Options, input: string) { ... }
getScopeHandler(): Class<ScopeHandler<*>> { return ScopeHandler; }
parse(): File { ... }
}
function pluginsMap(plugins: PluginList): PluginsMap {
const pluginMap: PluginsMap = new Map();
for (const plugin of plugins) {
const [name, options] = Array.isArray(plugin) ? plugin : [plugin, {}];
if (!pluginMap.has(name)) pluginMap.set(name, options || {});
}
return pluginMap;
}
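To see what pluginsMap produces, here is a minimal sketch of the same function with the Flow annotations removed so that it runs directly in Node. The plugin list passed to it is a made-up input chosen to exercise both accepted forms (a bare name and a [name, options] tuple) and the duplicate-name case.

```javascript
// Plain-JavaScript version of pluginsMap (Flow annotations removed).
function pluginsMap(plugins) {
  const pluginMap = new Map();
  for (const plugin of plugins) {
    // A plugin is either "name" or ["name", { ...options }].
    const [name, options] = Array.isArray(plugin) ? plugin : [plugin, {}];
    // The first occurrence of a name wins; later duplicates are ignored.
    if (!pluginMap.has(name)) pluginMap.set(name, options || {});
  }
  return pluginMap;
}

// Hypothetical plugin list, used only to exercise the function.
const demoPlugins = pluginsMap([
  "jsx",
  ["flow", { all: true }],
  ["jsx", { ignored: true }], // duplicate name: dropped
]);

console.log(demoPlugins.has("jsx"));      // true
console.log(demoPlugins.get("flow").all); // true
console.log(demoPlugins.get("jsx"));      // {}
```

Normalizing every entry to a name mapped to an options object means later lookups never need to distinguish the two input forms.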
In the following sections, we will describe the Parser class and its methods.
Constructor of the Parser class
The constructor of the Parser class is defined as follows:
constructor(options: ?Options, input: string) {
options = getOptions(options);
super(options, input);
const ScopeHandler = this.getScopeHandler();
this.options = options;
this.inModule = this.options.sourceType === "module";
this.scope = new ScopeHandler(this.raise.bind(this), this.inModule);
this.prodParam = new ProductionParameterHandler();
this.classScope = new ClassScopeHandler(this.raise.bind(this));
this.plugins = pluginsMap(this.options.plugins);
this.filename = options.sourceFilename;
}
The inheritance hierarchy
The class Parser inherits from StatementParser, which inherits from ExpressionParser, and so on down a chain of classes that ends at BaseParser. The declarations below follow that chain:
export default class StatementParser extends ExpressionParser { ... }
export default class ExpressionParser extends LValParser { ... }
src/parser/lval.js
export default class LValParser extends NodeUtils { ... }
src/parser/node.js
export class NodeUtils extends UtilParser { ... }
src/parser/util.js
export default class UtilParser extends Tokenizer { ... }
export default class Tokenizer extends ParserErrors {
isLookahead: boolean;
// Token store.
tokens: Array<Token | N.Comment> = [];
constructor(options: Options, input: string) {
super();
this.state = new State();
this.state.init(options);
this.input = input;
this.length = input.length;
this.isLookahead = false;
}
...
}
We can see that Parser objects have a state object that contains, among other things, information from the options. They also have properties containing the input string and the length of the input string.
src/parser/error.js
export default class ParserError extends CommentsParser { ... }
src/parser/comments.js
export default class CommentsParser extends BaseParser { ... }
And at last we reach the BaseParser class:
src/parser/base.js
export default class BaseParser {
// Properties set by constructor in index.js
options: Options;
inModule: boolean;
scope: ScopeHandler<*>; // In Flow, * marks a type argument that Flow itself should infer
classScope: ClassScopeHandler;
prodParam: ProductionParameterHandler;
plugins: PluginsMap;
filename: ?string;
sawUnambiguousESM: boolean = false;
ambiguousScriptDifferentAst: boolean = false;
// Initialized by Tokenizer
state: State;
// input and length are not in state as they are constant and we do
// not want to ever copy them, which happens if state gets cloned
input: string;
length: number;
hasPlugin(name: string): boolean {
return this.plugins.has(name);
}
getPluginOption(plugin: string, name: string) {
// $FlowIssue
if (this.hasPlugin(plugin)) return this.plugins.get(plugin)[name];
}
}
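BaseParser declares the properties, while the constructors further up the chain assign them. The following is a compressed sketch of that arrangement. The three classes BaseLayer, TokenizerLayer and ParserLayer are hypothetical stand-ins for BaseParser, Tokenizer and Parser, and the state object is a simplified placeholder for the real State class.

```javascript
// Three hypothetical layers standing in for BaseParser, Tokenizer and Parser.
class BaseLayer {
  // Works for any subclass that has assigned this.plugins (a Map).
  hasPlugin(name) {
    return this.plugins.has(name);
  }
}

class TokenizerLayer extends BaseLayer {
  constructor(options, input) {
    super();
    this.state = { pos: 0 }; // simplified stand-in for State
    this.input = input;
    this.length = input.length;
  }
}

class ParserLayer extends TokenizerLayer {
  constructor(options, input) {
    super(options, input); // runs the tokenizer-level setup first
    this.options = options;
    this.inModule = options.sourceType === "module";
    this.plugins = new Map(options.plugins.map((name) => [name, {}]));
  }
}

const layered = new ParserLayer({ sourceType: "module", plugins: ["jsx"] }, "a b");
console.log(layered.length);            // 3
console.log(layered.inModule);          // true
console.log(layered.hasPlugin("jsx"));  // true
console.log(layered.hasPlugin("flow")); // false
```

The method defined at the bottom of the chain only becomes usable once the top-level constructor has filled in the property it reads, which is why the comments in BaseParser note where each property is initialized.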
parse method
The call this.scope.enter(SCOPE_PROGRAM) enters the program scope.
The call to this.parseTopLevel(file, program); starts the parsing at the top level.
parse(): File {
let paramFlags = PARAM;
if (this.hasPlugin("topLevelAwait") && this.inModule) {
paramFlags |= PARAM_AWAIT;
}
this.scope.enter(SCOPE_PROGRAM);
this.prodParam.enter(paramFlags);
const file = this.startNode();
const program = this.startNode();
this.nextToken();
file.errors = null;
this.parseTopLevel(file, program);
file.errors = this.state.errors;
return file;
}
}
Since the Parser class extends the StatementParser class, it inherits all the methods of the StatementParser class and thus it has access to the parseTopLevel method.
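The paramFlags variable in parse is a bit set: |= switches a flag on and & tests it later. The sketch below illustrates the pattern. The numeric values given to PARAM and PARAM_AWAIT are assumptions made for the example (see src/util/production-parameter.js for the real constants), and initialParamFlags and allowsAwait are hypothetical helper names.

```javascript
// Assumed flag values for illustration; check src/util/production-parameter.js
// for the real constants.
const PARAM = 0b000;       // no flags set
const PARAM_AWAIT = 0b010; // `await` is allowed in this context

// Mirrors the start of parse(): begin with no flags, add AWAIT when enabled.
function initialParamFlags(hasTopLevelAwait, inModule) {
  let paramFlags = PARAM;
  if (hasTopLevelAwait && inModule) {
    paramFlags |= PARAM_AWAIT; // bitwise OR switches the bit on
  }
  return paramFlags;
}

// Bitwise AND tests whether a given bit is on.
const allowsAwait = (flags) => (flags & PARAM_AWAIT) !== 0;

console.log(allowsAwait(initialParamFlags(true, true)));  // true
console.log(allowsAwait(initialParamFlags(true, false))); // false
```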
The Class StatementParser and the parseTopLevel function
The class StatementParser implements the Statement parsing. It is defined in the file src/parser/statement.js.
The function parseTopLevel parses a program. It receives a file node and a program node as parameters.
The program parameter is a N.Program node that is going to represent the top-level structure of the program. It will contain the interpreter directive (if any) and the body AST of the program.
The function returns a file node that contains the program node, the comments and optionally the tokens of the program. Here is the definition of the Program node:
Program <: Node {
type: "Program";
interpreter: InterpreterDirective | null; // #!/usr/bin/env node
sourceType: "script" | "module";
body: [ Statement | ModuleDeclaration ]; // Array of expressions or statements or import or export declarations
directives: [ Directive ]; // "use strict" directives
}
This is the code of the parseTopLevel function:
export default class StatementParser extends ExpressionParser {
parseTopLevel(file: N.File, program: N.Program): N.File {
program.sourceType = this.options.sourceType;
program.interpreter = this.parseInterpreterDirective();
this.parseBlockBody(program, true, true, tt.eof);
if ( // check for undefined exports if it is a module
this.inModule &&
!this.options.allowUndeclaredExports &&
this.scope.undefinedExports.size > 0
) { // raise an error if there are undefined exports
for (const [name] of Array.from(this.scope.undefinedExports)) {
const pos = this.scope.undefinedExports.get(name);
// $FlowIssue
this.raise(pos, Errors.ModuleExportUndefined, name);
}
}
file.program = this.finishNode(program, "Program");
file.comments = this.state.comments;
if (this.options.tokens) file.tokens = this.tokens;
return this.finishNode(file, "File");
}
...
} // End of class StatementParser
The assignment program.interpreter = this.parseInterpreterDirective(); parses the InterpreterDirective /#!.*/ if any.
The call this.parseBlockBody(program, true, true, tt.eof); parses the body of the program.
- The first argument is the node to which the body will be attached.
- The first true argument, allowDirectives, indicates that directives are allowed in the body. Directives are special instructions or statements that are processed differently by the JavaScript engine compared to regular code, like "use strict" and "use asm".
- The second true argument, topLevel, indicates that the body is top level.
- The last argument, tt.eof, is the token type that signals the end of the body.
Scope Analysis
See
- Scope Analysis in these notes
- File parser/babel-scope.md in the babel learning repo
Checking for undefined exports if it is a module
In the section:
if ( // check for undefined exports if it is a module
this.inModule && !this.options.allowUndeclaredExports && this.scope.undefinedExports.size > 0
) { // raise an error if there are undefined exports
for (const [name] of Array.from(this.scope.undefinedExports)) {
const pos = this.scope.undefinedExports.get(name);
// $FlowIssue
this.raise(pos, Errors.ModuleExportUndefined, name);
}
}
We check if we are in a module, if the allowUndeclaredExports option is off, and if there are undefined exports.
This ensures that a module only exports names that it declares.
This is required according to the spec here: https://www.ecma-international.org/ecma-262/9.0/index.html#sec-module-semantics-static-semantics-early-errors. See pull request 9589.
The [name] part in the expression for (const [name] of Array.from(this.scope.undefinedExports)) uses array destructuring to extract the first element of each entry in this.scope.undefinedExports, that is, the export name.
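The destructuring in the loop header can be tried in isolation. In this minimal sketch a Map with made-up names and positions stands in for this.scope.undefinedExports, and the messages are collected in an array instead of being passed to this.raise. The message wording is illustrative only.

```javascript
// Stand-in for this.scope.undefinedExports: export name -> source position.
// The names and positions are made up for the example.
const undefinedExports = new Map([
  ["foo", 9],
  ["bar", 14],
]);

const reported = [];
// Iterating a Map yields [key, value] pairs; `[name]` keeps only the key.
for (const [name] of Array.from(undefinedExports)) {
  const pos = undefinedExports.get(name);
  reported.push(`${pos}: Export '${name}' is not defined`);
}

console.log(reported);
```

Writing for (const [name, pos] of undefinedExports) would bind both elements at once and make the get call unnecessary; the sketch keeps the shape used in parseTopLevel.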
finishNode and finishNodeAt
The finishNode function is responsible for finishing the construction of an AST node, assigning it final properties before it is considered complete. The function accepts a generic node T, which must be a node type (NodeType), along with a type that indicates the type of node to end:
finishNode<T: NodeType>(node: T, type: string): T {
return this.finishNodeAt(node, type, this.state.lastTokEnd, this.state.lastTokEndLoc);
}
The helper function finishNodeAt is responsible for the actual finishing. The function accepts a generic node T, which must be a node type (NodeType), along with
- a type that indicates the type of node to end,
- a pos position that represents the end of the node in the source code, and
- a loc object which contains location information, specifically the end of the node location.
The first step within the function is a safety check, skipped in production builds, that raises an error if an attempt is made to finish a node that has already been finished, which is indicated by node.end > 0.
finishNodeAt<T: NodeType>(node: T, type: string, pos: number, loc: Position, ): T {
if (process.env.NODE_ENV !== "production" && node.end > 0) {
throw new Error(
"Do not call finishNode*() twice on the same node." +
" Instead use resetEndLocation() or change type directly.",
);
}
node.type = type;
node.end = pos;
node.loc.end = loc;
if (this.options.ranges) node.range[1] = pos;
this.processComment(node);
return node;
}
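The start/finish life cycle of a node can be sketched in a few lines. This is a simplified stand-in, not the parser's implementation: startNode here takes a position argument, locations are plain numbers instead of Position objects, and the production-mode test, the ranges option and comment processing are left out.

```javascript
// Minimal stand-in for the node helpers; names mirror the parser's,
// but this is a simplified sketch, not the real implementation.
function startNode(pos) {
  return { type: "", start: pos, end: 0, loc: { start: pos, end: null } };
}

function finishNodeAt(node, type, pos, loc) {
  // A finished node has end > 0, so finishing it twice is a programming error.
  if (node.end > 0) {
    throw new Error("Do not call finishNode*() twice on the same node.");
  }
  node.type = type;
  node.end = pos;
  node.loc.end = loc;
  return node;
}

const idNode = finishNodeAt(startNode(0), "Identifier", 3, 3);
console.log(idNode.type, idNode.start, idNode.end); // Identifier 0 3

let secondCallFailed = false;
try {
  finishNodeAt(idNode, "Identifier", 3, 3);
} catch (err) {
  secondCallFailed = true;
}
console.log(secondCallFailed); // true
```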
parseBlockBody
Here is the parseBlockBody function that is called by parseTopLevel:
parseBlockBody(
node: N.BlockStatementLike,
allowDirectives: ?boolean, topLevel: boolean, end: TokenType,
afterBlockParse?: (hasStrictModeDirective: boolean) => void,
): void {
const body = (node.body = []);
const directives = (node.directives = []);
this.parseBlockOrModuleBlockBody(
body,
allowDirectives ? directives : undefined,
topLevel,
end,
afterBlockParse,
);
}
parseIfStatement
The structure of all the parseSomething functions is similar. They start by calling this.next() to move past the current token, then they continue following the grammar rules, consuming tokens as needed. Finally, they call this.finishNode to complete the AST node.
Here is the case of the parseIfStatement function, which follows the IfStatement grammar rule:
parseIfStatement(node: N.IfStatement): N.IfStatement {
this.next(); // eat `if`
node.test = this.parseHeaderExpression(); // parse the test expression
node.consequent = this.parseStatement("if"); // parse the consequent statement
node.alternate = this.eat(tt._else) ? this.parseStatement("if") : null; // eat `else` and parse the alternate statement if any
return this.finishNode(node, "IfStatement");
}
Methods consuming tokens
We can see the difference between this.eat(tt._else) and this.next(). The former consumes the current token only if it is an else token and returns whether it did so, while the latter unconditionally consumes the current token and advances to the next one. There is also this.expect(tt._else), which consumes the current token if it is an else token and raises an error otherwise:
expect(type: TokenType, pos?: ?number): void {
this.eat(type) || this.unexpected(pos, type);
}
Another method is this.match, which returns true if the current token is of the given type, without consuming it:
match(type: TokenType): boolean {
return this.state.type === type;
}
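The contract of the four methods can be illustrated with a toy cursor over an array of tokens. This is only a sketch: the class TokenCursor is hypothetical, tokens are plain strings rather than TokenType objects, and "eof" is an assumed end marker.

```javascript
// Toy token cursor; tokens are plain strings and "eof" marks the end.
// This only imitates the contract of next/match/eat/expect.
class TokenCursor {
  constructor(tokens) {
    this.tokens = [...tokens, "eof"];
    this.index = 0;
  }
  // Current token.
  get type() {
    return this.tokens[this.index];
  }
  // Always consume the current token (stays put at "eof").
  next() {
    if (this.type !== "eof") this.index++;
  }
  // Test the current token without consuming it.
  match(type) {
    return this.type === type;
  }
  // Consume the current token only when it matches.
  eat(type) {
    if (!this.match(type)) return false;
    this.next();
    return true;
  }
  // Consume the current token when it matches, fail otherwise.
  expect(type) {
    if (!this.eat(type)) {
      throw new SyntaxError(`Unexpected token ${this.type}, expected ${type}`);
    }
  }
}

const cursor = new TokenCursor(["if", "else"]);
console.log(cursor.match("if"));  // true, cursor still on `if`
cursor.next();                    // consumes `if`
console.log(cursor.eat("while")); // false, cursor still on `else`
console.log(cursor.eat("else"));  // true, `else` consumed
console.log(cursor.type);         // eof
```

Building eat on match and next, and expect on eat, follows the same layering as the expect definition shown above.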
tc39
Ecma International’s TC39 is a group of JavaScript developers, implementers, and academics collaborating with the community to maintain and evolve the definition of JavaScript.
See tc39.md
You can see examples of how Babel is used to study new syntactic and semantic proposals in sections
Tokenizer
See
- section Tokenizer in these notes
- section /doc/parser/tokenizer.md in the babel-learning repo.
References
See
- section References in these notes
- section References in the babel-learning repo.