The Babel Parser
Calling the Babel Parser with a Plugin: Example
See a simple example of how to use the parser at ULL-ESIT-PL/babel-learning/src/parser/example.cjs.
const code = `const element : React.Element = <a href="https://www.google.com">Hello, world!</a>;`;
let parse = require("@babel/parser").parse;
let ast = parse(code, {
  sourceType: "module", // parse in strict mode and allow module declarations
  plugins: [ // enable jsx and flow syntax
    "jsx",
    "flow",
  ],
});
const skip = (key, value) => {
  if (key === "loc" || key === "start" || key === "end" || key === "directives" || key === "comments") {
    return undefined;
  }
  return value;
};
console.log(JSON.stringify(ast, skip, 2));
Babel Parser Docs
The docs for the parser are at https://babeljs.io/docs/babel-parser
Article “Creating custom JavaScript syntax with Babel”
See the article “Creating custom JavaScript syntax with Babel” (September 25, 2019) by Svelte maintainer Tan Li Hau (陈立豪), available at https://lihautan.com/creating-custom-javascript-syntax-with-babel,
where the author creates a curry function syntax @@:
// '@@' makes the function `foo` curried
function @@ foo(a, b, c) {
  return a + b + c;
}
console.log(foo(1, 2)(3)); // 6
The parser is a recursive descent parser.
- The author modifies the lexer in packages/babel-parser/src/tokenizer/types.js. We need TypeScript for that.
- The author looks for "FunctionDeclaration" and finds a function called parseFunction in packages/babel-parser/src/parser/statement.js, where he finds a line that sets the generator attribute.
See tan-liu-article.md for the summary of my experience reproducing Tan Li Hau’s article.
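To make the goal of the article concrete, here is a minimal plain-JavaScript sketch of the behaviour that the @@ syntax is meant to give to foo. The curry helper is a hypothetical name introduced here for illustration; it is not code taken from the article or from Babel.

```javascript
// Hypothetical helper: returns a curried version of `fn`.
// Arguments are collected until `fn.length` of them are available.
function curry(fn) {
  return function curried(...args) {
    if (args.length >= fn.length) {
      return fn.apply(this, args);
    }
    return (...rest) => curried.apply(this, [...args, ...rest]);
  };
}

// Roughly how `function @@ foo(a, b, c) { ... }` is expected to behave.
const foo = curry(function (a, b, c) {
  return a + b + c;
});

console.log(foo(1, 2)(3)); // 6
console.log(foo(1)(2)(3)); // 6
console.log(foo(1, 2, 3)); // 6
```

The custom syntax only saves writing the wrapper by hand; the parser work described in the article is about recognizing @@ so that a later transform can introduce such a wrapper.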
Babel Parser Plugins
At the end of Nicolò Ribaudo’s talk @babel/howto, the interviewer asks him how to add syntax to Babel. He answers:
… The Babel parser does not support plugins. That parsing option I was using in the code is just a list of features which the parser already supports but are disabled by default. If you want to test your custom syntax, we don’t yet at least provide an API to do so and we suggest you to just fork the parser (the Babel mono repo) and then you can provide your custom parser as a Babel option.
See
- section On Babel Parser Plugins in these notes
- section /doc/parser/optional-chaining-in-the-parser.md in the babel-learning repo
Parser Output: The Babel AST
The Babel AST specification is in the file spec.md of the Babel repository: https://github.com/babel/babel/blob/master/packages/babel-parser/ast/spec.md
The Babel parser generates an AST according to the Babel AST format, which is based on the ESTree spec with the following deviations:
- Literal token is replaced with StringLiteral, NumericLiteral, BigIntLiteral, BooleanLiteral, NullLiteral, RegExpLiteral
- Property token is replaced with ObjectProperty and ObjectMethod
- MethodDefinition is replaced with ClassMethod and ClassPrivateMethod
- PropertyDefinition is replaced with ClassProperty and ClassPrivateProperty
- PrivateIdentifier is replaced with PrivateName
- Program and BlockStatement contain an additional directives field with Directive and DirectiveLiteral
- For ClassMethod, ClassPrivateMethod, ObjectProperty, and ObjectMethod, the properties of the FunctionExpression in the value property are brought into the main method node.
- ChainExpression is the kind of node produced by espree for expressions like obj?.aaa?.bbb. In the Babel AST it is replaced with OptionalMemberExpression and OptionalCallExpression
- ImportExpression is replaced with a CallExpression whose callee is an Import node. This change will be reversed in Babel 8.
- ExportAllDeclaration with an exported field is replaced with an ExportNamedDeclaration containing an ExportNamespaceSpecifier node.
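As a rough illustration of the first deviation, the following sketch converts a few Babel literal nodes into ESTree-style Literal nodes. The function toEstreeLiteral is a hypothetical name used only here and the node objects are hand-written; it handles only a handful of node types and is not how the parser itself performs the conversion.

```javascript
// Toy conversion of Babel literal nodes to ESTree-style `Literal` nodes.
// Only a few node types are handled; everything else is returned unchanged.
function toEstreeLiteral(node) {
  switch (node.type) {
    case "StringLiteral":
    case "NumericLiteral":
    case "BooleanLiteral":
      return { type: "Literal", value: node.value };
    case "NullLiteral":
      return { type: "Literal", value: null };
    default:
      return node;
  }
}

// Hand-written sample nodes (location fields omitted).
const babelNodes = [
  { type: "NumericLiteral", value: 4 },
  { type: "StringLiteral", value: "hi" },
  { type: "NullLiteral" },
  { type: "Identifier", name: "x" },
];

const estreeTypes = babelNodes.map((n) => toEstreeLiteral(n).type);
console.log(estreeTypes.join(", ")); // Literal, Literal, Literal, Identifier
```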
Producing an ESTree compatible AST with the Babel parser
The example /src/parser/estree-example.js shows how to produce an ESTree compatible AST with the Babel parser by enabling the estree plugin:
➜ babel-learning git:(main) ✗ cat src/parser/estree-example.js
// This example shows how to produce a estree compatible AST using the babel parser.
const babel = require('@babel/core');
const source = '4';
const options = {
  parserOpts: {
    // https://babeljs.io/docs/en/babel-parser#options
    plugins: ['estree']
  }
};
const ast = babel.parseSync(source, options);
console.log(JSON.stringify(ast, function skip(key, value) {
  if (['loc', 'start', 'end', 'directives', 'comments'].includes(key)) {
    return undefined;
  }
  return value;
}, 2));
//const generate = require("@babel/generator").default;
//console.log(generate(ast).code); // throws an error
const recast = require('recast');
console.log(recast.print(ast).code); // '4;'
The parseSync method, babel.parseSync(code: string, options?: Object), receives the source code and an optional options object and returns an AST.
The parser options that go under parserOpts are described at https://babeljs.io/docs/en/babel-parser#options.
Referenced presets and plugins will be loaded such that optional syntax plugins are automatically enabled.
The execution shows that the type field is now Literal instead of NumericLiteral:
➜ babel-learning git:(main) ✗ node src/parser/estree-example.js
{
  "type": "File",
  "errors": [],
  "program": {
    "type": "Program",
    "sourceType": "module",
    "interpreter": null,
    "body": [
      {
        "type": "ExpressionStatement",
        "expression": {
          "type": "Literal",
          "value": 4,
          "raw": "4"
        }
      }
    ]
  }
}
4;
See Tan Li Hau’s YouTube video [Q&A] Is there specs for babel AST?, recorded in 2021.
AST for JSX code
The AST for JSX code is based on the Facebook JSX AST.
Error codes
Error codes are useful for handling the errors thrown by @babel/parser.
There are two error codes, code and reasonCode.
- code: Rough classification of errors (e.g. BABEL_PARSER_SYNTAX_ERROR, BABEL_PARSER_SOURCETYPE_MODULE_REQUIRED).
- reasonCode: Detailed classification of errors (e.g. MissingSemicolon, VarRedeclaration).
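A minimal sketch of how the two fields might be combined when reporting errors. The error object below is a hand-written stand-in with code, reasonCode and pos fields, not the result of a real parse, and describeParseError is a hypothetical helper that is not part of @babel/parser.

```javascript
// Hypothetical helper: turn a parser error into a short message.
// `code` selects the broad category, `reasonCode` the specific cause.
function describeParseError(err) {
  if (err.code === "BABEL_PARSER_SOURCETYPE_MODULE_REQUIRED") {
    return 'Retry with sourceType: "module"';
  }
  if (err.code === "BABEL_PARSER_SYNTAX_ERROR") {
    switch (err.reasonCode) {
      case "MissingSemicolon":
        return `Missing semicolon at position ${err.pos}`;
      case "VarRedeclaration":
        return `Variable redeclared at position ${err.pos}`;
      default:
        return `Syntax error (${err.reasonCode}) at position ${err.pos}`;
    }
  }
  return "Unknown parser error";
}

// Stand-in for an entry of `ast.errors` (assumed shape, written by hand).
const sampleError = {
  code: "BABEL_PARSER_SYNTAX_ERROR",
  reasonCode: "MissingSemicolon",
  pos: 1,
};

console.log(describeParseError(sampleError)); // Missing semicolon at position 1
```

Branching on code first and reasonCode second keeps the handler robust when an unfamiliar reasonCode shows up: it still falls into the generic syntax error branch.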
See example at /src/parser/error-example.cjs:
const { parse } = require("@babel/parser");
const ast = parse(`a b`, { errorRecovery: true });
console.log(ast.errors[0].code); // BABEL_PARSER_SYNTAX_ERROR
console.log(ast.errors[0].reasonCode); // MissingSemicolon
Notice how some AST is still generated despite the syntax error:
{
  "type": "File",
  "errors": [
    {
      "code": "BABEL_PARSER_SYNTAX_ERROR",
      "reasonCode": "MissingSemicolon",
      "pos": 1
    }
  ],
  "program": {
    "type": "Program",
    "sourceType": "script",
    "interpreter": null,
    "body": [
      {
        "type": "ExpressionStatement",
        "expression": {
          "type": "Identifier",
          "name": "a"
        }
      },
      {
        "type": "ExpressionStatement",
        "expression": {
          "type": "Identifier",
          "name": "b"
        }
      }
    ]
  }
}
The Parser: Files and organization
See /doc/parser/organization.md.
Babel Parser File Hierarchy
Currently
➜ babel-parser git:(learning) ✗ jq '.version' package.json
"7.10.2"
it has 6 directories and 33 files in the src folder:
➜ babel-parser git:(learning) ✗ tree src -I node_modules
src
├── index.js
├── options.js
├── parser
│ ├── base.js
│ ├── comments.js
│ ├── error-message.js
│ ├── error.js
│ ├── expression.js
│ ├── index.js
│ ├── lval.js
│ ├── node.js
│ ├── statement.js
│ └── util.js
├── plugin-utils.js
├── plugins
│ ├── estree.js
│ ├── flow.js
│ ├── jsx
│ │ ├── index.js
│ │ └── xhtml.js
│ ├── placeholders.js
│ ├── typescript
│ │ ├── index.js
│ │ └── scope.js
│ └── v8intrinsic.js
├── tokenizer
│ ├── context.js
│ ├── index.js
│ ├── state.js
│ └── types.js
├── types.js
└── util
├── class-scope.js
├── identifier.js
├── location.js
├── production-parameter.js
├── scope.js
├── scopeflags.js
└── whitespace.js
Parsing functions
The parsing functions are in the parser folder. It is a recursive descent parser. Let us start with the statement.js file.
The parsing functions usually start with the prefix parse followed by the name of the syntactic variable they parse:
➜ parser git:(learning) ✗ egrep -i '^\s*parse\w+\(' statement.js | cat -n
1 parseTopLevel(file: N.File, program: N.Program): N.File {
2 parseInterpreterDirective(): N.InterpreterDirective | null {
3 parseStatement(context: ?string, topLevel?: boolean): N.Statement {
4 parseStatementContent(context: ?string, topLevel: ?boolean): N.Statement {
5 parseDecorators(allowExport?: boolean): void {
6 parseDecorator(): N.Decorator {
7 parseMaybeDecoratorArguments(expr: N.Expression): N.Expression {
8 parseBreakContinueStatement(
9 parseDebuggerStatement(node: N.DebuggerStatement): N.DebuggerStatement {
10 parseHeaderExpression(): N.Expression {
11 parseDoStatement(node: N.DoWhileStatement): N.DoWhileStatement {
12 parseForStatement(node: N.Node): N.ForLike {
13 parseFunctionStatement(
14 parseIfStatement(node: N.IfStatement): N.IfStatement {
15 parseReturnStatement(node: N.ReturnStatement): N.ReturnStatement {
16 parseSwitchStatement(node: N.SwitchStatement): N.SwitchStatement {
17 parseThrowStatement(node: N.ThrowStatement): N.ThrowStatement {
18 parseTryStatement(node: N.TryStatement): N.TryStatement {
19 parseVarStatement(
20 parseWhileStatement(node: N.WhileStatement): N.WhileStatement {
21 parseWithStatement(node: N.WithStatement): N.WithStatement {
22 parseEmptyStatement(node: N.EmptyStatement): N.EmptyStatement {
23 parseLabeledStatement(
24 parseExpressionStatement(
25 parseBlock(
26 parseBlockBody(
27 parseBlockOrModuleBlockBody(
28 parseFor(
29 parseForIn(
30 parseVar(
31 parseVarId(decl: N.VariableDeclarator, kind: "var" | "let" | "const"): void {
32 parseFunctionId(requireId?: boolean): ?N.Identifier {
33 parseFunctionParams(node: N.Function, allowModifiers?: boolean): void {
34 parseClassBody(
35 parseClassMemberFromModifier(
36 parseClassMember(
37 parseClassMemberWithIsStatic(
38 parseClassPropertyName(member: N.ClassMember): N.Expression | N.Identifier {
39 parsePostMemberNameModifiers(
40 parseAccessModifier(): ?N.Accessibility {
41 parseClassPrivateProperty(
42 parseClassProperty(node: N.ClassProperty): N.ClassProperty {
43 parseClassId(
44 parseClassSuper(node: N.Class): void {
45 parseExport(node: N.Node): N.AnyExport {
46 parseExportDefaultExpression(): N.Expression | N.Declaration {
47 parseExportDeclaration(node: N.ExportNamedDeclaration): ?N.Declaration {
48 parseExportFrom(node: N.ExportNamedDeclaration, expect?: boolean): void {
49 parseExportSpecifiers(): Array<N.ExportSpecifier> {
50 parseImport(node: N.Node): N.AnyImport {
51 parseImportSource(): N.StringLiteral {
52 parseImportSpecifierLocal(
53 parseNamedImportSpecifiers(node: N.ImportDeclaration) {
54 parseImportSpecifier(node: N.ImportDeclaration): void {
Top Level and parseBlockBody
src/parser/index.js: Parser Class
The class Parser is declared in the file src/parser/index.js.
At the version studied here, the Babel parser is written in JavaScript with Flow type annotations. This choice allows the Babel team to leverage Flow’s type-checking capabilities while maintaining a JavaScript codebase. However, Babel remains highly compatible with TypeScript through its plugins and is a popular tool for TypeScript users.
Here are the first lines in the file.
The // @flow initial comment indicates that the file is being type-checked by Flow, and the type annotations specify the expected types.
// @flow
import type { Options } from "../options";
import type { File } from "../types";
import type { PluginList } from "../plugin-utils";
It starts by importing types:
- the Options type,
- the File type, and
- the PluginList type
from the following modules:
- src/options.js (parser options like sourceType, strictMode or tokens)
- src/types.js. It defines the types of nodes like NodeBase, Node, Expression, Declaration, Literal, StringLiteral, Program, Comment, Token, etc. The type File corresponds to the root of the AST and is defined as follows:
  export type File = NodeBase & { // & is the intersection type
    type: "File",
    program: Program,
    comments: $ReadOnlyArray<Comment>,
    tokens: $ReadOnlyArray<Token | Comment>,
  };
  $ReadOnlyArray is the Flow way of declaring a read-only array. The equivalent in TypeScript is ReadonlyArray.
- src/plugin-utils.js (the type Plugin can be a string or a tuple of a string and an object; PluginList is a read-only array of Plugin elements).
The file src/parser/index.js continues like this:
import { getOptions } from "../options";
import StatementParser from "./statement";
import { SCOPE_PROGRAM } from "../util/scopeflags"; // const SCOPE_PROGRAM = 0b00000001
import ScopeHandler from "../util/scope";
import ClassScopeHandler from "../util/class-scope";
import ProductionParameterHandler, { PARAM_AWAIT, PARAM, } from "../util/production-parameter";
export type PluginsMap = Map<string, { [string]: any }>;
export default class Parser extends StatementParser { ... }
It imports the class StatementParser, which implements statement parsing and is defined in the file src/parser/statement.js.
The ScopeHandler class is imported from src/util/scope, and the ClassScopeHandler class from src/util/class-scope.
The class Parser extends StatementParser and defines a constructor, the getScopeHandler method, and the parse method.
The module-level pluginsMap function builds a Map from plugin names to their options out of the list of plugins.
export default class Parser extends StatementParser {
  constructor(options: ?Options, input: string) { ... }
  getScopeHandler(): Class<ScopeHandler<*>> { return ScopeHandler; }
  parse(): File { ... }
}
function pluginsMap(plugins: PluginList): PluginsMap {
  const pluginMap: PluginsMap = new Map();
  for (const plugin of plugins) {
    const [name, options] = Array.isArray(plugin) ? plugin : [plugin, {}];
    if (!pluginMap.has(name)) pluginMap.set(name, options || {});
  }
  return pluginMap;
}
In the following sections, we will describe the Parser class and its methods.
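To see what pluginsMap produces, here is a plain-JavaScript sketch of the same function with the Flow annotations removed. The plugin list is an arbitrary example and the all option is used only as sample data.

```javascript
// Same logic as pluginsMap above, without Flow type annotations.
function pluginsMap(plugins) {
  const pluginMap = new Map();
  for (const plugin of plugins) {
    // A plugin is either "name" or ["name", { ...options }].
    const [name, options] = Array.isArray(plugin) ? plugin : [plugin, {}];
    // The first occurrence of a name wins; later duplicates are ignored.
    if (!pluginMap.has(name)) pluginMap.set(name, options || {});
  }
  return pluginMap;
}

const plugins = pluginsMap(["jsx", ["flow", { all: true }], "jsx"]);

console.log(plugins.size);            // 2
console.log(plugins.has("jsx"));      // true
console.log(plugins.get("flow").all); // true
console.log(plugins.has("estree"));   // false
```

Normalizing every entry to a name and an options object means that later lookups never have to distinguish between the string form and the tuple form of a plugin.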
Constructor of the Parser class
The constructor of the Parser class is defined as follows:
constructor(options: ?Options, input: string) {
  options = getOptions(options);
  super(options, input);
  const ScopeHandler = this.getScopeHandler();
  this.options = options;
  this.inModule = this.options.sourceType === "module";
  this.scope = new ScopeHandler(this.raise.bind(this), this.inModule);
  this.prodParam = new ProductionParameterHandler();
  this.classScope = new ClassScopeHandler(this.raise.bind(this));
  this.plugins = pluginsMap(this.options.plugins);
  this.filename = options.sourceFilename;
}
The inheritance hierarchy
The class inherits from StatementParser, which itself inherits from ExpressionParser, and the chain continues through several more classes down to BaseParser.
src/parser/statement.js
export default class StatementParser extends ExpressionParser { ... }
src/parser/expression.js
export default class ExpressionParser extends LValParser { ... }
src/parser/lval.js
export default class LValParser extends NodeUtils { ... }
src/parser/node.js
export class NodeUtils extends UtilParser { ... }
src/parser/util.js
export default class UtilParser extends Tokenizer { ... }
src/tokenizer/index.js
export default class Tokenizer extends ParserErrors {
  isLookahead: boolean;
  // Token store.
  tokens: Array<Token | N.Comment> = [];
  constructor(options: Options, input: string) {
    super();
    this.state = new State();
    this.state.init(options);
    this.input = input;
    this.length = input.length;
    this.isLookahead = false;
  }
  ...
}
We can see that Parser objects have a state object containing, among other things, information from the options. They also have properties holding the input string and its length.
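The effect of this constructor chain can be reproduced with a toy hierarchy. The classes below are simplified stand-ins that only borrow names from the parser (most of the intermediate classes are left out), the state object is a placeholder, and ancestors is a hypothetical helper that walks the prototype chain.

```javascript
// Simplified stand-ins for part of the parser's class chain.
class BaseParser {}
class Tokenizer extends BaseParser {
  constructor(options, input) {
    super();
    this.state = { sourceType: options.sourceType }; // placeholder state
    this.input = input;
    this.length = input.length;
  }
}
class StatementParser extends Tokenizer {}
class Parser extends StatementParser {
  constructor(options, input) {
    super(options, input); // ends up running the Tokenizer constructor
    this.options = options;
  }
}

// Hypothetical helper: list the class names found on the prototype chain.
function ancestors(obj) {
  const names = [];
  let proto = Object.getPrototypeOf(obj);
  while (proto && proto !== Object.prototype) {
    names.push(proto.constructor.name);
    proto = Object.getPrototypeOf(proto);
  }
  return names;
}

const p = new Parser({ sourceType: "module" }, "a b");
console.log(p.length);                  // 3
console.log(ancestors(p).join(" -> ")); // Parser -> StatementParser -> Tokenizer -> BaseParser
```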
src/parser/error.js
export default class ParserError extends CommentsParser { ... }
src/parser/comments.js
export default class CommentsParser extends BaseParser { ... }
And at last we reach the BaseParser class:
src/parser/base.js
export default class BaseParser {
  // Properties set by constructor in index.js
  options: Options;
  inModule: boolean;
  scope: ScopeHandler<*>; // In Flow, * asks Flow itself to infer the generic type argument
  classScope: ClassScopeHandler;
  prodParam: ProductionParameterHandler;
  plugins: PluginsMap;
  filename: ?string;
  sawUnambiguousESM: boolean = false;
  ambiguousScriptDifferentAst: boolean = false;
  // Initialized by Tokenizer
  state: State;
  // input and length are not in state as they are constant and we do
  // not want to ever copy them, which happens if state gets cloned
  input: string;
  length: number;
  hasPlugin(name: string): boolean {
    return this.plugins.has(name);
  }
  getPluginOption(plugin: string, name: string) {
    // $FlowIssue
    if (this.hasPlugin(plugin)) return this.plugins.get(plugin)[name];
  }
}
parse method
The call this.scope.enter(SCOPE_PROGRAM) enters the program scope.
The call to this.parseTopLevel(file, program); starts the parsing at the top level.
parse(): File {
  let paramFlags = PARAM;
  if (this.hasPlugin("topLevelAwait") && this.inModule) {
    paramFlags |= PARAM_AWAIT;
  }
  this.scope.enter(SCOPE_PROGRAM);
  this.prodParam.enter(paramFlags);
  const file = this.startNode();
  const program = this.startNode();
  this.nextToken();
  file.errors = null;
  this.parseTopLevel(file, program);
  file.errors = this.state.errors;
  return file;
}
} // end of class Parser
Since the Parser class extends the StatementParser class, it inherits all the methods of the StatementParser class and thus it has access to the parseTopLevel method.
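The paramFlags computation at the start of parse is ordinary bit-flag arithmetic. The sketch below uses illustrative flag values, which are an assumption and may not match the constants defined in src/util/production-parameter.js; topLevelFlags and allowsAwait are hypothetical helper names.

```javascript
// Illustrative values only (assumed, not copied from the Babel sources).
const PARAM = 0b000;       // no special context
const PARAM_AWAIT = 0b010; // bit that marks an await-enabled context

// Mirrors the flag computation at the start of parse().
function topLevelFlags(hasTopLevelAwait, inModule) {
  let paramFlags = PARAM;
  if (hasTopLevelAwait && inModule) {
    paramFlags |= PARAM_AWAIT; // switch the await bit on
  }
  return paramFlags;
}

// A flag is tested with a bitwise AND.
const allowsAwait = (flags) => (flags & PARAM_AWAIT) !== 0;

console.log(allowsAwait(topLevelFlags(true, true)));  // true
console.log(allowsAwait(topLevelFlags(true, false))); // false
console.log(allowsAwait(topLevelFlags(false, true))); // false
```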
The Class StatementParser and the parseTopLevel function
The class StatementParser implements statement parsing. It is defined in the file src/parser/statement.js.
The function parseTopLevel parses a program. It receives a file node and a program node as parameters.
The program parameter is an N.Program node that represents the top-level structure of the program. It will contain the interpreter directive (if any) and the body AST of the program.
The function returns a file node that contains the program node, the comments and optionally the tokens of the program. Here is the definition of the Program node:
Program <: Node {
  type: "Program";
  interpreter: InterpreterDirective | null; // #!/usr/bin/env node
  sourceType: "script" | "module";
  body: [ Statement | ModuleDeclaration ]; // Array of expressions or statements or import or export declarations
  directives: [ Directive ]; // "use strict" directives
}
This is the code of the parseTopLevel function:
export default class StatementParser extends ExpressionParser {
  parseTopLevel(file: N.File, program: N.Program): N.File {
    program.sourceType = this.options.sourceType;
    program.interpreter = this.parseInterpreterDirective();
    this.parseBlockBody(program, true, true, tt.eof);
    if ( // check for undefined exports if it is a module
      this.inModule &&
      !this.options.allowUndeclaredExports &&
      this.scope.undefinedExports.size > 0
    ) { // raise an error if there are undefined exports
      for (const [name] of Array.from(this.scope.undefinedExports)) {
        const pos = this.scope.undefinedExports.get(name);
        // $FlowIssue
        this.raise(pos, Errors.ModuleExportUndefined, name);
      }
    }
    file.program = this.finishNode(program, "Program");
    file.comments = this.state.comments;
    if (this.options.tokens) file.tokens = this.tokens;
    return this.finishNode(file, "File");
  }
  ...
} // End of class StatementParser
The assignment program.interpreter = this.parseInterpreterDirective(); parses the InterpreterDirective /#!.*/ if any.
The call this.parseBlockBody(program, true, true, tt.eof); parses the body of the program.
- The first argument is the node to which the body will be attached.
- The second argument, allowDirectives (the first true), indicates that directives are allowed in the body. Directives are special instructions or statements that are processed differently by the JavaScript engine compared to regular code, like "use strict" and "use asm".
- The third argument (the second true) indicates that the body is top level.
- The fourth argument, tt.eof, is the token type that signals the end of the body.
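What allowDirectives means in practice can be illustrated with a toy version of directive detection: leading expression statements that consist only of a string literal are collected as directives, and everything from the first other statement onwards is ordinary body. The node shapes and the splitDirectives helper are simplified assumptions for illustration, not the parser's actual implementation.

```javascript
// Toy directive-prologue split over already-built statement nodes.
function splitDirectives(statements, allowDirectives) {
  const directives = [];
  const body = [];
  let inPrologue = allowDirectives;
  for (const stmt of statements) {
    const isStringStatement =
      stmt.type === "ExpressionStatement" &&
      stmt.expression.type === "StringLiteral";
    if (inPrologue && isStringStatement) {
      directives.push(stmt.expression.value);
    } else {
      inPrologue = false; // the prologue ends at the first other statement
      body.push(stmt);
    }
  }
  return { directives, body };
}

// Small factory for string-only expression statements.
const str = (value) => ({
  type: "ExpressionStatement",
  expression: { type: "StringLiteral", value },
});
const statements = [str("use strict"), { type: "EmptyStatement" }, str("late")];

const result = splitDirectives(statements, true);
console.log(result.directives.join(", ")); // use strict
console.log(result.body.length);           // 2
```

With allowDirectives set to false the same input yields no directives and three body statements, which mirrors how parseBlockBody passes undefined instead of the directives array in that case.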
Scope Analysis
See
- Scope Analysis in these notes
- File parser/babel-scope.md in the babel learning repo
Checking for undefined exports if it is a module
In the section:
if ( // check for undefined exports if it is a module
  this.inModule && !this.options.allowUndeclaredExports && this.scope.undefinedExports.size > 0
) { // raise an error if there are undefined exports
  for (const [name] of Array.from(this.scope.undefinedExports)) {
    const pos = this.scope.undefinedExports.get(name);
    // $FlowIssue
    this.raise(pos, Errors.ModuleExportUndefined, name);
  }
}
we check whether we are in a module and whether there are undefined exports.
This ensures that every exported name is actually declared in the module.
This is required according to the spec here: https://www.ecma-international.org/ecma-262/9.0/index.html#sec-module-semantics-static-semantics-early-errors. See pull request 9589.
The [name] part in the expression for (const [name] of Array.from(this.scope.undefinedExports)) uses array destructuring: each entry obtained from this.scope.undefinedExports is a pair whose first element is the export name, and [name] extracts that first element.
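A small stand-alone sketch of that loop, assuming (as the .size and .get(name) calls suggest) that undefinedExports is a Map from export name to source position. The entries and the message text are made up.

```javascript
// Made-up data: export name -> position in the source.
const undefinedExports = new Map([
  ["foo", 10],
  ["bar", 42],
]);

const messages = [];
// Each entry of a Map is a [key, value] pair; `[name]` keeps only the key.
for (const [name] of Array.from(undefinedExports)) {
  const pos = undefinedExports.get(name);
  messages.push(`Export '${name}' is not defined (${pos})`);
}

console.log(messages.join("\n"));

// Equivalent loop that destructures both parts of the entry at once.
const pairs = [];
for (const [name, pos] of undefinedExports) {
  pairs.push(`${name}@${pos}`);
}
console.log(pairs.join(", ")); // foo@10, bar@42
```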
finishNode and finishNodeAt
The finishNode function is responsible for finishing the construction of an AST node, assigning it its final properties before it is considered complete. The function accepts a generic node T, which must be a node type (NodeType), along with a type that names the kind of node being finished:
finishNode<T: NodeType>(node: T, type: string): T {
  return this.finishNodeAt(node, type, this.state.lastTokEnd, this.state.lastTokEndLoc);
}
The helper function finishNodeAt is responsible for the actual finishing. The function accepts a generic node T, which must be a node type (NodeType), along with
- a type that names the kind of node being finished,
- a pos position that represents the end of the node in the source code, and
- a loc object which contains location information, specifically the end location of the node.
The first step within the function is a safety check, active only when NODE_ENV is not "production", that throws an error if an attempt is made to finish a node that has already been finished, which is indicated by node.end > 0.
finishNodeAt<T: NodeType>(node: T, type: string, pos: number, loc: Position): T {
  if (process.env.NODE_ENV !== "production" && node.end > 0) {
    throw new Error(
      "Do not call finishNode*() twice on the same node." +
        " Instead use resetEndLocation() or change type directly.",
    );
  }
  node.type = type;
  node.end = pos;
  node.loc.end = loc;
  if (this.options.ranges) node.range[1] = pos;
  this.processComment(node);
  return node;
}
parseBlockBody
Here is the parseBlockBody function that is called by parseTopLevel:
parseBlockBody(
  node: N.BlockStatementLike,
  allowDirectives: ?boolean, topLevel: boolean, end: TokenType,
  afterBlockParse?: (hasStrictModeDirective: boolean) => void,
): void {
  const body = (node.body = []);
  const directives = (node.directives = []);
  this.parseBlockOrModuleBlockBody(
    body,
    allowDirectives ? directives : undefined,
    topLevel,
    end,
    afterBlockParse,
  );
}
parseIfStatement
The parseSomething functions all have a similar structure. They start by calling this.next() to consume the keyword that selected the rule, then they follow the grammar rule, inspecting and consuming tokens as needed. Finally, they call this.finishNode to complete the AST node.
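This shape can be sketched with a toy parser over a pre-tokenized input. Everything here (the ToyParser class, the token format, the tiny grammar and the node fields) is a simplified assumption for illustration; it borrows only the method names next, match, eat, expect and finishNode from the Babel parser.

```javascript
class ToyParser {
  constructor(tokens) {
    this.tokens = [...tokens, { type: "eof" }];
    this.pos = 0;
  }
  get current() { return this.tokens[this.pos]; }
  // Unconditionally consume the current token.
  next() { if (this.current.type !== "eof") this.pos++; }
  // Test the current token without consuming it.
  match(type) { return this.current.type === type; }
  // Consume the current token only if it has the given type.
  eat(type) {
    if (!this.match(type)) return false;
    this.next();
    return true;
  }
  // Like eat, but a mismatch is an error.
  expect(type) {
    if (!this.eat(type)) {
      throw new SyntaxError(`Expected ${type}, got ${this.current.type}`);
    }
  }
  finishNode(node, type) { node.type = type; return node; }

  // Statement -> "if" "(" name ")" Statement ("else" Statement)? | name ";"
  parseStatement() {
    if (this.match("if")) return this.parseIfStatement({});
    const node = { name: this.current.value };
    this.expect("name");
    this.expect(";");
    return this.finishNode(node, "ExpressionStatement");
  }
  parseIfStatement(node) {
    this.next(); // consume `if`
    this.expect("(");
    node.test = this.current.value;
    this.expect("name");
    this.expect(")");
    node.consequent = this.parseStatement();
    node.alternate = this.eat("else") ? this.parseStatement() : null;
    return this.finishNode(node, "IfStatement");
  }
}

// Tokens for: if (a) b; else c;
const t = (type, value) => ({ type, value });
const tokens = [
  t("if"), t("("), t("name", "a"), t(")"), t("name", "b"), t(";"),
  t("else"), t("name", "c"), t(";"),
];
const ifNode = new ToyParser(tokens).parseStatement();
console.log(ifNode.type, ifNode.test, ifNode.alternate.name); // IfStatement a c
```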
Here is the case of the parseIfStatement function, which follows the IfStatement grammar rule:
parseIfStatement(node: N.IfStatement): N.IfStatement {
  this.next(); // eat `if`
  node.test = this.parseHeaderExpression(); // parse the test expression
  node.consequent = this.parseStatement("if"); // parse the consequent statement
  node.alternate = this.eat(tt._else) ? this.parseStatement("if") : null; // eat `else` and parse the alternate statement if any
  return this.finishNode(node, "IfStatement");
}
Methods consuming tokens
We can see the difference between this.eat(tt._else) and this.next(). The former consumes the current token only if it is an else token and returns whether it did so, while the latter consumes the current token unconditionally and advances to the next one. There is also this.expect(tt._else), which consumes the current token if it is an else token and raises an error otherwise:
expect(type: TokenType, pos?: ?number): void {
  this.eat(type) || this.unexpected(pos, type);
}
Another method is this.match, which returns true if the current token is of the given type, without consuming it:
match(type: TokenType): boolean {
  return this.state.type === type;
}
tc39
Ecma International’s TC39 is a group of JavaScript developers, implementers, and academics collaborating with the community to maintain and evolve the definition of JavaScript.
See tc39.md
You can see examples of how Babel is used to study new syntactic and semantic proposals in sections
Tokenizer
See
- section Tokenizer in these notes
- section /doc/parser/tokenizer.md in the babel-learning repo.
References
See
- section References in these notes
- section References in the babel-learning repo.