LR Parsing
Breaking Ambiguity in Jison/Eyapp/Yacc/Bison et al. using token precedence
This is a simplified version of the rules to resolve conflicts and ambiguities in a Yacc-like parser generator:
Precedence Rules
- Token precedence is declared in the header of the Jison program, that is, before the first `%%`, using the syntax `%left token ...`, `%right token ...` or `%nonassoc token ...`.
- The precedence of a production rule is the precedence of the last token that appears on the right-hand side of the rule.
- For example, the precedence of a rule such as `e: e '+' e` is whatever precedence we give the token `'+'` in the header of the Jison program.
- If there is no token with an assigned precedence on the right-hand side, or if the default precedence is not the desired one, the `%prec` directive can be used to assign the desired precedence to the rule. For example, `exp: '-' exp %prec 'STRONG'` assigns the precedence of the token `STRONG` to the rule `exp: '-' exp`.
- When the parser detects a conflict and sees that there are two possible ways to continue building the tree, one suggesting that a rule may already be complete and another suggesting that it may keep reading the token at the input, the parser compares the precedences of the token and of the rule and keeps the one with higher priority:
  - If the token has the higher priority, the parser advances, shifting the token and reading new symbols (it performs a shift; in this case the AST "sinks" to the right).
  - If the rule has the higher priority, it completes the partial subtree and goes on building the tree (it performs a reduce; in this case the resulting tree is nested more to the left).
- Tokens declared on the same line of a `%left` or `%right` declaration have the same precedence and the same associativity.
- The `%nonassoc` directive indicates that the parser must produce an error when a token declared in it appears as lookahead against a rule of the same priority.
- Precedence grows downwards: the lower a `%left`/`%right`/`%nonassoc` line appears in the header, the higher the precedence of its tokens (see the sketch after this list).
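As an illustration, here is a minimal sketch of a Jison grammar whose header resolves the classical ambiguities of arithmetic expressions with precedence declarations. It assumes the usual tokens (`N` and the operators) of the calculator used later in these notes; the `UMINUS` pseudo-token and the semantic actions are illustrative, not part of the course code:

```
/* Later lines bind tighter: '*' and '/' win over '+' and '-',
   and '**' (right associative) binds tighter still. */
%left '+' '-'
%left '*' '/'
%right '**'
%nonassoc UMINUS      /* pseudo-token, used only through %prec */

%%
e
  : e '+' e            { $$ = $1 + $3; }
  | e '-' e            { $$ = $1 - $3; }
  | e '*' e            { $$ = $1 * $3; }
  | e '/' e            { $$ = $1 / $3; }
  | e '**' e           { $$ = Math.pow($1, $3); }
  | '-' e %prec UMINUS { $$ = -$2; }   /* the rule borrows UMINUS's precedence */
  | '(' e ')'          { $$ = $2; }
  | N                  { $$ = Number(yytext); }
  ;
```

With these declarations, in the conflict that arises after reading `e '+' e` with lookahead `'*'` the token wins (its line is lower in the header, so it has higher precedence) and the parser shifts, whereas with lookahead `'+'` the rule wins because `%left` makes `'+'` left associative and the parser reduces.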
Jison %nonassoc
The `%nonassoc` directive indicates that the parser must produce an error when the token appears as lookahead against a rule of its same priority.
For example, if we have the following Jison program:
```
%{
const { builders: b } = require('ast-types');
%}

%nonassoc ','

%%
es: e EOF { return b.program([b.expressionStatement($e)]); }
;
e:
    e ',' e { $$ = b.sequenceExpression([$e1, $e2]) }
  | N       { let numNode = b.literal($N); numNode.loc = @1; $$ = b.callExpression(b.identifier('Complex'),[numNode]); $$.loc = @1; }
;
```
An input like `2,3,4` causes an error:

```
➜  solution24250311 git:(main) ✗ cat examples/julio.calc
2,3,4%
➜  solution24250311 git:(main) ✗ bin/calc2js.mjs examples/julio.calc -c
Syntax error:
Parse error on line 1:
2,3,4
---^
Expecting 'EOF', got ','
```
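For contrast, and as an assumption not shown in the course output above: declaring the comma left associative instead of non-associative makes the same input parse, since the conflict on the lookahead `','` is then resolved as a reduce and the sequence groups to the left:

```
%left ','     /* instead of %nonassoc ',' : now 2,3,4 parses as (2,3),4 */
```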
Undocumented features of the Jison lexer
The actions inside the lexer are called as anonymous functions. For instance, for a lexer `lexer.l` such as the following:
```
%{
const util = require('util');
const deb = (x) => console.log(util.inspect(x, {depth: null}));
%}
number \d+([.]\d+)?([eE][-+]?\d+)?[i]?|"i"
%%
\s+          ;
{number}     return 'N';
'**'         return '**';
[-+*/!()]    return yytext;
<<EOF>>      return 'EOF';
.+           {
               throw new Error('Unexpected character ' + yytext +
                 `- ${yy_.yy.input} - ${deb(yy_)}`);
             }
```
The lexer generated by Jison will have the following structure:
```
/* generated by jison-lex 0.3.4 */
var lexer = (function(){
var lexer = ({
EOF:1,
parseError:function parseError(str, hash) { ... },
// resets the lexer, sets new input
setInput:function (input, yy) { ... },
// consumes and returns one char from the input
input:function () {... },
// unshifts one char (or a string) into the input
unput:function (ch) { ... },
// When called from action, caches matched text and appends it on next action
more:function () { ... },
// When called from action, signals the lexer that this rule fails to match the input, so the next matching rule (regex) should be tested instead.
reject:function () { ... },
// retain first n characters of the match
less:function (n) { ... },
// displays already matched input, i.e. for error messages
pastInput:function () { ... },
// displays upcoming input, i.e. for error messages
upcomingInput:function () {... },
// displays the character position where the lexing error occurred, i.e. for error messages
showPosition:function () { ... },
// test the lexed token: return FALSE when not a match, otherwise return token
test_match:function(match, indexed_rule) { ... },
// return next match in input
next:function () { ... },
// return next match that has a token
lex:function lex () {
var r = this.next();
if (r) {
return r;
} else {
return this.lex();
}
},
// activates a new lexer condition state (pushes the new lexer condition state onto the condition stack)
begin:function begin (condition) {
this.conditionStack.push(condition);
},
// pop the previously active lexer condition state off the condition stack
popState:function popState () { ...},
// produce the lexer rule set which is active for the currently active lexer condition state
_currentRules:function _currentRules () { ... },
// return the currently active lexer condition state; when an index argument is provided it produces the N-th previous condition state, if available
topState:function topState (n) { ... },
// alias for begin(condition)
pushState:function pushState (condition) { ... },
// return the number of states currently on the stack
stateStackSize:function stateStackSize() { ... },
options: {},
performAction: function anonymous(yy,yy_,$avoiding_name_collisions,YY_START) {
const util = require('util');
const deb = (x) => console.log(util.inspect(x, {depth: null}));
var YYSTATE=YY_START;
switch($avoiding_name_collisions) {
case 0:;
break;
case 1:return 14;
break;
case 2:return 12;
break;
case 3:return yy_.yytext;
break;
case 4:return 5;
break;
case 5:
throw new Error('Unexpected character ' + yy_.yytext+
`- ${yy_.yy.input} - ${deb(yy_)}`);
break;
}
},
rules: [/^(?:\s+)/,/^(?:(\d+([.]\d+)?([eE][-+]?\d+)?[i]?|i\b))/,/^(?:\*\*)/,/^(?:[-+*/!()])/,/^(?:$)/,/^(?:.+)/],
conditions: {"INITIAL":{"rules":[0,1,2,3,4,5],"inclusive":true}}
});
return lexer;
})();
parser.lexer = lexer;
function Parser () {
this.yy = {};
}
```
The token numbers are defined somewhere in the generated parser. For this example we have:
```
var parser = {
  ...
  symbols_: { "error": 2, "es": 3, "e": 4, "EOF": 5, "-": 6, "+": 7, "*": 8, "/": 9, "(": 10, ")": 11, "**": 12, "!": 13, "N": 14, "$accept": 0, "$end": 1 },
  terminals_: { 2: "error", 5: "EOF", 6: "-", 7: "+", 8: "*", 9: "/", 10: "(", 11: ")", 12: "**", 13: "!", 14: "N" },
  productions_: [0, [3, 2], [4, 3], [4, 3], [4, 3], [4, 3], [4, 3], [4, 3], [4, 2], [4, 1], [4, 2]],
  ...
```
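To see how these token numbers relate to the lexer, we can drive the generated lexer by hand. This is only a sketch: the module name `calc.cjs` and the input string are assumptions, but `setInput`, `lex`, `symbols_` and `terminals_` are the fields shown above.

```
// Assumes the grammar and lexer above were compiled into calc.cjs with jison.
const { parser } = require('./calc.cjs');
const lexer = parser.lexer;

lexer.setInput('2+3*4', {});              // the second argument becomes the lexer's yy
let token;
do {
  token = lexer.lex();                    // a number from symbols_, or the literal character
  const name = parser.terminals_[token] || token;
  console.log(name, JSON.stringify(lexer.yytext), lexer.yylloc);
} while (token !== parser.symbols_.EOF);  // EOF is 5 in this example
```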
We can see that the `performAction` function is called with `yy` and an object `yy_`. The object `yy_` contains the following properties:
```
{
  yy: {
    input: '4.3XXXX',
    offsets: [ 0 ],
    lexer: [Circular *1],
    parser: {
      yy: { input: '4.3XXXX', offsets: [ 0 ] },
      parseError: [Function: parseError]
    }
  },
  _input: '',
  done: false,
  _backtrack: false,
  _more: false,
  yyleng: 4,
  yylineno: 0,
  match: 'XXXX',
  matched: '4.3XXXX',
  yytext: 'XXXX',
  conditionStack: [ 'INITIAL' ],
  yylloc: { first_line: 1, last_line: 1, first_column: 3, last_column: 7 },
  offset: 0,
  matches: [ 'XXXX', index: 0, input: 'XXXX', groups: undefined ]
}
```
Thus we can access the current offset via `yy.lexer.offset` or via `yy_.offset`.
However, if you try to access the match position via `yy_.yylloc` you will get an error: Jison manipulates the user's code wherever the prefix `yy` appears, and the expression `yy_.yylloc` is transformed into `yy_.yy.yylloc`, which is not defined. Instead, use `yylloc`, `yylineno`, `yytext`, etc. directly in your lexer actions.
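For example, a lexer action that reports where each number token was found can use these variables directly. A minimal sketch (the log message is just illustrative):

```
{number}  {
            // yytext, yylloc and yylineno work as-is here; compare how the
            // yytext written in lexer.l became yy_.yytext in the generated performAction.
            console.log(`N ${yytext} at line ${yylineno + 1}, ` +
                        `columns ${yylloc.first_column}-${yylloc.last_column}`);
            return 'N';
          }
```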
References
- Jison Documentation
- Jison Debugger. Don't miss it!
- Mi primer proyecto utilizando Jison, by Erick Navarro
- Repo ULL-ESIT-PL-1718/jison-aSb
- Repo ULL-ESIT-PL-1718/ull-etsii-grado-pl-jisoncalc
- Folder jison/examples from the Jison distribution
Introduction to LR Parsing
- Bottom-Up Parsing in JavaScript
- Precedence and Associativity
- Building the Tables for SLR Parsing
- The LR Parsing Algorithm (yacc/bison/jison)
- Learning to Manage Conflicts
- The ds ; ss conflict
- Parse::Eyapp Debugging Tutorial
- Lexical Tie-ins: a flag which is set by the parser actions, whose purpose is to alter the way tokens are parsed
Parse::Eyapp
Parse::Eyapp is an extension of yacc for Perl, the standard Unix compiler-compiler. It provides a compiler which lets you translate yacc-like grammars into a Perl LALR(1) object-oriented parser. It automates the building of abstract syntax trees, provides an API for tree transformations, and has been used inside the Perl community in many projects (see for instance the RPerl compiler developed by AutoParallel Technologies at http://rperl.org/).
Casiano Rodriguez-Leon created the compiler and maintained it from 2006 to 2017. Since 2017 it has been maintained by William N. Braswell, Jr.
See https://metacpan.org/dist/Parse-Eyapp/view/eyapp, https://cpan.metacpan.org/authors/id/C/CA/CASIANO/, https://metacpan.org/author/CASIANO/releases
LR Parsers for C languages
- ANSI C and C++ Grammars
- Old examples of tiny C languages in Eyapp, from the PL course of the former Computer Engineering degree. Eyapp is a LALR parser generator written by Casiano. The repo contains a compiler for a subset of C.
History of Parsing
- Parsing: a timeline, by Jeffrey Kegler
- Parse-Eyapp: an LR Parser for Perl