Basics
TokenRewriteStream is powerful and its usage is very regular: a grammar uses a single TokenRewriteStream object (say, tokens), and grammar actions can call four methods on tokens:
* delete deletes one token or a contiguous range of tokens
* replace replaces one token or a contiguous range of tokens with a string
* insertAfter inserts a string after a token
* insertBefore inserts a string before a token
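To make the effect of these four methods concrete, here is a minimal, self-contained sketch. It is not ANTLR's implementation: MiniRewriteBuffer and its index-based methods are hypothetical names chosen for illustration, it addresses tokens by position rather than by Token object, and it does not try to reproduce how ANTLR resolves overlapping operations. It only shows the general idea that edits are recorded next to the untouched original tokens and combined when the text is rendered.

```java
import java.util.Arrays;

/** Toy stand-in for a rewrite stream (hypothetical, not the ANTLR class). */
final class MiniRewriteBuffer {
    private final String[] tokens;       // original token texts; never modified
    private final String[] prefix;       // text queued by insertBefore, per token
    private final String[] suffix;       // text queued by insertAfter, per token
    private final String[] replacement;  // non-null: emitted instead of the token

    MiniRewriteBuffer(String... tokens) {
        this.tokens = tokens.clone();
        prefix = new String[tokens.length];
        suffix = new String[tokens.length];
        replacement = new String[tokens.length];
        Arrays.fill(prefix, "");
        Arrays.fill(suffix, "");
    }

    /** Replaces tokens from..to (inclusive) with the given text. */
    void replace(int from, int to, String text) {
        replacement[from] = text;
        for (int i = from + 1; i <= to; i++) {
            replacement[i] = "";         // the rest of the range disappears
        }
    }

    /** Deleting is replacing with nothing. */
    void delete(int from, int to) { replace(from, to, ""); }

    void insertBefore(int index, String text) { prefix[index] = text + prefix[index]; }

    void insertAfter(int index, String text) { suffix[index] = suffix[index] + text; }

    /** The input text, ignoring all queued edits. */
    String original() { return String.join("", tokens); }

    /** The input text with all queued edits applied. */
    String render() {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < tokens.length; i++) {
            out.append(prefix[i]);
            out.append(replacement[i] != null ? replacement[i] : tokens[i]);
            out.append(suffix[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        MiniRewriteBuffer heading =
            new MiniRewriteBuffer("program", " ", "demo", "(", "a", ",", "b", ")", ";");
        heading.replace(0, 0, "class");  // program -> class
        heading.delete(3, 7);            // drop "(" through ")"
        System.out.println(heading.render());    // class demo;
        System.out.println(heading.original());  // program demo(a,b);

        MiniRewriteBuffer cond = new MiniRewriteBuffer("if", " ", "x", ">", "0", " ", "then");
        cond.insertBefore(2, "(");       // before the first token of the condition
        cond.insertAfter(4, ")");        // after its last token
        System.out.println(cond.render());       // if (x>0) then
    }
}
```

Note that whitespace is passed in as ordinary tokens here, which is why untouched spacing survives the rewrite.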
For instance, one can write:
    programHeading
        : PROGRAM identifier LPAREN identifierList RPAREN SEMI
          {
            programName = $identifier.text;
            tokens.replace($PROGRAM, "class");  // replaces PROGRAM by "class"
            tokens.delete($LPAREN, $RPAREN);    // deletes everything from LPAREN through RPAREN (incl. identifierList, which can be long)
          }
        ;
Or:
    ifStatement
        : i=IF expression t=THEN s=statement
          {
            tokens.insertBefore($expression.start, " (");  // refers to the start token of expression
            tokens.insertAfter($expression.stop, ") ");    // refers to the stop token of expression
            tokens.insertAfter($t, "\n");                  // refers to t, the label for THEN (t=THEN)
          }
        ;
The parameters of these methods are generally references to rule elements (or their labels). For example, consider the rule:
    ifStatement : i=IF expression
Here:
* $IF and $i refer to a Token object
* $expression.start refers to the first token object of the expression
* $expression.stop refers to the last token object of the expression
* $expression.text refers to the text of the expression (a String)
* $expression.tree refers to the tree of the expression (typed as Object, so it must be cast, e.g. (Tree) $t.tree)
Other examples using text and tree:
    assignmentStatement
        : variable ASSIGN expression
          {
            if ($variable.text.equals("foo")) { .... }
          }
        ;
    parameterGroup
        : identifierList COLON t=typeIdentifier
          {
            System.err.println("Warning: " + $t.text + ", line " + ((Tree) $t.tree).getLine());
          }
        ;
Main
The main of a source-to-source translator using TokenRewriteStream looks like this:
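    import org.antlr.runtime.ANTLRFileStream;

    public class Translator {
        public static void main(String[] args) throws Exception {
            PascalLexer lexer = new PascalLexer(new ANTLRFileStream(args[0]));
            // MyTokenRewriteStream is assumed to be a TokenRewriteStream subclass that provides toChangedString()
            MyTokenRewriteStream tokens = new MyTokenRewriteStream(lexer);
            PascalParser parser = new PascalParser(tokens);
            parser.tokens = tokens;  // make the stream visible to grammar actions
            parser.program();
            System.out.print(tokens.toChangedString());  // emit after changes
        }
    }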
Performance considerations
TokenRewriteStream is lazy: it applies the rewrites only when toString() is called. However, when an action uses .text (for instance $variable.text), ANTLR generates code that actually calls toString() on the TokenRewriteStream object, so .text returns the rewritten String.
As a result, if parsing triggers many .text accesses together with many rewrites, it becomes extremely slow.
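A workaround is to make .text return the original String. This can be achieved with the following class:
    import org.antlr.runtime.TokenRewriteStream;

    public class EfficientTokenRewriteStream extends TokenRewriteStream {
        public EfficientTokenRewriteStream(PascalLexer lexer) {
            super(lexer);
        }

        // Called by the generated code behind .text: return the original text
        @Override
        public String toString(int start, int end) {
            return toOriginalString(start, end);  // does not rewrite
        }

        // Explicit entry points for the rewritten text
        public String toChangedString(int a, int b) {
            fill();
            return super.toString(a, b);
        }

        public String toChangedString() {
            return toChangedString(MIN_TOKEN_INDEX, size() - 1);
        }
    }
With this class, .text always reflects the original input, and the rewritten output is obtained explicitly through toChangedString().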
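The cost difference can be illustrated with a small, self-contained model. This is only a sketch under a simplifying assumption: each rendering of rewritten text has to walk every edit queued so far. LazyRewriter, EfficientRewriter, and the editsVisited counter are hypothetical names for illustration and do not mirror ANTLR's internals. The simulated "parse" reads the text once and queues one edit per token, as an action mixing .text and rewrites might.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

final class RewriteCostDemo {
    /** One queued edit: replace the token at 'index' with 'text'. */
    static final class Edit {
        final int index;
        final String text;
        Edit(int index, String text) { this.index = index; this.text = text; }
    }

    /** Toy lazy rewriter (hypothetical): every rendering walks all queued edits. */
    static class LazyRewriter {
        final String[] tokens;
        final List<Edit> edits = new ArrayList<>();
        long editsVisited = 0;  // crude cost counter

        LazyRewriter(String[] tokens) { this.tokens = tokens.clone(); }

        void replace(int index, String text) { edits.add(new Edit(index, text)); }

        /** Original text of tokens from..to (inclusive); ignores edits. */
        String original(int from, int to) {
            return String.join("", Arrays.copyOfRange(tokens, from, to + 1));
        }

        /** Rewritten text of tokens from..to (inclusive); replays every edit. */
        String rendered(int from, int to) {
            String[] copy = tokens.clone();
            for (Edit e : edits) {
                editsVisited++;
                copy[e.index] = e.text;
            }
            return String.join("", Arrays.copyOfRange(copy, from, to + 1));
        }

        /** Stand-in for a .text lookup; by default it returns rewritten text. */
        String text(int from, int to) { return rendered(from, to); }
    }

    /** Same idea as the workaround: text lookups return the original text. */
    static class EfficientRewriter extends LazyRewriter {
        EfficientRewriter(String[] tokens) { super(tokens); }

        @Override
        String text(int from, int to) { return original(from, to); }
    }

    /** Per token: one text lookup, then one queued edit; finally one full rendering. */
    static long simulate(LazyRewriter r) {
        for (int i = 0; i < r.tokens.length; i++) {
            r.text(i, i);
            r.replace(i, "X");
        }
        r.rendered(0, r.tokens.length - 1);
        return r.editsVisited;
    }

    public static void main(String[] args) {
        String[] input = new String[1000];
        Arrays.fill(input, "a");
        System.out.println(simulate(new LazyRewriter(input)));       // 500500
        System.out.println(simulate(new EfficientRewriter(input)));  // 1000
    }
}
```

In this model the default variant grows quadratically with the number of tokens, while the variant whose text lookups return the original text pays for the edits only once, at the final rendering.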