Basics
TokenRewriteStream is powerful and its usage is very regular: a grammar uses a single TokenRewriteStream object (say tokens), and grammar actions can call four methods on tokens:
* delete deletes one token or a range of tokens
* replace replaces one token or a range of tokens with a string
* insertAfter inserts a string after a token
* insertBefore inserts a string before a token
For instance, one can write:
programHeading
    : PROGRAM identifier LPAREN identifierList RPAREN SEMI
      {
        programName = $identifier.text;
        tokens.replace($PROGRAM,"class"); // replaces PROGRAM by "class"
        tokens.delete($LPAREN,$RPAREN);   // deletes everything from LPAREN through RPAREN, inclusive (incl. identifierList, which can be long)
      }
    ;
Or:
ifStatement
    : i=IF expression t=THEN s=statement
      {
        tokens.insertBefore($expression.start," ("); // refers to the start token of expression
        tokens.insertAfter($expression.stop,") ");   // refers to the stop token of expression
        tokens.insertAfter($t,"\n");                 // refers to t, which is equivalent to THEN (t=THEN)
      }
    ;
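To make the semantics of the four operations concrete, here is a minimal, self-contained Java sketch of a rewritable token buffer. It is a toy model, not ANTLR's implementation: the class RewriteSketch, its integer token indexes, and its per-index bookkeeping are assumptions made for illustration (the real methods also accept Token objects, as in the actions above). The sketch records edits without touching the original tokens and applies them only when the changed text is requested.

```java
import java.util.Arrays;

/** Toy model of a rewritable token buffer (hypothetical, not ANTLR code). */
class RewriteSketch {
    private final String[] original; // token texts, never modified
    private final String[] before;   // text emitted before token i
    private final String[] after;    // text emitted after token i
    private final String[] replaced; // non-null: emitted instead of token i ("" = deleted)

    public RewriteSketch(String... tokenTexts) {
        original = tokenTexts.clone();
        before = new String[original.length];
        after = new String[original.length];
        replaced = new String[original.length];
        Arrays.fill(before, "");
        Arrays.fill(after, "");
    }

    /** Replaces tokens from..to (inclusive) with a single string. */
    public void replace(int from, int to, String text) {
        replaced[from] = text;
        for (int i = from + 1; i <= to; i++) {
            replaced[i] = "";
        }
    }

    public void replace(int index, String text) { replace(index, index, text); }

    /** Deleting is modelled as replacing with the empty string. */
    public void delete(int from, int to) { replace(from, to, ""); }

    public void insertBefore(int index, String text) { before[index] = text + before[index]; }

    public void insertAfter(int index, String text) { after[index] = after[index] + text; }

    /** The input text, ignoring all recorded edits. */
    public String toOriginalString() { return String.join("", original); }

    /** The input text with all recorded edits applied. */
    public String toChangedString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < original.length; i++) {
            sb.append(before[i]);
            sb.append(replaced[i] != null ? replaced[i] : original[i]);
            sb.append(after[i]);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        RewriteSketch tokens =
            new RewriteSketch("program", " ", "Hello", "(", "output", ")", ";");
        tokens.replace(0, "class"); // like tokens.replace($PROGRAM,"class")
        tokens.delete(3, 5);        // like tokens.delete($LPAREN,$RPAREN)
        System.out.println(tokens.toOriginalString());
        System.out.println(tokens.toChangedString());
    }
}
```

Running main prints the original text followed by the rewritten text, here `program Hello(output);` and then `class Hello;`.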
The parameters of those methods are generally references to the rule elements (or their labels). For example, consider the rule ifStatement : i=IF expression:
* $IF and $i refer to a Token object
* $expression.start refers to the first Token object of the expression
* $expression.stop refers to the last Token object of the expression
* $expression.text refers to the text of the expression (a String)
* $expression.tree refers to the Tree of the expression (typed as Object, so it must be cast, e.g. (Tree) $t.tree)
Other examples using text and tree:
assignmentStatement
    : variable ASSIGN expression
      {
        if ($variable.text.equals("foo")) { .... }
      }
    ;
parameterGroup
    : identifierList COLON t=typeIdentifier
      {
        System.err.println("Warning: "+$t.text+", line "+((Tree)$t.tree).getLine());
      }
    ;
Main
The main of a source-to-source translator using TokenRewriteStream looks like this:
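import org.antlr.runtime.ANTLRFileStream;

public class Translator {
    public static void main(String[] args) throws Exception {
        PascalLexer lexer = new PascalLexer(new ANTLRFileStream(args[0]));
        // MyTokenRewriteStream is a TokenRewriteStream subclass that provides toChangedString(),
        // e.g. EfficientTokenRewriteStream from the section Performance considerations
        MyTokenRewriteStream tokens = new MyTokenRewriteStream(lexer);
        PascalParser parser = new PascalParser(tokens);
        parser.tokens = tokens; // the stream used by the grammar actions
        parser.program();
        System.out.print(tokens.toChangedString()); // emit after changes
    }
}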
Performance considerations
TokenRewriteStream is lazy: it applies the rewrites only when toString() is called. However, when one uses .text (for instance in $variable.text), ANTLR generates code that calls toString() on the TokenRewriteStream object, so .text returns the rewritten String.
As a result, if parsing triggers many .text accesses and many rewrites, it becomes extremely slow.
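The cost can be illustrated with a toy model. It assumes, as a simplification and not as a description of ANTLR internals, that every request for rewritten text replays all operations recorded so far. The class LazyRewriteCost and its counter opsReplayed are hypothetical names introduced for this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Toy cost model for lazily applied rewrites (hypothetical, not ANTLR code). */
class LazyRewriteCost {
    private final String[] original;
    private final List<Integer> opIndexes = new ArrayList<>();
    private final List<String> opTexts = new ArrayList<>();
    public long opsReplayed = 0; // operations replayed across all changedText calls

    public LazyRewriteCost(String... tokenTexts) {
        original = tokenTexts.clone();
    }

    /** Records a replacement; nothing is applied yet. */
    public void replace(int index, String text) {
        opIndexes.add(index);
        opTexts.add(text);
    }

    /** Rewritten text of tokens start..end: replays every recorded operation. */
    public String changedText(int start, int end) {
        String[] view = original.clone();
        for (int i = 0; i < opIndexes.size(); i++) {
            opsReplayed++;
            view[opIndexes.get(i)] = opTexts.get(i);
        }
        return String.join("", Arrays.asList(view).subList(start, end + 1));
    }

    /** Original text of tokens start..end: no operation is replayed. */
    public String originalText(int start, int end) {
        return String.join("", Arrays.asList(original).subList(start, end + 1));
    }

    public static void main(String[] args) {
        int n = 1000;
        String[] toks = new String[n];
        Arrays.fill(toks, "x");
        LazyRewriteCost stream = new LazyRewriteCost(toks);
        for (int i = 0; i < n; i++) {
            stream.changedText(i, i); // models a .text access in a rule action
            stream.replace(i, "y");   // models a rewrite in the same action
        }
        System.out.println(stream.opsReplayed);
    }
}
```

With 1000 tokens, each action reading one token's text and recording one rewrite, the loop replays 0 + 1 + ... + 999 = 499500 operations, so the work grows quadratically with the number of actions. Reading the original text instead replays none.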
A workaround is to make .text return the original String. This can be achieved with the following class:
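import org.antlr.runtime.TokenRewriteStream;

public class EfficientTokenRewriteStream extends TokenRewriteStream {
    public EfficientTokenRewriteStream(PascalLexer lexer) {
        super(lexer);
    }

    // Used by the generated code behind .text: returns the original text
    @Override
    public String toString(int start, int end) {
        return toOriginalString(start, end); // does not rewrite
    }

    // Rewritten text of tokens a..b
    public String toChangedString(int a, int b) {
        fill(); // make sure all tokens are buffered
        return super.toString(a, b); // applies the recorded rewrites
    }

    // Rewritten text of the whole stream
    public String toChangedString() {
        return toChangedString(MIN_TOKEN_INDEX, size()-1);
    }
}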