ANTLR: Heterogeneous AST and imaginary tokens

It's my first question here :)
I'd like to build a heterogeneous AST with ANTLR for a simple grammar. There are different interfaces to represent the AST nodes, e.g. IInfixExp, IVariableDecl. ANTLR comes with CommonTree to hold all the information from the source code (line number, character position, etc.), and I want to use this as a base for the implementations of the AST interfaces IInfixExp, ...
In order to get an AST as output with CommonTree as node types, I set:
options {
language = Java;
k = 1;
output = AST;
ASTLabelType = CommonTree;
}
The IInfixExp interface is:
package toylanguage;
public interface IInfixExp extends IExpression {
public enum Operator {
PLUS, MINUS, TIMES, DIVIDE;
}
public Operator getOperator();
public IExpression getLeftHandSide();
public IExpression getRightHandSide();
}
and the implementation InfixExp is:
package toylanguage;
import org.antlr.runtime.Token;
import org.antlr.runtime.tree.CommonTree;
// IInitializable has only void initialize()
public class InfixExp extends CommonTree implements IInfixExp, IInitializable {
private Operator operator;
private IExpression leftHandSide;
private IExpression rightHandSide;
InfixExp(Token token) {
super(token);
}
@Override
public Operator getOperator() {
return operator;
}
@Override
public IExpression getLeftHandSide() {
return leftHandSide;
}
@Override
public IExpression getRightHandSide() {
return rightHandSide;
}
// from IInitializable; gets called from ToyTreeAdaptor.rulePostProcessing
@Override
public void initialize() {
// term ((PLUS|MINUS) term)*
// atom ((TIMES|DIVIDE) atom)*
// exact 2 children
assert getChildCount() == 2;
// left and right child are IExpressions
assert getChild(0) instanceof IExpression
&& getChild(1) instanceof IExpression;
// operator
switch (token.getType()) {
case ToyLanguageParser.PLUS:
operator = Operator.PLUS;
break;
case ToyLanguageParser.MINUS:
operator = Operator.MINUS;
break;
case ToyLanguageParser.TIMES:
operator = Operator.TIMES;
break;
case ToyLanguageParser.DIVIDE:
operator = Operator.DIVIDE;
break;
default:
assert false;
}
// left and right operands
leftHandSide = (IExpression) getChild(0);
rightHandSide = (IExpression) getChild(1);
}
}
The corresponding rules are:
exp // e.g. a+b
: term ((PLUS<InfixExp>^|MINUS<InfixExp>^) term)*
;
term // e.g. a*b
: atom ((TIMES<InfixExp>^|DIVIDE<InfixExp>^) atom)*
;
This works fine, because PLUS, MINUS, etc. are "real" tokens.
But now comes the imaginary token:
tokens {
PROGRAM;
}
The corresponding rule is:
program // e.g. var a, b; a + b
: varDecl* exp
-> ^(PROGRAM<Program> varDecl* exp)
;
With this, ANTLR doesn't create a tree with PROGRAM as the root node.
In the parser, the following code creates the Program instance:
root_1 = (CommonTree)adaptor.becomeRoot(new Program(PROGRAM), root_1);
Unlike with InfixExp, it is not the Program(Token) constructor but Program(int) that is invoked.
Program is:
package toylanguage;
import java.util.Collections;
import java.util.LinkedList;
import java.util.List;
import org.antlr.runtime.Token;
import org.antlr.runtime.tree.CommonTree;
class Program extends CommonTree implements IProgram, IInitializable {
private final LinkedList<IVariableDecl> variableDeclarations = new LinkedList<IVariableDecl>();
private IExpression expression = null;
Program(Token token) {
super(token);
}
public Program(int tokenType) {
// What to do?
super();
}
@Override
public List<IVariableDecl> getVariableDeclarations() {
// don't allow to change the list
return Collections.unmodifiableList(variableDeclarations);
}
@Override
public IExpression getExpression() {
return expression;
}
@Override
public void initialize() {
// program: varDecl* exp;
// at least one child
assert getChildCount() > 0;
// the last one is a IExpression
assert getChild(getChildCount() - 1) instanceof IExpression;
// iterate over varDecl*
int i = 0;
while (getChild(i) instanceof IVariableDecl) {
variableDeclarations.add((IVariableDecl) getChild(i));
i++;
}
// exp
expression = (IExpression) getChild(i);
}
}
You can see the constructor:
public Program(int tokenType) {
// What to do?
super();
}
As a result of the super() call, a CommonTree is built without a token, so CommonTreeAdaptor.rulePostProcessing sees a flat list, not a tree with a token as its root.
My TreeAdaptor looks like:
package toylanguage;
import org.antlr.runtime.tree.CommonTreeAdaptor;
public class ToyTreeAdaptor extends CommonTreeAdaptor {
public Object rulePostProcessing(Object root) {
Object result = super.rulePostProcessing(root);
// check if needs initialising
if (result instanceof IInitializable) {
IInitializable initializable = (IInitializable) result;
initializable.initialize();
}
return result;
};
}
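The adaptor above implements a post-construction initialization pattern: ANTLR attaches the children first, and only then does rulePostProcessing give the node a chance to pull typed data out of them. A stand-alone illustration of that pattern (plain Java with a hypothetical Node class, not the ANTLR one):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for CommonTree and the adaptor hook: children are
// attached first, then a post-processing step lets the node build its typed view.
interface IInitializable { void initialize(); }

class Node {
    final String text;
    final List<Node> children = new ArrayList<>();
    Node(String text) { this.text = text; }
}

class SumNode extends Node implements IInitializable {
    int left, right;               // typed view, filled in late
    SumNode() { super("+"); }
    public void initialize() {     // called once the subtree is complete
        left = Integer.parseInt(children.get(0).text);
        right = Integer.parseInt(children.get(1).text);
    }
}

public class AdaptorSketch {
    // plays the role of ToyTreeAdaptor.rulePostProcessing
    static Node postProcess(Node root) {
        if (root instanceof IInitializable) ((IInitializable) root).initialize();
        return root;
    }
    public static void main(String[] args) {
        SumNode sum = new SumNode();
        sum.children.add(new Node("2"));
        sum.children.add(new Node("3"));
        postProcess(sum);
        System.out.println(sum.left + sum.right); // 5
    }
}
```

This is why initialize() can safely assert on getChildCount() in the real InfixExp: by the time the hook runs, the subtree is complete.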
And to test anything I use:
package toylanguage;
import org.antlr.runtime.ANTLRStringStream;
import org.antlr.runtime.CommonTokenStream;
import org.antlr.runtime.RecognitionException;
import org.antlr.runtime.TokenStream;
import org.antlr.runtime.tree.CommonTree;
import toylanguage.ToyLanguageParser.program_return;
public class Processor {
public static void main(String[] args) {
String input = "var a, b; a + b + 123"; // sample input
ANTLRStringStream stream = new ANTLRStringStream(input);
ToyLanguageLexer lexer = new ToyLanguageLexer(stream);
TokenStream tokens = new CommonTokenStream(lexer);
ToyLanguageParser parser = new ToyLanguageParser(tokens);
ToyTreeAdaptor treeAdaptor = new ToyTreeAdaptor();
parser.setTreeAdaptor(treeAdaptor);
try {
// test with: var a, b; a + b
program_return program = parser.program();
CommonTree root = program.tree;
// prints 'a b (+ a b)'
System.out.println(root.toStringTree());
// get (+ a b), the third child of root
CommonTree third = (CommonTree) root.getChild(2);
// prints '(+ a b)'
System.out.println(third.toStringTree());
// prints 'true'
System.out.println(third instanceof IInfixExp);
// prints 'false'
System.out.println(root instanceof IProgram);
} catch (RecognitionException e) {
e.printStackTrace();
}
}
}
For completeness, here is the full grammar:
grammar ToyLanguage;
options {
language = Java;
k = 1;
output = AST;
ASTLabelType = CommonTree;
}
tokens {
PROGRAM;
}
@header {
package toylanguage;
}
@lexer::header {
package toylanguage;
}
program // e.g. var a, b; a + b
: varDecl* exp
-> ^(PROGRAM<Program> varDecl* exp)
;
varDecl // e.g. var a, b;
: 'var'! ID<VariableDecl> (','! ID<VariableDecl>)* ';'!
;
exp // e.g. a+b
: term ((PLUS<InfixExp>^|MINUS<InfixExp>^) term)*
;
term // e.g. a*b
: atom ((TIMES<InfixExp>^|DIVIDE<InfixExp>^) atom)*
;
atom
: INT<IntegerLiteralExp> // e.g. 123
| ID<VariableExp> // e.g. a
| '(' exp ')' -> exp // e.g. (a+b)
;
INT : ('0'..'9')+ ;
ID : ('a'..'z')+ ;
PLUS : '+' ;
MINUS : '-' ;
TIMES : '*' ;
DIVIDE : '/' ;
WS : ('\t' | '\n' | '\r' | ' ')+ { $channel = HIDDEN; } ;
OK, the final question is how to get from
program // e.g. var a, b; a + b
: varDecl* exp
-> ^(PROGRAM<Program> varDecl* exp)
;
a tree with PROGRAM as root
^(PROGRAM varDecl* exp)
and not a flat list with
(varDecl* exp) ?
(Sorry for the numerous code fragments.)
Ciao Vertex

Try creating the following constructor, which wraps the bare token type in a CommonToken (from org.antlr.runtime) so that the node actually carries a token:
public Program(int tokenType) {
super(new CommonToken(tokenType, "PROGRAM"));
}

Related

ANTLR: help needed with custom interpreter

I have some problems with "gathering" the data. It seems my code knows how to visit each tree and calculate values correctly, but something is missing, as the result is always null. Help is appreciated.
.g4 file:
grammar MyParser;
ID: [a-z]+ ;
Operator_I: '&';
Operator_U: '|';
EQ: '=';
expr: '(' expr ')' #wParExpr
| expr op=(Operator_I | Operator_U) expr #wExpr
| ID #wID
;
corr: 'C=' (expr)* #Result;
.java code
import org.antlr.v4.runtime.misc.NotNull;
import org.antlr.v4.runtime.tree.TerminalNode;
import java.util.*;
public class EvalVisitor extends MyParserBaseVisitor<Value> {
@Override
public Value visitWParExpr(MyParserParser.WParExprContext ctx) {
// TODO Auto-generated method stub
System.out.println("visitWParExpr:"+ctx.getText());
return visitChildren(ctx);
}
@Override
public Value visitWID(MyParserParser.WIDContext ctx) {
// TODO Auto-generated method stub
System.out.println("id:"+ctx.getText());
String id=ctx.getText();
Value v = getMemory().get(id);
l("ret id:"+v.value);
return v;
}
@Override
public Value visitWExpr(MyParserParser.WExprContext ctx) {
String left=ctx.expr(0).getText();
String right=ctx.expr(1).getText();
if (ctx.getText().contains("(")) {
return visitChildren(ctx);
}
l("visitWExpr, left="+left+",right:"+right);
String op=ctx.op.getText();
Value lefti = visit(ctx.expr(0));
Value righti = visit(ctx.expr(1));
Value v= null;
if (op.equals("&")) {
v=new Value(lefti.asDouble()+righti.asDouble());
} else if (op.equals("|")) {
v=new Value(lefti.asDouble()+righti.asDouble());
}
l("return:"+v.asString());
return v;
}
public void l(String log) {
System.out.println(log);
}
@Override public Value visitResult(MyParserParser.ResultContext ctx) {
String s1 = ctx.getText();
String s2 = ctx.expr(0).getText();
l("VisitResult:"+s1+", expr="+s2);
Value v = visit(ctx.expr(0));
l("VisitResult end:"+v);
memory.put("result",v);
return v;
}
private Map<String, Value> memory = new HashMap<String, Value>();
public EvalVisitor() {
getMemory().put("a", new Value(new Double(5)));
getMemory().put("b", new Value(new Double(9)));
getMemory().put("c", new Value(new Double(12)));
}
public Map<String, Value> getMemory() {
return memory;
}
public void setValues(Map<String, Value> values) {
this.memory = values;
}
}
public static void main(String[] args) throws Exception {
MyParserLexer lexer = new MyParserLexer(new ANTLRFileStream("c:\\test.mu"));
MyParserParser parser = new MyParserParser(new CommonTokenStream(lexer));
ParseTree tree = parser.corr();
EvalVisitor visitor = new EvalVisitor();
visitor.visit(tree);
System.out.println(visitor.getMemory());
}
output:
VisitResult:C=(a&b)&(b&c)|(c&c), expr=(a&b)&(b&c)|(c&c)
visitWParExpr:(a&b)
visitWExpr, left=a,right:b
id:a
ret id:5.0
id:b
ret id:9.0
return:14.0
visitWParExpr:(b&c)
visitWExpr, left=b,right:c
id:b
ret id:9.0
id:c
ret id:12.0
return:21.0
visitWParExpr:(c&c)
visitWExpr, left=c,right:c
id:c
ret id:12.0
id:c
ret id:12.0
return:24.0
VisitResult end:null
{result=null, a=5.0, b=9.0, c=12.0}
EDIT
I think I got it working. The reason was that I did not do the evaluations in the wExpr context. I hadn't understood that I need to evaluate the subtrees there (when the context contains "(") unless only an ID is available, which resolves directly to a double value.
Can the same be achieved in a more efficient manner, or the same functionality implemented some other way? For example, is it possible to do this if I had only one expr() function that gets all the expressions?
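As a side note, the contains("(") branching is not needed at all: in a recursive visitor, visit(ctx.expr(i)) already evaluates any parenthesized subexpression, because parentheses only shape the tree. A stand-alone sketch of that recursion (plain Java with hypothetical Node/Id/BinOp stand-ins for the generated contexts, not the ANTLR classes themselves):

```java
import java.util.Map;

// Hypothetical stand-ins for the generated parse-tree contexts: recursion
// alone handles nesting, so no check for "(" is needed anywhere.
interface Node { double eval(Map<String, Double> memory); }

class Id implements Node {
    final String name;
    Id(String name) { this.name = name; }
    public double eval(Map<String, Double> m) { return m.get(name); }
}

class BinOp implements Node {
    final char op; final Node left, right;
    BinOp(char op, Node left, Node right) { this.op = op; this.left = left; this.right = right; }
    public double eval(Map<String, Double> m) {
        double l = left.eval(m), r = right.eval(m); // recurses through any depth
        return op == '&' ? l + r : l * r;           // same semantics as the visitor
    }
}

public class EvalSketch {
    public static void main(String[] args) {
        Map<String, Double> memory = Map.of("a", 5.0, "b", 9.0, "c", 12.0);
        // (a&b)&(b&c): the parentheses exist only as tree structure
        Node tree = new BinOp('&',
                new BinOp('&', new Id("a"), new Id("b")),
                new BinOp('&', new Id("b"), new Id("c")));
        System.out.println(tree.eval(memory)); // (5+9)+(9+12) = 35.0
    }
}
```

The same shape carries over to the real visitor: visitWExpr can unconditionally visit both subexpressions and combine them, and visitWParExpr reduces to return visit(ctx.expr()).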
Here's revised code if someone is interested:
import org.antlr.v4.runtime.misc.NotNull;
import org.antlr.v4.runtime.tree.TerminalNode;
import java.util.*;
public class EvalVisitor extends MyParserBaseVisitor<Value> {
@Override
public Value visitWParExpr(MyParserParser.WParExprContext ctx) {
// TODO Auto-generated method stub
System.out.println("visitWParExpr:"+ctx.getText()+", expr:"+ctx.expr().getText());
String expr= ctx.expr().getText();
Value v = visit(ctx.expr());
l("WParExpr:"+v);
memory.put(expr, v);
return v;
}
@Override
public Value visitWID(MyParserParser.WIDContext ctx) {
// TODO Auto-generated method stub
System.out.println("VisitWID:"+ctx.getText());
String id=ctx.getText();
Value v = getMemory().get(id);
l("ret id:"+v.value);
return v;
}
@Override
public Value visitWExpr(MyParserParser.WExprContext ctx) {
String left=ctx.expr(0).getText();
String right=ctx.expr(1).getText();
l("visitWExpr:"+ctx.getText()+", left="+left+",right:"+right);
Value v=null;
Value lv=null;
Value rv=null;
if (ctx.getText().contains("(")) {
lv = visit(ctx.expr(0));
rv = visit(ctx.expr(1));
} else {
lv = visit(ctx.expr(0));
rv = visit(ctx.expr(1));
}
String op=ctx.op.getText();
l("lv="+lv+",rv="+rv+",op="+op);
if (op.equals("&")) {
v=new Value(lv.asDouble()+rv.asDouble());
} else if (op.equals("|")) {
v=new Value(lv.asDouble()*rv.asDouble());
}
l("return:"+v.asString());
return v;
}
public void l(String log) {
System.out.println(log);
}
@Override public Value visitResult(MyParserParser.ResultContext ctx) {
String s1 = ctx.getText();
String s2 = ctx.expr().getText();
l("VisitResult:"+s1+", expr="+s2);
Value v = visit(ctx.expr());
l("VisitResult end:"+v);
memory.put("result",v);
return v;
}
private Map<String, Value> memory = new HashMap<String, Value>();
public EvalVisitor() {
getMemory().put("a", new Value(new Double(5)));
getMemory().put("b", new Value(new Double(9)));
getMemory().put("c", new Value(new Double(12)));
}
public Map<String, Value> getMemory() {
return memory;
}
public void setValues(Map<String, Value> values) {
this.memory = values;
}
}
Sample output for the string:
C=((a&b)&(b&c))|(c&c)|((b&a)&(c&b))|(c|c)
VisitResult:C=((a&b)&(b&c))|(c&c)|((b&a)&(c&b))|(c|c), expr=((a&b)&(b&c))|(c&c)|((b&a)&(c&b))|(c|c)
visitWExpr:((a&b)&(b&c))|(c&c)|((b&a)&(c&b))|(c|c), left=((a&b)&(b&c))|(c&c)|((b&a)&(c&b)),right:(c|c)
visitWExpr:((a&b)&(b&c))|(c&c)|((b&a)&(c&b)), left=((a&b)&(b&c))|(c&c),right:((b&a)&(c&b))
visitWExpr:((a&b)&(b&c))|(c&c), left=((a&b)&(b&c)),right:(c&c)
visitWParExpr:((a&b)&(b&c)), expr:(a&b)&(b&c)
visitWExpr:(a&b)&(b&c), left=(a&b),right:(b&c)
visitWParExpr:(a&b), expr:a&b
visitWExpr:a&b, left=a,right:b
VisitWID:a
ret id:5.0
VisitWID:b
ret id:9.0
lv=5.0,rv=9.0,op=&
return:14.0
WParExpr:14.0
visitWParExpr:(b&c), expr:b&c
visitWExpr:b&c, left=b,right:c
VisitWID:b
ret id:9.0
VisitWID:c
ret id:12.0
lv=9.0,rv=12.0,op=&
return:21.0
WParExpr:21.0
lv=14.0,rv=21.0,op=&
return:35.0
WParExpr:35.0
visitWParExpr:(c&c), expr:c&c
visitWExpr:c&c, left=c,right:c
VisitWID:c
ret id:12.0
VisitWID:c
ret id:12.0
lv=12.0,rv=12.0,op=&
return:24.0
WParExpr:24.0
lv=35.0,rv=24.0,op=|
return:840.0
visitWParExpr:((b&a)&(c&b)), expr:(b&a)&(c&b)
visitWExpr:(b&a)&(c&b), left=(b&a),right:(c&b)
visitWParExpr:(b&a), expr:b&a
visitWExpr:b&a, left=b,right:a
VisitWID:b
ret id:9.0
VisitWID:a
ret id:5.0
lv=9.0,rv=5.0,op=&
return:14.0
WParExpr:14.0
visitWParExpr:(c&b), expr:c&b
visitWExpr:c&b, left=c,right:b
VisitWID:c
ret id:12.0
VisitWID:b
ret id:9.0
lv=12.0,rv=9.0,op=&
return:21.0
WParExpr:21.0
lv=14.0,rv=21.0,op=&
return:35.0
WParExpr:35.0
lv=840.0,rv=35.0,op=|
return:29400.0
visitWParExpr:(c|c), expr:c|c
visitWExpr:c|c, left=c,right:c
VisitWID:c
ret id:12.0
VisitWID:c
ret id:12.0
lv=12.0,rv=12.0,op=|
return:144.0
WParExpr:144.0
lv=29400.0,rv=144.0,op=|
return:4233600.0
VisitResult end:4233600.0
{result=4233600.0, a=5.0, c&c=24.0, b=9.0, c=12.0, (a&b)&(b&c)=35.0, c|c=144.0, a&b=14.0, b&a=14.0, (b&a)&(c&b)=35.0, b&c=21.0, c&b=21.0}

hive querying records for a specific uniontype

I have a sample hive table created as
CREATE TABLE union_test(foo UNIONTYPE<int, double, array<string>, struct<a:int,b:string>>);
The data can be viewed as
SELECT foo FROM union_test;
The output is
{0:1}
{1:2.0}
{2:["three","four"]}
{3:{"a":5,"b":"five"}}
{2:["six","seven"]}
{3:{"a":8,"b":"eight"}}
{0:9}
{1:10.0}
The first field (the tag) denotes the type of the union (0 for int, 1 for double, 2 for array, etc.).
My problem is: if I want to select only those records where the union tag is 2 (array), how should I frame my query?
There is no built-in function in Hive to read data from a UNIONTYPE, so I wrote two UDFs: one to get the union tag (what you are trying to do) and a second to get a struct from a union, as an example.
get_union_tag() function:
package HiveUDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.UnionObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.primitive.PrimitiveObjectInspectorFactory;
@Description(name = "get_union_tag", value = "_FUNC_(unionObject)"
+ " - Returns union object Tag", extended = "Example:\n" + " > SELECT _FUNC_(unionObject) FROM src LIMIT 1;\n one")
public class GetUnionTag extends GenericUDF {
// Global variables that inspect the input.
// These are set up during the initialize() call, and are then used during the
// calls to evaluate()
private transient UnionObjectInspector uoi;
@Override
// This is what we do in the initialize() method:
// Verify that the input is of the type expected
// Set up the ObjectInspectors for the input in global variables
// Return the ObjectInspector for the output
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
// Verify the input is of the required type.
// Set the global variables (the various ObjectInspectors) while we're doing this
// Exactly one input argument
if( arguments.length != 1 ){
throw new UDFArgumentLengthException("_FUNC_(unionObject) accepts exactly one argument.");
}
// Is the input a union<>?
if( arguments[0].getCategory() != ObjectInspector.Category.UNION ){
throw new UDFArgumentTypeException(0,"The single argument to AddExternalIdToPurchaseDetails should be "
+ "Union<>"
+ " but " + arguments[0].getTypeName() + " is found");
}
// Store the ObjectInspectors for use later in the evaluate() method
uoi = ((UnionObjectInspector)arguments[0]);
// Set up the object inspector for the output, and return it
return PrimitiveObjectInspectorFactory.javaByteObjectInspector;
}
@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
byte tag = uoi.getTag(arguments[0].get());
return tag;
}
@Override
public String getDisplayString(String[] children) {
StringBuilder sb = new StringBuilder();
sb.append("get_union_tag(");
for (int i = 0; i < children.length; i++) {
if (i > 0) {
sb.append(',');
}
sb.append(children[i]);
}
sb.append(')');
return sb.toString();
}
}
The get_union_struct() UDF:
package HiveUDF;
import org.apache.hadoop.hive.ql.exec.Description;
import org.apache.hadoop.hive.ql.exec.UDFArgumentException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentLengthException;
import org.apache.hadoop.hive.ql.exec.UDFArgumentTypeException;
import org.apache.hadoop.hive.ql.metadata.HiveException;
import org.apache.hadoop.hive.ql.udf.generic.GenericUDF;
import org.apache.hadoop.hive.serde2.objectinspector.ObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.StructObjectInspector;
import org.apache.hadoop.hive.serde2.objectinspector.UnionObjectInspector;
@Description(name = "get_union_struct", value = "_FUNC_(unionObject)"
+ " - Returns struct ", extended = "Example:\n" + " > _FUNC_(unionObject).value \n 90.0121")
public class GetUnionStruct extends GenericUDF {
// Global variables that inspect the input.
// These are set up during the initialize() call, and are then used during the
// calls to evaluate()
//
// ObjectInspector for the list (input array<>)
// ObjectInspector for the struct<>
// ObjectInspectors for the elements of the struct<>, target, quantity and price
private UnionObjectInspector unionObjectInspector;
private StructObjectInspector structObjectInspector;
@Override
// This is what we do in the initialize() method:
// Verify that the input is of the type expected
// Set up the ObjectInspectors for the input in global variables
// Return the ObjectInspector for the output
public ObjectInspector initialize(ObjectInspector[] arguments) throws UDFArgumentException {
// Verify the input is of the required type.
// Set the global variables (the various ObjectInspectors) while we're doing this
// Exactly one input argument
if( arguments.length != 1 ){
throw new UDFArgumentLengthException("_FUNC_(unionObject) accepts exactly one argument.");
}
// Is the input a union<>?
if( arguments[0].getCategory() != ObjectInspector.Category.UNION ){
throw new UDFArgumentTypeException(0,"The single argument to AddExternalIdToPurchaseDetails should be "
+ "Union<Struct>"
+ " but " + arguments[0].getTypeName() + " is found");
}
// Store the union's ObjectInspector and locate its struct alternative
// (as posted, these fields were never assigned, so evaluate() would fail)
unionObjectInspector = (UnionObjectInspector) arguments[0];
for (ObjectInspector oi : unionObjectInspector.getObjectInspectors()) {
if (oi.getCategory() == ObjectInspector.Category.STRUCT) {
structObjectInspector = (StructObjectInspector) oi;
}
}
// Set up the object inspector for the output, and return it
return structObjectInspector;
}
@Override
public Object evaluate(DeferredObject[] arguments) throws HiveException {
return unionObjectInspector.getField(arguments[0].get());
}
@Override
public String getDisplayString(String[] children) {
StringBuilder sb = new StringBuilder();
sb.append("get_union_vqtstruct(");
for (int i = 0; i < children.length; i++) {
if (i > 0) {
sb.append(',');
}
sb.append(children[i]);
}
sb.append(')');
return sb.toString();
}
}
To use these UDFs, compile them and create a jar file. Then upload it into Hive (in my case HDInsight). Then just use
add jar wasb:///hive/HiveGUDF.jar;
CREATE TEMPORARY FUNCTION get_union_tag AS 'HiveUDF.GetUnionTag';
CREATE TEMPORARY FUNCTION get_union_struct AS 'HiveUDF.GetUnionStruct';
before you run, e.g.:
SELECT get_union_tag(exposed) FROM test;
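With the UDF registered, the original question (select only the rows whose union holds the tag-2 array alternative) could then be framed along these lines, against the union_test table from the question:

```sql
-- hypothetical usage: keep only rows where the union holds the array<string> alternative
SELECT foo
FROM union_test
WHERE get_union_tag(foo) = 2;
```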

Create a Gson TypeAdapter for a Guava Range

I am trying to serialize Guava Range objects to JSON using Gson, however the default serialization fails, and I'm unsure how to correctly implement a TypeAdapter for this generic type.
Gson gson = new Gson();
Range<Integer> range = Range.closed(10, 20);
String json = gson.toJson(range);
System.out.println(json);
Range<Integer> range2 = gson.fromJson(json,
new TypeToken<Range<Integer>>(){}.getType());
System.out.println(range2);
assertEquals(range2, range);
This fails like so:
{"lowerBound":{"endpoint":10},"upperBound":{"endpoint":20}}
PASSED: typeTokenInterface
FAILED: range
java.lang.RuntimeException: Unable to invoke no-args constructor for
com.google.common.collect.Cut<java.lang.Integer>. Register an
InstanceCreator with Gson for this type may fix this problem.
at com.google.gson.internal.ConstructorConstructor$12.construct(
ConstructorConstructor.java:210)
...
Note that the default serialization actually loses information: it fails to report whether the endpoints are open or closed. I would prefer to see it serialized similar to its toString(), e.g. [10‥20]; however, simply calling toString() won't work with generic Range instances, as the elements of the range may not be primitives (Joda-Time LocalDate instances, for example). For the same reason, implementing a custom TypeAdapter seems difficult, as we don't know how to deserialize the endpoints.
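The `[10‥20]`-style text form is easy to produce and re-split with plain string handling, independent of Gson; a minimal stdlib-only sketch (hypothetical helper, endpoints kept as strings so any element type can plug in):

```java
// Hypothetical helper: round-trips the "[10‥20]" notation with plain strings.
public class RangeNotation {
    static String format(String lower, boolean lowerClosed, String upper, boolean upperClosed) {
        return (lowerClosed ? "[" : "(") + lower + "\u2025" + upper + (upperClosed ? "]" : ")");
    }
    // returns {lowerEndpoint, upperEndpoint}; the bound types are
    // recoverable from s.charAt(0) and the last character
    static String[] parse(String s) {
        return s.substring(1, s.length() - 1).split("\u2025");
    }
    public static void main(String[] args) {
        String s = format("10", true, "20", true);
        System.out.println(s);                  // [10‥20]
        String[] e = parse(s);
        System.out.println(e[0] + ".." + e[1]); // 10..20
    }
}
```

The hard part the question is really about is not this formatting but wiring the element (de)serializer into Gson's generic machinery, which the TypeAdapterFactory below attempts.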
I've implemented most of a TypeAdapterFactory based on the template provided for Multimap, which ought to work, but now I'm stuck on the generics. Here's what I have so far:
public class RangeTypeAdapterFactory implements TypeAdapterFactory {
public <T> TypeAdapter<T> create(Gson gson, TypeToken<T> typeToken) {
Type type = typeToken.getType();
if (typeToken.getRawType() != Range.class
|| !(type instanceof ParameterizedType)) {
return null;
}
Type elementType = ((ParameterizedType) type).getActualTypeArguments()[0];
TypeAdapter<?> elementAdapter = (TypeAdapter<?>)gson.getAdapter(TypeToken.get(elementType));
// Bound mismatch: The generic method newRangeAdapter(TypeAdapter<E>) of type
// GsonUtils.RangeTypeAdapterFactory is not applicable for the arguments
// (TypeAdapter<capture#4-of ?>). The inferred type capture#4-of ? is not a valid
// substitute for the bounded parameter <E extends Comparable<?>>
return (TypeAdapter<T>) newRangeAdapter(elementAdapter);
}
private <E extends Comparable<?>> TypeAdapter<Range<E>> newRangeAdapter(final TypeAdapter<E> elementAdapter) {
return new TypeAdapter<Range<E>>() {
@Override
public void write(JsonWriter out, Range<E> value) throws IOException {
if (value == null) {
out.nullValue();
return;
}
String repr = (value.lowerBoundType() == BoundType.CLOSED ? "[" : "(") +
(value.hasLowerBound() ? elementAdapter.toJson(value.lowerEndpoint()) : "-\u221e") +
'\u2025' +
(value.hasUpperBound() ? elementAdapter.toJson(value.upperEndpoint()) : "+\u221e") +
(value.upperBoundType() == BoundType.CLOSED ? "]" : ")");
out.value(repr);
}
@Override
public Range<E> read(JsonReader in) throws IOException {
if (in.peek() == JsonToken.NULL) {
in.nextNull();
return null;
}
String[] endpoints = in.nextString().split("\u2025");
E lower = elementAdapter.fromJson(endpoints[0].substring(1));
E upper = elementAdapter.fromJson(endpoints[1].substring(0,endpoints[1].length()-1));
return Range.range(lower, endpoints[0].charAt(0) == '[' ? BoundType.CLOSED : BoundType.OPEN,
upper, endpoints[1].charAt(endpoints[1].length()-1) == ']' ? BoundType.CLOSED : BoundType.OPEN);
}
};
}
}
However the return (TypeAdapter<T>) newRangeAdapter(elementAdapter); line has a compilation error and I'm now at a loss.
What's the best way to resolve this error? Is there a better way to serialize Range objects that I'm missing? What about if I want to serialize RangeSets?
Rather frustrating that the Google utility library and Google serialization library seem to require so much glue to work together :(
This feels somewhat like reinventing the wheel, but it was a lot quicker to put together and test than the time spent trying to get Gson to behave, so at least presently I'll be using the following Converters to serialize Range and RangeSet*, rather than Gson.
/**
* Converter between Range instances and Strings, essentially a custom serializer.
* Ideally we'd let Gson or Guava do this for us, but presently this is cleaner.
*/
public static <T extends Comparable<? super T>> Converter<Range<T>, String> rangeConverter(final Converter<T, String> elementConverter) {
final String NEG_INFINITY = "-\u221e";
final String POS_INFINITY = "+\u221e";
final String DOTDOT = "\u2025";
return new Converter<Range<T>, String>() {
@Override
protected String doForward(Range<T> range) {
return (range.hasLowerBound() && range.lowerBoundType() == BoundType.CLOSED ? "[" : "(") +
(range.hasLowerBound() ? elementConverter.convert(range.lowerEndpoint()) : NEG_INFINITY) +
DOTDOT +
(range.hasUpperBound() ? elementConverter.convert(range.upperEndpoint()) : POS_INFINITY) +
(range.hasUpperBound() && range.upperBoundType() == BoundType.CLOSED ? "]" : ")");
}
@Override
protected Range<T> doBackward(String range) {
String[] endpoints = range.split(DOTDOT);
Range<T> ret = Range.all();
if(!endpoints[0].substring(1).equals(NEG_INFINITY)) {
T lower = elementConverter.reverse().convert(endpoints[0].substring(1));
ret = ret.intersection(Range.downTo(lower, endpoints[0].charAt(0) == '[' ? BoundType.CLOSED : BoundType.OPEN));
}
if(!endpoints[1].substring(0,endpoints[1].length()-1).equals(POS_INFINITY)) {
T upper = elementConverter.reverse().convert(endpoints[1].substring(0,endpoints[1].length()-1));
ret = ret.intersection(Range.upTo(upper, endpoints[1].charAt(endpoints[1].length()-1) == ']' ? BoundType.CLOSED : BoundType.OPEN));
}
return ret;
}
};
}
/**
* Converter between RangeSet instances and Strings, essentially a custom serializer.
* Ideally we'd let Gson or Guava do this for us, but presently this is cleaner.
*/
public static <T extends Comparable<? super T>> Converter<RangeSet<T>, String> rangeSetConverter(final Converter<T, String> elementConverter) {
return new Converter<RangeSet<T>, String>() {
private final Converter<Range<T>, String> rangeConverter = rangeConverter(elementConverter);
@Override
protected String doForward(RangeSet<T> rs) {
ArrayList<String> ls = new ArrayList<>();
for(Range<T> range : rs.asRanges()) {
ls.add(rangeConverter.convert(range));
}
return Joiner.on(", ").join(ls);
}
@Override
protected RangeSet<T> doBackward(String rs) {
Iterable<String> parts = Splitter.on(",").trimResults().split(rs);
ImmutableRangeSet.Builder<T> build = ImmutableRangeSet.builder();
for(String range : parts) {
build.add(rangeConverter.reverse().convert(range));
}
return build.build();
}
};
}
*For inter-process communication, Java serialization would likely work just fine, as both classes implement Serializable. However I'm serializing to disk for more permanent storage, meaning I need a format I can trust won't change over time. Guava's serialization doesn't provide that guarantee.
Here is a Gson JsonSerializer and JsonDeserializer that generically supports a Range: https://github.com/jamespedwards42/Fava/wiki/Range-Marshaller
@Override
public JsonElement serialize(final Range src, final Type typeOfSrc, final JsonSerializationContext context) {
final JsonObject jsonObject = new JsonObject();
if ( src.hasLowerBound() ) {
jsonObject.add( "lowerBoundType", context.serialize( src.lowerBoundType() ) );
jsonObject.add( "lowerBound", context.serialize( src.lowerEndpoint() ) );
} else
jsonObject.add( "lowerBoundType", context.serialize( BoundType.OPEN ) );
if ( src.hasUpperBound() ) {
jsonObject.add( "upperBoundType", context.serialize( src.upperBoundType() ) );
jsonObject.add( "upperBound", context.serialize( src.upperEndpoint() ) );
} else
jsonObject.add( "upperBoundType", context.serialize( BoundType.OPEN ) );
return jsonObject;
}
@Override
public Range<? extends Comparable<?>> deserialize(final JsonElement json, final Type typeOfT, final JsonDeserializationContext context) throws JsonParseException {
if ( !( typeOfT instanceof ParameterizedType ) )
throw new IllegalStateException( "typeOfT must be a parameterized Range." );
final JsonObject jsonObject = json.getAsJsonObject();
final JsonElement lowerBoundTypeJsonElement = jsonObject.get( "lowerBoundType" );
final JsonElement upperBoundTypeJsonElement = jsonObject.get( "upperBoundType" );
if ( lowerBoundTypeJsonElement == null || upperBoundTypeJsonElement == null )
throw new IllegalStateException( "Range " + json
+ " was not serialized with this serializer! The default serialization does not store the boundary types, therefore we can not deserialize." );
final Type type = ( ( ParameterizedType ) typeOfT ).getActualTypeArguments()[0];
final BoundType lowerBoundType = context.deserialize( lowerBoundTypeJsonElement, BoundType.class );
final JsonElement lowerBoundJsonElement = jsonObject.get( "lowerBound" );
final Comparable<?> lowerBound = lowerBoundJsonElement == null ? null : context.deserialize( lowerBoundJsonElement, type );
final BoundType upperBoundType = context.deserialize( upperBoundTypeJsonElement, BoundType.class );
final JsonElement upperBoundJsonElement = jsonObject.get( "upperBound" );
final Comparable<?> upperBound = upperBoundJsonElement == null ? null : context.deserialize( upperBoundJsonElement, type );
if ( lowerBound == null && upperBound != null )
return Range.upTo( upperBound, upperBoundType );
else if ( lowerBound != null && upperBound == null )
return Range.downTo( lowerBound, lowerBoundType );
else if ( lowerBound == null && upperBound == null )
return Range.all();
return Range.range( lowerBound, lowerBoundType, upperBound, upperBoundType );
}
Here is a straightforward solution that works very well:
import com.google.common.collect.BoundType;
import com.google.common.collect.Range;
import com.google.gson.*;
import java.lang.reflect.Type;
public class GoogleRangeAdapter implements JsonSerializer, JsonDeserializer {
public static String TK_hasLowerBound = "hasLowerBound";
public static String TK_hasUpperBound = "hasUpperBound";
public static String TK_lowerBoundType = "lowerBoundType";
public static String TK_upperBoundType = "upperBoundType";
public static String TK_lowerBound = "lowerBound";
public static String TK_upperBound = "upperBound";
@Override
public Object deserialize(JsonElement json, Type typeOfT, JsonDeserializationContext context) throws JsonParseException {
JsonObject jsonObject = (JsonObject)json;
boolean hasLowerBound = jsonObject.get(TK_hasLowerBound).getAsBoolean();
boolean hasUpperBound = jsonObject.get(TK_hasUpperBound).getAsBoolean();
if (!hasLowerBound && !hasUpperBound) {
return Range.all();
}
else if (!hasLowerBound && hasUpperBound){
double upperBound = jsonObject.get(TK_upperBound).getAsDouble();
BoundType upperBoundType = BoundType.valueOf(jsonObject.get(TK_upperBoundType).getAsString());
if (upperBoundType == BoundType.OPEN)
return Range.lessThan(upperBound);
else
return Range.atMost(upperBound);
}
else if (hasLowerBound && !hasUpperBound){
double lowerBound = jsonObject.get(TK_lowerBound).getAsDouble();
BoundType lowerBoundType = BoundType.valueOf(jsonObject.get(TK_lowerBoundType).getAsString());
if (lowerBoundType == BoundType.OPEN)
return Range.greaterThan(lowerBound);
else
return Range.atLeast(lowerBound);
}
else {
double lowerBound = jsonObject.get(TK_lowerBound).getAsDouble();
double upperBound = jsonObject.get(TK_upperBound).getAsDouble();
BoundType upperBoundType = BoundType.valueOf(jsonObject.get(TK_upperBoundType).getAsString());
BoundType lowerBoundType = BoundType.valueOf(jsonObject.get(TK_lowerBoundType).getAsString());
if (lowerBoundType == BoundType.OPEN && upperBoundType == BoundType.OPEN)
return Range.open(lowerBound, upperBound);
else if (lowerBoundType == BoundType.OPEN && upperBoundType == BoundType.CLOSED)
return Range.openClosed(lowerBound, upperBound);
else if (lowerBoundType == BoundType.CLOSED && upperBoundType == BoundType.OPEN)
return Range.closedOpen(lowerBound, upperBound);
else
return Range.closed(lowerBound, upperBound);
}
}
@Override
public JsonElement serialize(Object src, Type typeOfSrc, JsonSerializationContext context) {
JsonObject jsonObject = new JsonObject();
Range<Double> range = (Range<Double>)src;
boolean hasLowerBound = range.hasLowerBound();
boolean hasUpperBound = range.hasUpperBound();
jsonObject.addProperty(TK_hasLowerBound, hasLowerBound);
jsonObject.addProperty(TK_hasUpperBound, hasUpperBound);
if (hasLowerBound) {
jsonObject.addProperty(TK_lowerBound, range.lowerEndpoint());
jsonObject.addProperty(TK_lowerBoundType, range.lowerBoundType().name());
}
if (hasUpperBound) {
jsonObject.addProperty(TK_upperBound, range.upperEndpoint());
jsonObject.addProperty(TK_upperBoundType, range.upperBoundType().name());
}
return jsonObject;
}
}
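The adapter is registered with a `GsonBuilder` in the usual way. A minimal usage sketch, assuming the `GoogleRangeAdapter` class above is on the classpath along with Gson and Guava:

```java
import com.google.common.collect.Range;
import com.google.gson.Gson;
import com.google.gson.GsonBuilder;

public class GoogleRangeAdapterDemo {
    public static void main(String[] args) {
        // Register the adapter for the raw Range type; both toJson and
        // fromJson must then be called with Range.class explicitly.
        Gson gson = new GsonBuilder()
                .registerTypeAdapter(Range.class, new GoogleRangeAdapter())
                .create();

        Range<Double> range = Range.closedOpen(1.0, 2.0);
        String json = gson.toJson(range, Range.class);

        @SuppressWarnings("unchecked")
        Range<Double> back = (Range<Double>) gson.fromJson(json, Range.class);
        System.out.println(range.equals(back)); // round-trip preserves the range
    }
}
```

Note that because the adapter serializes endpoints with `getAsDouble()`, this particular implementation only round-trips `Range<Double>` values.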

In antlr4 lexer, How to have a rule that catches all remaining "words" as Unknown token?

I have an ANTLR4 lexer grammar. It has many rules for words, but I also want it to create an Unknown token for any word that it cannot match with the other rules. I have something like this:
Whitespace : [ \t\n\r]+ -> skip;
Punctuation : [.,:;?!];
// Other rules here
Unknown : .+? ;
The generated lexer now matches '~' as Unknown, but for the input '~~~' it creates three separate '~' Unknown tokens instead of a single '~~~' token. What should I do to tell the lexer to combine consecutive unknown characters into a single token? I also tried "Unknown : . ;" and "Unknown : .+ ;" with no results.
EDIT: In current ANTLR versions, .+? now catches the remaining words, so this problem seems to be resolved.
A .+? at the end of a lexer rule will always match a single character, while .+ will consume as much as possible, which was illegal at the end of a rule in ANTLR v3 (and probably in v4 as well).
What you can do is just match a single char, and "glue" these together in the parser:
unknowns : Unknown+ ;
...
Unknown : . ;
EDIT
... but I only have a lexer, no parsers ...
Ah, I see. Then you could override the nextToken() method:
lexer grammar Lex;
@members {
public static void main(String[] args) {
Lex lex = new Lex(new ANTLRInputStream("foo, bar...\n"));
for(Token t : lex.getAllTokens()) {
System.out.printf("%-15s '%s'\n", tokenNames[t.getType()], t.getText());
}
}
private java.util.Queue<Token> queue = new java.util.LinkedList<Token>();
@Override
public Token nextToken() {
if(!queue.isEmpty()) {
return queue.poll();
}
Token next = super.nextToken();
if(next.getType() != Unknown) {
return next;
}
StringBuilder builder = new StringBuilder();
while(next.getType() == Unknown) {
builder.append(next.getText());
next = super.nextToken();
}
// The `next` will _not_ be an Unknown-token, store it in
// the queue to return the next time!
queue.offer(next);
return new CommonToken(Unknown, builder.toString());
}
}
Whitespace : [ \t\n\r]+ -> skip ;
Punctuation : [.,:;?!] ;
Unknown : . ;
Running it:
java -cp antlr-4.0-complete.jar org.antlr.v4.Tool Lex.g4
javac -cp antlr-4.0-complete.jar *.java
java -cp .:antlr-4.0-complete.jar Lex
will print:
Unknown 'foo'
Punctuation ','
Unknown 'bar'
Punctuation '.'
Punctuation '.'
Punctuation '.'
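The buffering idea in the overridden nextToken() does not depend on ANTLR itself. Below is a stdlib-only sketch of the same merge-adjacent-Unknown-tokens pattern; the Tok class and the string token types are hypothetical stand-ins for ANTLR's Token and token-type constants:

```java
import java.util.*;

/** Minimal stand-in for a lexer token: a type name and its text. */
final class Tok {
    final String type, text;
    Tok(String type, String text) { this.type = type; this.text = text; }
}

/** Merges runs of adjacent UNKNOWN tokens into one, mirroring the overridden nextToken(). */
final class UnknownMerger {
    private final Iterator<Tok> source;           // the underlying token stream
    private final Deque<Tok> queue = new ArrayDeque<>();

    UnknownMerger(Iterator<Tok> source) { this.source = source; }

    /** Returns the next (possibly merged) token, or null at end of input. */
    Tok nextToken() {
        if (!queue.isEmpty()) return queue.poll();
        Tok next = source.hasNext() ? source.next() : null;
        if (next == null || !next.type.equals("UNKNOWN")) return next;
        // Glue consecutive UNKNOWN tokens together.
        StringBuilder builder = new StringBuilder();
        while (next != null && next.type.equals("UNKNOWN")) {
            builder.append(next.text);
            next = source.hasNext() ? source.next() : null;
        }
        // `next` is the first non-UNKNOWN token (if any); buffer it for the next call.
        if (next != null) queue.offer(next);
        return new Tok("UNKNOWN", builder.toString());
    }
}
```

Feeding it the tokens for `~~~,` would yield a single merged `UNKNOWN "~~~"` followed by the punctuation token, which is exactly the behavior the override adds to the generated lexer.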
The accepted answer works, but it only works for Java.
I converted the provided Java code for use with the C# ANTLR runtime. If anyone else needs it... here ya go!
@members {
private IToken _NextToken = null;
public override IToken NextToken()
{
if(_NextToken != null)
{
var token = _NextToken;
_NextToken = null;
return token;
}
var next = base.NextToken();
if(next.Type != UNKNOWN)
{
return next;
}
var originalToken = next;
var lastToken = next;
var builder = new StringBuilder();
while(next.Type == UNKNOWN)
{
lastToken = next;
builder.Append(next.Text);
next = base.NextToken();
}
_NextToken = next;
return new CommonToken(
originalToken
)
{
Text = builder.ToString(),
StopIndex = lastToken.StopIndex
};
}
}

How to catch a list of tokens in an ANTLR3 tree grammar?

I took a dummy language as an example:
It simply accepts one or more '!'.
Its lexer and parser rules are:
grammar Ns;
options {
output=AST;
ASTLabelType=CommonTree;
}
tokens {
NOTS;
}
@header {
package test;
}
@lexer::header {
package test;
}
ns : NOT+ EOF -> ^(NOTS NOT+);
NOT : '!';
As you can see, this represents a language that accepts '!', '!!!', '!!!!!', and so on.
and I defined some meaningful classes to build ASTs:
public class Not {
public static final Not SINGLETON = new Not();
private Not() {
}
}
public class Ns {
private List<Not> nots;
public Ns(String nots) {
this.nots = new ArrayList<Not>();
for (int i = 0; i < nots.length(); i++) {
this.nots.add(Not.SINGLETON);
}
}
public String toString() {
String ret = "";
for (int i = 0; i < this.nots.size(); i++) {
ret += "!";
}
return ret;
}
}
and here's the tree grammar:
tree grammar NsTreeWalker;
options {
output = AST;
tokenVocab = Ns;
ASTLabelType = CommonTree;
}
@header {
package test;
}
ns returns [Ns ret] : ^(NOTS n=NOT+) {$ret = new Ns($n.text);};
and the main class code with some sample data to test the generated classes:
public class Test {
public static void main(String[] args) throws Exception {
ANTLRInputStream input = new ANTLRInputStream(new ByteArrayInputStream("!!!".getBytes("utf-8")));
NsLexer lexer = new NsLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
NsParser parser = new NsParser(tokens);
CommonTree root = (CommonTree) parser.ns().getTree();
NsTreeWalker walker = new NsTreeWalker(new CommonTreeNodeStream(root));
try {
NsTreeWalker.ns_return r = walker.ns();
System.out.println(r.ret);
} catch (RecognitionException e) {
e.printStackTrace();
}
}
}
But the final output printed is '!' rather than the expected '!!!'.
That is mainly because of this line of code:
ns returns [Ns ret] : ^(NOTS n=NOT+) {$ret = new Ns($n.text);};
The $n above captured only one '!'. I don't know how to capture all three '!' tokens with $n, in other words, a list of '!'.
Could someone help? Thanks!
The fact that only one ! gets printed is because your rule:
ns returns [Ns ret]
: ^(NOTS n=NOT+) {$ret = new Ns($n.text);}
;
gets more or less translated as:
Token n = null
LOOP
n = match NOT_token
END
return new Ns(n.text)
Therefore, $n.text will always be just a single !.
What you need to do is collect these NOT tokens in a list. In ANTLR you can create a list of tokens using the += operator instead of the "single token" operator =. So change your ns rule into:
ns returns [Ns ret]
: ^(NOTS n+=NOT+) {$ret = new Ns($n);}
;
which gets translated as:
List n = null
LOOP
n.add(match NOT_token)
END
return new Ns(n)
Be sure to change the constructor of your Ns class to take a List instead:
public Ns(List nots) {
this.nots = new ArrayList<Not>();
for (Object o : nots) {
this.nots.add(Not.SINGLETON);
}
}
after which the output of your test class would be:
!!!
Good luck!
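The List-based version of the class can be checked in isolation. Here is a condensed, stdlib-only sketch of the Not/Ns classes from the question, rewritten with the List constructor suggested above:

```java
import java.util.*;

// Condensed version of the question's Not class: a stateless singleton.
final class Not {
    static final Not SINGLETON = new Not();
    private Not() {}
}

// Condensed Ns class using the List-based constructor, which adds
// one Not per captured NOT token from the `n+=NOT+` rule.
final class Ns {
    private final List<Not> nots = new ArrayList<>();

    Ns(List<?> tokens) {
        for (Object ignored : tokens) nots.add(Not.SINGLETON);
    }

    @Override
    public String toString() {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < nots.size(); i++) sb.append('!');
        return sb.toString();
    }
}
```

Constructing `new Ns(Arrays.asList("!", "!", "!"))` and printing it reproduces the expected `!!!` output, independent of the generated tree walker.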