How to describe this event log in formal BNF? - grammar

I have a very simple event log format, but I'm having difficulty describing it in BNF (for gocc).
Here is my simple event log format:
timestamp nested-event-A Running
timestamp Start of nested-event-B
timestamp Start unested-event-C
timestamp End unested-event-C
timestamp Start unested-event-D
timestamp End unested-event-D
timestamp Start unested-event-E
timestamp End unested-event-E
. . .
timestamp End of nested-event-B
timestamp nested-event-A completed
Say the grammar begins with EventLog; I'm having difficulty even writing the first BNF rule.
Shall I start with
EventLog ::= nested-event | unested-event
Or,
Shall I start with
EventLog ::= nested-event
nested-event ::= nested-event-start unested-event+ nested-event-end
What would you think the best way to describe it?

Regarding the productions of EventLog, your choice depends on whether the first event is always a nested one or can also be an unested one.
If your log consists of many events of either kind, you can go with:
EventLog ::= event*
event ::= nested-event | unested-event
nested-event ::= nested-event-start unested-event+ nested-event-end
...

OK, here is the full gocc BNF, tested and working:
/* Lexical part */
_digit : '0'-'9' ;
_jobLog : 'M' 'y' 'J' 'o' 'b' ;
_lineend : [ '\r' ] '\n' ;
timestamp
: _digit _digit _digit _digit '-' _digit _digit '-' _digit _digit
' ' _digit _digit ':' _digit _digit ':' _digit _digit '.' { _digit } ' '
;
jobLogStart : _jobLog ' ' 'R' 'u' 'n' 'n' 'i' 'n' 'g' ' ' 'j' 'o' 'b' _lineend ;
processLogStart : 'S' 't' 'a' 'r' 't' ' ' 'o' 'f' ' ' { . } _lineend;
taskLogStart : 'S' 't' 'a' 'r' 't' ' ' { . } _lineend;
taskLogEnd : 'E' 'n' 'd' ' ' { . } _lineend;
processLogEnd : 'E' 'n' 'd' ' ' 'o' 'f' ' ' { . } _lineend;
jobLogEnd : _jobLog ' ' 'J' 'o' 'b' ' ' 'c' 'o' 'm' 'p' 'l' 'e' 't' 'e' 'd' _lineend ;
/* Syntax part */
EventLog
: JobLog
;
JobLog
: JobLogStart ProcessLog JobLogEnd
;
ProcessLog
: ProcessLogStart TaskLog ProcessLogEnd
;
TaskLog
: TaskLogStart TaskLogEnd
| TaskLog TaskLogStart TaskLogEnd
;
TaskLogStart : timestamp taskLogStart ;
TaskLogEnd : timestamp taskLogEnd ;
ProcessLogStart : timestamp processLogStart ;
ProcessLogEnd : timestamp processLogEnd ;
JobLogStart : timestamp jobLogStart ;
JobLogEnd : timestamp jobLogEnd ;
For completeness, here is the sample data that parsed fine with the above syntax:
2022-01-18 10:19:41.6007 MyJob Running job
2022-01-18 10:21:24.8027 Start of The Processing 1/18/2022
2022-01-18 10:21:24.8027 Start unested event C
2022-01-18 10:21:24.8027 End unested event C
2022-01-18 10:21:24.8199 Start unested event D with more words
2022-01-18 10:33:21.9885 End unested event D with more words
2022-01-18 10:33:21.9885 Start unested event E with different words
2022-01-18 10:33:21.9885 End unested event E with different words
2022-01-18 10:33:23.9087 Start unested event F with different words
2022-01-18 10:33:40.8774 End unested event F with different words
2022-01-18 10:33:40.8774 Start ...
2022-01-18 10:35:13.4284 End ...
2022-01-18 10:35:13.4445 Start ...
2022-01-18 10:35:13.5237 End ...
2022-01-18 10:35:13.5237 Start ...
2022-01-18 10:35:13.6597 End ...
2022-01-18 10:35:13.6597 Start ...
2022-01-18 10:36:24.4468 End ...
2022-01-18 10:36:24.4468 Start ...
2022-01-18 10:36:24.4554 End ...
2022-01-18 10:36:24.7238 End of The Processing 1/18/2022
2022-01-18 10:36:24.9746 MyJob Job completed

Related

Why, in the parse tree, does ANTLR keep mismatching input?

I am new to ANTLR4 and I am trying to visualize the parse tree of a text input in a simple form:
grammar Expr;
contract: (I WS SEND WS quantity WS asset WS TO WS beneficiary WS ON WS send_date WS)*;
asset: '$'| 'TND' | 'USD';
quantity:Q;
beneficiary: B;
send_date : day SLASH month SLASH year;
day: D ;
month: M ;
year: Y ;
B : LETTERUP (LETTERLOW+)+ LETTERLOW*;
Q : DIGITO DIGITZ*|DIGITO DIGITZ* POINT DIGITZ*;
D : DIGIT0 DIGITO|(DIGIT1|DIGIT2)DIGITZ|DIGIT3(DIGIT0|DIGIT1);
M : DIGIT0 DIGITO| DIGIT1(DIGIT0|DIGIT1|DIGIT2);
Y : DIGIT2 DIGIT0((DIGIT1(DIGIT7|DIGIT8|DIGIT9))|(DIGIT2 DIGITZ));
I: 'I';
SEND: 'send';
TO:'to' ;
ON: 'on';
LETTER : [a-zA-Z];
LETTERUP : [A-Z];
LETTERLOW : [a-z];
DIGITZ : [0-9];
DIGITO : [1-9];
DIGIT0 : [0];
DIGIT1 : [1];
DIGIT2 : [2];
DIGIT3 : [3];
DIGIT4 : [4];
DIGIT5 : [5];
DIGIT6 : [6];
DIGIT7 : [7];
DIGIT8 : [8];
DIGIT9 : [9];
SLASH:'/';
POINT:'.'|',';
WS : (' ' | '\t' |'\n' |'\r' )+ ;
But it keeps mismatching the send_date as you can see here:
I know it is a seriously complex numerical grammar; I just wanted some control: 01 <= day <= 31, 01 <= month <= 12, and 2017 <= year <= 2029, that's all.
Is there any help? Thanks.
The problem happens because your grammar is ambiguous. 07 can match D and 2017 can match Q.
You can fix it like this:
grammar Expr;
contract: (I WS SEND WS quantity WS asset WS TO WS beneficiary WS ON WS send_date WS)*;
asset: '$'| 'TND' | 'USD';
quantity:Q;
beneficiary: B;
send_date : day month year ;
day: D ;
month: M ;
year: Y ;
D : DIGIT0 DIGITO|(DIGIT1|DIGIT2)DIGITZ|DIGIT3(DIGIT0|DIGIT1);
M : SLASH (DIGIT0 DIGITO| DIGIT1(DIGIT0|DIGIT1|DIGIT2));
Y : SLASH (DIGIT2 DIGIT0((DIGIT1(DIGIT7|DIGIT8|DIGIT9))|(DIGIT2 DIGITZ)));
B : LETTERUP (LETTERLOW+)+ LETTERLOW*;
Q : DIGITO DIGITZ*|DIGITO DIGITZ* POINT DIGITZ*;
I: 'I';
SEND: 'send';
TO:'to' ;
ON: 'on';
LETTER : [a-zA-Z];
LETTERUP : [A-Z];
LETTERLOW : [a-z];
DIGITZ : [0-9];
DIGITO : [1-9];
DIGIT0 : [0];
DIGIT1 : [1];
DIGIT2 : [2];
DIGIT3 : [3];
DIGIT4 : [4];
DIGIT5 : [5];
DIGIT6 : [6];
DIGIT7 : [7];
DIGIT8 : [8];
DIGIT9 : [9];
SLASH:'/';
POINT:'.'|',';
WS : (' ' | '\t' |'\n' |'\r' )+ ;
That's a seriously complex numerical grammar. Perhaps you could simplify:
day: NUMBER ;
month: NUMBER ;
year: NUMBER ;
NUMBER : DIGITZ+ ;
DIGITZ : [0-9];
You could enforce semantics like limiting year to [2017...2020] or whatever in your code. Just an idea. Simplifying often helps and then you can enhance it from there, knowing if you make a mistake you can always revert to something that will at least work.
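For example, here is a minimal sketch of such a range check, assuming the simplified NUMBER-based rules above, that send_date still reads day SLASH month SLASH year, and the classes ANTLR generates for grammar Expr (ExprLexer, ExprParser, ExprBaseListener); the DateRangeCheck and Validator names are just for illustration:
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTreeWalker;

public class DateRangeCheck {
    // Listener that enforces the ranges the grammar no longer encodes.
    static class Validator extends ExprBaseListener {
        @Override
        public void exitSend_date(ExprParser.Send_dateContext ctx) {
            int day = Integer.parseInt(ctx.day().getText());
            int month = Integer.parseInt(ctx.month().getText());
            int year = Integer.parseInt(ctx.year().getText());
            if (day < 1 || day > 31) System.err.println("day out of range: " + day);
            if (month < 1 || month > 12) System.err.println("month out of range: " + month);
            if (year < 2017 || year > 2029) System.err.println("year out of range: " + year);
        }
    }

    public static void main(String[] args) {
        // Sample sentence from the token dump below.
        ExprLexer lexer = new ExprLexer(CharStreams.fromString("I send 300 $ to Ahmed on 03/07/2017 "));
        ExprParser parser = new ExprParser(new CommonTokenStream(lexer));
        ParseTreeWalker.DEFAULT.walk(new Validator(), parser.contract());
    }
}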
EDIT:
The reason your grammar doesn't work is that the month is being lexed as a day:
[#0,0:0='I',<'I'>,1:0]
[#1,1:1=' ',<WS>,1:1]
[#2,2:5='send',<'send'>,1:2]
[#3,6:6=' ',<WS>,1:6]
[#4,7:9='300',<Q>,1:7]
[#5,10:10=' ',<WS>,1:10]
[#6,11:11='$',<'$'>,1:11]
[#7,12:12=' ',<WS>,1:12]
[#8,13:14='to',<'to'>,1:13]
[#9,15:15=' ',<WS>,1:15]
[#10,16:20='Ahmed',<B>,1:16]
[#11,21:21=' ',<WS>,1:21]
[#12,22:23='on',<'on'>,1:22]
[#13,24:24=' ',<WS>,1:24]
[#14,25:26='03',<D>,1:25]
[#15,27:27='/',<'/'>,1:27]
[#16,28:29='07',<D>,1:28] <-- see, this is being lexed as a D (day)
[#17,30:30='/',<'/'>,1:30]
[#18,31:34='2017',<Q>,1:31] <-- and this is being lexed as a Q (quantity)
[#19,35:36='\r\n',<WS>,1:35]
[#20,37:36='<EOF>',<EOF>,2:0]
line 1:28 mismatched input '05' expecting M
line 1:31 mismatched input '2017' expecting Y
Lexer rules are applied in the order in which they appear, and Day appears before Month. Quantity appears before Year. Hence the improper lexing. This is a scenario, honestly, where I think you need to simplify and just accept numbers. Then enforce the semantics in your code (make sure year is in range, etc.) and provide a helpful error message to the user if values are not in range. Your total effort spent will be less that way.
NEW VERSION
grammar Test2;
contract: (I SEND quantity asset TO beneficiary ON send_date)*;
asset: '$'| 'TND' | 'USD';
send_date : DATE ;
quantity: NUMBER;
beneficiary: B;
DATE : NUMBER SLASH NUMBER SLASH NUMBER ;
B : LETTERUP (LETTERLOW+)+ LETTERLOW*;
I: 'I';
SEND: 'send';
TO:'to' ;
ON: 'on';
LETTER : [a-zA-Z];
LETTERUP : [A-Z];
LETTERLOW : [a-z];
NUMBER: DIGIT+;
DIGIT : [0-9];
SLASH:'/';
POINT:'.'|',';
WS : [ \t\n\r]+ -> skip;
Improvements:
1. Better, much more conventional handling of whitespace.
2. Simplified number syntax.
3. It works:
[#0,0:0='I',<'I'>,1:0]
[#1,2:5='send',<'send'>,1:2]
[#2,7:9='300',<NUMBER>,1:7]
[#3,11:11='$',<'$'>,1:11]
[#4,13:14='to',<'to'>,1:13]
[#5,16:20='Ahmed',<B>,1:16]
[#6,22:23='on',<'on'>,1:22]
[#7,25:34='03/07/2017',<DATE>,1:25]
[#8,37:36='<EOF>',<EOF>,2:0]
Problem: I simplified away the ability to do decimal numbers for quantity. You can add that back in as you wish.

How to fix "extraneous input ' ' expecting" in ANTLR4

Hello, when running ANTLR4 with the following input I get the following error:
image showing problem
I have been trying to fix it by making some changes here and there, but it seems it only works if I write every component of whileLoop on a new line.
Could you please tell me what I am missing here and why the problem persists?
grammar AM;
COMMENTS :
'{'~[\n|\r]*'}' -> skip
;
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
anything : whileLoop | write ;
write : 'WRITE' '(' '"' sentance '"' ')' ;
read : 'READ' '(' '"' sentance '"' ')' ;
whileLoop : 'WHILE' expression 'DO' ;
block : 'BODY' anything 'END';
expression : 'TRUE'|'FALSE' ;
test : ID? {System.out.println("Done");};
logicalOperators : '<' | '>' | '<>' | '<=' | '>=' | '=' ;
numberExpressionS : (NUMBER numberExpression)* ;
numberExpression : ('-' | '/' | '*' | '+' | '%') NUMBER ;
sentance : (ID)* {System.out.println("Sentance");};
WS : [ \t\r\n]+ -> skip ;
NUMBER : [0-9]+ ;
ID : [a-zA-Z0-9]* ;
Your lexer rules produce conflicts:
body : ('BODY' ' '*) anything | 'BODY' 'BEGIN' anything* 'END' ;
vs
WS : [ \t\r\n]+ -> skip ;
The critical part is the ' '*. It defines an implicit lexer token that matches a single space, and implicit tokens take precedence over WS. So a lone space between tokens is not handled as WS but as this implicit token.
If I am right, putting tabs between the components of whileLoop will work, and putting more than one space between them should also work (the longer match then goes to WS). You should simply remove the ' '*, since whitespace is skipped anyway.

Antlr4: unexpected parsing

Hi all, I have developed an ANTLR4 grammar. While parsing the string
Time;25 10 * * *;'faccalc_minus1_cron.out.'yyyyMMdd.HHmm;America/New_York
I get the following errors:
Invalid chars in expression! Expression: ;' Invalid chars: ;'
extraneous input ';' expecting {'', INTEGER, '-', '/', ','}
missing ';' at '_'
Incorrect timezone format :faccalc_minus1
I don't understand why, as the regEx rule contains '_'.
How to fix it?
Regards,
Vladimir
lexer grammar FileTriggerLexer;
CRON
:
'cron'
;
MARKET_CRON
:
'marketCron'
;
COMBINED
:
'combined'
;
FILE_FEED
:
'FileFeed'
;
MANUAL_NOTICE
:
'ManualNotice'
;
TIME
:
'Time'
;
MARKET_TIME
:
'MarketTime'
;
SCHEDULE
:
'Schedule'
;
PRODUCT
:
'Product'
;
UCA_CLIENT
:
'UCAClient'
;
APEX_GSM
:
'ApexGSM'
;
DELAY
:
'Delay'
;
CATEGORY
:
'Category'
;
EXCHANGE
:
'Exchange'
;
CALENDAR_EXCHANGE
:
'CalendarExchange'
;
FEED
:
'Feed'
;
RANGE
:
'Range'
;
SYNTH
:
'Synth'
;
TRIGGER
:
'Trigger'
;
DELAYED_TRIGGER
:
'DelayedTrigger'
;
INTRA_TRIGGER
:
'IntraTrigger'
;
CURRENT_TRIGGER
:
'CurrentTrigger'
;
CALENDAR_FILE_FEED
:
'CalendarFileFeed'
;
PREVIOUS
:
'Previous'
;
LATE_DELAY
:
'LateDelay'
;
BUILD_ARCHIVE
:
'BuildArchive'
;
COMPRESS
:
'Compress'
;
LATE_TIME
:
'LateTime'
;
CALENDAR_CATEGORY
:
'CalendarCategory'
;
APEX_GPM
:
'ApexGPM'
;
PORTFOLIO_NOTICE
:
'PortfolioNotice'
;
FixedTimeOfDay: 'FixedTimeOfDay';
SEMICOLON
:
';'
;
ASTERISK
:
'*'
;
LBRACKET
:
'('
;
RBRACKET
:
')'
;
PERCENT
:
'%'
;
INTEGER
:
[0-9]+
;
DASH
:
'-'
;
DOUBLE_QUOTE
:
'"'
;
QUOTE
:
'\''
;
SLASH
:
'/'
;
DOT
:
'.'
;
COMMA
:
','
;
UNDERSCORE
:
'_'
;
EQUAL
:
'='
;
MORE_THAN
:
'>'
;
LESS
:
'<'
;
ID
:
[a-zA-Z] [a-zA-Z0-9]*
;
WS
:
[ \t\r\n]+ -> skip
;
/**
* Define File Trigger validator grammar
*/
grammar FileTriggerValidator;
options
{
tokenVocab = FileTriggerLexer;
}
r
:
(
schedule
| file_feed
| time_feed
| market_time_feed
| manual_notice
| portfolio_notice
| not_checked
)+
;
not_checked
:
(
PRODUCT
| UCA_CLIENT
| APEX_GSM
| APEX_GPM
| DELAY
| CATEGORY
| CALENDAR_CATEGORY
| EXCHANGE
| CALENDAR_EXCHANGE
| FEED
| RANGE
| SYNTH
| TRIGGER
| DELAYED_TRIGGER
| INTRA_TRIGGER
| CURRENT_TRIGGER
| CALENDAR_FILE_FEED
| PREVIOUS
| LATE_DELAY
| LATE_TIME
| COMPRESS
| BUILD_ARCHIVE
)
(
SEMICOLON anyList
)?
;
anyList
:
anyElement
(
SEMICOLON anyElement
)*
;
anyElement
:
cron
| file_name
| with_step_value
| source_file
| timezone
| regEx
;
portfolio_notice
:
PORTFOLIO_NOTICE SEMICOLON regEx
;
manual_notice
:
MANUAL_NOTICE SEMICOLON file_name SEMICOLON timezone
;
time_feed
:
TIME SEMICOLON cron_part
(
timezone?
) SEMICOLON file_name SEMICOLON timezone
;
market_time_feed
:
MARKET_TIME SEMICOLON cron_part timezone SEMICOLON file_name SEMICOLON
timezone
(
SEMICOLON UNDERSCORE? INTEGER
)*
;
file_feed
:
file_feed_name SEMICOLON source_file SEMICOLON source_host SEMICOLON
source_host SEMICOLON regEx SEMICOLON regEx
(
SEMICOLON source_host
)*
;
regEx
:
(
ID
| DOT
| ASTERISK
| INTEGER
| PERCENT
| UNDERSCORE
| DASH
| LESS
| MORE_THAN
| EQUAL
| SLASH
| LBRACKET
| RBRACKET
| DOUBLE_QUOTE
| QUOTE
| COMMA
)+
;
source_host
:
ID
(
DASH ID
)*
;
file_feed_name
:
FILE_FEED
;
source_file
:
(
ID
| DASH
| UNDERSCORE
)+
;
schedule
:
SCHEDULE SEMICOLON schedule_defining SEMICOLON file_name SEMICOLON timezone
(
SEMICOLON DASH? INTEGER
)*
;
schedule_defining
:
cron
| market_cron
| combined_cron
;
cron
:
CRON LBRACKET DOUBLE_QUOTE cron_part timezone DOUBLE_QUOTE RBRACKET
;
market_cron
:
MARKET_CRON LBRACKET DOUBLE_QUOTE cron_part timezone DOUBLE_QUOTE COMMA
DOUBLE_QUOTE ID DOUBLE_QUOTE RBRACKET
;
combined_cron
:
COMBINED LBRACKET cron_list_element
(
COMMA cron_list_element
)* RBRACKET
;
mic_defining
:
ID
;
file_name
:
regEx
;
cron_list_element
:
cron
| market_cron
;
//
schedule_defined_string
:
cron
;
//
cron_part
:
minutes hours days_of_month month week_days
;
//
minutes
:
with_step_value
;
hours
:
with_step_value
;
//
int_list
:
INTEGER
| interval
(
COMMA INTEGER
| interval
)*
;
interval
:
INTEGER DASH INTEGER
;
//
days_of_month
:
with_step_value
;
//
month
:
with_step_value
;
//
week_days
:
with_step_value
;
//
timezone
:
timezone_part
(
SLASH timezone_part
)?
;
//
timezone_part
:
ID
(
UNDERSCORE ID
)?
;
//
with_step_value
:
(
INTEGER
| COMMA
| SLASH
| ASTERISK
| DASH
)+
;
step
:
SLASH int_list
;
To analyze this kind of problem, dump the token stream to see what the lexer is actually doing. To directly dump the tokens, see this answer. AntlrDT, for example, also provides a graphical analysis of the corresponding parse-tree (I am the author of AntlrDT).
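For example, with the stock ANTLR 4 Java runtime a token dump can be produced like this (assuming the generated FileTriggerLexer class and the input line from the question; the DumpTokens name is just for illustration):
import org.antlr.v4.runtime.*;

public class DumpTokens {
    public static void main(String[] args) {
        String input = "Time;25 10 * * *;'faccalc_minus1_cron.out.'yyyyMMdd.HHmm;America/New_York";
        FileTriggerLexer lexer = new FileTriggerLexer(CharStreams.fromString(input));
        CommonTokenStream tokens = new CommonTokenStream(lexer);
        tokens.fill();                      // tokenize the whole input
        for (Token t : tokens.getTokens()) {
            System.out.println(t);          // prints [@index,start:stop='text',<type>,line:col]
        }
    }
}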
From the token dump, it is easy to see that the first error occurs in the with_step_value rule: it does not allow for a trailing semicolon.
The second error is in the timezone_part rule: it does not allow for repeated ID UNDERSCORE occurrences.

Xtext grammar : mismatched input '0' expecting RULE_INT

I'm new to Xtext and I'm trying to create a simple DSL for railway systems; here's my grammar:
grammar org.xtext.railway.RailWay with org.eclipse.xtext.common.Terminals
generate railWay "http://www.xtext.org/railway/RailWay"
Model:
(trains+=Train)*
| (paths+=Path)*
| (sections+=Section)*
;
Train:
'Train' name=ID ':'
'Path' path=[Path]
'Speed' speed=INT
'end'
;
Path:
'Path' name=ID ':'
'Sections' ('{' sections+=[Section] (',' sections+=[Section] )+ '}' ) | sections+=[Section]
'end'
;
Section:
'Section' name=ID ':'
'Start' start=INT
'End' end=INT
('SpeedMax' speedMax=INT)?
'end'
;
But when I put this code in the Eclipse instance:
Section brestStBrieux :
Start 0
End 5
end
Section StBrieuxLeMan :
Start 5
End 10
end
Section leManParis :
Start 1
End 12
end
Path brestParis :
Sections { brestStBrieux, StBrieuxLeMan, leManParis}
end
Train tgv :
Path brestParis
Speed 23
end
I got this error three times:
mismatched input '0' expecting RULE_INT
mismatched input '1' expecting RULE_INT
mismatched input '5' expecting RULE_INT
I can't see where those errors come from or what I can do to fix them. Any ideas?
Christian is right: since the FLOAT terminal is no longer defined, the original problem is resolved. However, a remaining issue is the rule
Path:
'Path' name=ID ':'
'Sections' ('{' sections+=[Section] (',' sections+=[Section] )+ '}' ) | sections+=[Section]
'end'
;
which currently has this precedence:
Path:
(
'Path' name=ID ':' 'Sections'
('{' sections+=[Section] (',' sections+=[Section] )+ '}' )
)
|
(sections+=[Section] 'end')
;
You may want to rewrite it to
Path:
'Path' name=ID ':'
'Sections'
(
('{' sections+=[Section] (',' sections+=[Section] )+ '}' )
| sections+=[Section]
) 'end'
;
Lexing and parsing are different steps, so whether or not the terminal is actually used does not matter, and your grammar becomes ambiguous (have a look at the warnings when generating the language). You should turn it into a datatype rule (simply omit the terminal keyword).
=> change your grammar to
grammar org.xtext.example.mydsl2.MyDsl with org.eclipse.xtext.common.Terminals
generate myDsl "http://www.xtext.org/example/mydsl2/MyDsl"
Model:
(trains+=Train)*
| (paths+=Path)*
| (sections+=Section)*
;
Train:
'Train' name=ID ':'
'Path' path=[Path]
'Speed' speed=INT
'end'
;
Path:
'Path' name=ID ':'
'Sections' ('{' sections+=[Section] (',' sections+=[Section] )+ '}' ) | sections+=[Section]
'end'
;
Section:
'Section' name=ID ':'
'Start' start=INT
'End' end=INT
('SpeedMax' speedMax=INT)?
'end'
;
FLOAT : '-'? INT ('.' INT)?;

Antlr grammar for parsing simple expression

I would like to parse the following expressions with ANTLR4:
termspannear ( xxx, xxx , 5 , true )
termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )
where termspannear functions can be nested.
Here is my grammar:
// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start : TERMSPAN ;
TERMSPAN : TERMSPANNEAR | 'xxx' ;
TERMSPANNEAR: 'termspannear' OPENP BODY CLOSEP ;
BODY : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;
COMMA : ',' ;
OPENP : '(' ;
CLOSEP : ')' ;
SLOP : [0-9]+ ;
ORDERED : 'true' | 'false' ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
After running:
antlr4 TermSpanNear.g4
javac TermSpanNear*.java
grun TermSpanNear start -gui
termspannear ( xxx, xxx , 5 , true )
^D
line 1:0 token recognition error at: 'termspannear '
line 1:13 extraneous input '(' expecting TERMSPAN
and the tree looks like:
Can someone help me with this grammar, so that the parse tree contains all params and nesting also works?
NOTE:
Following the suggestion in the answer below, I rewrote it to:
// Define a grammar to parse TermSpanNear
grammar TermSpanNear;
start : termspan EOF;
termspan : termspannear | 'xxx' ;
termspannear: 'termspannear' '(' body ')' ;
body : termspan ',' termspan ',' SLOP ',' ORDERED ;
SLOP : [0-9]+ ;
ORDERED : 'true' | 'false' ;
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
I think it works now.
I'm getting the following trees:
For
termspannear ( xxx, xxx , 5 , true )
For
termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )
You're using way too many lexer rules.
When you're defining a token like this:
BODY : TERMSPAN COMMA TERMSPAN COMMA SLOP COMMA ORDERED ;
then the tokenizer (lexer) will try to create the (single!) token xxx,xxx,5,true. That is, it does not allow any space in between. Lexer rules (the ones starting with a capital) should really be the "atoms" of your language (the smallest parts). Whenever you start creating elements like a body, you glue atoms together in parser rules, not in lexer rules.
Try something like this:
grammar TermSpanNear;
// parser rules (the elements)
start : termpsan EOF ;
termpsan : termpsannear | 'xxx' ;
termpsannear : 'termspannear' OPENP body CLOSEP ;
body : termpsan COMMA termpsan COMMA SLOP COMMA ORDERED ;
// lexer rules (the atoms)
COMMA : ',' ;
OPENP : '(' ;
CLOSEP : ')' ;
SLOP : [0-9]+ ;
ORDERED : 'true' | 'false' ;
WS : [ \t\r\n]+ -> skip ;
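If you also want to check the result programmatically rather than with grun, a minimal driver could look like this (assuming the classes ANTLR generates for grammar TermSpanNear; the TermSpanNearDemo name is just for illustration):
import org.antlr.v4.runtime.*;
import org.antlr.v4.runtime.tree.ParseTree;

public class TermSpanNearDemo {
    public static void main(String[] args) {
        // The nested example from the question.
        String input = "termspannear ( xxx, termspannear ( xxx, xxx , 5 , true ) , 5 , true )";
        TermSpanNearLexer lexer = new TermSpanNearLexer(CharStreams.fromString(input));
        TermSpanNearParser parser = new TermSpanNearParser(new CommonTokenStream(lexer));
        ParseTree tree = parser.start();               // 'start' is the entry rule above
        System.out.println(tree.toStringTree(parser)); // LISP-style tree; the nesting is visible here
    }
}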