Is there a way to get back source code from antlr4ts parse tree after modifications ctx.removeLastChild/ctx.addChild? [duplicate]

I want to keep whitespace when I call the text attribute of a token. Is there any way to do it?
Here is the situation: we have the following code
IF L > 40 THEN;
ELSE
IF A = 20 THEN
PUT "HELLO";
In this case, I want to transform it into:
if (!(L>40)){
    if (A=20)
        put "hello";
}
The ANTLR rule is:
stmt_if_block: IF expr
    THEN x=stmt
    (ELSE y=stmt)?
    {
        if ($x.text.equalsIgnoreCase(";"))
        {
            WriteLn("if(!(" + $expr.text + ")){");
            WriteLn($y.text);
            WriteLn("}");
        }
    }
    ;
But the result looks like:
if(!(L>40))
{
ifA=20put"hello";
}
The reason is that the whitespace in the statement was removed. I was wondering if there is any way to keep this whitespace.
Thank you so much!
Update: If I add
SPACE: [ ] -> channel(HIDDEN);
then the spaces are preserved, and the result looks like below, with many spaces between tokens:
IF SUBSTR(WNAME3,M-1,1) = ')' THEN M = L; ELSE M = L - 1;

This is the C# extension method I use for exactly this purpose:
public static string GetFullText(this ParserRuleContext context)
{
    if (context.Start == null || context.Stop == null
            || context.Start.StartIndex < 0 || context.Stop.StopIndex < 0)
        return context.GetText(); // Fallback
    return context.Start.InputStream.GetText(Interval.Of(context.Start.StartIndex, context.Stop.StopIndex));
}
Since you're using Java, you'll have to translate it, but it should be straightforward - the API is the same.
Explanation: Get the first token, get the last token, and get the text from the input stream between the first char of the first token and the last char of the last token.

@Lucas's solution, but in Java, in case you have trouble translating it:
private String getFullText(ParserRuleContext context) {
    if (context.start == null || context.stop == null
            || context.start.getStartIndex() < 0 || context.stop.getStopIndex() < 0)
        return context.getText();
    return context.start.getInputStream().getText(
            Interval.of(context.start.getStartIndex(), context.stop.getStopIndex()));
}
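And since the question title mentions antlr4ts, here is the same method as a TypeScript sketch. I'm assuming the antlr4ts API mirrors the Java one (start/stop tokens, startIndex/stopIndex, Interval.of); check your version's import paths:

import { ParserRuleContext } from 'antlr4ts';
import { Interval } from 'antlr4ts/misc/Interval';

function getFullText(ctx: ParserRuleContext): string {
    // fall back to the whitespace-stripped text when the context
    // has no usable token boundaries
    if (!ctx.start || !ctx.stop || !ctx.start.inputStream
            || ctx.start.startIndex < 0 || ctx.stop.stopIndex < 0) {
        return ctx.text;
    }
    // read the raw characters, whitespace included, straight from the input stream
    return ctx.start.inputStream.getText(
        Interval.of(ctx.start.startIndex, ctx.stop.stopIndex));
}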

It looks like the InputStream is not always updated after removeLastChild/addChild operations. This solution helped me for one grammar, but it doesn't work for another: it works for this grammar, but not for the modern Groovy grammar (for some reason inputStream.getText contains the old text).
I am trying to implement function name replacement like this:
enterPostfixExpression(ctx: PostfixExpressionContext) {
    // Get identifierContext from ctx
    ...
    const token = CommonTokenFactory.DEFAULT.createSimple(GroovyParser.Identifier, 'someNewFnName');
    const node = new TerminalNode(token);
    identifierContext.removeLastChild();
    identifierContext.addChild(node);
}
Update: I used the visitor pattern for the first implementation.
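If the input stream and the modified tree get out of sync like this, one alternative worth considering is to not mutate the tree at all and use ANTLR's TokenStreamRewriter instead: you record replacements against the token stream and render the edited text at the end, so every token you didn't touch, including hidden-channel whitespace, comes back unchanged. A minimal sketch, assuming antlr4ts's rewriter mirrors the Java API, with GroovyLexer/GroovyParser/compilationUnit standing in for your generated names:

import { CharStreams, CommonTokenStream } from 'antlr4ts';
import { TokenStreamRewriter } from 'antlr4ts/TokenStreamRewriter';

const chars = CharStreams.fromString(source); // source: the original code
const tokens = new CommonTokenStream(new GroovyLexer(chars));
const parser = new GroovyParser(tokens);
const tree = parser.compilationUnit(); // entry rule depends on the grammar

const rewriter = new TokenStreamRewriter(tokens);
// inside enterPostfixExpression (or a visitor), replace only the
// identifier's token range; the stream itself is never modified
rewriter.replace(identifierContext.start, identifierContext.stop, 'someNewFnName');

console.log(rewriter.getText()); // original text with just that edit applied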


Split by delimiter which is contained in a record

I have a column which I am splitting in Snowflake.
The format is as follows:
Car > Bike,Bike > Scooter,Scooter > Sprinting, Jogging and Walking,Walking > Flying
I have been using split_to_table(A, ',') inside of my query, but as you can probably tell, this incorrectly also splits the 'Scooter > Sprinting, Jogging and Walking' record.
Perhaps the delimiter should only apply if there is no space on either side of it? I cannot see a different condition that could work.
I have been researching online but haven't found a suitable workaround yet. Has anyone encountered a similar problem in the past?
Thanks
This needs a custom rule for the split, so we can use a UDTF to apply it:
create or replace function split_to_table2(STR string, DELIM string, ROW_MUST_CONTAIN string)
returns table (VALUE string)
language javascript
strict immutable
as
$$
{
    initialize: function (argumentInfo, context) {
    },
    processRow: function (row, rowWriter, context) {
        var buffer = "";
        var i;
        const s = row.STR.split(row.DELIM);
        for (i = 0; i < s.length - 1; i++) {
            buffer += s[i];
            // only flush the buffer when the next piece starts a new row
            if (s[i + 1].includes(row.ROW_MUST_CONTAIN)) {
                rowWriter.writeRow({VALUE: buffer});
                buffer = "";
            } else {
                buffer += row.DELIM;
            }
        }
        // emit the final piece, together with anything still buffered
        rowWriter.writeRow({VALUE: buffer + s[i]});
    },
}
$$;
select VALUE from
table(split_to_table2('Car > Bike,Bike > Scooter,Scooter > Sprinting, Jogging and Walking,Walking > Flying', ',', '>'))
;
Output:
VALUE
Car > Bike
Bike > Scooter
Scooter > Sprinting, Jogging and Walking
Walking > Flying
This UDTF takes one more parameter than the two in the built-in table function split_to_table. The third parameter, ROW_MUST_CONTAIN, is the string a row must contain. It splits the string on DELIM, but if a piece does not contain the ROW_MUST_CONTAIN string, it concatenates the pieces to form a complete row. In this case we just specify ',' for the delimiter and '>' for ROW_MUST_CONTAIN.
We can get a little clever with regexp_replace by replacing the actual delimiters with something else before the table split. I am using double pipes '||', but you can change that to something else. The '\|\|\\1' trick is called back-referencing; it allows us to include the captured group (\\1) as part of the replacement (\|\|).
set str='car>bike,bike>car,truck, and jeep,horse>cat,truck>car,truck, and jeep';
select $str, *
from table(split_to_table(regexp_replace($str,',([^>,]+>)','\|\|\\1'),'||'))
Yes, you are right. The only pattern I can see is the one with the whitespace after the comma.
It's a small workaround, but we can make use of this pattern. In the code below I replace the commas that are followed by whitespace, then apply the split_to_table function, and finally convert the replacement back.
It's not super pretty and would break if your string contains "my_replacement" or some other new pattern, but it's working for me:
select replace(t.value, 'my_replacement', ', ')
from table(
    split_to_table(replace('Car > Bike,Bike > Scooter,Scooter > Sprinting, Jogging and Walking,Walking > Flying', ', ', 'my_replacement'), ',')) t

ZK : Using a custom Id generator for selenium while having dynamic ids

I'm using a custom id generator, the one listed in the article here.
It works fine; however, I have dynamic ids that I generate for my tabs:
<tabpanels children="#load(vm.myTabsList) #template('myTemplate')">
    <template name="myTemplate" var="each">
        <tabpanel id="tabPanel${each.key}">
            <include src="zul/myTabZul.zul"/>
        </tabpanel>
    </template>
</tabpanels>
I'm getting this exception:
org.zkoss.zk.ui.UiException: Illegal character, }, not allowed in uuid,
The method that throws the error doesn't leave much room, actually... Not sure how I can bypass that.
public static void checkUuid(String uuid) {
    int j;
    if (uuid == null || (j = uuid.length()) == 0)
        throw new UiException("uuid cannot be null or empty");
    while (--j >= 0) {
        final char cc = uuid.charAt(j);
        if ((cc < 'a' || cc > 'z') && (cc < 'A' || cc > 'Z')
                && (cc < '0' || cc > '9') && cc != '_')
            throw new UiException("Illegal character, " + cc + ", not allowed in uuid, " + uuid);
    }
}
The original generator seems to do just fine, but it probably doesn't generate the id at the same time.
If someone has experience with Selenium and ZK, thanks for your input.
I don't think it's Selenium-related.
The final '}' is considered part of your id string because 'tabPanel${each.key}' is not a valid EL expression. Have a look here on how to write valid EL expressions.
I would advise you to concatenate the two parts (tabPanel and ${each.key}) using the core's cat method. Your code would become:
<!-- at the beginning of your file -->
<?taglib uri="http://www.zkoss.org/dsp/web/core" prefix="c"?>
...
<tabpanel id="${c:cat('tabPanel',each.key)}">
'each.key' will be properly interpreted thanks to the surrounding ${.} (it does not have to be in direct contact).
I hope this is not too late and it still helps.
I think it's because you try to use a dynamic id, but it doesn't accept it.
tabPanel${each.key} contains the '}' spoken of (and note that they check from right to left, so that's the first one they come across).
With a little more of the zul I could advise how you could change it.

websql use select in to get rows from an array

In WebSQL we can request a certain row like this:
tx.executeSql('SELECT * FROM tblSettings where id = ?', [id], function(tx, rs){
    // do stuff with the resultset.
},
function errorHandler(tx, e){
    // do something upon error.
    console.warn('SQL Error: ', e);
});
However, I know regular SQL and figured I should be able to request:
var arr = [1, 2, 3];
tx.executeSql('SELECT * FROM tblSettings where id in (?)', [arr], function(tx, rs){
    // do stuff with the resultset.
},
function errorHandler(tx, e){
    // do something upon error.
    console.warn('SQL Error: ', e);
});
But that gives no results; the result is always empty. If I changed [arr] to arr, then the SQL would get a variable number of parameters, so I figured it should be [arr]. Otherwise it would require us to add a dynamic number of question marks (as many as there are ids in the array).
So, can anyone see what I'm doing wrong?
Apparently, there is no other solution than to manually add a question mark for every item in your array.
This is actually in the spec on w3.org:
var q = "";
for each (var i in labels)
q += (q == "" ? "" : ", ") + "?";
// later to be used as such:
t.executeSql('SELECT id FROM docs WHERE label IN (' + q + ')', labels, function (t, d) {
// do stuff with result...
});
More info here: http://www.w3.org/TR/webdatabase/#introduction (at the end of the introduction).
However, for the moment I created a helper function that builds such a string for me. It might be better than the above, it might not; I haven't done any performance testing.
This is what I use now:
var createParamString = function(arr){
    return _(arr).map(function(){ return "?"; }).join(',');
};
// when called like this:
createParamString([1,2,3,4,5]); // >> returns ?,?,?,?,?
This, however, makes use of the underscore.js library we have in our project.
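If you'd rather not pull in underscore.js just for this, plain JavaScript's built-in Array.prototype.map does the same job (same hypothetical helper name as above):

var createParamString = function(arr){
    // one "?" per array element, comma-separated
    return arr.map(function(){ return "?"; }).join(',');
};
createParamString([1,2,3,4,5]); // >> returns ?,?,?,?,?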
Good answer. It was interesting to read the explanation in the official documentation.
I see this question was answered in 2012. I tried it in Google Chrome 37 exactly as recommended, and this is what I got (screenshots of the input data and of the resulting Chrome error are not reproduced here).
Chrome complains: it accepts only as many question marks as there are input parameters. (Note that although an array is passed, it is treated as a single parameter.)
Eventually I came up with this solution:
var activeItemIds = [1, 2, 3];
var q = "";
for (var i = 0; i < activeItemIds.length; i++) {
    q += '"' + activeItemIds[i] + '", ';
}
q = q.substring(0, q.length - 2); // drop the trailing ", "
var query = 'SELECT "id" FROM "products" WHERE "id" IN (' + q + ')';
_db.transaction(function (tx) {
    tx.executeSql(query, [], function (tx, results1) {
        console.log(results1);
        debugger;
    }, function (a, b) {
        console.warn(a);
        console.warn(b);
    });
});

Lucene: how to preserve whitespaces etc when tokenizing stream?

I am trying to perform a "translation" of sorts of a stream of text. More specifically, I need to tokenize the input stream, look up every term in a specialized dictionary, and output the corresponding "translation" of the token. However, I also want to preserve all the original whitespace, stopwords, etc. from the input, so that the output is formatted in the same way as the input instead of ending up as a bare stream of translations. So if my input is
Term1: Term2 Stopword! Term3
Term4
then I want the output to look like
Term1': Term2' Stopword! Term3'
Term4'
(where Termi' is the translation of Termi) instead of simply
Term1' Term2' Term3' Term4'
Currently I am doing the following:
PatternAnalyzer pa = new PatternAnalyzer(Version.LUCENE_31,
        PatternAnalyzer.WHITESPACE_PATTERN,
        false,
        WordlistLoader.getWordSet(new File(stopWordFilePath)));
TokenStream ts = pa.tokenStream(null, in);
CharTermAttribute charTermAttribute = ts.getAttribute(CharTermAttribute.class);
while (ts.incrementToken()) { // loop over tokens
    String termIn = charTermAttribute.toString();
    ...
}
but this, of course, loses all the whitespace, etc. How can I modify this to be able to re-insert it into the output? Thanks much!
============ UPDATE!
I tried splitting the original stream into "words" and "non-words". It seems to work fine. Not sure whether it's the most efficient way, though:
public ArrayList<Token> splitToWords(String sIn) {
    if (sIn == null || sIn.length() == 0) {
        return null;
    }
    char[] c = sIn.toCharArray();
    ArrayList<Token> list = new ArrayList<Token>();
    int tokenStart = 0;
    boolean curIsLetter = Character.isLetter(c[tokenStart]);
    for (int pos = tokenStart + 1; pos < c.length; pos++) {
        boolean newIsLetter = Character.isLetter(c[pos]);
        if (newIsLetter == curIsLetter) {
            continue;
        }
        TokenType type = TokenType.NONWORD;
        if (curIsLetter == true) {
            type = TokenType.WORD;
        }
        list.add(new Token(new String(c, tokenStart, pos - tokenStart), type));
        tokenStart = pos;
        curIsLetter = newIsLetter;
    }
    TokenType type = TokenType.NONWORD;
    if (curIsLetter == true) {
        type = TokenType.WORD;
    }
    list.add(new Token(new String(c, tokenStart, c.length - tokenStart), type));
    return list;
}
Well, it doesn't really lose the whitespace; you still have your original text. :)
So I think you should make use of OffsetAttribute, which contains the startOffset() and endOffset() of each term in your original text. This is what Lucene uses, for example, to highlight snippets of search results from the original text.
I wrote up a quick test (uses EnglishAnalyzer) to demonstrate:
The input is:
Just a test of some ideas. Let's see if it works.
The output is:
just a test of some idea. let see if it work.
// just for example purposes, not necessarily the most performant.
public void testString() throws Exception {
    String input = "Just a test of some ideas. Let's see if it works.";
    EnglishAnalyzer analyzer = new EnglishAnalyzer(Version.LUCENE_35);
    StringBuilder output = new StringBuilder(input);
    // in some cases, the analyzer will make terms longer or shorter.
    // because of this we must track how much we have adjusted the text so far
    // so that the offsets returned will still work for us via replace()
    int delta = 0;
    TokenStream ts = analyzer.tokenStream("bogus", new StringReader(input));
    CharTermAttribute termAtt = ts.addAttribute(CharTermAttribute.class);
    OffsetAttribute offsetAtt = ts.addAttribute(OffsetAttribute.class);
    ts.reset();
    while (ts.incrementToken()) {
        String term = termAtt.toString();
        int start = offsetAtt.startOffset();
        int end = offsetAtt.endOffset();
        output.replace(delta + start, delta + end, term);
        delta += (term.length() - (end - start));
    }
    ts.close();
    System.out.println(output.toString());
}

Lex Yacc, should i tokenize character literals?

I know, poorly worded question; not sure how else to ask it, though.
I always seem to end up in the error branch regardless of what I'm entering, and I can't figure out where I'm screwing this up. I'm using a particular flavor of Lex/YACC called GPPG, which sets all of this up for use with C#.
Here is my yacc grammar:
method : L_METHOD L_VALUE ')' { System.Diagnostics.Debug.WriteLine("Found a method: Name:" + $1.Data); }
       | error { System.Diagnostics.Debug.WriteLine("Not valid in this statement context"); /* throw new exception */ }
       ;
Here's my lex:
\'[^']*\' {this.yylval.Data = yytext.Replace("'",""); return (int)Tokens.L_VALUE;}
[a-zA-Z0-9]+\( {this.yylval.Data = yytext; return (int)Tokens.L_METHOD;}
The idea is that I should be able to pass
Method('value')
to it and have it properly recognize that this is correct syntax. Ultimately, the plan is to execute the method, passing the various parameters as values.
I've also tried several derivations, for example:
method : L_METHOD '(' L_VALUE ')' { System.Diagnostics.Debug.WriteLine("Found a method: Name:" + $1.Data); }
       | error { System.Diagnostics.Debug.WriteLine("Not valid in this statement context: "); /* throw new exception */ }
       ;

\'[^']*\'    {this.yylval.Data = yytext.Replace("'",""); return (int)Tokens.L_VALUE;}
[a-zA-Z0-9]+ {this.yylval.Data = yytext; return (int)Tokens.L_METHOD;}
You need a lex rule to return the punctuation tokens 'as-is' so that the yacc grammar can recognize them. Something like:
[()] { return *yytext; }
added to your second example should do the trick.