ANTLR4 store all tokens in specific channel

I have a lexer which puts every token the parser is interested in into the default channel and all comment-tokens in channel 1.
The default channel is used to create the actual tree, while the comment channel is used to separate the tokens and to store all comments.
Look at this scribble:
In chapter 12.1 (p. 206-208) of The Definitive ANTLR4 Reference there is a comparable situation where comment tokens are shifted inside the token stream. The approach presented there is to read out the comment channel in an exit method of the listener.
In my opinion this is a rather rough option for my problem, because I don't want to burden my listener with those backward-looking lookups. Is there a method I can override that is called when tokens are put into the comment channel?

It looks like you misunderstand how channels work in ANTLR. When the lexer recognizes a token, it assigns the default channel (just a number) while initializing that token. That value only changes when the lexer finds a -> channel() command or when you change it explicitly in a lexer action. So there is nothing to do in a listener (or anywhere else) to filter out such tokens.
Later, when you want to get all tokens "in" a given channel (i.e. all tokens that have a specific channel number assigned), you can just iterate over all tokens returned by your token stream and compare the channel value. Alternatively, you can create a new CommonTokenStream instance and pass it the channel you are interested in. It will then only give you the tokens from that channel (it uses a token source, e.g. a lexer, to get the actual tokens and caches them).
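The "iterate and compare the channel value" idea can be sketched in plain Java. This is only an illustration: Tok is a hypothetical stand-in for an ANTLR token (it is not a runtime class), and channel 1 for comments is taken from the question.

```java
import java.util.ArrayList;
import java.util.List;

class ChannelFilterSketch {
    // Hypothetical stand-in for an ANTLR token: only the fields this sketch needs.
    static final class Tok {
        final String text;
        final int channel;

        Tok(String text, int channel) {
            this.text = text;
            this.channel = channel;
        }
    }

    static final int COMMENT_CHANNEL = 1; // assumption: comments are lexed with -> channel(1)

    // Walk every buffered token and keep those whose channel number matches.
    static List<Tok> tokensOnChannel(List<Tok> all, int channel) {
        List<Tok> result = new ArrayList<>();
        for (Tok t : all) {
            if (t.channel == channel) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Tok> all = List.of(
                new Tok("x", 0), new Tok("// note", 1),
                new Tok("=", 0), new Tok("1", 0), new Tok("/* done */", 1));
        for (Tok t : tokensOnChannel(all, COMMENT_CHANNEL)) {
            System.out.println(t.text);
        }
    }
}
```

With the real runtime, the same loop would presumably run over the list returned by CommonTokenStream.getTokens() after calling fill(), comparing Token.getChannel() instead of the stand-in field.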

I found out that there is an easy way to override how tokens are created. To do this, one can override a method of CommonTokenFactory and hand the factory to the lexer. At that point I can check the channel and push the matching tokens into a separate collection.
In my opinion this is a little bit hacky, but I do not need to iterate over the whole CommonTokenStream later on.
This code (in C#) only demonstrates the idea.
using System;
using System.Collections.Generic;
using System.IO;
using Antlr4.Runtime;
using Antlr4.Runtime.Tree;

internal class HeadAnalyzer
{
    #region Methods
    internal void AnalyzeHeader(Stream headerSourceStream)
    {
        var antlrFileStream = new AntlrInputStream(headerSourceStream);
        var mcrLexer = new MCRLexer(antlrFileStream);

        // Install the custom factory so every token the lexer creates passes through it.
        var commentSaverTokenFactory = new MyTokenFactory();
        mcrLexer.TokenFactory = commentSaverTokenFactory;

        var commonTokenStream = new CommonTokenStream(mcrLexer);
        var mcrParser = new MCRParser(commonTokenStream);
        mcrParser.AddErrorListener(new DefaultErrorListener());

        MCRParser.ProgramContext tree;
        try
        {
            tree = mcrParser.program(); // create the tree
        }
        catch (SyntaxErrorException syntaxErrorException)
        {
            throw new NotImplementedException();
        }

        var headerContext = new HeaderContext();
        var headListener = new HeadListener(headerContext);
        ParseTreeWalker.Default.Walk(headListener, tree);

        var comments = commentSaverTokenFactory.CommentTokens; // contains all comments :)
    }
    #endregion
}

internal class MyTokenFactory : CommonTokenFactory
{
    // Filled as a side effect of lexing; holds every token created on channel 1.
    internal readonly List<CommonToken> CommentTokens = new List<CommonToken>();

    public override CommonToken Create(Tuple<ITokenSource, ICharStream> source, int type, string text, int channel, int start, int stop, int line, int charPositionInLine)
    {
        var token = base.Create(source, type, text, channel, start, stop, line, charPositionInLine);
        if (token.Channel == 1)
        {
            CommentTokens.Add(token);
        }
        return token;
    }
}
Maybe there are better approaches, but for my use case it works as expected.

Related

Retrieving Guild Version of Channel/User from DiscordSocketClient

I need SocketGuildUser and SocketGuildChannel when a message arrives. I don't see a straightforward way of getting these without downcasting.
private void downCast(SocketMessage msg)
{
    SocketUser user = msg.Author;
    ISocketMessageChannel channel = msg.Channel;
    var gUser = (SocketGuildUser)user;
    var Channel = (SocketGuildChannel)channel;
}
Not sure under what circumstances the downcast will fail...
This all leaves me really scratching my head. Why is there no link between a SocketMessage and a SocketGuild? There isn't even a field for the guild's ID to enable a subsequent call to DiscordSocketClient.GetGuild(ulong id).

What happens when flux is returned from spring web controller?

I am comparatively new to reactive APIs and was curious about what was happening behind the scenes when we return a Flux from a web controller.
According to spring-web documentation
Reactive return values are handled as follows:
A single-value promise is adapted to, similar to using DeferredResult. Examples include Mono (Reactor) or Single (RxJava).
A multi-value stream with a streaming media type (such as application/stream+json or text/event-stream) is adapted to, similar to using ResponseBodyEmitter or SseEmitter. Examples include Flux (Reactor) or Observable (RxJava). Applications can also return Flux<ServerSentEvent> or Observable<ServerSentEvent>.
A multi-value stream with any other media type (such as application/json) is adapted to, similar to using DeferredResult<List<?>>.
I created two APIs as below:
@GetMapping("/async-deferredresult")
public DeferredResult<List<String>> handleReqDefResult(Model model) {
    LOGGER.info("Received async-deferredresult request");
    DeferredResult<List<String>> output = new DeferredResult<>();
    // Build the result on a worker thread so the servlet thread is released immediately.
    ForkJoinPool.commonPool().submit(() -> {
        LOGGER.info("Processing in separate thread");
        List<String> list = new ArrayList<>();
        for (int i = 0; i < 10000; i++) {
            list.add(String.valueOf(i));
        }
        output.setResult(list);
    });
    LOGGER.info("servlet thread freed");
    return output;
}

@GetMapping(value = "/async-flux", produces = MediaType.APPLICATION_JSON_VALUE)
public Flux<String> handleReqDefResult1(Model model) {
    LOGGER.info("Received async-flux request");
    List<String> list = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        list.add(String.valueOf(i));
    }
    return Flux.fromIterable(list);
}
So the expectation was that both APIs should behave the same, since a multi-value stream (Flux) should behave like returning a DeferredResult. But for the API returning the DeferredResult, the whole list was printed in one go in the browser, whereas for the API returning the Flux the numbers were printed sequentially (one by one).
What exactly is happening when I am returning Flux from controller ?
When we return a Flux from a service endpoint, many things can happen. But I assume you want to know what happens when the Flux is observed as a stream of events by a client of this endpoint.
Scenario One: By adding 'application/json' as the content type of the endpoint, Spring tells the client to expect a JSON body.
@GetMapping(value = "/async-flux", produces = MediaType.APPLICATION_JSON_VALUE)
public Flux<String> handleReqDefResult1(Model model) {
    List<String> list = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        list.add(String.valueOf(i));
    }
    return Flux.fromIterable(list);
}
The output at the client will be the whole set of numbers in one go, and once the response is delivered the connection will be closed. Even though you have used Flux as the response type, you are still bound by the laws of how HTTP over TCP/IP works. The endpoint gets an HTTP request, executes the logic and responds with an HTTP response containing the final result.
As a result, you do not see the real value of a reactive API.
Scenario Two: By adding 'application/stream+json' as the content type of the endpoint, Spring starts to treat the events of the Flux stream as individual JSON items. When an item is emitted it gets serialised, the HTTP response buffer is flushed, and the connection from the server to the client is kept open until the event sequence completes.
To get that working we can slightly modify your original code as follows.
@GetMapping(value = "/async-flux", produces = MediaType.APPLICATION_STREAM_JSON_VALUE)
public Flux<String> handleReqDefResult1(Model model) {
    List<String> list = new ArrayList<>();
    for (int i = 0; i < 10000; i++) {
        list.add(String.valueOf(i));
    }
    return Flux.fromIterable(list)
            // 1 second delay per element to demonstrate the difference in behaviour
            .delayElements(Duration.ofSeconds(1));
}
This time we can see the real value of a reactive API endpoint: it is able to deliver results to its client as the data becomes available.
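To make the difference between the two scenarios concrete, here is a rough plain-Java sketch of the two response bodies. It is not Spring code: the class and method names are made up for illustration, and the inputs are assumed to be values that are already serialized as JSON (how Spring encodes String elements depends on the configured codecs or converters).

```java
import java.util.List;
import java.util.stream.Collectors;

class WireFormatSketch {
    // Scenario One (application/json): all items are gathered into one JSON array,
    // so the client needs the complete document before it can parse it.
    static String asJsonArray(List<String> jsonValues) {
        return jsonValues.stream().collect(Collectors.joining(",", "[", "]"));
    }

    // Scenario Two (application/stream+json): every item is its own JSON document,
    // terminated by a newline, so each one can be flushed as soon as it is emitted.
    static String asJsonLines(List<String> jsonValues) {
        return jsonValues.stream().map(v -> v + "\n").collect(Collectors.joining());
    }

    public static void main(String[] args) {
        List<String> values = List.of("0", "1", "2");
        System.out.println(asJsonArray(values));
        System.out.print(asJsonLines(values));
    }
}
```

A client reading the second format can parse and display each line on arrival, which is why the one-second delay becomes visible there.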
You can find more details about how to build reactive REST APIs at
https://medium.com/@senanayake.kalpa/building-reactive-rest-apis-in-java-part-1-cd2c34af55c6
https://medium.com/@senanayake.kalpa/building-reactive-rest-apis-in-java-part-2-bd270d4cdf3f

Retrieve token list from ParserRuleContext

Main question
Is there an easy way to get a list of tokens (ideally in a form of a TokenStream) from the parser rule class ParserRuleContext?
Related answers
In an answer for a question Traversal of tokens using ParserRuleContext in listener - ANTLR4 this solution appeared:
ParserRuleContext pctx = ctx.getParent();
List<TerminalNode> nodes = pctx.getTokens(pctx.getStart(), pctx.getStop());
But there is no method with signature ParserRuleContext::getTokens(Token, Token) in ANTLRv4.
My solution
I thought about retrieving the list of tokens from the TokenStream by using the TokenStream::get(index: int) method, with index running over the range of indices between the start and stop tokens of the given ParserRuleContext.
Additional question
Is there a way to get a subset of tokens from TokenStream in a form of another TokenStream?
So, I had overlooked some classes and their interfaces in the ANTLRv4 API.
Answer to main question
The solution proposed above is correct. In addition, the BufferedTokenStream and CommonTokenStream classes have a method public List<Token> getTokens(int start, int stop) which retrieves the list of tokens in a given index range (in particular, the range between the start token and the stop token of a ParserRuleContext).
Answer to additional question
You can use the ListTokenSource class, which implements the TokenSource interface. You can then create a CommonTokenStream by passing it the ListTokenSource.
Code example of ParserRuleRewriter
I encapsulated the ideas above in a small code example featuring ParserRuleRewriter, a TokenStreamRewriter wrapper that rewrites only a given parser rule. In the code, the tokenStream parameter is the token stream of the full program.
import org.antlr.v4.runtime.*;
import java.util.List;

public class ParserRuleRewriter {
    private TokenStreamRewriter rewriter;

    public ParserRuleRewriter(ParserRuleContext parserRule, CommonTokenStream tokenStream) {
        Token start = parserRule.getStart();
        Token stop = parserRule.getStop();
        // Tokens covered by the rule, taken from the full program's stream.
        List<Token> ruleTokens = tokenStream.getTokens(start.getTokenIndex(), stop.getTokenIndex());
        // Wrap them in a token source so they can back a stream of their own.
        ListTokenSource tokenSource = new ListTokenSource(ruleTokens);
        CommonTokenStream commonTokenStream = new CommonTokenStream(tokenSource);
        commonTokenStream.fill();
        rewriter = new TokenStreamRewriter(commonTokenStream);
    }

    public void replace(Token token, ParserRuleContext rule) {
        rewriter.replace(token, rule.getText());
    }

    @Override
    public String toString() {
        return rewriter.getText();
    }
}

Google diff-match-patch : How to unpatch to get Original String?

I am using the Java version of Google's diff-match-patch library to create a patch between two JSON strings and store the patch in a database.
diff_match_patch dmp = new diff_match_patch();
LinkedList<Patch> diffs = dmp.patch_make(latestString, originalString);
String patch = dmp.patch_toText(diffs); // Store patch to DB
Now, is there any way to use this patch to re-create the originalString from the latestString?
I googled this and found a very old comment on the Google diff-match-patch wiki saying,
Unpatching can be done by just looping through the diff, swapping
DIFF_INSERT with DIFF_DELETE, then applying the patch.
But I did not find any useful code that demonstrates this. How could I achieve this with my existing code? Any pointers or code references would be appreciated.
Edit:
The problem I am facing is this: in the front end I show a revisions module that lists all the transactions of a particular fragment (for example an employee's details), such as which user updated which details. I recreate the fragment JSON by reverse-applying each patch to get the data for the current transaction and show it as a table (using http://marianoguerra.github.io/json.human.js/). But some of the resulting JSON data is not valid JSON and I get a JSON.parse error.
I was looking to do something similar (in C#), and what works for me with a relatively simple object is the patch_apply method. This use case seems to be somewhat missing from the documentation, so I'm answering here. The code is C#, but the API is the same across languages:
static void Main(string[] args)
{
    var dmp = new diff_match_patch();
    string v1 = "My Json Object";
    string v2 = "My Mutated Json Object";
    // Patch that turns v2 back into v1
    var v2ToV1Patch = dmp.patch_make(v2, v1);
    var v2ToV1PatchText = dmp.patch_toText(v2ToV1Patch); // Persist text to db
    string v3 = "Latest version of JSON object";
    // Patch that turns v3 back into v2
    var v3ToV2Patch = dmp.patch_make(v3, v2);
    var v3ToV2PatchTxt = dmp.patch_toText(v3ToV2Patch); // Persist text to db

    // Time to re-hydrate the objects
    var altV3ToV2Patch = dmp.patch_fromText(v3ToV2PatchTxt);
    var altV2 = dmp.patch_apply(altV3ToV2Patch, v3)[0].ToString(); // .get(0) in Java I think
    var altV2ToV1Patch = dmp.patch_fromText(v2ToV1PatchText);
    var altV1 = dmp.patch_apply(altV2ToV1Patch, altV2)[0].ToString();
}
I am attempting to retrofit this as an audit log, where previously the entire JSON object was saved. As the audited objects have become more complex the storage requirements have increased dramatically. I haven't yet applied this to the complex large objects, but it is possible to check if the patch was successful by checking the second object in the array returned by the patch_apply method. This is an array of boolean values, all of which should be true if the patch worked correctly. You could write some code to check this, which would help check if the object can be successfully re-hydrated from the JSON rather than just getting a parsing error. My prototype C# method looks like this:
private static bool ValidatePatch(object[] patchResult, out string patchedString)
{
    patchedString = patchResult[0] as string;
    var successArray = patchResult[1] as bool[];
    foreach (var b in successArray)
    {
        if (!b)
            return false;
    }
    return true;
}
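The wiki comment quoted in the question (swap DIFF_INSERT with DIFF_DELETE) can also be sketched in plain Java. The Op and Diff types below are hypothetical stand-ins that only mirror the shape of a diff entry (an operation plus a piece of text); they are not the library's classes, and the sketch works on a diff list, not on the stored patch text.

```java
import java.util.ArrayList;
import java.util.List;

class DiffInversionSketch {
    enum Op { DELETE, INSERT, EQUAL }

    // Hypothetical stand-in for one diff entry: an operation and the text it covers.
    static final class Diff {
        final Op op;
        final String text;

        Diff(Op op, String text) {
            this.op = op;
            this.text = text;
        }
    }

    // Swap INSERT and DELETE; EQUAL entries are kept unchanged.
    static List<Diff> invert(List<Diff> diffs) {
        List<Diff> inverted = new ArrayList<>();
        for (Diff d : diffs) {
            Op op = d.op == Op.INSERT ? Op.DELETE
                  : d.op == Op.DELETE ? Op.INSERT
                  : Op.EQUAL;
            inverted.add(new Diff(op, d.text));
        }
        return inverted;
    }

    // Text the diff starts from: everything except the insertions.
    static String sourceText(List<Diff> diffs) {
        StringBuilder sb = new StringBuilder();
        for (Diff d : diffs) {
            if (d.op != Op.INSERT) sb.append(d.text);
        }
        return sb.toString();
    }

    // Text the diff produces: everything except the deletions.
    static String targetText(List<Diff> diffs) {
        StringBuilder sb = new StringBuilder();
        for (Diff d : diffs) {
            if (d.op != Op.DELETE) sb.append(d.text);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<Diff> forward = List.of(
                new Diff(Op.EQUAL, "My "),
                new Diff(Op.INSERT, "Mutated "),
                new Diff(Op.EQUAL, "Json Object"));
        List<Diff> backward = invert(forward);
        System.out.println(sourceText(forward) + " -> " + targetText(forward));
        System.out.println(sourceText(backward) + " -> " + targetText(backward));
    }
}
```

The inverted list describes the edit in the opposite direction, which is the property the wiki comment relies on. With the real library, the inverted diffs would still have to be turned into a patch and applied, and the success flags checked as described above.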

ANTLR forward references

I need to create a grammar for a language with forward references. I think that the easiest way to achieve this is to make several passes on the generated AST, but I need a way to store symbol information in the tree.
Right now my parser correctly generates an AST and computes scopes of the variables and function definitions. The problem is, I don't know how to save the scope information into the tree.
Fragment of my grammar:
composite_instruction
scope JScope;
@init {
    $JScope::symbols = new ArrayList();
    $JScope::name = "level " + $JScope.size();
}
@after {
    System.out.println("code block scope " + $JScope::name + " = " + $JScope::symbols);
}
    : '{' instruction* '}' -> ^(INSTRUCTION_LIST instruction*)
    ;
I would like to put a reference to current scope into a tree, something like:
: '{' instruction* '}' -> ^(INSTRUCTION_LIST instruction* {$JScope::symbols})
Is it even possible? Is there any other way to store current scopes in a generated tree? I can generate the scope info in a tree grammar, but it won't change anything, because I still have to store it somewhere for the second pass on the tree.
To my knowledge, the syntax for rewrite rules doesn't allow directly assigning values the way your tentative snippet suggests. This is partly because the parser wouldn't really know which part of the tree or node the values should be attached to.
However, one of the cool features of ANTLR-produced ASTs is that the parser makes no assumptions about the type of the nodes. One just needs to implement a TreeAdaptor, which serves as a factory for new nodes and as a navigator of the tree structure. One can therefore stuff whatever info may be needed into the nodes, as explained below.
ANTLR provides a default tree node implementation, CommonTree, and in most cases (as in the situation at hand) we merely need to
subclass CommonTree by adding some custom fields to it
subclass the CommonTreeAdaptor to override its create() method, i.e. the way it produces new nodes.
One could also create a novel type of node altogether, for some odd graph structure or whatnot. For the case at hand, the following should be sufficient (adapt it for the specific target language if this isn't Java).
import java.util.ArrayList;
import org.antlr.runtime.tree.*;
import org.antlr.runtime.Token;

public class NodeWithScope extends CommonTree {
    /* Just declare the extra fields for the node */
    public ArrayList symbols;
    public String name;
    public Object whatever_else;

    public NodeWithScope(Token t) {
        super(t);
    }
}

/* TreeAdaptor: we just need to override the create method */
class NodeWithScopeAdaptor extends CommonTreeAdaptor {
    @Override
    public Object create(Token standardPayload) {
        return new NodeWithScope(standardPayload);
    }
}
One then needs to slightly modify the way the parsing process is started, so that ANTLR (or rather the ANTLR-produced parser) knows to use the NodeWithScopeAdaptor rather than the default CommonTreeAdaptor.
(Step 4.1 below; the rest is a rather standard ANTLR test rig.)
// ***** Typical ANTLR pipe rig *****
// ** 1. input stream (my_input_file is expected to be an InputStream)
ANTLRInputStream input = new ANTLRInputStream(my_input_file);
// ** 2. Lexer
MyGrammarLexer lexer = new MyGrammarLexer(input);
// ** 3. token stream produced by lexer
CommonTokenStream tokens = new CommonTokenStream(lexer);
// ** 4. Parser
MyGrammarParser parser = new MyGrammarParser(tokens);
// 4.1 !!! Specify the TreeAdaptor
NodeWithScopeAdaptor adaptor = new NodeWithScopeAdaptor();
parser.setTreeAdaptor(adaptor); // use my adaptor
// ** 5. Start process by invoking the root rule
MyGrammarParser.MyTopRule_return r = parser.MyTopRule();
// ** 6. AST tree
NodeWithScope t = (NodeWithScope) r.getTree();
// ** 7. etc. parse the tree or do whatever is needed on it.
Finally, your grammar would have to be adapted with something akin to what follows
(note that the node [for the current rule] is only available in the @after section. It may however reference any token attribute and other contextual variable from the grammar level, using the usual $rule.attribute notation)
composite_instruction
scope JScope;
@init {
    $JScope::symbols = new ArrayList();
    $JScope::name = "level " + $JScope.size();
}
@after {
    // The cast is needed unless the grammar sets ASTLabelType to NodeWithScope.
    ((NodeWithScope) $composite_instruction.tree).symbols = $JScope::symbols;
    ((NodeWithScope) $composite_instruction.tree).name = $JScope::name;
    ((NodeWithScope) $composite_instruction.tree).whatever_else
        = new myFancyObject($x.text, $y.line, whatever, blah);
}
    : '{' instruction* '}' -> ^(INSTRUCTION_LIST instruction*)
    ;