How to discard the rest of line after syntax error - yacc

I'm implementing a small shell and using lex and yacc to parse commands. Lex reads the command from stdin, and yacc executes the command after yyparse.
The problem is that when there is a syntax error, yacc reports an error and starts parsing again from the beginning. In this case, cmd1 >>> cmd2 leads to running cmd2, because >>> is a syntax error.
My question is: how do I discard the rest of the current command after encountering a syntax error?

If you want to write an interactive language with a prompt that lets users enter expressions, it's a bad idea to simply use yacc on the entire input stream. Yacc might get confused about something on one line and then misinterpret subsequent lines. For instance, the user might have an unbalanced parenthesis on the first line, or a string literal which is not closed, and then yacc will just keep consuming subsequent lines of the input, looking to close the construct.
It's better to gather the line of input from the user, and then parse that as one unit. The end of the line is then simply the end of the input as far as Yacc is concerned.
If you're using lex, there are ways to redirect lex to read characters from a buffer in memory instead of from a FILE * stream. Look for documentation on the YY_INPUT macro, which you can define in a Lex file to basically specify the code that Lex uses for obtaining input characters.
Analogy time: using a scanner developed with lex/yacc to handle interactive user input directly is a little bit like using scanf for handling user input, whereas capturing a line into a buffer and then parsing it is more like using sscanf. Quote:
It's perfectly appropriate to parse strings with sscanf (as long as the return value is checked), because it's so easy to regain control, restart the scan, discard the input if it didn't match, etc. [comp.lang.c FAQ, 12.20].
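As a rough illustration of the line-at-a-time approach, here is a minimal sketch in Python rather than lex/yacc. The tiny grammar (words plus optional > or >> redirects), the names parse_line and run_lines, and the CommandSyntaxError type are all assumptions made up for this example, not part of any real shell. The only point is that each line is parsed as a complete unit, so a syntax error discards the whole line instead of resuming at cmd2.

```python
import re


class CommandSyntaxError(Exception):
    """Raised when a line does not match the toy command grammar."""


def parse_line(line):
    """Parse one complete line of a hypothetical grammar: words and '>'/'>>' redirects."""
    # Runs of '>' become one token, so '>>>' is seen as a single bad operator.
    tokens = re.findall(r">+|[^\s>]+", line)
    argv, redirects = [], []
    i = 0
    while i < len(tokens):
        tok = tokens[i]
        if tok in (">", ">>"):
            # A redirect operator must be followed by a plain word.
            if i + 1 >= len(tokens) or tokens[i + 1].startswith(">"):
                raise CommandSyntaxError(f"missing target after {tok!r}")
            redirects.append((tok, tokens[i + 1]))
            i += 2
        elif tok.startswith(">"):
            raise CommandSyntaxError(f"unexpected {tok!r}")
        else:
            argv.append(tok)
            i += 1
    if not argv:
        raise CommandSyntaxError("empty command")
    return {"argv": argv, "redirects": redirects}


def run_lines(lines):
    """Parse each line independently; a bad line is reported and dropped whole."""
    parsed = []
    for line in lines:
        if not line.strip():
            continue
        try:
            parsed.append(parse_line(line))
        except CommandSyntaxError as err:
            print(f"syntax error: {err}")
    return parsed
```

In a real lex/yacc setup the same idea would mean reading a line into a buffer, pointing the scanner at that buffer, and calling yyparse once per line.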

How to add a small bit of context in a grammar?

I am tasked with parsing (and transforming) code in a programming language that has a slight quirk in its rules, at least as I see it. To be exact, the compiler treats newlines (as well as semicolons) as statement separators, but otherwise (e.g. inside a statement) it treats them as whitespace.
As an example, this code:
try
local x = 5 / 0
catch (i)
print(i + "\n")
has been shown to be equivalent to this:
try local x = 5 / 0 catch (i) print(i + "\n")
I don't see how I can express such a rule in EBNF, or specifically in Lark EBNF dialect. I mean in a sensible way. I probably could define all possible newline positions inside all statements, but it would be cumbersome and error-prone.
I wish to find a way to treat newlines contextually. Is there a proven method for this, preferably within Python/Lark domain? If I have to modify the parser for that purpose, then where should I start?
Or if I misunderstood something in this language in particular or in machine language parsing in general, or my statement of the problem is wrong, I'd also be happy to get educated.
(As you may guess, the language in question has a well proven implementation, but no officially defined grammar. Also, it is Squirrel, for all that it matters.)
The relevant quote from the "specification" is this:
A squirrel program is a simple sequence of statements:
stats := stat [';'|'\n'] stats
[...] Statements can be separated with a new line or ‘;’ (or with the keywords case or default if inside a switch/case statement), both symbols are not required if the statement is followed by ‘}’.
These are relatively complex rules and, taken together, not context-free if newlines can also be ignored everywhere else. Note, however, that in my understanding the text implies that ; or \n is required when none of the other cases apply. That would make your example illegal. That probably means that the BNF as written is correct, i.e. both ; and \n are optional everywhere. In that case you can (for lark) just add an %ignore "\n" statement and it should work fine.
Also, lark should not complain if you both ignore the \n and use it in a rule: Where useful it will match it in a rule, otherwise it will just ignore it. Note however that this breaks if you use a Terminal that includes the \n (e.g. WS or /\s/). Just have \n as an extra case.
(For the future: You will probably get faster response for lark questions if you ask over on gitter or at least put a link to SO there.)

Need clean syntax in batch

Context
I am thinking I can solve a problem with the proper creation of a *.bat file.
I am automating a process in a backup program called Acronis Backup and Recovery.
I am able to make a script (jScript) that creates all the syntax except for one part correctly.
In a normal command prompt the command I would run looks like this
acrocmd backup file --include="C:\documents\Gale_thesis.doc" "D:\Sandbox\!oDC!-IMG_0222.MOV" "C:\temp\magnifyReader" --loc="D:\backups" --arc="Backup1a"
The jScript I am creating can generate this with no problem and save it as a *.bat file. This works perfectly if my file names are clean. By clean I mean no characters that batch files treat as keywords or commands.
Anytime I have a word like “copy” or a character like “!” in a file name it fails.
Question
So I am now wondering if loading variables from a text file would do the trick?
I am sure a lot of readers know that when you load multiple file/folder paths at the command line, you need to surround them with double quotes.
So I need this variable to have the correct syntax to be parsed by the batch file and work like the example when I type it directly at a command prompt.
I had tried to follow info about using for /f etc.
But the examples are not broad enough for me to understand, nobody seems to explain how to use these variables mixed in with other syntax.
I know a little about working with variables in a *.bat file. My jScript application can produce the text in any format: a list, escaped, whatever is needed.
Thanks
I suggest taking a look at escaping characters:
http://www.robvanderwoude.com/escapechars.php
In for loops, !var! is used when delayed expansion is enabled, so you might need to escape the ! character.
I used the following code provided by Aacini to test the arguments that are being passed:
@echo off
setlocal enabledelayedexpansion
set argCount=0
for %%x in (%*) do (
set /A argCount+=1
set "argVec[!argCount!]=%%~x"
)
echo Number of processed arguments: %argCount%
Since delayed expansion is enabled, I had to escape the ! character:
arg.bat --include="C:\documents\Gale_thesis.doc" "D:\Sandbox\^^^!oDC^^^!-IMG_0222.MOV" "C:\temp\magnifyReader" --loc="D:\backups" --arc="Backup1a"
As for the triple carets (^^^): the problem here is that we need to pass two special characters. The first is the caret ^ and the second is the exclamation mark !, so that the second batch file (the one that reads our arguments) receives ^!.
To escape ^ we use ^^, and to escape ! we use ^!.
Thanks to Aacini for his code.
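Since the command line is produced by a generator script anyway, the escaping can be applied at generation time. The sketch below is in Python rather than JScript, and the helper names escape_bang and build_command are made up for illustration. It assumes the rule described above (each ! becomes ^^^! when the receiving batch file has delayed expansion enabled) is the right one for your setup; it does not handle other special characters such as % or &.

```python
def escape_bang(text):
    """Escape '!' as '^^^!' (assumption: the receiver enables delayed expansion)."""
    return text.replace("!", "^^^!")


def build_command(program, include_paths, location, archive):
    """Build one command line with every path double-quoted and '!' escaped."""
    quoted = " ".join(f'"{escape_bang(path)}"' for path in include_paths)
    return (
        f"{program} --include={quoted} "
        f'--loc="{escape_bang(location)}" --arc="{escape_bang(archive)}"'
    )


line = build_command(
    "arg.bat",
    [
        r"C:\documents\Gale_thesis.doc",
        r"D:\Sandbox\!oDC!-IMG_0222.MOV",
        r"C:\temp\magnifyReader",
    ],
    r"D:\backups",
    "Backup1a",
)
print(line)
```

The printed line has the same shape as the arg.bat invocation shown above, so it can be written straight into the generated *.bat file.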

'Bad repeat count' while inputting a file, FORTRAN

I am trying to read a file into my code.
There are two subroutines: one writes a file and the other reads it.
The writing part was:
write(*,*)'entered refile, shall make file'
ileunitA=int(presentstep)
write(fname,1012)ileunitA
1012 format('DATA_',i6.6,'.dat')
write(fnam,1112)index
1112 format('pp',i3.3)
open(UNIT=ileunitA,FILE=fname)
!variables from module global
write(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
write(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
write(ileunita,*) scon,smomu,smomv,smomw
...
The reading part was as follows (in another subroutine):
ileunita=25;
open(unit=ILEUNITA,file='DATA_010500.dat')
!variables from module global
read(ileunita,*)u,v,w,pc,p,p0,rho1,gam,con
read(ileunita,*)aip,aim,ajp,ajm,akp,akm,ap,ap0
read(ileunita,*) scon,smomu,smomv,smomw
...
When I run the code, it shows the following error:
At line 3682 of file bub2.f90 (unit = 25, file = 'DATA_000001.dat')
Fortran runtime error: Bad repeat count in item 1 of list input
Can anyone help me figure out what the problem could be? And what is a 'repeat count'? What is a 'bad' repeat count? Thanks
Guessing a little (you could show the text in the problematic line in your question...), but you are using list directed input (and output) with the * as the second specifier in the read (and write) statements. List directed input allows multiple fields that have the same value to be represented using the syntax r*c, where r is a numeric repeat count and c is the value to be repeated.
If any of your output items generate a field that contains a * then that could be confusing the processing of input.
(It is permissible (though rare) for a processor to represent multiple output fields that have the same value using a repeat count, for example WRITE (unit,*) 23, 23, 23, 23 could result in an input file that contains the text 4*23.)
List directed input also has some other features, such as the handling of delimiter characters, the / character causing input processing to terminate and the possibility and handling of null values. Some of these features may surprise those not familiar with the rules (which are inspired by typical short cuts taken when input was submitted via punched cards), which is why it is often better to avoid list directed input and output and use an explicit format instead.
If any of your data fields are of type character you should consider using a non-default DELIM mode to avoid any special characters within the character variable value from confusing the input processing.
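To make the r*c syntax concrete, here is a deliberately simplified sketch in Python of how a reader might expand repeat counts. The function name expand_list_directed is made up, and this is not how any Fortran runtime is actually implemented: it ignores quoted strings, null values and the / terminator. It only shows why a field containing a * that is not preceded by a valid positive integer ends up as a "bad repeat count".

```python
import re


def expand_list_directed(record):
    """Expand r*c fields in a simplified list-directed record into value strings."""
    values = []
    # Fields are separated by commas and/or whitespace in this simplified model.
    for field in re.split(r"[,\s]+", record.strip()):
        if not field:
            continue
        if "*" in field:
            count_text, value = field.split("*", 1)
            # The part before '*' must be a positive integer repeat count.
            if not count_text.isdigit() or int(count_text) == 0:
                raise ValueError(f"Bad repeat count in field {field!r}")
            values.extend([value] * int(count_text))
        else:
            values.append(field)
    return values


print(expand_list_directed("4*23 5, 6"))
```

With this model, a record such as "4*23 5, 6" expands to four copies of 23 followed by 5 and 6, while a field made only of asterisks is rejected because there is no count in front of the first *.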

When is EOF needed in ANTLR 4?

The TestDriver in ANTLRWorks2 seems kind of finicky about when it'll accept a grammar without an explicit EOF and when it will not. The Hello grammar in the ANTLR4 Getting Started Guide doesn't use EOF anywhere, so I inferred that it's better to avoid explicit EOF if possible.
What is the best practice for using EOF? When do you actually need it?
You should include an explicit EOF at the end of your entry rule any time you are trying to parse an entire input file. If you do not include the EOF, it means you are not trying to parse the entire input, and it's acceptable to parse only a portion of the input if it means avoiding a syntax error.
For example, consider the following rule:
file : item*;
This rule means "Parse as many item elements as possible, and then stop." In other words, this rule will never attempt to recover from a syntax error because it will always assume that the syntax error is part of some syntactic construct that's beyond the scope of the file rule. Syntax errors will not even be reported, because the parser will simply stop.
If instead I had the following rule:
file : item* EOF;
It means "A file consists exactly of a sequence of zero-or-more item elements." If a syntax error is reached while parsing an item element, this rule will attempt to recover from (and report) the syntax error and continue because the EOF is required and has not yet been reached.
For rules where you are only trying to parse a portion of the input, ANTLR 4 often works, but not always. The following issue describes a technical problem where ANTLR 4 does not always make the correct decision if the EOF is omitted.
https://github.com/antlr/antlr4/issues/118
Unfortunately the performance impact of this change is substantial, so until that is resolved there will be edge cases that do not behave as you expect.
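The difference between the two rules can be mimicked with a toy matcher. The sketch below is plain Python, not ANTLR, and the function parse_file and its require_eof flag are invented for illustration. It does not model ANTLR's prediction or error recovery; it only shows that without EOF the rule stops quietly at the first token it cannot use, while with EOF the leftover input becomes a reported error.

```python
def parse_file(tokens, require_eof):
    """Match item* and, if require_eof is set, insist that all input was consumed."""
    pos = 0
    # Consume as many 'item' tokens as possible, then stop.
    while pos < len(tokens) and tokens[pos] == "item":
        pos += 1
    if require_eof and pos != len(tokens):
        raise SyntaxError(f"unexpected token {tokens[pos]!r} at index {pos}")
    return pos  # number of tokens consumed


tokens = ["item", "item", "bogus", "item"]
print(parse_file(tokens, require_eof=False))  # stops silently before 'bogus'
try:
    parse_file(tokens, require_eof=True)
except SyntaxError as err:
    print("error:", err)
```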

How to write a custom assembly compiler (sort of) in VB.NET

I've been trying to write a simple script compiler for a custom language used by the Game Boy Advance's Z80 processor.
All I want it to do is look at a human-readable command, take it and its arguments and convert it into a hexadecimal value into a ROM file. That's it. Each command is a byte, and each may take a different number of arguments - arguments can be either 8, 16, or 32 bits and each command has a specific number of arguments that it takes.
All of this sort of code is handled by the game and converted into workable machine code within the game's memory, so I'm not writing a full-on assembly compiler if you will. The game automatically knows how many args a command has, what each command does, exactly how to execute it as it is, etc.
For instance, you have command 0x4E, which takes in one 8-bit argument and another 32-bit argument. In hex that would obviously be 4E XX YY YY YY YY. I want my compiler to read it from text as foo 0xXX 0xYYYYYYYY and directly write it into a file as the former.
My question is, how would I do that in VB.NET? I know it's probably a very simple answer, but I see a lot of different options to write it to a file--some work and most don't for me. Could you give me some sample code as to how I would do this?
Writing an assembly compiler, as I understand it, is not so simple. I recommend using one that is already written; see: Software Development Tools for Z80 Family
If you are still interested in writing it, here are the instructions:
1. Write the text you want to translate to some file (or memory stream).
2. Read it line by line.
3. Parse each line, either by splitting it into an array or with regular expressions.
4. Identify the command and its arguments (as far as I remember, some commands do not have arguments).
5. Translate the command to hex (with a collection or dictionary of commands).
6. Write the results to an array, remembering the references for jump addresses.
7. When everything is translated, resolve the addresses and write them to the right places.
I think that the trickiest part is dealing with symbolic addresses.
If you are still interested, write a first piece of code (or ask how to do it) and continue with the next ones.
This sounds like an assembler, even if it is for a 'custom language'.
Start by parsing the command lines. Use the String.Split method to convert the string to an array of strings. The first element in the array is your foo; you can then look that up and output 4E, then convert the subsequent elements to bytes.
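The split-and-look-up approach can be sketched as follows. This is Python rather than VB.NET, so treat it as an outline of the logic only. The COMMANDS table, the function names assemble_line and patch_rom, and the little-endian byte order are all assumptions for the example; the question does not say which byte order the game expects, so byteorder is left as a parameter.

```python
# Hypothetical command table: name -> (opcode byte, argument widths in bytes).
COMMANDS = {
    "foo": (0x4E, (1, 4)),  # one 8-bit argument, then one 32-bit argument
}


def assemble_line(line, byteorder="little"):
    """Translate one text command such as 'foo 0x12 0x89ABCDEF' into bytes."""
    parts = line.split()
    if not parts:
        return b""
    name, args = parts[0], parts[1:]
    if name not in COMMANDS:
        raise ValueError(f"unknown command {name!r}")
    opcode, widths = COMMANDS[name]
    if len(args) != len(widths):
        raise ValueError(f"{name} takes {len(widths)} arguments, got {len(args)}")
    out = bytearray([opcode])
    for text, width in zip(args, widths):
        # int(text, 16) accepts an optional 0x prefix; to_bytes raises
        # OverflowError if the value does not fit in the declared width.
        out += int(text, 16).to_bytes(width, byteorder)
    return bytes(out)


def patch_rom(path, offset, data):
    """Overwrite bytes in an existing ROM file at the given offset."""
    with open(path, "r+b") as rom:
        rom.seek(offset)
        rom.write(data)


print(assemble_line("foo 0x12 0x89ABCDEF").hex(" "))
```

In VB.NET the same steps would map onto String.Split for the parsing, a Dictionary for the command table, and a binary file stream for the write.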