The position of <EOF> in ANTLR 4 is kind of strange? - antlr

I am trying out ANTLR 4. For the simple Hello grammar in the book "The Definitive ANTLR 4 Reference", it gives me the following output:
[#2,12:11='<EOF>',<-1>,2:0]
According to the book's explanation, the 12:11 notation means the <EOF> token starts at position 12 and ends at position 11. How is this possible?
PS. I am working on Windows.

In ANTLR 4, both endpoints of a token's span (its start and stop index) are inclusive. The length of a span with inclusive endpoints is:
Length = End - Start + 1
The length of the EOF symbol is 0 (it appears at a known location, but it contains no input symbols). If the input is 12 characters long (indices 0 to 11), EOF starts at index 12, so you get this equation for the end position:
0 = End - 12 + 1
Therefore:
End = 0 + 12 - 1 = 11
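To see the inclusive-span arithmetic outside ANTLR, here is a small Python sketch. The input "hello parrt\n" is an assumption (any 12-character input behaves the same way), and the token positions are worked out by hand rather than produced by ANTLR.

```python
def span_length(start: int, stop: int) -> int:
    """Length of a span whose start and stop indices are both inclusive."""
    return stop - start + 1

def span_text(text: str, start: int, stop: int) -> str:
    """Characters covered by an inclusive [start, stop] span."""
    return text[start:stop + 1]

text = "hello parrt\n"  # assumed 12-character input (indices 0..11)

# (token, start, stop), computed by hand for the assumed input.
tokens = [
    ("hello", 0, 4),
    ("parrt", 6, 10),
    # EOF starts just past the last character; an empty span has stop = start - 1.
    ("<EOF>", len(text), len(text) - 1),
]

for name, start, stop in tokens:
    print(f"{name}: {start}:{stop} length={span_length(start, stop)} "
          f"text={span_text(text, start, stop)!r}")
```

With stop = start - 1 the length formula gives 0 and the slice is empty, which is exactly the 12:11 pair in the token dump.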

Related

Get the prefix out of a size range with different size formats

I have a column in a df with a size range in different size formats.
artikelkleurnummer size
6725 0161810ZWA B080
6726 0161810ZWA B085
6727 0161810ZWA B090
6728 0161810ZWA B095
6729 0161810ZWA B100
The size range also contains other size formats, such as XS - XXL, 36-50, 36/38 - 52/54, ONE, XS/S - XL/XXL and 363-545.
I have tried to remove the '0' prefix from all sizes that start with a letter in the range A-K. For example, I want to change B080 into B80, while B100 stays B100.
Steps:
1 look for items in column ['size'] whose first character is a letter in the range A-K,
2 if True, replace the second character of the string with ''
For the range I use:
from string import ascii_letters
def range_alpha(start_letter, end_letter):
    return ascii_letters[ascii_letters.index(start_letter):ascii_letters.index(end_letter) + 1]
Then I tried a for loop:
for items in df['size']:
    if df.loc[df['size'].str[0] in range_alpha('A','K'):
        df.loc[df['size'].str[1] == ''
Error message:
SyntaxError: unexpected EOF while parsing
what's wrong?
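You can do it with a regex and pd.Series.str.replace:
import pandas as pd
df = pd.DataFrame([['0161810ZWA']*5, ['B080', 'B085', 'B090', 'B095', 'B100']]).T
df.columns = "artikelkleurnummer size".split()
# Rebuild the match from groups 1 and 3, dropping group 2 (the leading zeros)
replacement = lambda mpat: ''.join(g for i, g in enumerate(mpat.groups()) if i != 1)
# regex=True is required for a callable replacement (the default became False in pandas 2.0)
df['size_cleaned'] = df['size'].str.replace(r'([a-kA-K])(0*)(\d+)', replacement, regex=True)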
Output
artikelkleurnummer size size_cleaned
0 0161810ZWA B080 B80
1 0161810ZWA B085 B85
2 0161810ZWA B090 B90
3 0161810ZWA B095 B95
4 0161810ZWA B100 B100
TL;DR
Find a pattern "LetterZeroDigits" and change it to "LetterDigits" using a regular expression.
Slightly longer explanation
Regexes are very handy but can also be hard. In the solution above, we find the pattern of interest and then replace it. In our case, the pattern of interest is made of 3 parts:
A letter from A-K (upper or lower case)
Zero or more 0s
One or more remaining digits
In regex terms, this can be written as r'([a-kA-K])(0*)(\d+)'. Note that each of the 3 pairs of parentheses captures one part; these are called groups. If you haven't used regexes much this may not make sense yet, but any introduction to regexes online covers groups.
Once we have the parts, we want to keep everything except part 2, which is the 0s.
The pd.Series.str.replace documentation has the details on the replacement argument. In essence, replacement is a function that receives the match object for each match and returns the string to put in its place.
The three groups identified above are accessed with the match object's groups() method (mpat.groups() here), which returns a tuple containing the match for each group. We want to rebuild the string with the middle part left out, which is what the replacement function does.
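To see what the three groups contain, here is a sketch using Python's standard re module on plain strings instead of a DataFrame. It also shows that dropping group 2 can be written as the backreference template r'\1\3' instead of a function; passing that template to str.replace with regex=True should have the same effect, but that is an alternative, not the answer's original approach.

```python
import re

# The same three-group pattern as in the answer:
# group 1 = letter A-K, group 2 = leading zeros, group 3 = remaining digits.
pattern = re.compile(r'([a-kA-K])(0*)(\d+)')

for size in ["B080", "B100"]:
    match = pattern.match(size)
    print(size, match.groups())

# Keep groups 1 and 3 and drop group 2 (the zeros) with a backreference template.
cleaned = [pattern.sub(r'\1\3', s) for s in ["B080", "B085", "B090", "B095", "B100"]]
print(cleaned)
```

For B100 the zero group matches the empty string, because the character after the letter is already a 1, so nothing is removed.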
sizes = [{"size": "B080"},{"size": "B085"},{"size": "B090"},{"size": "B095"},{"size": "B100"}]
def range_char(start, stop):
    return (chr(n) for n in range(ord(start), ord(stop) + 1))
for s in sizes:
    if s['size'][0].upper() in range_char("A", "K"):
        s['size'] = s['size'][0]+s['size'][1:].lstrip('0')
print(sizes)
This example uses a list of dicts instead of a DataFrame.
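The steps listed in the question (check the first letter, then drop the leading zeros) can also be wrapped in a small plain-Python function. This is a sketch: the name strip_size_zeros is hypothetical, and the extra isdigit() check is an assumption added so that only sizes shaped like letter + digits are touched, leaving the other formats from the question unchanged.

```python
# Hypothetical helper; first letters A-K as in the question.
ALLOWED_FIRST = set("ABCDEFGHIJK")

def strip_size_zeros(size: str) -> str:
    """Drop leading zeros after a first letter A-K, e.g. 'B080' -> 'B80'."""
    # Only touch sizes shaped like <letter><digits>; everything else is returned as-is.
    if size and size[0].upper() in ALLOWED_FIRST and size[1:].isdigit():
        # `or "0"` keeps a single zero if the digits were all zeros.
        return size[0] + (size[1:].lstrip("0") or "0")
    return size

samples = ["B080", "B085", "B100", "XS - XXL", "36-50", "36/38 - 52/54",
           "ONE", "XS/S - XL/XXL", "363-545"]
print([strip_size_zeros(s) for s in samples])
```

With pandas, it could be applied to the column with df['size'].map(strip_size_zeros).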

How to process mainframe numbers where "{" is the last character

I have mainframe file data like this:
000000720000{
I need to parse the data and load it into a Hive table like this:
72000
The field above is the income column, and the "{" sign denotes a positive amount.
The data type used when creating the table is income decimal(11,2).
In the layout.cob copybook, the field is defined as INCOME PIC S9(11)V99.
Could someone help?
The number you want is 7200000, which with the two implied decimal places (V99) is 72000.00.
The conversion you are looking for is:
Positive numbers
{ = 0
A = 1
B = 2
C = 3
D = 4
E = 5
F = 6
G = 7
H = 8
I = 9
Negative numbers (this makes the whole value negative)
} = 0
J = 1
K = 2
L = 3
M = 4
N = 5
O = 6
P = 7
Q = 8
R = 9
Let's explain why.
Based on your question, the issue arises when packed decimal data is unpacked (UNPK) into character data. In packed form, the PIC S9(11)V99 field takes up 7 bytes of storage and looks like the picture below.
You'll see three lines. The top one is the character representation (empty in the first picture because the hex values do not map to displayable characters), and the two lines below are the hexadecimal values, with the high-order half of each byte on the upper line and the low-order half on the lower line.
Note that in the rightmost byte the sign is stored as C, which means positive; a negative value would have a D there.
When it is converted to character data it will look like this
Notice the C0, which results from the unpacking and preserves the sign. Be aware that this display is on z/OS, which uses EBCDIC. If the file has been transferred and converted to another code page, you will see the same character, but its hex value will be different.
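To make the C0 concrete, here is a rough Python model of packing the value from the question and then unpacking it the way UNPK does (zone F on every digit, with the sign nibble moved into the zone of the last byte). The function names are illustrative, and decoding with Python's cp037 codec assumes EBCDIC code page 037; the file's actual code page may differ.

```python
def pack_decimal(digits: str, negative: bool = False) -> bytes:
    """Packed decimal: one digit per half-byte, sign nibble (C or D) last."""
    nibbles = [int(d) for d in digits] + [0xD if negative else 0xC]
    if len(nibbles) % 2:
        nibbles.insert(0, 0)  # pad on the left to a whole number of bytes
    return bytes((hi << 4) | lo for hi, lo in zip(nibbles[0::2], nibbles[1::2]))

def unpack_to_zoned(packed: bytes) -> bytes:
    """Model of UNPK: zone F on every digit, except the last byte, whose
    zone becomes the sign nibble (so digit 0 with sign C gives C0)."""
    nibbles = [n for byte in packed for n in (byte >> 4, byte & 0xF)]
    digits, sign = nibbles[:-1], nibbles[-1]
    zoned = bytearray(0xF0 | d for d in digits)
    zoned[-1] = (sign << 4) | digits[-1]
    return bytes(zoned)

packed = pack_decimal("0000007200000")  # S9(11)V99: 13 digits + sign = 7 bytes
print(packed.hex(), len(packed))
zoned = unpack_to_zoned(packed)
print(zoned.hex())
print(zoned.decode("cp037"))  # EBCDIC C0 displays as "{"
```

With an odd number of digits, as here, there is no padding nibble, so the unpacked field has exactly 13 characters.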
Here are all the combinations you will likely see for positive numbers
and here for negative numbers
To make your life easy, if you see one of the first set of characters then you can replace it with the corresponding number. If you see something from the second set then it is a negative number.
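The replacement rule above can be sketched in Python, assuming the file has already been converted to ASCII text so that the last character is one of the sign characters from the tables (or a plain digit for an unsigned value). decode_signed_field is a hypothetical helper name, and scale=2 reflects the V99 in the copybook.

```python
from decimal import Decimal

# Last-character sign mapping from the tables above.
POSITIVE = {ch: digit for digit, ch in enumerate("{ABCDEFGHI")}
NEGATIVE = {ch: digit for digit, ch in enumerate("}JKLMNOPQR")}

def decode_signed_field(field: str, scale: int = 2) -> Decimal:
    """Decode a field with a trailing sign character, e.g. PIC S9(11)V99.

    `scale` is the number of implied decimal places (2 for V99).
    """
    body, last = field[:-1], field[-1]
    if last in POSITIVE:
        negative, last_digit = False, POSITIVE[last]
    elif last in NEGATIVE:
        negative, last_digit = True, NEGATIVE[last]
    elif last.isdigit():
        negative, last_digit = False, int(last)  # plain digit: unsigned value
    else:
        raise ValueError(f"unexpected sign character: {last!r}")
    unscaled = int(body + str(last_digit))
    value = Decimal(-unscaled if negative else unscaled)
    return value.scaleb(-scale)  # apply the implied decimal point

print(decode_signed_field("000000720000{"))
print(decode_signed_field("000000000012J"))
```

Note that S9(11)V99 holds 13 digits in total (11 before the decimal point), whereas Hive's DECIMAL(11,2) allows only 11 digits in total, so DECIMAL(13,2) may be needed to hold the full range.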

Regular expression (0+1)*1(0+1)*0 DFA

I'm trying to understand the regular expression: (0+1)*1(0+1)*0 Could you provide examples that matches this pattern?
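Let me explain:
1 - (0+1) means a single symbol, either 0 or 1 (in this notation + means "or", i.e. union)
2 - (0+1)* means the previous line any number of times (can be 0), i.e. any string of 0s and 1s, including the empty string
3 - (0+1)*1 means the previous line followed by a 1
4 - (0+1)*0 means line 2 followed by a 0
So the whole expression (0+1)*1(0+1)*0 matches any string of 0s and 1s that contains a 1 and ends with a 0.
10 works: the first (0+1)* matches nothing, then a 1, then the second (0+1)* matches nothing, then a 0.
00000000000100000000000110 works: the first (0+1)* matches eleven 0s and a 1, twice. Then comes a 1, then the second (0+1)* matches nothing, and then the last 0. A few other examples: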
10
00001000010000110000100001000010
01010110
0110
I hope you understood (I'm not a native English speaker, so sorry for my English).
EDIT: There are many websites that can help you with regular expressions, both for learning and for testing them.
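A quick way to test examples is to translate the notation: since + means union here, (0+1) becomes [01] (or (0|1)) in Python's re syntax. The sketch below also includes one possible three-state DFA for the same language (strings that contain a 1 and end in 0); the state names A, B, C are arbitrary.

```python
import re

# Textbook (0+1)*1(0+1)*0 written in Python regex syntax.
PATTERN = re.compile(r"[01]*1[01]*0")

# DFA: A = no 1 seen yet, B = a 1 seen and the last symbol was 1,
# C = a 1 seen and the last symbol was 0 (the only accepting state).
TRANSITIONS = {
    ("A", "0"): "A", ("A", "1"): "B",
    ("B", "0"): "C", ("B", "1"): "B",
    ("C", "0"): "C", ("C", "1"): "B",
}

def dfa_accepts(word: str) -> bool:
    """Run the DFA over a string of 0s and 1s."""
    state = "A"
    for symbol in word:
        state = TRANSITIONS[(state, symbol)]
    return state == "C"

examples = ["10", "00000000000100000000000110", "00001000010000110000100001000010",
            "01010110", "0110", "100", "0", "01", ""]
for w in examples:
    print(repr(w), PATTERN.fullmatch(w) is not None, dfa_accepts(w))
```

Both columns should agree for every input, including strings like "100" that end in several 0s.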

Finding Coefficients of LFSR

I am studying cryptography from Christof Paar's book. There is a question about LFSRs that I have trouble with; I just can't understand one point. The question is:
We want to perform an attack on another LFSR-based stream cipher. In order to process letters, each of the 26 uppercase letters and the numbers 0, 1, 2, 3, 4, 5 are represented by a 5-bit vector according to the following mapping:
A -> 0 = 00000
...
Z -> 25 = 11001
0 -> 26 = 11010
...
5 -> 31 = 11111
(binary)
We happen to know the following facts about the system:
- The degree of the LFSR is m = 6.
- Every message starts with the header WPI.
We observe now on the channel the following message (the fourth letter is a zero): j5a0edj2b
What are the feedback coefficients of the LFSR? (This is the part I'm stuck on.)
Solution:
I can't understand the matrix in this solution. Where did these numbers come from?
Using WPI, the plaintext begins with
P=>(10110)(01111)(01000)
Using j5a0edj2b we have the ciphertext
C=>(01001)(11111)(00000)(11010)(00100)(00011)(01001)............
Then, by adding P and C mod 2 (bitwise XOR), the key stream is
S=>(11111)(10000)(01000)....
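From the key stream we read off the bits:
s0=1, s1=1, s2=1, s3=1, s4=1, s5=1, s6=0, s7=0, s8=0, s9=0, s10=0, s11=1, etc.
For an LFSR of degree m = 6, every key-stream bit satisfies s_{i+6} = p0·s_i + p1·s_{i+1} + ... + p5·s_{i+5} mod 2, where p0, ..., p5 are the feedback coefficients. Writing this for i = 0, ..., 5 gives the matrix: its first line is (s0,...,s5), each following line is shifted by one position, the last line is (s5,...,s10), and the right-hand side is s6, ..., s11:
$$
\begin{pmatrix}
s_0 & s_1 & s_2 & s_3 & s_4 & s_5\\
s_1 & s_2 & s_3 & s_4 & s_5 & s_6\\
s_2 & s_3 & s_4 & s_5 & s_6 & s_7\\
s_3 & s_4 & s_5 & s_6 & s_7 & s_8\\
s_4 & s_5 & s_6 & s_7 & s_8 & s_9\\
s_5 & s_6 & s_7 & s_8 & s_9 & s_{10}
\end{pmatrix}
\begin{pmatrix} p_0\\ p_1\\ p_2\\ p_3\\ p_4\\ p_5 \end{pmatrix}
\equiv
\begin{pmatrix} s_6\\ s_7\\ s_8\\ s_9\\ s_{10}\\ s_{11} \end{pmatrix} \pmod{2}
$$
These calculations are covered in detail in the book's section on LFSRs.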
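The whole computation can be reproduced in a short Python sketch. It assumes the 5-bit mapping from the question (A-Z = 0-25, digits 0-5 = 26-31), treats the lowercase ciphertext letters as their uppercase equivalents, and builds the system from the relation s_{i+6} = p0·s_i + ... + p5·s_{i+5} mod 2. The helper names are illustrative, and the 6×6 system is solved with Gauss-Jordan elimination over GF(2).

```python
# Assumed 5-bit mapping from the question: A-Z -> 0-25, digits 0-5 -> 26-31.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZ012345"

def to_bits(text):
    """Map each character to its 5-bit value (lowercase treated as uppercase)."""
    bits = []
    for ch in text.upper():
        bits.extend(int(b) for b in format(ALPHABET.index(ch), "05b"))
    return bits

def solve_gf2(rows, rhs):
    """Solve rows * x = rhs over GF(2) with Gauss-Jordan elimination.
    Assumes the matrix is invertible (otherwise next() raises StopIteration)."""
    n = len(rows)
    aug = [row[:] + [b] for row, b in zip(rows, rhs)]
    for col in range(n):
        pivot = next(r for r in range(col, n) if aug[r][col])
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col and aug[r][col]:
                aug[r] = [x ^ y for x, y in zip(aug[r], aug[col])]
    return [aug[r][n] for r in range(n)]

m = 6
plain = to_bits("WPI")                            # known header
cipher = to_bits("j5a0edj2b")                     # observed ciphertext
stream = [p ^ c for p, c in zip(plain, cipher)]   # s0..s14 (zip stops at 15 bits)

A = [stream[i:i + m] for i in range(m)]   # line i: (s_i, ..., s_{i+5})
b = [stream[i + m] for i in range(m)]     # right-hand side: s_6, ..., s_11
coeffs = solve_gf2(A, b)
print("p0..p5:", coeffs)

# Sanity check: regenerate all 15 key-stream bits from the first 6.
regen = stream[:m]
while len(regen) < len(stream):
    regen.append(sum(p * s for p, s in zip(coeffs, regen[-m:])) % 2)
print("matches observed key stream:", regen == stream)
```

For this data it should give p0 = p1 = 1 and all other coefficients 0, i.e. s_{i+6} = s_i + s_{i+1} mod 2, and the regenerated bits should match all 15 observed key-stream bits.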

Pascal - How to convert a Real to an Integer variable

I'm writing a task in Pascal.
Everything is OK, but my result is not right.
I'm summing some numbers.
Example: 2.3 + 3.4 + 3.3 = 9
But the output shows 9.000000 + EEE or something like that.
So how do I convert it so that it shows only 9, not this REAL value?
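To actually convert:
var
  i: integer;
...
i := round(floatVar); { or trunc(floatVar) to just drop the fractional part }
To output the value without decimals (note that it is rounded, not truncated):
writeln(floatVar:9:0);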
Let's consider this simpler expression:
3.5 + 2.5
What do you expect? 6, right? Let's try this code:
write(3.5 + 2.5);
Unfortunately, the result is a floating-point number, so it is printed in scientific notation:
6.00000000000E+00
that is, 6.0000000000 x 10^0, or 6 x 10^0. Either way, you only care about 6, so who needs this long number? The idea is to drop the decimal part and output only the integer part to the console, which can be done with this line of code:
write(3.5 + 2.5 : 0 : 0);
OK, now it outputs a nice number, as expected:
6
Seems like the problem is solved, but you say that:
I'm summing some numbers
Example: 2.3 + 3.4+ 3.3 = 9
So the nice whole number just appeared by chance? Here is the problem: what do you expect this expression to output?
3.6 + 2.5
It should be 6.1, right? Let's try it with the line of code that worked:
write(3.6 + 2.5 : 0 : 0);
And the output is...
6
Unexpected, right? So how about rounding to a fixed number of decimal places, like 1?
write(3.5 + 2.5 : 0 : 1);
write(3.6 + 2.5 : 0 : 1);
Then 3.5 + 2.5 = 6.0 and 3.6 + 2.5 = 6.1. But 6.0 may look a bit long, so how do you make it output 6 for 6.0 and 6.1 for 6.1?
There is no format specifier that does this automatically: a real variable is stored completely differently from an integer variable, and the formatting can't tell by itself whether a real value happens to be a whole number. You can, however, do it manually by writing a function for the job.
So my solution, to keep it simple, is to round the output to a fixed number of decimal places, and that's it.
For the purpose of showing pretty output on the screen, you can use something like this:
Writeln(result:0:2);
Result on screen would be this:
9.00
What does this mean, someone might ask? The first number, 0, is the field width. If it is 0, Pascal uses just as many characters as the number needs, with no padding. If you wrote writeln(result:5:2), the result would be:
 9.00
In other words, the number is right-aligned in a field 5 characters wide, so here it gets one leading space.
The second number, 2 in this example, means you want the result printed with 2 decimal places. You can use it only when printing a real-type value (real, single, double, extended and so on). You can round to any number of decimals, and if you write writeln(result:0:0) you get this output:
9
If you are printing an integer and want a certain field width, say 5, you would write writeln(i:5). If you added :2 to the end, you would get a compile-time error.
This also works for expressions, such as writeln(5/3.5+sqrt(3):0:3).
Note that this does not round the variable itself; it only formats the output. This is also legal:
program test;
var
  a: real;
  n, m: integer;
begin
  readln(a, m, n);
  writeln(a:m:n);
end.
Here the program asks the user for a number a, the field width m, and the number of decimals n, and then prints a with that formatting. This can be useful, so I'm pointing it out. Thank you for reading; I hope this helped.
You can convert to a string, take the integer part, and convert it back to an integer!
Or Float to Str, then Str to Int:
nPage := StrToInt(FloatToStr(Int(nReg / nTPages))) + 1;