Issue
I am extracting data from a file with awk, specifically the quoted string in <a href="DATA">.
Source file.
...
<!-- Page 18 -->
<p style="position:absolute;top:956px;left:485px;white-space:nowrap" class="ft1829">145041</p>
<p style="position:absolute;top:586px;left:246px;white-space:nowrap" class="ft1829">145042</p>
<p style="position:absolute;top:156px;left:446px;white-space:nowrap" class="ft1829">440332</p>
<!-- Page 19 -->
<p style="position:absolute;top:1205px;left:53px;white-space:nowrap" class="ft1938"><b>1 790,- </b>| 457710</p>
<p style="position:absolute;top:1205px;left:634px;white-space:nowrap" class="ft1938"><b>2 290,- </b>| 464429</p>
<p style="position:absolute;top:924px;left:353px;white-space:nowrap" class="ft1938"><b>2 590,- </b>| 464430</p>
...
Command (with help on this forum).
awk '/Page/ {h=$3} /-- Page 1 --/ {h="Title"} /href=/ && h {split($0,a,"\"");print h","a[6]}'
Results.
...
18,145041
18,145042
18,440332
19,457710
19,464429
...
The problem is that when several links are on the same line, only the first link is processed.
Example.
` 457710</p> | 464429</p>`
Output.
...
18,457710,
...
Expected output.
...
18,457710,
18,464429,
...
What is wrong with the awk command?
Thanks for any ideas.
Update 1
I need to extract all hrefs from this input.
<!-- Page 1 -->
<p style="position:absolute;top:397px;left:23px;white-space:nowrap" class="ft116">237002 | 237003</p>
<p style="position:absolute;top:831px;left:666px;white-space:nowrap" class="ft124">230041</p>
<p style="position:absolute;top:855px;left:447px;white-space:nowrap" class="ft116">467173</p>
<p style="position:absolute;top:910px;left:36px;white-space:nowrap" class="ft116">Hmotnost: 6 kg | 464431</p>
<!-- Page 2 -->
<p style="position:absolute;top:1176px;left:561px;white-space:nowrap" class="ft216">318417</p>
<p style="position:absolute;top:963px;left:561px;white-space:nowrap" class="ft216">338701</p>
...
Command.
awk 'match($0,/class=\"[a-zA-Z]+[0-9]+/){val=substr($0,RSTART,RLENGTH);sub(/[^0-9]*/,"",val)} match($0,/<a href=\"[0-9]+/){val1=substr($0,RSTART,RLENGTH);sub(/[^"]*\"/,"",val1);print substr(val,1,2)","val1}' test.html
Output.
11,237002
12,230041
11,467173
11,464431
21,318417
...
But I need this (for example, 1,237003 is missing from the result above, and the page number in the first column is wrong).
1,237002
1,237003
1,230041
1,467173
1,464431
2,318417
...
Thanks.
Since the awk command only processes the first hyperlink on each line, first edit the file so that every hyperlink starts on its own line:
sed 's/\(a href=\)/\n\1/g' data-file | awk '/page/ ....'
Tested with the given example; could you please try the following.
awk '
{
gsub("</p> | ","&\n")
$1=$1
}
match($0,/class=\"[a-zA-Z]+[0-9]+/){
val=substr($0,RSTART,RLENGTH)
sub(/[^0-9]*/,"",val)
}
match($0,/<a href=\"[0-9]+/){
val1=substr($0,RSTART,RLENGTH)
sub(/[^"]*\"/,"",val1)
print substr(val,1,2)","val1
val=val1=""
}
' Input_file
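For comparison, here is a minimal Python sketch of the same extraction that collects every link on a line instead of only the first. It takes the page number from the <!-- Page N --> comment rather than from the class attribute, which avoids the "11" versus "1" problem in the first column. The anchor tags were lost from the sample lines in the post, so the <a href="NUMBER"> form used in the sample below is an assumption, and extract_links is a hypothetical helper name.

```python
import re

# Assumed input shapes: page markers are HTML comments, links are <a href="digits">.
PAGE_RE = re.compile(r"<!--\s*Page\s+(\d+)\s*-->")
HREF_RE = re.compile(r'<a\s+href="(\d+)"')


def extract_links(lines):
    """Yield (page, href) for every numeric href, including several on one line."""
    page = None
    for line in lines:
        marker = PAGE_RE.search(line)
        if marker:
            page = marker.group(1)  # remember the current page
            continue
        # findall returns every match on the line, not just the first one
        for href in HREF_RE.findall(line):
            yield page, href


# Hypothetical sample: the anchors are reconstructed, not copied from the post.
sample = [
    "<!-- Page 1 -->",
    '<p class="ft116"><a href="237002">237002</a> | <a href="237003">237003</a></p>',
    '<p class="ft124"><a href="230041">230041</a></p>',
    "<!-- Page 2 -->",
    '<p class="ft216"><a href="318417">318417</a></p>',
]

for page, href in extract_links(sample):
    print(f"{page},{href}")
```

The key difference from a single match() call is that the inner loop visits every match on the line.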
#!/bin/csh
set i=0
if ($i == 1 && { -e $HOME_EXIST } )then
echo "Hi"
else
echo "Hello"
endif
Why are both conditions evaluated in a logical AND in csh when the first condition is false?
I am getting the following error:
HOME_EXIST: Undefined variable.
Use $? to check whether the variable is defined:
if ($?HOME_EXIST) then
(do whatever you want)
endif
Your problem is that even though && is lazy, csh substitutes $HOME_EXIST before it starts to evaluate the expression.
You could get around this problem by using nested ifs.
#!/bin/csh
set i=0
if ($i == 1) then
    if (-e $HOME_EXIST) then
        echo "Hi"
    endif
else
    echo "Hello"
endif
I am using ANTLR 3.1.3 and generating a Python target. My lexer and parser accept very large files. Based on command-line or dynamic run-time parameters, I would like to capture a portion of the recognized input and stop parsing early. For example, if my language consists of a header and a body, the body might have gigabytes of tokens, and I am only interested in the header, I would like a rule that stops the lexer and parser without raising an exception. For performance reasons, I don't want to read the entire body.
grammar Example;
options {
language=Python;
k=2;
}
language:
header
body
EOF
;
header:
HEAD
(STRING)*
;
body:
BODY { if stopearly: help() }
(STRING)*
;
// string literals
STRING: '"'
(
'"' '"'
| NEWLINE
| ~('"'|'\n'|'\r')
)*
'"'
;
// Whitespace -- ignored
WS:
( ' '
| '\t'
| '\f'
| NEWLINE
)+ { $channel=HIDDEN }
;
HEAD: 'head';
BODY: 'body';
fragment NEWLINE: '\r' '\n' | '\r' | '\n';
What about:
body:
BODY {!stopearly}? => (STRING)*
;
?
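That uses a gated semantic predicate to enable or disable parts of the language. I often use this to toggle language features depending on a version number. I'm not 100% certain it works as written: you may have to move the predicate and the code following it into a rule of its own. With the Python target the predicate body is presumably emitted as Python code, in which case it would need to read not stopearly rather than !stopearly.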
This is a Python-specific answer. I added this to my parser:
@parser::header
{
class QuitEarlyException(Exception):
    def __init__(self, value):
        self.value = value
    def __str__(self):
        return repr(self.value)
}
and changed this:
body:
BODY { if stopearly: raise QuitEarlyException('ok') }
(STRING)*
;
Now I have a "try" block around my parser:
try:
    parser.language()
except QuitEarlyException as e:
    print "stopped early"
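The pattern can be tried without ANTLR. Below is a minimal, self-contained Python 3 sketch in which a hand-written toy lexer and parser stand in for the generated ones; lex, parse, and the consumed list are hypothetical names used only for illustration. The lexer is a generator, so tokens are produced only on demand, and raising the exception at the body keyword leaves the rest of the input unread.

```python
class QuitEarlyException(Exception):
    """Raised to abandon parsing once the interesting part has been seen."""


def lex(words, consumed):
    """Lazily yield tokens; `consumed` records which ones were actually read."""
    for word in words:
        consumed.append(word)
        yield word


def parse(tokens, stop_early):
    """Toy parser for: 'head' STRING* 'body' STRING*. Returns the header strings."""
    header = []
    if next(tokens) != "head":
        raise SyntaxError("expected 'head'")
    for tok in tokens:
        if tok == "body":
            if stop_early:
                # Carry the captured header out with the exception.
                raise QuitEarlyException(header)
            break
        header.append(tok)
    for _ in tokens:  # body strings, only read when not stopping early
        pass
    return header


words = ["head", '"a"', '"b"', "body", '"x"', '"y"', '"z"']
consumed = []
try:
    result = parse(lex(words, consumed), stop_early=True)
except QuitEarlyException as e:
    result = e.args[0]
    print("stopped early after", len(consumed), "tokens")
```

In this toy version the body tokens are never lexed. Whether the same holds for a generated ANTLR parser depends on how its token stream buffers tokens, which this sketch does not model.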
I am using recursion in yacc and I want to check all the values that are parsed by the recursive rule. My yacc rule is:
%{
#include<stdio.h>
.
.
.
%}
%%
abc:ABC expr
;
expr:VALUE','expr
|VALUE
;
%%
If I have a statement like
ABC 1,2,3,4
it gets parsed.
I want to check that all the numbers parsed via expr
sum to some value, say 10.
How can I check this?
Edit:
You can count the values parsed and keep their running total with code that goes something like this:
%{
#include<stdio.h>
int count;
.
.
.
%}
%%
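abc: { count = 0; } ABC expr { /* the mid-rule action is $1, so expr is $3 */ printf("count: %d; sum: %d\n", count, $3); }
;
expr: VALUE ',' expr { $$ = $1 + $3; count++; } /* count every VALUE, not only the last */
| VALUE { $$ = $1; count++; }
;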
;
%%
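The same idea, building the sum bottom-up through the recursive rule and comparing it once at the top, can be sketched outside yacc. The following Python sketch is only an illustration: parse_expr mirrors the right-recursive expr rule, and check_statement and its expected_sum parameter are hypothetical names standing in for the action on abc.

```python
import re


def parse_expr(tokens, pos=0):
    """expr: VALUE ',' expr | VALUE  -> returns (sum, count, next position)."""
    value = int(tokens[pos])
    pos += 1
    if pos < len(tokens) and tokens[pos] == ",":
        # Recursive alternative: combine this VALUE with the result for the rest.
        rest_sum, rest_count, pos = parse_expr(tokens, pos + 1)
        return value + rest_sum, rest_count + 1, pos
    # Base alternative: a single VALUE.
    return value, 1, pos


def check_statement(text, expected_sum=10):
    """Parse 'ABC v1,v2,...' and report whether the values add up to expected_sum."""
    keyword, _, values = text.partition(" ")
    if keyword != "ABC":
        raise SyntaxError("expected ABC")
    tokens = re.findall(r"\d+|,", values)
    total, count, pos = parse_expr(tokens)
    if pos != len(tokens):
        raise SyntaxError("trailing input")
    return total == expected_sum, total, count


print(check_statement("ABC 1,2,3,4"))  # -> (True, 10, 4)
```

In the yacc version the equivalent check would go in the final action of abc, comparing the value of expr against the target.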
I'm trying to figure out the grammar for the following syntax.
foreach
where x = 1 when some_variable = true
where x = 2 when some_variable = false
where y = 0
print z // Main block
when none // optional
print 'not found' // Exception block
endfor
My grammar looks like:
foreach_stmt : 'for' 'each' where_opt* blockstmt* whennone_opt? 'endfor'
;
where_opt : 'where' cond_clause
;
cond_clause : test when_opt*
;
when_opt : 'when' test
;
whennone_opt : 'when' 'none' blockstmt*
;
test : or_test
;
// further rules omitted
But when the main block is blank, for example
foreach
where x = 1
// main block is blank, do nothing
when none
print 'none'
endfor
In this case my grammar treats "when none" as part of the cond_clause of "where x = 1", which is not what I expect.
Also consider the following case:
foreach
where x = 1 when none = 2
print 'none'
// exceptional block is blank
endfor
Here "none" can be a variable, and "none = 2" should match the "test" rule, so it is part of "where...when...".
However, when "none" is not part of an expression, I want "when none" to match the "foreach" rather than the preceding "where". How can I modify my grammar to do this?
Sorry this title sucks but I don't know how to describe the problem in a few words. Any help would be greatly appreciated.
The parser generated from the following ANTLR grammar:
grammar Genexus;
parse
: foreach_stmt* EOF
;
foreach_stmt
: 'foreach' where_opt* blockstmt* whennone_opt? 'endfor'
;
where_opt
: 'where' cond_clause
;
cond_clause
: test when_opt*
;
when_opt
: 'when' test
;
whennone_opt
: 'when' 'none' blockstmt*
;
test
: identifier '=' atom
;
identifier
: 'none'
| Identifier
;
blockstmt
: 'print' atom
;
atom
: Boolean
| Number
| StringLiteral
| Identifier
;
Number
: '0'..'9'+
;
Boolean
: 'true'
| 'false'
;
Identifier
: ('a'..'z' | 'A'..'Z' | '_')+
;
StringLiteral
: '\'' ~'\''* '\''
;
Ignore
: (' ' | '\t' | '\r' | '\n') {skip();}
| '//' ~('\r' | '\n')* {skip();}
| '/*' .* '*/' {skip();}
;
Produces the following 3 parse-trees from your examples:
1
Source:
foreach
where x = 1 when some_variable = true
where x = 2 when some_variable = false
where y = 0
print z // Main block
when none // optional
print 'not found' // Exception block
endfor
Parse tree:
2
Source:
foreach
where x = 1
// main block is blank, do nothing
when none
print 'none'
endfor
Parse tree:
3
Source:
foreach
where x = 1 when none = 2
print 'none'
// exceptional block is blank
endfor
Parse tree:
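The decision that separates examples 2 and 3 comes down to lookahead: after "when", the word "none" starts a test only if an "=" follows it. The hand-written Python sketch below makes that decision explicit. It is an illustration under simplifying assumptions, not what ANTLR generates: tokenize and Parser are hypothetical names, and only the foreach, where, when, and print constructs from the examples are handled.

```python
import re

TOKEN_RE = re.compile(r"'[^']*'|[A-Za-z_]+|\d+|=")


def tokenize(src):
    """Drop // comments, then split into strings, words, numbers and '='."""
    return TOKEN_RE.findall(re.sub(r"//[^\n]*", "", src))


class Parser:
    def __init__(self, tokens):
        self.toks, self.i = tokens, 0

    def peek(self, k=0):
        j = self.i + k
        return self.toks[j] if j < len(self.toks) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected!r}, got {tok!r}")
        self.i += 1
        return tok

    def test(self):
        name = self.take()
        self.take("=")
        return (name, self.take())

    def blocks(self):
        out = []
        while self.peek() == "print":
            self.take()
            out.append(self.take())
        return out

    def foreach(self):
        self.take("foreach")
        wheres = []
        while self.peek() == "where":
            self.take()
            cond = {"test": self.test(), "when": []}
            # 'when' belongs to this 'where' only if a test (name = value) follows.
            while self.peek() == "when" and self.peek(2) == "=":
                self.take()
                cond["when"].append(self.test())
            wheres.append(cond)
        body = self.blocks()
        when_none = None
        if self.peek() == "when":
            self.take("when")
            self.take("none")
            when_none = self.blocks()
        self.take("endfor")
        return {"where": wheres, "body": body, "when_none": when_none}


ex2 = "foreach\nwhere x = 1\n// main block is blank\nwhen none\nprint 'none'\nendfor"
ex3 = "foreach\nwhere x = 1 when none = 2\nprint 'none'\n// no exception block\nendfor"
tree2 = Parser(tokenize(ex2)).foreach()
tree3 = Parser(tokenize(ex3)).foreach()
print(tree2)
print(tree3)
```

In the second example "when none" ends up as the exception block of the foreach, and in the third "none = 2" is attached to the where clause.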