awk: match pattern and print lines after and before till next pattern

Using awk:
Find a pattern.
Print all lines after that pattern till next pattern.
Print all lines before that pattern till next pattern.
eg. if this is the content of the file
?hello#
line-0
?type=A;so on
line-1
short-description
line-2
line-3
ending#
line-4
?bye#
Match the pattern short-description, print the lines after it up to the line containing #, and print the lines before it back to the line starting with ?, so the output should be:
?type=A;so on
line-1
short-description
line-2
line-3
ending#
I tried: awk '/short-description/{copy=1;next} /#/{copy=0;next} copy' file
but I don't know how to get the "before" part; I have very limited knowledge of awk. A one-line solution would also be appreciated.
Please help. Thanks a lot.

Try:
/^\?/ { delete arr ; len = 0 ; hit = 0 }
/^\?/,/#$/ {
arr[len++] = $0
if ( /short-description/ )
hit = 1
}
/#$/ {
if(hit)
for(i=0;i<len;++i)
print arr[i]
}
Or, this one-liner:
BEGIN { RS="?" } /short-description/ { sub(/#.*/,"#") ; print "?" $0 }
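For trying the buffered approach directly from a shell, here is a minimal sketch wrapped in a function. The names print_block, pat, buf and inblk are made up for illustration, and it assumes every block starts on a line beginning with ? and ends on a line ending with #.

```shell
# Hypothetical helper: print the ?...# block that contains a line matching $1.
# Usage: print_block PATTERN [FILE...]   (reads stdin when no file is given)
print_block() {
  pat=$1
  shift
  awk -v pat="$pat" '
    /^\?/ { n = 0; hit = 0; inblk = 1 }      # a new block starts: reset the buffer
    inblk {
      buf[++n] = $0                          # remember every line of the block
      if ($0 ~ pat) hit = 1                  # note whether the pattern was seen
    }
    inblk && /#$/ {                          # the block ends here
      if (hit) for (i = 1; i <= n; i++) print buf[i]
      inblk = 0
    }
  ' "$@"
}
```

Called as print_block short-description file, this should print the block from "?type=A;so on" through "ending#" for the sample above.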

Related

Using awk to analyze log file to identify blocks and to extract information

I am trying to figure out a way to use awk to analyze my log files from an old application. The log file contains processing information from the application but the structure is a bit messy. But it has a structure like this:
some random text
...
BLOCK-BEGIN bla bla INFO1:VAL1
variable lines of text
INFO2:VAL2
variable lines of text
POSSIBLE-BLOCK-END-PHRASE1
...
some random text
INFO3:not-desired-val5
...
BLOCK-BEGIN bla bla INFO1:VAL3
variable lines of text
INFO2:VAL4
variable lines of text
POSSIBLE-BLOCK-END-PHRASE2
...
What I want to do is to first identify the blocks. In this example above, there are two blocks with same block beginning but different endings. Within each block, I want to extract then few information, i.e. INFO1,INFO2 in the example. The desired output in this case would be:
VAL1,VAL2
VAL3,VAL4
I know some basic of awk. Therefore, any solutions or hints are highly welcome. Thanks
Update: my first attempt
awk '/BLOCK-BEGIN/{printf substr($4,7)",";for (i = 0 ; i < NF; i++) getline; if($0 ~ '/^INFO2/') print substr($0,7)}'
The output is:
VAL1,VAL2
VAL3,VAL4
But is there a better way to do it? Any suggestions?
$ awk -v OFS=',' '
(split($NF,a,/:/) == 2) && sub(/^INFO/,"",a[1]) {
info[a[1]] = a[2]
if ( a[1] == 2 ) {
print info[1], info[2]
}
}
' file
VAL1,VAL2
VAL3,VAL4
Regarding the code you posted in your question:
printf substr($4,7)"," - never do printf <input data>, as it'll fail when your input contains printf formatting characters. Always do printf "%s", <input data> instead, so that code should be written printf "%s,", substr($4,7).
getline - there are only a few specific situations where getline is the right approach, and when it is, you have to write it securely. This isn't the right situation and it's not written securely. See awk.freeshell.org/AllAboutGetline.
for (i = 0 ; i < NF; i++) - all field numbers, string character positions, and array indices generated by awk (e.g. by split()) start at 1, not 0, so write your code to match and you won't trip over thinking arrays or anything else start at zero: for (i = 1 ; i <= NF; i++).
'foo... $0 ~ '/^INFO2/' ...bar' - those inner 's are terminating the awk script body and so exposing what's between them to the shell for interpretation. Never do that. In this case I don't know why you thought you needed them, as your code should just be 'foo... $0 ~ /^INFO2/ ...bar'.
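Putting those points together, a getline-free version of the original attempt might look like the sketch below. The function name extract_info and the variables in_block and v1 are hypothetical, and it assumes (as in the question's attempt) that INFO1:... is the fourth field of the BLOCK-BEGIN line and that INFO2:... starts its own line.

```shell
# Hypothetical helper: print "INFO1-value,INFO2-value" for each block.
extract_info() {
  awk '
    /BLOCK-BEGIN/ {
      in_block = 1
      v1 = substr($4, 7)                     # text after "INFO1:" (6 characters)
    }
    in_block && /^INFO2:/ {
      # Input data goes in the arguments, never in the printf format string.
      printf "%s,%s\n", v1, substr($0, 7)
      in_block = 0                           # ignore INFO lines outside a block
    }
  ' "$@"
}
```

Because the values are passed as printf arguments, a value such as 50%s is printed literally instead of being treated as a format.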
With your shown samples only, please try following awk code.
awk -F'INFO[0-9]+:' '
/BLOCK-BEGIN/{
if(val2 && val1){
print val1","val2
}
val1=val2=""
val1=$NF
next
}
/^INFO2:/{
val2=(val2?val2 ",":"") $NF
}
END{
if(val2 && val1){
print val1","val2
}
}
' Input_file

How can I store the length of a line in a variable within an awk script?

I have this simple awk script with which I attempt to check the number of characters in the first line.
If the first line has more or fewer than 10 characters, I want to store the number
of characters in a variable.
Somehow the first print statement works, but storing that result in a variable doesn't.
Please help.
I tried removing the dollar sign, "thelength=(length($0))",
and removing the parentheses, "thelength=length($0)", but it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1
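Two issues, both from mixing ksh and awk syntax ...
in awk, $ is the field operator, not command substitution, so $(length($0)) means "the field whose number is the line length"; to store the length, use thelength=length($0)
awk variables do not take a leading $ when being referenced ($thelength means "the field numbered thelength"); use print ... ,thelength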
So your code becomes:
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' $1
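A stripped-down sketch of just the assignment, for comparison. The function name first_line_length and the variable n are made up for illustration.

```shell
# Hypothetical helper: report the length of the first input line.
first_line_length() {
  awk '
    NR == 1 {
      n = length($0)                         # plain assignment, no $ on the variable
      print "The length of the first line is:", n
      exit
    }
  ' "$@"
}
```

For an input whose first line is "hello world", this should report a length of 11.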

Print smallest integer from file using awk custom function?

The awk function looks like this, in a file named fun.awk:
{
print small()
}
function small()
{
a[NR]=$0
smal=0
for(i=1;i<=3;i++)
{
if( a[i]<a[i+1])
smal=a[i]
else
smal=a[i+1]
}
return smal
}
The contents of awk.write:
1
23
32
The awk command is:
awk -f fun.awk awk.write
It gives me no result? Why?
I think you are going about this the wrong way. In awk, one approach might be:
NR == 1 {
small = $0
}
$0 < small {
small = $0
}
END {
print small
}
which simply sets small to the smallest integer seen so far on each line, and prints it at the end. (Note: you need to start by initializing small on the first line.)
A simpler approach might just be to sort the lines as numbers with sort, and pick the first one.
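A sketch of that sort-based alternative, assuming one integer per line as in awk.write. The function name smallest is made up for illustration.

```shell
# Hypothetical helper: print the numerically smallest line of the input.
smallest() {
  sort -n "$@" | head -n 1                   # numeric sort, then keep the first line
}
```

It would be called as smallest awk.write.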

AWK -- How to assign a variable's value from matching regex which comes later?

While I have this awk script,
/regex2/{
var = $1}
/regex1/{
print var}
which I executed over input file:
regex1
This should be assigned as variable regex2
I got no printed output. The desired output is: "This" to be printed out.
I might then think to utilize BEGIN:
BEGIN{
/regex2/
var = $1}
/regex1/{
print var}
But apparently BEGIN cannot accommodate regex matching function. Any suggestion to this?
This would achieve the desired result:
awk '/regex2/ { print $1 }'
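Otherwise, you'll need to read the file twice and perform something like the following. It will store the last occurrence of /regex2/ in var. Upon re-reading the file, it will print var for each occurrence of /regex1/. Note that, for the sample input, the output is an empty line (printed during the first pass, while var is still unset) followed by 'This' on the next line. The file.txt{,} argument relies on shell brace expansion (bash, ksh, zsh) to pass the file name twice: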
awk 'FNR==NR && /regex2/ { var = $1; next } /regex1/ { print var }' file.txt{,}
Typically this sort of thing is done with a flag:
/regex1/ { f = 1 }
f && /regex2/ { var = $1; print var; f = 0 }
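If the two-pass route is preferred, the stray empty line can be avoided by making the first pass only collect var and never print. This is a sketch, not the answerer's code; the function name print_var_for_matches is hypothetical, and it assumes the input is a regular, non-empty file that can be read twice.

```shell
# Hypothetical helper: pass 1 remembers $1 of the last /regex2/ line,
# pass 2 prints it for every /regex1/ line.
print_var_for_matches() {
  awk '
    FNR == NR {                              # true only while reading the first copy
      if (/regex2/) var = $1
      next                                   # never print during the first pass
    }
    /regex1/ { print var }
  ' "$1" "$1"
}
```

Passing "$1" twice does the same job as file.txt{,} without depending on brace expansion.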

Awk command to insert corresponding line numbers except for blank lines

I'm doing an assignment at the moment and the question that's stumped me is:
"Write an awk command to insert the corresponding line number before
each line in the text file above. The blank line should NOT be
numbered in this case."
I have an answer, but I'm struggling to find the explanation of what each component does.
The command is:
awk '{print (NF? ++a " " :"") $0}' <textfile.txt>
I know that NF is the field number, and that $0 refers to the whole input record. I tried playing around with the command to find what does what, but it always seems to have syntax errors whenever I omit something.
So, my question is what does each component do? What does the ++a do? The ? after NF? and what does the bit with the quotations do?
Thanks in advance!
The ... ? ... : ... construct is the ternary operator, a compact if-else. So, it's the same as:
if ( NF > 0 ) {
++a;
print a " " $0;
} else {
print $0;
}
a is a variable that is incremented only when a line with at least one field is found.
print (NF? ++a " " :"") $0
A ternary operator is used in your solution.
For a blank line, NF is always 0,
so with
cond ? true-case : false-case
if NF is > 0 the line is prefixed with a and a space, otherwise with "".
++a is a pre-increment: a is increased by 1 before its value is used, so the first non-blank line gets 1 (an unset awk variable counts as 0) and each later non-blank line gets the next number.
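To see the pre-increment at work, the command can be wrapped in a small function and fed a file with a blank line in the middle. The function name number_nonblank and the counter name n are made up for illustration.

```shell
# Hypothetical helper: number every line that has at least one field.
number_nonblank() {
  awk '{ print (NF ? ++n " " : "") $0 }' "$@"   # ++n runs only when NF is non-zero
}
```

With the three input lines "a", an empty line, and "b", the first and third lines come out as "1 a" and "2 b", and the empty line is passed through unnumbered.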
awk 'BEGIN{count=1}{if($0~/^$/){print}else{print count,$0;count++}}' your_file
tested below:
> cat temp.cc
int main ()
{
}
> awk 'BEGIN{count=1}{if($0~/^$/){print}else{print count,$0;count++}}' temp.cc
1 int main ()
2 {
3 }
>