converting awk command line to awk file

converting awk command line to awk file - awk

This works:
awk -F2 '{if (NF > 1) { if (substr($1,0,2) == "..") printf ("%.2f 2%s", ((50*length($1))/1000) , $2); else printf("%s2%s",$1,$2); for (i=3;i<=NF;i++) printf("2%s",$i) } else if (substr($1,0,2) == "..") printf("%.2f",((50*length($1))/1000)); else printf("%s",$1); printf("\n");}'-f debugconsole > debugconsoleWithCount
But when I make the file countdots.awk as follows:
BEGIN {
if (NF > 1)
{
if (substr($1,0,2) == "..")
printf ("%.2f 2%s", ((50*length($1))/1000) , $2);
else
printf("%s2%s",$1,$2)
for (i=3;i<=NF;i++)
printf("2%s",$i)
}
else
if (substr($1,0,2) == "..")
printf("%.2f",((50*length($1))/1000));
else printf("%s",$1);
printf("\n");
}
and run it like this:
awk -F2 -f countdots4.awk debugconsole > debugconsoleWithCount
I get an empty debugconsoleWithCount file.

A BEGIN block in awk is executed only once, before the first input record is read.
An END rule is executed only once, after all the input is read.
In your transformation, since you put everything in BEGIN block, it becomes a no-op since values of NF, $1, $2 .. etc is not even set. Hence you get an empty file. If you remove it it should work fine.
BEGIN and END blocks are not mandatory so you don't have to keep them in your awk script. BEGIN is often used to print titles, headers, initializing variables to particular values etc. END block is often used to do final processing after entire input is read.

Basically this has to do with not understanding BEGIN and END.
I took out BEGIN and it does what I want.
Should I delete this from stackoverflow? let me know.

Related

Using awk to analyze log file to identify blocks and to extract information

I am trying to figure out a way to use awk to analyze my log files from an old application. The log file contains processing information from the application but the structure is a bit messy. But it has a structure like this:
some random text
...
BLOCK-BEGIN bla bla INFO1:VAL1
variable lines of text
INFO2:VAL2
variable lines of text
POSSIBLE-BLOCK-END-PHRASE1
...
some random text
INFO3:not-desired-val5
...
BLOCK-BEGIN bla bla INFO1:VAL3
variable lines of text
INFO2:VAL4
variable lines of text
POSSIBLE-BLOCK-END-PHRASE2
...
What I want to do is to first identify the blocks. In this example above, there are two blocks with same block beginning but different endings. Within each block, I want to extract then few information, i.e. INFO1,INFO2 in the example. The desired output in this case would be:
VAL1,VAL2
VAL3,VAL4
I know some basic of awk. Therefore, any solutions or hints are highly welcome. Thanks
Update: my first attempt
awk '/BLOCK-BEGIN/{printf substr($4,7)",";for (i = 0 ; i < NF; i++) getline; if($0 ~ '/^INFO2/') print substr($0,7)}'
The output is:
VAL1,VAL2
VAL3,VAL4
But is there a better way to do it? Any suggestions?

$ awk -v OFS=',' '
(split($NF,a,/:/) == 2) && sub(/^INFO/,"",a[1]) {
info[a[1]] = a[2]
if ( a[1] == 2 ) {
print info[1], info[2]
}
}
' file
VAL1,VAL2
VAL3,VAL4
Regarding the code you posted in your question:
printf substr($4,7)"," - never do printf <input data> as it'll fail when your input contains printf formatting characters, always do printf "%s", <input data> instead so that could should be written printf "%,",substr($4,7).
getline - there's aonly a few specific situations where getline is the right approach and when it is you have to write it securely. This isn't the right situation and it's not written securely. See awk.freeshell.org/AllAboutGetline.
for (i = 0 ; i < NF; i++) all field numbers, array indices, and string character positions in awk start at 1, not 0, so write your code to match to you don't trip over thinking arrays or anything else start at zero - for (i = 1 ; i <= NF; i++).
'foo... $0 ~ '/^INFO2/' ...bar' those inner 's are terminating the awk script body and so exposing what's between them to the shell for interpretation. Never do that. In this case idk why you thought you needed them as your code should just be 'foo... $0 ~ /^INFO2/ ...bar'.

With your shown samples only, please try following awk code.
awk -F'INFO[0-9]+:' '
/BLOCK-BEGIN/{
if(val2 && val1){
print val1","val2
}
val1=val2=""
val1=$NF
next
}
/^INFO[0-9]+:/{
val2=(val2?val2 ",":"") $NF
}
END{
if(val2 && val1){
print val1","val2
}
}
' Input_file

Do specific task for some lines, copy all other lines

I have set of specific lines in file where I would like to do some changes, and I want to just coppy all other lines. I imagine code should look something like this
awk -v imin=5 -v imax=10 -v shift=5.54545 '{
(NR==5){ print $1+5,$2; }
(NR==7){ print $1+shift,$2; }
((NR>imin)&&(NR<imax)){ print $1,$2,$3+shift; }
(NR == EVERY_OTHER_LINE){ print $0; }
}' input_data.dat
But I don't know how to do this (NR == EVERY_OTHER_LINE), meaning every line except the ones handled above.
Best what I found is here, but it is not really what I want.
https://unix.stackexchange.com/questions/563455/awk-print-all-remaining-lines

I would follow the following approach:
(NR==5){ print $1+5,$2; next }
(NR==7){ print $1+shift,$2; next }
((NR>imin) && (NR<imax)){ print $1,$2,$3+shift; next}
1;
We introduce the next command to avoid that any special lines have a secondary print statement
This is, however, a bit convoluted, so the following method for this particular case might be better:
{line=$0}
(NR==5) { line=$1+5 OFS $2 }
(NR==7) { line=$1+shift OFS $2 }
((NR>imin)&&(NR<imax)){ line = $1 OFS $2 OFS $3+shift }
{print line}
Ofcourse, if record 5 and 7 only have 2 fields and the records between imin and imax with imin>7 have 3 fields, then it is even easier:
(NR==5){ $1+=5 }
(NR==7){ $1+=shift }
(NR>imin)&&(NR<imax){ $3+=shift }
1

How can I store the length of a line into a var withing awk script?

I have this simple awk script with which I attempt to check the amount of characters in the first line.
if the first line has more of less than 10 characters I want to store the amount
of caracters into a var.
Somehow the first print statement works but storing that result into a var doesn't.
Please help.
I tried removing dollar sign " thelength=(length($0))"
and removing the parenthesis "thelength=length($0)" but it doen't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1

Two issues dealing with mixing ksh and awk scripting ...
no need to make a sub-shell call within awk to obtain the length; use thelength=length($0)
awk variables do not require a leading $ when being referenced; use print ... ,thelength
So your code becomes:
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' $1

Can someone help me getting average of a column using awk with condition on other column

awk -F, '{if ($2 == 0) awk '{ total += $3; count++ } END { print total/count }' CLN_Tapes_LON; }' /tmp/CLN_Tapes_LON
awk: {if ($2 == 0) awk {
awk: ^ syntax error
bash: count++: command not found

Just for fun, let's look at what's wrong with your original version and transform it into something that works, step by step. Here's your initial version (I'll call it version 0):
awk -F, '{if ($2 == 0) awk '{ total += $3; count++ } END { print total/count }' CLN_Tapes_LON; }' /tmp/CLN_Tapes_LON
The -F, sets the field separator to be the comma character, but your later comment seems to indicate that the columns (fields) are separated by spaces. So let's get rid of it; whitespace-separation is what awk expects by default. Version 1:
awk '{if ($2 == 0) awk '{ total += $3; count++ } END { print total/count }' CLN_Tapes_LON; }' /tmp/CLN_Tapes_LON
You seem to be attempting to nest a call to awk inside your awk program? There's almost never any call for that, and this wouldn't be the way to do it anyway. Let's also get rid of the mismatched quotes while we're at it: note in passing that you cannot nest single quotes inside another pair of single quotes that way: you'd have to escape them somehow. But there's no need for them at all here. Version 2:
awk '{if ($2 == 0) { total += $3; count++ } END { print total/count } }' /tmp/CLN_Tapes_LON
This is close but not quite right: the END block is only executed when all lines of input are finished processing: it doesn't make sense to have it inside an if. So let's move it outside the braces. I'm also going to tighten up some whitespace. Version 3:
awk '{if ($2==0) {total+=$3; count++}} END{print total/count}' /tmp/CLN_Tapes_LON
Version 3 actually works, and you could stop here. But awk has a handy way of specifying to run a block of code only against lines that match a condition: 'condition {code}' So yours can more simply be written as:
awk '$2==0 {total+=$3; count++} END{print total/count}' /tmp/CLN_Tapes_LON
... which, of course, is pretty much exactly what John1024 suggested.

$ awk '$2 == 0 { total += $3; count++;} END { print total/count; }' CLN_Tapes_LON
3
This assumes that your input file looks like:
$ cat CLN_Tapes_LON
CLH040 0 3
CLH041 0 3
CLH042 0 3
CLH043 0 3
CLH010 1 0
CLH011 1 0
CLH012 1 0
CLH013 1 0
CLH130 1 40
CLH131 1 40
CLH132 1 40
CLH133 1 40

Thought I'd try to do this without awk. Awk is clearly the better choice, but it's still a one-liner.
bc<<<"($(grep ' 0 ' file|tee >(wc -l>i)|cut -d\ -f3|tr '\n' '+')0)/"$(<i)
3
It extracts lines with 0 in the second column with grep. This is passed to tee for wc -l to count the lines and to cut to extract the third column. tr replaces the new lines with "+" which is put over the number of lines (i.e., "12 / 4"). This is then passed to bc.

Sum up from line "A" to line "B" from a big file using awk

aNumber|bNumber|startDate|timeZone|duration|currencyType|cost|
22677512549|778|2014-07-02 10:16:35.000|NULL|NULL|localCurrency|0.00|
22675557361|76457227|2014-07-02 10:16:38.000|NULL|NULL|localCurrency|10.00|
22677521277|778|2014-07-02 10:16:42.000|NULL|NULL|localCurrency|0.00|
22676099496|77250331|2014-07-02 10:16:42.000|NULL|NULL|localCurrency|1.00|
22667222160|22667262389|2014-07-02 10:16:43.000|NULL|NULL|localCurrency|10.00|
22665799922|70110055|2014-07-02 10:16:45.000|NULL|NULL|localCurrency|20.00|
22676239633|433|2014-07-02 10:16:48.000|NULL|NULL|localCurrency|0.00|
22677277255|76919167|2014-07-02 10:16:51.000|NULL|NULL|localCurrency|1.00|
This is the input (sample of million of line) i have in csv file.
I want to sum up duration based on date.
My concern is i want to sum up first 1000000 lines
the awk program i'm using is:
test.awk
BEGIN { FS = "|" }
NR>1 && NR<=1000000
FNR == 1{ next }
{
sub(/ .*/,"",$3)
key=sprintf("%10s",$3)
duration[key] += $5 } END {
printf "%-10s %16s,"dAccused","Duration"
for (i in duration) {
printf "%-4s %16.2f i,duration[i]
}}
i run my script as
$awk -f test.awk 'file'
The input i have doesn't condsidered my condition NR>1 && NR<=1000000
ANY SUGGESTION? PLEASE!

You're looking for this:
BEGIN { FS = "|" }
1 < NR && NR <= 1000000 {
sub(/ .*/, "", $3)
key = sprintf("%10s",$3)
duration[key] += $5
}
END {
printf "%-10s %16s\n", "dAccused", "Duration"
for (i in duration) {
printf "%-4s %16.2f i,duration[i]
}
}
A lot of errors become obvious with proper indentation.
The reason you saw 1,000,000 lines was due to this:
NR>1 && NR<=1000000
That is a condition with no action block. The default action is to print the current record if the condition is true. That's why you see a lot of awk one-liners end with the number 1

You didn't post any expected output and your duration field is always NULL so it's still not clear what you really want output, but this is probably the right approach:
$ cat tst.awk
BEGIN { FS = "|" }
NR==1 { for (i=1;i<NF;i++) f[$i] = i; next }
{
sub(/ .*/,"",$(f["startDate"]))
sum[$(f["startDate"])] += $(f["duration"])
}
NR==1000000 { exit }
END { for (date in sum) print date, sum[date] }
$ awk -f tst.awk file
2014-07-02 0
Instead of discarding your header line, it uses it to create an array f[] that maps the field names to their order in each line so instead of having to hard-code that duration is field 4 (or whatever) you just reference it as $(f["duration"]).
Any time your input file has a header line, don't discard it - use it so your script is not coupled to the order of fields in your input file.

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

converting awk command line to awk file - awk

Basically this has to do with not understanding BEGIN and END. I took out BEGIN and it does what I want. Should I delete this from stackoverflow? let me know.

Related

Using awk to analyze log file to identify blocks and to extract information

Do specific task for some lines, copy all other lines

How can I store the length of a line into a var withing awk script?

Can someone help me getting average of a column using awk with condition on other column

Sum up from line "A" to line "B" from a big file using awk

Categories

Resources