AWK multiple patterns substitution

Using AWK I'd like to process this text:
#replace count 12
in in
#replace in 77
main()
{printf("%d",count+in);
}
Into:
in in
ma77()
{pr77tf("%d",12+77);
}
When a '#replace' declaration occurs, only the code below it is affected. I've got:
/#replace/ { co=$2; czym=$3 }
!/#replace/ { gsub(co,czym); print }
However, I'm getting only
in in
ma77()
{pr77tf("%d",count+77);
}
in return. As you can see, only the second gsub works. Is there a simple way to remember all the substitutions?

You just need to use an array to store the substitutions:
$ awk '/#replace/{a[$2]=$3;next}{for(k in a)gsub(k,a[k])}1' file
in in
ma77()
{pr77tf("%d",12+77);
}
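The same one-liner, written out as a script file with comments (same logic; note that gsub treats each stored key as a regular expression, which is why replacing in also rewrites main and printf):
/#replace/ {          # a declaration line: remember the mapping ...
    a[$2] = $3        # ... e.g. a["count"]="12", a["in"]="77"
    next              # and don't print the declaration itself
}
{                     # every other line: apply all substitutions seen so far
    for (k in a)
        gsub(k, a[k])
}
1                     # a true pattern with no action prints the line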

Related

How can I store the length of a line into a var within an awk script?

I have this simple awk script with which I attempt to check the number of characters in the first line.
If the first line has more or less than 10 characters, I want to store the number
of characters in a var.
Somehow the first print statement works, but storing that result into a var doesn't.
Please help.
I tried removing the dollar sign "thelength=(length($0))"
and removing the parentheses "thelength=length($0)" but it doesn't print anything...
Thanks!
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=$(length($0))
print "The length of the first line is: ",$thelength;
exit 1;
}
}
END { print "STOP" }' $1
Two issues, both from mixing ksh and awk scripting:
no need to make a sub-shell call within awk to obtain the length; use thelength=length($0)
awk variables do not require a leading $ when being referenced; use print ... ,thelength
So your code becomes:
#!/bin/ksh
awk ' BEGIN {FS=";"}
{
if (NR==1)
if(length($0)!=10)
{
print(length($0))
thelength=length($0)
print "The length of the first line is: ",thelength;
exit 1;
}
}
END { print "STOP" }' $1
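If the goal is to have the length in a ksh variable after awk exits (not just inside awk), one common approach is to capture awk's output; a minimal sketch, assuming the same positional file argument (the variable name linelen is just for illustration):
#!/bin/ksh
# have awk print the length, then capture it in the shell
linelen=$(awk 'NR==1 { print length($0); exit }' "$1")
print "The length of the first line is: $linelen"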

Print smallest integer from file using awk custom function?

The awk function looks like this, in a file named fun.awk:
{
print small()
}
function small()
{
a[NR]=$0
smal=0
for(i=1;i<=3;i++)
{
if( a[i]<a[i+1])
smal=a[i]
else
smal=a[i+1]
}
return smal
}
The contents of awk.write:
1
23
32
The awk command is:
awk -f fun.awk awk.write
It gives me no result. Why?
I think you are going about this the wrong way. In awk, one approach might be:
NR == 1 {
small = $0
}
$0 < small {
small = $0
}
END {
print small
}
which simply sets small to the smallest value seen so far on each line, and prints it at the end. (Note that you need to start by initializing small on the first line.)
A simpler approach might just be to sort the lines as numbers with sort, and pick the first one.
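For example, assuming one integer per line as in awk.write:
sort -n awk.write | head -n 1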

Merging rows in a file | Performance Improvement

I have a file in which I have to merge 2 rows on the basis of:
- Common sessionID
- Immediate next matching pattern (GX with QG)
file1:
session=001,field01,name=GX1_TRANSACTION,field03,field04
session=001,field91,name=QG
session=001,field01,name=GX2_TRANSACTION,field03,field04
session=001,field92,name=QG
session=004,field01,name=GX1_TRANSACTION,field03,field04
session=002,field01,name=GX1_TRANSACTION,field03,field04
session=002,field01,name=GX2_TRANSACTION,field03,field04
session=002,field92,name=QG
session=003,field91,name=QG
session=003,field01,name=GX2_TRANSACTION,field03,field04
session=003,field92,name=QG
session=004,field91,name=QG
session=004,field01,name=GX2_TRANSACTION,field03,field04
session=004,field92,name=QG
I have created an awk script (I am new and learnt awk from this portal only) which produces my desired output.
Output1
session=001,field01,name=GX1_TRANSACTION,field03,field04,session=001,field91,name=QG
session=001,field01,name=GX2_TRANSACTION,field03,field04,session=001,field92,name=QG
session=002,field01,name=GX1_TRANSACTION,field03,field04,NOMATCH-QG
session=002,field01,name=GX2_TRANSACTION,field03,field04,session=002,field92,name=QG
session=003,field01,name=GX2_TRANSACTION,field03,field04,session=003,field92,name=QG
session=004,field01,name=GX1_TRANSACTION,field03,field04,session=004,field91,name=QG
session=004,field01,name=GX2_TRANSACTION,field03,field04,session=004,field92,name=QG
Output2: Pending
session=003,field91,name=QG
Awk:
{
if($0~/name=GX1_TRANSACTION/ || $0~/GX2_TRANSACTION/) {
if($1 in ccr)
print ccr[$1]",NOMATCH-QG";
ccr[$1]=$0;
}
if($0~/name=QG/) {
if($1 in ccr) {
print ccr[$1]","$0;
delete ccr[$1];
}
else {
print $0",NOUSER" >> Pending
}
}
}
END {
for (i in ccr)
print ccr[i]",NOMATCH-QG"
}
Command:
awk -F"," -v Pending=t -f a.awk file1
But the issue is that my "file1" is really big, so I want to improve the performance of this script. Is there any way I can improve it?
There are a couple of changes that may lead to small improvements in speed, and if not may give you some ideas for future awk scripts.
Don't "manually" test every line if you don't have to - raise the name= tests to the main awk loop. Currently your script checks $0 up to three times per line for a name= match.
Since you're using , as the FS, test the corresponding field ($3) instead of $0. It only saves a few leading chars of pattern matching in your example data.
Here's a refactored a.awk:
$3~/name=GX[12]_TRANSACTION/ {
if($1 in ccr)
print ccr[$1]",NOMATCH-QG";
ccr[$1]=$0;
}
$3~/name=QG/ {
if($1 in ccr) {
print ccr[$1]","$0;
delete ccr[$1];
}
else {
print $0",NOUSER" >> Pending
}
}
END { for (i in ccr) print ccr[i]",NOMATCH-QG" }
I've also condensed the GX pattern match to one regex. I get the same output as your example.
In any program, I/O (e.g. print statements) is usually the most time-consuming operation. In awk, though, there's an operation that's even slower: string concatenation. Because awk doesn't require you to pre-allocate memory for strings, memory is allocated dynamically, so each time you lengthen a string it has to be re-allocated. You can therefore speed up your program by removing the string concatenations, e.g. all those hard-coded ","s you're printing, instead of just setting/using the OFS.
I haven't really thought about the logic of your overall approach, but there are a couple of other tweaks you could try:
BEGIN{ FS=OFS="," }
NF {
if ($3 ~ /name=GX[12]_TRANSACTION/) {
if($1 in ccr) {
print ccr[$1], "NOMATCH-QG"
}
ccr[$1]=$0
}
else {
if($1 in ccr) {
print ccr[$1], $0
delete ccr[$1]
}
else {
print $0, "NOUSER" >> Pending
}
}
}
END {
for (i in ccr)
print ccr[i], "NOMATCH-QG"
}
Note that by setting FS in the script you no longer need to use -F"," on the command line.
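So the invocation from above reduces to:
awk -v Pending=t -f a.awk file1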
Are you sure you want >> instead of > on the print to "Pending"? Those two constructs don't mean the same thing in awk as they do in shell.
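In awk, > truncates the output file only the first time it is opened during the run and then appends on subsequent prints, while >> always appends and never truncates a pre-existing file. A quick illustration, using a scratch file f:
awk 'BEGIN { print "one" > "f"; print "two" > "f" }'   # f now holds both lines
awk 'BEGIN { print "three" >> "f" }'                   # f keeps its earlier contents too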

reproducing grep "my pattern" myfile.log | sort | uniq | wc -l in awk

If I perform this grep on my target file I get e.g. 275 as the result.
But I want to learn awk, so I tried this:
awk 'BEGIN { count=0 } /my pattern/ { count++ } END { print count }' myfile.log
And this prints the 275 as expected.
So getting ambitious I created an awk script like this:
BEGIN {
print "Log File Analysis";
message=0;
events=0;
}
{
/message/ { messages++; }
/event/ { events++; }
}
END {
print "messages:\t" messages;
print "events:\t" events;
}
I get a syntax error,
$ awk -f test_learn.awk test_log.log
awk: test_learn.awk:16: /message/ { messages++; }
awk: test_learn.awk:16: ^ syntax error
What am I doing wrong?
I am using awk from MinGW shell on windows 7.
try
awk 'BEGIN { count=0 }; /my pattern/{count++ }; END { print count }' myfile.log
OR
awk 'BEGIN { count=0}; { if ($0 ~ /my pattern/) count++ }; END { print count };' myfile.log
Better yet, as awk variables are initialized to zero by default, you don't need the BEGIN block at all:
awk '/my pattern/{count++ }; END { print count };' myfile.log
You can either have a default block applied to every line of the file, as in the 2nd example with the if, or you can have multiple blocks, "filtered" by pattern, as above and in your edited addition.
When writing one-liners like you have, some awks require the semicolon to separate the BEGIN and END blocks from the main block.
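As an aside, none of these reproduce the sort | uniq part of your pipeline; they count every matching line, duplicates included. To count distinct matching lines instead, a sketch in portable awk:
awk '/my pattern/ && !seen[$0]++ { count++ } END { print count+0 }' myfile.log
The seen array records each matching line the first time it appears, and count+0 makes the script print 0 rather than an empty string when nothing matches.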
Edit
Same idea with your 2nd issue: pattern-action pairs like /message/ { ... } must sit at the top level of the script, not nested inside another action block, which is what caused the syntax error. Integrating Ed Morton's improvements (thanks):
/message/ { messages++ }
/event/ { events++ }
END {
print "Log File Analysis"
print "messages:\t" messages
print "events:\t" events
}
IHTH

Using a variable defined inside AWK

I got this piece of script working. This is what I wanted:
input
3.76023 0.783649 0.307724 8766.26
3.76022 0.764265 0.307646 8777.46
3.7602 0.733251 0.30752 8821.29
3.76021 0.752635 0.307598 8783.33
3.76023 0.79528 0.307771 8729.82
3.76024 0.814664 0.307849 8650.2
3.76026 0.845679 0.307978 8802.97
3.76025 0.826293 0.307897 8690.43
with this script
#!/bin/bash
awk -F ', ' '
{
for (i=3; i<=10; i++) {
if (i==NR) {
npc1[i]=sprintf("%s", $1);
npc2[i]=sprintf("%s", $2);
npc3[i]=sprintf("%s", $3);
npRs[i]=sprintf("%s", $4);
print npc1[i],npc2[i],\
npc3[i], npRs[i];
}
}
} ' p_walls.raw
echo "${npc1[100]}"
But now I can't use those arrays, npc1[i], outside awk. That last echo prints nothing. Isn't it possible, or am I missing something?
AWK runs as a separate process; after it finishes, all of its internal data is gone. This is true for all external processes/commands. Bash only sees what bash builtins touch.
i is never 100, so why do you want to access npc1[100]?
What are you really trying to do? If you rewrite the question we might be able to help...
(Cherry on the cake is always good!)
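A common workaround is to have awk print the values and let the shell capture its output; a minimal sketch against the same p_walls.raw (the variable name first is just for illustration):
# capture one value into a shell variable
first=$(awk 'NR==1 { print $1; exit }' p_walls.raw)
echo "$first"
# or capture a whole column into a bash array, one element per line
npc1=( $(awk '{ print $1 }' p_walls.raw) )
echo "${npc1[3]}"    # bash arrays are 0-based, so this is the 4th value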
Sorry, but all of @yi_H's answer and comments above are correct.
But there's really no problem loading 2 sets of data into 2 separate arrays in awk, i.e.:
awk '{
  if (FILENAME == "file1") arr1[++i]=$0   # first file goes into arr1
  else                     arr2[++j]=$0   # same for file2
}
END {
  f1max=i; f2max=j
  for (i=1;i<=f1max;i++) {
    # put what you need here for arr1 processing
    #
    # don't forget that you can do things like
    if (arr1[i] in arr2) { print arr1[i] ": arr2[" arr1[i] "]=" arr2[arr1[i]] }
  }
  for (j=1;j<=f2max;j++) {
    # and here for arr2 processing
  }
}' file1 file2
You'll have to fill the actual processing for arr1[i] and arr2[j].
Also, get an awk book for the weekend and be up and running by Monday. It's easy. You can probably figure it out from grymoire.com/Unix/awk.html
I hope this helps.