I am working on the following code in an awk script and I need the output to be redirected to another file within the same script.
BEGIN { FS=OFS="," }
NR==1 {print; next}
{ $9 = sprintf("%0.2f", $9) }
{ a[$0]++ }
BEGIN { FS=OFS="," }
{ gsub(/\r/,"") }
FNR==1 { $10="Survival Percentage" }
FNR > 1 && ($5+0==$5 && $6+0==$6 && $3+0==$3){
$10=sprintf("%0.2f",(($5-$6)/$3)*100)
}1
END {
if (i>0){
for (i in a){
print "i" > nj.csv
}}}
This is my code and just by executing it I get an error pointing to the point between nj and csv (nj.csv). Any idea to solve it?
gsub(/\r/,"") is almost always the wrong thing to do, you probably meant sub(/\r$/,""), and print "i" > nj.csv should be print i > "nj.csv" but idk why you have 2 identical BEGIN sections or what the overall purpose of the script is as it doesn't seem to make any sense.
I have set of specific lines in file where I would like to do some changes, and I want to just coppy all other lines. I imagine code should look something like this
awk -v imin=5 -v imax=10 -v shift=5.54545 '{
(NR==5){ print $1+5,$2; }
(NR==7){ print $1+shift,$2; }
((NR>imin)&&(NR<imax)){ print $1,$2,$3+shift; }
(NR == EVERY_OTHER_LINE){ print $0; }
}' input_data.dat
But I don't know how to do this (NR == EVERY_OTHER_LINE), meaning every line except the ones handled above.
Best what I found is here, but it is not really what I want.
https://unix.stackexchange.com/questions/563455/awk-print-all-remaining-lines
I would follow the following approach:
(NR==5){ print $1+5,$2; next }
(NR==7){ print $1+shift,$2; next }
((NR>imin) && (NR<imax)){ print $1,$2,$3+shift; next}
1;
We introduce the next command to avoid that any special lines have a secondary print statement
This is, however, a bit convoluted, so the following method for this particular case might be better:
{line=$0}
(NR==5) { line=$1+5 OFS $2 }
(NR==7) { line=$1+shift OFS $2 }
((NR>imin)&&(NR<imax)){ line = $1 OFS $2 OFS $3+shift }
{print line}
Ofcourse, if record 5 and 7 only have 2 fields and the records between imin and imax with imin>7 have 3 fields, then it is even easier:
(NR==5){ $1+=5 }
(NR==7){ $1+=shift }
(NR>imin)&&(NR<imax){ $3+=shift }
1
I have a file in which I have to merge 2 rows on the basis of:
- Common sessionID
- Immediate next matching pattern (GX with QG)
file1:
session=001,field01,name=GX1_TRANSACTION,field03,field04
session=001,field91,name=QG
session=001,field01,name=GX2_TRANSACTION,field03,field04
session=001,field92,name=QG
session=004,field01,name=GX1_TRANSACTION,field03,field04
session=002,field01,name=GX1_TRANSACTION,field03,field04
session=002,field01,name=GX2_TRANSACTION,field03,field04
session=002,field92,name=QG
session=003,field91,name=QG
session=003,field01,name=GX2_TRANSACTION,field03,field04
session=003,field92,name=QG
session=004,field91,name=QG
session=004,field01,name=GX2_TRANSACTION,field03,field04
session=004,field92,name=QG
I have created an awk (I am new and learnt awk only from This portal only) which created my desired output.
Output1
session=001,field01,name=GX1_TRANSACTION,field03,field04,session=001,field91,name=QG
session=001,field01,name=GX2_TRANSACTION,field03,field04,session=001,field92,name=QG
session=002,field01,name=GX1_TRANSACTION,field03,field04,NOMATCH-QG
session=002,field01,name=GX2_TRANSACTION,field03,field04,session=002,field92,name=QG
session=003,field01,name=GX2_TRANSACTION,field03,field04,session=003,field92,name=QG
session=004,field01,name=GX1_TRANSACTION,field03,field04,session=004,field91,name=QG
session=004,field01,name=GX2_TRANSACTION,field03,field04,session=004,field92,name=QG
Output2: Pending
session=003,field91,name=QG
Awk:
{
if($0~/name=GX1_TRANSACTION/ || $0~/GX2_TRANSACTION/) {
if($1 in ccr)
print ccr[$1]",NOMATCH-QG";
ccr[$1]=$0;
}
if($0~/name=QG/) {
if($1 in ccr) {
print ccr[$1]","$0;
delete ccr[$1];
}
else {
print $0",NOUSER" >> Pending
}
}
}
END {
for (i in ccr)
print ccr[i]",NOMATCH-QG"
}
Command:
awk -F"," -v Pending=t -f a.awk file1
But Issue is my "file1" is really big, So I want to improve the performance of this script. Is their any way by which I can improve its performance?
There are a couple of changes that may lead to small improvements in speed, and if not may give you some ideas for future awk scripts.
Don't "manually" test every line if you don't have to - raise the name= tests to the main awk loop. Currently your script checks $0 up to three times per line for a name= match.
Since you're using , as the FS, test the corresponding field ($3) instead of $0. It only saves a few leading chars of pattern matching in your example data.
Here's a refactored a.awk:
$3~/name=GX[12]_TRANSACTION/ {
if($1 in ccr)
print ccr[$1]",NOMATCH-QG";
ccr[$1]=$0;
}
$3~/name=QG/ {
if($1 in ccr) {
print ccr[$1]","$0;
delete ccr[$1];
}
else {
print $0",NOUSER" >> Pending
}
}
END { for (i in ccr) print ccr[i]",NOMATCH-QG" }
I've also condensed the GX pattern match to one regex. I get the same output as your example.
In any program, IO (e.g. print statements) is usually the most real-time intensive operation. In awk there's an operation that's even slower, though, and that's string concatenation. Because awk doesn't require you to pre-allocate memory for strings, the memory gets allocated dynamically so then when you increase the length of a string, it must get dynamically re-allocated. So, you can speed up your program by removing the string concatenations, e.g. for all those hard-coded ","s you're printing instead of just setting/using the OFS.
I haven't really thought about the logic of your overall approach but there's a couple of other tweaks you could try:
BEGIN{ FS=OFS="," }
NF {
if ($3 ~ /name=GX[12]_TRANSACTION/) {
if($1 in ccr) {
print ccr[$1], "NOMATCH-QG"
}
ccr[$1]=$0
}
else {
if($1 in ccr) {
print ccr[$1], $0
delete ccr[$1]
}
else {
print $0, "NOUSER" >> Pending
}
}
}
END {
for (i in ccr)
print ccr[i], "NOMATCH-QG"
}
Note that by setting FS in the script you no longer need to use -F"," on the command line.
Are you sure you want >> instead of > on the print to "Pending"? Those 2 constructs don't mean the same in awk as they do in shell.
I would like to read a file like this
1.23213213
0.12321321
-1.12321321
0.23232322
into a variable, or array to use it somewhere in the main {} code.
But I would like to use it if this file exists. How can I check if it already exists or not, and if not, then do not use that variable or array?
I don't understand completely what you want to achieve, but perhaps something like this can be useful to you:
It process the file line by line and saves each one in an array, the key is the line number so you keep the order. In the END section check how many lines were processed and get if the file had content.
awk '{ line[ FNR ] = $0 } END { if ( FNR > 0 ) { print "File" } else { print "NO file" } }' infile
EDIT to comment:
But in awk you can process many files from command line.
BEGIN {
...
}
## Processing of first file in command line.
FNR == NR {
a[ FNR ] = $0
next
}
## Processing of second file in command line
FNR < NR {
## Check if array 'a' has the values you want and use them
## 'for(...)variable += a[i]' or whatever.
}
Run script like:
awk -f script.awk first_file.txt second_file.txt
But if first_file.txt doesn't exists, awk will complain with an error.
I got this piece of script working. This is what i wanted:
input
3.76023 0.783649 0.307724 8766.26
3.76022 0.764265 0.307646 8777.46
3.7602 0.733251 0.30752 8821.29
3.76021 0.752635 0.307598 8783.33
3.76023 0.79528 0.307771 8729.82
3.76024 0.814664 0.307849 8650.2
3.76026 0.845679 0.307978 8802.97
3.76025 0.826293 0.307897 8690.43
with script
!/bin/bash
awk -F ', ' '
{
for (i=3; i<=10; i++) {
if (i==NR) {
npc1[i]=sprintf("%s", $1);
npc2[i]=sprintf("%s", $2);
npc3[i]=sprintf("%s", $3);
npRs[i]=sprintf("%s", $4);
print npc1[i],npc2[i],\
npc3[i], npc4[i];
}
}
} ' p_walls.raw
echo "${npc1[100]}"
But now I can't use those arrays npc1[i], outside awk. That last echo prints nothing. Isnt it possible or am I missing something?
AWK is a separate process, after it finishes all internal data is gone. This is true for all external processes/commands. Bash only sees what bash builtins touch.
i is never 100, so why do you want to access npc1[100]?
What are you really trying to do? If you rewrite the question we might be able to help...
(Cherry on the cake is always good!)
Sorry, but all of #yi_H 's answer and comments above are correct.
But there's really no problem loading 2 sets of data into 2 separate arrays in awk, ie.
awk '{
if (FILENAME == "file1") arr1[i++]=$0 ;
#same for file2; }
END {
f1max=++i; f2max=++j;
for (i=1;i<f1max;i++) {
arr1[i]
# put what you need here for arr1 processing
#
# dont forget that you can do things like
if (arr1[i] in arr2) { print arr1[i]"=arr2[arr1["i"]=" arr2[arr1[i]] }
}
for j=1;j<f2max;j++) {
arr2[j]
# and here for arr2
}
}' file1 file2
You'll have to fill the actual processing for arr1[i] and arr2[j].
Also, get an awk book for the weekend and be up and running by Monday. It's easy. You can probably figure it out from grymoire.com/Unix/awk.html
I hope this helps.