File size grows greatly after using awk

I want to append the row number to each line of a file, so I ran this:
awk '{print $0 "\x03" NR > "/opt/data2/gds_test/test_partly.txt"}' /opt/data2/gds_test/test_partly.txt
I put this command in a shell script and ran it. After some time it still had not finished, so I killed it, and then found that the source file had grown from 1.7G to 242G.
What happened? I am a little confused.
I had tested this awk command on a small file at the command line, and it seemed to work fine.

You're reading from the front of a file at the same time as you're writing onto the end of it. Try this instead:
tmp=$(mktemp)
awk '{print $0 "\x03" NR}' '/opt/data2/gds_test/test_partly.txt' > "$tmp" &&
mv "$tmp" '/opt/data2/gds_test/test_partly.txt'
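A small-scale sketch of the same pattern may help. It uses a throwaway directory and made-up sample lines (both hypothetical, not the asker's data), and writes the separator as the octal escape "\003", which is the same byte as "\x03" but is also understood by awk implementations that lack hex escapes. A plausible explanation for the small-file test passing is that awk had already read the entire file into its input buffer before the first print truncated it; with a large file, awk goes on to read lines it has just written.

```shell
# Hypothetical sample data in a scratch directory (not the asker's real path).
workdir=$(mktemp -d)
file="$workdir/sample.txt"
printf 'alpha\nbeta\ngamma\n' > "$file"

# Write to a temporary file in the same directory, then replace the original
# only if awk exited successfully.
tmp=$(mktemp "$workdir/tmp.XXXXXX")
awk '{print $0 "\003" NR}' "$file" > "$tmp" && mv "$tmp" "$file"

# Show the result with the \003 separator made visible as "|".
result=$(tr '\003' '|' < "$file")
printf '%s\n' "$result"
```

Keeping the temporary file in the same directory as the target means the final mv is a rename rather than a copy across filesystems.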

Yes, I changed the command to redirect the result to a temporary file, then deleted the original file and renamed the temporary file, and that works.
I also just learned that gawk -i inplace can be used.

Related

BEGIN and END blocks in awk

I am using the awk command in terminal on my Mac.
I want to print the contents of an existing file with a title above each column, separated by tabs, and send the output to another file. What code would I use to add the column titles? I'm hoping to use simple awk commands and, if possible, complete the task in as few lines as possible.
So far I have tried to use the BEGIN block. (The titles I want are First Name, Second Name and Score.)
BEGIN { print "First Name\tSecond Name\tScore" }
Then I want to print the entire contents of the file:
{print} filename.txt
Finally I want to save the output to another file:
End{print} filename.txt > output.txt
To do this all together:
awk 'BEGIN {print "First Name\tSecond Name\tScore";}
{print}
End{print}' filename.txt > output.txt
However, this only saved the titles to the output file and not the contents of the original file under the columns.
awk processes files line by line. The BEGIN keyword marks a block of code that is executed once, before any input is read. Likewise, an END block runs once, after every line of the file has been processed.
While your code has some superfluous bits in it, like the unnecessary End block, it should still do exactly what you want, assuming you have data in your filename.txt. (As written, End is not the END keyword: awk treats it as an uninitialized variable, which is false, so that rule never fires. A real END{print} would, in most awk implementations, print the last line a second time.)
A more succinct awk command would be:
awk 'BEGIN {print "First Name\tSecond Name\tScore";}1' filename.txt > output.txt
In action (using commas instead of tabs because it's easier and I'm lazy):
$ echo "1,2,3" > filename.txt
$ awk 'BEGIN {print "c1,c2,c3"}1' filename.txt > output.txt
$ cat output.txt
c1,c2,c3
1,2,3
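To see all three kinds of rule together, here is a minimal sketch with tab-separated sample data (the names, scores, and scratch directory are invented for illustration). BEGIN prints the header once, the bare {print} rule runs for every input line, and END runs once at the very end. The footer that END writes here is only there to show when END fires; for the task in the question it would simply be left out.

```shell
# Made-up sample data in a scratch directory.
workdir=$(mktemp -d)
printf 'Ann\tLee\t90\nBob\tRay\t85\n' > "$workdir/filename.txt"

awk 'BEGIN { print "First Name\tSecond Name\tScore" }  # once, before any input
     { print }                                         # once per input line
     END   { print NR " lines read" }                  # once, after the last line
' "$workdir/filename.txt" > "$workdir/output.txt"

cat "$workdir/output.txt"
```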

Check if all multiple strings exist in one line

I have a file that contains this:
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_NETAPP_7890_2D5_1D8
IRE_DRO_Fabric_A drogesx0112_SAN_A
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_ISIL03_081_873
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_NETAPP_7890_9D3_2D8
IRE_DRO_Fabric_B drogesx0112_SAN_B
I want to check whether multiple strings are found on the same line. I tried these commands, but they are not working. Is this possible for this kind of text?
grep 'drogesx0112.*ISIL03_091_871\|ISIL03_091_871.*drogesx0112' file << tried this but not working
grep 'drogesx0112' file | grep 'ISIL03_091_871' << tried this but not working
I am looking for this output (the two strings are string1 (drogesx0112) and string2 (ISIL03_091_871)):
>grep 'drogesx0112.*ISIL03_091_871\|ISIL03_091_871.*drogesx0112' file # command
>IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871 < output
In other words, I want to check whether drogesx0112 and ISIL03_091_871 are both present on a single line of the file.
Simple awk
$ awk ' /drogesx0112/ && /ISIL03_091_871/ ' gafm.txt
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
$
Simple Perl
$ perl -ne ' print if /drogesx0112/ and /ISIL03_091_871/ ' gafm.txt
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
$
If the order does not matter and you simply want to check whether both strings are present on a single line, try the following.
awk '/drogesx0112/ && /ISIL03_091_871/' Input_file
If you need the strings in a particular order on the line:
If the line has drogesx0112 first and then ISIL03_091_871, try the following.
awk '/drogesx0112.*ISIL03_091_871/' Input_file
If the line has ISIL03_091_871 first and then drogesx0112, try the following.
awk '/ISIL03_091_871.*drogesx0112/' Input_file
This might work for you (GNU sed):
sed '/drogesx0112/!d;/ISIL03_091_871/!d' file
This deletes the current line if it does not contain drogesx0112, and also deletes it if it does not contain ISIL03_091_871, so only lines containing both are printed.
Another way:
sed -n '/drogesx0112/{/ISIL03_091_871/p}' file
A third:
sed '/drogesx0112/{/ISIL03_091_871/p};d' file
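The patterns above are regular expressions. When the search terms should be treated as literal text, a sketch like the following passes them in as awk variables and tests each with index(), which returns 0 when the substring is absent. The variable names s1 and s2 and the scratch file are assumptions made for this example; the data is the sample from the question.

```shell
# Recreate the sample data from the question in a scratch directory.
workdir=$(mktemp -d)
cat > "$workdir/file" <<'EOF'
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_ISIL03_091_871
IRE_DRO_Fabric_A drogesx0112_IRE_DRO_A_NETAPP_7890_2D5_1D8
IRE_DRO_Fabric_A drogesx0112_SAN_A
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_ISIL03_081_873
IRE_DRO_Fabric_B drogesx0112_IRE_DRO_B_NETAPP_7890_9D3_2D8
IRE_DRO_Fabric_B drogesx0112_SAN_B
EOF

# Print a line only if it contains both literal strings, in any order.
matches=$(awk -v s1='drogesx0112' -v s2='ISIL03_091_871' \
  'index($0, s1) && index($0, s2)' "$workdir/file")
printf '%s\n' "$matches"
```

Further terms can be added by chaining more index() tests with &&.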

How to print multiple files in awk?

What is wrong with this command, please? I would like to print all lines from file01, file02, file03 ... file11.
awk '{print}' file[01-11].txt > file
Assuming you are running this in Bash, the [01-11] section is not in the correct format. A bracket expression in a glob matches a single character, so file[01-11].txt can only match file0.txt and file1.txt. Instead, consider the following:
awk '{print}' file{01..11}.txt > file
Again, this assumes a specific shell. If you are running this awk command in a shell that does not support the {##..##} notation, consider testing how your file[01-11].txt expands first; I imagine it's not expanding to the files you think.
How about using cat itself for this, since you are only printing and not doing any other operation:
cat Input_file{01..11}.txt > file
In case you really want to do it only in awk, then try:
awk '1' Input_file{01..11}.txt > file
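A quick way to check what a pattern expands to is to try it on dummy files. The sketch below creates file01.txt through file11.txt in a scratch directory (hypothetical contents) and then selects them with two plain globs, file0[1-9].txt and file1[01].txt, which work in any POSIX shell. To my knowledge, zero-padded brace ranges such as {01..11} need Bash 4 or later, so the glob form is a fallback for older or other shells.

```shell
# Create eleven dummy files in a scratch directory.
workdir=$(mktemp -d)
for n in 01 02 03 04 05 06 07 08 09 10 11; do
  printf 'line from file%s\n' "$n" > "$workdir/file$n.txt"
done

# Globs expand in sorted order: file01..file09 first, then file10 and file11.
awk '{print}' "$workdir"/file0[1-9].txt "$workdir"/file1[01].txt > "$workdir/combined"

# Count the lines collected (one per input file).
wc -l < "$workdir/combined"
```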

Why does awk print the file content when there is no print command?

I have an awk file that reads each word from a file into an array. There is no print command in it, but when I run it, the whole content of the file is printed.
#!/bin/awk -f
{
for(i=1;i<=NF;i++)
used[$i]=1
}
When I run this awk file like this:
awk 1.awk 2
the whole content of file 2 is printed on the screen, and I am confused.
I tried the same program directly on the command line and nothing was printed, so I think something is wrong with the file or with the way I run it.
You missed the -f option: awk -f 1.awk 2
Without -f, awk does not read its program from the file "1.awk"; it takes the text 1.awk itself as the program.
You have essentially done this: awk '"1.awk"' 2
Since that is a "true" value with no action attached, the default action applies: print each record of the data contained in file "2".
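The difference is easy to reproduce. In this sketch the script from the question is saved as 1.awk in a scratch directory next to a two-line data file named 2 (the data is made up). The first run uses -f and prints nothing; the second passes a string constant as the program and echoes every line.

```shell
workdir=$(mktemp -d)
cat > "$workdir/1.awk" <<'EOF'
#!/bin/awk -f
{
    for (i = 1; i <= NF; i++)
        used[$i] = 1
}
EOF
printf 'one two\nthree four\n' > "$workdir/2"

# With -f, awk reads the program from 1.awk: it fills the array and prints nothing.
with_f=$(awk -f "$workdir/1.awk" "$workdir/2")

# Without -f, the first argument IS the program. A non-empty string constant is
# always true, and a pattern with no action prints the record.
without_f=$(awk '"1.awk"' "$workdir/2")

printf 'with -f: [%s]\nwithout -f:\n%s\n' "$with_f" "$without_f"
```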

Append prefix to first column of a file with awk

I have a couple of hundred files that I want to process with xargs. They all need a fix to their first column.
Therefore I need an awk command that adds the prefix "ID_" to the first column of a file (except for the first line, the header). Can anyone help me with this?
Something along these lines:
gawk -f ';' "{$1='ID_' $1; print $0}" file.csv > file_processed.csv
I am no expert with this command, though. I would also prefer in-place processing over making a copy of each file. Previously I did this in Vim, but then I only had one file.
:%s/^-/ID_/
I hope someone can help me here.
gawk 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv > file_processed.csv
FS and OFS set the input and output field separators, respectively.
NR>1 checks whether the current line number is larger than 1, so the header line is not modified.
You can also modify the file in place with the -i inplace option:
gawk -i inplace 'BEGIN{FS=";"; OFS=";"} {if(NR>1) $1="ID_"$1; print}' file.csv
Edit
After the original question was clarified, here's the final version:
gawk -i inplace 'BEGIN{FS=OFS=";"} NR>1{sub(/^-/,"ID_",$2)} 1' file.csv
which replaces a - at the beginning of the second column with ID_.
The NR>1 action applies to all but the first (header) line. The trailing 1 invokes the default print action.
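One caveat when many files are involved: NR counts records across all files given to a single awk process, so with several files per invocation (which xargs may produce) NR>1 protects only the first file's header. FNR restarts at 1 for each file. The sketch below shows the FNR form with plain awk writing to standard output, so that it does not depend on gawk; the two CSV files and their contents are invented for the example.

```shell
# Two made-up CSV files, each with its own header line.
workdir=$(mktemp -d)
printf 'id;val\n1;x\n2;y\n' > "$workdir/a.csv"
printf 'id;val\n7;z\n'      > "$workdir/b.csv"

# FNR>1 skips the header of EACH file; the trailing 1 prints every line.
out=$(awk 'BEGIN{FS=OFS=";"} FNR>1{$1="ID_" $1} 1' "$workdir/a.csv" "$workdir/b.csv")
printf '%s\n' "$out"
```

With NR>1 in place of FNR>1, the second header would come out as ID_id;val.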
If you just want to add a prefix to the first field, that is no different from adding the prefix to the whole line.
So awk '$0 = "ID_" $0' file.csv should do the work. If you want to make the change "in place", you can do:
awk '$0="ID_"$0' file.csv >/tmp/foo && mv /tmp/foo file.csv
You can also make use of sed:
sed -i 's/^/ID_/' file
The -i option performs in-place modification (GNU sed syntax; BSD/macOS sed expects an argument after -i, as in -i ''). Note that these commands prefix every line, including the header.
You mentioned Vim and gave the s/^-/ID_/ command. That does not add the prefix ID_; it replaces a leading - with ID_, which is a different thing.
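To leave the header line untouched with the whole-line approach, the rule can be limited to line 2 onward. This sketch writes to standard output so that it behaves the same with GNU and BSD tools; sample.csv and its contents are made up for the example.

```shell
workdir=$(mktemp -d)
printf 'id;val\n1;x\n2;y\n' > "$workdir/sample.csv"

# sed: apply the substitution only from line 2 to the last line.
sed_out=$(sed '2,$s/^/ID_/' "$workdir/sample.csv")

# awk: rewrite $0 on lines after the first; the trailing 1 prints every line.
awk_out=$(awk 'NR>1{$0="ID_" $0} 1' "$workdir/sample.csv")

printf '%s\n' "$sed_out"
```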