Need customized output filename using awk command

How do I split a file with awk and customize the names of the output files?
I have tried:
awk -v RS=$val '{ outfile = "'$filename'""." NR "'.${extension}'"; print > outfile}' $file

You can use sprintf to build the filenames, e.g.:
outputfile = sprintf ("filename%03d.txt", NR);
Some things to note about your script:
You cannot access shell variables directly in your awk script; you need to create awk variables using the -v option.
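So the script should look like this (quoting the shell variables and closing each output file after writing it):
awk -v RS="$val" -v filename="$filename" -v extension="$extension" '{ outputfile = sprintf(filename "%03d" extension, NR); print > outputfile; close(outputfile) }' "$file"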
awk doesn't have an explicit concatenation operator; writing two strings next to each other concatenates them.
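A minimal sketch of the same idea, run on a small sample with one record per line (the default RS). Here the name parts are passed to sprintf as %s arguments instead of being concatenated into the format string, so a % in a name cannot be misread as a format directive. The names prefix, ext and input.txt are hypothetical.

```shell
# Work in a throwaway directory so the example files don't clutter anything.
cd "$(mktemp -d)"
printf 'first\nsecond\nthird\n' > input.txt

# Hypothetical name parts: "part_" prefix and ".txt" extension.
awk -v prefix="part_" -v ext=".txt" '{
    out = sprintf("%s%03d%s", prefix, NR, ext)  # part_001.txt, part_002.txt, ...
    print > out
    close(out)  # keep the number of open file descriptors low
}' input.txt

ls part_*.txt
```

Because every record gets its own file, closing it right after the print is safe; reopening a closed file with > in the same run would truncate it again.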

Related

How do I append the output of an awk script to a file?

Hello, I have the following awk script:
#!/bin/awk -f
{for(i=4;i<=7;i++) j+=$i; print "Student",NR",",$1,$2",",j/4; j=0}
and I want to append the output to a new file (newfile.txt).
Try the following, which names the output file inside the awk code itself:
BEGIN { output_file = "newfile.txt" }
{for(i=4;i<=7;i++) j+=$i; print "Student" OFS NR",",$1,$2"," OFS j/4 >> (output_file); j=0}
Or, the second way: redirect the output when you run your awk program, where script is the name of your awk script file, e.g.:
./script Input_file >> newfile.txt
If you want to run it as an awk one-liner that writes to an output file, try the following:
awk -v output_file="Output.txt" '{for(i=4;i<=7;i++) j+=$i; print "Student" OFS NR",",$1,$2"," OFS j/4 >> (output_file); j=0}' Input_file
In the above code I created a variable named output_file, whose value you can set to whatever you like.
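Whether to use > or >> inside awk depends on whether the file should start fresh on each run. A small sketch of the difference (a.txt and b.txt are made-up names): > truncates a file only once, when awk first opens it during a run, and later prints in the same run append to it; >> never truncates.

```shell
cd "$(mktemp -d)"
echo 'old line' > a.txt
echo 'old line' > b.txt

# a.txt is truncated when first opened, then both lines are written to it.
# b.txt is opened in append mode, so "old line" survives.
printf '1\n2\n' | awk '{ print > "a.txt"; print >> "b.txt" }'

cat a.txt   # prints 1 and 2 on separate lines
cat b.txt   # prints old line, 1 and 2 on separate lines
```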
Don't use an awk shebang (#!/bin/awk -f); call awk from a shell script instead, because the awk shebang robs you of the ability to separate your functionality by what each tool (the shell or awk) does best. Make your script look like this (using whichever shell you prefer in the shebang):
#!/usr/bin/env bash
awk '
{for(i=4;i<=7;i++) j+=$i; print "Student",NR",",$1,$2",",j/4; j=0}
' "$@" >> newfile.txt
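A sketch of the wrapper script in use, run twice on a one-line sample to show that the shell's >> appends rather than overwrites. The script name report.sh and the contents of scores.txt are hypothetical; fields 4 to 7 hold the marks that get averaged.

```shell
cd "$(mktemp -d)"
cat > report.sh <<'EOF'
#!/usr/bin/env bash
awk '
{for(i=4;i<=7;i++) j+=$i; print "Student",NR",",$1,$2",",j/4; j=0}
' "$@" >> newfile.txt
EOF

printf 'Ann Lee x 80 90 70 60\n' > scores.txt   # hypothetical input line

bash report.sh scores.txt   # or: chmod +x report.sh && ./report.sh scores.txt
bash report.sh scores.txt   # second run appends another line
cat newfile.txt             # "Student 1, Ann Lee, 75" twice
```

"$@" forwards every argument given to the script to awk unchanged, so the script can be pointed at one or several input files.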

Count field separators on each line of input file and if missing/exceeding, output filename to error file

I have to validate the input file, Input.txt, for the proper number of field separators on each row. If even one row, including the header, is missing or exceeding the correct number of field separators, then print the name of the file to errorfiles.txt and exit.
I have another file, valid.txt, to use as a reference for the correct number of field separators; I want to compare the number of field separators on each row of the input file with the number in valid.txt.
awk -F '|' '{ print NF-1; exit }' valid.txt > fscount
awk -F '|' '(NF-1) != "cat fscount" { print FILENAME>"errorfiles.txt"; exit}' Input.txt
This is not working.
It is not fully clear what your requirement is: printing FILENAME makes little sense when only a single input file is given. Perhaps you wanted to loop over a list of files in a directory, running this command on each?
Anyway, to use the content of the file inside awk, pass it in with the -v switch, reading the file with input redirection ($(<fscount) works in bash, ksh and zsh; use $(cat fscount) in a plain POSIX shell):
awk -F '|' -v count="$(<fscount)" -v fname="errorfiles.txt" '(NF-1) != (count+0) { print FILENAME > fname; close(fname); exit}' Input.txt
Notice the use of close(fname) here. Closing a file you write from awk is good practice, and it is necessary when you open many files or need to re-read a file you just wrote. The close() call explicitly closes the file descriptor associated with the file instead of leaving it open until awk exits.
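If the goal is the one guessed at above, checking several files, a sketch could loop in the shell and let each awk run report at most one bad file. The file names and contents are made up for illustration, and the reference count is captured with command substitution instead of a temporary fscount file.

```shell
cd "$(mktemp -d)"
printf 'a|b|c\n' > valid.txt
printf 'h1|h2|h3\n1|2|3\n' > good.txt
printf 'h1|h2|h3\n1|2\n' > bad.txt      # second row is missing a separator

count=$(awk -F '|' '{ print NF-1; exit }' valid.txt)   # separators in the reference row

: > errorfiles.txt   # start with an empty list
for f in good.txt bad.txt; do
    # Print the name on the first bad row and stop; the shell's >> appends across runs.
    awk -F '|' -v count="$count" '(NF-1) != (count+0) { print FILENAME; exit }' "$f" >> errorfiles.txt
done

cat errorfiles.txt   # bad.txt
```

Appending with the shell's >> matters in a loop: each new awk process that opened errorfiles.txt with > would truncate it, losing the names found in earlier iterations.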
GNU awk solution:
awk -F '|' 'ARGIND==1{aimNF=NF; nextfile} ARGIND==2{if (NF!=aimNF) {print FILENAME > "errorfiles.txt"; exit}}' valid.txt Input.txt
You can do it with just one command: have awk read both files, store the NF of the first file, and compare it with NF on each line of the second file.
For other awks, replace ARGIND==1 with FILENAME==ARGV[1], and so on.
Or, if you are sure the first file won't be empty, use NR==FNR and NR>FNR instead.
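A portable sketch of the one-command approach using FNR==NR, so it relies on valid.txt being non-empty; only the first line of valid.txt is taken as the reference. The sample file contents are invented.

```shell
cd "$(mktemp -d)"
printf 'a|b|c\n' > valid.txt                  # reference: 3 fields, 2 separators
printf 'h1|h2|h3\n1|2|3\n4|5\n' > Input.txt   # last row is one field short

awk -F '|' '
    # FNR == NR only while reading the first file (valid.txt).
    FNR == NR { if (FNR == 1) want = NF; next }
    # Any row of Input.txt with a different field count: report and stop.
    NF != want { print FILENAME > "errorfiles.txt"; exit }
' valid.txt Input.txt

cat errorfiles.txt   # Input.txt
```

Comparing NF is equivalent to comparing separator counts for non-empty rows, since a row with n | separators has n+1 fields.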

awk - Search pattern through variable

We have written a shell script that searches for multiple file name patterns.
File name format:
<number>_<20180809>.txt
starting with a single digit and ending with an 8-digit number
Command:
awk -v string="12_1234" -v serch="^[0-9]+_+[0-9][0-9][0-9][0-9]$" "BEGIN{ if (string ~/serch$/) print string }"
If the string matches, it should be printed.
You can just change your command in the following way and it will work:
awk -v string='12_1234' -v search='^[0-9]+_+[0-9][0-9][0-9][0-9]$' 'BEGIN{ if (string ~ search) print string }'
12_1234
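You do not need the /.../ syntax for a regex when you use the ~ operator; in fact, inside /.../ the word serch is matched literally rather than as your variable. You also had one extra $, and wrapping the script in single quotes keeps the shell from touching any $ in it. You were really close!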
Then adapt the search regex to ^[0-9]_[0-9]{8}$ to match exactly your <number>_<20180809> pattern (the {8} interval needs an awk that supports interval expressions, e.g. GNU awk 4.0 or later):
$ awk -v string='1_12345678' -v search='^[0-9]_[0-9]{8}$' 'BEGIN{ if (string ~ search) print string }'
1_12345678
Also, if you are just extracting this information from a file, you can use grep:
$ search='^[0-9]_[0-9]{8}$'; echo '1_12345678' | grep -oE "$search"
1_12345678
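A sketch that checks full file names, including the .txt extension that the regex above leaves out. The eight digits are spelled out as [0-9] repeated eight times so it also works in awks without {n} interval support (some older mawk releases, for instance), and [.] matches a literal dot without any backslash escaping in the -v value. The sample names are invented.

```shell
# Eight explicit digit classes instead of [0-9]{8}.
d8='[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

matches=$(printf '%s\n' 1_20180809.txt 12_20180809.txt 3_2018080.txt 4_20180809.csv |
  awk -v search="^[0-9]_${d8}[.]txt\$" '$0 ~ search')

echo "$matches"   # 1_20180809.txt
```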

Exact string match in awk

I have a file test.txt with the following lines:
1997 100 500 2010TJ
2010TJXML 16 20 59
I'm using the following awk command to get only the lines about the string 2010TJ:
awk -v var="2010TJ" '$0 ~ var {print $0}' test.txt
But the code prints both lines. How do I get only the line containing the exact string
1997 100 500 2010TJ
The string can appear in any column of the file.
Several options:
Use a gawk word boundary (not POSIX awk...):
$ gawk '/\<2010TJ\>/' file
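If 2010TJ is the first column, anchor it and match the actual space, tab, or whatever separates the columns after it:
$ awk '/^2010TJ /' file
Or compare the field directly to the string (this tests column 1 only; use $NF to test the last column, where 2010TJ sits in your sample line):
$ awk '$1=="2010TJ"' file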
You can loop over the fields to test each field if you wish:
$ awk '{for (i=1;i<=NF;i++) if ($i=="2010TJ") {print; next}}' file
Or, since your example sets a variable, the same tests using a variable:
$ gawk -v s=2010TJ '$0~"\\<" s "\\>"' file
$ awk -v s=2010TJ '$0~"^" s " "' file
$ awk -v s=2010TJ '$1==s' file
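Note the first is a little different from the second and third. The first matches the standalone word 2010TJ anywhere in $0; the second and third match only when 2010TJ is the first column, so neither matches your sample line, where 2010TJ is the last column (use the field loop to test every column).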
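For the sample in the question, a portable sketch combines the field loop with a -v variable, as in the question's own command. It tests each field for string equality, so 2010TJXML does not count as a match.

```shell
cd "$(mktemp -d)"
printf '1997 100 500 2010TJ\n2010TJXML 16 20 59\n' > test.txt

# Compare every field with s; print the line once on the first hit.
result=$(awk -v s=2010TJ '{ for (i = 1; i <= NF; i++) if ($i == s) { print; next } }' test.txt)
echo "$result"   # 1997 100 500 2010TJ
```

If s itself looks like a number (say 100), awk may compare numerically, so a field such as 100.0 would also match; writing ($i "") == s forces a string comparison.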
Try this (tests column 1 only):
awk '$1 == "2010TJ" {print $0}' test.txt
or, grep-like (all columns):
gawk '/\<2010TJ\>/ {print $0}' test.txt
Note: \< and \> are word boundaries (a GNU awk extension).
Another awk with a word boundary (also GNU awk only):
awk '/\y2010TJ\y/' file
Note: \y matches either the beginning or the end of a word.

Awk print string with variables

How do I print a string with variables?
I'm trying this:
awk -F ',' '{printf /p/${3}_abc/xyz/${5}_abc_def/}' file
I need this output:
/p/APPLE_abc/xyz/MANGO_abc_def/
where ${3} = APPLE
and ${5} = MANGO
printf substitutes its arguments into the format string. With this as the test file:
$ cat file
a,b,APPLE,d,MANGO,f
We can use printf to achieve the output you want as follows:
$ awk -F, '{printf "/p/%s_abc/xyz/%s_abc_def/\n",$3,$5;}' file
/p/APPLE_abc/xyz/MANGO_abc_def/
In printf, %s means "insert the next argument here as a string". We have two occurrences of %s: one for $3 and one for $5.
The printf isn't necessary here, although the alternative is arguably less readable: awk concatenates adjacent strings, so you can put the fields directly between the quoted string portions.
$ cat file.txt
1,2,APPLE,4,MANGO,6,7,8
$ awk -F, '{print "/p/" $3 "_abc/xyz/" $5 "_abc_def/"}' file.txt
/p/APPLE_abc/xyz/MANGO_abc_def/
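A quick check that both answers produce the same text, on the question's sample line plus a second, invented line whose third field contains a %. That % is printed literally because it reaches printf as an argument to %s rather than as part of the format string.

```shell
data='a,b,APPLE,d,MANGO,f
x,y,50%,z,KIWI'

with_printf=$(printf '%s\n' "$data" | awk -F, '{ printf "/p/%s_abc/xyz/%s_abc_def/\n", $3, $5 }')
with_concat=$(printf '%s\n' "$data" | awk -F, '{ print "/p/" $3 "_abc/xyz/" $5 "_abc_def/" }')

printf '%s\n' "$with_printf"
# /p/APPLE_abc/xyz/MANGO_abc_def/
# /p/50%_abc/xyz/KIWI_abc_def/
```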