Filter file with awk and keep header in output

I have the following CSV file:
a,b,c,d
x,1,1,1
y,1,1,0
z,1,0,0
I want to keep the lines whose numeric fields add up to more than 1, so I execute this awk command:
awk -F "," 'NR > 1{s=0; for (i=2;i<=NF;i++) s+=$i; if (s>1)print}' file
And obtain this:
x,1,1,1
y,1,1,0
How can I do the same but retain the first line (header)?

$ awk -F "," 'NR==1; NR > 1{s=0; for (i=2;i<=NF;i++) s+=$i; if (s>1)print}' file
a,b,c,d
x,1,1,1
y,1,1,0
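Here `NR==1;` is a pattern with no action, so awk falls back on the default action of printing the record. Spelled out explicitly, it is equivalent to:
$ awk -F "," 'NR==1{print} NR>1{s=0; for (i=2;i<=NF;i++) s+=$i; if (s>1) print}' file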

Since it's only 0s and 1s:
$ awk 'NR==1 || gsub(/1/, "1") > 1' file
a,b,c,d
x,1,1,1
y,1,1,0
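This works because gsub() returns the number of substitutions it made, so replacing every 1 with itself counts the 1s on the line without changing the record. The same count can be written with & (the matched text) as the replacement, which produces the same output:
$ awk 'NR==1 || gsub(/1/, "&") > 1' file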

Need to retrieve a value from an HL7 file using awk

In a Linux shell script, I've got the following awk command that I use for other purposes and to rename the file.
cat $edifile | awk -F\| '
{ OFS = "|"
print $0
} ' | tr -d "\012" > $newname.hl7
While this is happening, I'd like to grab the 5th field of the MSH segment and save it for later use in the script. Is this possible?
If not, how could I do it earlier or later in the script?
Example of the segment.
MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3
What I want to do is retrieve the path and file name in MSH 5 and concatenate it to the end of the new file.
I've used this to capture the data, but with no luck. If fpth is getting set, there is no evidence of it, and I don't have the right syntax for an echo inside the awk block.
cat $edifile | awk -F\| '
{ OFS = "|"
{fpth=$(5)}
print $0
} ' | tr -d "\012" > $newname.hl7
Any suggestions?
Thank you!
Try
filename=`awk -F'|' '{print $5}' $edifile | head -1`
You can skip piping through head if the file is a single line.
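An awk-only variant that exits after reading the first line, avoiding the pipe through head, would be:
filename=$(awk -F'|' '{print $5; exit}' "$edifile")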
First of all, it must be mentioned that the awk line in your first piece of code has zero use:
$ cat $edifile | awk -F\| ' { OFS = "|"; print $0 }' | tr -d "\012" > $newname.hl7
This is totally equivalent to
$ cat $edifile | tr -d "\012" > $newname.hl7
because OFS is only used to rebuild $0 when you assign to a field.
Example:
$ echo "a|b|c" | awk -F\| '{OFS="/"; print $0}'
a|b|c
$ echo "a|b|c" | awk -F\| '{OFS="/"; $1=$1; print $0}'
a/b/c
I understand that you have an HL7 file containing a single line that starts with the string "MSH". From this line you want to store the 5th field, which is achieved in the following way:
fpth=$(awk -v outputfile="${newname}.hl7" '
BEGIN{FS="|"; ORS="" }
($1 == "MSH"){ print $5 }
{ print $0 > outputfile }' $edifile)
I have set ORS to the empty string, which is equivalent to tr -d "\012". The above will work very nicely if you only have a single MSH segment in your file.
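If more than one MSH segment could appear, a small guard keeps only the first occurrence (a sketch along the same lines):
fpth=$(awk -v outputfile="${newname}.hl7" '
BEGIN{ FS="|"; ORS="" }
$1 == "MSH" && !seen++ { print $5 }
{ print $0 > outputfile }' "$edifile")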

Replace end of line with comma and put parenthesis in sed/awk

I am trying to process the contents of a file from this format:
this1,EUR
that2,USD
other3,GBP
to this format:
this1(EUR),that2(USD),other3(GBP)
The result should be a single line.
As of now I have come up with this chain of commands that works fine:
cat myfile | sed -e 's/,/\(/g' | sed -e 's/$/\)/g' | tr '\n' , | awk '{print substr($0, 0, length($0)- 1)}'
Is there a simpler way to do the same with just an awk command?
Another awk, printing the separator before every record except the first (c is empty for the first record and a comma afterwards):
$ awk -F, '{ printf "%s%s(%s)", c, $1, $2; c = ","} END { print ""}' file
this1(EUR),that2(USD),other3(GBP)
The following awk may help:
awk -F, '{val=val?val OFS $1"("$2")":$1"("$2")"} END{print val}' OFS=, Input_file
Toying around with separators and gsub:
$ awk 'BEGIN{RS="";ORS=")\n"}{gsub(/,/,"(");gsub(/\n/,"),")}1' file
this1(EUR),that2(USD),other3(GBP)
Explained:
$ awk '
BEGIN {
RS="" # record ends in an empty line, not newline
ORS=")\n" # the last )
}
{
gsub(/,/,"(") # replace commas with (
gsub(/\n/,"),") # and newlines with ),
}1' file # output
Using paste+sed
$ # paste -s will combine all input lines into a single line
$ seq 3 | paste -sd,
1,2,3
$ paste -sd, ip.txt
this1,EUR,that2,USD,other3,GBP
$ # post processing to get desired format
$ paste -sd, ip.txt | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
this1(EUR),that2(USD),other3(GBP)
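To see what the sed substitution does: each match consumes a comma, the value after it, and the comma that follows it (if any), and rewrites them as (value) plus that trailing comma, e.g.:
$ echo 'this1,EUR,that2,USD' | sed -E 's/,([^,]*)(,?)/(\1)\2/g'
this1(EUR),that2(USD)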

Add delimiters at end of each line

I have a CSV file like the one below:
id,id1,id2,id3,id4,id5
1,101,102,103,104
2,201,202,203
3,301,302
Now I want to append commas to each line so that all lines have the same number of delimiters. The desired output should be:
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
Using
awk -F "," ' { print NF-1 } ' file.csv | sort -r | head -1
I am able to find the maximum occurrence of the delimiter, but I am not sure how to compare each line and append commas if its count is less than the max.
With GNU awk (as I do not know if this works for other implementations)
$ # simply assign value to NF
$ awk -F, -v OFS=',' '{NF=6} 1' ip.txt
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
If the first line determines the number of fields required:
$ awk -F, -v OFS=',' 'NR==1{f=NF} {NF=f} 1' ip.txt
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
If any line may determine the max field count (the file is read twice; NR==FNR is true only during the first pass):
$ cat ip.txt
id,id1,id2
1,101,102,103
2,201,202,203,204
3,301,302
$ awk -F, -v OFS=',' 'NR==FNR{f=(!f || NF>f) ? NF : f; next} {NF=f} 1' ip.txt ip.txt
id,id1,id2,,
1,101,102,103,
2,201,202,203,204
3,301,302,,
awk -F"," '{i=NF;c="";while (i++ < 6) {c=c","};print $0""c}' file
Output:
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
You are already using the variable NF, which indicates how many fields there are on a line.
awk -F , 'NF<6 { OFS=FS; for (i=NF+1; i<=6; i++) $i="" }1' filename
We start looping at the first undefined field and set it to an empty string, until we have six fields. Then the 1 at the end takes care of printing the now fully populated line. The OFS=FS is necessary to make the output field separator also be a comma (it is a space by default).
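A minimal demonstration of that OFS=FS behavior when a field is assigned:
$ echo a,b | awk -F, '{ $3 = "x" } 1'
a b x
$ echo a,b | awk -F, '{ OFS = FS; $3 = "x" } 1'
a,b,x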
The following awk may also help:
awk -F, '
FNR==1{
  val=NF          # remember the header field count
  print
  next
}
{
  count=NF
  while(count<val){   # append one comma per missing field
    value=value","
    count++
  }
  print $0 value
  value=count=""      # reset for the next line
}
' Input_file
Output will be as follows:
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
Unified awk approach (based on the number of fields of the header line):
awk -F',' 'NR==1{ max_nf=NF; print }
NR>1{ printf "%s%.*s\n", $0, max_nf-NF, ",,,,,,,,," }' file
The output:
id,id1,id2,id3,id4,id5
1,101,102,103,104,
2,201,202,203,,
3,301,302,,,
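The %.*s format uses a dynamic precision: it prints at most max_nf-NF characters of the comma string, which is exactly the number of missing delimiters. A quick demonstration, in awks that support dynamic precision (GNU awk does):
$ awk 'BEGIN{ printf "%.*s\n", 3, ",,,,,,,,," }'
,,,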
Or via a loop:
awk -F',' 'NR==1{ max_nf=NF; print }
NR>1{ n=max_nf-NF; r=""; while (n--) r=r","; print $0 r }' file

awk to split variable length record and add unique number on each group of records

I have a file with variable-length records:
x|y|XREC|DELIMITER|ab|cd|ef|IREC|DELIMITER|j|a|CREC|
p|q|IREC|DELIMITER|ww|xx|ZREC|
What I would like is:
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|
So far I have just managed to get a sequence number at the beginning:
awk '{printf "%d|%s\n", NR, $0}' oldfile > with_seq.txt
Any help?
You could set the field separator to the string DELIMITER followed by a literal pipe (the bracket expression [|] keeps the | from being treated as alternation in the separator regex):
$ awk -F 'DELIMITER[|]' '{for (i=1;i<=NF;i++)print NR"|"$i}' file
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|
Using awk, then squeezing the doubled pipes with sed:
awk -F "DELIMITER" '{for(i=1;i<=NF;i++)print NR "|" $i}' file|sed 's/||/|/g'
1|x|y|XREC|
1|ab|cd|ef|IREC|
1|j|a|CREC|
2|p|q|IREC|
2|ww|xx|ZREC|

Delete lines from file

I have a file, file.dat, containing line numbers, for example:
4
6
7
I would like to use the numbers in this file to delete the corresponding lines from another file.
Is there any way to pass these numbers as parameters to awk and delete those lines from another file?
I have this awk solution, but do not like it too much...
awk 'BEGIN { while( (getline x < "./file.dat" ) > 0 ) a[x]=0; } NR in a { next; }1' /path/to/another/file
Can you suggest something more elegant?
Using NR==FNR to test which file awk is reading (the condition is true only while the first file is being read):
$ awk '{if(NR==FNR)idx[$0];else if(!(FNR in idx))print}' idx.txt data.txt
Or
$ awk 'NR==FNR{idx[$0]; next}; !(FNR in idx)' idx.txt data.txt
Put the line numbers to delete in idx.txt and the data in data.txt.
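Since the question asks about passing the numbers as parameters, they can also be handed to awk in a variable (a sketch; it assumes the list is small enough for the command line):
$ awk -v del="$(paste -sd, file.dat)" 'BEGIN{ split(del, a, ","); for (i in a) idx[a[i]] } !(NR in idx)' /path/to/another/file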
I would use sed instead of awk:
$ sed $(for i in $(<idx.txt);do echo " -e ${i}d";done) file.txt
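The same idea, building the delete commands as a sed script instead of command-line arguments (assumes bash for the process substitution):
$ sed -f <(sed 's/$/d/' idx.txt) file.txt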