awk not printing header in output file - awk

The awk below seems to work great, with one issue: the header lines do not print in the output. I have been staring at this for a while with no luck. What am I missing? Thank you :).
awk 'NR==FNR{for (i=1;i<=NF;i++) a[$i];next} FNR==1 || ($7 in a)' /home/panels/file1 test.txt |
awk '{split($2,a,"-"); print a[1] "\t" $0}' |
sort |
cut -f2- > /home/panels/test_filtered.vcf
test.txt (used in the awk to produce the filtered output; only a small portion of the data is shown, but it illustrates the tab-delimited format)
Chr Start End Ref Alt
chr1 949608 949608 G A
current output (has no header)
chr1 949608 949608 G A
desired output (has header)
Chr Start End Ref Alt
chr1 949608 949608 G A

It looks like the header is going to sort, and getting mixed in with your data. A simple solution is to do:
... | { read line; echo $line; sort; } |
to prevent the first line from going to sort.
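Applied to your pipeline, that would look something like this (a sketch using the paths from your question; because the lines are tab-delimited, the header needs IFS= read -r and a quoted printf rather than a bare echo $line, otherwise the tabs get collapsed and the final cut breaks):
awk 'NR==FNR{for (i=1;i<=NF;i++) a[$i];next} FNR==1 || ($7 in a)' /home/panels/file1 test.txt |
awk '{split($2,a,"-"); print a[1] "\t" $0}' |
{ IFS= read -r line; printf '%s\n' "$line"; sort; } |
cut -f2- > /home/panels/test_filtered.vcf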

You can combine your scripts, move the sort into awk, and handle the header this way:
$ awk 'NR==FNR{for(i=1;i<=NF;i++)a[$i]; next}
FNR==1{print "dummy\t" $0; next}
$7 in a{split($2,b,"-"); print b[1] "\t" $0 | "sort" }' file1 file2 |
cut -f2-

Related

Sort a file preserving the header as first position with bash

When sorting a file, I am not preserving the header in its position:
file_1.tsv
Gene Number
a 3
u 7
b 9
sort -k1,1 file_1.tsv
Result:
a 3
b 9
Gene Number
u 7
So I am trying this code:
sed '1d' file_1.tsv | sort -k1,1 > file_1_sorted.tsv
first='head -1 file_1.tsv'
sed '1 "$first"' file_1_sorted.tsv
What I did is to remove the header and sort the rest of the file, and then trying to add again the header. But I am not able to perform this last part, so I would like to know how can I copy the header of the original file and insert it as the first row of the new file without substituting its actuall first row.
You can do this as well:
{ head -1; sort; } < file_1.tsv
** Update **
For macOS:
{ IFS= read -r header; printf '%s\n' "$header" ; sort; } < file_1.tsv
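If you need this in several places, the same idea can be wrapped in a small function that passes any extra options through to sort (a sketch; the body_sort name is made up here):
body_sort() { IFS= read -r header; printf '%s\n' "$header"; sort "$@"; }
body_sort -k1,1 < file_1.tsv   # header stays first, body sorted by column 1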
A simpler awk:
$ awk 'NR==1{print; next} {print | "sort"}' file
$ head -1 file; tail -n +2 file | sort
Output:
Gene Number
a 3
b 9
u 7
Could you please try the following.
awk '
FNR==1{
first=$0
next
}
{
val=(val?val ORS:"")$0
}
END{
print first
print val | "sort"
}
' Input_file
Logical explanation:
Check the condition FNR==1 to see whether it is the first line; if so, save it in a variable and move on to the next line with next.
Then keep appending each line's value to another variable, separated by newlines, until the last line.
Finally, the END block runs once Input_file has been read completely; there, print the saved first line, then pipe the remaining lines' value to the sort command.
This will work using any awk, sort, and cut, in any shell on every UNIX box; it will work whether the input is coming from a pipe (when you can't read it twice) or from a file (when you can), and it doesn't involve awk spawning a subshell:
awk -v OFS='\t' '{print (NR>1), $0}' file | sort -k1,1n -k2,2 | cut -f2-
The above uses awk to stick a 0 at the front of the header line and a 1 in front of the rest, so you can sort by that number first, then by whatever other field(s) you want to sort on, and then remove the added field again with cut. Here it is in stages:
$ awk -v OFS='\t' '{print (NR>1), $0}' file
0 Gene Number
1 a 3
1 u 7
1 b 9
$ awk -v OFS='\t' '{print (NR>1), $0}' file | sort -k1,1n -k2,2
0 Gene Number
1 a 3
1 b 9
1 u 7
$ awk -v OFS='\t' '{print (NR>1), $0}' file | sort -k1,1n -k2,2 | cut -f2-
Gene Number
a 3
b 9
u 7

awk multiple row and printing results

I would like to print some specific parts of the results with awk, after matching multiple patterns.
What I have is (filetest):
A : 1
B : 2
I expect to have:
1 - B : 2
So, only the result of the first row, then the whole second row.
The dash was added by me.
I have this:
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "-" }' filetest
Result:
1 -2 -
And I cannot get the full second row while still showing only the value of the first one:
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "$1" }' filetest
Result:
1 - A 2 - B
Is there any way to print, on the same line, exactly the columns/rows that I need with awk?
In my case, R1C2 - R2C1: R2C2?
Thanks!
This will do what you are expecting:
awk -F: '/^A/{printf "%s -", $2}/^B/{print}' filetest
$ awk -F: 'NR%2 {printf "%s - ", $2; next}1' filetest
1 - B : 2
You can try this
awk -F: 'NR%2==1{a=$2; } NR%2==0{print a " - " $0}' file
output
1 - B : 2
I'd probably go with @jas's answer as it's clear, simple, and not coupled to your data values, but just to show an alternative approach:
$ awk '{printf "%s", (NR%2 ? $3 " - " : $0 ORS)}' file
1 - B : 2
Tried on GNU awk:
awk -F':' 'NR==1{s=$2;next}{FS="";s=s" - "$0;print s}' filetest

Need to retrieve a value from an HL7 file using awk

In a Linux script program, I've got the following awk command for other purposes and to rename the file.
cat $edifile | awk -F\| '
{ OFS = "|"
print $0
} ' | tr -d "\012" > $newname.hl7
While this is happening, I'd like to grab the 5th field of the MSH segment and save it for later use in the script. Is this possible?
If not, how could I do it later or earlier on?
Example of the segment.
MSH|^~\&|business1|business2|/u/tmp/TR0049-GE-1.b64|routing|201811302126||ORU^R01|20181130212105810|D|2.3
What I want to do is retrieve the path and file name in MSH 5 and concatenate it to the end of the new file.
I've used this to capture the data but no luck. If fpth is getting set, there is no evidence of it and I don't have the right syntax for an echo within the awk phrase.
cat $edifile | awk -F\| '
{ OFS = "|"
{fpth=$(5)}
print $0
} ' | tr -d "\012" > $newname.hl7
Any suggestions?
Thank you!
Try
filename=`awk -F'|' '{print $5}' $edifile | head -1`
You can skip the piping through head if the file is a single line.
First of all, it must be mentioned that the awk line in your first piece of code has zero use:
$ cat $edifile | awk -F\| ' { OFS = "|"; print $0 }' | tr -d "\012" > $newname.hl7
This is totally equivalent to
$ cat $edifile | tr -d "\012" > $newname.hl7
because OFS is only used to rebuild $0 when you reassign a field.
Example:
$ echo "a|b|c" | awk -F\| '{OFS="/"; print $0}'
a|b|c
$ echo "a|b|c" | awk -F\| '{OFS="/"; $1=$1; print $0}'
a/b/c
I understand that you have an HL7 file in which there is a single line starting with the string "MSH". From this line you want to store the 5th field; this is achieved in the following way:
fpth=$(awk -v outputfile="${newname}.hl7" '
BEGIN{FS="|"; ORS="" }
($1 == "MSH"){ print $5 }
{ print $0 > outputfile }' $edifile)
I have set ORS to an empty string, as that is equivalent to tr -d "\012". The above will work very nicely if you only have a single MSH segment in your file.
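Since the question also wants to concatenate the MSH-5 value to the end of the new file, a possible follow-up once fpth has been captured could be (just a sketch; adjust to whatever form of concatenation you actually need):
printf '%s' "$fpth" >> "${newname}.hl7"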

filtering in a text file using awk

I have a tab-separated text file like this small example:
chr1 100499714 100499715 1
chr1 100502177 100502178 10
chr1 100502181 100502182 2
chr1 100502191 100502192 18
chr1 100502203 100502204 45
In the new file that I will make:
1- I want to select rows based on the 4th column: if the value of the 4th column is at least 10, I will keep the entire row; otherwise it will be filtered out.
2- In the next step the 4th column will be removed.
the result will look like this:
chr1 100502177 100502178
chr1 100502191 100502192
chr1 100502203 100502204
To get such results I have tried the following code in awk:
cat input.txt | awk '{print $1 "\t" $2 "\t" $3}' > out.txt
but I do not know how to implement the filtering step. Do you know how to fix the code?
Just put the condition before output:
cat input.txt | awk '$4 >= 10 {print $1 "\t" $2 "\t" $3}' > out.txt
Here is another; it might work better if you have many more fields:
$ awk '$NF>=10{sub(/\t\w+$/,""); print}' file
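Note that \w is a GNU extension; with a non-GNU awk, a rough equivalent that strips the last tab-separated field could be:
$ awk '$NF>=10{sub(/\t[^\t]*$/,""); print}' file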

using awk to match and sum a file of multiple lines

I am trying to combine lines in file.txt that match on $1 and then display the sum of $2 for those matches. Thank you :).
File.txt
ENSMUSG00000000001:001
ENSMUSG00000000001:002
ENSMUSG00000000001:003
ENSMUSG00000000002:003
ENSMUSG00000000002:003
ENSMUSG00000000003:002
Desired output
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
awk -F':' -v OFS='\t' '{x=$1;$1="";a[x]=a[x]$0}END{for(x in a)print x,a[x]}' file > output.txt
$ awk -F':' -v OFS='\t' '{sum[$1]+=$2} END{for (key in sum) print key, sum[key]}' file
ENSMUSG00000000001 6
ENSMUSG00000000002 6
ENSMUSG00000000003 2
{x=$1;a[x]=a[x] + $2} END{for(x in a)print x,a[x]}
Just a typo, I guess: instead of adding $0, add $2. That gives me the expected output. And the $1="" is not necessary. To make sure that there isn't anything funny with $2, you may consider 1.0*$2.
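As a complete command, keeping the original -F':' and OFS='\t' options, that fix would look something like:
awk -F':' -v OFS='\t' '{a[$1]+=$2} END{for(x in a)print x,a[x]}' file > output.txt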