I want to swap the two columns: column 1 should become column 2, keeping only the part after the "-".
I tried to swap them with
awk '{print $2,$1}'
in:
#9-297 TACCTGAGGTAGTAGGTTGTATAGTTCCTC
#10-276 CACAGCGTTGGTGGTATAGTGGTTAGCCACC
out:
TACCTGAGGTAGTAGGTTGTATAGTTCCTC 297
CACAGCGTTGGTGGTATAGTGGTTAGCCACC 276
You can split the first column into an array on the delimiter "-", then print the part of the array you want:
awk '{split($1, a, "-");print $2, a[2]}' yourfile.txt
Alternatively, treat both "-" and runs of whitespace as field separators:
$ awk -F'[-[:space:]]+' '{print $3, $2}' file
TACCTGAGGTAGTAGGTTGTATAGTTCCTC 297
CACAGCGTTGGTGGTATAGTGGTTAGCCACC 276
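To check the split() answer on the sample lines, here is a minimal sketch that wraps it in a shell function. The name swap_suffix is only a hypothetical label, and the sketch assumes whitespace-separated input with a single "-" in column 1.

```shell
#!/bin/sh
# Hypothetical helper: swap the two columns and keep only the part of
# column 1 after the "-" (assumes exactly one "-" in column 1).
swap_suffix() {
  # split() breaks $1 on "-" into a[1], a[2], ...; a[2] is the piece after the first "-".
  awk '{ split($1, a, "-"); print $2, a[2] }'
}

# Run it on the two sample lines from the question.
printf '%s\n' '#9-297 TACCTGAGGTAGTAGGTTGTATAGTTCCTC' \
              '#10-276 CACAGCGTTGGTGGTATAGTGGTTAGCCACC' | swap_suffix
```

If column 1 could contain more than one "-", a[2] would stop at the second one; sub(/^[^-]*-/, "", $1) followed by print $2, $1 would keep everything after the first "-" instead.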
Related
I have a tab-delimited text file:
#CHROM	POS	ID	REF	ALT
1	188277	rs434	C	T
20	54183975	rs5321	CTAAA	C
and I am trying to replace the "ID" column with the pattern $CHROM_$POS_$REF_$ALT, using sed or awk, to get:
#CHROM	POS	ID	REF	ALT
1	188277	1_188277_C_T	C	T
20	54183975	20_54183975_CTAAA_C	CTAAA	C
Unfortunately, I only managed to delete the ID column with:
sed -i -r 's/\S+//3'
None of the patterns I have tried work in all cases. To be honest, I am lost in the documentation and I am looking for examples that could help me solve this problem.
Using awk, you can set the 3rd field to fields 1, 2, 4 and 5 joined with underscores, on every line except the header. Pipe the result through column -t to present the output as a table:
awk '
BEGIN{FS=OFS="\t"}
NR>1 {
$3 = $1"_"$2"_"$4"_"$5
}1' file | column -t
Output
#CHROM POS ID REF ALT
1 188277 1_188277_C_T C T
20 54183975 20_54183975_CTAAA_C CTAAA C
Or print all fields explicitly, with a custom value for the 3rd field:
awk '
BEGIN{FS=OFS="\t"}
NR==1{print;next}
{print $1, $2, $1"_"$2"_"$4"_"$5, $4, $5}
' file | column -t
A GNU sed solution:
sed '2,$s/\(\S*\)\t\(\S*\)\t\(\S*\)\t\(\S*\)\t\(\S*\)/\1\t\2\t\1_\2_\4_\5\t\4\t\5/' file.txt
Explanation: from line 2 to the last line, capture the 5 tab-separated columns (each holding zero or more non-whitespace characters) as groups. Then replace them with the same columns joined by \t, except the third one, which is replaced by the _-join of the 1st, 2nd, 4th and 5th columns.
(tested in sed (GNU sed) 4.2.2)
awk -v OFS='\t' 'NR==1 {print $0}; NR>1 {print $1, $2, $1"_"$2"_"$4"_"$5, $4, $5}' inputfile.txt
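Here is a minimal sketch that wraps the first awk answer in a shell function and runs it on the sample data; make_ids is a hypothetical name, and the input is assumed to be tab-separated with a single header line.

```shell
#!/bin/sh
# Hypothetical helper: rebuild the ID column (field 3) as CHROM_POS_REF_ALT
# in tab-separated input, leaving the first (header) line untouched.
make_ids() {
  awk 'BEGIN { FS = OFS = "\t" }
       NR > 1 { $3 = $1 "_" $2 "_" $4 "_" $5 }  # assigning $3 rebuilds $0 with tabs
       { print }'
}

# Sample input from the question.
printf '#CHROM\tPOS\tID\tREF\tALT\n1\t188277\trs434\tC\tT\n20\t54183975\trs5321\tCTAAA\tC\n' |
  make_ids
```

If the real file has several header lines starting with "#", testing !/^#/ instead of NR > 1 would leave all of them untouched.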
I was just introduced to awk and I'm trying to retrieve rows from my file based on the value on column 10.
I need to filter the data based on the third value of column 10 (the last column) when that column is split on ":".
Here is an example value from column 10: 0/1:1,9:10:15:337,0,15
I was able to extract the third value using this command awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
This returns the value 10 but how can I return other rows (not just the value in column 10) if this third value is less than or greater than a specific number?
I tried this awk '{if($10 -F ":" "/1/ ($3<10))" print $0;}' file.txt but it returns a syntax error.
Thanks!
Your code:
awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
should be just 1 awk script:
awk '$10 ~ /1/ { split($10,f,/:/); print f[3] }' file.txt
but I'm not sure that code is doing what you think it does. If you want to print the 3rd value of all $10s that contain :s, as it sounds like from your text, that'd be:
awk 'split($10,f,/:/) > 1 { print f[3] }' file.txt
and to print the rows where that value is less than 7 would be:
awk '(split($10,f,/:/) > 1) && (f[3] < 7)' file.txt
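A minimal sketch of the same idea with the threshold passed in from the shell rather than hard-coded. The function name filter_rows and the awk variable min are hypothetical, and the check requires at least 3 ":"-separated pieces so that f[3] actually exists.

```shell
#!/bin/sh
# Hypothetical helper: print whole rows whose 10th column, split on ":",
# has a 3rd value greater than the threshold given as $1.
filter_rows() {
  awk -v min="$1" '
    # split() returns the number of pieces; requiring >= 3 ensures f[3] exists.
    # "+ 0" forces a numeric comparison. With no action, awk prints the row.
    split($10, f, ":") >= 3 && f[3] + 0 > min + 0
  '
}

# Sample rows: 9 placeholder columns plus a column-10 value like the question's.
printf '%s\n' 'a b c d e f g h i 0/1:1,9:10:15:337,0,15' \
              'a b c d e f g h i 0/1:1,9:4:15:337,0,15' | filter_rows 7
```

For "less than", flip the comparison to f[3] + 0 < min + 0.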
I have a big text file like this example:
chr11 314980 314981 63 IFITM1 -131
chr11 315025 315026 54 IFITM1 -86
chr5 315085 315086 118 AHRR -53011
chr16 316087 316088 56 ITFG3 -86
chr16 316088 316089 90 ITFG3 -131
chr11 319672 319673 213 IFITM3 -131
chr11 319674 319675 514 IFITM3 -164
I want to group the rows based on the 6th column and sum the values from the 4th column for every group. The new file would have 2 columns: the 1st column would be the group and the 2nd column would be the sum of the column-4 values in that group. The expected output would look like this:
-131 366
-86 110
-53011 118
-164 514
I am trying to do that in awk using the following code.
sort myfile.txt | awk -F'\t' '{ sub(/..$/,"**",$6) }1' OFS='\t' | awk '{print $1 "\t" $2}' > outfile.txt
But it actually returns an empty file. Do you know how to fix it?
I have no idea what you are thinking with your code: why are you replacing the last 2 characters of the 6th field with asterisks? Why aren't you doing any addition anywhere? Why do you sort (by column 1) first?
awk -F'\t' '
{sum[$6] += $4}
END {for (key in sum) {print key, sum[key]}}
' file | column -t
Use an associative array:
awk '{a[$NF]+=$4}END{for (i in a){print i, a[i]}}' file
If you're ok with sorted output, you don't need arrays:
sort -k6n file |
awk -F'\t' '
grp != $6 {
grp = $6
printf "%s%s%s%s", sum, sep, grp, FS
sum = 0
sep = ORS
} { sum += $4 } END { print sum }'
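The array answers above print the groups in whatever order for (key in sum) visits them, which awk leaves unspecified. A minimal sketch that keeps the array approach but pipes the result through sort so the order is predictable; group_sums is a hypothetical name, and default whitespace splitting is assumed, which works for both tabs and spaces as long as no field contains blanks.

```shell
#!/bin/sh
# Hypothetical helper: sum column 4 for each distinct value of column 6,
# then sort the groups numerically by their key.
group_sums() {
  awk '{ sum[$6] += $4 }
       END { for (key in sum) print key, sum[key] }' | sort -n -k1,1
}

printf '%s\n' \
  'chr11 314980 314981 63 IFITM1 -131' \
  'chr11 315025 315026 54 IFITM1 -86' \
  'chr5 315085 315086 118 AHRR -53011' \
  'chr16 316087 316088 56 ITFG3 -86' \
  'chr16 316088 316089 90 ITFG3 -131' \
  'chr11 319672 319673 213 IFITM3 -131' \
  'chr11 319674 319675 514 IFITM3 -164' | group_sums
```

On the sample rows this gives the same sums as the expected output (-131 366, -86 110, -53011 118, -164 514), just ordered by group value.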
The following awk statement is working as expected.
awk '{print $1, $2, $3}' test.txt
But how do I say that I need all the columns after the second column?
awk '{print $1, $2, $3 to $NF}' test.txt
I need all columns from the third column to the end of the line. There can be 2 to 10 columns, and all of them are considered part of the last column.
If you just want fields $3 to $NF, the standard way would be a loop (for/while),
but for your requirement, you could use:
awk '{$1=$2="";}sub("^ *","")'
for example:
kent$ seq -s' ' 10|awk '{$1=$2="";}sub("^ *","")'
3 4 5 6 7 8 9 10
If you want to "group" 100 fields into 3 groups (1, 2, and 3-100):
awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
same example:
kent$ seq -s' ' 10|awk '{x=$0;sub($1FS$2,"",x);gsub(FS,"",x);print $1,$2,x}'
1 2 345678910
Hope this is what you want.
The intuitive way.
awk 'BEGIN{ORS=""} {for(i=3; i<=NF; i++) if(i != NF){print $i " "} else {print $i "\n"}}' test.txt
Some more:
awk '{$1=$2=x; $0=$0; $1=$1}1' file
awk '{$1=$1; sub($1 FS $2 FS,x)}1' file
To keep spacing intact:
awk 'sub($1 "[ \t]*" $2 "[ \t]*",x)' file
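For completeness, here is a minimal sketch of the loop approach mentioned in the first answer; from_third is a hypothetical name. It rebuilds fields 3 to NF joined by OFS, so runs of whitespace collapse to a single space (the sub() version above keeps the original spacing), and a line with fewer than 3 fields prints as an empty line.

```shell
#!/bin/sh
# Hypothetical helper: print fields 3..NF of each line, joined by OFS.
from_third() {
  awk '{
    out = ""
    for (i = 3; i <= NF; i++)
      out = out (i > 3 ? OFS : "") $i   # add a separator before every field but the first
    print out
  }'
}

echo '1 2 3 4 5 6 7 8 9 10' | from_third
```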
All I want is the last two columns printed.
You can make use of variable NF which is set to the total number of fields in the input record:
awk '{print $(NF-1),"\t",$NF}' file
This assumes that you have at least 2 fields.
awk '{print $NF-1, $NF}' inputfile
Note: this works only if at least two columns exist. On records with one column you will get a spurious "-1 column1"
@jim mcnamara: try using parentheses around NF, i.e. $(NF-1) and $(NF) instead of $NF-1 and $NF (works on Mac OS X 10.6.8 for FreeBSD awk and gawk).
echo '
1 2
2 3
one
one two three
' | gawk '{if (NF >= 2) print $(NF-1), $(NF);}'
# output:
# 1 2
# 2 3
# two three
Using gawk exhibits the problem:
gawk '{ print $NF-1, $NF}' filename
1 2
2 3
-1 one
-1 three
# cat filename
1 2
2 3
one
one two three
I just put gawk on Solaris 10 M4000:
So gawk is the culprit on the $NF-1 vs. $(NF-1) issue. Next question: what does POSIX say?
per:
http://www.opengroup.org/onlinepubs/009695399/utilities/awk.html
There is no direction one way or the other. Not good. gawk implies subtraction, other awks imply field number or subtraction. hmm.
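For what it's worth, the expression table on that POSIX page lists the field reference $expr above binary minus in precedence, so in a conforming awk $NF-1 should mean ($NF) - 1, not $(NF-1). A small sketch to see this in whatever awk you have; sample_lines is a hypothetical helper that just prints test lines from this thread.

```shell
#!/bin/sh
# Hypothetical helper: print a few of the test lines used above.
sample_lines() {
  printf '%s\n' '1 2' '2 3' 'one two three'
}

# $NF-1 parses as ($NF) - 1: the last field's numeric value minus one.
# On "1 2" and "2 3" that happens to equal the first field;
# "three" has numeric value 0, so the third line gives -1.
sample_lines | awk '{ print $NF-1 }'

# $(NF-1) is the field numbered NF-1, i.e. the second-to-last field.
sample_lines | awk '{ print $(NF-1) }'
```

This would also explain the gawk output above: on the numeric lines "1 2" and "2 3", print $NF-1, $NF only looks correct by coincidence.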
Please try one of these:
awk '{print $(NF-1)"\t"$NF}' file
or
awk 'BEGIN{OFS="\t"} {print $(NF-1), $NF}' file
or
awk '{print $(NF-1), $NF}' file
Try this:
$ cat /tmp/topfs.txt
/dev/sda2 xfs 32G 10G 22G 32% /
awk print last column
$ cat /tmp/topfs.txt | awk '{print $NF}'
/
awk print before last column
$ cat /tmp/topfs.txt | awk '{print $(NF-1)}'
32%
awk - print last two columns
$ cat /tmp/topfs.txt | awk '{print $(NF-1), $NF}'
32% /