How to subtract a constant number from a column - awk

Is there a way to subtract the smallest value from all the values of a column? I need to subtract the first number in the 1st column from all other numbers in the first column.
I wrote this script, but it's not giving the right result:
$ awk '{$1 = $1 - 1280449530}' file
1280449530 452
1280449531 2434
1280449531 2681
1280449531 2946
1280449531 1626
1280449532 3217
1280449532 4764
1280449532 4501
1280449532 3372
1280449533 4129
1280449533 6937
1280449533 6423
1280449533 4818
1280449534 4850
1280449534 8980
1280449534 8078
1280449534 6788
1280449535 5587
1280449535 10879
1280449535 9920
1280449535 8146
1280449536 6324
1280449536 12860
1280449536 11612

What you have essentially works, you're just not outputting it. This will output what you want:
awk '{print ($1 - 1280449530) " " $2}' file
You can also be slightly cleverer and not hardcode the shift amount:
awk '{
if(NR == 1) {
shift = $1
}
print ($1 - shift) " " $2
}' file

You were on the right track:
awk '{$1 = $1 - 1280449530; print}' file
Here is a simplified version of Michael's second example:
awk 'NR == 1 {origin = $1} {$1 = $1 - origin; print}' file

bash shell script
#!/bin/bash
exec 4<"file"
read col1 col2<&4
while read -r n1 n2 <&4
do
echo $((n1-$col1))
# echo "scale=2;$n1 - $col1" | bc # dealing with decimals..
done
exec >&4-

In vim you can select the column with
and go to the bottom of the page with
G
then
e
to go to the end of the number
then you may enter the number like 56
56
this will add 56 to the column

Related

awk conditional statement based on a value between colon

I was just introduced to awk and I'm trying to retrieve rows from my file based on the value on column 10.
I need to filter the data based on the value of the third value if ":" was used as a separator in column 10 (last column).
Here is an example data in column 10. 0/1:1,9:10:15:337,0,15.
I was able to extract the third value using this command awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
This returns the value 10 but how can I return other rows (not just the value in column 10) if this third value is less than or greater than a specific number?
I tried this awk '{if($10 -F ":" "/1/ ($3<10))" print $0;}' file.txt but it returns a syntax error.
Thanks!
Your code:
awk '{print $10}' file.txt | awk -F ":" '/1/ {print $3}'
should be just 1 awk script:
awk '$10 ~ /1/ { split($10,f,/:/); print f[3] }' file.txt
but I'm not sure that code is doing what you think it does. If you want to print the 3rd value of all $10s that contain :s, as it sounds like from your text, that'd be:
awk 'split($10,f,/:/) > 1 { print f[3] }' file.txt
and to print the rows where that value is less than 7 would be:
awk '(split($10,f,/:/) > 1) && (f[3] < 7)' file.txt

awk multiple row and printing results

I would like to print some specific parts of a results with awk, after multiple pattern selection.
What I have is (filetest):
A : 1
B : 2
I expect to have:
1 - B : 2
So, only the result of the first row, then the whole second row.
The dash was added by me.
I have this:
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "-" }' filetest
Result:
1 -2 -
And I cannot get the full second row, without failing in showing just the result of the first one
awk -F': ' '$1 ~ /A|B/ { printf "%s", $2 "$1" }' filetest
Result:
1 - A 2 - B
Is there any way to print in the same line, exactly the column/row that I need with awk?
In my case R1C2 - R2C1: R2C2?
Thanks!
This will do what you are expecting:
awk -F: '/^A/{printf "%s -", $2}/^B/{print}' filetest
$ awk -F: 'NR%2 {printf "%s - ", $2; next}1' filetest
1 - B : 2
You can try this
awk -F: 'NR%2==1{a=$2; } NR%2==0{print a " - " $0}' file
output
1 - B : 2
I'd probably go with #jas's answer as it's clear, simple, and not coupled to your data values but just to show an alternative approach:
$ awk '{printf "%s", (NR%2 ? $3 " - " : $0 ORS)}' file
1 - B : 2
tried on gnu awk
awk -F':' 'NR==1{s=$2;next}{FS="";s=s" - "$0;print s}' filetest

grouping and summarizing the rows in a big text file using awk

I have a big text file like this example:
example:
chr11 314980 314981 63 IFITM1 -131
chr11 315025 315026 54 IFITM1 -86
chr5 315085 315086 118 AHRR -53011
chr16 316087 316088 56 ITFG3 -86
chr16 316088 316089 90 ITFG3 -131
chr11 319672 319673 213 IFITM3 -131
chr11 319674 319675 514 IFITM3 -164
I want to group the rows based on the 6th column and sum the values
from the 4th column for every group. the new file would have 2
columns. 1st column would be the group and the 2nd column would be sum
(sum of values from column 4 from similar groups). the expected output
would look like this:
expected output:
-131 366
-86 110
-53011 118
-164 514
I am trying to do that in awk using the following code.
sort myfile.txt | awk -F'\t' '{ sub(/..$/,"**",$6) }1' OFS='\t' | awk '{print $1 "\t" $2}' > outfile.txt
but actually it returns an empty file. do you know how to fix it?
I have no idea what you are thinking with your code: why you are replacing the last 2 chars on the line with asterisks? why aren't you doing any addition anywhere? why do you sort (by column 1) first?
awk -F'\t' '
{sum[$6] += $4}
END {for (key in sum) {print key, sum[key]}}
' file | column -t
Use an associative array:
awk '{a[$NF]+=$4}END{for (i in a){print i, a[i]}}' file
If you're ok with sorted output, you don't need arrays:
sort -k6n file |
awk -F'\t' '
grp != $6 {
grp = $6
printf "%s%s%s%s", sum, sep, grp, FS
sum = 0
sep = ORS
} { sum += $4 } END { print sum }'

linux csv file concatenate columns into one column

I've been looking to do this with sed, awk, or cut. I am willing to use any other command-line program that I can pipe data through.
I have a large set of data that is comma delimited. The rows have between 14 and 20 columns. I need to recursively concatenate column 10 with column 11 per row such that every row has exactly 14 columns. In other words, this:
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
will become:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
I can get the first 10 columns. I can get the last N columns. I can concatenate columns. I cannot think of how to do it in one line so I can pass a stream of endless data through it and end up with exactly 14 columns per row.
Examples (by request):
How many columns are in the row?
sed 's/[^,]//g' | wc -c
Get the first 10 columns:
cut -d, -f1-10
Get the last 4 columns:
rev | cut -d, -f1-4 | rev
Concatenate columns 10 and 11, showing columns 1-10 after that:
awk -F',' ' NF { print $1","$2","$3","$4","$5","$6","$7","$8","$9","$10$11}'
Awk solution:
awk 'BEGIN{ FS=OFS="," }
{
diff = NF - 14;
for (i=1; i <= NF; i++)
printf "%s%s", $i, (diff > 1 && i >= 10 && i < (10+diff)?
"": (i == NF? ORS : ","))
}' file
The output:
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
With GNU awk for the 3rd arg to match() and gensub():
$ cat tst.awk
BEGIN{ FS="," }
match($0,"(([^,]+,){9})(([^,]+,){"NF-14"})(.*)",a) {
$0 = a[1] gensub(/,/,"","g",a[3]) a[5]
}
{ print }
$ awk -f tst.awk file
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
If perl is okay - can be used just like awk for stream processing
$ cat ip.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
$ awk -F, '{print NF}' ip.txt
16
18
22
$ perl -F, -lane '$n = $#F - 4;
print join ",", (#F[0..8], join("", #F[9..$n]), #F[$n+1..$#F])
' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
-F, -lane split on , results saved in #F array
$n = $#F - 4 magic number, to ensure output ends with 14 columns. $#F gives the index of last element of array (won't work if input line has less than 14 columns)
join helps to stitch array elements together with specified string
#F[0..8] array slice with first 9 elements
#F[9..$n] and #F[$n+1..$#F] the other slices as needed
Borrowing from Ed Morton's regex based solution
$ perl -F, -lape '$n=$#F-13; s/^([^,]*,){9}\K([^,]*,){$n}/$&=~tr|,||dr/e' ip.txt
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
1,2,3,4,5,6,3,4,2,43432,5,2,3,4
1,2,3,4,5,6,3,4,2,4asfe3432,5,2,3,4
$n=$#F-13 magic number
^([^,]*,){9}\K first 9 fields
([^,]*,){$n} fields to change
$&=~tr|,||dr use tr to delete the commas
e this modifier allows use of Perl code in replacement section
this solution also has the added advantage of working even if input field is less than 14
You can try this gnu sed
sed -E '
s/,/\n/9g
:A
s/([^\n]*\n)(.*)(\n)(([^\n]*\n){4})/\1\2\4/
tA
s/\n/,/g
' infile
First variant - with awk
awk -F, '
{
for(i = 1; i <= NF; i++) {
OFS = (i > 9 && i < NF - 4) ? "" : ","
if(i == NF) OFS = "\n"
printf "%s%s", $i, OFS
}
}' input.txt
Second variant - with sed
sed -r 's/,/#/10g; :l; s/#(.*)((#[^#]){4})/\1\2/; tl; s/#/,/g' input.txt
or, more straightforwardly (without loop) and probably faster.
sed -r 's/,(.),(.),(.),(.)$/#\1#\2#\3#\4/; s/,//10g; s/#/,/g' input.txt
Testing
Input
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p,q,r,s,t,u
Output
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p
a,b,c,d,e,f,g,h,i,jklmn,o,p,q,r
a,b,c,d,e,f,g,h,i,jklmnopq,r,s,t,u
Solved a similar problem using csvtool. Source file, copied from one of the other answers:
$ cat input.txt
a,b,c,d,e,f,g,h,i,j,k,l,m,n,o,p
1,2,3,4,5,6,3,4,2,4,3,4,3,2,5,2,3,4
1,2,3,4,5,6,3,4,2,4,a,s,f,e,3,4,3,2,5,2,3,4
Concatenating columns:
$ cat input.txt | csvtool format '%1,%2,%3,%4,%5,%6,%7,%8,%9,%10%11%12,%13,%14,%15,%16,%17,%18,%19,%20,%21,%22\n' -
a,b,c,d,e,f,g,h,i,jkl,m,n,o,p,,,,,,
1,2,3,4,5,6,3,4,2,434,3,2,5,2,3,4,,,,
1,2,3,4,5,6,3,4,2,4as,f,e,3,4,3,2,5,2,3,4
anatoly#anatoly-workstation:cbs$ cat input.txt

awk script for finding smallest value from column

I am beginner in AWK, so please help me to learn it. I have a text file with name snd and it values are
1 0 141
1 2 223
1 3 250
1 4 280
I want to print the entire row when the third column value is minimu
This should do it:
awk 'NR == 1 {line = $0; min = $3}
NR > 1 && $3 < min {line = $0; min = $3}
END{print line}' file.txt
EDIT:
What this does is:
Remember the 1st line and its 3rd field.
For the other lines, if the 3rd field is smaller than the min found so far, remember the line and its 3rd field.
At the end of the script, print the line.
Note that the test NR > 1 can be skipped, as for the 1st line, $3 < min will be false. If you know that the 3rd column is always positive (not negative), you can also skip the NR == 1 ... test as min's value at the beginning of the script is zero.
EDIT2:
This is shorter:
awk 'NR == 1 || $3 < min {line = $0; min = $3}END{print line}' file.txt
You don't need awk to do what you want. Use sort
sort -nk 3 file.txt | head -n 1
Results:
1 0 141
I think sort is an excellent answer, unless for some reason what you're looking for is the awk logic to do this in a larger script, or you want to avoid the extra pipes, or the purpose of this question is to learn more about awk.
$ awk 'NR==1{x=$3;line=$0} $3<x{line=$0} END{print line}' snd
Broken out into pieces, this is:
NR==1 {x=$3;line=$0} -- On the first line, set an initial value for comparison and store the line.
$3<x{line=$0} - On each line, compare the third field against our stored value, and if the condition is true, store the line. (We could make this run only on NR>1, but it doesn't matter.
END{print line} -- At the end of our input, print whatever line we've stored.
You should read man awk to learn about any parts of this that don't make sense.
a short answer for this would be:
sort -k3,3n temp|head -1
since you have asked for awk:
awk '{if(min>$3||NR==1){min=$3;a[$3]=$0}}END{print a[min]}' your_file
But i prefer the shorter one always.
For calculating the smallest value in any column , let say last column
awk '(FNR==1){a=$NF} {a=$NF < a?$NF:a} END {print a}'
this will only print the smallest value of the column.
In case if complete line is needed better to use sort:
sort -r -n -t [delimiter] -k[column] [file name]
awk -F ";" '(NR==1){a=$NF;b=$0} {a=$NF<a?$NF:a;b=$NF>a?b:$0} END {print b}' filename
this will print the line with smallest value which is encountered first.
awk 'BEGIN {OFS=FS=","}{if ( a[$1]>$2 || a[$1]=="") {a[$1]=$2;} if (b[$1]<$2) {b[$1]=$2;} } END {for (i in a) {print i,a[i],b[i]}}' input_file
We use || a[$1]=="" because when 1st value of field 1 is encountered it will have null in a[$1].