Extracting data from a file with awk

I have a data set like the one below:
first 0 1
first 1 2
first 2 3
second 0 1
second 1 2
second 2 3
third 0 1
third 1 2
third 2 3
I need to go through this file, extract the third column for first, second and third, and store each group in its own file.
Each output file should contain:
1
2
3

This is pretty straightforward: awk '{print $3>$1}' file, i.e. print the third field and redirect the output to a file whose name is the first field.
Demo:
$ ls
file
$ awk '{print $3>$1}' file
$ ls
file first second third
$ cat first
1
2
3
$ cat second
1
2
3
$ cat third
1
2
3
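A slightly more defensive sketch of the same idea, in case the first column takes many distinct values: some awk implementations limit how many files can be open at once, so this variant closes the previous output file whenever the key changes and appends with >>. It assumes equal keys are mostly grouped together. The scratch directory from mktemp and the printf that recreates the sample are my additions, not part of the question.

```shell
# Work in a scratch directory so nothing existing is overwritten
cd "$(mktemp -d)" || exit 1

# Recreate the sample data from the question
printf '%s\n' 'first 0 1' 'first 1 2' 'first 2 3' \
              'second 0 1' 'second 1 2' 'second 2 3' \
              'third 0 1' 'third 1 2' 'third 2 3' > file

# Close the previous output file when the key changes; ">>" appends,
# so a key that shows up again later does not truncate its file
awk '$1 != prev { if (prev != "") close(prev); prev = $1 }
     { print $3 >> ($1) }' file

cat first second third
```

Because >> appends, running it twice in the same directory doubles the contents, so remove the old output files first.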

Related

Apply a sed command to every column of a specific row

I have a tab separated file:
samplename1/filename1 anotherthing/anotherfile asdfgh/hjklñ
2 3 4
5 6 7
I am trying to remove everything after the / just in the header of the file using sed:
sed 's/[/].*//' samplenames.txt
How can I do this for each column of the file? Right now I am removing everything after the first /, but I want to remove only the part of each column that comes after its /.
Actual output:
samplename1
2 3 4
5 6 7
Desired output:
samplename1 anotherthing asdfgh
2 3 4
5 6 7
With GNU sed, you may use
sed -i '1 s,/[^[:space:]]*,,g' samplenames.txt
With FreeBSD sed, you need to add '' after -i.
The -i option makes sed edit the file in place. The 1 address restricts the substitution to the first line of the file.
The s,/[^[:space:]]*,,g command removes every occurrence of / together with the zero or more non-whitespace characters that follow it.
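Before letting -i rewrite the file, it may be worth previewing the result. A minimal sketch, assuming a scratch copy of the sample (created here with printf, with an ASCII-only last header to stay locale-neutral):

```shell
# Scratch directory and sample file (both are illustration only)
cd "$(mktemp -d)" || exit 1
printf 'samplename1/filename1\tanotherthing/anotherfile\tasdfgh/hjkl\n2\t3\t4\n5\t6\t7\n' > samplenames.txt

# Without -i, sed writes the edited text to stdout and leaves the file untouched
result=$(sed '1 s,/[^[:space:]]*,,g' samplenames.txt)
printf '%s\n' "$result"
```

Once the preview looks right, the same expression can be run with -i.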
Given:
printf "samplename1/filename1\tanotherthing/anotherfile\tasdfgh/hjklñ
2\t3\t4
5\t6\t7" >file # ie, note only one tab between fields...
Here is a POSIX awk to do this (the separator is written as '\t', since $"\t" is a bash-only quoting form):
awk -F '\t' 'NR==1{gsub("/[^\t]*",""); print; next} 1' file
Prints:
samplename1 anotherthing asdfgh
2 3 4
5 6 7
You can get those to line up with the column command:
awk -F '\t' 'NR==1{gsub("/[^\t]*",""); print; next} 1' file | column -t
samplename1 anotherthing asdfgh
2 3 4
5 6 7
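An alternative sketch that treats the header field by field instead of running gsub over the whole line. It assumes a single tab between fields, and uses the same sample as the question with an ASCII-only last header; the variable name fixed is just for illustration.

```shell
# Trim everything from the first "/" onward in each header field
fixed=$(printf 'samplename1/filename1\tanotherthing/anotherfile\tasdfgh/hjkl\n2\t3\t4\n5\t6\t7\n' |
    awk 'BEGIN { FS = OFS = "\t" }
         NR == 1 { for (i = 1; i <= NF; i++) sub(/\/.*/, "", $i) }   # header only
         { print }')
printf '%s\n' "$fixed"
```

Assigning to $i makes awk rebuild the line with OFS, which is why OFS is set to a tab as well.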

Find the ratio among columns

I have some input files of the following format:
File1.txt    File2.txt    File3.txt
1 2          1 6          1 20
2 3          2 9          2 21
3 7          3 14         3 28
Now I need to produce a single new file with three columns using AWK. The first column stays the same; it is identical in all three files (just an ordinal number).
The 2nd column of the new file should be the 2nd column of the second file divided by the 2nd column of the first file, and the 3rd column should be the 2nd column of the third file divided by the 2nd column of the first file.
e.g.:
Result.txt
1 3 10
2 3 7
3 2 4
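Use a multidimensional array to store the values (note that ARGIND is specific to GNU awk, and that for (i in a) does not guarantee any particular output order):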
awk 'FNR==NR {a[$1]=$2; next}
     {b[$1,ARGIND]=$2/a[$1]}
     END {for (i in a)
              print i,b[i,2],b[i,3]
     }' f1 f2 f3
Test
$ awk 'FNR==NR {a[$1]=$2; next} {b[$1,ARGIND]=$2/a[$1]} END {for (i in a) print i,b[i,2],b[i,3]}' f1 f2 f3
1 3 10
2 3 7
3 2 4
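For awks without ARGIND, a hedged sketch of the same approach counts input files by watching for FNR == 1 and remembers the row order of the first file, so the output order does not depend on for (i in a). The names fileno, base, order and ratio are mine, and the sketch assumes none of the input files is empty.

```shell
# Scratch directory and sample files matching the question
cd "$(mktemp -d)" || exit 1
printf '1 2\n2 3\n3 7\n'    > f1
printf '1 6\n2 9\n3 14\n'   > f2
printf '1 20\n2 21\n3 28\n' > f3

awk 'FNR == 1 { fileno++ }                               # a new input file starts
     fileno == 1 { base[$1] = $2; order[++n] = $1; next }  # divisor from the first file
     { ratio[$1, fileno] = $2 / base[$1] }
     END { for (k = 1; k <= n; k++)
               print order[k], ratio[order[k], 2], ratio[order[k], 3] }' f1 f2 f3 > Result.txt

cat Result.txt
```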

How to extract files from a merged file

I want to separate a merged file into two files. The file:
file.dat
i =100
1 2 3
i =1
-1 -2 -3
i =101
1 2 3
i =102
1 2 3
i =103
1 2 3
i =2
-1 -2 -3
....
The mixed indices are
1,2,3,4, ...,99
and
100, 101, 102, 103,...,200.
The two groups of indices are interleaved, in no particular order.
The data
1 2 3
and
-1 -2 -3
just denote the data block in each step.
How can I separate the merged file into two files according to the indices?
If you just want the data blocks appended to two different files, depending on which group of indices they belong to, this should work:
# separate.awk
{
    if ($1 == "i")
    {
        split($2,a,"=");
        i = a[2];
    }
    if (i < 100)
        print > "1-99.dat";
    else
        print > "100-200.dat"
}
$ awk -f separate.awk file.dat
$ cat 1-99.dat
i =1
-1 -2 -3
i =2
-1 -2 -3
$ cat 100-200.dat
i =100
1 2 3
i =101
1 2 3
i =102
1 2 3
i =103
1 2 3
This awk should do it for you:
awk -F= '/=/{f="a.txt";if($2>99)f="b.txt";next} {print >f}' file.dat
First, it sets the field separator to =. Then it checks whether the line contains an equals sign, and if so, it sets the name of the output file to either "a.txt" or "b.txt" depending on the number after the equals sign. On subsequent records it just writes to the file last selected. Note that, because of next, the i =N header lines themselves are not written to either file.
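If the i =N header lines should be kept in the output as well, a small variation drops the next. This is only a sketch: the file names a.txt and b.txt are carried over from the one-liner above, the variable name out is mine, and the sample is a shortened copy of file.dat.

```shell
# Scratch directory and a shortened copy of the sample
cd "$(mktemp -d)" || exit 1
printf '%s\n' 'i =100' '1 2 3' 'i =1' '-1 -2 -3' \
              'i =101' '1 2 3' 'i =2' '-1 -2 -3' > file.dat

awk -F= '
    /=/ { out = ($2 + 0 > 99) ? "b.txt" : "a.txt" }   # header line: choose the target file
    { print > out }                                    # header and data lines both go there
' file.dat

cat a.txt
cat b.txt
```

It relies on the first line of the input being a header, so that out is set before the first print.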

Count the repetitions of an element from a file with awk

I have a one-column file containing only integers, such as:
1
1
4
3
3
2
I want to count how many times each number appears in the file. The output file should be:
1 2
2 1
3 2
4 1
Thanks
Try this line:
awk '{a[$0]++}END{for(x in a)print x,a[x]}' file
awk '{tot[$0]++} END{for (n in tot) {print n,tot[n]}} ' numbers
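One caveat with both one-liners: awk does not define the order in which for (x in a) visits the keys, so the counts may come out unsorted. A minimal sketch that pipes the result through sort -n to get the order shown in the question (the sample is fed in with printf; the variable name counts is mine):

```shell
# Count occurrences, then sort numerically on the value
counts=$(printf '%s\n' 1 1 4 3 3 2 |
    awk '{ count[$0]++ } END { for (n in count) print n, count[n] }' |
    sort -n)
printf '%s\n' "$counts"
```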

Find the difference between two files

I have the following situation:
The file1.dat is like:
1 2
1 3
1 4
2 1
and the file2.dat is like:
1 2
2 1
2 3
3 4
I want to find the lines of the second file that are not in the first. I tried grep -v -f file1 file2, but my real files are much bigger than these two, and with them the command never finished.
The result should be:
2 3
3 4
The files are sorted and they have the same number of elements. Any way to find a solution with awk?
Seems like you want lines in file2 that are not in file1:
$ awk 'FNR==NR{a[$0];next}!($0 in a)' file1 file2
2 3
3 4
However, since the files are sorted, it's simpler to use comm:
$ comm -13 file1 file2
2 3
3 4
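If you would rather stay with grep, a sketch that may behave better on large inputs treats the patterns as fixed strings that must match whole lines, instead of as regular expressions. I have not timed it against the real files, so take the speed-up as an assumption; the sample files are recreated here with printf.

```shell
# Scratch directory and the two sample files from the question
cd "$(mktemp -d)" || exit 1
printf '1 2\n1 3\n1 4\n2 1\n' > file1.dat
printf '1 2\n2 1\n2 3\n3 4\n' > file2.dat

# -F: fixed strings, -x: match whole lines only, -v: invert, -f: read patterns from a file
diff_lines=$(grep -Fxvf file1.dat file2.dat)
printf '%s\n' "$diff_lines"
```

Unlike comm, this does not need the inputs to be sorted.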