Lining up columns using awk

I'm trying to use awk to pull every 9th column out of a dataset with 210 columns. How can I get the columns to line up evenly if the data in each column do not contain the same number of characters?

Use a for loop to skip over the column you don't need:
awk '{
    for(i=1;i<=8;i++) {
        printf "%s%s",$i,FS
    }
    for(i=10;i<=NF;i++) {
        printf "%s%s",$i,(i==NF?RS:FS)
    }
}' file
Note: set your field separator accordingly. You haven't stated what it is, so I am going with the default (whitespace).
Sample test (skipping over the 3rd column):
$ cat file
1 2 3 4 5 6
1 2 3 4 5 6
1 2 3 4 5 6
$ awk '{
    for(i=1;i<=2;i++) {
        printf "%s%s",$i,FS
    }
    for(i=4;i<=NF;i++) {
        printf "%s%s",$i,(i==NF?RS:FS)
    }
}' file
1 2 4 5 6
1 2 4 5 6
1 2 4 5 6
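Back to the original question (every 9th of 210 columns, lined up evenly): a minimal sketch, assuming whitespace-separated input, a hypothetical file name data.txt, and that 8 characters is wide enough for any value; adjust the width to your data, and start the loop at 1 instead of 9 if you want columns 1, 10, 19, and so on:
awk '{
    for (i = 9; i <= NF; i += 9)     # take every 9th field
        printf "%-8s", $i            # left-justify each value in an 8-character cell
    print ""                         # terminate the output line
}' data.txt
Alternatively, print the selected fields separated by single spaces and pipe the result through column -t, which computes the column widths for you.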

Related

Processing Multiple Files with Awk - Unwanted Lines Are Printing

I'm trying to write a script to process two files. I'm getting stuck on a small detail that I've been unsuccessful in troubleshooting - hoping someone here can help!
I have two text files, the first with a single column and seven rows (all fruits). The second text file has two columns and seventeen rows (first column numbers, second column colors). My script is below - I've eliminated the rest of it, because after some troubleshooting I've found that the problem is here.
This script...:
BEGIN { FS = " " }
NR==FNR
{
print NR "\t" FNR
}
END{}
When invoked with awk -f script.awk file1.txt file2.txt, it produces this output:
apples
1 1
oranges
2 2
pears
3 3
grapes
4 4
mango
5 5
kiwi
6 6
banana
7 7
8 1
9 2
10 3
11 4
(truncated)
I don't understand what's happening here. The fields of file1 (the fruits) are being printed, but the only print statement in this script is printing the values of NR and FNR, which, from what I understand, are always numbers.
When I comment out the NR==FNR statement,
BEGIN { FS = " " }
#NR==FNR
{
print NR "\t" FNR
}
END{}
The output is as expected:
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 1
9 2
10 3
11 4
(truncated)
I need to use the NR==FNR statement in order to process multiple files.
Does anyone know what's happening here? Seems like such a basic issue (it's only 3 statements!), but I can't get rid of the damn fruits.
NR==FNR by itself is a pattern without an action, and the default action is to print the line (i.e. { print }).
So awk sees NR==FNR as a test for the first file (as you indicated), and when it succeeds it runs the default action.
So your script is effectively:
BEGIN { FS = " " }
NR==FNR {
print
}
{
print NR "\t" FNR
}
END{}
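If the intent was to print NR and FNR only while reading the first file, the pattern has to share a block with its action. A sketch of that version (what the rest of your omitted script needs may differ):
BEGIN { FS = " " }
NR == FNR {                # the usual "first file" test
    print NR "\t" FNR
    next                   # don't fall through to any later rules
}
{
    # rules here see only the second file
}
END { }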

AWK: Comparing two different columns in two files

I have these two files
File1:
9 8 6 8 5 2
2 1 7 0 6 1
3 2 3 4 4 6
File2: (which has over 4 million lines)
MN 1 0
JK 2 0
AL 3 90
CA 4 83
MK 5 54
HI 6 490
I want to compare field 6 of file1 with field 2 of file2. If they match, then append field 3 of file2 to the end of the corresponding line of file1.
I've looked at other solutions but I can't get it to work correctly.
Desired output:
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
My attempt:
awk 'NR==FNR{a[$2]=$2;next}a[$6]{print $0,a[$6]}' file2 file1
The program just hangs after that.
To print all lines in file1, with the match appended where available:
$ awk 'FNR==NR{a[$2]=$3;next;} {print $0,a[$6];}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
To print only the lines that have a match:
$ awk 'NR==FNR{a[$2]=$3;next} $6 in a {print $0,a[$6]}' file2 file1
9 8 6 8 5 2 0
2 1 7 0 6 1 0
3 2 3 4 4 6 490
Note that I replaced a[$2]=$2 with a[$2]=$3 and changed the test a[$6] (which is false if the value is zero) to $6 in a.
Your own attempt basically has two bugs, as seen in John1024's answer:
You use field 2 as both key and value in a, where you should be storing field 3 as the value (since you want to keep it for later), i.e., it should be a[$2] = $3.
The test a[$6] is false when the value in a is zero, even if it exists. The correct test is $6 in a.
Hence:
awk 'NR==FNR { a[$2]=$3; next } $6 in a {print $0, a[$6] }' file2 file1
There might be better approaches, but that is not clear from your specification. For instance, you say that file2 has over 4 million lines, but it is unknown whether there are also that many unique values for field 2. If so, a will have that many entries in memory. You also don't specify how long file1 is, whether its order must be preserved in the output, or whether every line (even one without a match in file2) should be output.
If it is the case that file1 has many fewer lines than file2 has unique values for field 2, and only matching lines need to be output, and order does not need to be preserved, then you might wish to read file1 first…
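A minimal sketch of that approach, assuming the file names from the question and that duplicate field-6 values in file1 need no special handling:
awk 'NR == FNR {
         want[$6] = $0        # remember each file1 line, keyed by its 6th field
         next
     }
     $2 in want {             # stream the big file2; keep only requested keys
         print want[$2], $3
     }' file1 file2
This keeps only file1 in memory, at the cost of the output following file2's order rather than file1's.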

Convert n number of rows to columns repeatedly using awk

My data is a large text file that consists of 12 rows repeating. It looks something like this:
{
1
2
3
4
5
6
7
8
9
10
}
repeating over and over. I want to turn every 12 rows into columns, so the data would look like this:
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
I have found some examples of how to convert all the rows to columns using awk: awk '{printf("%s ", $0)}', but no examples of how to convert every 12 rows into columns and then repeat the process.
Here is an idiomatic way (read: a golfed-down version of Tom Fenech's answer) of doing it with awk:
$ awk '{ORS=(NR%12?FS:RS)}1' file
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
ORS stands for Output Record Separator. We set ORS to FS, which by default is a space, for every line except every 12th line, where we set it to RS, which is a newline by default.
You could use something like this:
awk '{printf "%s%s", $0, (NR%12?OFS:RS)}' file
NR%12 evaluates to true except when the record number is exactly divisible by 12. When it is true, the output field separator is used (which defaults to a space). When it is false, the record separator is used (by default, a newline).
Testing it out:
$ awk '{printf "%s%s", $0, (NR%12?OFS:RS)}' file
{ 1 2 3 4 5 6 7 8 9 10 }
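If the group size ever needs to change, the same approach can take it as a variable (12 here is just the value from the question):
awk -v n=12 '{printf "%s%s", $0, (NR%n?OFS:RS)}' file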

How to append columns of data using awk

I have a file in this format:
1 2 3 4
5 6 7 8
9 10 11 12
I need assistance appending the columns in a loop, like this:
1
5
9
2
6
10
...
This one-liner should work with a dynamic number of rows and columns:
awk '{for(i=1;i<=NF;i++)a[NR][i]=$i}END{for(i=1;i<=NF;i++){for(j=1;j<=NR;j++)print a[j][i]; print ""}}' file
it looks better in this format:
awk '{for(i=1;i<=NF;i++) a[NR][i]=$i}
END{
    for(i=1;i<=NF;i++){
        for(j=1;j<=NR;j++)
            print a[j][i]
        print ""
    }
}' file
with your example:
kent$ awk '{for(i=1;i<=NF;i++)a[NR][i]=$i}END{for(i=1;i<=NF;i++){for(j=1;j<=NR;j++)print a[j][i]; print ""}}' file
1
5
9
2
6
10
3
7
11
4
8
12
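Note that a[NR][i] is a true multidimensional array, which requires GNU awk 4.0 or later. A sketch of a portable variant using the classic comma subscript, which any POSIX awk accepts (it assumes, like the original, that every row has the same number of fields):
awk '{ for (i = 1; i <= NF; i++) a[NR, i] = $i }
END {
    for (i = 1; i <= NF; i++) {
        for (j = 1; j <= NR; j++)
            print a[j, i]
        print ""
    }
}' file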

Get identical rows

I have a file like this: (data.dat)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 9
6 6 6 6 6 6 6 6 6 6 6 7 6 7
7 9 7 7 7 7 7 7 7 7 7 8 7 9
8 10 8 9 8 9 8 8 8 8 8 9
9 11 9 10 9 9 9 9 9 10
10 12 10 11 10 10 10 11
The odd columns are simple line counters (NR), the even columns are simple values. I would like to get those values for which all the even-column values on a line are the same, i.e. I should get this output:
1
2
3
9
I have already tried to make this line, but something is wrong:
awk '{arr1[$1]=$2;arr2[$3]=$4;arr3[$5]=$6;arr4[$7]=$8;arr5[$9]=$10;arr6[$11]=$12;arr7[$13]=$14;arr8[$15]=$16;}END{for(x in arr1) if(x in arr2 && x in arr3 && x in arr4 && x in arr5 && x in arr6 && x in arr7 && x in arr8) print arr1[x];}' data.dat | sort -n
Is there a better way, by the way?
UPDATE: The real problem is that the array indices are different. So, the arr[...] method does not work... :(
This would work -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' INPUT_FILE
Explanation:
We set a variable x=0 in the BEGIN statement.
We use this variable to find the maximum number of fields (this is useful later).
We store the value of every second column in an array and count its occurrences.
We divide x by 2 to get the maximum number of times a value can occur across the even columns.
If a value's occurrence count matches this number, it is present in every even column.
Test with your sample file:
[jaypal:~/Temp] awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' file
2
3
9
1
You can either pipe the output to sort -n to get it in order or use this -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (i=1;i<=length(a);i++) if (x==a[i]) print i}' INPUT_FILE
Your example works with just a simple:
awk '{if($2==$4 && $2==$6 && $2==$8 && $2==$10 && $2==$12 && $2==$14 && $2==$16) print $1}' test.txt | sort -n
Any other requirements I'm missing?
EDIT: Apparently so, with the missing columns you added :) Try:
awk '{if(NF>1) { found=1; for(i=4; i<NF+1; i+=2) { if($2!=$i) { found=0; } } } if(found) print $1}' test.txt | sort -n
In your input data, row #9 doesn't have all even columns the same, so I'm not sure how you get 9 in your desired output. You can try the following awk command to print the 1st column for your task:
awk '{same=0; prev=-1; for(i=2;i<=NF;i+=2) {if (prev != -1 && prev != $i) {same=1; break;} else prev=$i;} if (same==0) print $1;}' data.dat
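The same logic is easier to follow spread over lines (data.dat is the sample file from the question):
awk '{
    same = 0; prev = -1
    for (i = 2; i <= NF; i += 2) {        # walk the even columns
        if (prev != -1 && prev != $i) {   # mismatch with an earlier even value
            same = 1
            break
        } else
            prev = $i
    }
    if (same == 0)                        # every even column agreed
        print $1
}' data.dat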