Transpose column to row using awk

In my data file, there is a certain column I am interested in. So I used awk to print only that column (awk '{print $4}') and added an "if" condition to filter out unwanted lines. However, I could not figure out how to transpose every nth line of that column into a new row.
input:
1
2
3
4
5
6
7
8
9
desired output:
1 4 7
2 5 8
3 6 9
I have checked out the other solutions and tried them, but none of them gave me what I want. I would appreciate it if anyone could help me with this.

You can also use pr here:
$ seq 9 | pr -3ts' '
1 4 7
2 5 8
3 6 9
$ seq 9 | pr -5ts' '
1 3 5 7 9
2 4 6 8
Here the number sets how many columns you want, the s option specifies the delimiter between columns, and t suppresses the page header and trailer.

Using awk:
$ seq 9 |
awk ' {
i=((i=NR%3)?i:3) # index to hash a
a[i]=a[i] (a[i]==""?"":" ") $1 # space separate items to a[i]
}
END {
for(i=1;i<=3;i++) # from 1 to 3 (yes, hardcoded)
print a[i] # output
}'
Output:
1 4 7
2 5 8
3 6 9

The columns program from the autogen package can do this, e.g.:
seq 9 | columns --by-column -w1 -c3
Output:
1 4 7
2 5 8
3 6 9

Related

Transform a 1xA table into a BxC table in awk

I am trying to turn a 1xA table into a BxC table. Let's say A is 15, B is 3 and C is 5, hence after each 5 entries I want it to start a new row in the same table.
I have a rather tedious way that appears to get close, but it misses some values after each 5. I think the issue is with RS, as a newline lacks the "space" needed by RS, but I tried changing this to something else in file.sum and still had no luck. Perhaps there is a better way to do it, but I feel this should work.
awk -v RS=" " '{getline a1; getline a2; getline a3; getline a4; getline a5; print a1,a2,a3,a4,a5}' OFS='\t' file.sum
file.sum (my 1xA):
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Expected results (my BxC):
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
Actual results:
1 2 3 4 5
7 8 9 10 11
13 14 15 10 11
This should be one of the simplest solutions:
xargs -n5 <file
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
To follow up on your awk: I do not like getline, so I always try to avoid it. A loop also slows awk down somewhat.
But using RS=" " you can do like this:
awk -v RS=" " '{$1=$1} {printf NR%5==0?"%s\n":"%s ",$0}' file
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
You can remove the {$1=$1}, but you will then get a blank line at the end.
The NR%5==0 test checks whether the record is every 5th one and inserts a newline when needed.
A tab version:
awk -v RS=" " '{$1=$1} {printf NR%5==0?"%s\n":"%s\t",$0}' file
1 2 3 4 5
6 7 8 9 10
11 12 13 14 15
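As an alternative sketch that leaves RS at its default and walks over the fields instead, the following counts values across all input lines and starts a new row after every c values. The name wrap_fields and the variables c and n are made up for illustration:

```shell
# Hypothetical helper: print whitespace-separated values, c per output row.
wrap_fields() {
    awk -v c="$1" '
    {
        for (i = 1; i <= NF; i++) {
            # n values printed so far: newline after every c-th one, otherwise a space
            if (n > 0) printf "%s", (n % c == 0 ? "\n" : " ")
            printf "%s", $i
            n++
        }
    }
    END { if (n > 0) print "" }     # terminate the last row
    '
}

echo "1 2 3 4 5 6 7 8 9 10 11 12 13 14 15" | wrap_fields 5
```

Because the separator is printed before each value rather than after it, a short last row gets neither a trailing space nor an extra blank line.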

Convert n number of rows to columns repeatedly using awk

My data is a large text file that consists of 12 rows repeating. It looks something like this:
{
1
2
3
4
5
6
7
8
9
10
}
repeating over and over. I want to turn every 12 rows into columns, so the data would look like this:
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
I have found some examples of how to convert all the rows to columns using awk: awk '{printf("%s ", $0)}', but no examples of how to convert every 12 rows into columns and then repeat the process.
Here is an idiomatic way (read: a golfed-down version of Tom Fenech's answer) of doing it with awk:
$ awk '{ORS=(NR%12?FS:RS)}1' file
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
ORS stands for Output Record Separator. We set ORS to FS, which by default is a space, for every line except each 12th line, where we set it to RS, which is a newline by default.
You could use something like this:
awk '{printf "%s%s", $0, (NR%12?OFS:RS)}' file
NR%12 evaluates to true except when the record number is exactly divisible by 12. When it is true, the output field separator is used (which defaults to a space). When it is false, the record separator is used (by default, a newline).
Testing it out:
$ awk '{printf "%s%s", $0, (NR%12?OFS:RS)}' file
{ 1 2 3 4 5 6 7 8 9 10 }
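If the number of input lines is not a multiple of the group size, the commands above end the output without a final newline. A small sketch that takes the group size as a parameter and closes an incomplete last group; join_every and n are illustrative names, not part of either answer:

```shell
# Hypothetical helper: join every n input lines into one output line.
join_every() {
    awk -v n="$1" '
    { printf "%s%s", $0, (NR % n ? OFS : ORS) }   # OFS inside a group, ORS after the n-th line
    END { if (NR % n) printf "%s", ORS }          # incomplete last group: still end with a newline
    '
}

printf '%s\n' '{' 1 2 '}' '{' 3 4 '}' | join_every 4
```

An incomplete last group keeps one trailing separator before its newline; strip it afterwards if that matters.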

AWK: print columns of a matrix using first column as reference

I want to read the first column of a matrix, and then print columns of this matrix using this first column as reference. An example:
mat.txt
2 10 6 12 3
4 11 1 22 6
5 15 3 18 9
Using the first column as reference, I would like to get columns 2, 4 and 5, and also put the value of the first column at the beginning.
2 10 12 3
4 11 22 6
5 15 18 9
I try this, but doesn't work well:
awk 'FNR==NR{c++;cols[c]=$1;end}
{for(i=1;i<=c;i++) printf("%s%s",$(cols[i]+1),i<c ? OFS : "\n")}' mat.txt mat.txt
This may do:
awk 'FNR==NR {a[NR]=$1; n=NR; next} {printf "%s", a[FNR]; for (i=1; i<=n; i++) printf " %s", $(a[i]); print ""}' mat.txt{,}
2 10 12 3
4 11 22 6
5 15 18 9
The {,} is bash brace expansion; it makes the shell pass the file name twice, so awk reads the file two times.
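If reading the file twice is not possible (for example when the data comes from a pipe), a single-pass sketch can keep the lines in memory and print them in END. This assumes the whole matrix fits in memory; pick_columns, row, col and f are illustrative names:

```shell
# Hypothetical helper: print column 1, then the columns whose numbers are listed in column 1.
pick_columns() {
    awk '
    { row[NR] = $0; col[NR] = $1 }            # remember each line and its first field
    END {
        for (r = 1; r <= NR; r++) {
            split(row[r], f, " ")             # " " splits on runs of whitespace, like default FS
            out = f[1]
            for (i = 1; i <= NR; i++)         # column numbers in input order
                out = out OFS f[col[i]]
            print out
        }
    }' "$@"
}

printf '2 10 6 12 3\n4 11 1 22 6\n5 15 3 18 9\n' | pick_columns
```

The indexed loop keeps the selected columns in the order given by the first column, which a for (i in a) loop does not guarantee.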

Lookup and Replace with two files in awk

I am trying to correct one file with another using a single line of AWK code. I want to take $1 from FILE2, look it up in FILE1, and get the corresponding $2 and $4. After I set them as variables, I want the program to stop evaluating FILE1, change $10 and $11 in FILE2 to the values of the variables, and print the result.
I am having trouble getting awk to switch from FILE1 to FILE2 after I have extracted the variables. I have tried nextfile, but this resets the program and it tries to extract variables from FILE2. I also set NR to the last record, but it did not switch.
I am also running a loop to get each line out of FILE1, but if that can be part of the script, I am sure it would speed things up by not having to reopen awk over and over again.
Here are the parts I have figured out.
for file in `cut -f 1 FILE2`; do
awk -v a=$file '$1=a{s=$2;q=$4; ---GO TO FILE1---}{if ($1==a) {$10=s; $11=q; print 0;exit}' FILE1 FILE2 >> FILEOUT
done
A quick example set. NOTE: Despite how I have written this, the two files are not in the same order and are on the order of 8 GB in size, so they are a little unwieldy to sort.
FILE1
A 12345 + AJD$JD
B 12504 + DKFJ#%
C 52042 + DSJTJE
FILE2
A 2 3 4 5 6 7 8 9 345 D$J
B 2 3 4 5 6 7 8 9 250 KFJ
C 2 3 4 5 6 7 8 9 204 SJT
OUTFILE
A 2 3 4 5 6 7 8 9 12345 AJD$JD
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE
This is the code I got to work based on Kent's answer below.
awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{$9=$9" "a[$1]}{$10="";$11=""}2' f1 f2
try this one-liner:
kent$ awk 'NR==FNR{a[$1]=$2" "$4;next}$1 in a{NF-=2;$0=$0" "a[$1]}7' f1 f2
A 2 3 4 5 6 7 8 9 12345 AJD$JD
B 2 3 4 5 6 7 8 9 12504 DKFJ#%
C 2 3 4 5 6 7 8 9 52042 DSJTJE
No need to loop over the files repeatedly - just read one file and store the relevant fields in arrays keyed on $1, then go through the other file and use those arrays to look up the values you want to insert.
awk '(FILENAME=="FILE1"){y[$1]=$2;z[$1]=$4}; (FILENAME=="FILE2" && $1 in y){$10=y[$1];$11=z[$1];print $0}' FILE1 FILE2
That said, it sounds like you might have a use for the join command here rather than messing about with awk (the above script assumes all your $1/$2/$4 values will fit in memory).
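A sketch of the same lookup that also passes unmatched lines through unchanged, assuming the lookup file is the first argument and its key, $2 and $4 values fit in memory. The names fix_fields, v2 and v4 and the temporary files are only for illustration:

```shell
# Hypothetical helper: fix_fields LOOKUP DATA
# Replaces fields 10 and 11 of DATA with fields 2 and 4 of the LOOKUP line sharing the same key.
fix_fields() {
    awk '
    NR == FNR { v2[$1] = $2; v4[$1] = $4; next }   # first file: remember the values per key
    $1 in v2  { $10 = v2[$1]; $11 = v4[$1] }       # second file: overwrite when the key is known
    { print }                                      # lines without a match pass through unchanged
    ' "$1" "$2"
}

tmp=$(mktemp -d)
printf 'A 12345 + AJD$JD\nC 52042 + DSJTJE\n' > "$tmp/f1"
printf 'C 2 3 4 5 6 7 8 9 204 SJT\nX 2 3 4 5 6 7 8 9 000 ZZZ\nA 2 3 4 5 6 7 8 9 345 D$J\n' > "$tmp/f2"
fix_fields "$tmp/f1" "$tmp/f2"
rm -r "$tmp"
```

The two files do not need to be in the same order, since the lookup is by key rather than by line position.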

Get identical rows

I have a file like this: (data.dat)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 9
6 6 6 6 6 6 6 6 6 6 6 7 6 7
7 9 7 7 7 7 7 7 7 7 7 8 7 9
8 10 8 9 8 9 8 8 8 8 8 9
9 11 9 10 9 9 9 9 9 10
10 12 10 11 10 10 10 11
The odd columns are simple line counters (NR), the even columns are simple values. I would like to get those values that occur in all of the even columns, i.e. I should get this output:
1
2
3
9
I have already tried to make this line, but something is wrong:
awk '{arr1[$1]=$2;arr2[$3]=$4;arr3[$5]=$6;arr4[$7]=$8;arr5[$9]=$10;arr6[$11]=$12;arr7[$13]=$14;arr8[$15]=$16;}END{for(x in arr1) if(x in arr2 && x in arr3 && x in arr4 && x in arr5 && x in arr6 && x in arr7 && x in arr8) print arr1[x];}' data.dat | sort -n
Is there a better way, by the way?
UPDATE: The real problem is that the array indices are different. So, the arr[...] method does not work... :(
This would work -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' INPUT_FILE
Explanation:
We set a variable x=0 in the BEGIN statement.
We use this variable to find the maximum number of fields (this is useful later).
We store the value of every second column in an array and count its occurrences.
We divide x by 2 to get the maximum number of times a value can occur across the even columns.
If the number of occurrences of a value matches this number, the value is present in every second column.
Test: with your sample file
[jaypal:~/Temp] awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' file
2
3
9
1
You can either pipe the output to sort -n to get it in order or, if the values are consecutive integers starting at 1 (as in the sample), use this -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (i=1;i<=length(a);i++) if (x==a[i]) print i}' INPUT_FILE
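The loop from 1 to length(a) only works when the values happen to be 1, 2, 3 and so on without gaps. A sketch of an ordered variant that tracks the smallest and largest value instead, assuming the values are integers; in_all_columns, cnt, lo and hi are illustrative names:

```shell
# Hypothetical helper: values that occur once per even column, printed in numeric order.
in_all_columns() {
    awk '
    {
        if (NF > nf) nf = NF                          # widest line seen so far
        for (i = 2; i <= NF; i += 2) {
            cnt[$i]++                                 # occurrences of each value
            if (n == 0 || $i + 0 < lo) lo = $i + 0    # smallest value seen
            if (n == 0 || $i + 0 > hi) hi = $i + 0    # largest value seen
            n++
        }
    }
    END {
        if (n == 0) exit                              # no values at all
        for (v = lo; v <= hi; v++)                    # assumes integer values
            if ((v in cnt) && cnt[v] == nf / 2)       # "in" does not create missing entries
                print v
    }' "$@"
}

printf '1 3 1 3\n2 5 2 4\n3 7 3 5\n' | in_all_columns
```

In this small input, 3 and 5 occur in both even columns, while 4 and 7 occur in only one.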
Your example works with just a simple:
awk '{if($2==$4 && $2==$6 && $2==$8 && $2==$10 && $2==$12 && $2==$14 && $2==$16) print $1}' test.txt | sort -n
Any other requirements I'm missing?
EDIT: Apparently with the missing columns you added :) Try
awk '{if(NF>1) { found=1; for(i=4; i<NF+1; i+=2) { if($2!=$i) { found=0; } } } if(found) print $1}' test.txt | sort -n
In your input data, row 9 does not have the same value in all even columns, so I am not sure how 9 ends up in your desired output. You can try the following awk command to print the first column for your task:
awk '{same=0; prev=-1; for(i=2;i<=NF;i+=2) {if (prev != -1 && prev != $i) {same=1; break;} else prev=$i;} if (same==0) print $1;}' data.dat