Get identical rows - awk

I have a file like this: (data.dat)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 9
6 6 6 6 6 6 6 6 6 6 6 7 6 7
7 9 7 7 7 7 7 7 7 7 7 8 7 9
8 10 8 9 8 9 8 8 8 8 8 9
9 11 9 10 9 9 9 9 9 10
10 12 10 11 10 10 10 11
The odd columns are simple line counters (NR), the even columns are simple values. I would like to get those values, in which the second (or even) colum values are the same in all even columns, i.e. I should get this output:
1
2
3
9
I have already tried to make this line, but something is wrong:
awk '{arr1[$1]=$2;arr2[$3]=$4;arr3[$5]=$6;arr4[$7]=$8;arr5[$9]=$10;arr6[$11]=$12;arr7[$13]=$14;arr8[$15]=$16;}END{for(x in arr1) if(x in arr2 && x in arr3 && x in arr4 && x in arr5 && x in arr6 && x in arr7 && x in arr8) print arr1[x];}' data.dat | sort -n
Is there a better way, by the way?
UPDATE: The real problem is that the array indices are different. So, the arr[...] method does not work... :(

This would work -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' INPUT_FILE
Explanation:
We set a variable x=0 in the BEGIN statement.
We use this variable to get to find out maximum number of fields (This is useful later).
We store value of every second column to an array and get their number of occurrences.
We divide the variable x by 2 to verify maximum number a value can occur in every second column.
If the occurrences of numbers in an array matches this variable it means they are present in every second column.
Test: with your sample file
[jaypal:~/Temp] awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' file
2
3
9
1
You can either pipe the output to sort -n to get it in order or use this -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (i=1;i<=length(a);i++) if (x==a[i]) print i}' INPUT_FILE

Your example works with just a simple;
awk '{if($2==$4 && $2==$6 && $2==$8 && $2==$10 && $2==$12 && $2==$14 && $2==$16) print $1}' test.txt | sort -n
Any other requirements I'm missing?
EDIT: Apparently with the missing columns you added :) Try
awk '{if(NF>1) { found=1; for(i=4; i<NF+1; i+=2) { if($2!=$i) { found=0; } } } if(found) print $1}' test.txt | sort -n

In your input data row # 9 doesn't have all even columns same so not sure how you show 9 in your desired output. You can try following awk command to print 1st col for your task:
awk '{same=0; prev=-1; for(i=2;i<=NF;i+=2) {if (prev != -1 && prev != $i) {same=1; break;} else prev=$i;} if (same==0) print $1;}' awk '{same=0; prev=-1; for(i=2;i<=NF;i+=2) {if (prev != -1 && prev != $i) {same=1; break;} else prev=$i;} if (same==0) print $1;}'

Related

Select current and previous line if certain value is found

To figure out my problem, I subtract column 3 and create a new column 5 with new values, then I print the previous and current line if the value found is equal to 25 in column 5.
Input file
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
I tried
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' file
output
1 1 35 1 35
2 5 50 1 15
2 6 75 1 25
4 7 85 1 10
5 8 100 1 15
6 9 125 1 25
4 1 200 1 75
Desired Output
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Thanks in advance
you're almost there, in addition to previous $3, keep the previous $0 and only print when condition is satisfied.
$ awk '{$5=$3-p3} $5==25{print p0; print} {p0=$0;p3=$3}' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
this can be further golfed to
$ awk '25==($5=$3-p3){print p0; print} {p0=$0;p3=$3}' file
check the newly computed field $5 whether equal to 25. If so print the previous line and current line. Save the previous line and previous $3 for the computations in the next line.
You are close to the answer, just pipe it another awk and print it
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
with Inputs:
$ cat oxxo.txt
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
$ awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
$
Could you please try following.
awk '$3-prev==25{print line ORS $0,$3} {$(NF+1)=$3-prev;prev=$3;line=$0}' Input_file | column -t
Here's one:
$ awk '{$5=$3-q;t=p;p=$0;q=$3;$0=t ORS $0}$10==25' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Explained:
$ awk '{
$5=$3-q # subtract
t=p # previous to temp
p=$0 # store previous for next round
q=$3 # store subtract value for next round
$0=t ORS $0 # prepare record for output
}
$10==25 # output if equals
' file
No checking for duplicates so you might get same record printed twice. Easiest way to fix is to pipe the output to uniq.

Transpose column to row using awk

In my data file, there is a certain column I am interested. So I used awk to print out only that column (awk '{print $4}') and put a condition to eliminate using "if". However, I could not figure out how to transpose every nth line on that column to new row.
input:
1
2
3
4
5
6
7
8
9
desired output:
1 4 7
2 5 8
3 6 9
I have checked out the other solutions and tried but none of them gave me what I want. I will appreciate if anyone could help me with that.
you can also use pr here
$ seq 9 | pr -3ts' '
1 4 7
2 5 8
3 6 9
$ seq 9 | pr -5ts' '
1 3 5 7 9
2 4 6 8
where the number indicates how many columns you need and the s option allows to specify the delimiter between columns
Using awk:
$ seq 9 |
awk ' {
i=((i=NR%3)?i:3) # index to hash a
a[i]=a[i] (a[i]==""?"":" ") $1 # space separate items to a[i]
}
END {
for(i=1;i<=3;i++) # from 1 to 3 (yes, hardcoded)
print a[i] # output
}'
Output:
1 4 7
2 5 8
3 6 9
The columns program from the autogen package can do this, e.g.:
seq 9 | columns --by-column -w1 -c3
Output:
1 4 7
2 5 8
3 6 9

Convert n number of rows to columns repeatedly using awk

My data is a large text file that consists of 12 rows repeating. It looks something like this:
{
1
2
3
4
5
6
7
8
9
10
}
repeating over and over. I want to turn every 12 rows into columns. so the data would look like this:
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
I have found some examples of how to convert all the rows to columns using awk: awk '{printf("%s ", $0)}', but no examples of how to convert every 12 rows into columns and then repeat the process.
Here is an idiomatic way (read golfed down version of Tom Fenech's answer) of doing it with awk:
$ awk '{ORS=(NR%12?FS:RS)}1' file
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
{ 1 2 3 4 5 6 7 8 9 10 }
ORS stands for Output Record Separator. We set the ORS to FS which by default is space for every line except the 12th line where we set it to RS which is a newline by default.
You could use something like this:
awk '{printf "%s%s", $0, (NR%12?OFS:RS)}' file
NR%12 evaluates to true except when the record number is exactly divisible by 0. When it is true, the output field separator is used (which defaults to a space). When it is false, the record separator is used (by default, a newline).
Testing it out:
$ awk '{printf "%s%s", $0, (NR%12?OFS:RS)}' file
{ 1 2 3 4 5 6 7 8 9 10 }

AWK: print colums of a matrix using first column as reference

I want to read first colum in a matrix, and then print columns of this matrix using this first colum as reference. And example:
mat.txt
2 10 6 12 3
4 11 1 22 6
5 15 3 18 9
Using first column as reference, I would like to get columns 2, 4 and 5, and also put the value of first colum at the begining.
2 10 12 3
4 11 22 6
5 15 18 9
I try this, but doesn't work well:
awk 'FNR==NR{c++;cols[c]=$1;end}
{for(i=1;i&lt=c;i++) printf("%s%s",$(cols[i]+1),i&ltc ? OFS : "\n")}' mat.txt mat.txt
This may do:
awk 'FNR==NR {a[NR]=$1;next} {printf "%s ",a[FNR];for (i in a) printf "%s ",$(a[i]);print ""}' mat.txt{,}
2 10 12 3
4 11 22 6
5 15 18 9
The {,} make the file be used two times.

awk solution for comparing current line to next line and printing one of the lines based on a condition

I have an input file that looks like this (first column is a location number and the second is a count that should increase over time):
1 0
1 2
1 6
1 7
1 7
1 8
1 7
1 7
1 9
1 9
1 10
1 10
1 9
1 10
1 10
1 10
1 10
1 10
1 10
1 9
1 10
1 10
1 10
1 10
1 10
1 10
and I'd like to fix it look like this (substitute counts that decreased with the previous count):
1 0
1 2
1 6
1 7
1 7
1 8
1 8
1 8
1 9
1 9
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
I've been trying to use awk for this, but am stumbling with getline since I can't seem to figure out how to reset the line number (NR?) so it'll read each line and it's next line, not two lines at a time. This is the code I have so far, any ideas?
awk '{a=$1; b=$2; getline; c=$1; d=$2; if (a==c && b<=d) print a"\t"b; else print c"\t"d}' original.txt > fixed.txt
Also, this is the output I'm currently getting:
1 0
1 6
1 7
1 7
1 9
1 10
1 9
1 10
1 10
1 9
1 10
1 10
1 10
Perhaps all you want is:
awk '$2 < p { $2 = p } { p = $2 } 1' input-file
This will fail on the first line if the value in the second column is negative, so do:
awk 'NR > 1 && $2 < p ...'
This simply sets the second column to the previous value if the current value is less, then stores the current value in the variable p, then prints the line.
Note that this also slightly modifies the spacing of the output on lines that change. If your input is tab-separated, you might want to do:
awk 'NR > 1 && $2 < p { $2 = p } { p = $2 } 1' OFS=\\t input-file
This script will do what you like:
{
if ($2 < prev_count)
$2 = prev_count
else
prev_count = $2
printf("%d %d\n", $1, $2)
}
This is a verbose version to be easily readable :)