Split a column in lines after every 5 entries - awk - awk

I have a file that looks like the following
1
1
1
1
1
1
12
2
2
2
2
2
2
2
3
4
What I want to do is convert this column in multiple rows. Each new line/row should start after 5 entries, so that the output will like like this
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
I tried to achieve that by using
awk '{printf "%s" (NR%5==0?"RS:FS),$1}' file
but I get the following error
awk: line 1: runaway string constant "RS:FS),$1} ...
Any idea on how to achieve the desired output?

Maybe this awk one liner can help.
awk '{if (NR%5==0){a=a $0" ";print a; a=""} else a=a $0" "}END{print}' file
Output:
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
Longer awk:
{
if (NR%5==0)
{
a=a $0" ";
print a;
a="";
}
else
{
a=a $0" ";
}
}
END
{
print
}

Slightly different approach, still using awk:
$ awk '{if (NR%5) {ORS=""} else {ORS="\n"}{print " "$0}}' input.txt
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4
Using perl:
$ perl -p -e 's/\n/ / if $.%5' input.txt
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4

No need to complicate...
$ pr -5ats' ' <file
1 1 1 1 1
1 12 2 2 2
2 2 2 2 3
4

Related

Select current and previous line if certain value is found

To figure out my problem, I subtract column 3 and create a new column 5 with new values, then I print the previous and current line if the value found is equal to 25 in column 5.
Input file
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
I tried
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' file
output
1 1 35 1 35
2 5 50 1 15
2 6 75 1 25
4 7 85 1 10
5 8 100 1 15
6 9 125 1 25
4 1 200 1 75
Desired Output
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Thanks in advance
you're almost there, in addition to previous $3, keep the previous $0 and only print when condition is satisfied.
$ awk '{$5=$3-p3} $5==25{print p0; print} {p0=$0;p3=$3}' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
this can be further golfed to
$ awk '25==($5=$3-p3){print p0; print} {p0=$0;p3=$3}' file
check the newly computed field $5 whether equal to 25. If so print the previous line and current line. Save the previous line and previous $3 for the computations in the next line.
You are close to the answer, just pipe it another awk and print it
awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
with Inputs:
$ cat oxxo.txt
1 1 35 1
2 5 50 1
2 6 75 1
4 7 85 1
5 8 100 1
6 9 125 1
4 1 200 1
$ awk '{$5 = $3 - prev3; prev3 = $3; print $0}' oxxo.txt | awk ' { curr=$0; if($5==25) { print prev;print curr } prev=curr } '
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
$
Could you please try following.
awk '$3-prev==25{print line ORS $0,$3} {$(NF+1)=$3-prev;prev=$3;line=$0}' Input_file | column -t
Here's one:
$ awk '{$5=$3-q;t=p;p=$0;q=$3;$0=t ORS $0}$10==25' file
2 5 50 1 15
2 6 75 1 25
5 8 100 1 15
6 9 125 1 25
Explained:
$ awk '{
$5=$3-q # subtract
t=p # previous to temp
p=$0 # store previous for next round
q=$3 # store subtract value for next round
$0=t ORS $0 # prepare record for output
}
$10==25 # output if equals
' file
No checking for duplicates so you might get same record printed twice. Easiest way to fix is to pipe the output to uniq.

awk - insert a row when value in column changes

Using awk I would like to insert a row whenever the value in the second column changes.
I have:
1 3
2 3
3 1
4 1
5 2
I would like to get:
1 3
2 3
>
3 1
4 1
>
5 2
Could anyone point me in the right direction how this can be achieved in one file?
You can use this awk:
awk 'NR==1{prev=$2; print; next} prev!=$2{print ">"} {prev=$2}1' file

awk solution for comparing current line to next line and printing one of the lines based on a condition

I have an input file that looks like this (first column is a location number and the second is a count that should increase over time):
1 0
1 2
1 6
1 7
1 7
1 8
1 7
1 7
1 9
1 9
1 10
1 10
1 9
1 10
1 10
1 10
1 10
1 10
1 10
1 9
1 10
1 10
1 10
1 10
1 10
1 10
and I'd like to fix it look like this (substitute counts that decreased with the previous count):
1 0
1 2
1 6
1 7
1 7
1 8
1 8
1 8
1 9
1 9
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
1 10
I've been trying to use awk for this, but am stumbling with getline since I can't seem to figure out how to reset the line number (NR?) so it'll read each line and it's next line, not two lines at a time. This is the code I have so far, any ideas?
awk '{a=$1; b=$2; getline; c=$1; d=$2; if (a==c && b<=d) print a"\t"b; else print c"\t"d}' original.txt > fixed.txt
Also, this is the output I'm currently getting:
1 0
1 6
1 7
1 7
1 9
1 10
1 9
1 10
1 10
1 9
1 10
1 10
1 10
Perhaps all you want is:
awk '$2 < p { $2 = p } { p = $2 } 1' input-file
This will fail on the first line if the value in the second column is negative, so do:
awk 'NR > 1 && $2 < p ...'
This simply sets the second column to the previous value if the current value is less, then stores the current value in the variable p, then prints the line.
Note that this also slightly modifies the spacing of the output on lines that change. If your input is tab-separated, you might want to do:
awk 'NR > 1 && $2 < p { $2 = p } { p = $2 } 1' OFS=\\t input-file
This script will do what you like:
{
if ($2 < prev_count)
$2 = prev_count
else
prev_count = $2
printf("%d %d\n", $1, $2)
}
This is a verbose version to be easily readable :)

Get identical rows

I have a file like this: (data.dat)
1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1
2 2 2 2 2 2 2 2 2 2 2 2 2 2 2 2
3 3 3 3 3 3 3 3 3 3 3 3 3 3 3 3
4 4 4 4 4 4 4 4 4 4 4 4 4 4 4 7
5 5 5 5 5 5 5 5 5 5 5 5 5 5 5 9
6 6 6 6 6 6 6 6 6 6 6 7 6 7
7 9 7 7 7 7 7 7 7 7 7 8 7 9
8 10 8 9 8 9 8 8 8 8 8 9
9 11 9 10 9 9 9 9 9 10
10 12 10 11 10 10 10 11
The odd columns are simple line counters (NR), the even columns are simple values. I would like to get those values, in which the second (or even) colum values are the same in all even columns, i.e. I should get this output:
1
2
3
9
I have already tried to make this line, but something is wrong:
awk '{arr1[$1]=$2;arr2[$3]=$4;arr3[$5]=$6;arr4[$7]=$8;arr5[$9]=$10;arr6[$11]=$12;arr7[$13]=$14;arr8[$15]=$16;}END{for(x in arr1) if(x in arr2 && x in arr3 && x in arr4 && x in arr5 && x in arr6 && x in arr7 && x in arr8) print arr1[x];}' data.dat | sort -n
Is there a better way, by the way?
UPDATE: The real problem is that the array indices are different. So, the arr[...] method does not work... :(
This would work -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' INPUT_FILE
Explanation:
We set a variable x=0 in the BEGIN statement.
We use this variable to get to find out maximum number of fields (This is useful later).
We store value of every second column to an array and get their number of occurrences.
We divide the variable x by 2 to verify maximum number a value can occur in every second column.
If the occurrences of numbers in an array matches this variable it means they are present in every second column.
Test: with your sample file
[jaypal:~/Temp] awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (y in a) if (x==a[y]) print y}' file
2
3
9
1
You can either pipe the output to sort -n to get it in order or use this -
awk '
BEGIN{x=0}
{if (x<NF) x=NF;for (i=2;i<=NF;i+=2) a[$i]++}
END{x=x/2;for (i=1;i<=length(a);i++) if (x==a[i]) print i}' INPUT_FILE
Your example works with just a simple;
awk '{if($2==$4 && $2==$6 && $2==$8 && $2==$10 && $2==$12 && $2==$14 && $2==$16) print $1}' test.txt | sort -n
Any other requirements I'm missing?
EDIT: Apparently with the missing columns you added :) Try
awk '{if(NF>1) { found=1; for(i=4; i<NF+1; i+=2) { if($2!=$i) { found=0; } } } if(found) print $1}' test.txt | sort -n
In your input data row # 9 doesn't have all even columns same so not sure how you show 9 in your desired output. You can try following awk command to print 1st col for your task:
awk '{same=0; prev=-1; for(i=2;i<=NF;i+=2) {if (prev != -1 && prev != $i) {same=1; break;} else prev=$i;} if (same==0) print $1;}' awk '{same=0; prev=-1; for(i=2;i<=NF;i+=2) {if (prev != -1 && prev != $i) {same=1; break;} else prev=$i;} if (same==0) print $1;}'

Selecting first nth rows by groups using AWK

I have the following file with 4 fields. There are 3 groups in field 2, and the 4th field consists 0's and 1's.
The first field is just the index.
I like to use AWK to do the following task
Select the first 3 rows of group 1 (Note that group 1 has only 2 rows). The number of rows is based on the number of 1's found in the 4th field times 3.
Select the first 6 rows of group 2. The number of rows is based on the number of 1's found in the 4th field times 3.
Select the first 9 rows of group 3. The number of rows is based on the number of 1's found in the 4th field times 3.
So 17 rows are selected for the output file.
Thank you for your help.
Input
1 1 TN1148 1
2 1 S52689 0
3 2 TA2081 1
4 2 TA2592 1
5 2 TA4011 0
6 2 TA4246 0
7 2 TA4275 0
8 2 TB0159 0
9 2 TB0392 0
10 3 TB0454 1
11 3 TB0496 1
12 3 TB1181 1
13 3 TC0027 0
14 3 TC1340 0
15 3 TC2247 0
16 3 TC3094 0
17 3 TD0106 0
18 3 TD1146 0
19 3 TD1796 0
20 3 TD3587 0
Output
1 1 TN1148 1
2 1 S52689 0
3 2 TA2081 1
4 2 TA2592 1
5 2 TA4011 0
6 2 TA4246 0
7 2 TA4275 0
8 2 TB0159 0
10 3 TB0454 1
11 3 TB0496 1
12 3 TB1181 1
13 3 TC0027 0
14 3 TC1340 0
15 3 TC2247 0
16 3 TC3094 0
17 3 TD0106 0
18 3 TD1146 0
The key to this awk program is to pass the input file in twice: Once to count how many rows you want and once to print them.
awk '
NR == FNR {wanted_rows[$2] += 3*$4; next}
--wanted_rows[$2] >= 0 {print}
' input_file.txt input_file.txt
#!/usr/bin/awk -f
# by Dennis Williamson - 2010-12-02
# for http://stackoverflow.com/questions/4334167/selecting-first-nth-rows-by-groups-using-awk
$2 == prev {
count += $4
groupcount++
array[idx++] = $0
}
$2 != prev {
if (NR > 1) {
for (i=0; i<count*3; i++) {
if (i == groupcount) break
print array[i]
}
}
prev = $2
count = 1
groupcount = 1
split("", array) # delete the array
idx = 0
array[idx++] = $0
}
END {
for (i=0; i<count*3; i++) {
if (i == groupcount) break
print array[i]
}
}