Fill in rows missing row labels for repeated values - awk

I have the following input:
printf "Name\tArea\tNumber\tA\tB\tC\n\t\t\tA\tB\tC\n\t\t\tA\tB\tC\n"
Name Area Number A B C
A B C
A B C
If first 3 columns are blank,
I want to print the previous 3 columns along with the data of the new line,
else print the line as is. Output should look like this:
printf "Name\tArea\tNumber\tA\tB\tC\nName\tArea\tNumber\tA\tB\tC\nName\tArea\tNumber\tA\tB\tC\n"
Name Area Number A B C
Name Area Number A B C
Name Area Number A B C

My interpretation of the question is that fields 1 to 3 can appear anywhere in the file, with values possibly different from the ones they had previously. So the idea would be to reproduce the last fields 1 to 3 seen so far, so that the input:
Name Area Number A B C
A B D
F G T
Nam Zig BBA U Z x
A B D
would produce the output:
Name Area Number A B C
Name Area Number A B D
Name Area Number F G T
Nam Zig BBA U Z x
Nam Zig BBA A B D
So I propose:
awk 'BEGIN {FS=OFS="\t"; hd1=hd2=hd3=""} $1=="" {$1=hd1;$2=hd2;$3=hd3; print; next} {hd1=$1;hd2=$2;hd3=$3; print}' yourfile
Note that this only checks whether $1 is empty, but it is easy to adapt so that only the missing fields are filled in.

I would solve this as a fixed width problem. A GNU awk solution:
$ awk '$1~/^ +$/{sub($1,h)}{h=$1}1' FIELDWIDTHS=23 file
Name Area Number A B C
Name Area Number A B D
Name Area Number F G T
Nam Zig BBA U Z x
Nam Zig BBA A B D
Just change the FIELDWIDTHS variable to match your data as needed.
The other more verbose approach is to iterate over each of the fields that could be missing:
$ awk '{for(i=1;i<=c;i++)if($i=="")$i=h[i]}{for(i=1;i<=c;i++)h[i]=$i}1' c=3 FS='\t' OFS='\t' file
Name Area Number A B C
Name Area Number A B D
Name Area Number F G T
Nam Zig BBA U Z x
Nam Zig BBA A B D
Just change c to the number of leading columns you want to check.
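To try the per-column approach on reproducible tab-separated input, here is a minimal sketch in portable awk. The function name fill_down and the sample rows are only illustrative assumptions; the last row has just some of its leading columns blank, to show that each column is filled independently.

```shell
# fill_down C: copy the last non-empty value into blank cells of the first C
# tab-separated columns (hypothetical helper name, not from the answers above).
fill_down() {
  awk -v c="$1" '
    BEGIN { FS = OFS = "\t" }
    {
      for (i = 1; i <= c; i++) {
        if ($i == "") $i = last[i]   # blank cell: reuse the remembered value
        else last[i] = $i            # non-blank cell: remember it
      }
      print
    }'
}

printf 'Name\tArea\tNumber\tA\tB\tC\n\t\t\tA\tB\tD\nNam\tZig\tBBA\tU\tZ\tx\n\tZag\t\tA\tB\tD\n' | fill_down 3
```

In the last row only the first and third columns are blank, so they should come out as Nam and BBA while Zag is kept.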

Related

Count number of occurrences of a number larger than x from every row

I have a file with multiple rows and 26 columns. I want to count, for each row, the number of values that are higher than 0 (counting values different from 0 would also be fine), excluding the first two columns. The file looks like this:
X Y Sample1 Sample2 Sample3 .... Sample24
a a1 0 7 0 0
b a2 2 8 0 0
c a3 0 3 15 3
d d3 0 0 0 0
I would like to have an output file like this:
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
awk or sed would be good.
I saw a similar question but in that case the columns were summed and the desired output was different.
awk 'NR==1{printf "X\tY\tResult%s",ORS}   # print the header
NR>1{
    count=0                               # reset the count for each row
    for(i=3;i<=NF;i++){                   # iterate from field 3 to the last field (NF)
        if($i>0){                         # $i is $3, $4, and so on
            count++                       # increment when the value is positive
        }
    }
    printf "%s\t%s\t%s%s",$1,$2,count,ORS # print one result line per row
}' file
That should do it.
Another awk approach:
$ awk '{if(NR==1) c="Result";
else for(i=3;i<=NF;i++) c+=($i>0);
print $1,$2,c; c=0}' file | column -t
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
$ awk '{print $1, $2, (NR>1 ? gsub(/ [1-9]/,"") : "Result")}' file
X Y Result
a a1 1
b a2 2
c a3 3
d d3 0
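The question also mentions that counting values "different from 0" would be acceptable. A minimal sketch of that variant, assuming whitespace-separated numeric columns, could look like this. The function name count_nonzero and the shortened header names S1 to S4 are illustrative assumptions.

```shell
# count_nonzero: for each data row, count the fields from column 3 onward
# whose numeric value is not zero (hypothetical helper name).
count_nonzero() {
  awk '
    NR == 1 { print $1, $2, "Result"; next }   # replace the header
    {
      n = 0
      for (i = 3; i <= NF; i++)
        if ($i + 0 != 0) n++                   # +0 forces a numeric comparison
      print $1, $2, n
    }'
}

printf 'X Y S1 S2 S3 S4\na a1 0 7 0 0\nb a2 2 8 0 0\nc a3 0 3 15 3\nd d3 0 0 0 0\n' | count_nonzero
```

Unlike matching on a leading non-zero digit, the numeric comparison also counts negative values and decimals such as 0.5.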

matching rows and fields from two files

I would like to match the record number in one file with the same field number in another file:
file1:
1
3
5
4
3
1
5
file2:
A B C D E F G
H I J J K L M
N O P Q R S T
I would like to use the record numbers corresponding to 5 in the first file to obtain the corresponding fields in the second file. Desired output:
C G
J M
P T
So far, I've done:
awk '{ if ($1=="5") print NR }' file1 > temp
for i in $(cat temp); do
awk '{ print $"'${i}'" }' file2
done
But get the output:
C
J
P
G
M
T
I would like to have this in the format of the desired output above, but can't get it to work. Perhaps using printf or an awk for-loop might work, but I have had no success.
Thank you all.
awk 'NR==FNR{if($1==5)a[NR];next}{for(i in a){printf "%s ",$i}print ""}' file1 file2
C G
J M
P T
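One caveat with for (i in a) is that awk does not guarantee the order in which array indices are visited, so the selected columns could come out in a different order with another awk implementation. A minimal sketch that records the matching line numbers in a counted list and so keeps them in file order might look like this. The function name pick_fields and the temporary directory are illustrative assumptions.

```shell
# pick_fields VALUE FILE1 FILE2: print the fields of FILE2 whose positions are
# the line numbers of FILE1 that contain VALUE (hypothetical helper name).
pick_fields() {
  awk -v want="$1" '
    NR == FNR { if ($1 == want) cols[++n] = FNR; next }  # first file: remember matches in order
    {
      line = ""; sep = ""
      for (k = 1; k <= n; k++) {          # second file: emit the chosen fields
        line = line sep $(cols[k])
        sep = " "
      }
      print line
    }' "$2" "$3"
}

tmp=$(mktemp -d)
printf '%s\n' 1 3 5 4 3 1 5 > "$tmp/file1"
printf '%s\n' 'A B C D E F G' 'H I J J K L M' 'N O P Q R S T' > "$tmp/file2"
pick_fields 5 "$tmp/file1" "$tmp/file2"
rm -r "$tmp"
```

This also avoids the trailing space at the end of each output line.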

How to replace 1 or 2 with M and F in command line

I have a file with four columns (four fields). One column is sex, coded as 1 or 2. How could I use an awk command to replace 1 with M and 2 with F?
awk '$3=$3==1?"M":"F"' file
For example:
kent$ echo "a b 1 c
c d 2 x"|awk '$3=$3==1?"M":"F"'
a b M c
c d F x
In this example the third column holds the 1 or 2; just change $3 to whichever column holds it in your file.
It is always good to show an example of your input, along with the expected output.
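Note that the one-liner turns every value other than 1 into F, including a header line or a missing value. If that matters, a minimal sketch with an explicit lookup table leaves unrecognised values untouched. The function name recode_sex and the third sample row are illustrative assumptions.

```shell
# recode_sex COLUMN: replace 1 with M and 2 with F in the given column,
# leaving any other value as it is (hypothetical helper name).
recode_sex() {
  awk -v col="$1" '
    BEGIN { map["1"] = "M"; map["2"] = "F" }
    ($col in map) { $col = map[$col] }   # only rewrite recognised codes
    { print }'
}

printf 'a b 1 c\nc d 2 x\ne f 3 y\n' | recode_sex 3
```

The third row keeps its 3 because that code is not in the table.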

awk: delete first and last entry of comma-separated field

I have a 4 column data that looks something like the following:
a 1 g 1,2,3,4,5,6,7
b 2 g 3,5,3,2,6,4,3,2
c 3 g 5,2,6,3,4
d 4 g 1,5,3,6,4,7
I am trying to delete the first two numbers and the last two numbers in every row of the fourth column, so the output looks like the following:
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6
Can someone help me? I would appreciate it.
You can use this:
$ awk '{n=split($4, a, ","); for (i=3; i<=n-2; i++) t=t""a[i](i==n-2?"":","); print $1, $2, $3, t; t=""}' file
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6
Explanation
n=split($4, a, ",") slices the 4th field in pieces, based on comma as delimiter. As split() returns the number of pieces we got, we store it in n to work with it later on.
for (i=3; i<=n-2; i++) t=t""a[i](i==n-2?"":",") builds the new fourth field in t by looping over slices 3 to n-2 and joining them with commas (no comma is added after the last one).
print $1, $2, $3, t; t="" prints the new output and blanks the variable t.
This will work for your posted sample input:
$ awk '{gsub(/^([^,]+,){2}|(,[^,]+){2}$/,"",$NF)}1' file
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6
If you have cases where there are fewer than 4 commas in your 4th field, then update your question to show how those should be handled.
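The question does not say what should happen to short lists, so the following is only a sketch under one assumption: a fourth field with four or fewer values ends up empty, since nothing is left once two values are dropped from each end. The function name trim_list and the second sample row are illustrative.

```shell
# trim_list: drop the first two and last two comma-separated values of field 4
# (hypothetical helper name).
trim_list() {
  awk '{
    n = split($4, v, ",")
    out = ""; sep = ""
    for (i = 3; i <= n - 2; i++) {   # keep only values 3 .. n-2
      out = out sep v[i]
      sep = ","
    }
    $4 = out                         # assumption: short lists become an empty field
    print
  }'
}

printf 'a 1 g 1,2,3,4,5,6,7\nx 9 g 1,2,3\n' | trim_list
```

The loop body never runs when n is 4 or less, which is what leaves the field empty in that case.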
This uses bash array manipulation. It may be a little ... gnarly:
while read -a fields; do                          # read the fields for each line
    IFS=, read -a values <<< "${fields[3]}"       # split the last field on comma
    new=("${values[@]:2:${#values[@]}-4}")        # drop the first 2 and last 2 values
    fields[3]=$(IFS=,; echo "${new[*]}")          # join the new list on comma
    printf "%s\t" "${fields[@]}"; echo            # print the new line
done <<END
a 1 g 1,2,3,4,5,6,7
b 2 g 3,5,3,2,6,4,3,2
c 3 g 5,2,6,3,4
d 4 g 1,5,3,6,4,7
END
a 1 g 3,4,5
b 2 g 3,2,6,4
c 3 g 6
d 4 g 3,6

Clubbing two lines into one using AWK

I have a file with the following format:
Time Number Val
x 1 y
x 1 y
a 1 z
b 1 m
b 2 m
I want to club lines with the same value together, summing the Number column; the final file should look something like this:
Time Number Val
x 2 y
a 1 z
b 3 m
How to do this using awk?
You can use awk's associative array:
awk 'NR==1{print $0} NR!=1{a[$1]+=$2; b[$1]=$3;} \
END{ for ( i in a) print i, a[i], b[i]}' file
For your sample input, it prints:
Time Number Val
x 2 y
a 1 z
b 3 m
To sum Number over every distinct combination of Time and Val instead:
awk 'NR>1{a[$1,$3]+=$2;next}$1=$1;END{for(k in a){split(k,s,SUBSEP);print s[1],a[k],s[2]}}' OFS="\t" file
Time Number Val
a 1 z
b 3 m
x 2 y
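Both answers iterate with for (k in a), and awk does not guarantee the traversal order of that loop, which is why the last output is not in the order of the input. If the groups should appear in the order they are first seen, a minimal sketch could keep a separate counted list of keys. The function name sum_by_key is an illustrative assumption.

```shell
# sum_by_key: sum column 2 for each (column 1, column 3) pair and print the
# groups in first-seen order (hypothetical helper name).
sum_by_key() {
  awk '
    NR == 1 { print; next }                      # pass the header through
    {
      key = $1 SUBSEP $3
      if (!(key in sum)) order[++n] = key        # remember first-seen order
      sum[key] += $2
    }
    END {
      for (k = 1; k <= n; k++) {
        split(order[k], part, SUBSEP)
        print part[1], sum[order[k]], part[2]
      }
    }'
}

printf 'Time Number Val\nx 1 y\nx 1 y\na 1 z\nb 1 m\nb 2 m\n' | sum_by_key
```

For the sample input this should list x, then a, then b, matching the order in the question.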