My input looks like this:
A|B|C
1|2|3
4|5|6
Using awk, I am trying to get:
A|1
B|2
C|3
A|4
B|5
C|6
My code:
gawk '
BEGIN{FS=OFS="|"}
NR==1{
for(i=1; i<=NF; i++){
x_i=$i
}
}
NR>1{
for(i=1; i<=NF; i++){
print x_i FS $i
}
}' input
But it keeps only the last value assigned in the NR==1 block, even though I use the same loop in the NR>1 block:
C|1
C|2
C|3
C|4
C|5
C|6
Any trick?
EDIT
Thanks to Jose, I needed to replace x_i with x[i].
Using the same input, what if I needed to output:
A;B;C|1|2|3
A;B;C|4|5|6
$ awk 'BEGIN{FS=OFS="|"} NR==1{split($0,h);next} {for (i=1;i<=NF;i++) print h[i], $i}' file
A|1
B|2
C|3
A|4
B|5
C|6
$ awk 'BEGIN{FS=OFS="|"} NR==1{gsub(/\|/,";");h=$0;next} {print h, $0}' file
A;B;C|1|2|3
A;B;C|4|5|6
Read Effective Awk Programming, 4th Edition, by Arnold Robbins.
You can try:
awk 'BEGIN{FS=OFS="|"}
NR==1{for(i=1; i<=NF; ++i) d[i]=$i; next}
{for(i=1; i<=NF; ++i) print d[i], $i}
' input
You get:
A|1
B|2
C|3
A|4
B|5
C|6
Important Note
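Your logic is correct; you only need x[i] instead of x_i. In awk, x_i is a single scalar variable whose name happens to be "x_i", so every pass of the loop overwrites it, whereas x[i] is an array element indexed by the current value of i.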
gawk '
BEGIN{FS=OFS="|"}
NR==1{
for(i=1; i<=NF; i++){
x[i]=$i
}
}
NR>1{
for(i=1; i<=NF; i++){
print x[i] FS $i
}
}' input
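To make the difference concrete, here is a minimal sketch that stores the header fields both ways and prints what survives after the loop. The function name scalar_vs_array is only for illustration, and the header line is fed through printf rather than read from a file:

```shell
# x_i is one scalar that is overwritten on every pass;
# x[i] keeps a separate value for each index i.
scalar_vs_array() {
    printf 'A|B|C\n' | awk -F'|' '{
        for (i = 1; i <= NF; i++) { x_i = $i; x[i] = $i }
        print "scalar x_i:", x_i
        print "array x[1..3]:", x[1], x[2], x[3]
    }'
}

scalar_vs_array
```

This should print "scalar x_i: C" followed by "array x[1..3]: A B C", which matches the all-C output seen in the question.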
Here is another approach using split and a for loop:
$ awk 'NR==1 { split($0,a,"|") }
NR>1 { n=split($0,b,"|");
for(i=1;i<=n;i++)
print a[i] "|" b[i] }' file
A|1
B|2
C|3
A|4
B|5
C|6
I have a text file with n rows and several comma-separated columns, and I want to find the average of each column, excluding empty fields.
A sample input looks like:
1,2,3
4,,6
,7,
The desired output is:
2.5, 4.5, 4.5
I tried with
awk -F',' '{ for(i=1;i<=NF;i++) sum[i]=sum[i]+$i;if(max < NF)max=NF;};END { for(j=1;j<=max;j++) printf "%d\t",sum[j]/max;}' input
But it treats consecutive delimiters as one and mixes up the columns.
Any help is much appreciated.
You can use this one-liner:
$ awk -F, '{for(i=1; i<=NF; i++) {a[i]+=$i; if($i!="") b[i]++}}; END {for(i=1; i<=NF; i++) printf "%s%s", a[i]/b[i], (i==NF?ORS:OFS)}' foo
2.5 4.5 4.5
Otherwise, you can save this in a file script.awk and run awk -F, -f script.awk your_file (the -F, is still needed because the script itself does not set FS):
{
    for(i=1; i<=NF; i++) {
        a[i]+=$i
        if($i!="")
            b[i]++
    }
}
END {
    for(i=1; i<=NF; i++)
        printf "%s%s", a[i]/b[i], (i==NF?ORS:OFS)
}
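The script above relies on NF still holding the field count of the last record in the END block, and it divides by zero if a column never has a value. A hedged sketch of a variant that tracks the widest row itself and prints the comma-separated format from the question; the function name col_avg, the variable names, and the "NaN" placeholder for an all-empty column are assumptions made for this example:

```shell
# Average each comma-separated column, ignoring empty fields.
# max tracks the widest row, so a short last row does not truncate the result.
col_avg() {
    awk -F, '
    {
        if (NF > max) max = NF
        for (i = 1; i <= NF; i++)
            if ($i != "") { sum[i] += $i; cnt[i]++ }
    }
    END {
        for (i = 1; i <= max; i++) {
            # A column with no values prints "NaN" instead of dividing by zero.
            val = (cnt[i] ? sum[i] / cnt[i] : "NaN")
            printf "%s%s", val, (i == max ? "\n" : ", ")
        }
    }' "$@"
}

printf '1,2,3\n4,,6\n,7,\n' | col_avg
```

For the sample input this should print 2.5, 4.5, 4.5. Called with a file name, as in col_avg input, it reads that file instead of standard input.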
I have a file with data as follows
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
1,2,3,4,5,6,7,8,9,10
1,2,1,2,0,1,0,1,0,1
1,1,1,1,0,2,3,0,0,0
5,1,1,0,0,0,0,0,1,0
I would like to change the delimiters from col6 through col10 to a pipe '|', prefix each of those values with its column name, and drop the entries whose value is zero.
Desired Output:
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1
I tried using the command
awk -F ', *' 'NR==1{for (i=1; i<=NF; i++) hdr[i]=$i; next}
{for (i=1; i<=NF; i++) if ($i>0) printf "%s%s", ((i>5)?hdr[i] ":":"") $i,
((i<NF)? ((i>5)?"|":",") : ORS)}' data.csv
but I am not getting the expected result.
Output:
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,col6:1|col8:1|col10:1
1,1,1,1,col6:2|col7:3|5,1,1,col9:1|
Rows whose last column is zero end with '|' instead of a newline, so the next row's data starts on the same line!
In this example, the third data row ends with a pipe '|' and the fourth data row continues on the same line, also ending with a pipe '|'.
Can someone help me fix this, please?
P.S.: For people wondering about the reason behind all this work, I'm trying to load the data from a CSV file into a framework. The source data has 10 columns and the destination dataset will have 6 columns: the first 5 from the source as-is and the rest as a map. I also have to make sure that no map key has a value of zero before starting the data analysis on the set.
This post is to get help with making the data set ready for analysis.
$ awk -F ', *' 'NR==1{for (i=1; i<=NF; i++) hdr[i]=$i":"; next} {for (i=1; i<=5; i++) printf "%s,", $i; b=""; for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";} printf ORS}' data.csv
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1
Or, written over multiple lines:
awk -F ', *' '
NR==1{
    for (i=1; i<=NF; i++) hdr[i]=$i":"
    next
}
{
    for (i=1; i<=5; i++) printf "%s,", $i
    b=""
    for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";}
    printf ORS
}
' data.csv
How it works
NR==1{for (i=1; i<=NF; i++) hdr[i]=$i":"; next}
For the first line, NR==1, we save each field with a trailing colon in the array hdr. Then the rest of the commands are skipped and we jump to the next line.
for (i=1; i<=5; i++) printf "%s,", $i
If we get here, we are working on the second or a later line. In this case, we print the first five fields, each followed by a comma. Passing the field as an argument to "%s," rather than using it as the format string keeps a stray % in the data from being interpreted by printf.
b=""
We initialize the variable b to the empty string.
for (i=6; i<=NF; i++) if ($i>0) {printf "%s%s", b, hdr[i] $i; b="|";}
For fields 6 to the end, if the field is greater than zero, we print b followed by the hdr entry followed by the field value. Because b is empty until the first such field has been printed and is set to | afterwards, the pipe appears only between entries.
printf ORS
After printing the last field, we print an output record separator (default is a newline).
The above solution is excellent and helped me with a similar issue. However, I need to cater for an all-zero case in columns 6 to 10; see the last line of the data below.
col1,col2,col3,col4,col5,col6,col7,col8,col9,col10
1,2,3,4,5,6,7,8,9,10
1,2,1,2,0,1,0,1,0,1
1,1,1,1,0,2,3,0,0,0
5,1,1,0,0,0,0,0,1,0
5,1,1,0,0,0,0,0,0,0
This might never happen in your data; however, if it does, you are left with an inconvenient comma at the end of the line:
1,2,3,4,5,col6:6|col7:7|col8:8|col9:9|col10:10
1,2,1,2,0,col6:1|col8:1|col10:1
1,1,1,1,0,col6:2|col7:3
5,1,1,0,0,col9:1
5,1,1,0,0,
To get around it, I made a change. Here it is, somewhat spread out for clarity:
awk -F ', *' '
NR==1{
    for (i=1; i<=NF; i++) hdr[i]=$i":"
    next
}
{
    for (i=1; i<5; i++) printf("%s,", $i);
    if(i==5) printf("%s", $i);
    b="";
    for (i=6; i<=NF; i++) {
        if ($i>0) {
            if(b=="") b=","; else b="|";
            printf("%s%s",b, hdr[i] $i);
        }
    }
    printf(ORS);
}
' data.csv
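An alternative way to avoid the dangling comma is to collect the name:value pairs in a string first and append it only when it is non-empty. This is a sketch under assumptions, not the method from the answers above: the function name to_map and the variables keep, out and map are hypothetical, and keep=5 encodes the "first five columns as-is" rule from the question.

```shell
# Keep the first `keep` columns as-is, then append name:value pairs for the
# remaining columns whose value is greater than zero, joined with "|".
to_map() {
    awk -F', *' -v keep=5 '
    NR == 1 { for (i = 1; i <= NF; i++) hdr[i] = $i; next }
    {
        out = $1
        for (i = 2; i <= keep; i++) out = out "," $i
        map = ""
        for (i = keep + 1; i <= NF; i++)
            if ($i > 0) map = map (map == "" ? "" : "|") hdr[i] ":" $i
        # Append the comma only when at least one pair was collected.
        print (map == "" ? out : out "," map)
    }' "$@"
}

printf '%s\n' 'col1,col2,col3,col4,col5,col6,col7,col8,col9,col10' \
    '1,2,1,2,0,1,0,1,0,1' '5,1,1,0,0,0,0,0,0,0' | to_map
```

With these two data rows the output should be 1,2,1,2,0,col6:1|col8:1|col10:1 followed by 5,1,1,0,0 with no trailing comma. Running to_map data.csv reads the file directly.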
I have a file b.xyz like this:
-19.794325 -23.350704 -9.552335
-20.313872 -23.948248 -8.924463
-18.810708 -23.571757 -9.494047
-20.048543 -23.660052 -10.478968
I want to limit each of the entries to three decimal digits.
I tried this one
awk '{ $1=sprintf("%.3f",$1)} {$2=sprintf("%.3f",$2)} {$3=sprintf("%.3f",$3)} {print $1, $2, $3}' b.xyz
It works for three columns, but how can I extend it to apply to n (or all) columns?
If you will always have three fields, then you can use:
$ awk '{printf "%.3f %.3f %.3f\n", $1, $2, $3}' file
-19.794 -23.351 -9.552
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479
For an arbitrary number of fields, you can do:
$ awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?"\n":" ")}' file
-19.794 -23.351 -9.552
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479
It loops through all the fields and prints each one. (i==NF?"\n":" ") prints a newline after the last field and a space after every other field.
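Another option, closer to the sprintf attempt in the question, is to reassign every field in a loop and then print the whole record; awk rebuilds the record from the modified fields, joined with OFS (a single space by default). A minimal sketch, where the function name round3 is only for illustration:

```shell
# Round every field to three decimals by reassigning $i, then print the record.
round3() {
    awk '{ for (i = 1; i <= NF; i++) $i = sprintf("%.3f", $i); print }' "$@"
}

printf '%s\n' '-19.794325 -23.350704 -9.552335' | round3
```

This should print -19.794 -23.351 -9.552. Calling round3 b.xyz applies the same rounding to the file, however many columns each line has.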
Or even (thanks Jotne!):
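awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?RS:FS)}' file
This relies on the default values of RS (a newline) and FS (a single space); ORS and OFS are the corresponding output variables.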
Example
$ cat a
-19.794325 -23.350704 -9.552335 2.13423 23 23223.23 23.23442
-20.313872 -23.948248 -8.924463
-18.810708 -23.571757 -9.494047
-20.048543 -23.660052 -10.478968
$ awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?"\n":" ")}' a
-19.794 -23.351 -9.552 2.134 23.000 23223.230 23.234
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479
$ awk '{for (i=1; i<=NF; i++) printf "%.3f%s", $i, (i==NF?RS:FS)}' a
-19.794 -23.351 -9.552 2.134 23.000 23223.230 23.234
-20.314 -23.948 -8.924
-18.811 -23.572 -9.494
-20.049 -23.660 -10.479