Converting Rows into Columns using awk or sed - awk

I have a file in *.xvg format.
It contains six columns with 500 numbers each.
Except for the time column (the first column), all columns contain floats.
I want to generate an output file in the same format, in which these columns are converted into rows with the numbers separated by spaces.
I have written a program in C that works fine for me, but I am looking for an alternative way to do the same using awk or sed.
I am absolutely new to these scripting languages and couldn't find a relevant answer in previously asked questions, so I would be grateful if somebody could help me with this task.
The input file looks like this:
# This file was created Thu Oct 1 17:18:10 2015
# by the following command:
# /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v
#
# title "dH/d\xl\f{}, \xD\f{}H"
# xaxis label "Time (ps)"
# yaxis label "(kJ/mol)"
#TYPE xy
# subtitle "T = 200 (K), \xl\f{} = 0"
# view 0.15, 0.15, 0.75, 0.85
# legend on
# legend box on
# legend loctype view
# legend 0.78, 0.8
# legend length 2
# s0 legend "dH/d\xl\f{} \xl\f{} 0"
# s1 legend "\xD\f{}H \xl\f{} 0.05"
0 19.3191 1.16531 1.8 -447.07 -47.07
2 -447.072 -17.6454 1.5 -17.633 -1.33
4 -17.633 -0.446508 1.3 -75.455 -5.45
6 -75.4555 -2.83981 1.4 -28.724 -28.4
8 -28.7246 -0.884639 1.5 -41.877 -14.87
10 -41.8779 -1.45569 2.8 -43.685 -3.685
12 -43.6851 -1.4797 -3.1 -91.651 -91.651
14 -91.6515 -3.52492 -3.5 -61.135 -1.135
16 -61.1356 -2.30129 -3.2 -48.847 -48.47
The output file should look like this:
# This file was created Thu Oct 1 17:18:10 2015
# by the following command:
# /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v
#
# title "dH/d\xl\f{}, \xD\f{}H"
# xaxis label "Time (ps)"
# yaxis label "(kJ/mol)"
#TYPE xy
# subtitle "T = 200 (K), \xl\f{} = 0"
# view 0.15, 0.15, 0.75, 0.85
# legend on
# legend box on
# legend loctype view
# legend 0.78, 0.8
# legend length 2
# s0 legend "dH/d\xl\f{} \xl\f{} 0"
# s1 legend "\xD\f{}H \xl\f{} 0.05"
0 2 4 6 8 10 12 14 16
19.3191 -447.072 -17.633 -75.4555 -28.7246 -41.8779 -43.6851 -91.6515 -61.1356
1.16531 -17.6454 -0.446508 -2.83981 -0.884639 -1.45569 -1.4797 -3.52492 -2.30129
1.8 1.5 1.3 1.4 1.5 2.8 -3.1 -3.5 -3.2
-447.07 -17.633 -75.455 -28.724 -41.877 -43.685 -91.651 -61.135 -48.847
-47.07 -1.33 -5.45 -28.4 -14.87 -3.685 -91.651 -1.135 -48.47
Please note that the header lines starting with "#" or "@" should be the same in both files.

Answer for original question
Let's consider this test file:
$ cat file
123 1.2 1.3 1.4 1.5
124 2.2 2.3 2.4 2.5
125 3.2 3.3 3.4 3.5
To convert columns to rows:
$ awk '{for (i=1;i<=NF;i++)a[i,NR]=$i} END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s",a[i,j],(j==NR?ORS:OFS)}' file
123 124 125
1.2 2.2 3.2
1.3 2.3 3.3
1.4 2.4 3.4
1.5 2.5 3.5
How it works
for (i=1;i<=NF;i++)a[i,NR]=$i
As we loop through each line, we save the values in array a.
END{for (i=1;i<=NF;i++) for (j=1;j<=NR;j++) printf "%s%s",a[i,j],(j==NR?ORS:OFS)}
After we reach the end of the file, we print each of the values followed by the output field separator (OFS) if we are in the midst of a line or the output record separator (ORS) if we are at the end of the line.
Multi-line version
If you like your code spread over several lines:
awk '
{
    for (i=1;i<=NF;i++)
        a[i,NR]=$i
}
END{
    for (i=1;i<=NF;i++)
        for (j=1;j<=NR;j++)
            printf "%s%s",a[i,j],(j==NR?ORS:OFS)
}
' file
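The commands above use NF in the END block, which is the field count of the last line read. If rows may differ in length, one option is to remember the largest field count instead. A minimal sketch of that idea; the function name transpose and the variable maxnf are made up for illustration and are not part of the answer above:

```shell
# transpose: print the columns of whitespace-separated input as rows.
# "maxnf" (illustrative name) tracks the widest row seen so far.
transpose() {
    awk '
    {
        if (NF > maxnf) maxnf = NF
        for (i = 1; i <= NF; i++) a[i, NR] = $i
    }
    END {
        for (i = 1; i <= maxnf; i++)
            for (j = 1; j <= NR; j++)
                printf "%s%s", a[i, j], (j == NR ? ORS : OFS)
    }' "$@"
}

printf '123 1.2 1.3\n124 2.2 2.3\n125 3.2 3.3\n' | transpose
# 123 124 125
# 1.2 2.2 3.2
# 1.3 2.3 3.3
```

Cells that are missing in a short row come out as empty fields.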
Answer for revised question
In the revised question, there are lines at the beginning of the file that start with # or @ and that are not to be changed. In this case:
$ awk '/^[#@]/{print;next}{k++; for (i=1;i<=NF;i++)a[i,k]=$i;} END{for (i=1;i<=NF;i++) for (j=1;j<=k;j++) printf "%s%s",a[i,j],(j==k?ORS:OFS)}' input
# This file was created Thu Oct 1 17:18:10 2015
# by the following command:
# /home/durba/gmx455/bin/mdrun -np 1 -deffnm md0 -v
#
#
#
# title "dH/d\xl\f{}, \xD\f{}H"
# xaxis label "Time (ps)"
# yaxis label "(kJ/mol)"
#TYPE xy
# subtitle "T = 200 (K), \xl\f{} = 0"
# view 0.15, 0.15, 0.75, 0.85
# legend on
# legend box on
# legend loctype view
# legend 0.78, 0.8
# legend length 2
# s0 legend "dH/d\xl\f{} \xl\f{} 0"
# s1 legend "\xD\f{}H \xl\f{} 0.05"
0 2 4 6 8 10 12 14 16
19.3191 -447.072 -17.633 -75.4555 -28.7246 -41.8779 -43.6851 -91.6515 -61.1356
1.16531 -17.6454 -0.446508 -2.83981 -0.884639 -1.45569 -1.4797 -3.52492 -2.30129
1.8 1.5 1.3 1.4 1.5 2.8 -3.1 -3.5 -3.2
-447.07 -17.633 -75.455 -28.724 -41.877 -43.685 -91.651 -61.135 -48.847
-47.07 -1.33 -5.45 -28.4 -14.87 -3.685 -91.651 -1.135 -48.47
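For readability, the same approach can be spread over several lines and wrapped in a shell function. This is only a sketch: the name transpose_xvg and the variable nf are my own, and it assumes that header lines start with # or @. Header lines are printed as soon as they are read, so they always end up before the transposed data.

```shell
# transpose_xvg: pass header lines through, transpose the data rows.
transpose_xvg() {
    awk '
    /^[#@]/ { print; next }            # header line: print unchanged
    {
        k++                            # count data rows only
        if (NF > nf) nf = NF           # remember the widest data row
        for (i = 1; i <= NF; i++) a[i, k] = $i
    }
    END {
        for (i = 1; i <= nf; i++)
            for (j = 1; j <= k; j++)
                printf "%s%s", a[i, j], (j == k ? ORS : OFS)
    }' "$@"
}

printf '# comment\n@TYPE xy\n0 19.3191 1.16531\n2 -447.072 -17.6454\n' | transpose_xvg
# # comment
# @TYPE xy
# 0 2
# 19.3191 -447.072
# 1.16531 -17.6454
```

With a real file this would be called as, say, transpose_xvg input > output (both file names are placeholders).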

This might work for you (GNU sed):
sed -r 'H;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d' file
Slurp the file into the hold space (HS), deleting the pattern space (PS) until the end of the file is reached. At end-of-file, swap the HS with the PS. Copy the PS to the HS, then globally replace each line (a newline followed by its fields) with just its first field followed by a space. Remove the trailing space and print the line. Then recall the copy from the HS and do the inverse: remove the first field from every line. If any of the substitutions succeeded, repeat the process until nothing but newlines remain. Finally, delete the leftover newlines.
Since I first answered, the original question has changed. The new solution below caters for the revised question using essentially the same method:
sed -r '/^[0-9]/{s/ +/ /g;H};//!p;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d' file
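To watch the loop of the first sed command at work, it can be run on a small three-line input. This sketch assumes GNU sed, as the answer does (\S and \n inside brackets are GNU extensions); the wrapper name sed_transpose is made up for the demonstration:

```shell
# sed_transpose: the first sed command from above, unchanged, in a function.
# Each pass of the :a loop prints the first field of every line as one row,
# then strips that field from the saved copy and starts over.
sed_transpose() {
    sed -r 'H;$!d;x;:a;h;s/\n(\S+)[^\n]*/\1 /g;s/ $//p;g;s/\n\S+ ?/\n/g;ta;d' "$@"
}

printf '123 1.2 1.3\n124 2.2 2.3\n125 3.2 3.3\n' | sed_transpose
# 123 124 125
# 1.2 2.2 3.2
# 1.3 2.3 3.3
```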

Related

sum 4th field data between the pattern

Suppose my data is:
*dnet *1234 1.2
1 port *12 2.3
3 port1 *34 0.2
7 *15 0.1
*dnet *234 0.2
2 *12 0.1
4 *123 *234 1.2
Fields are separated by spaces.
I want to get the sum of the 4th fields of the lines inside each *dnet block. Some lines have a 4th field and some do not. I want a separate 4th-field sum for each *dnet.
I tried using awk but could not get it to work. I would be thankful if someone could help.
The output for the above should look like:
*dnet *1234 1.2 2.5
*dnet *234 0.2 1.2
A commented, slightly simplified version of the solution from the comments:
awk '
# look for header line
$1=="*dnet" {
    # print any previously calculated sum
    if (header) print header, sum
    # reset sum for next block of lines
    sum = 0
    # save new header line
    header = $0
    # skip remaining actions
    next
}
# if we get here, we know this is not a header line
# if there is a 4th field, add it to the sum
$4 {
    sum += $4
}
END {
    # print the final sum
    if (header) print header, sum
}
' datafile
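To try this out without creating a data file, a compact variant can be fed the sample data from the question through a here-document. This is a sketch, not the commenter's exact code: the function name sum_dnet is made up, and it tests NF >= 4 instead of the value of $4.

```shell
# sum_dnet: append the sum of the 4th fields to each "*dnet" header line.
sum_dnet() {
    awk '
    $1 == "*dnet" { if (header) print header, sum; sum = 0; header = $0; next }
    NF >= 4       { sum += $4 }       # only lines that have a 4th field
    END           { if (header) print header, sum }
    ' "$@"
}

sum_dnet <<'EOF'
*dnet *1234 1.2
1 port *12 2.3
3 port1 *34 0.2
7 *15 0.1
*dnet *234 0.2
2 *12 0.1
4 *123 *234 1.2
EOF
# *dnet *1234 1.2 2.5
# *dnet *234 0.2 1.2
```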

plotting lines between points in gnuplot

I have the following script to plot points from the file "puntos":
set title "recorrido vehiculos"
set term png
set output "rutasVehiculos.png"
plot "puntos" u 2:3:(sprintf("%d",$1)) with labels font ",7" point pt 7 offset char 0.5,0.5 notitle
The file "puntos" has the following format:
#i x y
1 2.1 3.2
2 0.2 0.3
3 2.9 0.3
In another file called "routes" I have the routes that join the points, for example:
2
1 22 33 20 18 14 8 27 1
1 13 2 17 31 1
Route 1 joins points 1, 22, 33, etc.
Route 2 joins points 1, 13, 2, etc.
Is there a way to do this with gnuplot?
PS: sorry for my English
Welcome to Stack Overflow. This is an interesting task. It's pretty clear what to do; however, in my opinion it is not very obvious how to do it in gnuplot.
The following code seems to work, probably with room for improvement. Tested in gnuplot 5.2.5
with the files puntos.dat and routes.dat:
# puntos.dat
#i x y
1 2.1 3.2
2 0.2 0.3
3 2.9 0.3
4 1.3 4.5
5 3.1 2.3
6 1.9 0.7
7 3.6 1.7
8 2.3 1.5
9 1.0 2.0
and
# routes.dat
2
1 5 7 3 6 2 9
6 8 5 9 4
and the code:
### plot different routes
reset session
set title "recorrido vehiculos"
set term pngcairo
set output "rutasVehiculos.png"
POINTS = "puntos.dat"
ROUTES = "routes.dat"
# load routes file into datablock
set datafile separator "\n"
set table $Routes
plot ROUTES u (stringcolumn(1)) with table
unset table
# loop routes
set datafile separator whitespace
stats $Routes u 0 nooutput # get the number of routes
RoutesCount = STATS_records-1
set print $RoutesData
do for [i=1:RoutesCount] {
    # get the points of a single route
    set datafile separator "\n"
    set table $Dummy
    plot ROUTES u (SingleRoute = stringcolumn(1),$1) every ::i::i with table
    unset table
    # create a table of the coordinates of the points of a single route
    set datafile separator whitespace
    do for [j=1:words(SingleRoute)] {
        set table $Dummy2
        plot POINTS u (a=$2,$2):(b=$3,$3) every ::word(SingleRoute,j)-1::word(SingleRoute,j)-1 with table
        print sprintf("%g %s %g %g", j, word(SingleRoute,j), a, b)
        unset table
    }
    print "" # add empty line
}
set print
print sprintf("%g different Routes\n", RoutesCount)
print "RoutesData:"
print $RoutesData
set colorsequence classic
plot \
POINTS u 2:3:(sprintf("%d",$1)) with labels font ",7" point pt 7 offset char 0.5,0.5 notitle,\
for [i=1:RoutesCount] $RoutesData u 3:4 every :::i-1::i-1 w lp lt i title sprintf("Route %g",i)
set output
### end code
which plots the labelled points and draws each route as a separate line.
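A different approach, not the one used in the gnuplot code above, would be to join the two files with awk before plotting. The following is only a sketch under the same file layout (comment lines start with #, the first data line of the routes file is the route count); the function name routes_to_xy and the output file name below are made up:

```shell
# routes_to_xy POINTS ROUTES: for every route print "index x y" per point,
# followed by an empty line, so that each route becomes one gnuplot block.
routes_to_xy() {
    awk '
    /^#/      { next }                             # skip comment lines
    NR == FNR { x[$1] = $2; y[$1] = $3; next }     # first file: point table
    !got_count { got_count = 1; next }             # routes file: skip the count line
    {
        for (i = 1; i <= NF; i++)
            if ($i in x) print $i, x[$i], y[$i]    # unknown points are ignored
        print ""
    }' "$1" "$2"
}
```

After routes_to_xy puntos.dat routes.dat > routes_xy.dat, something like plot for [i=0:1] "routes_xy.dat" u 2:3 every :::i::i w lp should draw the two routes, since every :::i::i selects the i-th block of a file.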

Display user specified contour levels in GrADS

I would like to know how to display specific contour levels on the colorbar. For example, the schematic above, taken from pivotalweather, shows a colorbar for precipitation values that are not equally spaced. I would like to know how to achieve a similar result with GrADS.
PS: I use the cbarn.gs and the xcbar.gs script sometimes.
You need to use GrADS's own color settings for this.
Three steps:
1). Define the colors using 'set rgb # R G B'. You need the RGB values of the colors in your color bar. Since color numbers 0 to 15 are predefined in GrADS, you should start the # at 16.
Check this link for details of the colors:
http://cola.gmu.edu/grads/gadoc/colorcontrol.html
2). Set the contour levels, in ascending order and on a single line, as follows:
set clevs 0.01 0.05 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1 1.2 1.4 1.6 1.8 2 2.5 3 3.5 4 5 6 8 15
3). Specify the colors based on your defined RGBs, separated by spaces:
set ccols 16 17 18 ... etc.
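Typing long level and color lists by hand is error-prone, so the two set lines can also be generated. Below is a small shell/awk sketch (the helper name grads_levels is made up, it is not part of GrADS). It refuses levels that are not ascending, and it emits one more color number than there are levels, which, as far as I recall, is what shaded plots expect. The colors themselves still have to be defined with set rgb.

```shell
# grads_levels FIRST_COLOR LEVEL...: print matching "set clevs"/"set ccols" lines.
grads_levels() {
    first=$1; shift
    printf '%s\n' "$@" | awk -v first="$first" '
    NR > 1 && $1 + 0 <= prev + 0 {
        print "levels must be ascending: " $1 " follows " prev | "cat 1>&2"
        bad = 1
        exit
    }
    { levels = levels " " $1; prev = $1 }
    END {
        if (bad) exit 1
        # one color per interval: number of levels plus one
        for (c = first + 0; c <= first + NR; c++) cols = cols " " c
        print "set clevs" levels
        print "set ccols" cols
    }'
}

grads_levels 16 0.01 0.05 0.1 0.2 0.3
# set clevs 0.01 0.05 0.1 0.2 0.3
# set ccols 16 17 18 19 20 21
```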

print Sprintf error while the input is a string which consist of symbol in heatmap gnuplot

I am working on a heat map with an unusual dataset. The dataset contains a symbol.
Here is an example of my dataset, 1q.dat:
one two three
2009 0/0 1 0/0 1 0/0 1
2010 0/0 1 0/0 1 0/0 1
2011 0/0 1 0/0 1 6/179.5 1
2012 0/0 1 2/0.4 1 11/83.0 1
2013 7/0.8 1 7/21.3 1 17/268.5 1
2014 1/3.5 1 4/7.7 1 9/37.9 1
and here is my gnuplot script
set term pos eps font 20
unset colorbox
unset key
set nocbtics
set cblabel "Score"
set cbtics scale 0
set cbrange [ 0.00000 : 110.00000 ] noreverse nowriteback
set palette defined ( 0.0 "#FFFFFF",\
1 "#FFCCCC",\
2 "#FF9999 ",\
3 "#FF6666")
set size 1, 0.5
set output '1q.eps'
YTICS="`awk 'BEGIN{getline}{printf "%s ",$1}' '1q.dat'`"
XTICS="`head -1 '1q.dat'`"
set for [i=1:words(XTICS)] xtics ( word(XTICS,i) i-1 )
set for [i=1:words(YTICS)] ytics ( word(YTICS,i) i-1 )
set for [i=1:words(XTICS)] xtics ( word(XTICS,i) 2*i-1 )
plot "<awk '{$1=\"\"}1' '1q.dat' | sed '1 d'" matrix every 2::1 w image, \
'' matrix using ($1+1):2:(sprintf('%.f', $3)) every 2 with labels
What I'm trying to do here is display "0/0" as a label in the heatmap and use the integer number for the heatmap color.
The problem I face is that gnuplot only takes the number before the "/" and ignores the rest.
Here is the result of my current plot.
How can I make the heatmap show a label like "1/3.5" and have the color based on the integer number?
There is no need to use sprintf at all. Simply use stringcolumn to get the raw content of a column as stored in the data file:
plot "<awk '{$1=\"\"}1' '1q.dat' | sed '1 d'" matrix every 2::1 w image, \
'' matrix using ($1+1):2:(stringcolumn(3)) every 2 with labels

How to do multi-row calculations using awk on a large file

I have a big file that is sorted on the first word. I need to add a new column to each line with the proportional value: line value / total value for that group, where the group is determined by the first column. In the example below, the total of group "a" is 100, and hence each line gets its proportion of that. The total of group "the" is 1000, and hence each line gets its proportion of the total of that group.
I need an awk script to do this.
Sample File:
a lot 10
a few 20
a great 20
a little 40
a good 10
the best 250
the dog 750
zisty cool 20
Output:
a lot 10 0.1
a few 20 0.2
a great 20 0.2
a little 40 0.4
a good 10 0.1
the best 250 0.25
the dog 750 0.75
zisty cool 20 1
You describe this as a "big file." Consequently, this solution tries to save memory: it holds no more than one group in memory at a time. When we are done with that group, we print it out before starting on the next group:
$ awk -v i=0 'NR==1{name=$1} $1==name{a[i]=$0;b[i++]=$3;tot+=$3+0;next} {for (j=0;j<i;j++){print a[j],b[j]/tot} name=$1;a[0]=$0;tot=b[0]=$3;i=1} END{for (j=0;j<i;j++){print a[j],b[j]/tot}}' file
a lot 10 0.1
a few 20 0.2
a great 20 0.2
a little 40 0.4
a good 10 0.1
the best 250 0.25
the dog 750 0.75
zisty cool 20 1
How it works
-v i=0
This initializes the variable i to zero.
NR==1{name=$1}
For the first line, set the variable name to the first field, $1. This is the name of the group.
$1==name {a[i]=$0; b[i++]=$3; tot+=$3+0; next}
If the first field matches name, then save the whole line into array a and save the value of column (field) three into array b. Increment the variable tot by the value of the third field. Then, skip the rest of the commands and jump to the next line.
for (j=0;j<i;j++){print a[j],b[j]/tot} name=$1;a[0]=$0;tot=b[0]=$3;i=1
If we get to this line, then we are at the start of a new group. Print out all the values for the old group and initialize the variables for the start of the next group.
END{for (j=0;j<i;j++){print a[j],b[j]/tot}}
After we get to the last line, print out what we have for the last group.
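If the one-liner is hard to read, the same one-group-at-a-time idea can be written with a small awk function. This is a sketch rather than the exact command above; the names group_share, emit_group, line, val and n are my own. Like the original, it assumes the file is sorted on the first column, and it would fail with a division by zero if a group's total were 0.

```shell
# group_share: append "value / group total" to every line, buffering one group at a time.
group_share() {
    awk '
    function emit_group(   j) {        # print the buffered group, then reset it
        for (j = 0; j < n; j++) print line[j], val[j] / tot
        n = 0; tot = 0
    }
    $1 != name { emit_group(); name = $1 }   # first column changed: a new group starts
    { line[n] = $0; val[n] = $3; n++; tot += $3 }
    END { emit_group() }
    ' "$@"
}

printf 'a lot 10\na few 30\nthe best 250\nthe dog 750\n' | group_share
# a lot 10 0.25
# a few 30 0.75
# the best 250 0.25
# the dog 750 0.75
```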
awk '{a[$1]+=$3; b[i++]=$0; c[j++]=$1; d[k++]=$3} END{for(i=0;i<NR;i++) {print b[i], d[i]/a[c[i]]}}' File
Example:
sdlcb@Goofy-Gen:~/AMD$ cat ff
a lot 10
a few 20
a great 20
a little 40
a good 10
the best 250
the dog 750
zisty cool 20
sdlcb@Goofy-Gen:~/AMD$ awk '{a[$1]+=$3; b[i++]=$0; c[j++]=$1; d[k++]=$3} END{for(i=0;i<NR;i++) {print b[i], d[i]/a[c[i]]}}' ff
a lot 10 0.1
a few 20 0.2
a great 20 0.2
a little 40 0.4
a good 10 0.1
the best 250 0.25
the dog 750 0.75
zisty cool 20 1
Logic: for each line, add the third column to array a[], indexed by the first column. Save the complete line in array b[], to be used for printing at the end. Similarly, save the first and third columns of each line in arrays c[] and d[]. At the end, loop through all the lines processed and use these arrays to produce the result, printing first the line itself and then the proportion value.
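Since this version keeps every line in memory, a two-pass variant may suit a large file better: read the file once to collect the totals, then a second time to print. This is a different technique from the command above, sketched under the assumption that the input is a regular file that can be read twice (not a pipe); the function name group_share_two_pass is made up.

```shell
# group_share_two_pass FILE: pass 1 sums column 3 per group, pass 2 prints the shares.
group_share_two_pass() {
    awk '
    NR == FNR { tot[$1] += $3; next }   # first read: total per group
    { print $0, $3 / tot[$1] }          # second read: append the proportion
    ' "$1" "$1"
}
```

Only one number per group is held in memory, and the file does not need to be sorted. Usage would be group_share_two_pass ff for the example file above.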