iterate over line gnupllot - iteration

all,
I have a file that contains "time" in the first column and then bunch of data points in the following columns, and I want to print all of them to the same file and show how each object moves differently in time, but i am not sure how to iterative over such a file, I have search for a long time but to no luck.
Here is an example of some data:
0 0.001 0.006
1 0.001 0.090
2 0.005 0.099
3 0.008 0.999
4 0.009 0.100
5 0.010 0.100
Expect in my file i have 100 + lines after the time column. This is what i have so far in my gnuplot loop:
do for [i=2:99] {
plot 'data.out' using 1:i w l lt 7 lw 1 }
Any help is appreciated, thanks all.

in case you want to have everything in "one plot", you could interchange the order of the for loop and the plot command:
plot for [i=2:99] 'data.out' using 1:i w l lt 7 lw 1
In order to determine the number of columns automatically, one might use the stat command as in:
fName = 'data.out'
stat fName nooutput
N = STATS_columns #number of columns found in file
plot for [i=2:N] fName u 1:i w l lt 7 lw 1

Related

Python - Looping through dataframe using methods other than .iterrows()

Here is the simplified dataset:
Character x0 x1
0 T 0.0 1.0
1 h 1.1 2.1
2 i 2.2 3.2
3 s 3.3 4.3
5 i 5.5 6.5
6 s 6.6 7.6
8 a 8.8 9.8
10 s 11.0 12.0
11 a 12.1 13.1
12 m 13.2 14.2
13 p 14.3 15.3
14 l 15.4 16.4
15 e 16.5 17.5
16 . 17.6 18.6
The simplified dataset is generated by the following code:
ch = ['T']
x0 = [0]
x1 = [1]
string = 'his is a sample.'
for s in string:
ch.append(s)
x0.append(round(x1[-1]+0.1,1))
x1.append(round(x0[-1]+1,1))
df = pd.DataFrame(list(zip(ch, x0, x1)), columns = ['Character', 'x0', 'x1'])
df = df.drop(df.loc[df['Character'] == ' '].index)
x0 and x1 represents the starting and ending position of each Character, respectively. Assume that the distance between any two adjacent characters equals to 0.1. In other words, if the difference between x0 of a character and x1 of the previous character is 0.1, the two characters belongs to the same string. If such difference is larger than 0.1, the character should be the start of a new string, etc. I need to produce a dataframe of strings and their respective x0 and x1, which is done by looping through the dataframe using .iterrows()
string = []
x0 = []
x1 = []
for index, row in df.iterrows():
if index == 0:
string.append(row['Character'])
x0.append(row['x0'])
x1.append(row['x1'])
else:
if round(row['x0']-x1[-1],1) == 0.1:
string[-1] += row['Character']
x1[-1] = row['x1']
else:
string.append(row['Character'])
x0.append(row['x0'])
x1.append(row['x1'])
df_string = pd.DataFrame(list(zip(string, x0, x1)), columns = ['String', 'x0', 'x1'])
Here is the result:
String x0 x1
0 This 0.0 4.3
1 is 5.5 7.6
2 a 8.8 9.8
3 sample. 11.0 18.6
Is there any other faster way to achieve this?
You could use groupby + agg:
# create diff column
same = (df['x0'] - df['x1'].shift().fillna(df.at[0, 'x0'])).abs()
# create grouper column, had to use this because of problems with floating point
grouper = ((same - 0.1) > 0.00001).cumsum()
# group and aggregate accordingly
res = df.groupby(grouper).agg({ 'Character' : ''.join, 'x0' : 'first', 'x1' : 'last' })
print(res)
Output
Character x0 x1
0 This 0.0 4.3
1 is 5.5 7.6
2 a 8.8 9.8
3 sample. 11.0 18.6
The tricky part is this one:
# create grouper column, had to use this because of problems with floating point
grouper = ((same - 0.1) > 0.00001).cumsum()
The idea is to convert the column of diffs (same) into a True or False column, where every time a True appears it means a new group needs to be created. The cumsum will take care of assigning the same id to each group.
As suggested by #ShubhamSharma, you could do:
# create diff column
same = (df['x0'] - df['x1'].shift().fillna(df['x0'])).abs().round(3).gt(.1)
# create grouper column, had to use this because of problems with floating point
grouper = same.cumsum()
The other part remains the same.

plotting lines between points in gnuplot

I've got next script to plot dots from file "puntos"
set title "recorrido vehiculos"
set term png
set output "rutasVehiculos.png"
plot "puntos" u 2:3:(sprintf("%d",$1)) with labels font ",7" point pt 7 offset char 0.5,0.5 notitle
file "puntos" has next format:
#i x y
1 2.1 3.2
2 0.2 0.3
3 2.9 0.3
in another file called "routes" i have the routes that joins the points, for example:
2
1 22 33 20 18 14 8 27 1
1 13 2 17 31 1
Route 1 joins points 1, 22, 33, etc.
Route 2 joins points 1, 13, 12, etc.
Is there a way that perform this with gnuplot?
PS: sorry for my English
Welcome to stackoverflow. This is an interesting task. It's pretty clear what to do, however, to my opinion not very obvious how to do this gnuplot.
The following code seems to work, probably with room for improvements. Tested in gnuplot 5.2.5
Tested with the files puntos.dat and routes.dat:
# puntos.dat
#i x y
1 2.1 3.2
2 0.2 0.3
3 2.9 0.3
4 1.3 4.5
5 3.1 2.3
6 1.9 0.7
7 3.6 1.7
8 2.3 1.5
9 1.0 2.0
and
# routes.dat
2
1 5 7 3 6 2 9
6 8 5 9 4
and the code:
### plot different routes
reset session
set title "recorrido vehiculos"
set term pngcairo
set output "rutasVehiculos.png"
POINTS = "puntos.dat"
ROUTES = "routes.dat"
# load routes file into datablock
set datafile separator "\n"
set table $Routes
plot ROUTES u (stringcolumn(1)) with table
unset table
# loop routes
set datafile separator whitespace
stats $Routes u 0 nooutput # get the number of routes
RoutesCount = STATS_records-1
set print $RoutesData
do for [i=1:RoutesCount] {
# get the points of a single route
set datafile separator "\n"
set table $Dummy
plot ROUTES u (SingleRoute = stringcolumn(1),$1) every ::i::i with table
unset table
# create a table of the coordinates of the points of a single route
set datafile separator whitespace
do for [j=1:words(SingleRoute)] {
set table $Dummy2
plot POINTS u (a=$2,$2):(b=$3,$3) every ::word(SingleRoute,j)-1::word(SingleRoute,j)-1 with table
print sprintf("%g %s %g %g", j, word(SingleRoute,j), a, b)
unset table
}
print "" # add empty line
}
set print
print sprintf("%g different Routes\n", RoutesCount)
print "RoutesData:"
print $RoutesData
set colorsequence classic
plot \
POINTS u 2:3:(sprintf("%d",$1)) with labels font ",7" point pt 7 offset char 0.5,0.5 notitle,\
for [i=1:RoutesCount] $RoutesData u 3:4 every :::i-1::i-1 w lp lt i title sprintf("Route %g",i)
set output
### end code
which results in something like:

AMPL Non-Linear least Square

Could anyone help me to find the error in this AMPL's code for a simple least-square error base on the function:
F(X)=1/1+e^-x
param N>=1;# N Number of simulations
param M>=1;# Number of inputs
param simulations {1..N};
param training{1..N,1..M};
var W{1..10};
minimize obj: sum{i in simulations , j in 1..4} (1/(1+exp-(W[9]/(1+exp(-
W[j]/(1+exp(-training[i][j]))))+ W[10]/(1+exp(-W[2*j]/(1+exp(-training[i][j]))))))-training[i][5])^2;
'###### DATA
param N:=6;
param M:=4;
param training:
1 2 3 4 5 :=
1 0.209 0.555 0.644 0.355 0.0
2 0.707 0.450 0.587 0.305 1.0
3 0.579 0.521 0.745 0.394 1.0
4 0.574 0.883 0.211 0.550 1.0
5 0.797 0.055 0.430 0.937 1.0
6 0.782 0.865 0.114 0.317 1.0 ;
Thank you!
A couple of things:
is that quote mark before ###### DATA meant to be there?
You have specified that training has dimension N x M, and your data specifies that N=6, M=4, but you then define training as 6 x 5 and your objective function also refers to column 5.
If that doesn't answer your question, you might want to give more information about what error messages you're getting.

Most efficient way to shift MultiIndex time series

I have a DataFrame that consists of many stacked time series. The index is (poolId, month) where both are integers, the "month" being the number of months since 2000. What's the best way to calculate one-month lagged versions of multiple variables?
Right now, I do something like:
cols_to_shift = ["bal", ...5 more columns...]
df_shift = df[cols_to_shift].groupby(level=0).transform(lambda x: x.shift(-1))
For my data, this took me a full 60 s to run. (I have 48k different pools and a total of 718k rows.)
I'm converting this from R code and the equivalent data.table call:
dt.shift <- dt[, list(bal=myshift(bal), ...), by=list(poolId)]
only takes 9 s to run. (Here "myshift" is something like "function(x) c(x[-1], NA)".)
Is there a way I can get the pandas verison to be back in line speed-wise? I tested this on 0.8.1.
Edit: Here's an example of generating a close-enough data set, so you can get some idea of what I mean:
ids = np.arange(48000)
lens = np.maximum(np.round(15+9.5*np.random.randn(48000)), 1.0).astype(int)
id_vec = np.repeat(ids, lens)
lens_shift = np.concatenate(([0], lens[:-1]))
mon_vec = np.arange(lens.sum()) - np.repeat(np.cumsum(lens_shift), lens)
n = len(mon_vec)
df = pd.DataFrame.from_items([('pool', id_vec), ('month', mon_vec)] + [(c, np.random.rand(n)) for c in 'abcde'])
df = df.set_index(['pool', 'month'])
%time df_shift = df.groupby(level=0).transform(lambda x: x.shift(-1))
That took 64 s when I tried it. This data has every series starting at month 0; really, they should all end at month np.max(lens), with ragged start dates, but good enough.
Edit 2: Here's some comparison R code. This takes 0.8 s. Factor of 80, not good.
library(data.table)
ids <- 1:48000
lens <- as.integer(pmax(1, round(rnorm(ids, mean=15, sd=9.5))))
id.vec <- rep(ids, times=lens)
lens.shift <- c(0, lens[-length(lens)])
mon.vec <- (1:sum(lens)) - rep(cumsum(lens.shift), times=lens)
n <- length(id.vec)
dt <- data.table(pool=id.vec, month=mon.vec, a=rnorm(n), b=rnorm(n), c=rnorm(n), d=rnorm(n), e=rnorm(n))
setkey(dt, pool, month)
myshift <- function(x) c(x[-1], NA)
system.time(dt.shift <- dt[, list(month=month, a=myshift(a), b=myshift(b), c=myshift(c), d=myshift(d), e=myshift(e)), by=pool])
I would suggest you reshape the data and do a single shift versus the groupby approach:
result = df.unstack(0).shift(1).stack()
This switches the order of the levels so you'd want to swap and reorder:
result = result.swaplevel(0, 1).sortlevel(0)
You can verify it's been lagged by one period (you want shift(1) instead of shift(-1)):
In [17]: result.ix[1]
Out[17]:
a b c d e
month
1 0.752511 0.600825 0.328796 0.852869 0.306379
2 0.251120 0.871167 0.977606 0.509303 0.809407
3 0.198327 0.587066 0.778885 0.565666 0.172045
4 0.298184 0.853896 0.164485 0.169562 0.923817
5 0.703668 0.852304 0.030534 0.415467 0.663602
6 0.851866 0.629567 0.918303 0.205008 0.970033
7 0.758121 0.066677 0.433014 0.005454 0.338596
8 0.561382 0.968078 0.586736 0.817569 0.842106
9 0.246986 0.829720 0.522371 0.854840 0.887886
10 0.709550 0.591733 0.919168 0.568988 0.849380
11 0.997787 0.084709 0.664845 0.808106 0.872628
12 0.008661 0.449826 0.841896 0.307360 0.092581
13 0.727409 0.791167 0.518371 0.691875 0.095718
14 0.928342 0.247725 0.754204 0.468484 0.663773
15 0.934902 0.692837 0.367644 0.061359 0.381885
16 0.828492 0.026166 0.050765 0.524551 0.296122
17 0.589907 0.775721 0.061765 0.033213 0.793401
18 0.532189 0.678184 0.747391 0.199283 0.349949
In [18]: df.ix[1]
Out[18]:
a b c d e
month
0 0.752511 0.600825 0.328796 0.852869 0.306379
1 0.251120 0.871167 0.977606 0.509303 0.809407
2 0.198327 0.587066 0.778885 0.565666 0.172045
3 0.298184 0.853896 0.164485 0.169562 0.923817
4 0.703668 0.852304 0.030534 0.415467 0.663602
5 0.851866 0.629567 0.918303 0.205008 0.970033
6 0.758121 0.066677 0.433014 0.005454 0.338596
7 0.561382 0.968078 0.586736 0.817569 0.842106
8 0.246986 0.829720 0.522371 0.854840 0.887886
9 0.709550 0.591733 0.919168 0.568988 0.849380
10 0.997787 0.084709 0.664845 0.808106 0.872628
11 0.008661 0.449826 0.841896 0.307360 0.092581
12 0.727409 0.791167 0.518371 0.691875 0.095718
13 0.928342 0.247725 0.754204 0.468484 0.663773
14 0.934902 0.692837 0.367644 0.061359 0.381885
15 0.828492 0.026166 0.050765 0.524551 0.296122
16 0.589907 0.775721 0.061765 0.033213 0.793401
17 0.532189 0.678184 0.747391 0.199283 0.349949
Perf isn't too bad with this method (it might be a touch slower in 0.9.0):
In [19]: %time result = df.unstack(0).shift(1).stack()
CPU times: user 1.46 s, sys: 0.24 s, total: 1.70 s
Wall time: 1.71 s

gnuplot to plot histogram with lines

I've two column data as:
9 17.52
11 29.77
7 62.75
11 36.15
7 30.46
7 52.5
9 65.26
9 90.05
14 101.87
12 86.88
15 74.78
And want that first column be plotted as histogram according to index of y2, and second column be plotted as line according to index of y1. Anyone has ideas?
Maybe I didn't understand your question correctly, but are you possibly looking for something like this:
set style fill solid border -1
set boxwidth 0.4
plot "Data.dat" u 2 w boxes t "boxes", "" u (column(0)):1 t "lines" w l
?