I am trying to compute a Voronoi diagram using the Python code from the following link:
https://freud.readthedocs.io/en/latest/gettingstarted/examples/module_intros/locality.Voronoi.html
Following is the code I am using:
import freud
import matplotlib
import matplotlib.pyplot as plt
import numpy as np
points = np.loadtxt('COORDINATE.dat')
L = 26
box = freud.box.Box.square(L)
voro = freud.locality.Voronoi()
cells = voro.compute((box, points)).polytopes
plt.figure()
ax = plt.gca()
voro.plot(ax=ax)
ax.scatter(points[:, 0], points[:, 1], s=2, c="b")
plt.show()
The code runs without errors but does not generate the expected output, which in this case is the Voronoi diagram. It prints a message at the terminal:
Duplicate: 1142 (18.72,8.6,0) matches 95 (18.72,8.6,0)
The above code works fine for some of the data files provided, but not for all of them.
Following is the data used as input (a preprocessing sketch follows the data):
3.6100 14.7100 0.0000
2.8300 2.1300 0.0000
6.1700 13.9300 0.0000
12.1800 6.6500 0.0000
17.9300 19.5400 0.0000
11.3000 26.6200 0.0000
7.6300 4.7900 0.0000
23.9000 12.8600 0.0000
7.8200 12.9900 0.0000
1.5600 5.5700 0.0000
15.0400 9.5400 0.0000
16.5000 12.1200 0.0000
1.2500 14.0600 0.0000
2.3200 12.3500 0.0000
7.4600 3.2900 0.0000
2.2100 9.7500 0.0000
17.1400 9.1900 0.0000
4.5700 23.6100 0.0000
16.2300 17.0000 0.0000
25.6600 12.1400 0.0000
18.6600 0.4900 0.0000
8.3000 17.1100 0.0000
4.2700 13.1500 0.0000
3.5000 5.9000 0.0000
24.5700 7.8700 0.0000
1.8900 11.1000 0.0000
10.2600 2.1200 0.0000
5.6900 12.6900 0.0000
12.3800 1.9700 0.0000
19.8200 27.2400 0.0000
8.0800 21.6400 0.0000
18.5100 24.3300 0.0000
16.4600 9.9300 0.0000
8.1600 9.7800 0.0000
0.9100 11.7600 0.0000
6.5800 22.8900 0.0000
17.7500 18.1700 0.0000
3.3100 11.7500 0.0000
18.5800 16.3000 0.0000
26.3400 8.6000 0.0000
14.5700 9.3300 0.0000
8.0300 3.8200 0.0000
10.5200 26.4600 0.0000
12.9900 5.8600 0.0000
10.3300 24.6600 0.0000
7.6400 10.1900 0.0000
20.2500 5.5100 0.0000
9.4600 25.7000 0.0000
24.7200 12.4700 0.0000
5.7400 14.4700 0.0000
22.6500 6.0500 0.0000
3.8400 11.1000 0.0000
17.4400 5.4200 0.0000
6.5700 7.6400 0.0000
18.0400 15.6900 0.0000
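A minimal preprocessing sketch, assuming (not confirmed) that the duplicate message comes from points that coincide once wrapped into the periodic box; note the data contains coordinates such as 26.6200 and 27.2400, which fall outside a square box of side L = 26 centered at the origin:
import freud
import numpy as np

points = np.loadtxt('COORDINATE.dat')
box = freud.box.Box.square(26)

# wrap all coordinates into the periodic box, then drop exact duplicates
wrapped = box.wrap(points)
unique_points = np.unique(wrapped, axis=0)

voro = freud.locality.Voronoi()
cells = voro.compute((box, unique_points)).polytopes
np.unique only removes exact duplicates; nearly coincident points would need a tolerance-based filter instead.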
I am trying to obtain the minimum value of three consecutive cells in pandas. The calculation should take into account the one cell above and one below.
I have tried scipy's argrelextrema, but I have a feeling it does not perform a rolling window.
Thanks
This is a wild approach, but it did not perform as expected.
import numpy as np
import pandas as pd

def pivot_swing_low(df):
    data = df.copy()
    # the cell below, the cell itself, and the cell above
    data['d1'] = data.Close.shift(-1)
    data['d3'] = data.Close.shift(0)
    data['d4'] = data.Close.shift(1)
    data['minPL'] = data[['d1', 'd3', 'd4']].min(axis=1)
    # keep the close only where it is the minimum of the three
    data['PL'] = np.where(data['minPL'] == data['d3'], data['d3'], np.nan)
    data['recentPL'] = data.PL.shift(2).astype(float).ffill()
    data = data.drop(columns=['d1', 'd3', 'd4'])
    return data
It will always capture row 33, but to me row 31 is relevant as well.
38.78 1671068699999 2022-12-15 01:44:59.999 NaN NaN -0.37 0.00 0.37 0.023571 0.054286 0.023125 0.057698 0.400805 28.612474 NaN NaN 38.78 38.78 39.15
30 38.79 1671068999999 2022-12-15 01:49:59.999 NaN NaN 0.01 0.01 0.00 0.022857 0.054286 0.022188 0.053576 0.414137 29.285496 NaN NaN 38.48 NaN 39.15
31 38.48 1671069299999 2022-12-15 01:54:59.999 NaN NaN -0.31 0.00 0.31 0.021429 0.076429 0.020603 0.071892 0.286583 22.274722 22.274722 NaN 38.48 38.48 38.78
32 38.67 1671069599999 2022-12-15 01:59:59.999 NaN NaN 0.19 0.19 0.00 0.035000 0.074286 0.032703 0.066757 0.489878 32.880419 NaN NaN 38.37 NaN 38.78
33 38.37 1671069899999 2022-12-15 02:04:59.999 38.37000000 NaN -0.30 0.00 0.30 0.035000 0.093571 0.030367 0.083417 0.364036 26.688174 NaN NaN 38.37 38.37 38.48
34 38.58 1671070199999 2022-12-15 02:09:59.999 NaN NaN 0.21 0.21 0.00 0.050000 0.090000 0.043198 0.077459 0.557687 35.802263 NaN NaN 38.37 NaN 38.48
35 38.70 1671070499999 2022-12-15 02:14:59.999 NaN NaN 0.12 0.12 0.00 0.058571 0.090000 0.048684 0.071926 0.676857 40.364625 NaN 40.364625 38.58 NaN 38.37
import pandas as pd

# Load the data into a dataframe
df = pd.read_csv('data.csv')

# Minimum of the cell above, the current cell, and the cell below
# (center=True centers the 3-row window on each row)
min_three_cells = df['value'].rolling(3, center=True, min_periods=1).min()

# View the results
print(min_three_cells)
This might help.
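As a follow-up sketch using the asker's own column name (assuming the price column is Close, as in pivot_swing_low), the same centered window flags every swing low, including row 31:
import numpy as np

# min over the cell above, the cell itself, and the cell below
local_min = df['Close'].rolling(3, center=True, min_periods=1).min()

# a row is a swing low when it equals the centered three-cell minimum
df['PL'] = np.where(df['Close'] == local_min, df['Close'], np.nan)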
I have two dataframes, df1 and df2. Both are indexed the same, with [i_batch, i_example].
The columns are different rmse errors. I would like to find the [i_batch, i_example] pairs where df1 is much lower than df2, or the rows where df1 has less error than df2, based on the common [i_batch, i_example].
Note that a specific [i_batch, i_example] may appear in only one of df1 and df2, but I need to consider only the [i_batch, i_example] pairs that exist in both.
df1 =
rmse_ACCELERATION rmse_CENTER_X rmse_CENTER_Y rmse_HEADING rmse_LENGTH rmse_TURN_RATE rmse_VELOCITY rmse_WIDTH
i_batch i_example
0 0.0 1.064 1.018 0.995 0.991 1.190 0.967 1.029 1.532
1 0.0 1.199 1.030 1.007 1.048 1.278 0.967 1.156 1.468
1.0 1.101 1.026 1.114 2.762 0.967 0.967 1.083 1.186
2 0.0 1.681 1.113 1.090 1.001 1.670 0.967 1.205 1.160
1.0 1.637 1.122 1.183 0.987 1.521 0.967 1.191 1.278
2.0 1.252 1.035 1.035 2.507 1.108 0.967 1.210 1.595
3 0.0 1.232 1.014 1.019 1.627 1.143 0.967 1.080 1.583
1.0 1.195 1.028 1.019 1.151 1.097 0.967 1.071 1.549
2.0 1.233 1.010 1.004 1.616 1.135 0.967 1.082 1.573
3.0 1.179 1.017 1.014 1.368 1.132 0.967 1.099 1.518
and
df2 =
rmse_ACCELERATION rmse_CENTER_X rmse_CENTER_Y rmse_HEADING rmse_LENGTH rmse_TURN_RATE rmse_VELOCITY rmse_WIDTH
i_batch i_example
1 0.0 0.071 0.034 0.048 0.114 0.006 1.309e-03 0.461 0.004
1.0 0.052 0.055 0.062 2.137 0.023 8.232e-04 0.357 0.011
2 0.0 1.665 0.156 0.178 0.112 0.070 3.751e-03 2.326 0.016
1.0 0.880 0.210 0.088 0.055 0.202 1.449e-03 0.899 0.047
2.0 0.199 0.072 0.078 1.686 0.010 6.240e-04 0.239 0.008
3 0.0 0.332 0.068 0.097 1.211 0.022 5.127e-04 0.167 0.016
1.0 0.252 0.075 0.070 0.368 0.013 5.295e-04 0.136 0.008
2.0 0.268 0.067 0.064 1.026 0.010 5.564e-04 0.175 0.010
3.0 0.171 0.051 0.054 0.473 0.011 4.150e-04 0.220 0.009
5 0.0 0.014 0.099 0.119 0.389 0.123 3.846e-04 0.313 0.037
For instance, how can I get the [i_batch, i_example] where `df1['rmse_ACCELERATION'] < df2['rmse_ACCELERATION']`?
Do a merge and then filter according to your needs:
df_merge = df1.merge(df2,
                     left_index=True,
                     right_index=True,
                     suffixes=('_1', '_2'))
df_merge[
df_merge['rmse_ACCELERATION_1'] < df_merge['rmse_ACCELERATION_2']
].index
However, I don't see any records with the same [i_batch, i_example] in both dataframes that pass the condition.
Use .sub(), which directly matches the indices and subtracts the matched rows:
df3 = df1.sub(df2)
df3[(df3 < 0).any(axis=1)]
Or go specific and search within df1 by:
df1[(df1.sub(df2) < 0).any(axis=1)]
                   rmse_ACCELERATION  rmse_CENTER_X  rmse_CENTER_Y  rmse_HEADING  rmse_LENGTH  rmse_TURN_RATE  rmse_VELOCITY  rmse_WIDTH
i_batch i_example
2       0.0                    0.016          0.957          0.912         0.889          1.6        0.963249         -1.121       1.144
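To honor the "common index only" requirement explicitly, here is a sketch (assuming df1 and df2 as printed above) that restricts both frames to the intersection of their MultiIndexes before comparing:
# keep only the [i_batch, i_example] pairs present in both frames
common = df1.index.intersection(df2.index)

# pairs of the common index where df1's error is lower than df2's
mask = df1.loc[common, 'rmse_ACCELERATION'] < df2.loc[common, 'rmse_ACCELERATION']
print(common[mask])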
Sorry for the title, which is maybe more complicated than the problem itself ;)
I have the following pandas dataframe:
   grh  anc     anc1     anc2    anc3     anc4     anc5    anc6     anc7     anc8     anc9    anc10
1    2    5  0.10000  0.12000  0.1800  0.14000  0.15000  0.1900  0.20000  0.10000  0.21000  0.24000
2    3    7  0.03299  0.05081  0.0355  0.02884  0.03054  0.0332  0.03115  0.02177  0.04903  0.04399
3    4    3  0.00000  0.00000  0.0000  0.00000  0.00000  0.0000  0.00000  0.00000  0.00000  0.00000
4    5    4  0.00000  0.00000  0.0000  0.00000  0.00000  0.0000  0.00000  0.00000  0.00000  0.00000
5    6    1  0.10000  0.10000  0.1000  0.10000  0.10000  0.1000  0.10000  0.10000  0.10000  0.10000
I would like to add new columns lap1, lap2, ... with a for loop, depending on the value of the anc column. For instance, in the first row anc=5, so lap1 should equal the value of anc5 (0.15000) and lap2 the value of anc6 (0.19000); in the second row, lap1=anc7 (0.03115), lap2=anc8 (0.02177), ...
So, the output should look like
grh anc anc1 anc2 anc3 anc4 anc5 anc6 anc7 anc8 anc9 anc10 lap1 lap2 lap3
2 5 0.10000 0.12000 0.18000 0.14000 0.15000 0.19000 0.20000 0.1000 0.21000 0.24000 0.15000 0.19000 0.20000
3 7 0.03299 0.05081 0.0355 0.02884 0.03054 0.0332 0.03115 0.02177 0.04903 0.04399 0.03115 0.02177 0.04903
4 3 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
5 4 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
6 1 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
I've tried something very basic, but it doesn't seem to work:
for i in range(1, 4):
    j = df['anc'] + i  # j is a whole Series here, so 'anc'+str(j) is not a valid column name
    df['lap'+str(i)] = df['anc'+str(j)]
I would be very grateful if you have any idea.
Thanks
Set grh & anc as the index, since we are looking to index into the anc1–anc10 columns. This also comes in handy when we write the output columns:
df2 = df.set_index(['grh', 'anc'])
For each row, slice into the columns using the anc value (now in the index), take the 3 adjacent values, convert them to a Series named after the expected output columns, and assign them to the matching output columns:
outcols = ['lap1', 'lap2', 'lap3']

# x.name is the (grh, anc) index tuple; anc=n selects positions n-1 .. n+1,
# i.e. columns anc<n>, anc<n+1>, anc<n+2>
df2[outcols] = df2.apply(
    lambda x: pd.Series(x[x.name[1]-1:x.name[1]+2].values, index=outcols), axis=1)
df2 looks like this:
anc1 anc2 anc3 anc4 anc5 anc6 anc7 anc8 anc9 anc10 lap1 lap2 lap3
grh anc
2 5 0.10000 0.12000 0.1800 0.14000 0.15000 0.1900 0.20000 0.10000 0.21000 0.24000 0.15000 0.19000 0.20000
3 7 0.03299 0.05081 0.0355 0.02884 0.03054 0.0332 0.03115 0.02177 0.04903 0.04399 0.03115 0.02177 0.04903
4 3 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
5 4 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000
6 1 0.10000 0.10000 0.1000 0.10000 0.10000 0.1000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000
Reset the index again if you'd like to revert grh & anc back to being columns.
An alternative, name-based lookup instead of the positional one:
Define a utility function that builds the column names given a float. It needs to accept a float because pandas automatically upcasts int64 to float64 if the series contains any non-integer values. Use this function both to perform the lookup and to assign the output. The one benefit of this approach is that no set_index is required.
def cols(n, p):
    # column names p<n> .. p<n+2>, e.g. cols(5, 'anc') -> ['anc5', 'anc6', 'anc7']
    return [f'{p}{i}' for i in range(int(n), int(n) + 3)]

df[cols(1, 'lap')] = df.apply(lambda x: pd.Series(x[cols(x.anc, 'anc')].values), axis=1)
A bit of a 'brute-force' approach, but I can't see how you can do this otherwise:
df[[f"lap{i}" for i in range(1,4)]]= \
df.apply(lambda x: \
pd.Series({f"lap{j}": x[f"anc{int(j+x['anc']-1)}"] for j in range(1,4)}) \
, axis=1)
(Assuming per your sample, that you have max lap at 3)
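For larger frames, a vectorized sketch of my own (not from the thread) avoids apply by indexing the anc block with numpy, assuming every anc value leaves room for three consecutive columns:
import numpy as np

anc_cols = [f'anc{i}' for i in range(1, 11)]
vals = df[anc_cols].to_numpy()

start = df['anc'].to_numpy(dtype=int) - 1   # anc5 -> column position 4
idx = start[:, None] + np.arange(3)         # three consecutive positions per row
df[['lap1', 'lap2', 'lap3']] = vals[np.arange(len(df))[:, None], idx]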
# Where is the new lap column starting
startingNewColsNumber = df.shape[1]
# How many new lap columns to add
numNewCols = df.grh.max()
# Generate new lap columns
newColNames = ['lap'+str(x) for x in range(1, numNewCols + 1)]
# add new lap columns to the dataframe
for lapName in newColNames:
    df[lapName] = np.nan

# now fill the values for each row in the new 'lap' columns
for row in df.index:
    startCopyCol = df.loc[row, 'anc'] + 1  # the beginning anc position to copy from
    howmany = df.loc[row, 'grh']           # how many lap values to fill
    df.iloc[row, startingNewColsNumber : startingNewColsNumber + howmany] = \
        df.iloc[row, startCopyCol : startCopyCol + howmany].values

df
Here is the output I got:
grh anc anc1 anc2 anc3 anc4 anc5 anc6 anc7 anc8 anc9 anc10 lap1 lap2 lap3 lap4 lap5 lap6
0 2 5 0.10000 0.12000 0.1800 0.14000 0.15000 0.1900 0.20000 0.10000 0.21000 0.24000 0.15000 0.19000 NaN NaN NaN NaN
1 3 7 0.03299 0.05081 0.0355 0.02884 0.03054 0.0332 0.03115 0.02177 0.04903 0.04399 0.03115 0.02177 0.04903 NaN NaN NaN
2 4 3 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.0 NaN NaN
3 5 4 0.00000 0.00000 0.0000 0.00000 0.00000 0.0000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.00000 0.0 0.0 NaN
4 6 1 0.10000 0.10000 0.1000 0.10000 0.10000 0.1000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.10000 0.1 0.1 0.1
Let me know if this gives the kind of solution you are looking for.
col_A vi_B data_source index_as_date
2017-01-21 0.000000 0.199354 sat 2017-01-21
2017-01-22 0.000000 0.204250 NaN NaT
2017-01-23 0.000000 0.208077 NaN NaT
2017-01-27 0.000000 0.215081 NaN NaT
2017-01-28 0.000000 0.215300 NaN NaT
In the pandas dataframe above, I want to insert a row for 24th January 2017 with the values 0.01, 0.4, sat, NaT. How do I do that? I could use iloc and insert it manually, but I would prefer an automated solution that takes the datetime index into account.
I think you need setting with enlargement, combined with sort_index:
#if necessary convert to datetime
df.index = pd.to_datetime(df.index)
df['index_as_date'] = pd.to_datetime(df['index_as_date'])
df.loc[pd.to_datetime('2017-01-24')] = [0.01,0.4,'sat', pd.NaT]
df = df.sort_index()
print (df)
col_A vi_B data_source index_as_date
2017-01-21 0.00 0.199354 sat 2017-01-21
2017-01-22 0.00 0.204250 NaN NaT
2017-01-23 0.00 0.208077 NaN NaT
2017-01-24 0.01 0.400000 sat NaT
2017-01-27 0.00 0.215081 NaN NaT
2017-01-28 0.00 0.215300 NaN NaT
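If several rows need inserting at once, a concat-based sketch (my own note, not part of the original answer) avoids repeated enlargement:
# build the new rows with a datetime index, then concat and re-sort
new_rows = pd.DataFrame([[0.01, 0.4, 'sat', pd.NaT]],
                        columns=df.columns,
                        index=pd.to_datetime(['2017-01-24']))
df = pd.concat([df, new_rows]).sort_index()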
I am looking for a way to perform a matrix multiplication on two sets of columns in a dataframe. One set of columns needs to be transposed and then multiplied with the other set. Then I need to take the resulting matrix, do an element-wise product with a scalar matrix, and add up the entries. Below is an example.
Data for testing:
import pandas as pd
import numpy as np

dftest = pd.DataFrame(
    data=[['A', 0.18, 0.25, 0.36, 0.21, 0, 0.16, 0.16, 0.64, 0.04, 0, 0],
          ['B', 0, 0, 0.5, 0.5, 0, 0, 0, 0.25, 0.75, 0, 0]],
    columns=['Ticker', 'f1', 'f2', 'f3', 'f4', 'f5',
             'p1', 'p2', 'p3', 'p4', 'p5', 'multiplier'])
Starting dataframe with data for Tickers. f1 through f5 represent one set of categories and p1 through p5 represent another.
dftest
Out[276]:
Ticker f1 f2 f3 f4 f5 p1 p2 p3 p4 p5 multiplier
0 A 0.18 0.25 0.36 0.21 0 0.16 0.16 0.64 0.04 0 0
1 B 0.00 0.00 0.50 0.50 0 0.00 0.00 0.25 0.75 0 0
For each row, I need to transpose columns p1 through p5 and then multiply them by columns f1 through f5. I think I have found the solution using the statement below.
dftest.groupby('Ticker')[['f1', 'f2', 'f3', 'f4', 'f5',
                          'p1', 'p2', 'p3', 'p4', 'p5']].apply(
    lambda x: x[['p1', 'p2', 'p3', 'p4', 'p5']].T.dot(x[['f1', 'f2', 'f3', 'f4', 'f5']]))
Out[408]:
f1 f2 f3 f4 f5
Ticker
A p1 0.0288 0.04 0.0576 0.0336 0.0
p2 0.0288 0.04 0.0576 0.0336 0.0
p3 0.1152 0.16 0.2304 0.1344 0.0
p4 0.0072 0.01 0.0144 0.0084 0.0
p5 0.0000 0.00 0.0000 0.0000 0.0
B p1 0.0000 0.00 0.0000 0.0000 0.0
p2 0.0000 0.00 0.0000 0.0000 0.0
p3 0.0000 0.00 0.1250 0.1250 0.0
p4 0.0000 0.00 0.3750 0.3750 0.0
p5 0.0000 0.00 0.0000 0.0000 0.0
Next I need to do an element-wise product of the above matrix with another 5x5 matrix, m, held in another DataFrame, and then add up the columns or rows (you get the same result either way). If I extend the above statement as below, I get the result I want.
dftest.groupby('Ticker')[['f1', 'f2', 'f3', 'f4', 'f5',
                          'p1', 'p2', 'p3', 'p4', 'p5']].apply(
    lambda x: pd.DataFrame(m.values * x[['p1', 'p2', 'p3', 'p4', 'p5']].T
                           .dot(x[['f1', 'f2', 'f3', 'f4', 'f5']]).values,
                           columns=m.columns, index=m.index).sum().sum())
Out[409]:
Ticker
A 2.7476
B 1.6250
dtype: float64
So far so good, I think. (Happy to hear of a better and faster way to do this.) The next part is where I am stuck.
How do I take this result and update the multiplier column on my original DataFrame?
If I try the following:
dftest['multiplier'] = dftest.groupby('Ticker')[['f1', 'f2', 'f3', 'f4', 'f5',
                                                 'p1', 'p2', 'p3', 'p4', 'p5']].apply(
    lambda x: pd.DataFrame(m.values * x[['p1', 'p2', 'p3', 'p4', 'p5']].T
                           .dot(x[['f1', 'f2', 'f3', 'f4', 'f5']]).values,
                           columns=m.columns, index=m.index).sum().sum())
I get NaNs in the multiplier column.
dftest
Out[407]:
Ticker f1 f2 f3 f4 f5 p1 p2 p3 p4 p5 multiplier
0 A 0.18 0.25 0.36 0.21 0 0.16 0.16 0.64 0.04 0 NaN
1 B 0.00 0.00 0.50 0.50 0 0.00 0.00 0.25 0.75 0 NaN
I suspect it has to do with indexing, and whether all the indices after grouping translate back to the original dataframe. Second, do I need a groupby here at all? Since it is a row-by-row computation, can't I just do it without grouping, or group by the index? Any suggestions on that?
I need to do this without iterating row by row, because the whole process will itself be iterated as part of an optimization: I run it, look at the results, and if they are outside some constraints, calculate new f1 through f5 and p1 through p5 and run the whole thing again.
I posted a question on this earlier, but it was confusing, so this is a second attempt. Hope it makes sense.
Thanks in advance for all your help.
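A hedged sketch of two possible directions, assuming m is the 5x5 DataFrame from the question. First, the groupby result is a Series indexed by Ticker, while dftest has a 0/1 RangeIndex, so direct assignment misaligns; mapping through the Ticker column realigns it:
cols = ['f1', 'f2', 'f3', 'f4', 'f5', 'p1', 'p2', 'p3', 'p4', 'p5']
per_ticker = dftest.groupby('Ticker')[cols].apply(
    lambda x: (m.values * x[['p1', 'p2', 'p3', 'p4', 'p5']].T
               .dot(x[['f1', 'f2', 'f3', 'f4', 'f5']]).values).sum())

# realign the Ticker-indexed result onto the original rows
dftest['multiplier'] = dftest['Ticker'].map(per_ticker)
Second, since each Ticker occupies a single row in the sample, the groupby can be dropped entirely with an einsum over the outer product of each row's p and f vectors:
f = dftest[['f1', 'f2', 'f3', 'f4', 'f5']].to_numpy()
p = dftest[['p1', 'p2', 'p3', 'p4', 'p5']].to_numpy()

# for each row r: sum_ij m[i, j] * p[r, i] * f[r, j]
dftest['multiplier'] = np.einsum('ij,ri,rj->r', m.to_numpy(), p, f)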