I have some time series data in pandas where I need to extract specific local minima from a column so I can use them as features in an LSTM model. To visualize what I'm looking for, I've attached a picture where the circled points are the values I wish to locate.
The other red dots at the bottom of the graph are my failed attempt at using "argrelextrema" with the following code:
#Trying to Locate Minimum Values
df['HKL Min'] = df.iloc[argrelextrema(df.hkla.values, np.less_equal,order=50)[0]]['hkla']
#Plotting a range of values from dataset:
sns.lineplot(x=df.index[0:3000], y= 'hkla', data=df[0:3000], label='Hookload');
sns.scatterplot(x=df.index[0:3000], y= 'HKL Min', data=df[0:3000], s= 50, color ='red', label='HKL Min');
As you may notice, my column data has a repetitive pattern, and the points I wish to locate are the minima found between two "peak pairs". Is there an existing function in Python that can help me locate these specific points? Any help would be highly appreciated, and I am also open to other suggestions that can solve this issue.
You could do something like this with your data:
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
from scipy.signal import argrelextrema
rs = np.random.randn(500)
xs = [0]
for r in rs:
    xs.append(xs[-1] * 0.999 + r)
df = pd.DataFrame(xs, columns=['point'])
which gives this data
0 0.000000
1 0.471435
2 -0.720012
3 0.713415
4 0.400050
.. ...
496 3.176240
497 3.007734
498 3.123841
499 1.045736
500 0.041935
[501 rows x 1 columns]
You can choose how often a local max or min gets marked by adjusting the parameter n (a larger n means fewer, more widely spaced extrema):
n = 10
df['min'] = df.iloc[argrelextrema(df.point.values, np.less_equal,
                    order=n)[0]]['point']
df['max'] = df.iloc[argrelextrema(df.point.values, np.greater_equal,
                    order=n)[0]]['point']
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='r')
plt.plot(df.index, df['point'])
Which gives:
Another choice for n might be (and it all depends on what you want):
n = 40
df['min'] = df.iloc[argrelextrema(df.point.values, np.less_equal,
                    order=n)[0]]['point']
df['max'] = df.iloc[argrelextrema(df.point.values, np.greater_equal,
                    order=n)[0]]['point']
plt.scatter(df.index, df['min'], c='r')
plt.scatter(df.index, df['max'], c='g')
plt.plot(df.index, df['point'])
To get a marker for which points actually were maxima and minima, you can make a new df:
new_df = pd.DataFrame(np.where(df.T == df.T.max(), 1, 0),index=df.columns).T
which tells you which rows in df are a maximum or a minimum. Alternatively, the original df already holds that information in the created min and max columns: the rows where those columns are not NaN.
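If you only need boolean flags instead of NaN-filled min/max columns, here is a small sketch of the same marking. flag_extrema is a hypothetical helper name. It also shows a property of np.less_equal / np.greater_equal worth knowing. Ties count as extrema, so every point of a flat minimum can qualify. With argrelextrema's default mode='clip', the first or last sample can also be flagged, and this may be why the question's attempt marked so many points along the bottom.

```python
import numpy as np
import pandas as pd
from scipy.signal import argrelextrema


def flag_extrema(series: pd.Series, order: int) -> pd.DataFrame:
    """Mark local minima/maxima of `series` as boolean columns.

    A sample is flagged when it is <= (minima) or >= (maxima) every
    neighbour up to `order` positions away on each side.
    """
    values = series.to_numpy()
    positions = np.arange(len(values))
    min_idx = argrelextrema(values, np.less_equal, order=order)[0]
    max_idx = argrelextrema(values, np.greater_equal, order=order)[0]
    return pd.DataFrame(
        {
            "value": values,
            "is_min": np.isin(positions, min_idx),
            "is_max": np.isin(positions, max_idx),
        },
        index=series.index,
    )


s = pd.Series([3, 1, 2, 5, 4, 0, 2, 6, 1])
flags = flag_extrema(s, order=1)
# The last sample (1) is flagged as a minimum because mode='clip' compares it with itself
print(flags.index[flags["is_min"].to_numpy()].tolist())  # [1, 5, 8]
print(flags.index[flags["is_max"].to_numpy()].tolist())  # [0, 3, 7]
```

In the question's setting, this would be called as flag_extrema(df['hkla'], order=50).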
EDIT: Finding peaks above threshold
If you are interested in peaks above a certain value, you should use find_peaks in the following way:
from scipy.signal import find_peaks
peaks, _ = find_peaks(df['point'], height = 15)
plt.plot(peaks, df['point'][peaks], "x")
which will produce:
(array([304, 309, 314, 317, 324, 329, 333, 337, 343, 349, 352, 363, 366,
369, 372, 374, 377, 379, 381, 383, 385, 387, 391, 394, 397, 400,
403, 410, 413, 418, 424, 427, 430, 433, 436, 439, 442, 444, 448],
{'peak_heights': array([15.68868141, 15.97184882, 15.04790966, 15.6146908 , 16.49191501,
18.0852033 , 18.11467247, 19.48469432, 21.32391722, 19.90407526,
19.93683051, 24.40980129, 28.00319793, 26.1080406 , 24.44322213,
23.16993982, 22.27505873, 21.47500832, 22.3236231 , 24.02484906,
23.83727054, 24.32609486, 21.25365717, 21.10295203, 20.03162979,
20.64021444, 19.78510855, 21.62624829, 22.34904425, 21.60431638,
18.41968769, 18.24153961, 18.00747871, 18.02793964, 16.72552016,
17.58573207, 16.90982675, 16.9905686 , 16.30563852])})
and graphically
I was able to fix my problem using the approach provided by @Serge de Gosson de Varennes. I replaced "argrelextrema" with SciPy's "find_peaks()" as follows:
df['Min'] = df.iloc[find_peaks(-df.column[0:3000], height=(-350000,-250000), threshold = None,
distance=200, )[0]]['column']
The height argument let me choose an interval in the y-direction, which made it quite easy to detect the local minima I was looking for within that interval. The interval is negated because the search runs on -df.column. When plotting the results like this:
plt.plot(df.index[0:3000], df.column[0:3000])
plt.plot(df.index, df['Min'], 'ro', label='Min Values')
I got the following graph
Thank you for the assistance!
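Here is a minimal sketch of that fix on a toy signal, assuming the goal is "local minima whose value lies in [low, high]". minima_in_band is a hypothetical helper. find_peaks only looks for maxima, so the signal is negated. The value band must be negated and swapped as well, which is why height=(-350000, -250000) above selects minima between 250000 and 350000. The distance argument sets the minimum number of samples between detected minima, like distance=200 above.

```python
import numpy as np
from scipy.signal import find_peaks


def minima_in_band(y, low, high, distance=1):
    """Indices and values of local minima of `y` lying in [low, high].

    find_peaks only finds maxima, so search the negated signal; the band
    flips too: low <= y <= high  <=>  -high <= -y <= -low.
    """
    y = np.asarray(y, dtype=float)
    idx, _ = find_peaks(-y, height=(-high, -low), distance=distance)
    return idx, y[idx]


# Toy signal: two shallow dips (value 4) and one deep dip (value 1)
y = [9, 4, 9, 1, 9, 4, 9]
idx, vals = minima_in_band(y, low=3, high=5)
print(idx, vals)  # [1 5] [4. 4.]
```

The deep dip at index 3 is a local minimum too, but its value (1) falls outside the band, so it is skipped.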
I am trying to load Excel data into Python and plot it as a histogram. My aim is to color the histogram according to specific ranges: every number smaller than 4 = yellow, numbers between 4 and 12 = orange, and so on. I ran into 2 problems. First, I don't get 4 separate histograms; the program plots everything in one graph. Second, one of the loops is obviously wrong, because everything is shown in yellow.
Could somebody help me figure out which loop is wrong and why? Is there a better way to do this? I appreciate any help; I'm pretty new to programming.
import matplotlib.pyplot as plt
import pandas as pd
import openpyxl
# Load the workbook
#workbook = openpyxl.load_workbook("Geotechnikstammdaten.xlsx","C:/Users/akosw/OneDrive/Desktop/Programmieren")
workbook = openpyxl.load_workbook(path)
# Select the sheet
sheet = workbook['Tabelle3']
# Extract the values from each column
columns = [[cell.value for cell in column] for column in zip(*sheet.rows)]
# Iterate over the columns
for i, values in enumerate(columns):
    # Create the histogram
    plt.hist(values, bins=50)
    # Color the bars according to the specified rules
    for patch in plt.gca().patches:
        if patch.get_height() < 4:
            patch.set_facecolor('yellow')
        elif patch.get_height() < 12:
            patch.set_facecolor('orange')
        elif patch.get_height() < 26:
            patch.set_facecolor('blue')
        elif patch.get_height() < 51:
            patch.set_facecolor('green')
    plt.title(f'Column {i+1}')
    # Save the histogram to a file
    # plt.savefig(f'histogram_{i+1}.png')
Here is roughly what I'm trying to achieve, but instead of a bar chart I want a histogram.
from bisect import bisect
import matplotlib.pyplot as plt
import numpy as np
OM_VALUES = [4, 12, 26, 51]
OM_COLORS = ["yellow", "orange", "blue", "green", "red"]
data = [4, 6, 7, 7, 11, 16, 23, 30, 30, 27, 1, 3, 4, 33, 37, 39, 45, 51]
labels = range(len(data))#[0,1,2,3,4,5,6,7,8,9,10,11]
plt.barh(labels, data, height=1.0, color=[OM_COLORS[bisect(OM_VALUES, v)] for v in data])
plt.title('Counts per depth')
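Here is a sketch of one way to get both behaviours. It assumes each bar should be coloured by the data range its bin covers, not by its count. It creates a new figure for every column and picks each bar's colour from its bin's left edge with bisect, reusing OM_VALUES and OM_COLORS from the bar-chart snippet. In the question's loop, patch.get_height() is the number of samples in a bin. With 50 bins most of those counts are likely below 4, which would explain why everything turned yellow. Also, plt.hist keeps drawing into the same current axes unless a new figure is created. colored_hist and columns are illustrative names; replace data with the per-column lists read from Excel.

```python
from bisect import bisect

import matplotlib.pyplot as plt
import numpy as np

OM_VALUES = [4, 12, 26, 51]
OM_COLORS = ["yellow", "orange", "blue", "green", "red"]


def colored_hist(ax, values, bins=50):
    """Draw a histogram on `ax`, colouring each bar by where its bin starts."""
    values = np.array([v for v in values if v is not None], dtype=float)  # skip empty cells
    _, edges, bars = ax.hist(values, bins=bins)
    for left_edge, bar in zip(edges[:-1], bars):
        # bisect counts the thresholds <= left_edge, which is the colour index
        bar.set_facecolor(OM_COLORS[bisect(OM_VALUES, left_edge)])
    return bars


data = [4, 6, 7, 7, 11, 16, 23, 30, 30, 27, 1, 3, 4, 33, 37, 39, 45, 51]
columns = [data]  # stand-in for the list of Excel columns in the question
for i, values in enumerate(columns):
    fig, ax = plt.subplots()  # a new figure per column gives separate histograms
    bars = colored_hist(ax, values)
    ax.set_title(f"Column {i + 1}")
```

Aligning the bins with the thresholds (for example, bins=np.arange(0, 56)) keeps a single bar from straddling two colour ranges.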
This is my second try at the same question, and I really hope someone can help me.
Even though some really nice people tried to help me, there is a lot I couldn't figure out despite their help.
From the beginning:
I created a dataframe. It is huge and contains information about travellers in a city. This is what it looks like (only the head):
In origin and destination you have the IDs of the city locations, and in move how many people travelled from origin to destination. Longitude and latitude give the exact position of each point, and the linestring connects the two points.
I created the linestring with this code:
erg2['Linestring'] = erg2.apply(lambda x: LineString([(x['latitude_origin'], x['longitude_origin']), (x['latitude_destination'], x['longitude_destination'])]), axis = 1)
Now my question is how to plot the routes on a map. Even though I tried all the examples from the geopandas documentation etc., I can't figure it out.
I can't show you what I already plotted because it doesn't make sense, and I guess it would be smarter to start plotting from the beginning.
You can see that there are some 0 values in the move column. This means that no one travelled this route, so I don't need to plot it.
I have to plot the lines from where the traveller started (origin) to where they went (destination).
I also need to distinguish the different lines depending on the number of movements.
This is the plotting code I tried:
fig = px.line_mapbox(erg2, lat="latitude_origin", lon="longitude_origin", color="move",
hover_name= gdf["origin"] + " - " + gdf["destination"],
center=dict(lon=13.41053, lat=52.52437), zoom=3, height=600)
fig.update_layout(mapbox_style="stamen-terrain", mapbox_zoom=4, mapbox_center_lat = 52.52437,
Does anyone have an idea?
I tried it with this code:
import requests, io, json
import geopandas as gpd
import shapely.geometry
import pandas as pd
import numpy as np
import itertools
import plotly.express as px
# get some public addressess - hospitals. data that has GPS lat / lon
dfhos = pd.read_csv(io.StringIO(requests.get("http://media.nhschoices.nhs.uk/data/foi/Hospital.csv").text),
sep="¬",engine="python",).loc[:, ["OrganisationName", "Latitude", "Longitude"]]
a = np.arange(len(dfhos))
# establish N links between hospitals
N = 10
df = (
    pd.DataFrame({0: a[0:N], 1: a[25:25 + N]}).merge(dfhos, left_on=0, right_index=True)
    .merge(dfhos, left_on=1, right_index=True, suffixes=("_origin", "_destination"))
)
# build a geopandas data frame that has LineString between two hospitals
gdf = gpd.GeoDataFrame(
    data=df,
    geometry=df.apply(
        lambda r: shapely.geometry.LineString(
            [(r["Longitude_origin"], r["Latitude_origin"]),
             (r["Longitude_destination"], r["Latitude_destination"])]), axis=1),
)
# sample code https://plotly.com/python/lines-on-mapbox/#lines-on-mapbox-maps-from-geopandas
lats = []
lons = []
names = []
for feature, name in zip(gdf.geometry, gdf["OrganisationName_origin"] + " - " + gdf["OrganisationName_destination"]):
    if isinstance(feature, shapely.geometry.linestring.LineString):
        linestrings = [feature]
    elif isinstance(feature, shapely.geometry.multilinestring.MultiLineString):
        linestrings = feature.geoms
    else:
        continue
    for linestring in linestrings:
        x, y = linestring.xy
        lats = np.append(lats, y)
        lons = np.append(lons, x)
        names = np.append(names, [name]*len(y))
        # None breaks the line so each route is drawn separately
        lats = np.append(lats, None)
        lons = np.append(lons, None)
        names = np.append(names, None)
fig = px.line_mapbox(lat=lats, lon=lons, hover_name=names)
which looks like the perfect code, but I can't really adapt it to my data.
I am very new to coding, so please be a bit patient. ;)
Thanks a lot in advance.
All the best
I previously answered this question: How to plot visualize a Linestring over a map with Python?. I suggested that you update that question, and I still recommend that you do.
Line strings are, IMHO, not the way to go. plotly does not use line strings, so encoding to line strings only to decode them back into numpy arrays adds complexity. Check out the examples in the official documentation, https://plotly.com/python/lines-on-mapbox/. There it is very clear that geopandas is just a source that has to be encoded into numpy arrays.
It appears your sample data should be one DataFrame, with no need for geopandas or line strings.
Almost all of your sample data is unusable, as every row where origin and destination differ has a move of zero, which you note should be excluded.
import pandas as pd
import numpy as np
import plotly.express as px
df = pd.DataFrame({"origin": [88, 88, 88, 88, 88, 87],
"destination": [88, 89, 110, 111, 112, 83],
"move": [20, 0, 5, 0, 0, 10],
"longitude_origin": [13.481016, 13.481016, 13.481016, 13.481016, 13.481016, 13.479667],
"latitude_origin": [52.457055, 52.457055, 52.457055, 52.457055, 52.457055, 52.4796],
"longitude_destination": [13.481016, 13.504075, 13.613772, 13.586891, 13.559341, 13.481016],
"latitude_destination": [52.457055, 52.443923, 52.533194, 52.523562, 52.507418, 52.457055]})
I have further refined the line_array() function from the simplified solution I previously provided, so that it can also be used to encode the hover and color parameters.
# lines in plotly are delimited by None
def line_array(data, cols=[], empty_val=None):
    if isinstance(data, pd.DataFrame):
        vals = data.loc[:, cols].values
    elif isinstance(data, pd.Series):
        a = data.values
        vals = np.pad(a.reshape(a.shape[0], -1), [(0, 0), (0, 1)], mode="edge")
    # append the delimiter column, then flatten to start, end, delimiter, ...
    return np.pad(vals, [(0, 0), (0, 1)], constant_values=empty_val).reshape(
        1, (len(data) * 3))[0]
# only draw lines where move > 0 and destination is different to origin
df = df.loc[df["move"].gt(0) & (df["origin"]!=df["destination"])]
lons = line_array(df, ["longitude_origin", "longitude_destination"])
lats = line_array(df, ["latitude_origin", "latitude_destination"])
fig = px.line_mapbox(
    lat=lats,
    lon=lons,
    hover_name=line_array(
        df.loc[:, ["origin", "destination"]].astype(str).apply(" - ".join, axis=1)
    ),
    hover_data={
        "move": line_array(df, ["move", "move"], empty_val=-99),
        "origin": line_array(df, ["origin", "origin"], empty_val=-99),
    },
    color=line_array(df, ["origin", "origin"], empty_val=-99),
).update_traces(visible=False, selector={"name": "-99"})

fig.update_layout(
    mapbox={
        "style": "stamen-terrain",
        "zoom": 9.5,
        "center": {"lat": lats[0], "lon": lons[0]},
    },
    margin={"r": 0, "t": 0, "l": 0, "b": 0},
)
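The key idea behind line_array() is the layout plotly expects. Consecutive points in one trace are joined, and a None breaks the line, so each route becomes origin, destination, None. Here is a stripped-down sketch of just that encoding, using only numpy and pandas. interleave_with_gaps is a hypothetical name, and the two rows are the sample routes that survive the move > 0 filter. Arrays built this way are what px.line_mapbox receives as lat= and lon= in the answer.

```python
import numpy as np
import pandas as pd


def interleave_with_gaps(df, start_col, end_col, gap=None):
    """Return [start0, end0, gap, start1, end1, gap, ...] as a 1-D object array.

    plotly joins consecutive points of a trace and breaks the line at None,
    so every row becomes its own two-point segment.
    """
    out = np.empty(len(df) * 3, dtype=object)
    out[0::3] = df[start_col].to_numpy()
    out[1::3] = df[end_col].to_numpy()
    out[2::3] = gap
    return out


routes = pd.DataFrame({
    "longitude_origin": [13.481016, 13.479667],
    "longitude_destination": [13.613772, 13.481016],
})
lons_demo = interleave_with_gaps(routes, "longitude_origin", "longitude_destination")
print(lons_demo.tolist())
```

The same call with the latitude columns gives the matching lat array. Passing a number such as -99 as gap mirrors what line_array() does for the hover and color columns.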
I am trying to use a custom colormap to display a ConfusionMatrixDisplay object to have a finer range between 0 and 50 than between 50 and 100 using this answer.
from sklearn.datasets import make_classification
from sklearn.metrics import confusion_matrix, ConfusionMatrixDisplay
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from matplotlib.colors import LinearSegmentedColormap
import matplotlib.pyplot as plt
import numpy as np
plt.rcParams["figure.figsize"] = (15, 15)
font = {'family' : 'DejaVu Sans',
'weight' : 'bold',
'size' : 22}
plt.rc('font', **font)
class nlcmap(LinearSegmentedColormap):
    def __init__(self, cmap, levels):
        self.cmap = cmap
        self.N = cmap.N
        self.monochrome = self.cmap.monochrome
        self.levels = np.asarray(levels, dtype='float64')
        self._x = self.levels
        self.levmax = self.levels.max()
        self.transformed_levels = np.linspace(0.0, self.levmax, len(self.levels))

    def __call__(self, xi, alpha=1.0, **kw):
        yi = np.interp(xi, self._x, self.transformed_levels)
        return self.cmap(yi / self.levmax, alpha)
levels = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100]
cmap_nonlin = nlcmap(plt.cm.viridis, levels)
X, y = make_classification(random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
clf = SVC(random_state=0)
clf.fit(X_train, y_train)
predictions = clf.predict(X_test)
cm = confusion_matrix(y_test, predictions, labels=clf.classes_)
disp = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=clf.classes_)
lin_cmap = plt.cm.viridis
levels = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50, 100]
cmap_nonlin = nlcmap(plt.cm.viridis, levels)
fig, ax = plt.subplots()
im = disp.plot(cmap=cmap_nonlin, colorbar=False)
disp.ax_.get_images()[0].set_clim(0, 100)
disp.figure_.colorbar(disp.im_, orientation="horizontal", pad=0.1)
Produces the following error:
It seems the error is related to imshow in conjunction with the custom colormap, since I can reproduce it without sklearn:
fig, ax = plt.subplots()
ax.imshow(np.array([[10, 15], [20, 30]]), cmap=cmap_nonlin)
Any ideas? If possible, I would like to modify the colormap, not the data itself.
According to matplotlib's documentation on LinearSegmentedColormap, you can do the following to vary the contrast between fast-varying and slow-varying segments.
To answer my question, let's have a finer range between 0 and 50 than between 50 and 100. The solution can be extended to an arbitrary number of segments with different paces by changing the levels:
import matplotlib.pyplot as plt
import numpy as np
import matplotlib.colors as colors
# A dict with {percentage_of_max_value: percentage_of_variation}. The keys are thus all < 1. and should be in ascending order alongside associated values in the colormap (also ordered and < 1.).
# In this example we have 90% of the variation of the colormap in its first half (until 0.5) and the remaining 10% in its right half
levels = {0.5: 0.9}
# We are not limited to one segment and we can provide for instance the following dict
# levels = {0.4:0.8, 0.5:0.9} to have 80% of variations between 0 and 40% of the colormap max then 10% between 40 and 50% and then the remaining 10% for the rest
cdict = {"red": None, "green": None, "blue": None}
num_values_per_segment = 50
for k, v in cdict.items():
    cdict[k] = []
    # We start the first segment by 0. both for value and cmap_value
    left_val = 0.
    left_cmap_val = 0.
    for val, cmap_val in levels.items():
        values = np.linspace(left_val, val, num_values_per_segment).tolist()
        dynamic_range = np.linspace(left_cmap_val, cmap_val, num_values_per_segment).tolist()
        for i, (v, r) in enumerate(zip(values, dynamic_range)):
            cdict[k].append((v, r, r))
        left_val = val
        left_cmap_val = cmap_val
    # Last segment towards 1.
    values = np.linspace(val, 1., num_values_per_segment).tolist()
    dynamic_range = np.linspace(cmap_val, 1., num_values_per_segment).tolist()
    for i, (v, r) in enumerate(zip(values, dynamic_range)):
        cdict[k].append((v, r, r))
# Mapping levels to colormap
cmap = plt.cm.viridis
for k, v in cdict.items():
    if k == "red":
        for i in range(len(v)):
            cdict[k][i] = (v[i][0], cmap(v[i][1])[0], cmap(v[i][2])[0])
    elif k == "green":
        for j in range(len(v)):
            cdict[k][j] = (v[j][0], cmap(v[j][1])[1], cmap(v[j][2])[1])
    elif k == "blue":
        for l in range(len(v)):
            cdict[k][l] = (v[l][0], cmap(v[l][1])[2], cmap(v[l][2])[2])
    else:
        raise ValueError("Color not recognized")
    cdict[k] = tuple(cdict[k])
cmap_nonlin = colors.LinearSegmentedColormap('MyCustomCMap', cdict)
fig, ax = plt.subplots()
my_image = np.array([[30, 45], [25, 10]])
confusion = ax.imshow(my_image, cmap=cmap_nonlin, vmin=0, vmax=100)
plt.colorbar(confusion, ax=ax)
And the resulting cmap_nonlin object can be used in conjunction with imshow without any issue:
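Here is a shorter route to the same kind of colormap. It is sketched as an alternative rather than the answer's method: sample viridis at positions bent by np.interp and wrap the result in a ListedColormap. The levels dict means the same as above ({0.5: 0.9} puts 90% of the colours in the lower half of the data range). nonlinear_cmap is a hypothetical helper. As for the original error, a likely contributor is that nlcmap.__init__ never calls LinearSegmentedColormap.__init__, so attributes the base class normally sets up are missing. Building a regular colormap object avoids subclassing entirely.

```python
import matplotlib.pyplot as plt
import numpy as np
from matplotlib.colors import ListedColormap


def nonlinear_cmap(base, levels, n=256, name="nonlinear"):
    """Resample `base` so data fractions map piecewise-linearly to colours.

    levels: {fraction_of_data_range: fraction_of_base_colormap}, both in
    (0, 1) and increasing; {0.5: 0.9} spends 90% of the colours on the
    lower half of the data range.
    """
    xp = [0.0, *levels.keys(), 1.0]
    fp = [0.0, *levels.values(), 1.0]
    positions = np.interp(np.linspace(0.0, 1.0, n), xp, fp)
    return ListedColormap(base(positions), name=name)  # base(...) -> (n, 4) RGBA


viridis = plt.cm.viridis
cmap_nonlin = nonlinear_cmap(viridis, {0.5: 0.9})

fig, ax = plt.subplots()
im = ax.imshow(np.array([[30, 45], [25, 10]]), cmap=cmap_nonlin, vmin=0, vmax=100)
fig.colorbar(im, ax=ax)
```

The same object should also work as disp.plot(cmap=cmap_nonlin) for the ConfusionMatrixDisplay, since it is a plain colormap.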
I'm currently trying to build an N-body simulation but I'm having a little trouble with plotting the results the way I'd like.
In the code below (with some example data for a few points in an orbit) I'm importing the position and time data and organizing it into a pandas dataframe. To create the 3D animation I use matplotlib's animation class, which works perfectly.
However, the usual way to set up an animation is limited in that you can't customize the points in each frame individually (please let me know if I'm wrong here :p). Since my animation shows orbiting bodies, I would like to vary their sizes and colors. To do that, I create a separate graph for each body and set its color etc. In the update_graph function, I iterate over the n bodies, retrieve their individual (x, y, z) coordinates, and update their graphs.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d.axes3d import get_test_data
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.animation as animation
import pandas as pd
nbodies = 2
x = np.array([[1.50000000e-10, 0.00000000e+00, 0.00000000e+00],
[9.99950000e-01, 1.00000000e-02, 0.00000000e+00],
[4.28093585e-06, 3.22964816e-06, 0.00000000e+00],
[-4.16142210e-01, 9.09335149e-01, 0.00000000e+00],
[5.10376489e-06, 1.42204430e-05, 0.00000000e+00],
[-6.53770813e-01, -7.56722445e-01, 0.00000000e+00]])
t = np.array([0.01, 0.01, 2.0, 2.0, 4.0, 4.0])
tt = np.array([0.01, 2.0, 4.0])
x = x.reshape((len(tt), nbodies, 3))
x_coords = x[:, :, 0].flatten()
y_coords = x[:, :, 1].flatten()
z_coords = x[:, :, 2].flatten()
df = pd.DataFrame({"time": t[:] ,"x" : x_coords, "y" : y_coords, "z" : z_coords})
def update_graph(num):
    data = df[df['time'] == tt[num]]  # x,y,z of all bodies at current time
    for n in range(nbodies):  # update graphs
        data_n = data[data['x'] == x_coords[int(num * nbodies) + n]]  # x,y,z of body n
        graph = graphs[n]
        graph.set_data(data_n.x, data_n.y)
        graphs[n] = graph
    return graphs
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('x (AU)')
ax.set_ylabel('y (AU)')
ax.set_zlabel('z (AU)')
# initialize
ms_list = [5, 1]
c_list = ['yellow', 'blue']
graphs = []
for n in range(nbodies):
    graphs.append(ax.plot([], [], [], linestyle="", marker=".",
                          markersize=ms_list[n], color=c_list[n])[0])
ani = animation.FuncAnimation(fig, update_graph, len(tt),
interval=400, blit=True, repeat=True)
However, doing this gives me the following error:
I'm not sure what this means, but I do know the problem has something to do with updating the graphs with only one row of coordinates rather than all three, because if I instead use
def update_graph(num):
    data = df[df['time'] == tt[num]]  # x,y,z of all bodies at current time
    for n in range(nbodies):  # update graphs
        #data_n = data[data['x'] == x_coords[int(num * nbodies) + n]]  # x,y,z of body n
        graph = graphs[n]
        graph.set_data(data.x, data.y)  # using data rather than data_n here now
        graphs[n] = graph
    return graphs
it actually works, and plots three copies of the bodies with varying colors and sizes on top of each other as you would expect.
Any help would be much appreciated. Thanks!
I don't understand why you are going through a pandas DataFrame when you seem to already have all the data you need in your numpy array. I couldn't reproduce the initial problem, but I propose this solution that uses pure numpy arrays, which may fix it:
import numpy as np
import matplotlib.pyplot as plt
from matplotlib import cm
from mpl_toolkits.mplot3d.axes3d import get_test_data
from mpl_toolkits.mplot3d import Axes3D
import matplotlib.animation as animation
import pandas as pd
nbodies = 2
x = np.array([[1.50000000e-10, 0.00000000e+00, 0.00000000e+00],
[9.99950000e-01, 1.00000000e-02, 0.00000000e+00],
[4.28093585e-06, 3.22964816e-06, 0.00000000e+00],
[-4.16142210e-01, 9.09335149e-01, 0.00000000e+00],
[5.10376489e-06, 1.42204430e-05, 0.00000000e+00],
[-6.53770813e-01, -7.56722445e-01, 0.00000000e+00]])
t = np.array([0.01, 0.01, 2.0, 2.0, 4.0, 4.0])
tt = np.array([0.01, 2.0, 4.0])
x = x.reshape((len(tt), nbodies, 3))
def update_graph(i):
    data = x[i, :, :]  # x,y,z of all bodies at current time
    for body, graph in zip(data, graphs):  # update graphs
        # wrap the scalars in lists: recent matplotlib versions expect sequences here
        graph.set_data_3d([body[0]], [body[1]], [body[2]])
    return graphs
fig = plt.figure()
ax = fig.add_subplot(111, projection='3d')
ax.set_xlabel('x (AU)')
ax.set_ylabel('y (AU)')
ax.set_zlabel('z (AU)')
plt.xlim(-1.5, 1.5)
plt.ylim(-1.5, 1.5)
# initialize
ms_list = [50, 10]
c_list = ['yellow', 'blue']
graphs = []
for n in range(nbodies):
    graphs.append(ax.plot([], [], [], linestyle="", marker=".",
                          markersize=ms_list[n], color=c_list[n])[0])
ani = animation.FuncAnimation(fig, func=update_graph, frames=len(tt),
interval=400, blit=True, repeat=True)
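This sketch shows why the pandas filtering is unnecessary by walking through what the reshape in the answer does with the question's six rows. They are ordered time-major (all bodies at t0, then all at t1, ...), so reshaping to (frames, bodies, 3) makes frames[i, n] the (x, y, z) of body n at frame i. The variable names are illustrative.

```python
import numpy as np

nbodies = 2
# Rows in the question's order: time-major, one row per body
flat = np.array([[1.50000000e-10, 0.00000000e+00, 0.0],
                 [9.99950000e-01, 1.00000000e-02, 0.0],
                 [4.28093585e-06, 3.22964816e-06, 0.0],
                 [-4.16142210e-01, 9.09335149e-01, 0.0],
                 [5.10376489e-06, 1.42204430e-05, 0.0],
                 [-6.53770813e-01, -7.56722445e-01, 0.0]])
nframes = flat.shape[0] // nbodies
frames = flat.reshape(nframes, nbodies, 3)  # frames[i, n] = (x, y, z)

# What update_graph needs for body 1 at frame 1:
bx, by, bz = frames[1, 1]
# All frames of body 1, i.e. its path through the animation:
path_body1 = frames[:, 1, :]
```

Indexing like this replaces both filters in the question (df['time'] == tt[num] and data['x'] == ...), so each body's marker can keep its own size and colour.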
I am fitting a parametric spline curve(t) from a bunch of (x, y) sampling points. How do I compute the intersection point with a line given by slope and one point? In my special case the spline intersects with the line once or not at all but never multiple times.
Here's the code for spline & line...
import matplotlib.pyplot as plt
import numpy as np
from scipy import interpolate
# Fit spline from points
x = np.array([152, 200, 255, 306, 356, 407, 457, 507, 561, 611, 661, 711, 761, 811, 861])
y = np.array([225, 227, 229, 229, 228, 226, 224, 222, 218, 215, 213, 212, 212, 215, 224])
tck, u = interpolate.splprep((x, y), k=3, s=1)
# Plot it...
u = np.linspace(0, 1, 100)
xy = np.asarray(interpolate.splev(u, tck, der=0))
plt.plot(xy[0], xy[1])
# Line defined by slope and (x, y) point
m = 3
(x, y) = (500, 100)
# Plot it...
x_vals = np.array([400, 700])
y_vals = m * (x_vals - x) + y
plt.plot(x_vals, y_vals)
... which looks like this:
Add the following lines
from scipy.interpolate import interp1d
spline = interp1d(xy[0], xy[1]) # define function based on spline data points
line = interp1d(x_vals, y_vals) # define function based on line data points
import scipy.optimize as spopt
f = lambda x: spline(x) - line(x) # difference function, its zero marks the intersection
r = spopt.bisect(f, a = max(xy[0][0], x_vals[0]), b = min(xy[0][-1], x_vals[-1])) # find root via bisection
plt.scatter(r, spline(r))
print(r, spline(r))
First, define functions for your spline and line based on their data points. The root of the difference function f marks the intersection. Since there is at most one, bisection works nicely to find it, provided f changes sign on the search interval (otherwise bisect raises a ValueError).
It would probably be more accurate to somehow re-use splev to define the function for the spline, but I'll leave that one to you.
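Following that suggestion, here is a sketch that keeps the exact fitted curve. It solves for the spline parameter u instead of x: evaluate the curve with splev, then find where the signed vertical distance to the line changes sign. It uses scipy.optimize.brentq in place of bisect; both need a sign change on the bracket. gap is a hypothetical helper, and this approach avoids building interp1d approximations of the curve.

```python
import numpy as np
from scipy import interpolate
from scipy.optimize import brentq

x = np.array([152, 200, 255, 306, 356, 407, 457, 507, 561, 611, 661, 711, 761, 811, 861])
y = np.array([225, 227, 229, 229, 228, 226, 224, 222, 218, 215, 213, 212, 212, 215, 224])
tck, _ = interpolate.splprep((x, y), k=3, s=1)

m, (x0, y0) = 3, (500, 100)  # line: y = m * (x - x0) + y0


def gap(u):
    """Signed vertical distance from the line to the spline point at parameter u."""
    xs, ys = interpolate.splev(u, tck)
    return float(ys - (m * (xs - x0) + y0))


# No sign change on [0, 1] means the line misses the fitted curve
if gap(0.0) * gap(1.0) < 0:
    u_star = brentq(gap, 0.0, 1.0)
    x_hit, y_hit = (float(c) for c in interpolate.splev(u_star, tck))
    print(x_hit, y_hit)
else:
    print("no intersection on the fitted curve")
```

With the question's data the crossing lies a little above x = 540, where the line meets the slowly falling part of the curve.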