Which dx do I choose for np.gradient argument? - numpy

I won't be too specific, but I have a graph of E vs. T (T being the independent quantity).
I want the derivative of E with respect to T, and I am unsure what dx spacing I should choose.
Details:
T = 10**(np.arange(-1,1.5,0.05)) (i.e. the spacing is not equal)
E is a function of T.
Questions:
Which spacing do I use?
My thoughts:
I think I should use the spacing of T, i.e. np.gradient(Energy, dx = T)?

For non-uniform spacing, pass in an array of positional values (absolute positions, not differences), which gradient will use to calculate dx at each point. The coordinates go in as the second positional argument; np.gradient has no dx keyword. So in your case, just pass in T.
Here's an example, as a test, where the blue line is the curve and the red segments show the calculated gradient at each point.
import numpy as np
import matplotlib.pyplot as plt
T = 10**(np.arange(-1,1.5,0.05))
E = T**2
gradients = np.gradient(E, T)
plt.plot(T, E, '-o') # plot the curve
for i, g in enumerate(gradients): # plot the gradients at each point
    plt.plot([T[i], T[i]+1], [E[i], E[i]+g], 'r')
Here's the line from the docs that's of interest:
N arrays to specify the coordinates of the values along each dimension
of F. The length of the array must match the size of the corresponding
dimension
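As a quick sanity check of the above, here is a minimal sketch that compares np.gradient(E, T) with the analytic derivative of the test curve E = T**2, which is 2*T. The edge_order=2 call and the constant-spacing call are additions for illustration and are not part of the original answer.

```python
import numpy as np

T = 10**(np.arange(-1, 1.5, 0.05))   # non-uniformly spaced sample points
E = T**2                             # test curve with known derivative 2*T
exact = 2 * T

# Coordinates are passed positionally; gradient derives the spacing from T.
grad = np.gradient(E, T)

# With the default edge_order=1 the two end points use one-sided first-order
# differences, so only the interior points are expected to match closely.
interior_err = np.max(np.abs(grad[1:-1] - exact[1:-1]))

# edge_order=2 uses second-order differences at the two ends as well.
grad2 = np.gradient(E, T, edge_order=2)
edge_err = np.max(np.abs(grad2 - exact))

# For contrast: pretending the spacing is a constant gives wrong slopes.
wrong = np.gradient(E, 0.05)
wrong_err = np.max(np.abs(wrong - exact))

print(interior_err, edge_err, wrong_err)
```

For a quadratic the second-order differences are exact up to rounding, so the first two errors should be tiny while the constant-spacing error is large.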

Related

Adding text using matplotlib

I am trying to build a graph using matplotlib, and I am having trouble placing descriptive text on the graph itself.
My y values range from .9 to 1.65, and x ranges from the dates 2001 to 2021 and are sourced from a datetime series.
Here are the basics of what I am working with:
fig, ax = plt.subplots(figsize=(10,7))
I know that I have to use ax.text() to place any text, but whenever I try to enter basically any values for the x and y coordinates of the text, the entire graph disappears when I re-run the cell. I have plotted the following line, but if I use the same coordinates in ax.text(), I get the output I just described. Why might this be happening?
plt.axhline(y=1.19, xmin=.032, xmax=.96)
By default, the y argument in the axhline method is in data coordinates, while the xmin and xmax arguments are in axis coordinates, with 0 corresponding to the far left of the plot, and 1 corresponding to the far right of the plot. See the axhline documentation for more information.
On the other hand, both the x and y arguments used in the text method are in data coordinates, so you position text relative to the data. However, you can change this to axis coordinates using the transform parameter. By setting this to ax.transAxes, you actually indicate that the x and y arguments should be interpreted as axis coordinates, again with 0 being the far left (or bottom) of the plot, and 1 being the far right (or top) of the plot. In this case, you would use ax.text as follows:
ax.text(x, y, 'text', transform=ax.transAxes)
Again, see the text documentation for more information.
However, it sounds like you might want to combine data and axis coordinates to place your text, because you want to reuse the arguments from axhline for your text. In this case, you need to create a transform object that interprets the x coordinate as axis coordinate, and the y coordinate as data coordinate. This is also possible by creating a blended transformation. For example:
import matplotlib.transforms as transforms
# create your ax object here
trans = transforms.blended_transform_factory(x_transform=ax.transAxes, y_transform=ax.transData)
ax.text(x, y, 'text', transform=trans)
See the Blended transformations section of the transformations tutorial for more information.
In short, you can refer to the following figure to compare the results of these various transformations:
import matplotlib.pyplot as plt
import matplotlib.transforms as transforms
fig, ax = plt.subplots()
ax.set_xlim(0, 2)
ax.set_ylim(0, 2)
# note that the line is plotted at y=1.5, but between x=1.6 and x=1.8
# because xmin/xmax are in axis coordinates
ax.axhline(1.5, xmin=.8, xmax=.9)
# x and y are in data coordinates
ax.text(0.5, 0.5, 'A')
# here, x and y are in axis coordinates
ax.text(0.5, 0.5, 'B', transform=ax.transAxes)
trans = transforms.blended_transform_factory(x_transform=ax.transAxes, y_transform=ax.transData)
# here, x is in axis coordinates, but y is in data coordinates
ax.text(0.5, 0.5, 'C', transform=trans)
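To make the relationship between the two coordinate systems concrete, the following sketch converts the xmin/xmax values of the axhline call above from axis coordinates to data coordinates. It uses ax.transLimits, which is not mentioned in the answer; the Agg backend line is only there so the snippet runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # optional: non-interactive backend, no window needed
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.set_xlim(0, 2)
ax.set_ylim(0, 2)

# transLimits maps data coordinates to axis coordinates (0..1),
# so its inverse maps axis coordinates back to data coordinates.
axes_to_data = ax.transLimits.inverted()

x_start, _ = axes_to_data.transform((0.8, 0.0))  # xmin=.8 in data units
x_end, _ = axes_to_data.transform((0.9, 0.0))    # xmax=.9 in data units
print(x_start, x_end)
```

With the limits set to 0..2, the two values come out at approximately 1.6 and 1.8, matching the comment in the figure code.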

Plotting a graph of 3 parameters (PosX, PosY) vs. time for time series data

I am new to this module. I have time series data for the movement of a particle. The movement has X and Y components at each time T, and I want to plot these 3 parameters in one graph. The sample data looks like this; the first column is time, the second the X coordinate, and the third the Y coordinate.
1.5193 618.3349 487.5595
1.5193 619.3349 487.5595
2.5193 619.8688 489.5869
2.5193 620.8688 489.5869
3.5193 622.9027 493.3156
3.5193 623.9027 493.3156
If you want to add a third piece of information to a 2D curve, one possibility is to use a color mapping that relates the value of the third coordinate to a set of colors.
Matplotlib has no direct way of plotting a curve with changing color, but we can fake one using matplotlib.collections.LineCollection.
In the following I've used an arbitrary curve, but I have no doubt that you can adapt the code to your particular use case if it suits your needs.
import numpy as np
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection
# e.g., a Lissajous curve
t = np.linspace(0, 2*np.pi, 6280)
x, y = np.sin(4*t), np.sin(5*t)
# to use LineCollection we need an array of segments
# the canonical answer (to upvote...) is https://stackoverflow.com/a/58880037/2749397
points = np.array([x, y]).T.reshape(-1,1,2)
segments = np.concatenate([points[:-1],points[1:]], axis=1)
# instantiate the line collection with appropriate parameters,
# the associated array controls the color mapping, we set it to time
lc = LineCollection(segments, cmap='nipy_spectral', linewidth=6, alpha=0.85)
lc.set_array(t)
# usual stuff, just note ax.autoscale, not needed here because we
# replot the same data but typically needed with ax.add_collection
fig, ax = plt.subplots()
plt.xlabel('x/mm') ; plt.ylabel('y/mm')
ax.add_collection(lc)
ax.autoscale()
cb = plt.colorbar(lc)
cb.set_label('t/s')
# we plot a thin line over the colormapped line collection, especially
# useful when our colormap contains white...
plt.plot(x, y, color='black', linewidth=0.5, zorder=3)
plt.show()
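As a rough sketch of how the same idea might be applied to the sample rows in the question, the snippet below reads the three columns (time, x, y) from a string and colors each segment by the time at its start. The colormap, line width, and axis labels are arbitrary choices, and the Agg backend line is only there so the snippet runs without a display.

```python
import io
import numpy as np
import matplotlib
matplotlib.use("Agg")  # optional: non-interactive backend, no window needed
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# the sample rows from the question: time, x, y
raw = """\
1.5193 618.3349 487.5595
1.5193 619.3349 487.5595
2.5193 619.8688 489.5869
2.5193 620.8688 489.5869
3.5193 622.9027 493.3156
3.5193 623.9027 493.3156
"""
t, x, y = np.loadtxt(io.StringIO(raw), unpack=True)

# N points give N-1 segments, each of shape (2 end points, 2 coordinates)
points = np.column_stack([x, y]).reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

lc = LineCollection(segments, cmap='viridis', linewidth=3)
lc.set_array(t[:-1])  # one color value per segment: the time at its start

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()        # needed here, add_collection does not rescale the axes
ax.set_xlabel('x')
ax.set_ylabel('y')
cb = fig.colorbar(lc, ax=ax)
cb.set_label('time')
# plt.show()  # uncomment when running interactively
```

For a real file, np.loadtxt can be pointed at the file name instead of the in-memory string.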

Estimation of t-distribution by mean of samples does not work

I am trying to create a t-distribution by taking the mean of many samples from a normal distribution (and then estimating the shape with kernel density estimation).
For some reason, I am getting pretty different results when I compare what I get with a proper t-distribution. I don't understand what is going wrong, so I think I am confused about something.
Here is the code:
import numpy as np
from scipy.stats import gaussian_kde
import matplotlib.pyplot as plt
import seaborn
inner_sample_size = 10
X = np.arange(-3, 3, 0.01)
results = [
    np.mean(np.random.normal(size=inner_sample_size))
    for _ in range(10000)
]
estimation = gaussian_kde(results)
plt.plot(X, estimation.evaluate(X))
t_samples = np.random.standard_t(inner_sample_size, 10000)
t_estimator = gaussian_kde(t_samples)
plt.plot(X, t_estimator.evaluate(X))
plt.ylabel("Probability density")
plt.show()
And here is the plot I get:
Where the orange line is numpy's own t-distribution, and the blue line is the one estimated by sampling.
Your assumption that the mean of standard normals has a t-distribution is incorrect. In fact, the mean of n independent standard normals is itself normally distributed (with mean 0 and variance 1/n), which explains the shape of your blue graph. To generate one random variable T from a t-distribution with k degrees of freedom, you first generate k+1 independent standard normals Z_i, i=0,...,k. You then compute
T = Z_0 / sqrt( sum(Z_i^2, i=1 to k)/k ).
The sum of squared Standard Normals sum(Z_i^2, i=1 to k) has Chi-Squared Distribution with k degrees of freedom, so if there is a pre-canned method to generate this, you should use it, since it's likely more efficient.
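The construction above can be sketched with NumPy as follows. The seed, the sample count, and the choice k = 10 (matching inner_sample_size in the question) are assumptions for illustration. The sample variances are printed as a rough check: a t-distribution with k > 2 degrees of freedom has variance k/(k-2), while the mean of k standard normals has variance 1/k.

```python
import numpy as np

rng = np.random.default_rng(0)
k = 10          # degrees of freedom
n = 200_000     # number of samples (arbitrary)

# Direct construction: T = Z_0 / sqrt(sum(Z_i^2, i=1..k) / k)
z0 = rng.standard_normal(n)
z = rng.standard_normal((n, k))
t_direct = z0 / np.sqrt((z**2).sum(axis=1) / k)

# Same construction, drawing the chi-squared denominator directly
t_chi2 = rng.standard_normal(n) / np.sqrt(rng.chisquare(k, size=n) / k)

# What the question computed: the mean of k standard normals
means = rng.standard_normal((n, k)).mean(axis=1)

print("t (direct)   variance:", t_direct.var())   # theory: k/(k-2) = 1.25
print("t (chi2)     variance:", t_chi2.var())     # theory: k/(k-2) = 1.25
print("sample means variance:", means.var())      # theory: 1/k = 0.1
```

Passing t_direct or t_chi2 to gaussian_kde in place of results should then give a curve close to the one obtained from np.random.standard_t.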

Locally weighted smoothing for binary valued random variable

I have a random variable as follows:
f(x) = 1 with probability g(x)
f(x) = 0 with probability 1-g(x)
where 0 < g(x) < 1.
Assume g(x) = x. Let's say I am observing this variable without knowing the function g and obtained 200 samples as follows:
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import binned_statistic
list = np.ndarray(shape=(200,2))
g = np.random.rand(200)
for i in range(len(g)):
    list[i] = (g[i], np.random.choice([0, 1], p=[1-g[i], g[i]]))
print(list)
plt.plot(list[:,0], list[:,1], 'o')
[Figure: plot of 0s and 1s]
Now, I would like to retrieve the function g from these points. The best I could think of is to draw a histogram and use the mean statistic:
bin_means, bin_edges, bin_number = binned_statistic(list[:,0], list[:,1], statistic='mean', bins=10)
plt.hlines(bin_means, bin_edges[:-1], bin_edges[1:], lw=2)
[Figure: histogram mean statistics]
Instead, I would like to have a continuous estimation of the generating function.
I guess it is about kernel density estimation but I could not find the appropriate pointer.
A straightforward way to do this without explicitly fitting an estimator is seaborn's lmplot with logistic=True:
import seaborn as sns
g = sns.lmplot(data= , x= , y= , y_jitter=.02, logistic=True)
Pass your DataFrame as data, and plug in the column name of your exogenous variable as x and, analogously, that of your dependent variable as y. y_jitter jitters the points for better visibility if you have a lot of data points. logistic=True is the main point here: it gives you the logistic regression line of the data (this option relies on statsmodels being installed).
Seaborn is built on top of matplotlib and works well with pandas, in case you want to move your data into a DataFrame.
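If a continuous estimate is wanted without assuming a logistic shape, one alternative is a kernel-weighted running mean (Nadaraya-Watson smoothing), which needs only NumPy. This is a different technique from the seaborn answer above, not a reimplementation of it. The function name kernel_smooth, the Gaussian kernel, and the bandwidth of 0.1 are assumptions chosen for illustration.

```python
import numpy as np

def kernel_smooth(x_obs, y_obs, x_grid, bandwidth):
    """Gaussian-kernel weighted mean of y_obs at each point of x_grid."""
    # weights[i, j] is the weight of observation j for grid point i
    diffs = (x_grid[:, None] - x_obs[None, :]) / bandwidth
    weights = np.exp(-0.5 * diffs**2)
    return (weights * y_obs).sum(axis=1) / weights.sum(axis=1)

rng = np.random.default_rng(0)
x_obs = rng.random(200)                           # same setup as the question
y_obs = (rng.random(200) < x_obs).astype(float)   # 1 with probability g(x) = x

x_grid = np.linspace(0.05, 0.95, 50)
g_hat = kernel_smooth(x_obs, y_obs, x_grid, bandwidth=0.1)  # bandwidth is a guess
print(np.column_stack([x_grid, g_hat])[:5])
```

A smaller bandwidth follows the data more closely but is noisier; a larger one is smoother but flattens the curve, especially near the ends of the range.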

Numpy: find mean coordinate of points along line

I have a bunch of points in a 2D space which all reside on a line (polygon). How can I compute the mean coordinate of these points on the line?
I don't mean the centroid of the points in the 2D space (as @rth initially proposed in his answer), but the mean location of the points along the line on which they reside. So basically, I could transform the line to a 1D axis, compute the mean location in 1D, and transform the location of the mean back into the 2D space.
Maybe these are exactly the necessary steps, but I think (or hope) that there is a function in numpy/scipy which allows me to do this in one step.
Edit: The approach you describe in the question is indeed probably the simplest way for solving this problem.
Here is an implementation that calculates the positions of the vertices along the line in 1D, takes their mean, and finally calculates the corresponding 2D position with parametric interpolation:
import numpy as np
from scipy.interpolate import splprep, splev
vert = np.random.randn(1000, 2) # vertices definition here
# calculate the Euclidean distances between consecutive vertices
# equivalent to a for loop with
# dl[i] = ((vert[i+1, 0] - vert[i, 0])**2 + (vert[i+1,1] - vert[i,1])**2)**0.5
dl = (np.diff(vert, axis=0)**2).sum(axis=1)**0.5
# pad with 0, so dl.shape[0] == vert.shape[0] for convenience
dl = np.insert(dl, 0, 0.0)
l = np.cumsum(dl) # 1D coordinates along the line
l_mean = np.mean(l) # mean in the line coordinates
# calculate the coordinate of l_mean in 2D space
# with parametric B-spline interpolation
tck, _ = splprep(x=vert.T, u=l, k=3)
res = splev(l_mean, tck)
print(res)
Edit2: Assuming now that you have a high resolution set of points for your path vert_full and some approximate measurements vert_1, vert_2, etc, what you could do is the following.
Project each point of vert_1, etc. onto the exact path. Assuming that vert_full has many more data points than vert_1, we can simply look for the nearest neighbours of vert_1 in vert_full:
from scipy.spatial import cKDTree
tr = cKDTree(vert_full)
d, idx = tr.query(vert_1, k=1)
vert_1_proj = vert_full[idx] # this gives the projected coordinates onto vert_full
# I have not actually run this, so it might require minor changes
Use the above mean calculation with the new vert_1_proj vector.
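For a path made of straight segments, the same three steps (1D positions along the line, mean, back to 2D) can also be sketched with plain linear interpolation via np.interp instead of a B-spline. The helper name mean_point_along_path and the small example path are made up for illustration.

```python
import numpy as np

def mean_point_along_path(vert):
    """Mean position, measured along the polyline, of its own vertices."""
    vert = np.asarray(vert, dtype=float)
    # Euclidean length of each segment between consecutive vertices
    seg_len = np.hypot(*np.diff(vert, axis=0).T)
    # 1D coordinate of every vertex along the path, starting at 0
    l = np.concatenate([[0.0], np.cumsum(seg_len)])
    l_mean = l.mean()
    # map the mean 1D coordinate back to 2D, one coordinate at a time
    x = np.interp(l_mean, l, vert[:, 0])
    y = np.interp(l_mean, l, vert[:, 1])
    return x, y

# L-shaped example path: vertices sit at distances 0, 2, 4, 6 along it
vert = [(0, 0), (2, 0), (2, 2), (2, 4)]
x_mean, y_mean = mean_point_along_path(vert)
print(x_mean, y_mean)
```

In this example the mean distance along the path is 3, which lies on the vertical part at the point (2, 1), whereas the plain 2D centroid of the vertices would be (1.5, 1.5).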
Meanwhile I've found the answer to my question, although using Shapely instead of Numpy.
import numpy as np
from shapely.geometry import LineString, Point
# lists of points as (x,y) tuples
path_xy = [...]
points_xy = [...] # should be on or near path
path = LineString(path_xy) # create path object
pts = [Point(p) for p in points_xy] # create point objects
dist = [path.project(p) for p in pts] # distances along path
mean_dist = np.mean(dist) # mean distance along path
mean = path.interpolate(mean_dist) # mean point
mean_xy = (mean.x,mean.y)
This works perfectly!
(That is also why I have to accept it as the answer, though I highly appreciate @rth's help!)