color range in LineCollection - matplotlib

I'm overplotting multicolored lines on an image. The color of the lines is supposed to represent a given parameter that varies between roughly -1 and 3.
The following portion of code is the one that builds these lines :
x = self._tprun.r[0,p,::100] # x coordinate
y = self._tprun.r[1,p,::100] # y coordinate
points = np.array([x, y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)
# 'color' is the parameter that will color the line
vmin = self._color[p,:].min()
vmax = self._color[p,:].max()
lc = LineCollection(segments,
                    cmap=plt.get_cmap('jet'),
                    norm=plt.Normalize(vmin=vmin, vmax=vmax))
lc.set_array(self._color[p,:])
lc.set_linewidth(1)
self._ax.add_collection(lc)
This code is inside a loop on 'p' and so it will create several lines at locations given by the arrays 'x' and 'y' and for which the color should be given by the value of 'self._color[p,:]'.
As I said, '_color[p,:]' roughly varies between -1 and 3. Here is an example of what '_color[p,:]' may be :
My problem is that the lines that are created show very little color variation: they all look a kind of monochrome dark blue, whereas _color[p,:] varies a lot and I ask for the normalization to use its min/max values.
Here is an example of such a line (look at the oscillating dark blue line; the other black lines are a contour of another value):
Is there something I'm missing in the way these functions work?

Got it!
Answer to the question is here :
x = self._tprun.r[0,p,::100] # re-sample every 100 values !!
y = self._tprun.r[1,p,::100] #
# [...]
#lc.set_array(self._color[p,:]) # self._color[p,:] is not resampled
lc.set_array(self._color[p,::100]) # this works because resampled
In other words, the 'color' array was actually much larger than the arrays used for the positions of the line segments. Only the first values of '_color' were used, and over that stretch its values do not vary much.
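The fix above can be reproduced with a small self-contained sketch. The synthetic arrays below are hypothetical stand-ins for self._tprun.r and self._color, and averaging the two end-point values to get one color per segment is my own choice rather than something from the original post (passing the resampled array directly, as in the answer, also works).

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs headless
import matplotlib.pyplot as plt
from matplotlib.collections import LineCollection

# Hypothetical stand-ins for self._tprun.r[0, p, :], self._tprun.r[1, p, :]
# and self._color[p, :]; not data from the original post.
t = np.linspace(0.0, 4.0 * np.pi, 10_000)
x_full, y_full = t, np.sin(t)
color_full = 1.0 + 2.0 * np.sin(t)  # spans roughly -1 to 3

# Resample positions AND colors with the same step.
step = 100
x, y, color = x_full[::step], y_full[::step], color_full[::step]

points = np.array([x, y]).T.reshape(-1, 1, 2)
segments = np.concatenate([points[:-1], points[1:]], axis=1)

# One color value per segment: mean of the values at its two end points.
seg_color = 0.5 * (color[:-1] + color[1:])

lc = LineCollection(segments,
                    cmap=plt.get_cmap("jet"),
                    norm=plt.Normalize(vmin=seg_color.min(), vmax=seg_color.max()))
lc.set_array(seg_color)
lc.set_linewidth(1)

fig, ax = plt.subplots()
ax.add_collection(lc)
ax.autoscale()  # add_collection does not rescale the axes by itself
```

Checking that len(seg_color) equals len(segments) before calling set_array is a cheap way to catch this kind of mismatch early.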

Related

Can I use Lattice auto.key or key to make a legend with points for some data and lines for others?

I often make figures that have observed data represented as points and model-predicted data represented as lines, using distribute.type to assign graph types. Is there a way to make a legend that only shows points for the points data, and lines for the lines data? The auto.key default is points, and if I add lines with "list(lines=TRUE)" the legend shows both points and lines for every data label:
x <- seq(0, 8*pi, by=pi/6)
Y1pred <- sin(x)
Y1obs <- Y1pred + rnorm(length(x), mean=0, sd=0.2)
Y2pred <- cos(x)
Y2obs <- Y2pred + rnorm(length(x), mean=0, sd=0.4)
xyplot(Y1obs + Y2obs + Y1pred + Y2pred ~ x,
       type=c('p','p','l','l'),
       distribute.type=TRUE,
       auto.key=list(lines=TRUE, columns=2)
)
There is a rather complicated example using 'key' on p. 158 of Deepayan's book on Lattice. I am wondering if there are simpler options?
Yes. Following S, the lines component of key supports different types (but the points component does not). Using auto.key, you could do
xyplot(Y1obs + Y2obs + Y1pred + Y2pred ~ x,
       type=c('p','p','l','l'),
       distribute.type=TRUE,
       auto.key = list(points = FALSE, lines = TRUE,
                       columns = 2,
                       type = c('p','p','l','l')))
Ideally, you would want to put the type only inside the lines component, and that's how you should do it if you use key. For auto.key, there can be only one line anyway, so this should be fine.

how to get better Kriging result graphs in openturns?

I performed spherical Kriging, but I can't seem to get good output graphs.
The coordinates (x and y) are around 51 for latitude and around 6.5 for longitude.
My observations range from -70 to +10.
Here is my code:
import openturns as ot
import pandas as pd
# your input / output data can be easily formatted as samples for openturns
df = pd.read_csv("kreuzkerpenutm.csv")
inputdata = ot.Sample(df[['x','y']].values)
outputdata = ot.Sample(df[['z']].values)
dimension = 2 # dimension of your input (x,y)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={'bbox_to_anchor':(1,1), 'loc':"upper left"})
view.getFigure().tight_layout()
here is my output:
kriging metamodel graph
I don't know why my graph won't show my inputs as well as my kriging results.
Thanks for any ideas and help.
If the input data is not scaled in [-1,1]^d, the kriging metamodel may have trouble identifying the scale parameters using maximum likelihood optimization. To help with this, we may:
provide a better starting point for the scale parameters of the covariance model (this is trick "A" below),
set the bounds of the optimization algorithm so that the interval where the parameters are searched for corresponds to the data at hand (this is trick "B" below).
This is what the following script does, using simulated data instead of a csv data file. In the script, I create the data using a g function which is scaled so that it produces results in the [-10, 70] range, as in your problem. Please look carefully at the setScale() method, which sets the initial value of the covariance model: this is the starting point of the optimization algorithm. Then look at the setOptimizationBounds() method, which sets the bounds of the optimization algorithm.
import openturns as ot
dimension = 2 # dimension of your input (x,y)
distribution = ot.ComposedDistribution([ot.Uniform(-10.0, 50.0)] * dimension)
inputdata = distribution.getSample(100)
g = ot.SymbolicFunction(["x", "y"], ["30 + 3.0 * sin(x / 10.0) * (y / 10.0) ^ 2"])
outputdata = g(inputdata)
basis = ot.ConstantBasisFactory(dimension).build()
covarianceModel = ot.SphericalModel(dimension)
covarianceModel.setScale(inputdata.getMax()) # Trick A
algo = ot.KrigingAlgorithm(inputdata, outputdata, covarianceModel, basis)
# Trick B, v2
x_range = inputdata.getMax() - inputdata.getMin()
scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
maximum_scale_bounds = scale_max_factor * x_range
minimum_scale_bounds = scale_min_factor * x_range
scaleOptimizationBounds = ot.Interval(minimum_scale_bounds, maximum_scale_bounds)
algo.setOptimizationBounds(scaleOptimizationBounds)
algo.run()
result = algo.getResult()
metamodel = result.getMetaModel()
metamodel.setInputDescription(["x", "y"])
metamodel.setOutputDescription(["z"])
lower = [-10.0] * 2 # lower bound of the 2D window
upper = [50.0] * 2 # upper bound of the 2D window
graph = metamodel.draw(lower, upper)
graph.setBoundingBox(ot.Interval(lower, upper))
graph.add(ot.Cloud(inputdata)) # overlay a scatter plot of the observation points
graph.setTitle("Kriging metamodel")
# A View object allows us to interact with the underlying matplotlib figure
from openturns.viewer import View
view = View(graph, legend_kw={"bbox_to_anchor": (1, 1), "loc": "upper left"})
view.getFigure().tight_layout()
The previous script produces the following figure.
There are other ways to implement trick B. Here is one provided by J.Pelamatti:
# Trick B, v3
import numpy as np
import scipy.spatial.distance
# X_train is the input sample (presumably inputdata in the script above)
for d in range(X_train.getDimension()):
    dist = scipy.spatial.distance.pdist(X_train[:,d])
    scale_max_factor = 2.0 # Must be > 1, tune this to match your problem
    scale_min_factor = 0.1 # Must be < 1, tune this to match your problem
    maximum_scale_bounds = scale_max_factor * np.max(dist)
    minimum_scale_bounds = scale_min_factor * np.min(dist)
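As written, the loop in v3 overwrites its bounds on every pass. A minimal sketch of how the per-dimension values could be collected is shown below, in plain Python so that it runs without OpenTURNS or SciPy. The function name scale_bounds and the sample points are hypothetical, and skipping zero distances (so that the lower bound stays strictly positive) is my own assumption, not part of the original trick.

```python
from itertools import combinations

def scale_bounds(points, scale_min_factor=0.1, scale_max_factor=2.0):
    """Return (lower, upper): per-dimension bounds for the scale parameter.

    points is a sequence of equal-length coordinate tuples.
    """
    dimension = len(points[0])
    lower, upper = [], []
    for d in range(dimension):
        coords = [p[d] for p in points]
        # Pairwise distances along dimension d (what pdist computes in v3).
        dists = [abs(a - b) for a, b in combinations(coords, 2)]
        # Assumption: ignore duplicated coordinates so the lower bound is > 0.
        positive = [dist for dist in dists if dist > 0.0]
        lower.append(scale_min_factor * min(positive))
        upper.append(scale_max_factor * max(positive))
    return lower, upper

# Hypothetical training points, for illustration only.
X_demo = [(0.0, 0.0), (1.0, 4.0), (3.0, 1.0)]
lower, upper = scale_bounds(X_demo)
```

The two lists could then be wrapped as ot.Interval(lower, upper) and passed to algo.setOptimizationBounds(), as in the v2 script above.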
This topic is discussed in this particular thread in OT's forum.
Sorry for the late answer.
Which version of openturns are you using?
You probably have an embedded transformation of the (input) data, which makes the data range approximately within (-3, 3) (standard scaling). In that case the kriging result should contain the transformation.
With more recent openturns implementations, this feature has been removed.
Hope this can help.
Cheers

Plotting an exponential function given one parameter

I'm fairly new to Python, so bear with me. I have plotted a histogram using some generated data. This data has many, many points. I have defined it with the variable vals. I have then plotted a histogram with these values, though I have limited it so that only values between 104 and 155 are taken into account. This has been done as follows:
bin_heights, bin_edges = np.histogram(vals, range=[104, 155], bins=30)
bin_centres = (bin_edges[:-1] + bin_edges[1:])/2.
plt.errorbar(bin_centres, bin_heights, np.sqrt(bin_heights), fmt=',', capsize=2)
plt.xlabel(r"$m_{\gamma\gamma} (GeV)$")  # raw string so \g is not treated as an escape
plt.ylabel("Number of entries")
plt.show()
Giving the above plot:
My next step is to take into account values from vals which are less than 120. I have done this as follows:
background_data=[j for j in vals if j <= 120] #to avoid taking the signal bump, upper limit of 120 MeV set
I need to plot a curve on the same plot as the histogram, which follows the form B(x) = Ae^(-x/λ)
I then estimated a value of λ using the maximum likelihood estimator formula:
background_data=[j for j in vals if j <= 120] #to avoid taking the signal bump, upper limit of 120 MeV set
#print(background_data)
N_background=len(background_data)
print(N_background)
sigma_background_data=sum(background_data)
print(sigma_background_data)
lamb = (sigma_background_data)/(N_background) #maximum likelihood estimator for lambda
print('lambda estimate is', lamb)
where lamb = λ. I got a value of roughly lamb = 27.75, which I know is correct. I now need to get an estimate for A.
I have been advised to do this as follows:
Given a value of λ, find A by scaling the PDF to the data such that the area beneath
the scaled PDF has equal area to the data
I'm not quite sure what this means, or how I'd go about trying to do this. PDF means probability density function. I assume an integration will have to take place, so to get the area under the data (vals), I have done this:
data_area= integrate.cumtrapz(background_data, x=None, dx=1.0)
print(data_area)
plt.plot(background_data, data_area)
However, this gives me an error
ValueError: x and y must have same first dimension, but have shapes (981555,) and (981554,)
I'm not sure how to fix it. The end result should be something like:
See the cumtrapz docs:
Returns: ... If initial is None, the shape is such that the axis of integration has one less value than y. If initial is given, the shape is equal to that of y.
So you can either pass an initial value, like
data_area = integrate.cumtrapz(background_data, x=None, dx=1.0, initial = 0.0)
or discard the first value of the background_data:
plt.plot(background_data[1:], data_area)
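As for the underlying goal of finding A, here is one possible reading of the advice quoted in the question, sketched in plain Python. The area under a histogram over a range is the number of entries in that range times the bin width, and the area under A·exp(-x/λ) over the same range is A·λ·(exp(-lo/λ) - exp(-hi/λ)); setting the two equal gives A. The function names, the range 104 to 120 and the entry count are illustrative assumptions, not values from the post.

```python
import math

def scale_factor(n_entries, bin_width, lam, lo, hi):
    """Return A so that the area under A*exp(-x/lam) on [lo, hi]
    equals the histogram area n_entries * bin_width."""
    hist_area = n_entries * bin_width
    # Closed-form integral of exp(-x/lam) from lo to hi.
    unit_area = lam * (math.exp(-lo / lam) - math.exp(-hi / lam))
    return hist_area / unit_area

def background(x, A, lam):
    """B(x) = A * exp(-x / lam)."""
    return A * math.exp(-x / lam)

# Illustrative numbers only (assumptions, not taken from the post):
lam = 27.75                    # lambda estimate quoted in the question
bin_width = (155 - 104) / 30   # same binning as the np.histogram call
n_entries = 50_000             # hypothetical number of entries in [104, 120]
A = scale_factor(n_entries, bin_width, lam, lo=104.0, hi=120.0)
```

Evaluating background(x, A, lam) at each value in bin_centres and passing the result to plt.plot would then overlay the curve on the errorbar plot.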

Find or calculate intersection points of a straight line with a diagonal scatter plot using VBA

I am trying to understand how to find or calculate the intersection points of a straight line and a diagonal scatter plot. To give a better idea: on an X,Y plot, if I have a straight horizontal line at y = # (any number) that crosses an array of scatter points (which form a diagonal line), how can I calculate the points of intersection of the two lines?
The problem is that the scattered array has multiple points around my horizontal line. What I would like to do is find the point that hits the horizontal line first and the point that hits it last.
Please refer to the image for a better understanding. The two points that are annotated are the ones that I am trying to extract with VBA. Is this possible? The image shows two sets of scattered arrays, but I am only interested in figuring out the method for one of them. If I can extract the points for one scattered array, I can replicate the method for the next one.
http://imgur.com/9YTNeco
It's hard to give you any specifics without knowing the structure of your Data. But this is the approach I'd use.
I'll assume your data looks like this (for both of the plots)
A B
x1 y1
x2 y2
x3 y3
Loop through the axis like so:
'the y values need to match the height of the black axis you've got there
'I'll assume that's zero
Dim i As Long
Dim k As Double
With Worksheets("Sheet name")
    'we begin at the first x-value in your column
    k = .Cells(1, 1).Value
    'rows are 1-based in VBA, so the loop starts at 1 rather than 0
    For i = 1 To .UsedRange.Rows.Count
        'now we are looking for the lowest value of x, k will be this value
        If .Cells(i, 1).Value < k Then
            If .Cells(i, 2).Value = 0 Then '0 = y-value of the "black" axis
                'every time we find a lower value than our existing k
                'we will assign it to k
                k = .Cells(i, 1).Value
            End If
        End If
    Next i
End With
The lowest value will be your "low limit"-point.
You can use the same kind of algorithm for the highest value of the same scatter plot (just change the "<" to ">"). For the lowest and highest values of the other plot, just change the column ID.
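For comparison, the same search can be written compactly in Python. This is only a sketch of the idea: the function name crossing_limits, the tolerance parameter and the sample points are hypothetical, and unlike the loop above it only considers points that lie on the line when picking both limits.

```python
def crossing_limits(points, y_line=0.0, tol=1e-9):
    """Return (lowest_x, highest_x) among the points lying on y = y_line.

    points is a sequence of (x, y) pairs; tol allows for rounding noise.
    Returns None when no point lies on the line.
    """
    on_line = [x for x, y in points if abs(y - y_line) <= tol]
    if not on_line:
        return None
    return min(on_line), max(on_line)

# Hypothetical scatter data, for illustration only.
scatter = [(3.0, 0.0), (1.5, 0.0), (2.0, 0.4), (4.2, 0.0), (0.5, -0.3)]
limits = crossing_limits(scatter)
```

With real data, exact hits on the line may be rare, so a larger tol is probably needed.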
HTH

How to draw an alternating line with pyplot?

I have a series of x coordinates (e.g.: 1,2,3,4) and y coordinates (e.g.: 10,20,30,40). I would like pyplot to draw a line between two consecutive points, while skipping every other line (e.g.: draw a line between (1,10) and (2,20), and a line between (3,30) and (4,40).)
How can this be done?
Do you mean something like this?
import numpy as np
import matplotlib.pyplot as plt
x = [1,2,3,4,5,6]
y = [10,20,30,40,50,60]
for n in np.arange(0,len(x),2):
    plt.plot(x[n:n+2],y[n:n+2])
(Copied from @Floris' comment above)
The quick and dirty trick would be to insert NaN values in the arrays at every third position (both X and Y).
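A minimal sketch of that NaN trick, in plain Python: the helper name insert_gaps is hypothetical. It places a NaN after every pair of values, so a single plot call draws the pairs as separate segments, because matplotlib breaks a line wherever it meets NaN.

```python
def insert_gaps(values, group=2):
    """Return a copy of values with a NaN between consecutive groups."""
    out = []
    for start in range(0, len(values), group):
        if out:
            out.append(float("nan"))  # the gap between two groups
        out.extend(values[start:start + group])
    return out

x = [1, 2, 3, 4]
y = [10, 20, 30, 40]
x_gapped = insert_gaps(x)  # 1, 2, nan, 3, 4
y_gapped = insert_gaps(y)  # 10, 20, nan, 30, 40
```

Calling plt.plot(x_gapped, y_gapped) then draws (1,10) to (2,20) and (3,30) to (4,40) in one color with a single call, whereas the loop version above cycles through colors.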