How do I fit a function to a candle plot TH2 in ROOT? - root-framework

I've collected test data for a system and would like to fit a line to the linear portion of the system.
Here's my current plot, which is very barebones:
This was generated by drawing a TH2D with the CANDLE option, which gives nice error bars. However, neither of the two obvious TH2 fit options, both of which take slices and fit to slices, seems to be able to fit a line to this candle plot. The only clear solution I see is to make an array of TH1Ds corresponding to my X axis, filling them, then extracting their means and standard deviations and using those in a TGraphErrors plot, which makes fitting easy.
Is there a way to fit a line directly in the candle plot? If not, is there a more elegant solution than the one that I outlined above?
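The workaround outlined above (per-X-bin mean and spread, then a weighted straight-line fit) can be sketched outside ROOT with NumPy. This is only an illustration under assumptions: `fit_line_to_binned` and its arguments are hypothetical names, the standard error of each bin mean is used as the weight, and restricting `bin_edges` to the linear region is left to the caller. Inside ROOT, `TH2::ProfileX` may be worth checking, since it produces a comparable per-bin profile that can be fitted.

```python
import numpy as np

def fit_line_to_binned(x, y, bin_edges):
    """Fit y = slope*x + intercept to per-bin means of y (hypothetical helper).

    Each bin mean is weighted by 1/sigma, where sigma is the standard
    error of that mean, mimicking a fit to a graph with error bars.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    idx = np.digitize(x, bin_edges) - 1  # bin index of every sample
    centers, means, errors = [], [], []
    for b in range(len(bin_edges) - 1):
        vals = y[idx == b]
        if vals.size < 2:  # a spread needs at least two samples
            continue
        centers.append(0.5 * (bin_edges[b] + bin_edges[b + 1]))
        means.append(vals.mean())
        errors.append(vals.std(ddof=1) / np.sqrt(vals.size))
    centers, means, errors = map(np.array, (centers, means, errors))
    # np.polyfit expects weights of 1/sigma for Gaussian uncertainties
    slope, intercept = np.polyfit(centers, means, 1, w=1.0 / errors)
    return slope, intercept, centers, means, errors
```

Passing only the bin edges that cover the linear portion of the system keeps the non-linear region out of the fit.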

Related

Halcon - Extract straight edge from XLD

I have a XLD edge, like the one in red in the sample picture below.
I need to extract the start and end points of the straight lines that represent it. Hough lines sort of work for this, but the results are not really reproducible: minor changes in the contour produce unexpected results.
How can the contour be extracted as straight lines (blue), with start and end coordinates?
Lines shorter than a specified length should not be counted as separate lines.
The contour needs to be converted to a polygon using the following function:
gen_polygons_xld (Object, Polygons, 'ramer', 25.0)
The only adjustable parameter is alpha (25.0), which sets the approximation threshold.
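For readers without Halcon, the idea can be sketched in Python, assuming that 'ramer' refers to the Ramer (Douglas-Peucker) polyline approximation. The function names below are hypothetical, `epsilon` plays the role of the threshold, and short segments are simply dropped, which is a simplification of "should not be counted as separate lines".

```python
import numpy as np

def ramer_douglas_peucker(points, epsilon):
    """Return indices of the contour points kept by a Ramer approximation."""
    points = np.asarray(points, dtype=float)
    keep = np.zeros(len(points), dtype=bool)
    keep[0] = keep[-1] = True
    stack = [(0, len(points) - 1)]
    while stack:
        start, end = stack.pop()
        if end - start < 2:
            continue  # no interior points left to test
        chord = points[end] - points[start]
        rel = points[start + 1:end] - points[start]
        norm = np.hypot(chord[0], chord[1])
        if norm == 0.0:
            dist = np.hypot(rel[:, 0], rel[:, 1])
        else:
            # perpendicular distance of each interior point to the chord line
            dist = np.abs(chord[0] * rel[:, 1] - chord[1] * rel[:, 0]) / norm
        worst = int(np.argmax(dist))
        if dist[worst] > epsilon:
            split = start + 1 + worst
            keep[split] = True
            stack.append((start, split))
            stack.append((split, end))
    return np.flatnonzero(keep)

def straight_segments(points, epsilon, min_length):
    """Return (start, end) coordinate pairs of segments of at least min_length."""
    points = np.asarray(points, dtype=float)
    verts = points[ramer_douglas_peucker(points, epsilon)]
    segments = []
    for a, b in zip(verts[:-1], verts[1:]):
        if np.hypot(b[0] - a[0], b[1] - a[1]) >= min_length:
            segments.append((tuple(map(float, a)), tuple(map(float, b))))
    return segments
```

A larger `epsilon` merges more of the contour into each straight line, so it trades fidelity for stability against small changes in the contour.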

How to make gnuplot generate figures with smaller/fixed size (Bytes)?

I would like to avoid using the every command, since it simply discards data that might be very important (a spike, for instance). I would also like to avoid downsizing the figure afterwards, since this might degrade the text in the figure...
Is there an option to force gnuplot to generate files (eps) with a maximum size?
You'd need some adaptive compression on your data. Without knowing the data, that's rather tough.
The stats command can tell you how many datapoints you actually have, and you can then adjust the every statement to a sensible value. Otherwise, you can use smooth to achieve a predefined (set samples) number of datapoints, or (if you have a sensible model for your data) you can do a fit and simply plot the fitted model function instead of your dataset.
If you specifically want outliers to show in the plot, this might be helpful:
fit f(x) data via *parameter*
plot f(x), data using 1:((abs($2-f($1)) > threshold) ? $2 : NaN)
It plots a fit to your dataset, and all actual datapoints that deviate from the fit by more than threshold.
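The same selection logic can be sketched in Python, assuming a polynomial model stands in for f(x). The name `split_fit_and_outliers` and its parameters are hypothetical; only the fitted curve and the returned outliers would need to be written to the figure.

```python
import numpy as np

def split_fit_and_outliers(x, y, degree, threshold):
    """Fit a polynomial and return it with the points that deviate from it.

    Points whose absolute residual exceeds `threshold` are kept as
    outliers; all other points are represented by the fit alone.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    coeffs = np.polyfit(x, y, degree)
    residual = np.abs(y - np.polyval(coeffs, x))
    mask = residual > threshold
    return coeffs, x[mask], y[mask]
```

Note that a very large spike pulls the fit toward itself, so the threshold should be chosen with some margin.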

Interpolating data onto a line of points

I have some irregularly spaced data and need to analyze it. I can successfully interpolate this data onto a regular grid using mlab.griddata (or rather, the natgrid implementation of it). This allows me to use pcolormesh and contour to generate plots, extract levels, etc. Using plt.contour, I then extract a certain level by calling get_paths on the entries of CS.collections, where CS is the returned contour object.
Now, what I'd like to do is then, with my original irregularly spaced data, interpolate some quantities onto this specific contour line (i.e., NOT onto a regular grid). The similarly named griddata function from Scipy allows for this behavior, and it almost works. However, I find that as I increase the number of original points, I can get odd erratic behavior in the interpolation. I'm wondering if there's a way around this, i.e., another way to interpolate irregularly spaced (or regularly spaced data for that matter, since I can use my regularly spaced data from mlab.griddata) onto a specific line.
Let me show some numerical examples of what I'm talking about. Take a look at this figure:
The top left shows my data as points, and the line shows an extracted level of level=0 from some data D that I have at those points (x,y) [note, I have data 'D', 'Energy', and 'Pressure', all defined in this (x,y) space]. Once I have this curve, I can plot the interpolated quantities of D, Energy, and Pressure onto my specific line. First, note the plot of D (middle, right). It should be zero at all points, but it's not quite zero at all points. The likely cause of this is that the line that corresponds to the 0 level is generated from a uniform set of points that came from mlab.griddata, whereas the plot of 'D' is generated from my ORIGINAL data interpolated onto that level curve. You can also see some unphysical wiggles in 'Energy' and 'Pressure'.
Okay, seems easy enough, right? Maybe I should just get more original data points along my level=0 curve. Getting some more of these points, I then generate the following plots:
First look at the top left. You can see that I've sampled the hell out of the (x,y) space in the vicinity of my level=0 curve. Furthermore, you can see that my new "D" plot (middle, right) now correctly interpolates to zero in the region that it originally didn't. But now I get some wiggles at the start of the curve, as well as getting some other wiggles in the 'Energy' and 'Pressure' in this space! It is far from obvious to me that this should occur, since my original data points are still there and I've only supplemented additional points. Furthermore, some regions where my interpolation is going bad aren't even near the points that I added in the second run -- they are exclusively neighbored by my original points.
So this brings me to my original question. I'm worried that the interpolation that produces the 'Energy', 'D', and 'Pressure' curves is not working correctly (this is scipy.interpolate.griddata, which I refer to as scigrid). Mlab's griddata only interpolates to a regular grid, whereas I want to interpolate to this specific line shown in the top left plot. What's another way for me to do this?
Thanks for your time!
After posting this, I decided to try scipy.interpolate.SmoothBivariateSpline, which produced the following result:
You can now see that my line is smoothed, so it seems like this will work. I'll mark this as the answer unless someone posts something soon that hints that there may be an even better solution.
Edit: As requested, below is some of the code used to generate these plots. I don't have a minimal working example, and the above plots were generated in a larger framework of code, but I'll write the important parts schematically below with comments.
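import numpy as np
import matplotlib.pyplot as plt
from matplotlib.mlab import griddata # note: removed in Matplotlib 3.1
from scipy.interpolate import griddata as scigrid, SmoothBivariateSpline
# x,y,z are lists of data where the first point is x[0],y[0],z[0], and so on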
minx=min(x)
maxx=max(x)
miny=min(y)
maxy=max(y)
# convert to numpy arrays
x=np.array(x)
y=np.array(y)
z=np.array(z)
# here we are creating a fine grid to interpolate the data onto
xi=np.linspace(minx,maxx,100)
yi=np.linspace(miny,maxy,100)
# here we interpolate our data from the original x,y,z unstructured grid to the new
# fine, regular grid in xi,yi, returning the values zi
zi=griddata(x,y,z,xi,yi)
# now let's do some plotting
plt.figure()
# returns the CS contour object, from which we'll be able to get the path for the
# level=0 curve
CS=plt.contour(xi,yi,zi,levels=[0]) # contour needs the gridded data, not the scattered x,y,z
# can plot the original data if we want
plt.scatter(x,y,alpha=0.5,marker='x')
# now let's get the level=0 curve
for c in CS.collections:
    data=c.get_paths()[0].vertices
# lineX,lineY are simply the x,y coordinates for our level=0 curve, expressed as arrays
lineX=data[:,0]
lineY=data[:,1]
# so it's easy to plot this too
plt.plot(lineX,lineY)
# now what to do if we want to interpolate some other data we have, say z2
# (also at our original x,y positions), onto
# this level=0 curve?
# well, first I tried using scipy.interpolate.griddata == scigrid like so
origdata=np.transpose(np.vstack((x,y))) # just organizing this data like the
# scigrid routine expects
lineZ2=scigrid(origdata,z2,data,method='linear')
# plotting the above curve (as plt.plot(lineZ2)) gave me really bad results, so
# trying a spline approach
Z2spline=SmoothBivariateSpline(x,y,z2)
# the above creates a spline object on our original data. notice we haven't EVALUATED
# it anywhere yet (we'll want to evaluate it on our level curve)
Z2Line=[]
# here we evaluate the spline along all our points on the level curve, and store the
# result as a new list
for i in range(0,len(lineX)):
    # the [0][0] just extracts the scalar value, which is otherwise
    # returned wrapped in a 2D array
    Z2Line.append(Z2spline(lineX[i],lineY[i])[0][0])
# you can then easily plot this
plt.plot(Z2Line)
Hope this helps someone!
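For reference, here is a compact, self-contained sketch of the two approaches compared above: linear interpolation with scipy.interpolate.griddata and a smoothing spline, both evaluated directly on the points of a curve. The function name `interpolate_onto_line` is hypothetical, and the spline uses SciPy's default smoothing, which may or may not suit a given data set.

```python
import numpy as np
from scipy.interpolate import griddata, SmoothBivariateSpline

def interpolate_onto_line(x, y, values, line_x, line_y):
    """Interpolate scattered (x, y, values) onto the points of a curve.

    Returns the piecewise-linear result from griddata (NaN outside the
    convex hull of the data) and the smoothing-spline result.
    """
    x, y, values = (np.asarray(a, dtype=float) for a in (x, y, values))
    line_x = np.asarray(line_x, dtype=float)
    line_y = np.asarray(line_y, dtype=float)
    points = np.column_stack((x, y))
    targets = np.column_stack((line_x, line_y))
    linear = griddata(points, values, targets, method="linear")
    spline = SmoothBivariateSpline(x, y, values)
    # ev evaluates at paired (x, y) points rather than on a grid,
    # so no per-point loop or [0][0] indexing is needed
    smooth = spline.ev(line_x, line_y)
    return linear, smooth
```

Using `ev` replaces the explicit loop over the level-curve points with one vectorized call.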

Discrete Scatter Plot Visualization

This is a very special plotting request, but I have data I want to view in a very particular way. Here's the situation:
1) The data I have is binned into 25 bins, and each bin contains a different number of data points. Roughly speaking, the larger the bin value, the smaller the number of data points within it (this is just a result of the data processing which was done).
[9568, 10079, 10137, 10090, 10154, 10091, 10046, 10116, 9959, 9401, 7703, 5216, 3089, 1632, 854, 466, 221, 106, 63, 27, 12, 5, 1, 0]
2) I have access to the bin values.
[ 0.02648645 0.09996368 0.1734409 0.24691813 0.32039536 0.39387258
0.46734981 0.54082703 0.61430426 0.68778148 0.76125871 0.83473593
0.90821316 0.98169038 1.05516761 1.12864483 1.20212206 1.27559928
1.34907651 1.42255373 1.49603096 1.56950818 1.64298541 1.71646264]
I can easily produce an 'errorbar' type plot in matplotlib (the y-axis is scaled from radians to degrees below):
But, this is not particularly insightful for what I'd like to study. I'd really like to know if there are 'islands' of angle values within each bin, and to do this, I would need something like a scatterplot or an imshow/hexbin type plot, where the density of points can be represented by color (in the case of imshow/hexbin at least). The following is an example of what happens when represented by a regular scatterplot with the smallest marker size:
Would anybody know of a good way to generate this type of visualization?
EDIT: This may help clarify a couple of things. The following plot is a sample of what a histogram would look like for the first couple of bins. Data contained within bins seem to follow some sort of distribution (I mentioned 'islands' before, because I am not ruling out the possibility of multiple peaks in the distribution). I would like this distribution to be visualized for all bins simultaneously. In other words, is there a way to do a vertical temperature map for each bin and have them all shown on the same plot?
The violin plot mentioned in the comments was a nice solution to my problem. Here's where I found a python implementation of it - it would certainly be nice if this were included into matplotlib eventually. Overplotted is a box plot centered on the median value, and includes the 2nd and 3rd quartiles.
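Matplotlib has since gained a built-in `Axes.violinplot`, so a minimal sketch of this visualization might look like the following. The helper name `violin_per_bin` and the axis labels are assumptions, and bins with fewer than two points are skipped because a density estimate cannot be formed for them.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np

def violin_per_bin(samples_per_bin, bin_values):
    """Draw one violin per bin, centered at that bin's value (hypothetical helper)."""
    keep = [i for i, s in enumerate(samples_per_bin) if len(s) >= 2]
    data = [np.asarray(samples_per_bin[i], dtype=float) for i in keep]
    positions = [bin_values[i] for i in keep]
    # make violins slightly narrower than the smallest gap between bins
    width = 0.8 * min(np.diff(sorted(positions))) if len(positions) > 1 else 0.5
    fig, ax = plt.subplots()
    parts = ax.violinplot(data, positions=positions, widths=width,
                          showmedians=True)
    ax.set_xlabel("bin value")
    ax.set_ylabel("angle")
    return fig, ax, parts
```

Multiple peaks ('islands') within a bin would show up as separate bulges along that bin's violin.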

Easiest way to plot values as symbols in scatter plot?

In an answer to an earlier question of mine regarding fixing the colorspace for scatter images of 4D data, Tom10 suggested plotting values as symbols in order to double-check my data. An excellent idea. I've run some similar demos in the past, but I can't for the life of me find the demo I remember being quite simple.
So, what's the easiest way to plot numerical values as the symbol in a scatter plot instead of 'o' for example? Tom10 suggested plt.text(x,y,value), and that is the implementation used in a number of examples. I however wonder if there's an easy way to evaluate "value" from my array of numbers? Can one simply say str(valuearray)?
Do you need a loop to evaluate the values for plotting as suggested in the matplotlib demo section for 3D text scatter plots?
Their example produces:
However, they're doing something fairly complex in evaluating the locations as well as changing text direction based on data. So, is there a cute way to plot x,y,C data (where C is a value often taken as the color in the plot data- but instead I wish to make the symbol)?
Again, I think we have a fair answer to this- I just wonder if there's an easier way?
The easiest way I've seen to do this is:
for x, y, val in zip(x_array, y_array, val_array):
    plt.text(x, y, val)
Also, btw, you suggested using str(valarray), and this, as you may have noticed, doesn't work. To convert an array of numbers to a sequence of strings you could use
valarray.astype(str)
to get a numpy array, or,
[str(v) for v in valarray]
to get a Python list. But even with valarray as a proper sequence of strings, plt.text won't iterate over its inputs.
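Putting the loop into a small, self-contained function might look like this. It is a sketch with assumed names (`text_scatter`, `fmt`); the axis limits are set by hand because text artists do not take part in autoscaling.

```python
import matplotlib
matplotlib.use("Agg")  # off-screen backend so the sketch runs without a display
import matplotlib.pyplot as plt

def text_scatter(x, y, values, fmt="{:.2g}"):
    """Plot each value as a text label at its (x, y) position."""
    fig, ax = plt.subplots()
    for xi, yi, vi in zip(x, y, values):
        ax.text(xi, yi, fmt.format(vi), ha="center", va="center")
    # text does not update the data limits, so pad and set them explicitly
    pad_x = 0.05 * (max(x) - min(x)) or 1.0
    pad_y = 0.05 * (max(y) - min(y)) or 1.0
    ax.set_xlim(min(x) - pad_x, max(x) + pad_x)
    ax.set_ylim(min(y) - pad_y, max(y) + pad_y)
    return fig, ax
```

The `fmt` argument keeps long floating-point values from cluttering the plot.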