How do I create a histogram where each bar covers a range of values (preferably in OpenOffice Calc)?

I have a spreadsheet that contains data that ranges from 0.0 to 1.0, e.g.
a, 0.1
b, 0.11
c, 0.7
d, 0.12
...
I'd like a histogram where each bar covers a range of values, e.g. there would be a bar with a height of 3 for the range [0.1, 0.2). How do I do this in OpenOffice Calc? If it is hard to do, is there a commonly available tool that makes it easy? I'd prefer something that is available on both Linux and Windows.

So far, I've found two "solutions", both of which can do the job, but neither of which is ideal. However, they are both free and available for both Linux and Windows.
GGobi provides a GUI that allows you to read in data from a CSV file and produce histograms. Unfortunately, the interface isn't that great, and it is hard to figure out how to manipulate the display. For example, by default the histogram is "on its side", and so far I haven't figured out how to make the bars vertical rather than horizontal.
R provides a programming environment for statistics with some handy graphics packages. For example, you can create a histogram and put it into a PDF file with just a few lines of code:
result <- read.csv("myTable.csv")
str(result) # look at the structure of the resulting data frame
attach(result) # make the components of result available as objects
pdf("myTable.pdf")
hist(X.TCC)
plot(X.TCC, MWE, pch="*")
dev.off()
The drawback is that you need to learn something about the R environment.
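A third option along the same lines, in case Python counts as "commonly available" for you: matplotlib (also free, Linux and Windows) takes explicit bin edges directly. A minimal sketch with the sample data above hard-coded:

import numpy as np
import matplotlib.pyplot as plt

values = [0.1, 0.11, 0.7, 0.12]   # the sample data from the question
bins = np.arange(0.0, 1.1, 0.1)   # bin edges 0.0, 0.1, ..., 1.0
plt.hist(values, bins=bins, edgecolor="black")
plt.savefig("histogram.pdf")      # or plt.show() for an interactive window

This produces a bar of height 3 over [0.1, 0.2), as requested. The same effect in the R snippet above comes from passing explicit break points, e.g. hist(X.TCC, breaks=seq(0, 1, 0.1)).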

Multiple axis scales in Lets-Plot Kotlin

I'm learning some data science related topics and oh boy, this is a jungle of different libraries for everything 😅
Because of things, I went with Lets-Plot, which has a nice Kotlin API that I'm using combined with the Kotlin kernel for Jupyter notebooks.
Overall, things are going pretty well. Most tutorials & docs I see online use different libraries for plotting (e.g. Seaborn, Matplotlib, Plotly), so most of the time I have to do some reading of the Lets-Plot-Kotlin reference and use trial and error until I find the equivalent code for my graphs.
Currently, I'm trying to graph the distribution of differences between two values. Overall, this looks pretty good. I can just do something like
(letsPlot(df)
+ geomHistogram { x = "some-column" }
).show()
which gives a nice graph
It would be interesting to see the density estimator as well, geomDensity to the rescue!
(letsPlot(df)
+ geomDensity(color = "red") { x = "some-column" }
).show()
Nice! Now let's view them both together:
(letsPlot(df)
+ geomDensity(color = "red") { x = "some-column" }
+ geomHistogram() { x = "some-column" }
).show()
As you can see, there's a small red line at the bottom (the geomDensity!). The problem here, I would say, is that both layers use the same y-scale. The histogram works with values from 0 to 20 and the density with 0 to 0.02, so when they are plotted together the density is just a line at the bottom.
Is there any way to add several layers to the same plot, each using its own scale? I've read some blog posts claiming that you should not go for it (this seems to be pretty much accepted by the community).
My target is to achieve something similar to what you can do with Seaborn by doing
plt.figure(figsize=(10,4),dpi=200)
sns.histplot(data=df,x='some_column',kde=True,bins=25)
(yes, I know I took the Lets-Plot screenshot without the bins configured. Not relevant, I'd say ¯\_(ツ)_/¯ )
Maybe I'm just approaching the problem with a mindset I should not? As mentioned, I'm still learning so every alternative will be highly welcomed 😃
Just, please, don't go with the "Switch to Python". I'm exploring and I'd prefer to go one topic at a time
In order for the histogram and density layers to share the same y-scale, you need to map the variable "..density.." to the "y" aesthetic in the histogram layer (by default, the histogram maps "..count.." to "y").
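For example, a minimal sketch in the Python flavor of Lets-Plot (the Kotlin API mirrors it; there the change amounts to adding y = "..density.." inside the geomHistogram mapping block), with a made-up data frame:

from lets_plot import *
import pandas as pd
import numpy as np

LetsPlot.setup_html()
df = pd.DataFrame({'v': np.random.normal(size=1000)})  # stand-in data

(ggplot(df)
 + geom_histogram(aes(x='v', y='..density..'))  # histogram now drawn on the density scale
 + geom_density(aes(x='v'), color='red')
)

With both layers on the density scale, the red curve sits on top of the bars instead of collapsing to a line at the bottom.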
You will find an example of it in cell [4] in this notebook: https://nbviewer.org/github/JetBrains/lets-plot-kotlin/blob/master/docs/examples/jupyter-notebooks/distributions.ipynb
BTW, many of the pages in the Lets-Plot Kotlin API Reference include links to demo notebooks in their "Examples" section: see geomHistogram().
And of course you can find a lot of info online on the R ggplot2 package which is largely applicable to Lets-Plot as well. For example: Histogram with kernel density estimation.
Finally :), calling show() is not necessary: the Jupyter Kotlin kernel will render the plot automatically if the plot expression is the last one in the cell, which is often the case.

Is it possible to display two displacement sensor values (one digital input, the other analog) simultaneously in a VI?

I tried to display two displacement values in one waveform chart.
I have two displacement sensors: one is a digital input sensor and the other is an analog input sensor.
I have to see both values in one waveform chart simultaneously.
In my attempts, I combined the two VIs, one for each instrument, into a single VI. When I ran the combined VI, only one of them would report values, not both simultaneously.
Is there a way to run both at the same time and see the values on one graph?
Let me share a few scenarios that might help solve your issue.
Plot multiple doubles on a chart: bundle them together and put the resulting cluster into a chart.
[screenshot: bundled doubles on a chart]
Two VIs measuring a double and plotting those results.
[screenshot: bundled subVI outputs on a chart]
You could similarly plot boolean values (your digital input sensor) by using the "Boolean To (0,1)" VI first (it converts TRUE/FALSE to 1/0 respectively), and the result can be bundled as above.
(These are the most direct/easiest ways of doing this; you have a number generator and you bundle the numbers to a graph).
Of course, I can imagine that your question might actually be about how to share values from parallel-running subVIs. If that's the case, say so and this answer can be edited to point you in the right direction.
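Outside LabVIEW the idea is the same and may be easier to see in text form: put both channels on a common time base, convert the boolean channel to 0/1, and draw both series on one set of axes. A rough Python/matplotlib analogue with made-up signals, just to illustrate the bundling:

import numpy as np
import matplotlib.pyplot as plt

t = np.linspace(0, 1, 500)                  # common time base
analog = np.sin(2 * np.pi * 5 * t)          # stand-in for the analog sensor
digital = np.sin(2 * np.pi * 2 * t) > 0     # stand-in boolean readings

plt.plot(t, analog, label="analog displacement")
plt.plot(t, digital.astype(int), label="digital (T/F as 1/0)")
plt.legend()
plt.show()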

VTK / ITK Dice Similarity Coefficient on Meshes

I am new to VTK and am trying to compute the Dice Similarity Coefficient (DSC), starting from 2 meshes.
DSC can be computed as 2 * Vab / (Va + Vb), where Vab is the overlapping volume of mesh A and mesh B, and Va and Vb are the volumes of the two meshes.
To read a mesh (i.e. an organ contour exported in .vtk format using 3D Slicer, https://www.slicer.org) I use the following snippet:
#include <string>
#include <vtkGenericDataObjectReader.h>
#include <vtkPolyData.h>
#include <vtkSmartPointer.h>

std::string inputFilename1 = "organ1.vtk";
// Get all data from the file
vtkSmartPointer<vtkGenericDataObjectReader> reader1 = vtkSmartPointer<vtkGenericDataObjectReader>::New();
reader1->SetFileName(inputFilename1.c_str());
reader1->Update();
vtkSmartPointer<vtkPolyData> struct1 = reader1->GetPolyDataOutput();
I can compute the volume of the two meshes using vtkMassProperties (although I observed some differences between the ones computed with VTK and the ones computed with 3D Slicer).
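(For reference, the volume computation looks roughly like this with VTK's Python bindings; vtkMassProperties expects a closed, triangulated surface, so a vtkTriangleFilter is applied first, and that requirement may account for some of the differences versus 3D Slicer:)

import vtk

tri = vtk.vtkTriangleFilter()                 # vtkMassProperties needs triangles
tri.SetInputData(struct1)                     # struct1: the vtkPolyData read above
mass = vtk.vtkMassProperties()
mass.SetInputConnection(tri.GetOutputPort())
mass.Update()
print("volume:", mass.GetVolume())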
To then intersect the two meshes, I am trying to use vtkIntersectionPolyDataFilter. The output of this filter, however, is a set of lines that marks the intersection of the input vtkPolyData objects, and NOT a closed surface. I therefore need to somehow generate a mesh from these lines and compute its volume.
Do you know a good, accurate way to generate such a mesh, and how to do it?
Alternatively, I tried to use ITK as well. I found a package that is supposed to handle this problem (http://www.insight-journal.org/browse/publication/762, dated 2010) but I am not able to compile it against the latest version of ITK. It says that ITK must be compiled with the (now deprecated) ITK_USE_REVIEW flag ON. Needless to say, I compiled it with the new Module_ITKReview set to ON and also with backward compatibility but had no luck.
Finally, if you have any other alternative (scriptable) software/library to solve this problem, please let me know. I need to perform these computations automatically.
You could try vtkBooleanOperationPolyDataFilter
http://www.vtk.org/doc/nightly/html/classvtkBooleanOperationPolyDataFilter.html
filter->SetOperationToIntersection();
If your data is smooth and well-behaved, this filter works pretty well. However, sharp structures, e.g. those originating from a binary-image marching cubes algorithm, can cause problems for it. That said, vtkPolyDataToImageStencil doesn't necessarily perform any better in this regard.
I once had the impression that boolean operations on polygons are not really ideal for "organs" of 100k polygons or more. It depends.
If you want to compute a Dice Similarity Coefficient, I suggest you first generate volumes (rasterize) from the meshes by use of vtkPolyDataToImageStencil.
Then it's easy to compute the DSC.
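To make that concrete, here is a rough sketch of the rasterize-then-count approach using VTK's Python bindings (the file names and the 0.5 voxel size are placeholders; finer spacing gives a more accurate but slower DSC):

import vtk
import numpy as np
from vtk.util.numpy_support import vtk_to_numpy

def read_mesh(path):
    reader = vtk.vtkGenericDataObjectReader()
    reader.SetFileName(path)
    reader.Update()
    return reader.GetPolyDataOutput()

def rasterize(polydata, bounds, spacing):
    # Blank image of 1s covering 'bounds'; voxels outside the
    # closed surface are then zeroed out through a stencil.
    origin = [bounds[0], bounds[2], bounds[4]]
    dims = [int((bounds[2*i + 1] - bounds[2*i]) / spacing[i]) + 1 for i in range(3)]
    white = vtk.vtkImageData()
    white.SetOrigin(origin)
    white.SetSpacing(spacing)
    white.SetDimensions(dims)
    white.AllocateScalars(vtk.VTK_UNSIGNED_CHAR, 1)
    vtk_to_numpy(white.GetPointData().GetScalars())[:] = 1

    pol2stenc = vtk.vtkPolyDataToImageStencil()
    pol2stenc.SetInputData(polydata)
    pol2stenc.SetOutputOrigin(origin)
    pol2stenc.SetOutputSpacing(spacing)
    pol2stenc.SetOutputWholeExtent(white.GetExtent())
    pol2stenc.Update()

    imgstenc = vtk.vtkImageStencil()
    imgstenc.SetInputData(white)
    imgstenc.SetStencilConnection(pol2stenc.GetOutputPort())
    imgstenc.ReverseStencilOff()
    imgstenc.SetBackgroundValue(0)
    imgstenc.Update()
    return vtk_to_numpy(imgstenc.GetOutput().GetPointData().GetScalars())

a = read_mesh("organ1.vtk")
b = read_mesh("organ2.vtk")

# Put both masks on one common voxel grid
ba, bb = a.GetBounds(), b.GetBounds()
bounds = [min(ba[i], bb[i]) if i % 2 == 0 else max(ba[i], bb[i]) for i in range(6)]
spacing = (0.5, 0.5, 0.5)

ma = rasterize(a, bounds, spacing)
mb = rasterize(b, bounds, spacing)
dsc = 2.0 * np.logical_and(ma, mb).sum() / (ma.sum() + mb.sum())
print("DSC =", dsc)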
Good luck :)

Creating a grid and interpolating (x,y,z) for a contour plot in SageMath

I have values in the form (x, y, z). By creating a list_plot3d plot I can clearly see that they are not quite evenly spaced; they usually form little "blobs" of 3 to 5 points on the xy-plane. So, for the interpolation and the final "contour" plot to be better, or should I say smoother, do I have to create a rectangular grid (like the squares on a chessboard) so that the blobs of data are somehow "smoothed"? I understand that this might be trivial to some people, but I am trying this for the first time and I am struggling a bit. I have been looking at scipy packages like scipy.interpolate.interp2d, but the graphs produced at the end are really bad. Maybe a brief tutorial on 2D interpolation in SageMath for an amateur like me? Some advice? Thank you.
EDIT:
https://docs.google.com/file/d/0Bxv8ab9PeMQVUFhBYWlldU9ib0E/edit?pli=1
This is the kind of graph it mostly produces, along with this message:
Warning: No more knots can be added because the number of B-spline coefficients
already exceeds the number of data points m. Probably causes: either s or m too small. (fp>s)
kx,ky=3,3 nx,ny=17,20 m=200 fp=4696.972223 s=0.000000
To get this graph I just run this command:
f_interpolation = scipy.interpolate.interp2d(*zip(*matrix(C)), kind='cubic')
plot_interpolation = contour_plot(lambda x, y: f_interpolation(x, y)[0],
    (22.419, 22.439), (37.06, 37.08),
    cmap='jet', contours=numpy.arange(0, 1400, 100), colorbar=True)
plot_all = plot_interpolation
plot_all.show(axes_labels=["m", "m"])
where matrix(C) can be a huge matrix, like 10000 x 3, or even a lot more, like 1000000 x 3. The problem of bad graphs persists even with fewer data points, as in the picture I attached, where matrix(C) was only 200 x 3. That's why I am beginning to think that, apart from a possible glitch in the program, my approach to using this command might be totally wrong; hence my asking for advice about using a grid instead of just "throwing" my data at a command.
I've had a similar problem using the scipy.interpolate.interp2d function. My understanding is that the issue arises because the interp1d/interp2d and related functions use an older wrapping of FITPACK for the underlying calculations. I was able to get a problem similar to yours to work using the spline functions, which rely on a newer wrapping of FITPACK. The spline functions can be identified because they seem to all have capital letters in their names here http://docs.scipy.org/doc/scipy/reference/interpolate.html. Within the scipy installation, these newer functions appear to be located in scipy/interpolate/fitpack2.py, while the functions using the older wrappings are in fitpack.py.
For your purposes, RectBivariateSpline is what I believe you want. Here is some sample code for implementing RectBivariateSpline:
import numpy as np
from scipy import interpolate
# Generate unevenly spaced x/y data for axes
npoints = 25
maxaxis = 100
x = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
y = (np.random.rand(npoints)*maxaxis) - maxaxis/2.
xsort = np.sort(x)
ysort = np.sort(y)
# Generate the z-data, which first requires converting
# x/y data into grids. indexing='ij' gives z the (x, y)
# orientation that RectBivariateSpline expects.
xg, yg = np.meshgrid(xsort, ysort, indexing='ij')
z = xg**2 - yg**2
# Generate the interpolated, evenly spaced data
# Note that the min/max of x/y isn't necessarily 0 and 100 since
# randomly chosen points were used. If we want to avoid extrapolation,
# the explicit min/max must be found
interppoints = 100
xinterp = np.linspace(xsort[0], xsort[-1], interppoints)
yinterp = np.linspace(ysort[0], ysort[-1], interppoints)
# Generate the spline that will be used for interpolation
# Note that the default version fits cubic splines (kx=ky=3).
# Other degrees can be used by setting kx and ky, e.g.
# interpolate.RectBivariateSpline(xsort, ysort, z, kx=5, ky=5)
kernel = interpolate.RectBivariateSpline(xsort, ysort, z)
# Evaluate the spline on the evenly spaced grid
zinterp = kernel(xinterp, yinterp)
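One caveat: RectBivariateSpline needs z-values on a full rectangular grid, which the sample above constructs artificially. If your (x, y, z) blobs are truly scattered, a sketch along these lines with scipy.interpolate.griddata (the values here are made up) may be the more direct route:

import numpy as np
from scipy.interpolate import griddata

# Scattered sample points standing in for the real (x, y, z) data
np.random.seed(0)
pts = np.random.rand(200, 2)
vals = pts[:, 0]**2 - pts[:, 1]**2

# Regular grid to interpolate onto
xi = np.linspace(0, 1, 100)
yi = np.linspace(0, 1, 100)
xg, yg = np.meshgrid(xi, yi)

# 'cubic' gives a smooth surface inside the convex hull of the data;
# points outside the hull come back as NaN unless fill_value is set
zg = griddata(pts, vals, (xg, yg), method='cubic')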

Optimizing interpolation in Mathematica

As part of my work, I often have to visualize complex three-dimensional densities. One program suite that I work with outputs the radial component of the densities as a set of 781 points on a logarithmic grid, r_i = (Rmax/Rstep)^((i-1)/(pts-1)), times a spherical harmonic. For low-symmetry systems, the number of spherical harmonics can be fairly large to ensure accuracy; e.g., one system requires 49 harmonics, corresponding to lmax = 6. So, to use this data within Mathematica, I would have a sum of up to 49 interpolated functions, each multiplied by a different spherical harmonic. While using v.6 and constructing the interpolated radial functions using Interpolation with r = Sqrt[x^2 + y^2 + z^2], I would stop ContourPlot3D after well over an hour without anything displayed. This included reducing both InterpolationOrder and MaxRecursion to 1.
Several alternatives presented themselves:
Evaluate the density function on a fixed grid, and use ListContourPlot instead.
Or, linearly spline the radial function and use Piecewise to stitch the pieces together. (This presented itself because I could use Simplify to help reduce the complexity of the resulting function.)
I ended up using both, as InterpolatingFunction gives a noticeable delay in its evaluation, and with up to 49 interpolated functions to evaluate, any delay can become noticeable. Also, ContourPlot3D was faster with the spline, but it didn't give me the speed-up I desired.
I'll freely admit that I haven't tried Interpolation in v.7, nor have I tried this on my upgraded hardware (G4 vs. Intel Core i5). However, I'm looking for alternatives to my current scheme; preferably one where I can use ContourPlot3D directly. I could try some other form of spline, such as a B-spline, and possibly combine that with UnitBox instead of using Piecewise.
Edit: Just to clarify, my current implementation involves creating a first-order spline for each radial part, multiplying each one by its respective spherical harmonic, summing and Simplifying the equations on each radial interval, and then using Piecewise to bind them into one function. So, my implementation is semi-analytical in that the spherical harmonics are exact, and only the radial part is numerical. This is part of the reason why I would like to be able to use ContourPlot3D, so that I can take advantage of the semi-analytical nature of the data. As a point of note, the radial grid is fine enough that a good representation of the radial part is generated and can be smoothly interpolated. While this gave me a significant speed-up, when I wrote the code it was still too slow for the hardware I was using at the time.
So, instead of using ContourPlot3D, I would first generate the function as above, and then evaluate it on an 80^3 Cartesian grid. It is the data from this step that I used in ListContourPlot3D. Since this is not an adaptive grid, in some places it was too coarse, and I was missing features.
If you can do without Mathematica, I would suggest you have a look at ParaView (US-government-funded FOSS, all platforms), which I have found to be superior to everything else when it comes to visualizing massive amounts of data.
The core of the software is the "Visualization Toolkit" VTK, and you can find/write other frontends if need be.
VTK/ParaView can handle almost any data type: scalars and vectors on structured grids or random points, polygons, time-series data, etc. From Mathematica I often just dump grid data into the VTK legacy format, which in the simplest case looks like this:
# vtk DataFile Version 2.0
Generated by mma via vtkGridDump
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 49 25 15
SPACING 0.125 0.125 0.0625
ORIGIN 8.5 5. 0.7124999999999999
POINT_DATA 18375
SCALARS RF_pondpot_1V1MHz1amu double 1
LOOKUP_TABLE default
0.04709501616121583
0.04135197485227461
... <18373 more numbers> ...
HTH!
If it really is the interpolation of the radial functions that is slowing you down, you could consider hand-coding that part based on your knowledge of the sample points. As demonstrated below, this gives a significant speedup:
I set things up with your notation. lookuprvals is a list of 100000 r values to look up for timing.
First, look at stock interpolation as a benchmark:
With[{interp = Interpolation[N@Transpose@{rvals, yvals}]},
  Timing[interp[lookuprvals]][[1]]]
Out[259]= 2.28466
Switching to 0th-order interpolation is already an order of magnitude faster (first order is almost same speed):
With[{interp = Interpolation[N@Transpose@{rvals, yvals}, InterpolationOrder -> 0]},
  Timing[interp[lookuprvals]][[1]]]
Out[271]= 0.146486
We can gain another 1.5 orders of magnitude by calculating the indices directly:
Module[{avg=MovingAverage[yvals,2],idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[res=Part[avg,Ceiling[idxfact Log[lookuprvals]]]][[1]]]
Out[272]= 0.006067
As a middle ground, do a log-linear interpolation by hand. This is slower than the above solution but still much faster than stock interpolation:
Module[{diffs=Differences[yvals],
idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[Block[{idxraw,idxfloor,idxrel},
idxraw=1+idxfact Log[lookuprvals];
idxfloor=Floor[idxraw];
idxrel=idxraw-idxfloor;
res=Part[yvals,idxfloor]+Part[diffs,idxfloor]idxrel
]][[1]]]
Out[276]= 0.026557
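For readers outside Mathematica, the same index arithmetic carries over directly. A numpy sketch, with made-up stand-in values for pts, Rmax, Rstep, and yvals:

import numpy as np

# Hypothetical log grid in the notation above: r_i = (Rmax/Rstep)^((i-1)/(pts-1))
pts, Rmax, Rstep = 781, 100.0, 0.01
rvals = (Rmax / Rstep) ** (np.arange(pts) / (pts - 1))
yvals = np.exp(-rvals / Rmax)                    # stand-in radial data

idxfact = (pts - 1) / np.log(Rmax / Rstep)
lookuprvals = np.random.uniform(1.0, rvals[-1], 100000)

# Log-linear interpolation by direct index arithmetic (no search needed)
idxraw = idxfact * np.log(lookuprvals)           # 0-based fractional index
idxfloor = idxraw.astype(int)
idxrel = idxraw - idxfloor
res = yvals[idxfloor] + (yvals[idxfloor + 1] - yvals[idxfloor]) * idxrel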
If you have the memory for it, I would cache the spherical harmonics and radius (or even radius-index) on the full grid. Then flatten the grid caches so you can do
Sum[ interpolate[yvals[lm],gridrvals] gridylmvals[lm], {lm,lmvals} ]
and recreate your grid as discussed here.