How could I read information off the graphics card after processing? - gpu

Supposing, for example, that I had a 10x10 "cloth" mesh, with each square being two triangles. Now, if I wanted to animate this, I could do the spring calculations on the CPU. Each vertex would have its own "spring" data and would, hopefully, bounce like whatever type of "cloth" it was supposed to represent.
However, that would involve a minimum of about 380? spring calculations per frame. Happily, the per-vertex calculations are "embarrassingly parallel" - Had I one CPU per vertex, each vertex could be run on a single CPU. GPUs, therefore, are theoretically an excellent choice for running such calculations on.
Except (and this is using DirectX/SlimDX) - I have no idea/am not sure how I would/should:
1) Send all this vertex data to the graphics card (yes, I know how to render stuff and have even written my own per-pixel and texture-blending global lighting effect file; however, it is necessary for each vertex to be able to access the position data of at least three other vertices). I suppose I could stick the relevant vertex positions and number of vertex positions in TextureCoords, but there may be a different, standard solution.
2) Read all the vertex data afterwards, so I can update the mesh in memory. Otherwise, each update will act on the exact same data to the exact same result, rather like adding 2 + 3 = 5, 2 + 3 = 5, 2 + 3 = 5 when you want is 2 + 3 = 5 - 2 = 3 + 1.5 = 4.5.
And it may be that I'm looking in the wrong direction to do this.
Thanks.

You could use the approach you have described to pack the data into textures and write special HLSL shaders to compute spring forces and then update vertex positions. That approach is totally valid but can be troublesome when you try to debug problems because you are using texture pixels in an unconventional way (you can draw the texture and maybe write some code to watch the values in a given pixel). It is probably easier in the long run to use something like CUDA, DirectCompute, or OpenCL. CUDA allows you to "bind" the DirectX vertex-buffer for access in CUDA. Then in a CUDA kernel you can use the positions to calculate forces and then write new positions to the vertex-buffer (in parallel on the GPU) before you render the updated positions.
There is a cloth demo that uses DirectCompute in the DirectX 10/11 DirectX SDK.

Related

How can isosurface, isocaps and stlwrite affect the volume size?

I am trying to extract a STL file from a volume (3D-matrix). Since I am working with a DICOM volume, I want to adapt the matrix elements to the voxel size. Everything seems to work fine, but when I check the stl output size (for example in a CAD software), it seems that a slice, a "voxel", is missing in each direction. This can be (maybe) reasonable for a DICOM, where the slices are counted considering the respective center (and therefore, in a certain sense, one slice may be missing in the finale measure), but not in the following example I tried.
I leave here a simple code: my voxel size is 0.723x0.723x0.8; I created a 10x10x10 matrix, so I would expect a 7.23 x 7.23 x 8 mm stl. Instead, I get a 6.507 x 6.507 x 7.2 mm stl, so I think something goes wrong.
R=randi(3,10,10,10);
R(R>1)=0;
[F1,V1]=isosurface(R,0);
[F2,V2]=isocaps(R,0);
faces=[F1;F2+length(V1(:,1))];
vertices=[V1;V2];
S=makehgtform('scale',[0.723,0.723,0.8]);
new_vertices=vertices*S(1:3,1:3);
p=patch('Vertices',new_vertices,'Faces',faces,'FaceColor','blue');
view(3)
camlight headlight; lighting gouraud
str=strcat(['R','.stl']);
stlwrite(str,faces,new_vertices);
How can isosurface, isocaps or stlwrite be responsible for this?

In CGAL, can one convert a triangulation in more than three dimensions to a polytope?

If this question would be more appropriate on a related site, let me know, and I'd be happy to move it.
I have 165 vertices in ℤ11, all of which are at a distance of √8 from the origin and are extreme points on their corresponding convex hull. CGAL is able to calculate their d-dimensional triangulation in only 133 minutes on my laptop using just under a gigabyte of RAM.
Magma manages a similar 66 vertex case quite quickly, and, crucially for my application, it returns an actual polytope instead of a triangulation. Thus, I can view each d-dimensional face as a single object which can be bounded by an arbitrary number of vertices.
Additionally, although less essential to my application, I can also use Graph : TorPol -> GrphUnd to calculate all the topological information regarding how those faces are connected, and then AutomorphismGroup : Grph -> GrpPerm, ... to find the corresponding automorphism group of that cell structure.
Unfortunately, when applied to the original polytope, Magma's AutomorphismGroup : TorPol -> GrpMat only returns subgroups of GLd(ℤ), instead of the full automorphism group G, which is what I'm truly hoping to calculate. As a matrix group, G ∉ GL11(ℤ), but is instead ∈ GL11(𝔸), where 𝔸 represents the algebraic numbers. In general, I won't need the full algebraic closure of the rationals, ℚ̅, but just some field extension. However, I could make use of any non-trivially powerful representation of G.
With two days of calculation, Magma can manage the 165 vertex case, but is only able to provide information about the polytope's original 165 vertices, 10-facets, and volume. However, attempting to enumerate the d-faces, for any 2 ≤ d < 10, quickly consumes the 256 GB of RAM I have at my disposal.
CGAL's triangulation, on the other hand, only calculates collections of d-simplices, all of which have d + 1 vertices. It seems possible to derive the same facial information from such a triangulation, but I haven't thought of an easy way to code that up.
Am I missing something obvious in CGAL? Do you have any suggestions for alternative ways to calculate the polytope's face information, or to find the full automorphism group of my set of points?
You can use the package Combinatorial maps in CGAL, that is able to represent polytopes in nD. A combinatorial map describes all cells and all incidence and adjacency relations between the cells.
In this package, there is an undocumented method are_cc_isomorphic allowing to test if an isomorphism exist from two starting points. I think you can use this method from all possible pair of starting points to find all automorphisms.
Unfortunatly, there is no method to build a combinatorial map from a dD triangulation. Such method exists in 3D (cf. this file). It can be extended in dD.

VTK / ITK Dice Similarity Coefficient on Meshes

I am new to VTK and am trying to compute the Dice Similarity Coefficient (DSC), starting from 2 meshes.
DSC can be computed as 2 Vab / (Va + Vb), where Vab is the overlapping volume among mesh A and mesh B.
To read a mesh (i.e. an organ contour exported in .vtk format using 3D Slicer, https://www.slicer.org) I use the following snippet:
string inputFilename1 = "organ1.vtk";
// Get all data from the file
vtkSmartPointer<vtkGenericDataObjectReader> reader1 = vtkSmartPointer<vtkGenericDataObjectReader>::New();
reader1->SetFileName(inputFilename1.c_str());
reader1->Update();
vtkSmartPointer<vtkPolyData> struct1 = reader1->GetPolyDataOutput();
I can compute the volume of the two meshes using vtkMassProperties (although I observed some differences between the ones computed with VTK and the ones computed with 3D Slicer).
To then intersect 2 meshses, I am trying to use vtkIntersectionPolyDataFilter. The output of this filter, however, is a set of lines that marks the intersection of the input vtkPolyData objects, and NOT a closed surface. I therefore need to somehow generate a mesh from these lines and compute its volume.
Do you know which can be a good, accurate way to generete such a mesh and how to do it?
Alternatively, I tried to use ITK as well. I found a package that is supposed to handle this problem (http://www.insight-journal.org/browse/publication/762, dated 2010) but I am not able to compile it against the latest version of ITK. It says that ITK must be compiled with the (now deprecated) ITK_USE_REVIEW flag ON. Needless to say, I compiled it with the new Module_ITKReview set to ON and also with backward compatibility but had no luck.
Finally, if you have any other alternative (scriptable) software/library to solve this problem, please let me know. I need to perform these computation automatically.
You could try vtkBooleanOperationPolyDataFilter
http://www.vtk.org/doc/nightly/html/classvtkBooleanOperationPolyDataFilter.html
filter->SetOperationToIntersection();
if your data is smooth and well-behaved, this filter works pretty good. However, sharp structures, e.g. the ones originating from binary image marching cubes algorithm can make a problem for it. That said, vtkPolyDataToImageStencil doesn't necessarily perform any better on this regard.
I had once impression that the boolean operation on polygons is not really ideal for "organs" of size 100k polygons and more. Depends.
If you want to compute a Dice Similarity Coefficient, I suggest you first generate volumes (rasterize) from the meshes by use of vtkPolyDataToImageStencil.
Then it's easy to compute the DSC.
Good luck :)

Texture format for cellular automata in OpenGL ES 2.0

I need some quick advice.
I would like to simulate a cellular automata (from A Simple, Efficient Method
for Realistic Animation of Clouds) on the GPU. However, I am limited to OpenGL ES 2.0 shaders (in WebGL) which does not support any bitwise operations.
Since every cell in this cellular automata represents a boolean value, storing 1 bit per cell would have been the ideal. So what is the most efficient way of representing this data in OpenGL's texture formats? Are there any tricks or should I just stick with a straight-forward RGBA texture?
EDIT: Here's my thoughts so far...
At the moment I'm thinking of going with either plain GL_RGBA8, GL_RGBA4 or GL_RGB5_A1:
Possibly I could pick GL_RGBA8, and try to extract the original bits using floating point ops. E.g. x*255.0 gives an approximate integer value. However, extracting the individual bits is a bit of a pain (i.e. dividing by 2 and rounding a couple times). Also I'm wary of precision problems.
If I pick GL_RGBA4, I could store 1.0 or 0.0 per component, but then I could probably also try the same trick as before with GL_RGBA8. In this case, it's only x*15.0. Not sure if it would be faster or not seeing as there should be fewer ops to extract the bits but less information per texture read.
Using GL_RGB5_A1 I could try and see if I can pack my cells together with some additional information like a color per voxel where the alpha channel stores the 1 bit cell state.
Create a second texture and use it as a lookup table. In each 256x256 block of the texture you can represent one boolean operation where the inputs are represented by the row/column and the output is the texture value. Actually in each RGBA texture you can represent four boolean operations per 256x256 region. Beware texture compression and MIP maps, though!

Optimizing interpolation in Mathematica

As part of my work, I often have to visualize complex 3 dimensional densities. One program suite that I work with outputs the radial component of the densities as a set of 781 points on a logarithmic grid, ri = (Rmax/Rstep)^((i-1)/(pts-1), times a spherical harmonic. For low symmetry systems, the number of spherical harmonics can be fairly large to ensure accuracy, e.g. one system requires 49 harmonics corresponding to lmax = 6. So, to use this data within Mathematica, I would have a sum of up to 49 interpolated functions with each multiplied by a different spherical harmonic. While using v.6 and constructing the interpolated radial functions using Interpolation and setting r = Sqrt(x^2 + y^2 + z^2), I would stop ContourPlot3D after well over an hour without anything displayed. This included reducing both the InterpolationOrder and MaxRecursion to 1.
Several alternatives presented themselves:
Evaluate the density function on a fixed grid, and use ListContourPlot instead.
Or, linearly spline the radial function and use Piecewise to stitch them together. (This presented itself, as I could use simplify to help reduce the complexity of the resulting function.)
I ended up using both, as InterpolatingFunction gives a noticeable delay in its evaluation, and with up to 49 interpolated functions to evaluate, any delay can become noticeable. Also, ContourPlot3D was faster with the spline, but it didn't give me the speed up I desired.
I'll freely admit that I haven't tried Interpolation on v.7, nor I have tried this on my upgraded hardware (G4 v. Intel Core i5). However, I'm looking for alternatives to my current scheme; preferably, one where I can use ContourPlot3D directly. I could try some other form of spline, such as a B-spline, and possibly combine that with UnitBox instead of using Piecewise.
Edit: Just to clarify, my current implementation involves creating a first order spline for each radial part, multiplying each one by their respective spherical harmonic, summing and Simplifying the equations on each radial interval, and then using Piecewise to bind them into one function. So, my implementation is semi-analytical in that the spherical harmonics are exact, and only the radial part is numerical. This is part of the reason why I would like to be able to use ContourPlot3D, so that I can take advantage of the semi-analytical nature of the data. As a point of note, the radial grid is fine enough that a good representation of the radial part is generated and can be smoothly interpolated. While this gave me a significant speed-up, when I wrote the code, it was still to slow for the hardware I was using at the time.
So, instead of using ContourPlot3D, I would first generate the function, as above, then I would evaluate it on an 803 Cartesian grid. It is the data from this step that I used in ListContourPlot3D. Since this is not an adaptive grid, in some places this was too course, and I was missing features.
If you can do without Mathematica, I would suggest you have a look at Paraview (US government funded FOSS, all platforms) which I have found to be superior to everything when it comes to visualizing massive amounts of data.
The core of the software is the "Visualization Toolkit" VTK, and you can find/write other frontends if need be.
VTK/Paraview can handle almost any data-type: scalar and vector on structured grids or random points, polygons, time-series data, etc. From Mathematica I often just dump grid data into VTK legacy format which in then simplest case looks like this
# vtk DataFile Version 2.0
Generated by mma via vtkGridDump
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 49 25 15
SPACING 0.125 0.125 0.0625
ORIGIN 8.5 5. 0.7124999999999999
POINT_DATA 18375
SCALARS RF_pondpot_1V1MHz1amu double 1
LOOKUP_TABLE default
0.04709501616121583
0.04135197485227461
... <18373 more numbers> ...
HTH!
If it really is the interpolation of the radial functions that is slowing you down, you could consider hand-coding that part based on your knowledge of the sample points. As demonstrated below, this gives a significant speedup:
I set things up with your notation. lookuprvals is a list of 100000 r values to look up for timing.
First, look at stock interpolation as a basemark
With[{interp=Interpolation[N#Transpose#{rvals,yvals}]},
Timing[interp[lookuprvals]][[1]]]
Out[259]= 2.28466
Switching to 0th-order interpolation is already an order of magnitude faster (first order is almost same speed):
With[{interp=Interpolation[N#Transpose#{rvals,yvals},InterpolationOrder->0]},
Timing[interp[lookuprvals]][[1]]]
Out[271]= 0.146486
We can get another 1.5 order of magnitude by calculating indices directly:
Module[{avg=MovingAverage[yvals,2],idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[res=Part[avg,Ceiling[idxfact Log[lookuprvals]]]][[1]]]
Out[272]= 0.006067
As a middle ground, do a log-linear interpolation by hand. This is slower than the above solution but still much faster than stock interpolation:
Module[{diffs=Differences[yvals],
idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[Block[{idxraw,idxfloor,idxrel},
idxraw=1+idxfact Log[lookuprvals];
idxfloor=Floor[idxraw];
idxrel=idxraw-idxfloor;
res=Part[yvals,idxfloor]+Part[diffs,idxfloor]idxrel
]][[1]]]
Out[276]= 0.026557
If you have the memory for it, I would cache the spherical harmonics and radius (or even radius-index) on the full grid. Then flatten the grid caches so you can do
Sum[ interpolate[yvals[lm],gridrvals] gridylmvals[lm], {lm,lmvals} ]
and recreate your grid as discussed here.