Texture format for cellular automata in OpenGL ES 2.0 - optimization

I need some quick advice.
I would like to simulate a cellular automata (from A Simple, Efficient Method
for Realistic Animation of Clouds) on the GPU. However, I am limited to OpenGL ES 2.0 shaders (in WebGL) which does not support any bitwise operations.
Since every cell in this cellular automata represents a boolean value, storing 1 bit per cell would have been the ideal. So what is the most efficient way of representing this data in OpenGL's texture formats? Are there any tricks or should I just stick with a straight-forward RGBA texture?
EDIT: Here's my thoughts so far...
At the moment I'm thinking of going with either plain GL_RGBA8, GL_RGBA4 or GL_RGB5_A1:
Possibly I could pick GL_RGBA8, and try to extract the original bits using floating point ops. E.g. x*255.0 gives an approximate integer value. However, extracting the individual bits is a bit of a pain (i.e. dividing by 2 and rounding a couple times). Also I'm wary of precision problems.
If I pick GL_RGBA4, I could store 1.0 or 0.0 per component, but then I could probably also try the same trick as before with GL_RGBA8. In this case, it's only x*15.0. Not sure if it would be faster or not seeing as there should be fewer ops to extract the bits but less information per texture read.
Using GL_RGB5_A1 I could try and see if I can pack my cells together with some additional information like a color per voxel where the alpha channel stores the 1 bit cell state.

Create a second texture and use it as a lookup table. In each 256x256 block of the texture you can represent one boolean operation where the inputs are represented by the row/column and the output is the texture value. Actually in each RGBA texture you can represent four boolean operations per 256x256 region. Beware texture compression and MIP maps, though!

Related

Simulate Camera in Numpy

I have the task to simulate a camera with a full well capacity of 10.000 Photons per sensor element
in numpy. My first Idea was to do it like that:
camera = np.random.normal(0.0,1/10000,np.shape(img))
Imgwithnoise= img+camera
but it hardly shows an effect.
Has someone an idea how to do it?
From what I interpret from your question, if each physical pixel of the sensor has a 10,000 photon limit, this points to the brightest a digital pixel can be on your image. Similarly, 0 incident photons make the darkest pixels of the image.
You have to create a map from the physical sensor to the digital image. For the sake of simplicity, let's say we work with a grayscale image.
Your first task is to fix the colour bit-depth of the image. That is to say, is your image an 8-bit colour image? (Which usually is the case) If so, the brightest pixel has a brightness value = 255 (= 28 - 1, for 8 bits.) The darkest pixel is always chosen to have a value 0.
So you'd have to map from the range 0 --> 10,000 (sensor) to 0 --> 255 (image). The most natural idea would be to do a linear map (i.e. every pixel of the image is obtained by the same multiplicative factor from every pixel of the sensor), but to correctly interpret (according to the human eye) the brightness produced by n incident photons, often different transfer functions are used.
A transfer function in a simplified version is just a mathematical function doing this map - logarithmic TFs are quite common.
Also, since it seems like you're generating noise, it is unwise and conceptually wrong to add camera itself to the image img. What you should do, is fix a noise threshold first - this can correspond to the maximum number of photons that can affect a pixel reading as the maximum noise value. Then you generate random numbers (according to some distribution, if so required) in the range 0 --> noise_threshold. Finally, you use the map created earlier to add this noise to the image array.
Hope this helps and is in tune with what you wish to do. Cheers!

Explain how premultiplied alpha works

Can somebody please explain why rendering with premultiplied alpha (and corrected blending function) looks differently than "normal" alpha when, mathematically speaking, those are the same?
I've looked into this post for understanding of premultiplied alpha:
http://blogs.msdn.com/b/shawnhar/archive/2009/11/06/premultiplied-alpha.aspx
The author also said that the end computation is the same:
"Look at the blend equations for conventional vs. premultiplied alpha. If you substitute this color format conversion into the premultiplied blend function, you get the conventional blend function, so either way produces the same end result. The difference is that premultiplied alpha applies the (source.rgb * source.a) computation as a preprocess rather than inside the blending hardware."
Am I missing something? Why is the result different then?
neshone
The difference is in filtering.
Imagine that you have a texture with just two pixels and you are sampling it exactly in the middle between the two pixels. Also assume linear filtering.
Schematically:
R|G|B|A + R|G|B|A = R|G|B|A
non-premultiplied:
1|0|0|1 + 0|1|0|0 = 0.5|0.5|0|0.5
premultiplied:
1|0|0|1 + 0|0|0|0 = 0.5|0|0|0.5
Notice the difference in green channel.
Filtering premultiplied alpha produces correct results.
Note that all this has nothing to do with blending.
This is a guess, because there is not enough information yet to figure it out.
It should be the same. One common way of getting a different value is to use a different Gamma correction method between the premultiply and the rendering step.
I am going to guess that one of your stages, either the blending, or the premultiplying stage is being done with a different gamma value. If you generate your premultiplied textures with a tool like DirectXTex texconv and use the default srgb option for premultiplying alpha, then your sampler needs to be an _SRGB format and your render target should be _SRGB as well. If you are treating them linearly then you may not be able to render to an _SRGB target or sample the texture with gamma correction, even if you are doing the premultiply in the same shader that samples (depending on 3D API and render target setup differences). Doing so will cause the alpha to be significantly different between the two methods in the midtones.
See: The Importance of Being Linear.
If you are generating the alpha in Photoshop then you should know a couple things. Photoshop does not save alpha in linear OR sRGB format. It saves it as a Gamma value about half way between linear and sRGB. If you premultiply in Photoshop it will compute the premultiply correctly but save the result with the wrong ramp. If you generate a normal alpha then sample it as sRGB or LINEAR in your 3d API it will be close but will not match the values Photoshop shows in either case.
For a more in depth reply the information we would need would be.
What 3d API are you using.
How are your textures generated and sampled
When and how are you premultiplying the alpha.
and preferably a code or shader example that shows the error.
I was researching why one would use Pre vs non-Pre and found this interesting info from Nvidia
https://developer.nvidia.com/content/alpha-blending-pre-or-not-pre
It seems that their specific case has more precision when using Pre, over Post-Alpha.
I also read (I believe on here but cannot find it), that doing pre-alpha (which is multiplying Alpha to each RGB value), you will save time. I still need to find out if that's true or not, but there seems to be a reason why pre-alpha is preferred.

OpenGL GL_POINTS size

How can I make my GL_POINT bigger? I'm using glPointSize, but its working just up to some size. So if I write
glPointSize(100);
its the same size as
glPointSize(500);
How can I make it as big as I need?
The OpenGL wiki says:
There is an implementation-defined range for point sizes, and the size given by either method is clamped to that range. You can query the range with GL_POINT_SIZE_RANGE​ (returns 2 floats). There is also a point granularity that you can query with GL_POINT_SIZE_GRANULARITY​; the implementation will clamp sizes to its granularity as needed.
If the size of point you want isn't in the allowable range consider using a textured quad or even a TRIANGLE_FAN to make a (nearly) circular polygon of whatever size you desire.
You can draw a view aligned quad of whatever size you want at the point location.

Where to start with Fourier Analysis

I'm reading data from the microphone and want to perform some analysis on it. I'm attempting to generate a spectrum analyser something like this:
What I have at the moment is this:
My understanding is that I need to perform a Fourier analysis - a Fast Fourier Transform ? - to extract the component frequencies and their amplitudes.
Can someone confirm my understanding is correct and exactly what type of Fourier transform I need to apply?
At the moment, I'm getting frames containing 4k samples from the mic (using NAudio). The buffer I've got is 16bits/sample (Signed Short). For reference, the above plot shows approx half a frame
I'm coding in VB so any .Net libraries/examples (preferably on NuGet) would be of most use. I believe implementations vary considerably so the less I have to massage my data, the better.
The top plot is that of a spectrograph, where each vertical time line is colored based on the magnitudes of the result from an FFT (likely windowed) of a slice in time (possibly overlapped) of the input waveform. The number of vertical points to plot (the frequency resolution) is related to the length of the FFT. Almost any FFT will do. If you use the most common complex-to-complex FFT, just set the imaginary portion of each complex input sample to zero, copy a slice in time of samples of your input waveform to the "real" part, FFT, and take the magnitude or log magnitude of each complex result bin, then map these values to colors per your preference.

Optimizing interpolation in Mathematica

As part of my work, I often have to visualize complex 3 dimensional densities. One program suite that I work with outputs the radial component of the densities as a set of 781 points on a logarithmic grid, ri = (Rmax/Rstep)^((i-1)/(pts-1), times a spherical harmonic. For low symmetry systems, the number of spherical harmonics can be fairly large to ensure accuracy, e.g. one system requires 49 harmonics corresponding to lmax = 6. So, to use this data within Mathematica, I would have a sum of up to 49 interpolated functions with each multiplied by a different spherical harmonic. While using v.6 and constructing the interpolated radial functions using Interpolation and setting r = Sqrt(x^2 + y^2 + z^2), I would stop ContourPlot3D after well over an hour without anything displayed. This included reducing both the InterpolationOrder and MaxRecursion to 1.
Several alternatives presented themselves:
Evaluate the density function on a fixed grid, and use ListContourPlot instead.
Or, linearly spline the radial function and use Piecewise to stitch them together. (This presented itself, as I could use simplify to help reduce the complexity of the resulting function.)
I ended up using both, as InterpolatingFunction gives a noticeable delay in its evaluation, and with up to 49 interpolated functions to evaluate, any delay can become noticeable. Also, ContourPlot3D was faster with the spline, but it didn't give me the speed up I desired.
I'll freely admit that I haven't tried Interpolation on v.7, nor I have tried this on my upgraded hardware (G4 v. Intel Core i5). However, I'm looking for alternatives to my current scheme; preferably, one where I can use ContourPlot3D directly. I could try some other form of spline, such as a B-spline, and possibly combine that with UnitBox instead of using Piecewise.
Edit: Just to clarify, my current implementation involves creating a first order spline for each radial part, multiplying each one by their respective spherical harmonic, summing and Simplifying the equations on each radial interval, and then using Piecewise to bind them into one function. So, my implementation is semi-analytical in that the spherical harmonics are exact, and only the radial part is numerical. This is part of the reason why I would like to be able to use ContourPlot3D, so that I can take advantage of the semi-analytical nature of the data. As a point of note, the radial grid is fine enough that a good representation of the radial part is generated and can be smoothly interpolated. While this gave me a significant speed-up, when I wrote the code, it was still to slow for the hardware I was using at the time.
So, instead of using ContourPlot3D, I would first generate the function, as above, then I would evaluate it on an 803 Cartesian grid. It is the data from this step that I used in ListContourPlot3D. Since this is not an adaptive grid, in some places this was too course, and I was missing features.
If you can do without Mathematica, I would suggest you have a look at Paraview (US government funded FOSS, all platforms) which I have found to be superior to everything when it comes to visualizing massive amounts of data.
The core of the software is the "Visualization Toolkit" VTK, and you can find/write other frontends if need be.
VTK/Paraview can handle almost any data-type: scalar and vector on structured grids or random points, polygons, time-series data, etc. From Mathematica I often just dump grid data into VTK legacy format which in then simplest case looks like this
# vtk DataFile Version 2.0
Generated by mma via vtkGridDump
ASCII
DATASET STRUCTURED_POINTS
DIMENSIONS 49 25 15
SPACING 0.125 0.125 0.0625
ORIGIN 8.5 5. 0.7124999999999999
POINT_DATA 18375
SCALARS RF_pondpot_1V1MHz1amu double 1
LOOKUP_TABLE default
0.04709501616121583
0.04135197485227461
... <18373 more numbers> ...
HTH!
If it really is the interpolation of the radial functions that is slowing you down, you could consider hand-coding that part based on your knowledge of the sample points. As demonstrated below, this gives a significant speedup:
I set things up with your notation. lookuprvals is a list of 100000 r values to look up for timing.
First, look at stock interpolation as a basemark
With[{interp=Interpolation[N#Transpose#{rvals,yvals}]},
Timing[interp[lookuprvals]][[1]]]
Out[259]= 2.28466
Switching to 0th-order interpolation is already an order of magnitude faster (first order is almost same speed):
With[{interp=Interpolation[N#Transpose#{rvals,yvals},InterpolationOrder->0]},
Timing[interp[lookuprvals]][[1]]]
Out[271]= 0.146486
We can get another 1.5 order of magnitude by calculating indices directly:
Module[{avg=MovingAverage[yvals,2],idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[res=Part[avg,Ceiling[idxfact Log[lookuprvals]]]][[1]]]
Out[272]= 0.006067
As a middle ground, do a log-linear interpolation by hand. This is slower than the above solution but still much faster than stock interpolation:
Module[{diffs=Differences[yvals],
idxfact=N[(pts-1) /Log[Rmax/Rstep]]},
Timing[Block[{idxraw,idxfloor,idxrel},
idxraw=1+idxfact Log[lookuprvals];
idxfloor=Floor[idxraw];
idxrel=idxraw-idxfloor;
res=Part[yvals,idxfloor]+Part[diffs,idxfloor]idxrel
]][[1]]]
Out[276]= 0.026557
If you have the memory for it, I would cache the spherical harmonics and radius (or even radius-index) on the full grid. Then flatten the grid caches so you can do
Sum[ interpolate[yvals[lm],gridrvals] gridylmvals[lm], {lm,lmvals} ]
and recreate your grid as discussed here.