Is there a way to allocate the data section (i.e. the data) of a numpy array on a page boundary?
For why I care: if I were using PyOpenCL on an Intel device and wanted to create a buffer with CL_MEM_USE_HOST_PTR, Intel recommends that the data be 1) page-aligned and 2) sized as a multiple of a cache line.
There are various ways in C of allocating page aligned memory, see for example: aligned malloc() in GCC?
I'm not aware that NumPy has any explicit calls to align memory at this time. The only way I can think of doing this, short of Cython as suggested by @Saulio Castro, would be through judicious allocation of memory, with "padding", using the NumPy allocation or PyOpenCL APIs.
You would need to create a buffer "padded" so that its data starts on a 4096-byte (page) boundary. You would also need to "pad" the individual data structure elements you were allocating in the array so that they, in turn, fill out whole 64-byte cache lines. This would of course depend on what your elements look like: whether they are built-in NumPy data types or structures created using the NumPy dtype. The API for dtype has an "align" keyword, but I would be wary of that, based on the discussion at this link.
An old-school trick to align structures is to start with the largest elements, work your way down, and then "pad" with enough uint8s so that one or N structs fill out the alignment boundary.
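For the page-alignment part, the classic over-allocate-and-offset trick can be done with plain NumPy calls. A minimal sketch (the helper name `aligned_empty` and the default 4096-byte page size are my own choices, not a NumPy API):

```python
import numpy as np

def aligned_empty(shape, dtype, alignment=4096):
    """Return an uninitialized array whose data pointer is aligned to `alignment` bytes."""
    dtype = np.dtype(dtype)
    nbytes = int(np.prod(shape)) * dtype.itemsize
    # Over-allocate by one alignment unit, then slice to the aligned offset.
    buf = np.empty(nbytes + alignment, dtype=np.uint8)
    offset = (-buf.ctypes.data) % alignment
    return buf[offset:offset + nbytes].view(dtype).reshape(shape)

a = aligned_empty((1024,), np.float32)  # data pointer now sits on a page boundary
```

Note that the returned array keeps the oversized buffer alive through its base reference, so the memory is not freed prematurely.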
Hope that's not too vague...
Related
Say I have a memory buffer with a vector of type std::decimal::decimal128 (IEEE754R) elements, can I wrap and expose that as a NumPy array, and do fast operations on those decimal vectors, like for example compute variance or auto-correlation over the vector? How would I do that best?
NumPy does not support such a data type yet (at least on mainstream architectures). Only float16, float32, float64 and the non-standard native extended double (generally 80 bits) are supported; put shortly, only floating-point types natively supported by the target architecture. If the target machine supported 128-bit floating-point numbers, then you could try the numpy.longdouble type, but I do not expect this to be the case: in practice, neither x86 nor ARM processors support that yet. IBM processors like POWER9 support it natively, but I am not sure they (fully) support the IEEE-754R standard. For more information please read this. Note that you could theoretically wrap binary data in NumPy types, but you would not be able to do anything (really) useful with it. The NumPy code can theoretically be extended with new types, but note that NumPy is written in C, not C++, so adding std::decimal::decimal128 to the source code will not be easy.
Note that if you really want to wrap such a type in a NumPy array without having to change/rebuild the NumPy code, you could wrap your type in a pure-Python class. However, be aware that the performance will be very bad, since using pure-Python objects prevents all the optimizations done in NumPy (e.g. SIMD vectorization, use of fast native code, algorithms specialized for a given type, etc.).
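As a sketch of that pure-Python route, using the standard library's `decimal.Decimal` (which is a decimal type, though not bit-for-bit IEEE decimal128) inside a NumPy object array:

```python
import numpy as np
from decimal import Decimal

# Object array of Python Decimals: arithmetic is correct, but every
# element-wise operation goes through slow Python-level calls.
a = np.array([Decimal("1.5"), Decimal("2.25"), Decimal("3.0")], dtype=object)

mean = a.mean()                  # == Decimal('2.25')
var = ((a - mean) ** 2).mean()   # population variance, == Decimal('0.375')
```

This gives correct decimal semantics for reductions like variance, but expect it to be orders of magnitude slower than native float64 operations.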
I would like to deep copy one array to another array. What is the best way to do it?
I have attempted it this way and it seems to work, but I want to be sure it is really a deep copy.
Thanks
Perhaps you're used to a different language where everything is done by reference, but you don't need to do any of this in LabVIEW. LabVIEW automatically copies data on a wire when necessary, but not when it isn't necessary.
The only thing your code is doing is creating an array with an extra dimension. Inside your loop you're building each scalar value into a 1D array with one element, then passing that array to an indexing array terminal, which builds an array of the data that is wired to it; since you're passing a 1D array in, you get a 2D array out. However, you could have got exactly the same result, if that's what you really wanted, by wiring your original array to a Build Array function and then reshaping it from 1 x n to n x 1 using Reshape Array:
If you're worried about memory allocations, which you shouldn't need to be unless your code is actually running out of memory or running too slowly, you can see where LabVIEW will and won't make a copy by choosing Tools > Profile > Show Buffer Allocations. This adds a little black dot to any terminal, of one of the data types you select, where a new memory buffer has had to be allocated. If you do this for the code above you'll see that building an array from lower-dimensional data needs a new buffer, but reshaping an array doesn't.
If you have a very special case where you need to force LabVIEW not to allocate a buffer you can use an In Place Element Structure. But for the vast majority of programming you don't need to think about any of this: just let LabVIEW take care of it for you.
In the meantime I suggest you read the tutorial on loops.
In DirectX12, you render multiple objects in different locations using the equivalent of a single uniform buffer for the world transform like:
// Basic simplified pseudocode
SetRootSignature();
SetPrimitiveTopology();
SetPipelineState();
SetDepthStencilTarget();
SetViewportAndScissor();
for (auto object : objects)
{
    SetIndexBuffer();
    SetVertexBuffer();

    struct VSConstants
    {
        QEDx12::Math::Matrix4 modelToProjection;
    } vsConstants;
    vsConstants.modelToProjection = ViewProjMat * object->GetWorldProj();

    SetDynamicConstantBufferView(0, sizeof(vsConstants), &vsConstants);
    DrawIndexed();
}
However, in Vulkan, if you do something similar with a single uniform buffer, all the objects are rendered in the location of last world matrix:
for (auto object : objects)
{
    SetIndexBuffer();
    SetVertexBuffer();
    UploadUniformBuffer(object->GetWorldProj());
    DrawIndexed();
}
Is there a way to draw multiple objects with a single uniform buffer in Vulkan, just like in DirectX12?
I'm aware of Sascha Willems's dynamic uniform buffer example (https://github.com/SaschaWillems/Vulkan/tree/master/dynamicuniformbuffer), where he packs many matrices into one big uniform buffer, and while useful, it is not exactly what I am looking for.
Thanks in advance for any help.
I cannot find a function called SetDynamicConstantBufferView in the D3D 12 API. I presume this is some function of your invention, but without knowing what it does, I can only really guess.
It looks like you're uploading data to the buffer object while rendering. If that's the case, well, Vulkan can't do that. And that's a good thing. Uploading to memory that you're currently reading from requires synchronization. You have to issue a barrier between the last rendering command that was reading the data you're about to overwrite, and the next rendering command. It's just not a good idea if you like performance.
But again, I'm not sure exactly what that function is doing, so my understanding may be wrong.
In Vulkan, descriptors are generally not meant to be changed in the middle of rendering a frame. However, the makers of Vulkan realized that users sometimes want to draw using different subsets of the same VkBuffer object. This is what dynamic uniform/storage buffers are for.
You technically don't have multiple uniform buffers; you just have one. But you can use the offset(s) provided to vkCmdBindDescriptorSets to shift where in that buffer the next rendering command(s) will get their data from. So it's a light-weight way to supply different rendering commands with different data.
Basically, you rebind your descriptor sets, but with different pDynamicOffsets values. To make this work, you need to plan ahead: your pipeline layout has to explicitly declare those descriptors as dynamic descriptors, and every time you bind the set you'll need to provide the offset into the buffer used by that descriptor.
That being said, it would probably be better to make your uniform buffer store larger arrays of matrices, using the dynamic offset to jump from one block of matrices to another.
The point of that is that the uniform data you provide (depending on hardware) will remain in shader memory unless you do something to change the offset or shader. There is some small cost to uploading such data, so minimizing the need for such uploads is probably not a bad idea.
So you should upload all of your objects' uniform data in a single DMA operation, then issue a barrier and do your rendering, using dynamic offsets to tell each draw where its data lives.
You either have to use push constants or have a separate uniform buffer region for each location. These can be bound either with a descriptor per location or with a dynamic offset.
In Sascha's example you can have more than just the one matrix inside the uniform buffer.
That means that inside UploadUniformBuffer you append the new matrix to the buffer and bind the new location.
In Matplotlib it is possible to plot a very long array A using rasterized=True, as in the following:
plt.plot(A, rasterized=True)
This typically lowers the memory usage.
Is it possible to do the same when drawing a rugplot on the support axis in Seaborn's sns.distplot (see http://seaborn.pydata.org/generated/seaborn.distplot.html)? In fact, such a rugplot can consist of many points and consume a lot of memory, too.
EDIT:
As noted in the answer below, this does not lower RAM consumption, but when saving the plot to a file in PDF format it can alter (i.e., decrease or, under certain circumstances, even increase) the size of the file on disk.
Seaborn's distplot, like many other seaborn plots, allows passing keyword arguments to the underlying matplotlib functions.
In this case, distplot has a keyword argument rug_kws, which accepts a dictionary of keyword arguments to be passed to the rugplot. Those are in turn forwarded to the underlying matplotlib axvline function.
As such, you can easily provide rasterized=True to axvline via
ax = sns.distplot(x, rug=True, hist=False, rug_kws=dict(rasterized=True))
However, I'm not sure if this has the desired effect of lowering memory consumption. In general, rasterization is applied when saving the figure, so the plot shown on the screen will not be affected at all.
During the process of saving, the rasterization has to be applied, which takes more time and memory than without rasterization.
While bitmap files like png are completely rasterized anyways and will not be affected at all, the generated vector files (like pdf, eps or svg) may even have a larger filesize compared to their unrasterized counterparts.
Rasterization then only pays off when actually opening such a file (e.g. a PDF in a viewer) or processing it (e.g. in LaTeX), where the rasterized part consumes much less memory and allows for faster rendering on screen or when printing.
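To see the mechanism in isolation, here is a minimal Matplotlib-only sketch (the file name and dpi are arbitrary choices): the rasterized flag sits on the artist and is honored only when saving to a vector format.

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; no display needed
import matplotlib.pyplot as plt

x = np.random.randn(100_000)
fig, ax = plt.subplots()
# rasterized=True only takes effect when the figure is saved to a
# vector format (PDF, SVG, EPS); for PNG output it changes nothing.
ax.plot(x, rasterized=True)
fig.savefig("rasterized.pdf", dpi=150)  # dpi sets the embedded raster's resolution
plt.close(fig)
```

Comparing the file size with and without rasterized=True (and at different dpi values) shows directly whether rasterization helps for a given dataset.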
I have a seemingly simple problem, but an easy solution is eluding me. I have a very large series (tens or hundreds of thousands of points), and I just need to visualize it at different zoom levels, but generally zoomed well out. Basically, I want to plot it in a tool like Matlab or Pyplot, but knowing that each pixel can't represent the potentially many hundreds of points that map to it, I'd like to see both the min and the max of all the array entries that map to a pixel, so that I can generally understand what's going on. Is there a simple way of doing this?
Try hexbin. By setting the reduce_C_function I think you can get what you want. Ex:
import matplotlib.pyplot as plt
import numpy as np

x = np.random.randn(10_000)  # example data; use your own x, y here
y = np.random.randn(10_000)
C = x + y                    # C = f(x, y)
plt.hexbin(x, y, C=C, reduce_C_function=np.max)
would give you a hexagonal heatmap where the color in the pixel is the maximum value in the bin.
If you only want to bin in one direction, see this method.
First option you may want to try is Gephi- https://gephi.org/
Here is another option, though I'm not quite sure it will work. It's hard to say without seeing the data.
Try going to this link: http://bl.ocks.org/3887118. Do you see, toward the bottom of the page, data.tsv with all of the values? If you can save your data to resemble this, then the HTML code above should be able to build your data into the scatter plot example shown at that link.
Otherwise, try visiting this link to fashion your data to a more appropriate web page.
There are a set of research tools called TimeSearcher 1--3 that provide some examples of how to deal with large time-series datasets. Below are some example images from TimeSearcher 2 and 3.
I realized that simple plot() in MATLAB actually gives me more or less what I want. When zoomed out, it renders all of the datapoints that map to a pixel column as vertical line segments from the minimum to the maximum within the set, so as not to obscure the function's actual behavior. I used area() to increase the contrast.
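The same min/max-per-pixel-column view can also be built by hand in Matplotlib. A sketch (the helper name `minmax_envelope` and the bin count are my own choices, with one bin standing in for one pixel column):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend
import matplotlib.pyplot as plt

def minmax_envelope(a, n_bins):
    """Per-bin min and max of `a`, one bin per target pixel column."""
    a = a[: len(a) // n_bins * n_bins]  # drop the remainder so it reshapes evenly
    chunks = a.reshape(n_bins, -1)
    return chunks.min(axis=1), chunks.max(axis=1)

a = np.cumsum(np.random.randn(500_000))  # a long example series
lo, hi = minmax_envelope(a, 1000)        # roughly one bin per pixel column
plt.fill_between(np.arange(1000), lo, hi)
```

This is essentially what MATLAB's plot() does implicitly when zoomed out, but doing it explicitly keeps the rendered artist small regardless of the series length.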