I would like to add two large System.Arrays element-wise (arithmetically) in IronPython and store the result in the first array, like this:
for i in range(arrA.Length):
    arrA.SetValue(arrA.GetValue(i) + arrB.GetValue(i), i)
However, this seems very slow. Coming from a C background, I would reach for pointers or iterators, but I do not know the fast IronPython idiom for this. I cannot use Python lists, as my objects are strictly of type System.Array. The element type is float and the array is 3-D.
What is the fastest, or at least a fast, way to perform this computation?
Edit:
The number of elements is approximately 256^3.
3-D float means that the array is accessed like this: array.GetValue(indexX, indexY, indexZ). I am not sure how the underlying memory is organized in a System.Array.
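For reference, a minimal sketch (with placeholder sizes) of how such a rank-3 System.Array is created and traversed from IronPython. .NET multidimensional arrays are stored contiguously in row-major order, so the last index varies fastest, and every access must supply all three indices; IronPython also allows direct indexing as an alternative to GetValue/SetValue:

from System import Array, Single

nx = ny = nz = 4  # placeholder sizes; the real data is ~256^3 elements

arrA = Array.CreateInstance(Single, nx, ny, nz)
arrB = Array.CreateInstance(Single, nx, ny, nz)

# Iterate with z innermost so that we walk the buffer sequentially.
for x in range(nx):
    for y in range(ny):
        for z in range(nz):
            arrA[x, y, z] = arrA[x, y, z] + arrB[x, y, z]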
Background: I wrote an interface to an IronPython API that gives access to data in a simulation software tool. I retrieve 3-D scalar data and accumulate it over time into an array in my IronPython script. The accumulation is performed about 10,000 times and should be fast, so that the simulation does not take ages.
Is it possible to use the numpy library developed for IronPython?
https://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net
It appears to be supported, and as far as I know it is as close as you can get in Python to C-style pointer functionality with arrays.
Create an array:
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
Multiply all elements by 3 (in place):
x *= 3
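Applied to the accumulation in the question, the per-element loop disappears entirely. A minimal sketch, assuming the simulation data can be copied into numpy arrays once per step (the shapes are placeholders):

import numpy as np

acc = np.zeros((256, 256, 256), dtype=np.float32)

# One simulation step's worth of data; a stand-in for the real frames.
frame = np.ones((256, 256, 256), dtype=np.float32)

acc += frame  # element-wise add runs in native code, no Python loop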
Related
I've got a project where a handful of nested for-loops are slowing down the runtime of the code, so I've started adding some Cython typing. That sped up the loops significantly, but I've run into a new problem: the typing I'm using doesn't allow any computations to be done on the results. Here's a mock sketch of my code:
cdef double[:,:] my_matrix = np.zeros([width, height])
for i in range(0, width):
    for j in range(0, height):
        a = v1[i] - v2[j]
        my_matrix[i, j] = np.sqrt(a**2)
After that I want to compute the product of my_matrix with:
- a complex number
- two constants
- the exponential function
- the matrix itself, like so:
product = constant1 * np.exp(-1j * constant2 * my_matrix) / my_matrix
By attempting this I get the error:
TypeError: unsupported operand type(s) for *: 'complex' and 'my_cython_function_cy._memoryviewslice'
I understand the implication of this error, but I don't get how to use the contents of the memoryview object as an array. I tried doing this:
new_matrix = my_matrix
but that won't compile. I'm new to both C and Cython, and the documentation isn't very helpful with these rookie questions, so I would be very grateful for any help here.
The best thing to do is:
new_matrix = np.asarray(my_matrix)
That lets you access the full set of Numpy operations on the array. It should be a pretty lightweight transformation (they'll share the same underlying data).
You could also get the wrapped object with my_matrix.base (this would probably be the original Numpy array that you initialized it with). However, depending on what you've done with slicing this might not be quite the same as the memoryview, so be a bit wary of this approach.
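Putting it together, here is a small self-contained sketch of the fix in plain numpy; the vectors and constants are placeholders standing in for the question's variables:

import numpy as np

v1 = np.array([1.0, 2.0, 3.0])
v2 = np.array([10.0, 20.0])
constant1, constant2 = 2.0, 0.5

# Same quantity the Cython loop computes: |v1[i] - v2[j]| for every pair.
my_matrix = np.sqrt(np.subtract.outer(v1, v2) ** 2)

new_matrix = np.asarray(my_matrix)  # wraps the same buffer, no copy
product = constant1 * np.exp(-1j * constant2 * new_matrix) / new_matrix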
Related
Is there a way of defining a matrix (say m) in numpy with rows of different lengths, but such that m stays 2-dimensional (i.e. m.ndim = 2)?
For example, if you define m = numpy.array([[1,2,3], [4,5]]), then m.ndim = 1. I understand why this happens, but I'm interested in whether there is any way to trick numpy into viewing m as 2D. One idea would be padding with a dummy value so that rows become equally sized, but I have lots of such matrices and it would take up too much space. The reason why I really need m to be 2D is that I am working with Theano, and the tensor which will be given the value of m expects a 2D value.
Here is some very new information about Theano: we have a new TypedList type, which allows Python lists whose elements all share the same type, such as 1-D ndarrays. Everything is done except the documentation.
The functionality available for typed lists is limited, but we added them to allow looping over a typed list with scan. That is not yet fully integrated with scan, but you can already use it like this:
import theano
import theano.typed_list
a = theano.typed_list.TypedListType(theano.tensor.fvector)()
s, _ = theano.scan(fn=lambda i, tl: tl[i].sum(),
                   non_sequences=[a],
                   sequences=[theano.tensor.arange(2, dtype='int64')])
f = theano.function([a], s)
f([[1, 2, 3], [4, 5]])
One limitation is that the output of scan must be an ndarray, not a typed list.
No, this is not possible. NumPy arrays need to be rectangular in every pair of dimensions. This is due to the way they map onto memory buffers, as a (pointer, itemsize, strides) triple.
As for this taking up space: np.array([[1,2,3], [4,5]]) actually takes up more space than a 2×3 array, because it's an array of two pointers to Python lists (and even if the elements were converted to arrays, the memory layout would still be inefficient).
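A quick demonstration of what such a ragged "array" actually is (note that recent NumPy versions require an explicit dtype=object here, otherwise construction raises an error):

import numpy as np

m = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(m.ndim, m.shape)  # 1 (2,) -- a 1-D array holding two Python lists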
Related
I am trying to make a new method of compressing data. What I want to do is take, say, a massive set of numbers, and then try to fit polynomial functions to the data. I would then store the polynomial with its starting parameters, rather than each individual piece of data. At runtime, the opening program would just use the polynomial to recreate the data.
My question: are there any libraries or functions in Visual Basic .NET that will take, say, an array of data {0, 2, 4, 6, ..., Large Even} and return a "trendline" for it, much like Excel's chart trendline option?
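The question asks about VB.NET, but since the rest of this page uses numpy, here is a sketch of the idea in Python: np.polyfit returns the polynomial coefficients and np.polyval reconstructs the values from them.

import numpy as np

data = np.arange(0.0, 20.0, 2.0)        # {0, 2, 4, ..., 18}
xs = np.arange(len(data))

coeffs = np.polyfit(xs, data, deg=1)    # store these instead of the data
reconstructed = np.polyval(coeffs, xs)  # recreate the data at runtime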
Related
Profiling a piece of numpy code shows that I'm spending most of the time within these two functions:
numpy/matrixlib/defmatrix.py.__getitem__:301
numpy/matrixlib/defmatrix.py.__array_finalize__:279
Here's the Numpy source:
https://github.com/numpy/numpy/blob/master/numpy/matrixlib/defmatrix.py#L301
https://github.com/numpy/numpy/blob/master/numpy/matrixlib/defmatrix.py#L279
Question #1:
__getitem__ seems to be called every time I'm using something like my_array[arg] and it's getting more expensive if arg is not an integer but a slice. Is there any way to speed up calls to array slices?
E.g. in
for i in range(idx): res[i] = my_array[i:i+10].mean()
Question #2:
When exactly does __array_finalize__ get called, and how can I speed things up by reducing the number of calls to this function?
Thanks!
You could stop using matrices so much and just use 2-D numpy arrays. I typically only use matrices for a short time, to take advantage of the syntax for multiplication (but with the addition of the .dot method on arrays, I find I do that less and less).
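For illustration, a small sketch of that substitution; plain ndarrays never trigger a Python-level __array_finalize__, since the base class defines none:

import numpy as np

m = np.matrix([[1.0, 2.0], [3.0, 4.0]])
a = np.asarray(m)   # a plain 2-D ndarray viewing the same data

b = a.dot(a)        # same result as m * m, without the matrix subclass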
But, to your questions:
1) There really is no shortcut to __getitem__ unless defmatrix overrides __getslice__, which it could do but doesn't yet. There are also the .item and .itemset methods, which are optimized for integer getting and setting (and return Python objects rather than NumPy array-scalars).
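A quick sketch of those two methods (note that .itemset existed at the time of this answer but has been deprecated and removed in recent NumPy releases):

import numpy as np

x = np.arange(10.0)
v = x.item(3)        # returns a plain Python float, faster than x[3]
x.itemset(3, 42.0)   # scalar assignment optimized for integer indices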
2) __array_finalize__ is called whenever an array object (or a subclass) is created. It is called from the C function that every array creation gets funneled through: https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L1003
In the case of sub-classes defined purely in Python, it is calling back into the Python interpreter from C which has overhead. If the matrix class were a builtin type (a Cython-based cdef class, for example), then the call could avoid the Python interpreter overhead.
Question 1:
Since array slices must create a new array object (the structure holding the pointer, shape, and strides for the data in memory), they can be quite expensive even though the data itself is not copied. If you're really bottlenecked by this in your example above, you can compute the mean by iterating over the i to i+10 elements manually. For some operations this won't give any performance improvement, but avoiding the creation of new array objects will generally speed up the process.
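Alternatively (this is not part of the original answer), a cumulative-sum trick computes all the window means at once and removes both the slicing and the Python loop. A sketch with placeholder data:

import numpy as np

my_array = np.random.rand(1000)  # placeholder data
w = 10

c = np.concatenate(([0.0], np.cumsum(my_array)))
res = (c[w:] - c[:-w]) / w       # res[i] == my_array[i:i+w].mean()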
Another note: if you're not using native types inside numpy, you will pay a very large performance penalty when manipulating a numpy array. Say your array has dtype=float64 and your native machine float size is float32 -- this will cost a lot of extra computation power and overall performance will drop. Sometimes this is fine, and you can just take the hit to preserve a data type. Other times it's arbitrary what type the float or int is stored as internally. In those cases, try dtype=float instead of dtype=float64; numpy should default to your native type. I've had 3x+ speedups on numpy-intensive algorithms by making this change.
Question 2:
__array_finalize__ "is called whenever the system internally allocates a new array from obj, where obj is a subclass (subtype) of the (big)ndarray" according to SciPy. Thus this is a result described in the first question. When you slice and make a new array, you have to finalize that array by either making structural copies or wrapping the original structure. This operation takes time. Avoiding slices will save on this operation, though for multidimensional data it may be impossible to completely avoid calls to __array_finalize__.
Related
I have a large numpy array that I am going to take a linear projection of using randomly generated values.
>>> input_array.shape
(50, 200000)
>>> random_array = np.random.normal(size=(200000, 300))
>>> output_array = np.dot(input_array, random_array)
Unfortunately, random_array takes up a lot of memory, and my machine starts swapping. It seems to me that I don't actually need all of random_array around at once; in theory, I ought to be able to generate it lazily during the dot product calculation...but I can't figure out how.
How can I reduce the memory footprint of the calculation of output_array from input_array?
This obviously isn't the fastest solution, but have you tried:
import numpy as np

m, inner = input_array.shape
n = 300
out = np.empty((m, n))
for i in xrange(n):
    out[:, i] = np.dot(input_array, np.random.normal(size=inner))
This might be a situation where using Cython could reduce your memory usage. You could generate the random numbers on the fly and accumulate the result as you go. I don't have time to write and test the full function, but you would definitely want to use randomkit (the library that numpy uses under the hood) at the C level.
You can take a look at some example code I wrote for another application to see how to wrap randomkit:
https://github.com/synapticarbors/pylangevin-integrator/blob/master/cIntegrator.pyx
And also check out how matrix multiplication is implemented in the following paper on cython:
http://conference.scipy.org/proceedings/SciPy2009/paper_2/full_text.pdf
Instead of taking both arrays as inputs, just take input_array as one, and then inside the method generate small chunks of the random array as you go (a sketch of this follows below).
Sorry if it is just a sketch instead of actual code, but hopefully it is enough to get you started.
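To make that concrete, here is a minimal pure-numpy version of the chunked idea; the function name and chunk size are made up for illustration. Only a (200000, chunk)-sized block of the random matrix exists at any one time:

import numpy as np

def project_in_chunks(input_array, n_out=300, chunk=32, seed=None):
    """Project input_array with a random normal matrix, generating
    only `chunk` columns of the random matrix at a time."""
    rng = np.random.RandomState(seed)
    m, inner = input_array.shape
    out = np.empty((m, n_out))
    for start in range(0, n_out, chunk):
        stop = min(start + chunk, n_out)
        block = rng.normal(size=(inner, stop - start))
        out[:, start:stop] = input_array.dot(block)
    return out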