Creating an "array_like" QImage subclass for numpy.array()

I want to create an "array_like" QImage subclass that can be passed to numpy.array().
I'd like to avoid using PIL as a substitute; the whole point of this is to avoid the dependency on PIL. Besides, constantly converting between QImage and the PIL Image is impractical for my program.
I find the documentation cryptic, and after reading it I'm still confused about how to emulate the array interface. As the numpy documentation states, to qualify as an "array_like" object it needs the __array_interface__ attribute: a dictionary with three required keys (shape, typestr, version) and several optional ones. However, I've never dealt with types, buffers, and memory before; if someone could explain how to solve this problem it would be much appreciated.
I'm using Python 3.3 and PySide 1.1.2.
Thanks to all who reply!

It's easier to just use the buffer object returned from QImage.bits() and np.frombuffer().
import numpy as np

def qimage2array(q_image):
    width = q_image.width()
    height = q_image.height()
    # Interpret the raw pixel buffer as unsigned bytes; the trailing -1 lets
    # numpy infer the number of channels. This assumes densely packed rows;
    # QImage pads scanlines to 32-bit boundaries, so check bytesPerLine()
    # if your width/format combination can produce padding.
    arr = np.frombuffer(q_image.bits(), dtype=np.uint8).reshape(height, width, -1)
    return arr
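If you do want the subclass the question asks about, here is a minimal, untested sketch of how __array_interface__ could be implemented on a QImage subclass. It assumes PySide's QImage.bits() returns an object exposing the buffer protocol and that the image stores one byte per channel; the strides entry accounts for QImage padding each scanline to a 32-bit boundary:

import numpy as np
from PySide.QtGui import QImage

class ArrayImage(QImage):
    """A QImage subclass that numpy.array() can consume directly."""

    @property
    def __array_interface__(self):
        bytes_per_pixel = self.depth() // 8
        return {
            'shape': (self.height(), self.width(), bytes_per_pixel),  # required
            'typestr': '|u1',                 # required: unsigned 1-byte integers
            'data': self.bits(),              # buffer exposing the pixel memory
            'strides': (self.bytesPerLine(), bytes_per_pixel, 1),
            'version': 3,                     # required
        }

With this in place, np.array(img) should copy the pixels and np.asarray(img) should wrap them without copying.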

Related

Is there a fast way to insert values back into a Numpy Array

Alright, I have some segmented image data s, defined by a mask m, which is basically a 3D binary field where 1 marks a voxel that is part of the segment and 0 marks a voxel that is not. I am trying to get a representation of this segment that is as small as possible. That part is rather easy; I can use the following:
compressed = s.flatten()[m.flatten() == 1]
My question is: given compressed and m, is there a similar Numpy method or an equally fast way to reconstruct s?
Alright, feeling pretty dumb that I didn't realize I could've just used the following:
a = np.zeros(m.shape)
a[m == 1] = compressed
>>> np.equal(a, s).all()
True
Hope this still is of some use to someone who isn't able to figure this out either!
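For completeness, here is a small self-contained round trip (toy data of my own, not from the original post). Note the dtype=s.dtype in np.zeros: np.zeros defaults to float64, so matching the dtype keeps the reconstruction exact.

import numpy as np

rng = np.random.default_rng(0)
m = rng.integers(0, 2, size=(4, 4, 4))         # binary mask
s = rng.integers(1, 10, size=(4, 4, 4)) * m    # segment data, zero outside the mask

compressed = s.flatten()[m.flatten() == 1]     # compact representation

a = np.zeros(m.shape, dtype=s.dtype)           # match dtype for an exact comparison
a[m == 1] = compressed

print(np.equal(a, s).all())                    # True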

How do you use/view memoryview objects in Cython?

I've got a project where a handful of nested for-loops are slowing down the runtime of the code, so I've started adding some Cython typing. That sped up the loops significantly, but I've run into a new problem: the typing I'm using doesn't allow any computations to be done on the results. Here's a mock sketch of my code:
cdef double[:,:] my_matrix = np.zeros([width, height])
for i in range(0, width):
    for j in range(0, height):
        a = v1[i] - v2[j]
        my_matrix[i, j] = np.sqrt(a**2)
After that I want to compute the product of my_matrix using
A complex number
Two constants
The exponential function
The matrix itself, like so:
product = constant1 * np.exp(-1j * constant2 * my_matrix) / my_matrix
By attempting this I get the error:
TypeError: unsupported operand type(s) for *: 'complex' and 'my_cython_function_cy._memoryviewslice'
I understand the implication of this error, but I don't get how to use the contents of the memoryview object as an array. I tried doing this:
new_matrix = my_matrix
but that won't compile. I'm new to both C and Cython, and the documentation isn't very helpful for these rookie questions, so I would be very grateful for any help here.
The best thing to do is:
new_matrix = np.asarray(my_matrix)
That lets you access the full set of Numpy operations on the array. It should be a pretty lightweight transformation (they'll share the same underlying data).
You could also get the wrapped object with my_matrix.base (this would probably be the original Numpy array that you initialized it with). However, depending on what you've done with slicing this might not be quite the same as the memoryview, so be a bit wary of this approach.
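Putting it together with the question's snippet (constant1, constant2 and my_matrix as in the question):

new_matrix = np.asarray(my_matrix)   # shares memory with the memoryview; no copy
product = constant1 * np.exp(-1j * constant2 * new_matrix) / new_matrix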

Copying a PyTorch Variable to a Numpy array

Suppose I have a PyTorch Variable on the GPU:
var = Variable(torch.rand((100,100,100))).cuda()
What's the best way to copy (not bridge) this variable to a NumPy array?
var.clone().data.cpu().numpy()
or
var.data.cpu().numpy().copy()
In a quick benchmark, .clone() was slightly faster than .copy(). However, .clone() + .numpy() will create a PyTorch Variable plus a NumPy bridge, while .copy() will create a NumPy bridge plus a NumPy array.
This is a very interesting question. In my view it is a little bit opinion-based, so I would like to share my opinion on it.
Of the two approaches, I would prefer the first one (using clone()). Since your goal is to copy information, you essentially need to invest extra memory either way. clone() and copy() should take a similar amount of storage, since creating the NumPy bridge doesn't allocate extra memory. Also, I didn't understand what you meant by copy() creating two NumPy arrays. And since, as you mentioned, clone() is faster than copy(), I don't see any problem with using clone().
I would love to give a second thought on this if anyone can provide some counter arguments.
Because clone() is recorded by autograd, the second option is less intensive. There are a few other options you may also consider.
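One such option, my addition here, assuming a newer PyTorch in which Variable has been merged into Tensor, is to detach() the tensor so the copy is not recorded by autograd at all:

# detach() drops the autograd history, so nothing is recorded on the graph;
# .numpy() only creates a bridge, so an explicit .copy() is still needed to
# make the result independent of the tensor's memory.
arr = var.detach().cpu().numpy().copy()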

How do I include a matplotlib Figure object as subplot? [duplicate]

How can I use a matplotlib Figure object as a subplot? Specifically, I have a function that creates a matplotlib Figure object, and I would like to include this as a subplot in another Figure.
In short, here's stripped-down pseudocode for what I've tried:
fig1 = plt.figure(1, facecolor='white')
figa = mySeparatePlottingFunc(...)
figb = mySeparatePlottingFunc(...)
figc = mySeparatePlottingFunc(...)
figd = mySeparatePlottingFunc(...)
fig1.add_subplot(411, figure=figa)
fig1.add_subplot(412, figure=figb)
fig1.add_subplot(413, figure=figc)
fig1.add_subplot(414, figure=figd)
fig1.show()
Sadly, however, this fails. I know for a fact that the individual figures returned from the function invocations are viable; I did figa.show(), ..., figd.show() to confirm that they are OK. What I get from the final line in the above code block, fig1.show(), is a collection of four empty plots that have frames and x- and y-tickmarks/labels.
I've done quite a bit of googling around, and experimented extensively, but it's clear that I've missed something that is either really subtle, or embarrassingly obvious (I'll be happy for it to be the latter as long as I can get un-stuck).
Thanks for any advice you can offer!
You can't put a figure in a figure.
You should modify your plotting functions to take axes objects as an argument.
I am also unclear why the figure kwarg is there; I think it is an artifact of the way inheritance works, the way the documentation is auto-generated, and the way some of the getter/setter work is automated. Note that it is marked as undocumented in the Figure documentation, so it might not do what you want ;). If you dig down a bit, what that kwarg really controls is the figure that the created axes is attached to, which is not what you want.
In general, moving existing axes/artists between figures is not easy; there are too many bits of internal plumbing that need to be re-connected. I think it can be done, but it will involve touching the internals, and there is no guarantee that it will work with future versions or that you will get a warning if the internals change in a way that breaks it.
You need to modify your plotting functions to take an Axes object as an argument. You can use a pattern like:
def myPlotting(..., ax=None):
    if ax is None:
        # your existing figure-generating code
        ax = plt.gca()
so if you pass in an Axes object it gets drawn to (the new functionality you need), but if you don't, all of your old code will work as expected.
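A fuller sketch of the pattern (hypothetical plotting function and data, just to illustrate):

import matplotlib.pyplot as plt

def my_plotting_func(data, ax=None):
    if ax is None:
        ax = plt.gca()            # fall back to the current axes, as before
    ax.plot(data)
    return ax

fig, axes = plt.subplots(4, 1)    # one figure, four subplots
for ax, data in zip(axes, ([1, 2], [2, 1], [1, 3], [3, 1])):
    my_plotting_func(data, ax=ax)
plt.show()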

Understanding Numpy internals for profiling purposes

Profiling a piece of numpy code shows that I'm spending most of the time within these two functions
numpy/matrixlib/defmatrix.py.__getitem__:301
numpy/matrixlib/defmatrix.py.__array_finalize__:279
Here's the Numpy source:
https://github.com/numpy/numpy/blob/master/numpy/matrixlib/defmatrix.py#L301
https://github.com/numpy/numpy/blob/master/numpy/matrixlib/defmatrix.py#L279
Question #1:
__getitem__ seems to be called every time I'm using something like my_array[arg] and it's getting more expensive if arg is not an integer but a slice. Is there any way to speed up calls to array slices?
E.g. in
for i in range(idx): res[i] = my_array[i:i+10].mean()
Question #2:
When exactly does __array_finalize__ get called, and how can I speed things up by reducing the number of calls to this function?
Thanks!
You could avoid using matrices and just use 2-D numpy arrays. I typically use matrices only for a short time, to take advantage of the syntax for multiplication (but with the addition of the .dot method on arrays, I find I do that less and less as well).
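For example (my illustration, not from the original answer):

import numpy as np

a = np.arange(6).reshape(2, 3)
b = np.arange(6).reshape(3, 2)
c = a.dot(b)    # matrix product on plain 2-D arrays; no np.matrix required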
But, to your questions:
1) There really is no short-cut around __getitem__ unless defmatrix overrides __getslice__, which it could do but doesn't yet. There are the .item and .itemset methods, which are optimized for integer getting and setting (and return Python objects rather than NumPy's array-scalars); see the small example after point 2.
2) __array_finalize__ is called whenever an array object (or a subclass) is created. It is called from the C-function that every array-creation gets funneled through. https://github.com/numpy/numpy/blob/master/numpy/core/src/multiarray/ctors.c#L1003
In the case of sub-classes defined purely in Python, it is calling back into the Python interpreter from C which has overhead. If the matrix class were a builtin type (a Cython-based cdef class, for example), then the call could avoid the Python interpreter overhead.
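A quick illustration of .item and .itemset (my example; note that .itemset has been deprecated in recent NumPy releases):

import numpy as np

x = np.arange(6).reshape(2, 3)
v = x.item(1, 2)        # 5, returned as a plain Python int, not an array-scalar
x.itemset((0, 0), 9)    # fast scalar assignment that bypasses general indexing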
Question 1:
Array slices create a new array object each time (a view that holds its own pointers into the underlying data), and constructing that wrapper over and over can be quite expensive. If you're really bottlenecked by this in your example above, you can compute the mean by iterating over the i to i+10 elements manually. For some operations this won't give any performance improvement, but avoiding the creation of new array objects will generally speed up the process.
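As a sketch of that suggestion (my example, using made-up data):

import numpy as np

my_array = np.random.rand(1000)
idx = len(my_array) - 10
res = np.empty(idx)

# Manual version of res[i] = my_array[i:i+10].mean(): no intermediate
# array objects are created, so __getitem__ only sees integer indices.
for i in range(idx):
    total = 0.0
    for j in range(i, i + 10):
        total += my_array[j]
    res[i] = total / 10.0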
Another note: if you're not using native types inside numpy, you will pay a very large performance penalty when manipulating a numpy array. Say your array has dtype=float64 while your native machine float size is float32; this costs numpy a lot of extra computation, and overall performance drops. Sometimes this is fine and you can just take the hit to keep a particular data type. Other times it's arbitrary which float or int type is used internally. In those cases, try dtype=float instead of dtype=float64; numpy should then default to your native type. I've had 3x+ speedups on numpy-intensive algorithms by making this change.
Question 2:
__array_finalize__ "is called whenever the system internally allocates a new array from obj, where obj is a subclass (subtype) of the (big)ndarray", according to the SciPy documentation. So this ties back to the first question: when you slice and make a new array, that array has to be finalized, either by making structural copies or by wrapping the original structure, and that operation takes time. Avoiding slices will save on this operation, though for multidimensional data it may be impossible to avoid calls to __array_finalize__ completely.
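Not from the original answers, but one way to sidestep both the per-slice __getitem__ cost and the repeated __array_finalize__ calls is to vectorize the windowed mean with a cumulative sum:

import numpy as np

my_array = np.random.rand(10000)
window = 10

# One pass builds the cumulative sum; one subtraction yields every window
# total, so res[i] == my_array[i:i+window].mean() without any Python loop.
csum = np.concatenate(([0.0], np.cumsum(my_array)))
res = (csum[window:] - csum[:-window]) / window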