Indexing in Rust ndarray crate based on a boolean mask

I would like to efficiently index into an ndarray using a boolean mask. To better convey what I mean, I have some working numpy code and then my attempt in Rust ndarray, which works but is extremely inefficient.
Numpy:
import numpy as np
shape = (100, 100, 100)
grouping_array = np.random.randint(0, 100, size=shape)
data_array = np.random.rand(*shape)
for i in range(1, 100):
    ith_mean = data_array[grouping_array == i].mean()
    print(ith_mean)
Rust ndarray:
fn group_means(
    data: &Array<f32, IxDyn>,
    grouping_var: &Array<f32, IxDyn>,
    n_groups: i32,
) {
    for group in 1..n_groups {
        // Boolean mask selecting the elements of this group.
        let index_array = grouping_var.mapv(|x| x == group as f32);
        // Keep only the masked elements so the mean is over the group alone.
        let group_data = Array::from_iter(
            data.iter()
                .zip(index_array.iter())
                .filter_map(|(x, y)| if *y { Some(*x) } else { None }),
        );
        let group_mean = group_data.mean().unwrap();
        println!("group {}; mean {}", group, group_mean);
    }
}
Here each iteration of the n_groups loop takes about as long as the whole numpy script, which finishes in less than a second. Is there a better way to do this in the rust-ndarray version?

This is likely not a surprise to others, but since my grouping_var array should (in my use case) always be a 3D array, I changed its type (and therefore also that of index_array) from &Array<f32, IxDyn> to &Array<f32, Ix3>, which dramatically improved performance.
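For reference, a minimal sketch of that statically-dimensioned version (the function name is mine; the only substantive change from the code above is the Ix3 dimension type):
use ndarray::{Array, Ix3};

// Sketch of the statically-dimensioned variant: with Ix3 the dimension is
// known at compile time, so iteration avoids the per-element overhead of
// dynamic-dimension (IxDyn) indexing.
fn group_means_3d(data: &Array<f32, Ix3>, grouping_var: &Array<f32, Ix3>, n_groups: i32) {
    for group in 1..n_groups {
        let index_array = grouping_var.mapv(|x| x == group as f32);
        let group_data = Array::from_iter(
            data.iter()
                .zip(index_array.iter())
                .filter_map(|(x, y)| if *y { Some(*x) } else { None }),
        );
        println!("group {}; mean {}", group, group_data.mean().unwrap());
    }
}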

Related

How to use the 'sphereize data' option with PCA in TensorFlow

I have used PCA with the 'Sphereize data' option on the following page successfully: https://projector.tensorflow.org/
I wonder how to run the same computation locally using the TensorFlow API. I found PCA in the API documentation, but I am not sure whether the sphereize step is also available somewhere in the API?
The "sphereize data" option normalizes the data by shifting each point by the centroid and making unit norm.
Here is the code used in Tensorboard (in typescript):
normalize() {
  // Compute the centroid of all data points.
  let centroid = vector.centroid(this.points, (a) => a.vector);
  if (centroid == null) {
    throw Error('centroid should not be null');
  }
  // Shift all points by the centroid and make them unit norm.
  for (let id = 0; id < this.points.length; ++id) {
    let dataPoint = this.points[id];
    dataPoint.vector = vector.sub(dataPoint.vector, centroid);
    if (vector.norm2(dataPoint.vector) > 0) {
      // If we take the unit norm of a vector of all 0s, we get a vector of
      // all NaNs. We prevent that with a guard.
      vector.unit(dataPoint.vector);
    }
  }
}
You can reproduce that normalization using the following python function:
def sphereize_data(x):
    """
    x is a 2D Tensor of shape (num_vectors, dim_vectors)
    """
    centroids = tf.reduce_mean(x, axis=0, keepdims=True)
    # Per-row norms (axis=1) so each shifted vector is scaled to unit norm,
    # matching the TypeScript code; div_no_nan guards the all-zeros case.
    return tf.math.div_no_nan(x - centroids, tf.norm(x - centroids, axis=1, keepdims=True))
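As a quick, illustrative sanity check (variable names are mine): every nonzero row of the sphereized output should come back with unit norm:
import tensorflow as tf

x = tf.random.uniform((5, 3))
y = sphereize_data(x)
# Each row was shifted by the centroid and scaled, so per-row norms are ~1.
print(tf.norm(y, axis=1))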

Why does Cython's in-place division of numpy arrays use conversion to Python floats?

I tried to normalize a vector stored as a numpy array, but cython -a shows unexpected conversions to Python values in this code.
Minimal example:
import numpy as np
cimport cython
cimport numpy as np

@cython.wraparound(False)
@cython.boundscheck(False)
cdef vec_diff(np.ndarray[double, ndim=1] vec1, double m):
    vec1 /= m
    return vec1
Cython 0.29.6 run with the -a option generates the following code for the line vec1/=m:
__pyx_t_1 = PyFloat_FromDouble(__pyx_v_m); if (unlikely(!__pyx_t_1)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_1);
__pyx_t_2 = __Pyx_PyNumber_InPlaceDivide(((PyObject *)__pyx_v_vec1), __pyx_t_1); if (unlikely(!__pyx_t_2)) __PYX_ERR(0, 8, __pyx_L1_error)
__Pyx_GOTREF(__pyx_t_2);
__Pyx_DECREF(__pyx_t_1); __pyx_t_1 = 0;
if (!(likely(((__pyx_t_2) == Py_None) || likely(__Pyx_TypeTest(__pyx_t_2, __pyx_ptype_5numpy_ndarray))))) __PYX_ERR(0, 8, __pyx_L1_error)
__pyx_t_3 = ((PyArrayObject *)__pyx_t_2);
{
    __Pyx_BufFmt_StackElem __pyx_stack[1];
    __Pyx_SafeReleaseBuffer(&__pyx_pybuffernd_vec1.rcbuffer->pybuffer);
    __pyx_t_4 = __Pyx_GetBufferAndValidate(&__pyx_pybuffernd_vec1.rcbuffer->pybuffer, (PyObject*)__pyx_t_3, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack);
    if (unlikely(__pyx_t_4 < 0)) {
        PyErr_Fetch(&__pyx_t_5, &__pyx_t_6, &__pyx_t_7);
        if (unlikely(__Pyx_GetBufferAndValidate(&__pyx_pybuffernd_vec1.rcbuffer->pybuffer, (PyObject*)__pyx_v_vec1, &__Pyx_TypeInfo_double, PyBUF_FORMAT| PyBUF_STRIDES, 1, 0, __pyx_stack) == -1)) {
            Py_XDECREF(__pyx_t_5); Py_XDECREF(__pyx_t_6); Py_XDECREF(__pyx_t_7);
            __Pyx_RaiseBufferFallbackError();
        } else {
            PyErr_Restore(__pyx_t_5, __pyx_t_6, __pyx_t_7);
        }
        __pyx_t_5 = __pyx_t_6 = __pyx_t_7 = 0;
    }
    __pyx_pybuffernd_vec1.diminfo[0].strides = __pyx_pybuffernd_vec1.rcbuffer->pybuffer.strides[0]; __pyx_pybuffernd_vec1.diminfo[0].shape = __pyx_pybuffernd_vec1.rcbuffer->pybuffer.shape[0];
    if (unlikely(__pyx_t_4 < 0)) __PYX_ERR(0, 8, __pyx_L1_error)
}
__pyx_t_3 = 0;
__Pyx_DECREF_SET(__pyx_v_vec1, ((PyArrayObject *)__pyx_t_2));
__pyx_t_2 = 0;
where the first line __pyx_t_1 = PyFloat_FromDouble(__pyx_v_m); has PyFloat_FromDouble highlighted in dark red.
Given that I have told cython that the array contains double values, why does it have to convert to a python float?
Note: Memoryviews do not support the /= operation (it would require an explicit loop).
Because this isn't something that Cython does anything special for or optimises at all. All it's doing is calling __Pyx_PyNumber_InPlaceDivide on the Numpy array, which calls the Numpy array's __idiv__ operator.
Since it's calling a Python operator it needs to pass a Python object as the second argument, and hence it needs to convert your double to a Python float.
The Numpy __idiv__ operator is almost certainly written in C, so it is likely to be pretty fast (although there is a little overhead in calling it), and there is not a lot of value in Cython doing anything except delegating to Numpy's code.
Memoryviews don't define the whole-array operators (they're just ways to access memory and make no claims about meaningful mathematical operations), so the fact that /= doesn't work on them is consistent with how Cython deals with these operators.
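For completeness, a minimal sketch of the explicit loop mentioned in the note above (the function name is illustrative); this stays entirely in C, with no Python float conversion:
cimport cython

@cython.boundscheck(False)
@cython.wraparound(False)
cdef void vec_div_inplace(double[:] vec1, double m):
    # Plain C loop over the memoryview: no Python objects involved.
    cdef Py_ssize_t i
    for i in range(vec1.shape[0]):
        vec1[i] /= m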

Cython Typing List of Strings

I'm trying to use Cython to improve the performance of a loop, but I'm running into some issues declaring the types of the inputs.
How do I include a field in my typed struct which is a string that can be either 'front' or 'back'?
I have a np.recarray that looks like the following (note that the length of the recarray is unknown at compile time):
import numpy as np
weights = np.recarray(4, dtype=[('a', np.int64), ('b', np.str_, 5), ('c', np.float64)])
weights[0] = (0, "front", 0.5)
weights[1] = (0, "back", 0.5)
weights[2] = (1, "front", 1.0)
weights[3] = (1, "back", 0.0)
as well as inputs of a list of strings and a pandas.Timestamp
import pandas as pd
ts = pd.Timestamp("2015-01-01")
contracts = ["CLX16", "CLZ16"]
I am trying to cythonize the following loop
def ploop(weights, contracts, timestamp):
    cwts = []
    for gen_num, position, weighting in weights:
        if weighting != 0:
            if position == "front":
                cntrct_idx = gen_num
            elif position == "back":
                cntrct_idx = gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((gen_num, contracts[cntrct_idx], weighting, timestamp))
    return cwts
My attempt involved typing the weights input as a struct in Cython, in a file struct_test.pyx, as follows:
import numpy as np
cimport numpy as np

cdef packed struct tstruct:
    np.int64_t gen_num
    char[5] position
    np.float64_t weighting

def cloop(tstruct[:] weights_array, contracts, timestamp):
    cdef tstruct w
    cdef int k
    cdef int cntrct_idx
    cwts = []
    for k in xrange(len(weights_array)):
        w = weights_array[k]
        if w.weighting != 0:
            if w.position == "front":
                cntrct_idx = w.gen_num
            elif w.position == "back":
                cntrct_idx = w.gen_num + 1
            else:
                raise ValueError("transition.columns must contain "
                                 "'front' or 'back'")
            cwts.append((w.gen_num, contracts[cntrct_idx], w.weighting,
                         timestamp))
    return cwts
But I am receiving runtime errors, which I believe are related to the char[5] position field.
import pyximport
pyximport.install()
import struct_test
struct_test.cloop(weights, contracts, ts)
ValueError: Does not understand character buffer dtype format string ('w')
In addition, I am a bit unclear how I would go about typing contracts as well as timestamp.
Your ploop (without the timestamp variable) produces:
In [226]: ploop(weights, contracts)
Out[226]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]
Equivalent function without a loop:
def ploopless(weights, contracts):
    arr_contracts = np.array(contracts)  # to allow array indexing
    wgts1 = weights[weights['c'] != 0]
    mask = wgts1['b'] == 'front'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask]]
    mask = wgts1['b'] == 'back'
    wgts1['b'][mask] = arr_contracts[wgts1['a'][mask] + 1]
    return wgts1.tolist()
In [250]: ploopless(weights, contracts)
Out[250]: [(0, 'CLX16', 0.5), (0, 'CLZ16', 0.5), (1, 'CLZ16', 1.0)]
I'm taking advantage of the fact that the returned list of tuples has the same (int, str, float) layout as the input weights array. So I'm just making a copy of weights and replacing selected values of the b field.
Note that I use the field-selection index before the mask one. The boolean mask produces a copy, so we have to be careful about indexing order.
I'm guessing that the loop-less array version will be competitive in time with cloop (on realistic arrays). The string and list operations in cloop probably limit its speedup.
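A small, illustrative demonstration of that indexing-order caveat (toy array; names are mine):
import numpy as np

a = np.zeros(3, dtype=[('b', 'U5')])
mask = np.array([True, False, True])
a[mask]['b'] = 'x'   # boolean mask first: writes into a temporary copy, a is unchanged
a['b'][mask] = 'x'   # field view first: the masked write reaches a
print(a['b'])        # -> ['x' '' 'x']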

How to resize and subtract numpy arrays in C++

I have two numpy 3D arrays in Python with different heights and widths. I want to pass them to my C extension. How can I resize and subtract them in C++? Please see the comments in the code.
static PyObject *my_func(PyObject *self, PyObject *args)
{
    Py_Initialize();
    import_array();
    PyObject *arr1;
    PyObject *arr2;
    if (!PyArg_ParseTuple(args, "OO", &arr1, &arr2))
    {
        return NULL;
    }
    // How can I do this?
    // resize arr1 to [100, 100, 3]
    // resize arr2 to [100, 100, 3]
    // res = arr1 - arr2
    // return res
}
Start by making the desired shape. It's easier to do this as a tuple than a list:
PyObject* shape = Py_BuildValue("iii", 100, 100, 3);
Check this against NULL to ensure no error has occurred, and handle it if one has.
You can call the numpy resize function on both arrays to resize them. Unless you are certain that the data isn't shared then you need to call numpy.resize rather than the .resize method of the arrays. This involves importing the module and getting the resize attribute:
PyObject* np = PyImport_ImportModule("numpy");
PyObject* resize = PyObject_GetAttrString(np, "resize");
PyObject* resize_result = PyObject_CallFunctionObjArgs(resize, arr1, shape, NULL);
I've omitted all the error checking, which you should do after each stage.
Make sure you decrease the reference counts on the various PyObjects once you don't need them any more.
Use PyNumber_Subtract to do the subtraction (do it on the result from resize).
Addition: A shortcut for calling resize that should avoid most of the intermediates:
PyObject* np = PyImport_ImportModule("numpy");
// error check against NULL
PyObject* resize_result = PyObject_CallMethod(np, "resize", "O(iii)", arr1, 100, 100, 3);
(The "(iii)" creates the shape tuple rather than needing to do it separately.)
If you are certain that arr1 and arr2 are the only owners of the data, then you can call the numpy .resize method, either through the normal C API function calls or through the numpy-specific function PyArray_Resize.
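Putting the pieces together, a minimal sketch of the whole function under those assumptions (error handling abbreviated; the NULL checks after each call are still advisable):
static PyObject *my_func(PyObject *self, PyObject *args)
{
    PyObject *arr1, *arr2;
    if (!PyArg_ParseTuple(args, "OO", &arr1, &arr2))
        return NULL;

    PyObject *np = PyImport_ImportModule("numpy");
    if (np == NULL)
        return NULL;

    /* numpy.resize(arr, (100, 100, 3)) on both inputs */
    PyObject *r1 = PyObject_CallMethod(np, "resize", "O(iii)", arr1, 100, 100, 3);
    PyObject *r2 = PyObject_CallMethod(np, "resize", "O(iii)", arr2, 100, 100, 3);
    Py_DECREF(np);
    if (r1 == NULL || r2 == NULL) {
        Py_XDECREF(r1);
        Py_XDECREF(r2);
        return NULL;
    }

    /* res = r1 - r2, delegating to numpy's subtraction */
    PyObject *res = PyNumber_Subtract(r1, r2);
    Py_DECREF(r1);
    Py_DECREF(r2);
    return res;
}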

PyOpenCL reduction Kernel on each pixel of image as array instead of each byte (RGB mode, 24 bits)

I'm trying to calculate the average luminance of an RGB image. To do this, I find the luminance of each pixel, i.e.
L(r,g,b) = X*r + Y*g + Z*b (some linear combination),
and then find the average by summing the luminance of all pixels and dividing by width*height.
To speed this up, I'm using pyopencl.reduction.ReductionKernel.
The array I pass to it is a single-dimension numpy array, so it works just like the example given.
import Image
import numpy as np
im = Image.open('image_00000001.bmp')
data = np.asarray(im).reshape(-1)  # so data is a single-dimension array
# data.dtype is uint8, data.shape is (w*h*3, )
I want to incorporate the following code from the example into it, i.e. I would change the datatype and the type of the arrays I'm passing. This is the example:
a = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
b = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
krnl = ReductionKernel(ctx, numpy.float32, neutral="0",
                       reduce_expr="a+b", map_expr="x[i]*y[i]",
                       arguments="__global float *x, __global float *y")
my_dot_prod = krnl(a, b).get()
Except that my map_expr will work on each pixel, converting it to its luminance value, and the reduce_expr remains the same.
The problem is that it works on each element in the array, and I need it to work on each pixel, which is 3 consecutive elements at a time (RGB).
One solution is to have three different arrays, one for R, one for G and one for B, which would work, but is there another way?
Edit: I changed the program to illustrate the char4 usage instead of float4:
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array

deviceID = 0
platformID = 0
workGroup = (1, 1)
N = 10
testData = np.zeros(N, dtype=cl_array.vec.char4)

dev = cl.get_platforms()[platformID].get_devices()[deviceID]
ctx = cl.Context([dev])
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
Data_In = cl.Buffer(ctx, mf.READ_WRITE, testData.nbytes)

prg = cl.Program(ctx, """
__kernel void Pack_Cmplx( __global char4* Data_In, int N)
{
  int gid = get_global_id(0);

  //Data_In[gid] = 1; // This would change all components to one
  Data_In[gid].x = 1; // changing a single component
  Data_In[gid].y = 2;
  Data_In[gid].z = 3;
  Data_In[gid].w = 4;
}
""").build()

prg.Pack_Cmplx(queue, (N, 1), workGroup, Data_In, np.int32(N))
cl.enqueue_copy(queue, testData, Data_In)
print testData
I hope it helps.
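Building on that, a sketch of what the luminance reduction itself could look like (the packing step and the 0.299/0.587/0.114 weights are illustrative assumptions, not from the question):
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
from pyopencl.reduction import ReductionKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

# Pad each RGB pixel with a fourth byte so the data maps onto uchar4.
w, h = 64, 64
rgb = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)
rgba = np.dstack([rgb, np.zeros((h, w, 1), dtype=np.uint8)])
pixels = cl_array.to_device(queue, rgba.view(cl_array.vec.uchar4).reshape(-1))

# Map each pixel to a luminance value, then sum; one work-item per pixel.
krnl = ReductionKernel(ctx, np.float32, neutral="0",
                       reduce_expr="a+b",
                       map_expr="0.299f*x[i].x + 0.587f*x[i].y + 0.114f*x[i].z",
                       arguments="__global const uchar4 *x")
mean_luminance = krnl(pixels).get() / (w * h)
print(mean_luminance)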