PyOpenCL reduction Kernel on each pixel of image as array instead of each byte (RGB mode, 24 bits ) - numpy

I'm trying to calculate the average luminance of an RGB image. To do this, I find the luminance of each pixel, i.e.
L(r,g,b) = X*r + Y*g + Z*b (some linear combination),
and then find the average by summing the luminance of all pixels and dividing by width*height.
To speed this up, I'm using pyopencl.reduction.ReductionKernel.
The array I pass to it is a one-dimensional NumPy array, so it works just like the example given.
from PIL import Image
import numpy as np
im = Image.open('image_00000001.bmp')
data = np.asarray(im).reshape(-1) # so data is a one-dimensional array
# data.dtype is uint8, data.shape is (w*h*3, )
I want to incorporate the following code from the example into it, i.e. I would change the datatype and the type of the arrays I'm passing. This is the example:
a = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
b = pyopencl.array.arange(queue, 400, dtype=numpy.float32)
krnl = ReductionKernel(ctx, numpy.float32, neutral="0",
reduce_expr="a+b", map_expr="x[i]*y[i]",
arguments="__global float *x, __global float *y")
my_dot_prod = krnl(a, b).get()
Except that my map_expr would work on each pixel and convert it to its luminance value, while reduce_expr remains the same.
The problem is that the kernel works on each element of the array, and I need it to work on each pixel, which is 3 consecutive elements at a time (RGB).
One solution is to have three different arrays, one each for R, G and B, which would work, but is there another way?
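As a CPU-side reference for checking whatever the kernel returns, the per-pixel computation described above can be sketched in plain NumPy by viewing the flat byte array as rows of three values. The weights below are the Rec. 601 luma coefficients and are only an assumption; the question leaves X, Y and Z unspecified. The function name average_luminance is likewise hypothetical.

```python
import numpy as np

# Assumed luminance weights (Rec. 601); the question only calls them X, Y, Z.
WEIGHTS = np.array([0.299, 0.587, 0.114], dtype=np.float32)

def average_luminance(flat_rgb, weights=WEIGHTS):
    """Average luminance of a flat uint8 array laid out as R, G, B, R, G, B, ..."""
    # View every 3 consecutive bytes as one pixel, then promote to float.
    pixels = np.asarray(flat_rgb).reshape(-1, 3).astype(np.float32)
    # One dot product per pixel, then the mean over all pixels.
    return float((pixels @ weights).mean())

# One white pixel and one black pixel.
demo = np.array([255, 255, 255, 0, 0, 0], dtype=np.uint8)
result = average_luminance(demo)
```

This does not use the GPU at all; it is only meant as a small, easily verified baseline to compare a reduction kernel against.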

Edit: I changed the program to illustrate the char4 usage instead of float4:
import numpy as np
import pyopencl as cl
import pyopencl.array as cl_array
deviceID = 0
platformID = 0
workGroup = None  # local work size; None lets the OpenCL runtime choose
N = 10
testData = np.zeros(N, dtype=cl_array.vec.char4)
dev = cl.get_platforms()[platformID].get_devices()[deviceID]
ctx = cl.Context([dev])
queue = cl.CommandQueue(ctx)
mf = cl.mem_flags
Data_In = cl.Buffer(ctx, mf.READ_WRITE, testData.nbytes)
prg = cl.Program(ctx, """
__kernel void Pack_Cmplx( __global char4* Data_In, int N)
{
    int gid = get_global_id(0);
    //Data_In[gid] = 1; // This would change all components to one
    Data_In[gid].x = 1; // changing single component
    Data_In[gid].y = 2;
    Data_In[gid].z = 3;
    Data_In[gid].w = 4;
}
""").build()
prg.Pack_Cmplx(queue, (N,1), workGroup, Data_In, np.int32(N))
cl.enqueue_copy(queue, testData, Data_In)
print(testData)
I hope it helps.
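To use a 4-component vector type with a 3-byte RGB image, one option is to pad every pixel to four bytes on the host first, so that each pixel lines up with one vector element. This is an assumption on my part, not something the answer spells out. A minimal NumPy sketch, with the hypothetical helper name pad_rgb_to_rgba:

```python
import numpy as np

def pad_rgb_to_rgba(flat_rgb):
    """Turn a flat R, G, B, ... uint8 array into an (n_pixels, 4) array.

    The fourth component is filled with zeros, so each row occupies
    exactly four bytes, the size of one 4-component byte vector.
    """
    pixels = np.asarray(flat_rgb, dtype=np.uint8).reshape(-1, 3)
    rgba = np.zeros((pixels.shape[0], 4), dtype=np.uint8)
    rgba[:, :3] = pixels
    return rgba

demo = np.array([10, 20, 30, 40, 50, 60], dtype=np.uint8)
padded = pad_rgb_to_rgba(demo)
```

The padding costs one extra byte per pixel; whether that is cheaper than splitting the image into three separate channel arrays depends on the image size and the device.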


Shortest rotation between two vectors not working like expected

import numpy as np
def signed_angle_between_vecs(target_vec, start_vec, plane_normal=None):
    start_vec = np.array(start_vec)
    target_vec = np.array(target_vec)
    start_vec = start_vec/np.linalg.norm(start_vec)
    target_vec = target_vec/np.linalg.norm(target_vec)
    if plane_normal is None:
        arg1 = np.dot(np.cross(start_vec, target_vec), np.cross(start_vec, target_vec))
    else:
        arg1 = np.dot(np.cross(start_vec, target_vec), plane_normal)
    arg2 = np.dot(start_vec, target_vec)
    return np.arctan2(arg1, arg2)
from scipy.spatial.transform import Rotation as R
world_frame_axis = input_rotation_object.apply(canonical_axis)
angle = signed_angle_between_vecs(canonical_axis, world_frame_axis)
axis_angle = np.cross(world_frame_axis, canonical_axis) * angle
C = R.from_rotvec(axis_angle)
transformed_world_frame_axis_to_canonical = C.apply(world_frame_axis)
I am trying to align world_frame_axis to canonical_axis by performing a rotation around the normal vector generated by the cross product between the two vectors, using the signed angle between the two axes.
However, this code does not work. If you start with some arbitrary rotation as input_rotation_object you will see that transformed_world_frame_axis_to_canonical does not match canonical_axis.
What am I doing wrong?
I am not a Python coder, so I might be wrong, but this line looked suspicious to me at first:
start_vec = start_vec/np.linalg.norm(start_vec)
From the name I expected np.linalg.norm to normalize the vector already. It does not: it returns the vector's length (a scalar), so dividing by it is the correct way to normalize, and this line and the similar ones can stay as they are.
Also, the atan2 operands do not look right to me. I would do it like this (in math notation):
a = start_vec / |start_vec | // normalized start
b = target_vec / |target_vec| // normalized end
u = a // normalized one axis of plane
v = cross(u ,b)
v = cross(v ,u)
v = v / |v| // normalized second axis of plane perpendicular to u
dx = dot(u,b) // target vector in 2D aligned to start
dy = dot(v,b)
ang = atan2(dy,dx)
Beware that ang might be negated (depending on your conventions); if so, either add a minus sign or reverse the operand order of the cross product from cross(u,v) to cross(v,u). You can also do a sanity check by comparing the result to the unsigned angle:
ang' = acos(dot(a,b))
In absolute value they should be the same (up to rounding error).
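The u/v construction above can be sketched in NumPy as follows. The sketch also shows one likely cause of the original problem, though I have not run the question's code: a rotation vector must be a unit axis multiplied by the angle, whereas the cross product of two unit vectors has length sin(angle), so using it unnormalized gives too small a rotation. The helper names (unit, angle_between, rotation_vector, rotate) are hypothetical, and rotate is a hand-written Rodrigues formula standing in for scipy's Rotation so that the example depends only on NumPy.

```python
import numpy as np

def unit(vec):
    """Return vec scaled to unit length."""
    vec = np.asarray(vec, dtype=float)
    return vec / np.linalg.norm(vec)

def angle_between(start_vec, target_vec):
    """Angle from start_vec to target_vec using the in-plane basis (u, v)."""
    a, b = unit(start_vec), unit(target_vec)
    u = a
    v = np.cross(np.cross(u, b), u)  # in the plane of a and b, perpendicular to u
    v_len = np.linalg.norm(v)
    if v_len < 1e-12:                # a and b are parallel or anti-parallel
        return 0.0 if np.dot(a, b) > 0 else np.pi
    v = v / v_len
    return float(np.arctan2(np.dot(v, b), np.dot(u, b)))

def rotation_vector(start_vec, target_vec):
    """Unit axis times angle for the rotation taking start_vec onto target_vec."""
    a, b = unit(start_vec), unit(target_vec)
    axis = np.cross(a, b)
    axis_len = np.linalg.norm(axis)
    if axis_len < 1e-12:
        raise ValueError("axis is undefined for parallel or anti-parallel vectors")
    return axis / axis_len * angle_between(a, b)

def rotate(vec, rotvec):
    """Rotate vec by the rotation vector rotvec (Rodrigues' formula)."""
    vec = np.asarray(vec, dtype=float)
    theta = np.linalg.norm(rotvec)
    if theta == 0.0:
        return vec.copy()
    k = rotvec / theta
    return (vec * np.cos(theta)
            + np.cross(k, vec) * np.sin(theta)
            + k * np.dot(k, vec) * (1.0 - np.cos(theta)))

start = [1.0, 2.0, 3.0]
target = [-2.0, 0.5, 1.0]
aligned = rotate(unit(start), rotation_vector(start, target))
```

With scipy available, R.from_rotvec(rotation_vector(start, target)).apply(unit(start)) should play the same role as rotate here.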

Add a constant variable to a cuda.FloatTensor

I have two questions:
1) How can I add/subtract a constant torch.FloatTensor of size 1 to all of the elements of a torch.FloatTensor of size 30?
2) How can I multiply each element of a torch.FloatTensor of size 30 by a random value (different or not for each)?
My code:
import torch
dtype = torch.cuda.FloatTensor
def main():
    pop, xmax, xmin = 30, 5, -5
    x = (xmax-xmin)*torch.rand(pop).type(dtype)+xmin
    y = torch.pow(x, 2)
    [miny, indexmin] = y.min(0)
    gxbest = x[indexmin]
    pxbest = x
    pybest = y
    v = torch.rand(pop)
    vnext = torch.rand()*v + torch.rand()*(pxbest - x) + torch.rand()*(gxbest - x)
What is the best way to do it? I think I should somehow convert gxbest into a torch.FloatTensor of size 30, but how can I do that?
I've tried to create a vector:
But it did not work. The multiplication is not working either.
RuntimeError: inconsistent tensor size
Thank you all for your help!
1) How can I add/subtract a constant torch.FloatTensor of size 1 to all of the elements of a torch.FloatTensor of size 30?
You can do it directly in PyTorch 0.2.
import torch
a = torch.randn(30)
b = torch.randn(1)
print(a-b)
If you get an error due to a size mismatch, you can make a small change as follows.
print(a-b.expand(a.size(0))) # to make both a and b tensors of the same shape
2) How can I multiply each element of a torch.FloatTensor of size 30 by a random value (different or not for each)?
In PyTorch 0.2, you can do it directly as well.
import torch
a = torch.randn(30)
b = torch.randn(1)
print(a*b)
If you get an error due to a size mismatch, do as follows.
print(a*b.expand(a.size(0)))
So, in your case you can simply change the size of the gxbest tensor from 1 to 30 as follows.
gxbest = gxbest.expand(30)
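The whole velocity update from the question can be sketched with broadcasting. The sketch below uses NumPy rather than PyTorch so that it runs without a GPU; PyTorch has followed the same broadcasting rules since 0.2, and torch.rand(pop) would play the role of rng.random(pop). Drawing a vector of size pop gives a different random factor per element, while drawing a single value gives the same factor for all. The function name velocity_update is hypothetical.

```python
import numpy as np

def velocity_update(pop=30, xmax=5.0, xmin=-5.0, seed=0):
    """One velocity update with per-element random factors."""
    rng = np.random.default_rng(seed)
    x = (xmax - xmin) * rng.random(pop) + xmin
    y = x ** 2
    gxbest = x[np.argmin(y)]   # a single value, broadcast against x below
    pxbest = x.copy()
    v = rng.random(pop)
    # One random factor per element for each of the three terms.
    r1, r2, r3 = rng.random(pop), rng.random(pop), rng.random(pop)
    vnext = r1 * v + r2 * (pxbest - x) + r3 * (gxbest - x)
    return x, gxbest, vnext

x, gxbest, vnext = velocity_update()
```

Because gxbest is a single value, gxbest - x already has size 30 and no explicit expand step is needed.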

Retrieve indices for rows of a PyTables table matching a condition using `Table.where()`

I need the indices (as numpy array) of the rows matching a given condition in a table (with billions of rows) and this is the line I currently use in my code, which works, but is quite ugly:
indices = np.array([row.nrow for row in the_table.where("foo == 42")])
It also takes half a minute, and I'm sure that the list creation is one of the reasons why.
I have not found an elegant solution yet and I'm still struggling with the PyTables docs, so does anybody know any magical way to do this more beautifully and maybe also a bit faster? Maybe there is a special query keyword I am missing, since I have the feeling that PyTables should be able to return the indices of the matched rows as a NumPy array.
tables.Table.get_where_list() gives the indices of the rows matching a given condition.
I read the PyTables source: where() is implemented in Cython, but it does not seem to be fast enough. Here is a more complex method that can speed things up.
Create some data first:
from tables import *
import numpy as np
class Particle(IsDescription):
    name = StringCol(16) # 16-character String
    idnumber = Int64Col() # Signed 64-bit integer
    ADCcount = UInt16Col() # Unsigned short integer
    TDCcount = UInt8Col() # unsigned byte
    grid_i = Int32Col() # 32-bit integer
    grid_j = Int32Col() # 32-bit integer
    pressure = Float32Col() # float (single-precision)
    energy = Float64Col() # double (double-precision)
h5file = open_file("tutorial1.h5", mode = "w", title = "Test file")
group = h5file.create_group("/", 'detector', 'Detector information')
table = h5file.create_table(group, 'readout', Particle, "Readout example")
particle = table.row
for i in range(1001000):
    particle['name'] = 'Particle: %6d' % (i)
    particle['TDCcount'] = i % 256
    particle['ADCcount'] = (i * 256) % (1 << 16)
    particle['grid_i'] = i
    particle['grid_j'] = 10 - i
    particle['pressure'] = float(i*i)
    particle['energy'] = float(particle['pressure'] ** 4)
    particle['idnumber'] = i * (2 ** 34)
    # Insert a new particle record
    particle.append()
table.flush() # write the buffered rows to disk
h5file.close()
Read the column in chunks, append the matching indices of each chunk to a list, and finally concatenate the list into one array. You can change the chunk size according to your available memory:
h5file = open_file("tutorial1.h5")
table = h5file.get_node("/detector/readout")
size = 10000
col = "energy"
buf = np.zeros(size, dtype=table.coldtypes[col])
res = []
for start in range(0, table.nrows, size):
    length = min(size, table.nrows - start)
    # read one chunk of the column into the preallocated buffer
    data = table.read(start, start + length, field=col, out=buf[:length])
    tmp = np.where(data > 10000)[0]
    tmp += start # convert chunk-local indices to table row numbers
    res.append(tmp)
res = np.concatenate(res)
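The index bookkeeping of the chunked scan described above can be checked in isolation on an in-memory array. This is only a sketch: a plain NumPy array stands in for the table column, and the helper name matching_indices and its parameters are hypothetical.

```python
import numpy as np

def matching_indices(column, predicate, chunk_size=10000):
    """Collect the global indices where predicate(chunk) is True, chunk by chunk."""
    parts = []
    for start in range(0, len(column), chunk_size):
        chunk = column[start:start + chunk_size]
        local = np.flatnonzero(predicate(chunk))  # indices relative to the chunk
        parts.append(local + start)               # shift to global row numbers
    if not parts:
        return np.empty(0, dtype=np.intp)
    return np.concatenate(parts)

column = np.arange(25) ** 2
indices = matching_indices(column, lambda c: c > 100, chunk_size=7)
```

A chunk size that does not divide the column length, as in the usage above, exercises the shorter final chunk.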

Switch on argument type

Using OpenSCAD, I have a module that, like cube(), has a size parameter that can be a single value or a vector of three values. Ultimately, I want a vector of three values.
If the caller passes a single value, I'd like all three values of the vector to be the same. I don't see anything in the language documentation about detecting the type of an argument. So I came up with this hack:
module my_cubelike_thing(size=1) {
    dimensions = concat(size, size, size);
    width = dimensions[0];
    length = dimensions[1];
    height = dimensions[2];
    // ... use width, length, and height ...
}
When size is a single value, the result of the concat is exactly what I want: three copies of the value.
When size is a three-value vector, the result of the concat is a nine-value vector, and my code just ignores the last six values.
It works but only because what I want in the single value case is to replicate the value. Is there a general way to switch on the argument type and do different things depending on that type?
If size can only be a single value or a vector with 3 values, the type can be determined with the help of the special value undef:
a = [3,5,8];
// a = 5;
if (a[0] == undef) {
    dimensions = concat(a, a, a);
    // do something
}
else {
    dimensions = a;
    // do something
}
But assignments are only valid in the scope in which they are defined (see the OpenSCAD documentation).
So a lot of code is needed in each subtree, and I would prefer to validate the type of size in an external script (e.g. Python 3) and write the OpenSCAD code with the variable assignments to a file, which can then be included in the OpenSCAD file. Here is my short test code:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import os
# size = 20
size = [20,15,10]
if type(size) == int:
    dimensions = [size, size, size]
elif type(size) == list:
    dimensions = size
# if other types possible
with open('variablen.scad', 'w') as wObj:
    for i, v in enumerate(['l', 'w', 'h']):
        wObj.write('{} = {};\n'.format(v, dimensions[i]))
os.system('openscad ./typeDef.scad')
content of variablen.scad:
l = 20;
w = 15;
h = 10;
and typeDef.scad can look like this:
include <./variablen.scad>;
module my_cubelike_thing() {
    linear_extrude(height=h, center=false) square([l, w]);
}
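The type check in the generator script above can be made slightly more general, so that floats and tuples are accepted and anything else is rejected with an error. This is a sketch only; the helper names normalize_size and scad_assignments are hypothetical, and the variable names l, w, h are taken from the script above.

```python
from numbers import Real

def normalize_size(size):
    """Return a list of three dimensions from a single number or a 3-element sequence."""
    if isinstance(size, Real):
        return [size, size, size]
    if isinstance(size, (list, tuple)) and len(size) == 3:
        return list(size)
    raise TypeError("size must be a number or a sequence of three numbers")

def scad_assignments(size, names=("l", "w", "h")):
    """Build the text of an OpenSCAD include file with one assignment per dimension."""
    dims = normalize_size(size)
    return "".join("{} = {};\n".format(name, value) for name, value in zip(names, dims))

text = scad_assignments([20, 15, 10])
```

The returned text can be written to variablen.scad exactly as in the script above.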

computing cumulative distribution of a conditional probability distribution

I have a conditional probability of z given m, p(z|m), whose coefficients are chosen so that the integral over z in the range [0, 1.5] and m in the range [18, 28] equals one.
def p(z,m):
    if (m<21.25):
        E = { 'ft':0.55, 'alpha': 2.99, 'z0':0.191, 'km':0.089, 'kt':0.25 }
        S = { 'ft':0.39, 'alpha': 2.15, 'z0':0.121, 'km':0.093, 'kt':-0.175 }
        I = { 'ft':0.06, 'alpha': 1.77, 'z0':0.045, 'km':0.096, 'kt':-0.9196 }
    else:
        E = { 'ft':0.25, 'alpha': 1.957, 'z0':0.321, 'km':0.196, 'kt':0.565 }
        S = { 'ft':0.61, 'alpha': 1.598, 'z0':0.291, 'km':0.167, 'kt':0.155 }
        I = { 'ft':0.14, 'alpha': 0.964, 'z0':0.170, 'km':0.129, 'kt':0.1759 }
    return value
I would like to draw samples from this distribution, so I made a grid of points in the (z, m) plane to estimate the cumulative distribution. The cumulative integral over m reaches one, but the cumulative integral over z does not reach one at the edge. Why does it not converge to one?
grid_m = np.linspace(18, 28, 1000)
grid_z = np.linspace(0, 1.5, 1000)
dz = np.diff(grid_z[:2])
# get cdf on grid, use cumtrapz
prob_zgm=np.empty((grid_z.shape[0], grid_m.shape[0]),float)
for i in range(grid_z.shape[0]):
    for j in range(grid_m.shape[0]):
        prob_zgm[i, j] = p(grid_z[i], grid_m[j])
pr = np.column_stack((np.zeros(prob_zgm.shape[0]),prob_zgm))
dm = np.diff(grid_m[:2])
cdf_zgm = integrate.cumtrapz(pr, dx=dm, axis=1)
cdf = integrate.cumtrapz(pr, dx=dz, axis=0)
Which assumption might cause this inconsistency, or am I computing something wrongly?
Update: The cumulative distribution cdf_zgm is shown as
For the rest, this is the approach I have used to invert the cumulative distribution:
# fix bounds of cdf_zgm
cdf_zgm[:, 0] = 0
cdf_zgm[:, -1] = 1
#Interpolate the data using a linear spline to "grid_q" samples
grid_q = np.linspace(0, 1, 200)
grid_qm = np.empty((len(grid_m), len(grid_q)), float)
for i in range(len(grid_m)):
    grid_qm[i] = interpolate.interp1d(cdf_zgm[i], grid_z)(grid_q)
# build 2d interpolation for z as function of (q,m)
z_interp = interpolate.interp2d(grid_q, grid_m, grid_qm)
#sample magnitude
r = dist_m.rvs(ng)
rvs_u = np.random.rand(ng)
rvs_z = np.asarray([z_interp(rvs_u[i], r[i]) for i in range(len(rvs_u))]).ravel()
Is it the right approach to fix the boundaries of the CDF like this?
I don't know what's wrong with that code. But here are a couple of different ideas to try:
(1) Just sum the array elements instead of trying to compute the numerical integrals. It is simpler that way. (Summing the array elements, times the grid spacing, is a rectangle rule approximation. On a uniform grid it differs from the trapezoidal rule only by half the sum of the two endpoint values times the spacing, so the two agree closely when the density is near zero at both ends.)
(2) Instead of trying to create a whole 2-d array at once, write a function which creates just a 1-d slice of p(z | m) for a given value of m. Then just sum those elements to get the cumulative probability.
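Suggestion (2) can be sketched as follows. Since the body of p(z, m) is not fully shown in the question, toy_density below is a purely hypothetical stand-in and not the density from the question; the other names (cdf_slice, sample_z) are hypothetical as well. Each slice is normalized by its own sum, which reflects the fact that a conditional density p(z | m) has to integrate to one over z for every fixed m. If the coefficients in the question instead normalize the integral over both z and m, that may be why the cumulative integral over z does not reach one.

```python
import numpy as np

def toy_density(z, m):
    """Hypothetical stand-in for p(z | m); NOT the density from the question."""
    z0 = 0.3 + 0.03 * (m - 18.0)
    return z ** 2 * np.exp(-(z / z0) ** 1.5)

def cdf_slice(m, grid_z, density=toy_density):
    """Cumulative sum of one slice p(z | m) on grid_z, normalized to end at one."""
    pdf = density(grid_z, m)
    cdf = np.cumsum(pdf)
    return cdf / cdf[-1]

def sample_z(m, n, rng, grid_z, density=toy_density):
    """Draw n values of z for a fixed m by inverting the tabulated CDF."""
    cdf = cdf_slice(m, grid_z, density)
    u = rng.random(n)                  # uniform numbers in [0, 1)
    return np.interp(u, cdf, grid_z)   # linear interpolation of the inverse CDF

grid_z = np.linspace(0.0, 1.5, 1000)
rng = np.random.default_rng(0)
cdf = cdf_slice(22.0, grid_z)
samples = sample_z(22.0, 5, rng, grid_z)
```

Working one slice at a time avoids building the full two-dimensional grid, and the per-slice normalization makes it unnecessary to overwrite the CDF boundaries by hand.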