BigQuery: Xoring Elements of Two Arrays - sql

I have two arrays:
a = [1, 2, 3, 4]
b = [11, 22, 33, 44]
How can I XOR the respective elements of the two arrays to get a result such as
result = [10, 20, 34, 40], i.e. 1^11 = 10, 2^22 = 20, and so on?
I have tried BIT_XOR(x), but it takes one array and XORs all of its elements together.
SELECT BIT_XOR(x) AS bit_xor FROM UNNEST([1, 2, 3, 4]) AS x;
Thanks

You can "zip" the two arrays together:
SELECT
  ARRAY(
    SELECT x ^ b[OFFSET(off)]
    FROM UNNEST(a) AS x WITH OFFSET off) AS bit_xor
FROM dataset.table
This combines the elements based on their offset in the two arrays.
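Outside of BigQuery, you can sanity-check the expected result with a short Python sketch of the same zip-and-XOR idea (plain lists, standard library only):
a = [1, 2, 3, 4]
b = [11, 22, 33, 44]

# Pair elements by position and XOR each pair
result = [x ^ y for x, y in zip(a, b)]
print(result)  # [10, 20, 34, 40]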

Related

Numpy subarrays and relative indexing

I have been searching for a standard method to create a subarray using relative indexes. Take the following array into consideration:
>>> m = np.arange(25).reshape([5, 5])
>>> m
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
I want to access the 3x3 matrix at a specific array position, for example [2,2]:
>>> x, y = 2, 2
>>> m[slice(x-1, x+2), slice(y-1, y+2)]
array([[ 6,  7,  8],
       [11, 12, 13],
       [16, 17, 18]])
For the above, I would like something like m.subarray(pos=[2,2], shape=[3,3]).
I want to sample an ndarray of n dimensions at a specific position, which might change.
I did not want to use a loop, as it might be inefficient. The SciPy functions correlate and convolve do this very efficiently, but for all positions; I am interested in sampling only one.
The best answer would also solve the issues at the edges; in my case I would like, for example, wrap mode:
(a b c d | a b c d | a b c d)
--------------------EDITED-----------------------------
Based on the answer from @Carlos Horn, I could create the following function.
from math import ceil, floor
import numpy as np

def cell_neighbours(array, index, shape):
    # Pad with wrap mode so windows centred near the edges wrap around
    pads = [(floor(dim / 2), ceil(dim / 2)) for dim in shape]
    array = np.pad(array, pads, "wrap")
    views = np.lib.stride_tricks.sliding_window_view
    return views(array, shape)[tuple(index)]
The last concern might be speed. From the docs: "For many applications using a sliding window view can be convenient, but potentially very slow. Often specialized solutions exist."
From here it might be easier to get a faster solution.
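As a quick check, here is a usage sketch of that function on the 5x5 example above (a sketch using the corrected standalone version, i.e. operating on the passed-in array rather than self.configuration):
import numpy as np

m = np.arange(25).reshape(5, 5)

# Interior position: the same 3x3 block as the manual slicing m[1:4, 1:4]
print(cell_neighbours(m, (2, 2), (3, 3)))
# [[ 6  7  8]
#  [11 12 13]
#  [16 17 18]]

# Edge position: neighbours wrap around, e.g. row -1 becomes row 4
print(cell_neighbours(m, (0, 0), (3, 3)))
# [[24 20 21]
#  [ 4  0  1]
#  [ 9  5  6]]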
You could build a view of 3x3 matrices into the array as follows:
import numpy as np
m = np.arange(25).reshape(5,5)
m3x3view = np.lib.stride_tricks.sliding_window_view(m, (3,3))
Note that it shifts your indexing by half the window size, meaning
x_view = x - 3//2
y_view = y - 3//2
print(m3x3view[x_view,y_view]) # gives your result
In case a copy operation is fine, you could use:
mpad = np.pad(m, 1, mode="wrap")
mpad3x3view = np.lib.stride_tricks.sliding_window_view(mpad, (3,3))
print(mpad3x3view[x % 5,y % 5])
to use arbitrary x, y integer values.
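With the modulo, indices outside the 0-4 range also work; for example (a small check reusing m and mpad3x3view from above):
x, y = 5, -1
# 5 % 5 == 0 and -1 % 5 == 4, so this is the 3x3 neighbourhood
# centred on m[0, 4], with rows and columns wrapping at the edges
print(mpad3x3view[x % 5, y % 5])
# [[23 24 20]
#  [ 3  4  0]
#  [ 8  9  5]]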

How can we convert two pandas DataFrame columns to a Python list after merging the two columns vertically?

I have a dataframe...
print(df)
Name  ae_rank  adf  de_rank
   a        1   lk        4
   b        2   lp        5
   c        3   yi        6
How can I concatenate the ae_rank and de_rank columns vertically and convert them into a Python list?
Expectation...
my_list = [1, 2, 3, 4, 5, 6]
Simplest is to join the lists:
my_list = df['ae_rank'].tolist() + df['de_rank'].tolist()
If you need to reshape the DataFrame, use DataFrame.melt:
my_list = df.melt(['Name','adf'])['value'].tolist()
print(my_list)
[1, 2, 3, 4, 5, 6]
Another option is
my_list = df[['ae_rank', 'de_rank']].T.stack().tolist()
#[1, 2, 3, 4, 5, 6]
Most efficiently, use filter to select the columns whose names include "_rank", then use the underlying numpy array with ravel in 'F' order (column-major order):
my_list = df.filter(like='_rank').to_numpy().ravel('F').tolist()
output: [1, 2, 3, 4, 5, 6]
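A self-contained sketch that rebuilds the example DataFrame and checks that all of the approaches above produce the same list (column values taken from the question):
import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "ae_rank": [1, 2, 3],
    "adf": ["lk", "lp", "yi"],
    "de_rank": [4, 5, 6],
})

expected = [1, 2, 3, 4, 5, 6]
assert df["ae_rank"].tolist() + df["de_rank"].tolist() == expected
assert df.melt(["Name", "adf"])["value"].tolist() == expected
assert df[["ae_rank", "de_rank"]].T.stack().tolist() == expected
assert df.filter(like="_rank").to_numpy().ravel("F").tolist() == expected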

How to compute how many elements in three arrays in Python are equal to some value at the same position across the arrays?

I have three numpy arrays
a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]
and I want to compute how many elements in a, b, c are equal to 3 at the same position, so for this example the answer would be 3.
An elegant solution is to use Numpy functions, like:
np.count_nonzero(np.vstack([a, b, c])==3, axis=0).max()
Details:
np.vstack([a, b, c]) - generate an array with 3 rows, composed of your 3 source arrays.
np.count_nonzero(...==3, axis=0) - count how many values of 3 occur in each column. For your data the result is array([0, 0, 1, 3, 0], dtype=int64).
max() - take the greatest value, in your case 3.
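A minimal runnable sketch of the above, using the arrays from the question:
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([5, 1, 7, 3, 9])
c = np.array([10, 1, 3, 3, 1])

stacked = np.vstack([a, b, c])                        # shape (3, 5)
per_column = np.count_nonzero(stacked == 3, axis=0)   # matches per position
print(per_column)        # [0 0 1 3 0]
print(per_column.max())  # 3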

I am trying to array index a 4 dimensional numpy array.

I have a 4-dimensional array -- say a = numpy.zeros((40, 40, 4, 1000)).
I also have an index array -- say b = np.arange(35).
I am looking to make an array by doing something like c = a[b, b, 3, 999], where the result would be an array of shape (35, 35). Would appreciate any thoughts on what the right way to do this is. Thank you. Neela.
Since b=np.arange(35) is just the first 35 indices, use slices instead:
c = a[:35,:35,3,999]
If the values in b are not contiguous, then you will need to adjust its shape
c = a[b[:,None], b[None,:], 3, 999]
e.g.
In [754]: a=np.arange(3*4*5).reshape(3,4,5)
In [755]: b=np.array([2,0,1])
In [756]: a[b[:,None],b[None,:],3]
Out[756]:
array([[53, 43, 48],
       [13,  3,  8],
       [33, 23, 28]])
b[:,None] is a (3,1) array, b[None,:] a (1,3), together they broadcast to (3,3) arrays.
You may need to read up on broadcasting and advanced indexing.
More explicitly this indexing is:
a[[[2],[0],[1]], [[2,0,1]], 3]
np.ix_ is a handy tool for generating indexes like this:
In [795]: I,J = np.ix_(b,b)
In [796]: I
Out[796]:
array([[2],
       [0],
       [1]])
In [797]: J
Out[797]: array([[2, 0, 1]])
In [798]: a[I,J,3]
Out[798]:
array([[53, 43, 48],
       [13,  3,  8],
       [33, 23, 28]])
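Putting it together for the original 4-D case, a small sketch with the shapes from the question (the values are just arbitrary fillers):
import numpy as np

a = np.arange(40 * 40 * 4 * 1000).reshape(40, 40, 4, 1000)
b = np.arange(35)

# Slice version: works here because b is the first 35 contiguous indices
c_slice = a[:35, :35, 3, 999]

# Advanced-indexing version: works for any integer array b
c_adv = a[b[:, None], b[None, :], 3, 999]
# Equivalently: I, J = np.ix_(b, b); a[I, J, 3, 999]

print(c_slice.shape, c_adv.shape)      # (35, 35) (35, 35)
print(np.array_equal(c_slice, c_adv))  # True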

Numpy Indexing Behavior

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In this example that I am working with, let's say that I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 100x1 1D array of values between 0-9 (indices into A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, but that doesn't seem to be the case here. How is : different from using range(sizeArray), which just creates an array of indices from 0 to sizeArray-1, when used as an index?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
If instead I use a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3],X[1,2],X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I gave it a column vector to index rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]),
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
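Applied to the question's A and B, a minimal sketch of the row-wise pick (random data, shapes as in the question):
import numpy as np

A = np.random.rand(100, 10)
B = np.random.randint(0, 10, size=100)  # one column index per row

# Pair row i with column B[i]: picks A[0, B[0]], A[1, B[1]], ...
picked = A[np.arange(A.shape[0]), B]
print(picked.shape)             # (100,)
print(picked[0] == A[0, B[0]])  # True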