BigQuery: Xoring Elements of Two Arrays - sql

I have two arrays:
a = [1, 2, 3, 4]
b = [11, 22, 33, 44]
How can I XOR the respective elements of the two arrays to get a result such as
result = [10, 20, 34, 40], i.e. 1^11 = 10, 2^22 = 20, and so on?
I have tried BIT_XOR(x), but it takes one array and XORs all of its elements together.
SELECT BIT_XOR(x) AS bit_xor FROM UNNEST([1, 2, 3, 4]) AS x;
Thanks

You can "zip" the two arrays together:
SELECT
  ARRAY(
    SELECT x ^ b[OFFSET(off)]
    FROM UNNEST(a) AS x WITH OFFSET off) AS bit_xor
FROM dataset.table
This combines the elements based on their offset in the two arrays.
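Outside of BigQuery, you can sanity-check the expected result with a short Python sketch of the same zip-and-XOR idea (plain lists, standard library only):
a = [1, 2, 3, 4]
b = [11, 22, 33, 44]

# Pair elements by position and XOR each pair
result = [x ^ y for x, y in zip(a, b)]
print(result)  # [10, 20, 34, 40]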

Related

Numpy subarrays and relative indexing

I have been searching for a standard method to create a subarray using relative indexes. Take the following array into consideration:
>>> m = np.arange(25).reshape([5, 5])
>>> m
array([[ 0,  1,  2,  3,  4],
       [ 5,  6,  7,  8,  9],
       [10, 11, 12, 13, 14],
       [15, 16, 17, 18, 19],
       [20, 21, 22, 23, 24]])
I want to access the 3x3 matrix at a specific array position, for example [2,2]:
>>> x, y = 2, 2
>>> m[slice(x-1, x+2), slice(y-1, y+2)]
array([[ 6,  7,  8],
       [11, 12, 13],
       [16, 17, 18]])
For the above, I would like something like m.subarray(pos=[2,2], shape=[3,3]).
I want to sample an ndarray of n dimensions at a specific position, which might change.
I did not want to use a loop, as it might be inefficient. The SciPy functions correlate and convolve do this very efficiently, but for all positions; I am interested in sampling only one.
The best answer would also solve the issues at the edges; in my case I would like, for example, wrap mode:
(a b c d | a b c d | a b c d)
--------------------EDITED-----------------------------
Based on the answer from @Carlos Horn, I could create the following function.
from math import ceil, floor
import numpy as np

def cell_neighbours(array, index, shape):
    # Pad with wrap mode so windows centred near the edges wrap around
    pads = [(floor(dim / 2), ceil(dim / 2)) for dim in shape]
    array = np.pad(array, pads, "wrap")
    views = np.lib.stride_tricks.sliding_window_view
    return views(array, shape)[tuple(index)]
The last concern might be speed. From the docs: "For many applications using a sliding window view can be convenient, but potentially very slow. Often specialized solutions exist."
From here it might be easier to get a faster solution.
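As a quick check, here is a usage sketch of that function on the 5x5 example above (a sketch using the corrected standalone version, i.e. operating on the passed-in array rather than self.configuration):
import numpy as np

m = np.arange(25).reshape(5, 5)

# Interior position: the same 3x3 block as the manual slicing m[1:4, 1:4]
print(cell_neighbours(m, (2, 2), (3, 3)))
# [[ 6  7  8]
#  [11 12 13]
#  [16 17 18]]

# Edge position: neighbours wrap around, e.g. row -1 becomes row 4
print(cell_neighbours(m, (0, 0), (3, 3)))
# [[24 20 21]
#  [ 4  0  1]
#  [ 9  5  6]]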
You could build a view of 3x3 matrices into the array as follows:
import numpy as np
m = np.arange(25).reshape(5,5)
m3x3view = np.lib.stride_tricks.sliding_window_view(m, (3,3))
Note that it shifts your indexing by half the window size, meaning
x_view = x - 3//2
y_view = y - 3//2
print(m3x3view[x_view,y_view]) # gives your result
In case a copy operation is fine, you could use:
mpad = np.pad(m, 1, mode="wrap")
mpad3x3view = np.lib.stride_tricks.sliding_window_view(mpad, (3,3))
print(mpad3x3view[x % 5,y % 5])
to use arbitrary x, y integer values.
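With the modulo, indices outside the 0-4 range also work; for example (a small check reusing m and mpad3x3view from above):
x, y = 5, -1
# 5 % 5 == 0 and -1 % 5 == 4, so this is the 3x3 neighbourhood
# centred on m[0, 4], with rows and columns wrapping at the edges
print(mpad3x3view[x % 5, y % 5])
# [[23 24 20]
#  [ 3  4  0]
#  [ 8  9  5]]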

How can we convert two pandas DataFrame columns to a Python list after merging the two columns vertically?

I have a dataframe...
print(df)
Name  ae_rank  adf  de_rank
   a        1   lk        4
   b        2   lp        5
   c        3   yi        6
How can I concatenate the ae_rank and de_rank columns vertically and convert them into a Python list?
Expectation...
my_list = [1, 2, 3, 4, 5, 6]
Simplest is to join the lists:
my_list = df['ae_rank'].tolist() + df['de_rank'].tolist()
If you need to reshape the DataFrame, use DataFrame.melt:
my_list = df.melt(['Name','adf'])['value'].tolist()
print(my_list)
[1, 2, 3, 4, 5, 6]
Another option is
my_list = df[['ae_rank', 'de_rank']].T.stack().tolist()
#[1, 2, 3, 4, 5, 6]
Most efficiently, use filter to select the columns whose names include "_rank", then use the underlying numpy array with ravel in 'F' order (column-major order):
my_list = df.filter(like='_rank').to_numpy().ravel('F').tolist()
output: [1, 2, 3, 4, 5, 6]
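A self-contained sketch that rebuilds the example DataFrame and checks that all of the approaches above produce the same list (column values taken from the question):
import pandas as pd

df = pd.DataFrame({
    "Name": ["a", "b", "c"],
    "ae_rank": [1, 2, 3],
    "adf": ["lk", "lp", "yi"],
    "de_rank": [4, 5, 6],
})

expected = [1, 2, 3, 4, 5, 6]
assert df["ae_rank"].tolist() + df["de_rank"].tolist() == expected
assert df.melt(["Name", "adf"])["value"].tolist() == expected
assert df[["ae_rank", "de_rank"]].T.stack().tolist() == expected
assert df.filter(like="_rank").to_numpy().ravel("F").tolist() == expected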

How to compute how many elements in three arrays in Python are equal to some value at the same position across the arrays?

I have three numpy arrays
a = [0, 1, 2, 3, 4]
b = [5, 1, 7, 3, 9]
c = [10, 1, 3, 3, 1]
and I want to compute how many elements in a, b, c are equal to 3 at the same position, so for this example the answer would be 3.
An elegant solution is to use Numpy functions, like:
np.count_nonzero(np.vstack([a, b, c])==3, axis=0).max()
Details:
np.vstack([a, b, c]) - generate an array with 3 rows, composed of your 3 source arrays.
np.count_nonzero(...==3, axis=0) - count how many values of 3 occur in each column. For your data the result is array([0, 0, 1, 3, 0], dtype=int64).
max() - take the greatest value, in your case 3.
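A minimal runnable sketch of the above, using the arrays from the question:
import numpy as np

a = np.array([0, 1, 2, 3, 4])
b = np.array([5, 1, 7, 3, 9])
c = np.array([10, 1, 3, 3, 1])

stacked = np.vstack([a, b, c])                        # shape (3, 5)
per_column = np.count_nonzero(stacked == 3, axis=0)   # matches per position
print(per_column)        # [0 0 1 3 0]
print(per_column.max())  # 3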

I am trying to array index a 4 dimensional numpy array.

I have a 4-dimensional array -- say a = numpy.zeros((40, 40, 4, 1000)).
I also have an index array -- say b = np.arange(35).
I am looking to make an array by doing something like c = a[b, b, 3, 999], where the result would be an array of shape (35, 35). Would appreciate any thoughts on what the right way to do this is. Thank you. Neela.
Since b=np.arange(35) is just the first 35 indices, use slices instead:
c = a[:35,:35,3,999]
If the values in b are not contiguous, then you will need to adjust its shape
c = a[b[:,None], b[None,:], 3, 999]
e.g.
In [754]: a=np.arange(3*4*5).reshape(3,4,5)
In [755]: b=np.array([2,0,1])
In [756]: a[b[:,None],b[None,:],3]
Out[756]:
array([[53, 43, 48],
       [13,  3,  8],
       [33, 23, 28]])
b[:,None] is a (3,1) array, b[None,:] a (1,3), together they broadcast to (3,3) arrays.
You may need to read up on broadcasting and advanced indexing.
More explicitly this indexing is:
a[[[2],[0],[1]], [[2,0,1]], 3]
np.ix_ is a handy tool for generating indexes like this:
In [795]: I,J = np.ix_(b,b)
In [796]: I
Out[796]:
array([[2],
       [0],
       [1]])
In [797]: J
Out[797]: array([[2, 0, 1]])
In [798]: a[I,J,3]
Out[798]:
array([[53, 43, 48],
       [13,  3,  8],
       [33, 23, 28]])
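Putting it together for the original 4-D case, a small sketch with the shapes from the question (the values are just arbitrary fillers):
import numpy as np

a = np.arange(40 * 40 * 4 * 1000).reshape(40, 40, 4, 1000)
b = np.arange(35)

# Slice version: works here because b is the first 35 contiguous indices
c_slice = a[:35, :35, 3, 999]

# Advanced-indexing version: works for any integer array b
c_adv = a[b[:, None], b[None, :], 3, 999]
# Equivalently: I, J = np.ix_(b, b); a[I, J, 3, 999]

print(c_slice.shape, c_adv.shape)      # (35, 35) (35, 35)
print(np.array_equal(c_slice, c_adv))  # True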

Numpy Indexing Behavior

I am having a lot of trouble understanding numpy indexing for multidimensional arrays. In this example that I am working with, let's say that I have a 2D array, A, which is 100x10. Then I have another array, B, which is a 100x1 1D array of values between 0-9 (indices into A). In MATLAB, I would use A(sub2ind(size(A), (1:size(A,1))', B)) to return, for each row of A, the value at the index stored in the corresponding row of B.
So, as a test case, let's say I have this:
A = np.random.rand(100,10)
B = np.int32(np.floor(np.random.rand(100)*10))
If I print their shapes, I get:
print A.shape returns (100L, 10L)
print B.shape returns (100L,)
When I try to index into A using B naively (incorrectly)
Test1 = A[:,B]
print Test1.shape returns (100L, 100L)
but if I do
Test2 = A[range(A.shape[0]),B]
print Test2.shape returns (100L,)
which is what I want. I'm having trouble understanding the distinction being made here. In my mind, A[:,5] and A[range(A.shape[0]),5] should return the same thing, but that doesn't seem to be the case here. How is : different from using range(sizeArray), which just creates an array of indices from 0 to sizeArray-1, when used as an index?
Let's look at a simple array:
In [654]: X=np.arange(12).reshape(3,4)
In [655]: X
Out[655]:
array([[ 0,  1,  2,  3],
       [ 4,  5,  6,  7],
       [ 8,  9, 10, 11]])
With the slice we can pick 3 columns of X, in any order (and even repeated). In other words, take all the rows, but selected columns.
In [656]: X[:,[3,2,1]]
Out[656]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
If instead I use a list (or array) of 3 values, it pairs them up with the column values, effectively picking 3 values, X[0,3],X[1,2],X[2,1]:
In [657]: X[[0,1,2],[3,2,1]]
Out[657]: array([3, 6, 9])
If instead I gave it a column vector to index rows, I get the same thing as with the slice:
In [659]: X[[[0],[1],[2]],[3,2,1]]
Out[659]:
array([[ 3,  2,  1],
       [ 7,  6,  5],
       [11, 10,  9]])
This amounts to picking 9 individual values, as generated by broadcasting:
In [663]: np.broadcast_arrays(np.arange(3)[:,None],np.array([3,2,1]))
Out[663]:
[array([[0, 0, 0],
        [1, 1, 1],
        [2, 2, 2]]),
 array([[3, 2, 1],
        [3, 2, 1],
        [3, 2, 1]])]
numpy indexing can be confusing. But a good starting point is this page: http://docs.scipy.org/doc/numpy/reference/arrays.indexing.html
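Applied to the question's A and B, a minimal sketch of the row-wise pick (random data, shapes as in the question):
import numpy as np

A = np.random.rand(100, 10)
B = np.random.randint(0, 10, size=100)  # one column index per row

# Pair row i with column B[i]: picks A[0, B[0]], A[1, B[1]], ...
picked = A[np.arange(A.shape[0]), B]
print(picked.shape)             # (100,)
print(picked[0] == A[0, B[0]])  # True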