How to index ndarray in tuple using boolean with Numpy Python?


I would like to index the ndarrays in a tuple using a boolean mask, as below:
import numpy as np
n_max = 5
list_no = np.arange(0, n_max)
lateral = np.tril_indices(n_max, -1)
mask = np.diff(lateral[0].astype(int))
mask[-1] = 1
Expected = lateral[mask != 0]
However, when executing the line Expected = lateral[mask != 0], Python raises an error:
TypeError: only integer scalar arrays can be converted to a scalar index
The expected value of Expected is a tuple of two arrays:
(array([1, 2, 3, 4]), array([0, 1, 2, 3]))
May I know where I went wrong?

The mask and lateral[0] have different sizes. Since np.diff computes the difference between consecutive elements, mask has n-1 elements while lateral[0] has n. You might want to append an element to the mask array instead of overwriting its last entry.
Also, since lateral is a tuple, and a tuple cannot be indexed with a boolean array, you need to index into the tuple first to get each ndarray, then apply the mask.
You might need something like this:
import numpy as np
n_max = 5
list_no = np.arange(0, n_max)
lateral = np.tril_indices(n_max, -1)
mask = np.diff(lateral[0].astype(int))
mask = np.append(mask, 1)
expected_0 = lateral[0][mask != 0]
print(expected_0)
expected_1 = lateral[1][mask != 0]
print(expected_1)
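The same fix can be written once for every array in the tuple. This is a minimal sketch (my own variant, not part of the original answer): it builds the full-length mask, then applies it to each array with a generator expression, so the result keeps the tuple-of-arrays form that np.tril_indices returns.

```python
import numpy as np

n_max = 5
lateral = np.tril_indices(n_max, -1)  # tuple (row_indices, col_indices)

# np.diff returns one element fewer than its input; appending a 1 makes the
# mask the same length as lateral[0] and keeps the last index, which always
# ends the final row.
mask = np.append(np.diff(lateral[0]), 1) != 0

# A tuple cannot be indexed with a boolean array, so mask each array in turn.
expected = tuple(arr[mask] for arr in lateral)
print(expected)
```

Here mask is True at the last index of each row of the lower triangle, so expected holds the indices of the first subdiagonal.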

Related

Pandas create new column based on groupby and apply lambda if statement

I have an issue with groupby and apply:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'b', 'b'], 'B': np.r_[1:8]})
I want to create a column C that, within each group, takes the value 1 if the absolute z-score of B is greater than 2, and 0 otherwise. The code:
from scipy import stats
df['C'] = df.groupby('A').apply(lambda x: 1 if np.abs(stats.zscore(x['B'], nan_policy='omit')) > 2 else 0, axis=1)
However, the code fails and I cannot figure out the issue.

Use GroupBy.transform with a lambda function, then compare with 2 and convert True/False to 1/0 by casting to integers:
from scipy import stats
s = df.groupby('A')['B'].transform(lambda x: np.abs(stats.zscore(x, nan_policy='omit')))
df['C'] = (s > 2).astype(int)
Or use numpy.where:
df['C'] = np.where(s > 2, 1, 0)
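The error in your solution happens per group: the lambda receives a whole group, so the comparison produces an array rather than a single boolean: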
from scipy import stats
df = df.groupby('A')['B'].apply(lambda x: 1 if np.abs(stats.zscore(x, nan_policy='omit')) > 2 else 0)
ValueError: The truth value of an array with more than one element is ambiguous. Use a.any() or a.all()
If you check the gotchas in the pandas docs:
pandas follows the NumPy convention of raising an error when you try to convert something to a bool. This happens in an if-statement or when using the boolean operations: and, or, and not.
So if you use one of the solutions instead of if-else:
from scipy import stats
df = df.groupby('A')['B'].apply(lambda x: (np.abs(stats.zscore(x, nan_policy='omit')) > 2).astype(int))
print (df)
A
a [0, 0, 0]
b [0, 0, 0, 0]
Name: B, dtype: object
but then the result needs to be converted to a column; groupby.transform is used to avoid this problem.
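A self-contained sketch of the transform approach. To keep it runnable without SciPy, the z-score is computed by hand with ddof=0, which matches the default of scipy.stats.zscore; pandas mean and std skip NaN by default, which plays the role of nan_policy='omit'. The helper name abs_zscore is hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': ['a', 'a', 'a', 'b', 'b', 'b', 'b'], 'B': np.r_[1:8]})

def abs_zscore(x):
    # Population z-score (ddof=0) of each value within its group.
    return ((x - x.mean()) / x.std(ddof=0)).abs()

# transform returns a Series aligned with df's original index,
# so it can be assigned directly as a new column.
s = df.groupby('A')['B'].transform(abs_zscore)
df['C'] = (s > 2).astype(int)
print(df)
```

With this data no value is more than 2 standard deviations from its group mean, so C is all zeros; a group such as [1, 1, 1, 1, 1, 1, 100] would get a 1 for the outlier.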

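You can use groupby + apply with a function that finds the z-scores of each item in each group, explode the resulting lists, use gt to create a boolean Series, and convert it to dtype int. This relies on df already being sorted by A with a default RangeIndex, because explode(ignore_index=True) renumbers the exploded values from 0 in group order: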
df['C'] = df.groupby('A')['B'].apply(lambda x: stats.zscore(x, nan_policy='omit')).explode(ignore_index=True).abs().gt(2).astype(int)
Output:
   A  B  C
0  a  1  0
1  a  2  0
2  a  3  0
3  b  4  0
4  b  5  0
5  b  6  0
6  b  7  0

Randomly select items from two equally sized tensors

Assume that we have two equally sized tensors of size batch_size x 1. For each index in the batch dimension, we want to choose randomly between the two tensors. My solution was to create an indices tensor of size batch_size containing random 0 or 1 values and use it to index_select from the concatenation of the two tensors. However, to do so I had to view that concatenated tensor, and the solution ended up being quite "ugly":
import torch
bs = 8
a = torch.zeros(bs, 1)
print("a size", a.size())
b = torch.ones(bs, 1)
c = torch.cat([a, b], dim=-1)
print(c)
print("c size", c.size())
# create bs number of random 0 and 1's
indices = torch.randint(0, 2, [bs])
print("idxs size", indices.size())
print("idxs", indices)
# use `indices` to slice the `cat`ted tensor
d = c.view(1, -1).index_select(-1, indices).view(-1, 1)
print("d size", d.size())
print(d)
I am wondering whether there is a prettier and, more importantly, more efficient solution.

Posting two answers that I got over at the PyTorch forums. The first flattens the concatenated tensor and indexes into it:
import torch
bs = 8
a = torch.zeros(bs, 1)
b = torch.ones(bs, 1)
c = torch.cat([a, b], dim=-1)
choices_flat = c.view(-1)
# With replacement:
# index = torch.randint(choices_flat.numel(), (bs,))
# Without replacement (replace=False):
index = torch.randperm(choices_flat.numel())[:bs]
select = choices_flat[index]
print(select)
The second concatenates along the batch dimension and indexes rows directly:
import torch
bs = 8
a = torch.zeros(bs, 1)
print("a size", a.size())
b = torch.ones(bs, 1)
idx = torch.randint(2 * bs, (bs,))
d = torch.cat([a, b])[idx] # [bs, 1]
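If row i of the result must come from row i of either a or b (the per-index choice described in the question), here is a sketch using torch.where or Tensor.gather; neither approach comes from the forum answers. Note that the sampling-based answers above draw from the whole pool of 2 * bs values, and in the question's version indices only take the values 0 and 1, so index_select on the (1, 2 * bs) view only ever reads c[0, 0] or c[0, 1]. With all-zero a and all-one b, neither effect shows up in the printed values.

```python
import torch

bs = 8
a = torch.zeros(bs, 1)
b = torch.ones(bs, 1)

# Option 1: one boolean coin flip per row; torch.where picks elementwise,
# so row i of d always comes from row i of a or row i of b.
take_a = torch.rand(bs, 1) < 0.5
d = torch.where(take_a, a, b)          # shape [bs, 1]

# Option 2: keep the concatenation from the question and gather one
# column per row (0 -> a, 1 -> b); gather needs an int64 index tensor.
c = torch.cat([a, b], dim=-1)          # shape [bs, 2]
idx = torch.randint(0, 2, (bs, 1))     # int64 by default
d2 = c.gather(1, idx)                  # shape [bs, 1]

print(d.view(-1))
print(d2.view(-1))
```

Both options avoid the flatten-and-view round trip and do a single vectorized selection.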

How to concatenate two tensors with intervals in tensorflow?

I want to concatenate two tensors in a checkerboard pattern in TensorFlow 2, as in the examples below:
example 1:
a = [[1,1],[1,1]]
b = [[0,0],[0,0]]
concated_a_and_b = [[1,0,1,0],[0,1,0,1]]
example 2:
a = [[1,1,1],[1,1,1],[1,1,1]]
b = [[0,0,0],[0,0,0],[0,0,0]]
concated_a_and_b = [[1,0,1,0,1,0],[0,1,0,1,0,1],[1,0,1,0,1,0]]
Is there a decent way in tensorflow2 to concatenate them like this?
A bit of background for this:
I first split a tensor c with a checkerboard mask into two halves, a and b. After some transformation, I have to concatenate them back into the original shape and order.
What I mean by checkerboard-ly: [image not available]

Step 1: Generate a matrix with alternating values
You can do this by first concatenating into [1, 0] pairs, and then by applying a final reshape.
Step 2: Reverse some rows
I split the matrix into two parts, reverse the second part, and then rebuild the full matrix by picking alternately from the first and second parts.
Code sample:
import math
import numpy as np
import tensorflow as tf
a = tf.ones(shape=(3, 4))
b = tf.zeros(shape=(3, 4))
x = tf.expand_dims(a, axis=-1)
y = tf.expand_dims(b, axis=-1)
paired_ones_zeros = tf.concat([x, y], axis=-1)
alternated_values = tf.reshape(paired_ones_zeros, [-1, a.shape[1] + b.shape[1]])
num_samples = alternated_values.shape[0]
middle = math.ceil(num_samples / 2)
is_num_samples_odd = middle * 2 != num_samples
# Gather first part of the matrix, don't do anything to it
first_elements = tf.gather_nd(alternated_values, [[index] for index in range(middle)])
# Gather second part of the matrix and reverse its elements
second_elements = tf.reverse(tf.gather_nd(alternated_values, [[index] for index in range(middle, num_samples)]), axis=[1])
# Pick alternatively between first and second part of the matrix
indices = np.concatenate([[[index], [index + middle]] for index in range(middle)], axis=0)
if is_num_samples_odd:
    indices = indices[:-1]
output = tf.gather_nd(
    tf.concat([first_elements, second_elements], axis=0),
    indices
)
print(output)

I know this is not an efficient approach, since it loops over every element in Python, but it solves the problem above:
def concat(tf1, tf2):
    result = []
    for index, (tf_item1, tf_item2) in enumerate(zip(tf1, tf2)):
        item = []
        for subitem1, subitem2 in zip(tf_item1, tf_item2):
            # Even rows start with an element of tf1, odd rows with tf2
            if index % 2 == 0:
                item.append(subitem1)
                item.append(subitem2)
            else:
                item.append(subitem2)
                item.append(subitem1)
        result.append(item)
    return result
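A vectorized sketch (my own, not from either answer) that avoids the Python loop. Unlike the row-reversal trick in the first answer, which relies on every row being identical as it is for all-ones and all-zeros inputs, it keeps each row's values in their original order for arbitrary a and b. It builds both interleavings with tf.stack and tf.reshape, then chooses between them per row with tf.where; the function name checkerboard_concat is hypothetical.

```python
import tensorflow as tf

def checkerboard_concat(a, b):
    """Interleave two [rows, cols] tensors into one [rows, 2 * cols] tensor.

    Even rows read a0, b0, a1, b1, ...; odd rows read b0, a0, b1, a1, ...
    """
    rows = tf.shape(a)[0]
    # Stack along a new last axis, then flatten each row's pairs.
    ab = tf.reshape(tf.stack([a, b], axis=-1), [rows, -1])  # a first
    ba = tf.reshape(tf.stack([b, a], axis=-1), [rows, -1])  # b first
    # Column vector of booleans, broadcast across each row by tf.where.
    even_row = tf.equal(tf.range(rows) % 2, 0)[:, tf.newaxis]
    return tf.where(even_row, ab, ba)

print(checkerboard_concat(tf.ones((3, 3)), tf.zeros((3, 3))))
print(checkerboard_concat(tf.constant([[1, 2], [3, 4]]),
                          tf.constant([[5, 6], [7, 8]])))
```

The second call shows why the row order matters: odd rows become [7, 3, 8, 4], pairing each b value with the a value from the same position.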

Indexing a 4D array using another array of 3D indices

I have a 4D array M (a x b x c x d) and an array I of indices (3 x f), e.g.
I = np.array([[1,2,3, ...], [2,1,3, ...], [4,1,6, ...]])
I would like to use I to arrive at a matrix X that has f rows and d columns, where:
X[0,:] = M[1,2,4,:]
X[1,:] = M[2,1,1,:]
X[2,:] = M[3,3,6,:]
...
I know I can use M[I[0], I[1], I[2]], but I was wondering whether there is a more concise solution.

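You can use, for example:
import numpy as np
I = np.array([[1,2,3], [2,1,3], [4,1,6]])
M = np.random.rand(10, 10, 10, 10)
# Each column of I is one (i, j, k) index triple; indexing with a tuple
# picks a single length-d row of M
X = np.array([M[tuple(t)] for t in I.T])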

This would be one way to do it -
import numpy as np
# Get row indices for use when M is reshaped to a 2D array of d-columns format
row_idx = np.sum(I*np.append(1,np.cumprod(M.shape[1:-1][::-1]))[::-1][:,None],0)
# Reshape M to d-columns 2D array and use row_idx to get final output
out = M.reshape(-1,M.shape[-1])[row_idx]
As an alternative way to find row_idx, if you would like to avoid np.append, you can do -
row_idx = np.sum(I[:-1]*np.cumprod(M.shape[1:-1][::-1])[::-1][:,None],0) + I[-1]
Or, a slightly less scary way to get row_idx -
_,p2,p3,_ = M.shape
row_idx = np.sum(I*np.array([p3*p2,p3,1])[:,None],0)
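For the concise form asked about: a tuple of index arrays is equivalent to writing out M[I[0], I[1], I[2]], so M[tuple(I)] does the same thing in one step. The sketch below uses a made-up random M and the example I from the question, and also checks that np.ravel_multi_index reproduces the row_idx computed above for the reshape-based answer.

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.random((5, 4, 7, 3))          # a x b x c x d (arbitrary test sizes)
I = np.array([[1, 2, 3],
              [2, 1, 3],
              [4, 1, 6]])             # 3 x f, one (i, j, k) triple per column

# Indexing with a tuple of index arrays is the same as M[I[0], I[1], I[2]]
X = M[tuple(I)]                       # shape (f, d)

# ravel_multi_index computes the C-order flat row index into the first
# three axes, i.e. I[0]*b*c + I[1]*c + I[2], the same row_idx as above
row_idx = np.ravel_multi_index(tuple(I), M.shape[:-1])
X_flat = M.reshape(-1, M.shape[-1])[row_idx]
print(X.shape, row_idx)
```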

Cleaner pandas apply with function that cannot use pandas.Series and non-unique index

In the following, func represents a function that uses multiple columns (with coupling across the group) and cannot operate directly on pandas.Series. The 0*d['x'] syntax was the lightest I could think of to force the conversion, but I think it's awkward.
Additionally, the resulting pandas.Series (s) still includes the group index, which must be removed before adding as a column to the pandas.DataFrame. The s.reset_index(...) index manipulation seems fragile and error-prone, so I'm curious if it can be avoided. Is there an idiom for doing this?
import pandas
import numpy
df = pandas.DataFrame(dict(i=[1]*8,j=[1]*4+[2]*4,x=list(range(4))*2))
df['y'] = numpy.sin(df['x']) + 1000*df['j']
df = df.set_index(['i','j'])
print('# df\n', df)
def func(d):
    x = numpy.array(d['x'])
    y = numpy.array(d['y'])
    # I want to do math with x,y that cannot be applied to
    # pandas.Series, so explicitly convert to numpy arrays.
    #
    # We have to return an appropriately-indexed pandas.Series
    # in order for it to be admissible as a column in the
    # pandas.DataFrame. Instead of simply "return x + y", we
    # have to make the conversion.
    return 0*d['x'] + x + y
s = df.groupby(df.index).apply(func)
# The Series is still adorned with the (unnamed) group index,
# which will prevent adding as a column of df due to
# Exception: cannot handle a non-unique multi-index!
s = s.reset_index(level=0, drop=True)
print('# s\n', s)
df['z'] = s
print('# df\n', df)

Instead of
0*d['x'] + x + y
you could use
pd.Series(x+y, index=d.index)
When using groupby-apply, instead of dropping the group key index using:
s = df.groupby(df.index).apply(func)
s = s.reset_index(level=0, drop=True)
df['z'] = s
you can tell groupby to drop the keys using the keyword parameter group_keys=False:
df['z'] = df.groupby(df.index, group_keys=False).apply(func)
import pandas as pd
import numpy as np
df = pd.DataFrame(dict(i=[1]*8,j=[1]*4+[2]*4,x=list(range(4))*2))
df['y'] = np.sin(df['x']) + 1000*df['j']
df = df.set_index(['i','j'])
def func(d):
    x = np.array(d['x'])
    y = np.array(d['y'])
    return pd.Series(x+y, index=d.index)
df['z'] = df.groupby(df.index, group_keys=False).apply(func)
print(df)
yields
     x            y            z
i j                            
1 1  0  1000.000000  1000.000000
     1  1000.841471  1001.841471
     2  1000.909297  1002.909297
     3  1000.141120  1003.141120
  2  0  2000.000000  2000.000000
     1  2000.841471  2001.841471
     2  2000.909297  2002.909297
     3  2000.141120  2003.141120
