numpy invert stride selection - numpy

Consider the following code:
aa = np.arange(16)
step = 4
bb = aa[::4]
This selects every 4th element. Is there a quick and easy numpy function to select the complement of bb? I'm looking for the following output
array([1, 2, 3, 5, 6, 7, 9, 10, 11, 13, 14, 15])
Yes, I could generate indices and then do np.setdiff1d, but I'm looking for something more elegant than that.

If you're looking for a simple single-liner:
np.delete(aa,slice(None,None,4))
Another solution (I don't know about elegant), but you could define a selection index of ones, and then set every fourth element to False to then index the original array:
o = np.ones_like(s,dtype=bool)
o[::step] = False
aa[o]

A flexible way to select based on an arbitrary repeated position could be to use a modulo:
bb = aa[np.arange(len(aa))%step != step-1]
Output:
array([ 0, 1, 2, 4, 5, 6, 8, 9, 10, 12, 13, 14])

Related

Tensorflow filter operation on dataset with several columns

I want to create a subset of my data by applying tf.data.Dataset filter operation. I have this data:
data = tf.convert_to_tensor([[1, 2, 1, 1, 5, 5, 9, 12], [1, 2, 3, 8, 4, 5, 9, 12]])
dataset = tf.data.Dataset.from_tensor_slices(data)
I want to retrieve a subset of 'dataset' which corresponds to all elements whose first column is equal to 1. So, result should be:
[[1, 1, 1], [1, 3, 8]] # dtype : dataset
I tried this:
subset = dataset.filter(lambda x: tf.equal(x[0], 1))
But I don't get the correct result, since it sends me back x[0]
Someone to help me ?
I finally resolved it:
a = tf.convert_to_tensor([1, 2, 1, 1, 5, 5, 9, 12])
b = tf.convert_to_tensor([1, 2, 3, 8, 4, 5, 9, 12])
data_set = tf.data.Dataset.from_tensor_slices((a, b))
subset = data_set.filter(lambda x, y: tf.equal(x, 1))

Create new panda dataframe with fixed distance using interpolate

I have a dataframe of the following form.
df = {'X': [0, 3, 6, 7, 8, 11],
'Y1': [8, 5, 4, 3, 2, 1.5],
'Y2': [1, 2, 4, 5, 5, 5]}
I would like to create a new dataframe where I use interpolate where 'X' is stepping in fixed steps [0, 2, 4, 6, 8, 10].
To find the new 'Y' values I need to find f(x)=Y1 and then I can evaluate for each step in X. But since I have many Y's I think there must be a more clever way to do this.
The solution I found was the following:
step_size = 0.25
no_steps = int(np.floor(max(b['X'])/step_size))
for i in range(0,no_steps+1):
b = b.append({'X' : 0.25*i, 'StepNo' : 10, 'PointNo' : 23+i}, ignore_index=True)
b = b.sort_values(['X'])
b = b.set_index(['X'])
c = b.interpolate('index')
c = c.reset_index()
c = c.sort_values(['PointNo'])
So first I define step size. Then I calculate number of steps. Then I append the steps into the dataframe. Sort the dataframe and reindex so I can use interpolate using 'index' as values.

How to compare a 2D array against a 1D array column-wise?

I have two numpy arrays. One of them is 2D while the other is 1D.
>>> a = np.arange(0,20).reshape(2,10)
>>> a
array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14, 15, 16, 17, 18, 19]])
>>> b = np.full( a.shape[1], 10 )
>>> b
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
I want to compare them column-wise:
If the columns elements in a is identical to the column element of b, then store row number(s) of a.
Else, find the closest matching of a to b and store the row number(s).
In my example, the output from the comparison should be:
[ 1, [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1], [0,1] ]
How do I do this in NumPy?
I was thinking of using np.where( a==b, run a function to get row(s) if same, run another function to get row(s) of diff )? Is this the way?

How to delete rows from column which have matching values in the list Pandas

I am finding outliers from a column and storing them in a list. Now i want to delete all the values which
are present in my list from the column.
How can achieve this ?
This is my function for finding outliers
outlier=[]
def detect_outliers(data):
threshold=3
m = np.mean(data)
st = np.std(data)
for i in data:
#calculating z-score value
z_score=(i-m)/st
#if the z_score value is greater than threshold value than its a outlier
if np.abs(z_score)>threshold:
outlier.append(i)
return outlier
This is my column in data frame
df_train_11.AMT_INCOME_TOTAL
import numpy as np, pandas as pd
df = pd.DataFrame(np.random.rand(10,5))
outlier_list=[]
def detect_outliers(data):
threshold=0.5
for i in data:
#calculating z-score value
z_score=(df.loc[:,i]- np.mean(df.loc[:,i])) /np.std(df.loc[:,i])
outliers = np.abs(z_score)>threshold
outlier_list.append(df.index[outliers].tolist())
return outlier_list
outlier_list = detect_outliers(df)
[[1, 2, 4, 5, 6, 7, 9],
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9],
[0, 1, 2, 4, 8],
[0, 1, 3, 4, 6, 8],
[0, 1, 3, 5, 6, 8, 9]]
This way, you get the outliers of each column. outlier_list[0] gives you [1, 2, 4, 5, 6, 7, 9] which means that the rows 1,2,etc are outliers for column 0.
EDIT
Shorter answer:
df = pd.DataFrame(np.random.randn(10, 3), columns=list('ABC'))
df[((df.B - df.B.mean()) / df.B.std()).abs() < 3]
This willfilter the DataFrame where only ONE column (e.g. 'B') is within three standard deviations.

Swap a subset of multi-values in numpy

Given a starting numpy array that looks like:
B = np.array( [1, 1, 1, 0, 2, 2, 1, 3, 3, 0, 4, 4, 4, 4] )
What it the most efficient way to swap one set of values for another when there are duplicates? For example, let
s1 = [1,2,4]
s2 = [4,1,2]
An inefficient swapping method would iterate through s1 and s2 as so:
B2 = B.copy()
for x,y in zip(s1,s2):
B2[B==x] = y
Giving as output
B2 -> [4, 4, 4, 0, 1, 1, 4, 3, 3, 0, 2, 2, 2, 2]
Is there a way to do this essentially in-place without the zip loop?
>>> B = np.array( [1, 1, 1, 0, 2, 2, 1, 3, 3, 0, 4, 4, 4, 4] )
>>> s1 = [1,2,4]
>>> s2 = [4,1,2]
>>> B2 = B.copy()
>>> c, d = np.where(B == np.array(s1)[:,np.newaxis])
>>> B2[d] = np.repeat(s2,np.bincount(c))
>>> B2
array([4, 4, 4, 0, 1, 1, 4, 3, 3, 0, 2, 2, 2, 2])
If you have only integers that are between 0 and n (if not its no problem to generalize to any integer range unless its very sparse), the most efficient way is the use of take/fancy indexing:
swap = np.arange(B.max() + 1) # all values in B
swap[s1] = s2 # replace the values you want to be replaced
B2 = swap.take(B) # or swap[B]
This is seems almost twice as fast for the small B given here, but with larger B it gets even more speedup repeating B to a length of about 100000 gives 8x already. This also avoids the == operation for every s1 element, so will scale much better as s1/s2 get large.
EDIT: you could also use np.put (also in the other answer) for some speedup for swap[s1] = s2. For these 1D problems take/put are simply faster.