Find 5 consecutive numbers in numpy array by row, ignore duplicates

I have the following array (3 decks of 7 cards). They are sorted by row and I want to see if there are 5 consecutive numbers. The below code works but has a mistake: when there is a duplicate (like in row 1) the result is incorrect:
cards=
[[ 12. 6. 6. 5. 4. 2. 1.]
[ 12. 9. 6. 6. 1. 1. 1.]
[ 6. 6. 1. 1. 0. 0. 0.]]
cardAmount=cards[0,:].size
has4=cards[:,np.arange(0,cardAmount-4)]-cards[:,np.arange(cardAmount-3,cardAmount)]
isStraight=np.any(has4 == 4, axis=1)
has4 (the difference between cards four positions apart, i.e. across each window of 5 cards; a value of 4 is taken to mean a straight):
[[ 8. 4. 5.]
[ 11. 8. 5.]
[ 6. 6. 1.]]
isStraight checks if any of the rows contains a 4, which means there is a straight. Result is incorrect for the first row because the duplicates are not ignored.
[ True False False]
The difficulty is that there is no way in numpy to do a np.unique with return_counts=True on a by row basis, as the results would have different lengths.
Any suggestions are appreciated. It has to be numpy only (or pandas if the speed is not compromised).

I think this is the solution. Is there a way to make it even simpler?
iterations=3
cardAmount=cards[0,:].size
counts=(cards[:,:,None] == np.arange(12,0,-1)).sum(1) # occurrences of each card rank (12 down to 1)
present=counts
present[present>1]=1
s1=np.sum(present[:,0:5], axis=1)
s2=np.sum(present[:,1:6], axis=1)
s3=np.sum(present[:,2:7], axis=1)
s=np.stack((s1,s2,s3)).T
s[s < 5] = -1
s[s == 6] = 5
s[s ==7] = 5
s_index=np.argmax(s,axis=1)
straight=s[np.arange(iterations),s_index]>=0
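Note that the snippet above only sums three windows of the presence matrix (ranks 12..8, 11..7 and 10..6) and leaves out rank 0. Here is a minimal sketch of the same presence-matrix idea that checks every window of five consecutive ranks. It assumes the ranks are integers from 0 to 12; the function name `has_straight`, its parameters, and the third demo row are made up for illustration.

```python
import numpy as np

def has_straight(cards, n_ranks=13, run=5):
    """Return one boolean per row: True if the row holds `run` consecutive ranks."""
    cards = np.asarray(cards).astype(int)
    # present[i, r] is True when rank r occurs at least once in row i,
    # so duplicates collapse to a single True.
    present = (cards[:, :, None] == np.arange(n_ranks)).any(axis=1)
    # Prefix sums with a leading zero column give every window sum by subtraction.
    zeros = np.zeros((present.shape[0], 1), dtype=int)
    csum = np.concatenate([zeros, np.cumsum(present, axis=1)], axis=1)
    window_counts = csum[:, run:] - csum[:, :-run]
    # A straight exists when some window has all `run` ranks present.
    return (window_counts == run).any(axis=1)

cards = np.array([[12, 6, 6, 5, 4, 2, 1],
                  [12, 9, 6, 6, 1, 1, 1],
                  [ 9, 8, 8, 7, 6, 5, 1]])  # third row is a hypothetical straight
print(has_straight(cards))  # [False False  True]
```

The first row no longer gives a false positive, because the duplicated 6 is counted once.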

Related

Conditional mean in numpy arrays?

I have a numpy array named "distances" which looks like this:
[[ 5. 1. 1. 1. 2. 1. 3. 1. 1. 1.]
[ 5. 4. 4. 5. 7. 10. 3. 2. 1. 1.]
[ 3. 1. 1. 1. 2. 2. 3. 1. 1. 0.]
[ 6. 8. 8. 1. 3. 4. 3. 7. 1. 1.]
[ 4. 1. 1. 3. 2. 1. 3. 1. 1. 1.]
[ 8. 10. 10. 8. 7. 10. 9. 7. 1. 1.]
[ 1. 1. 1. 1. 2. 10. 3. 1. 1. 0.]
[ 2. 1. 2. 1. 2. 1. 3. 1. 1. 0.]
[ 2. 1. 1. 1. 2. 1. 1. 1. 5. 2.]
[ 4. 2. 1. 1. 2. 1. 2. 1. 1. 1.]]
I want to make a new 3*9 numpy array of conditional means, like this:
For the rows whose last column is 0, define an array c0 (1*9) where each entry is the mean of that column over those rows.
For the rows whose last column is 1, define an array c1 (1*9) where each entry is the mean of that column over those rows.
For the rows whose last column is 2, define an array c2 (1*9) where each entry is the mean of that column over those rows.
After that I am doing hstack to get the final 3*9 array. I am sure this is the long approach, and it is wrong nonetheless.
code:
c0=distances.mean(axis=1)
final = np.hstack((c0,c1,c2))
Doing this I get a 1*10 array where each entry is the average of a column of the distances array. However, I am unable to find a way to take the average only over the rows whose last column is 0.

With pandas
Would be straight-forward with pandas -
import pandas as pd
df = pd.DataFrame(distances)
df_out = df.groupby(df.shape[1]-1).mean()
df_out['ID'] = df_out.index
out = df_out.values
With NumPy
Using Custom-function
For a NumPy-specific one, we can use groupbycol (perform group-based summations) and hence solve our case, like so -
sums = groupbycol(distances, assume_sorted_col=False, colID=-1)
out = sums/np.bincount(distances[:,-1].astype(int)).astype(float)[:,None]  # bincount needs integer labels
With matrix-multiplication
mask = distances[:,-1,None] == np.arange(distances[:,-1].max()+1)
out = mask.T.dot(distances)/mask.sum(0)[:,None].astype(float)
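To see what the matrix-multiplication version computes, here is a small self-contained sketch on made-up data. It assumes the last column holds non-negative integer labels and that every label from 0 to the maximum occurs at least once (otherwise the division hits a zero count). The name `group_means` and the `demo` array are illustrative only.

```python
import numpy as np

def group_means(data):
    """Column-wise means of data[:, :-1], grouped by the integer label in data[:, -1]."""
    labels = data[:, -1].astype(int)
    features = data[:, :-1]
    # mask[i, g] is True when row i belongs to group g.
    mask = labels[:, None] == np.arange(labels.max() + 1)
    # mask.T @ features sums the rows of each group; dividing by the counts gives means.
    sums = mask.T.astype(float) @ features
    counts = mask.sum(axis=0)[:, None]
    return sums / counts

demo = np.array([[1., 2., 0.],
                 [3., 4., 0.],
                 [5., 6., 1.],
                 [7., 8., 2.],
                 [9., 10., 2.]])
means = group_means(demo)
print(means)  # rows: group 0 -> [2, 3], group 1 -> [5, 6], group 2 -> [8, 9]
```

Unlike the one-liner above, this drops the label column from the output, which matches the 3*9 shape asked for in the question.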

I was able to do it like this:
c0= (distances[distances[:,-1] == 0][:,0:9]).mean(axis=0)
c1 = (distances[distances[:,-1] == 1][:,0:9]).mean(axis=0)
c2 = (distances[distances[:,-1] == 2][:,0:9]).mean(axis=0)

Context expansion for speech frames in tensorflow or keras

Assume I have a tensor of shape [batch_size, T, d] where
T is number of frames for a speech file and d is the dimension of MFCC. Now I would like to expand the context for the left and right frames like this function in numpy:
def make_context(feature, left, right):
    '''
    Takes a 2-D numpy feature array, and pads each frame with a specified
    number of frames on either side.
    '''
    feature = [feature]
    for i in range(left):
        feature.append(numpy.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    for i in range(right):
        feature.append(numpy.vstack((feature[-1][1:], feature[-1][-1])))
    return numpy.hstack(feature)
How to implement this function in tensorflow or keras?

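You can use tf.map_fn and tf.py_func to implement this function in TensorFlow. Note that tf.py_func, tf.placeholder and tf.Session are TensorFlow 1.x APIs. tf.map_fn applies a function to every element of the batch, and tf.py_func wraps the NumPy function so that it can be applied to each element. For example: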
import tensorflow as tf
import numpy as np
def make_context(feature, left, right):
    feature = [feature]
    for i in range(left):
        feature.append(np.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    for i in range(right):
        feature.append(np.vstack((feature[-1][1:], feature[-1][-1])))
    return np.hstack(feature)
# numpy usage
feature = np.array([[1,2],[3,4],[5,6]])
print(make_context(feature, 2, 3))
# tensorflow usage
feature_tf = tf.placeholder(shape=(None,None,None),dtype=tf.float32)
result = tf.map_fn(
    lambda element: tf.py_func(
        lambda feature, left, right: make_context(feature, left, right),
        [element,2,3],
        tf.float32),
    feature_tf, tf.float32)
with tf.Session() as sess:
    print(sess.run(result,feed_dict={feature_tf:np.array([feature,feature])}))
# print
[[1 2 1 2 1 2 3 4 5 6 5 6]
[1 2 1 2 3 4 5 6 5 6 5 6]
[1 2 3 4 5 6 5 6 5 6 5 6]]
[[[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
[1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
[1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]
[[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
[1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
[1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]]
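For comparison, the same context expansion can be written for a whole [batch_size, T, d] array at once in plain NumPy, without a per-element Python call. This is only a sketch of the idea (edge-pad the time axis, then stack shifted slices); the name `make_context_batched` is made up here, and porting it to TensorFlow ops is left open.

```python
import numpy as np

def make_context_batched(batch, left, right):
    """Expand each frame with `left` previous and `right` following frames.

    batch has shape (batch_size, T, d); the result has shape
    (batch_size, T, (left + right + 1) * d). Frames past either end are
    replaced by the first or last frame, as in make_context.
    """
    batch = np.asarray(batch)
    T = batch.shape[1]
    # Repeat the first frame `left` times and the last frame `right` times.
    padded = np.pad(batch, ((0, 0), (left, right), (0, 0)), mode="edge")
    # Slice k holds, for every frame t, the frame at offset k - left.
    shifted = [padded[:, k:k + T, :] for k in range(left + right + 1)]
    return np.concatenate(shifted, axis=2)

feature = np.array([[1, 2], [3, 4], [5, 6]])
out = make_context_batched(np.array([feature, feature]), 2, 3)
print(out[0])
```

For the example input this reproduces the rows printed above.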

how to generate array tensor in tensorflow

I generated an input tensor A using the following code in tensorflow:
import tensorflow as tf
A = tf.constant(1.0, shape = [10, 10])
with tf.Session() as sess:
    print(sess.run(A))
output = [[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
I want to set part of the entries to zero, say half or a quarter along either the columns or the rows, and I did the following:
import numpy as np
output = np.array(A)
A1 = output[:, output.shape[1]//2:] = 0
print(A1)
But I was getting the error 'tuple index out of range'. Please help.
print(sess.run(A1))

Just create the single parts separately and then concatenate them:
A = tf.ones(shape=[10, 5])
B = tf.zeros(shape=[10,5])
AB = tf.concat((A,B), axis=1)
Same holds for a row-wise split.

Why does matrix multiplication give different results depending on how they are grouped?

We know that (A*B)*C = A*(B*C), so why does this matrix multiplication give different results?
import numpy as np
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3],[4,5,6],[7,8,9]])
print( A.dot( np.linalg.inv(B) ).dot(A.T) )
print( A.dot( np.linalg.inv(B).dot(A.T) ) )
The result is
[[ 0.5 2. ]
[ 1. 4. ]]
and
[[ 2. 4.]
[ 8. 16.]]

B is of insufficient rank to take an inverse. To get at least consistent results, use np.linalg.pinv for the pseudo inverse.
np.linalg.matrix_rank(B)
# we want 3
# we got 2
2
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3],[4,5,6],[7,8,9]])
print( A.dot( np.linalg.pinv(B) ).dot(A.T) )
print( A.dot( np.linalg.pinv(B).dot(A.T) ) )
[[ 1. 4.]
[ 2. 5.]]
[[ 1. 4.]
[ 2. 5.]]

Floating point arithmetical operations are not associative. Usually we don't notice this because the numerical differences between matrices A*(B*C) and (A*B)*C are tiny. But in this case, you are trying to invert a non-invertible matrix B, which Numpy actually tries to do, getting some absurd result:
[[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]
[ -6.30503948e+15 1.26100790e+16 -6.30503948e+15]
[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]]
The magnitude of these numbers is such that errors of size ~1 are to be expected at the double precision level (you get about 16 accurate digits). The multiplication by A and A.T brings the matrix entries back to something small, due to a lot of cancellation. But when very large numbers cancel each other, the relative error grows, and the result ends up being fairly meaningless.
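A quick way to spot this situation before trusting np.linalg.inv is to look at the rank and the condition number of B. The sketch below uses the matrices from the question; how large a condition number is "too large" is a judgment call, not a fixed rule.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

# Rank below 3 and a huge condition number both say B is numerically singular.
rank = np.linalg.matrix_rank(B)
cond = np.linalg.cond(B)
print(rank, cond)

# With the pseudo-inverse, both groupings agree to floating point tolerance.
B_pinv = np.linalg.pinv(B)
left_first = (A @ B_pinv) @ A.T
right_first = A @ (B_pinv @ A.T)
print(np.allclose(left_first, right_first))
```

A condition number near 1e16 or above means essentially no accurate digits are left in double precision, which matches the explanation above.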

Plotting a histogram of a 2D numpy array of (latitude, longitude), in order to determine the proper values for DBSCAN

I am trying to apply DBSCAN to a dataset of (Lon, Lat) points. The algorithm is very sensitive to its parameters, EPS and MinPts.
I would like to look at a histogram of the data to determine proper values. Unfortunately, Matplotlib's hist() only takes a 1D array.
When a 2D matrix is passed as the argument, hist() treats each column as a separate input.
[Figure: scatter plot and histograms]
Does anyone have a way to solve this?

If you follow the DBSCAN article, you only need the 4-nearest-neighbor distance for each object, not all pairwise distances. I.e., a 1 dimensional array.
Instead of doing a histogram, they sort the values, and try to choose a knee in this plot.
1. find the 4 nearest neighbors of each object
2. collect all 4NN distances in one array
3. sort this array in descending order
4. plot the resulting curve
5. look for a knee, often best at around 5%-10% of your x axis (so 95%-90% of objects are core points).
For details, see the original DBSCAN publication!
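The steps above can be sketched in plain NumPy as follows. This is a rough illustration, not a tuned implementation: it assumes Euclidean distance on the raw coordinates (only an approximation for latitude/longitude), it builds the full N x N distance matrix and so only suits modest N, and the name `k_distance_curve` is made up here.

```python
import numpy as np

def k_distance_curve(points, k=4):
    """Distance from each point to its k-th nearest neighbour, sorted descending."""
    points = np.asarray(points, dtype=float)
    # Full pairwise Euclidean distance matrix, shape (N, N).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=2))
    # After a partial sort, index 0 of each row is the point itself (distance 0),
    # so index k holds the k-th nearest neighbour distance.
    kth = np.partition(dist, k, axis=1)[:, k]
    return np.sort(kth)[::-1]

# Six points on a line, one unit apart, with k=2 for a tiny check.
line = np.array([[0, 0], [1, 0], [2, 0], [3, 0], [4, 0], [5, 0]])
curve = k_distance_curve(line, k=2)
print(curve)  # [2. 2. 1. 1. 1. 1.]
```

Plotting the returned array (for example with plt.plot(curve)) gives the curve in which to look for the knee; the y value at the knee is a candidate for EPS with MinPts around k.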

You could use numpy.histogram2d:
import numpy as np
np.random.seed(2016)
N = 100
arr = np.random.random((N, 2))
xedges = np.linspace(0, 1, 10)
yedges = np.linspace(0, 1, 10)
lat = arr[:, 0]
lng = arr[:, 1]
hist, xedges, yedges = np.histogram2d(lat, lng, (xedges, yedges))
print(hist)
yields
[[ 0. 0. 5. 0. 3. 0. 0. 0. 3.]
[ 0. 3. 0. 3. 0. 0. 4. 0. 2.]
[ 2. 2. 1. 1. 1. 1. 3. 0. 1.]
[ 2. 1. 0. 3. 1. 2. 1. 1. 3.]
[ 3. 0. 3. 2. 0. 1. 0. 2. 0.]
[ 3. 2. 3. 1. 1. 2. 1. 1. 0.]
[ 2. 3. 0. 1. 0. 1. 3. 0. 0.]
[ 1. 1. 1. 1. 2. 0. 2. 1. 1.]
[ 0. 1. 1. 0. 1. 1. 2. 0. 0.]]
Or to visualize the histogram:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.imshow(hist)
plt.show()
