I have the following array (3 decks of 7 cards). Each row is sorted, and I want to check whether a row contains 5 consecutive numbers. The code below runs but has a bug: when a row contains duplicates (as in row 1) the result is incorrect:
cards=
[[ 12. 6. 6. 5. 4. 2. 1.]
[ 12. 9. 6. 6. 1. 1. 1.]
[ 6. 6. 1. 1. 0. 0. 0.]]
cardAmount=cards[0,:].size
has4=cards[:,np.arange(0,cardAmount-4)]-cards[:,np.arange(cardAmount-3,cardAmount)]
isStraight=np.any(has4 == 4, axis=1)
has4 (shows if there is a difference of 4 between any of the cards 5 positions apart)
[[ 8. 4. 5.]
[ 11. 8. 5.]
[ 6. 6. 1.]]
isStraight checks if any of the rows contains a 4, which means there is a straight. Result is incorrect for the first row because the duplicates are not ignored.
[ True False False]
The difficulty is that numpy has no way to run np.unique with return_counts=True on a per-row basis, as the results would have different lengths.
Any suggestions are appreciated. It has to be numpy only (or pandas if the speed is not compromised).
I think this is the solution. Is there a way to make it even simpler?
iterations=3
cardAmount=cards[0,:].size
counts=(cards[:,:,None] == np.arange(12,0,-1)).sum(1) # occurrences of each card value
present=counts
present[present>1]=1
s1=np.sum(present[:,0:5], axis=1)
s2=np.sum(present[:,1:6], axis=1)
s3=np.sum(present[:,2:7], axis=1)
s=np.stack((s1,s2,s3)).T
s[s < 5] = -1
s[s == 6] = 5
s[s ==7] = 5
s_index=np.argmax(s,axis=1)
straight=s[np.arange(iterations),s_index]>=0
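One possible simplification, sketched under the assumption that card values are the integers 1 to 12 and that 0 means "no card" (which is what the comparison against np.arange(12,0,-1) above implies). It builds the same presence table, then counts the present ranks in every window of 5 adjacent ranks. The name has_straight and its parameters are illustrative and not part of the original post.

```python
import numpy as np

def has_straight(cards, ranks=np.arange(12, 0, -1), run=5):
    """Return one boolean per row: does the row hold `run` consecutive ranks?"""
    # present[i, j] is 1 when row i holds at least one card of rank ranks[j];
    # duplicates collapse to a single 1, so they cannot distort the count.
    present = (cards[:, :, None] == ranks).any(axis=1).astype(int)
    n_windows = ranks.size - run + 1
    # Adding `run` shifted slices gives, for each window start j, how many of
    # the ranks at positions j .. j+run-1 are present.
    window_counts = sum(present[:, i:i + n_windows] for i in range(run))
    return (window_counts == run).any(axis=1)

cards = np.array([[12, 6, 6, 5, 4, 2, 1],
                  [12, 9, 6, 6, 1, 1, 1],
                  [6, 6, 1, 1, 0, 0, 0]], dtype=float)
print(has_straight(cards))  # [False False False]
```

Because all 8 windows over the 12 ranks are checked, a low straight such as 5-4-3-2-1 is found as well. The three hard-coded slices 0:5, 1:6 and 2:7 only cover straights whose highest card is 12, 11 or 10.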
I have a numpy array named "distances" which looks like this:
[[ 5. 1. 1. 1. 2. 1. 3. 1. 1. 1.]
[ 5. 4. 4. 5. 7. 10. 3. 2. 1. 1.]
[ 3. 1. 1. 1. 2. 2. 3. 1. 1. 0.]
[ 6. 8. 8. 1. 3. 4. 3. 7. 1. 1.]
[ 4. 1. 1. 3. 2. 1. 3. 1. 1. 1.]
[ 8. 10. 10. 8. 7. 10. 9. 7. 1. 1.]
[ 1. 1. 1. 1. 2. 10. 3. 1. 1. 0.]
[ 2. 1. 2. 1. 2. 1. 3. 1. 1. 0.]
[ 2. 1. 1. 1. 2. 1. 1. 1. 5. 2.]
[ 4. 2. 1. 1. 2. 1. 2. 1. 1. 1.]]
I want to build a new 3*9 numpy array of group means, like this:
For the rows whose last column is 0, define an array c0 (1*9) in which each entry is the mean of the corresponding column over those rows.
For the rows whose last column is 1, define an array c1 (1*9) in which each entry is the mean of the corresponding column over those rows.
For the rows whose last column is 2, define an array c2 (1*9) in which each entry is the mean of the corresponding column over those rows.
After this I use hstack to get the final 3*9 array. I am sure this is the long approach, and it is wrong as well.
code:
c0=distances.mean(axis=1)
final = np.hstack((c0,c1,c2))
Doing this I get a 1*10 array of means computed over the whole distances array. How can I take the average only over the rows whose last column is 0?
With pandas
Would be straight-forward with pandas -
import pandas as pd
df = pd.DataFrame(distances)
df_out = df.groupby(df.shape[1]-1).mean()
df_out['ID'] = df_out.index
out = df_out.values
With NumPy
Using Custom-function
For a NumPy-specific one, we can use groupbycol (perform group-based summations) and hence solve our case, like so -
sums = groupbycol(distances, assume_sorted_col=False, colID=-1)
out = sums/np.bincount(distances[:,-1].astype(int)).astype(float)[:,None]  # bincount needs integer input
With matrix-multiplication
mask = distances[:,-1,None] == np.arange(distances[:,-1].max()+1)
out = mask.T.dot(distances)/mask.sum(0)[:,None].astype(float)
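A self-contained sketch of the matrix-multiplication idea on the array from the question, restricted to the first 9 columns so the result is the requested 3*9 array. The names labels, group_sums and group_means are illustrative. It assumes the last column holds small non-negative integer labels and that every label from 0 to the maximum occurs at least once.

```python
import numpy as np

distances = np.array([
    [5, 1, 1, 1, 2, 1, 3, 1, 1, 1],
    [5, 4, 4, 5, 7, 10, 3, 2, 1, 1],
    [3, 1, 1, 1, 2, 2, 3, 1, 1, 0],
    [6, 8, 8, 1, 3, 4, 3, 7, 1, 1],
    [4, 1, 1, 3, 2, 1, 3, 1, 1, 1],
    [8, 10, 10, 8, 7, 10, 9, 7, 1, 1],
    [1, 1, 1, 1, 2, 10, 3, 1, 1, 0],
    [2, 1, 2, 1, 2, 1, 3, 1, 1, 0],
    [2, 1, 1, 1, 2, 1, 1, 1, 5, 2],
    [4, 2, 1, 1, 2, 1, 2, 1, 1, 1],
], dtype=float)

labels = distances[:, -1].astype(int)                  # group ID of every row
mask = labels[:, None] == np.arange(labels.max() + 1)  # shape (n_rows, n_groups)

# mask.T @ data adds up the rows that belong to each group.
group_sums = mask.T.astype(float).dot(distances[:, :9])
group_means = group_sums / mask.sum(axis=0)[:, None]   # divide by group sizes
print(group_means.shape)  # (3, 9)
```

If some label had no rows, its group size would be 0 and the division would produce nan for that row of the result.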
I was able to do it like this:
c0= (distances[distances[:,-1] == 0][:,0:9]).mean(axis=0)
c1 = (distances[distances[:,-1] == 1][:,0:9]).mean(axis=0)
c2 = (distances[distances[:,-1] == 2][:,0:9]).mean(axis=0)
Assume I have a tensor of shape [batch_size, T, d] where
T is number of frames for a speech file and d is the dimension of MFCC. Now I would like to expand the context for the left and right frames like this function in numpy:
import numpy

def make_context(feature, left, right):
    '''
    Takes a 2-D numpy feature array, and pads each frame with a specified
    number of frames on either side.
    '''
    feature = [feature]
    for i in range(left):
        feature.append(numpy.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    for i in range(right):
        feature.append(numpy.vstack((feature[-1][1:], feature[-1][-1])))
    return numpy.hstack(feature)
How to implement this function in tensorflow or keras?
You can use tf.map_fn and tf.py_func to implement this function in tensorflow. tf.map_fn iterates over the elements of the batch, and tf.py_func applies the numpy function to each element. For example:
import tensorflow as tf
import numpy as np
def make_context(feature, left, right):
    feature = [feature]
    for i in range(left):
        feature.append(np.vstack((feature[-1][0], feature[-1][:-1])))
    feature.reverse()
    for i in range(right):
        feature.append(np.vstack((feature[-1][1:], feature[-1][-1])))
    return np.hstack(feature)
# numpy usage
feature = np.array([[1,2],[3,4],[5,6]])
print(make_context(feature, 2, 3))
# tensorflow usage
feature_tf = tf.placeholder(shape=(None,None,None),dtype=tf.float32)
# map over the batch; py_func runs the numpy function on each [T, d] element
result = tf.map_fn(
    lambda element: tf.py_func(make_context, [element,2,3], tf.float32),
    feature_tf, tf.float32)
with tf.Session() as sess:
    print(sess.run(result,feed_dict={feature_tf:np.array([feature,feature])}))
# print
[[1 2 1 2 1 2 3 4 5 6 5 6]
[1 2 1 2 3 4 5 6 5 6 5 6]
[1 2 3 4 5 6 5 6 5 6 5 6]]
[[[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
[1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
[1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]
[[1. 2. 1. 2. 1. 2. 3. 4. 5. 6. 5. 6.]
[1. 2. 1. 2. 3. 4. 5. 6. 5. 6. 5. 6.]
[1. 2. 3. 4. 5. 6. 5. 6. 5. 6. 5. 6.]]]
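The loop in make_context can also be written as edge padding followed by slicing and concatenation, which handles the whole batch at once. The sketch below shows that formulation in NumPy only; make_context_batch is a hypothetical name. The same pad, slice and concatenate pattern could in principle be expressed with tensor operations instead of tf.py_func, but that translation is not shown or tested here.

```python
import numpy as np

def make_context_batch(features, left, right):
    """Map [batch_size, T, d] to [batch_size, T, (left + 1 + right) * d]."""
    T = features.shape[1]
    # Repeat the first frame `left` times and the last frame `right` times
    # along the time axis.
    padded = np.pad(features, ((0, 0), (left, right), (0, 0)), mode="edge")
    # Slice i holds, for every frame t, the frame at offset (i - left) from t.
    shifted = [padded[:, i:i + T] for i in range(left + right + 1)]
    return np.concatenate(shifted, axis=-1)

feature = np.array([[1, 2], [3, 4], [5, 6]])
batch = np.array([feature, feature])
print(make_context_batch(batch, 2, 3)[0])
```

For this input, each batch element reproduces the 3*12 array printed by make_context(feature, 2, 3) above.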
I generated an input tensor A using the following code in tensorflow:
import tensorflow as tf
A = tf.constant(1.0, shape = [10, 10])
with tf.Session() as sess:
    print(sess.run(A))
output = [[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]
[ 1. 1. 1. 1. 1. 1. 1. 1. 1. 1.]]
I want to set part of the entries to zero, say half or a quarter along either the columns or the rows, and I did the following:
import numpy as np
output = np.array(A)
A1 = output[:, output.shape[1]//2:] = 0
print(A1)
But I get the error 'tuple index out of range'. Please help.
print(sess.run(A1))
Just create the single parts separately and then concatenate them:
A = tf.ones(shape=[10, 5])
B = tf.zeros(shape=[10,5])
AB = tf.concat((A,B), axis=1)
Same holds for a row-wise split.
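If the zeroing is to be done in NumPy, as the question attempts, a sketch follows. It assumes the tensor has first been evaluated to an ndarray (for example with sess.run(A)); np.ones stands in for that result here. The reported error is consistent with np.array(A) wrapping the tensor object instead of its values, which would leave output.shape empty. Note also that in A1 = output[...] = 0 the chained assignment binds A1 to the scalar 0, not to the modified array.

```python
import numpy as np

# Stand-in for the evaluated tensor, e.g. output = sess.run(A)
output = np.ones((10, 10), dtype=np.float32)

# Zero the right half of the columns.
right_half = output.copy()
right_half[:, output.shape[1] // 2:] = 0

# Zero the top quarter of the rows (10 // 4 = 2 rows here).
top_quarter = output.copy()
top_quarter[:output.shape[0] // 4, :] = 0

print(right_half[0])
print(top_quarter[:, 0])
```

The copies keep the original array intact, so several masked variants can be derived from the same evaluated tensor.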
We know that (A*B)*C = A*(B*C), so why does this matrix multiplication give different results?
import numpy as np
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3],[4,5,6],[7,8,9]])
print( A.dot( np.linalg.inv(B) ).dot(A.T) )
print( A.dot( np.linalg.inv(B).dot(A.T) ) )
The result is
[[ 0.5 2. ]
[ 1. 4. ]]
and
[[ 2. 4.]
[ 8. 16.]]
B is of insufficient rank to take an inverse. To get at least consistent results, use np.linalg.pinv for the pseudo inverse.
np.linalg.matrix_rank(B)
# we want 3
# we got 2
2
A = np.array([[1,2,3],[4,5,6]])
B = np.array([[1,2,3],[4,5,6],[7,8,9]])
print( A.dot( np.linalg.pinv(B) ).dot(A.T) )
print( A.dot( np.linalg.pinv(B).dot(A.T) ) )
[[ 1. 4.]
[ 2. 5.]]
[[ 1. 4.]
[ 2. 5.]]
Floating point arithmetical operations are not associative. Usually we don't notice this because the numerical differences between matrices A*(B*C) and (A*B)*C are tiny. But in this case, you are trying to invert a non-invertible matrix B, which Numpy actually tries to do, getting some absurd result:
[[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]
[ -6.30503948e+15 1.26100790e+16 -6.30503948e+15]
[ 3.15251974e+15 -6.30503948e+15 3.15251974e+15]]
The magnitude of these numbers is such that errors of size ~1 are to be expected at the double precision level (you get about 16 accurate digits). The multiplication by A and A.T brings the matrix entries back to something small, due to a lot of cancellation. But when very large numbers cancel each other, the relative error grows, and the result ends up being fairly meaningless.
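A small diagnostic sketch along these lines: checking the rank and condition number of B before inverting makes the problem visible up front. How large a condition number is "too large" is a judgment call and is not fixed by anything in this thread.

```python
import numpy as np

A = np.array([[1, 2, 3], [4, 5, 6]], dtype=float)
B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)

rank = np.linalg.matrix_rank(B)
cond = np.linalg.cond(B)  # ratio of the largest to the smallest singular value
print(rank, cond)         # rank 2, and a very large condition number

# With the pseudo-inverse the entries stay moderate in size, so the two
# groupings of the product agree up to ordinary rounding error.
B_pinv = np.linalg.pinv(B)
left_first = A.dot(B_pinv).dot(A.T)
right_first = A.dot(B_pinv.dot(A.T))
print(np.allclose(left_first, right_first))
```

A rank below the matrix size, or a condition number near the reciprocal of machine precision (about 1e16 for doubles), signals that the output of np.linalg.inv should not be trusted.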
I am trying to apply DBSCAN to a dataset of (Lon, Lat) points. The algorithm is very sensitive to its parameters, EPS and MinPts.
I would like to look at a histogram of the data to determine proper values. Unfortunately, Matplotlib's hist() takes only a 1D array.
When passed a 2D matrix, hist() treats each column as a separate input.
Scatter plot and histograms:
Does anyone have a way to solve this?
If you follow the DBSCAN article, you only need the 4-nearest-neighbor distance for each object, not all pairwise distances. I.e., a 1 dimensional array.
Instead of doing a histogram, they sort the values, and try to choose a knee in this plot.
find the 4th nearest neighbor of each object
collect all 4NN distances in one array
sort this array in descending order
plot the resulting curve
look for a knee, often best at around 5%-10% of your x axis (so 95%-90% of objects are core points).
For details, see the original DBSCAN publication!
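The steps above can be sketched in plain NumPy as follows. This is a minimal illustration, not the procedure from the publication: k_distance_curve is a hypothetical name, the random points stand in for the real data, and plain Euclidean distance on the raw coordinates is an assumption (for longitude and latitude a great-circle distance may be more appropriate). It builds the full pairwise distance matrix, so it only suits datasets small enough for an n*n array.

```python
import numpy as np

def k_distance_curve(points, k=4):
    """Distance from each point to its k-th nearest neighbor, sorted descending."""
    # All pairwise Euclidean distances via broadcasting: shape (n, n).
    diff = points[:, None, :] - points[None, :, :]
    dist = np.sqrt((diff ** 2).sum(axis=-1))
    # After sorting each row, column 0 is the point itself (distance 0),
    # so column k is the k-th nearest neighbor.
    kth = np.sort(dist, axis=1)[:, k]
    return np.sort(kth)[::-1]

rng = np.random.RandomState(2016)
points = rng.random_sample((100, 2))  # placeholder for the (Lon, Lat) data
curve = k_distance_curve(points, k=4)
print(curve[:5])
```

Plotting curve against its index, for example with matplotlib's plt.plot(curve), gives the sorted k-distance graph. The distance value at the knee is a candidate for EPS, with MinPts set to match k.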
You could use numpy.histogram2d:
import numpy as np
np.random.seed(2016)
N = 100
arr = np.random.random((N, 2))
xedges = np.linspace(0, 1, 10)
yedges = np.linspace(0, 1, 10)
lat = arr[:, 0]
lng = arr[:, 1]
hist, xedges, yedges = np.histogram2d(lat, lng, (xedges, yedges))
print(hist)
yields
[[ 0. 0. 5. 0. 3. 0. 0. 0. 3.]
[ 0. 3. 0. 3. 0. 0. 4. 0. 2.]
[ 2. 2. 1. 1. 1. 1. 3. 0. 1.]
[ 2. 1. 0. 3. 1. 2. 1. 1. 3.]
[ 3. 0. 3. 2. 0. 1. 0. 2. 0.]
[ 3. 2. 3. 1. 1. 2. 1. 1. 0.]
[ 2. 3. 0. 1. 0. 1. 3. 0. 0.]
[ 1. 1. 1. 1. 2. 0. 2. 1. 1.]
[ 0. 1. 1. 0. 1. 1. 2. 0. 0.]]
Or to visualize the histogram:
import matplotlib.pyplot as plt
fig, ax = plt.subplots()
ax.imshow(hist)
plt.show()