Shaped gradient fill in numpy/scipy - numpy

I'm looking for a way to fill all of the values within an arbitrary shape with a gradient of values that follows the outline of the shape, like the result of the "shaped gradient" fill tool in GIMP.
Output should be a 2d numpy array.

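You could take a look at scipy.ndimage.distance_transform_edt (older SciPy releases also expose it as scipy.ndimage.morphology.distance_transform_edt). For every non-zero (foreground) pixel, it returns the Euclidean distance to the closest background pixel.
First, you will need to create a binary image of your arbitrary shape: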
import numpy as np
from scipy.ndimage import distance_transform_edt  # the scipy.ndimage.morphology namespace is deprecated
# create dummy image
a = np.arange(100).reshape([10, 10])
# use threshold to define arbitrary shape
b = (a > 54).astype('uint8')
print(b)
[[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 0 0 0 0 0]
[0 0 0 0 0 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]
[1 1 1 1 1 1 1 1 1 1]]
Then, apply the distance transform to the binary image. The output looks like this; smaller values are closer to the edge of the binary object, and background pixels stay at 0.
# apply Euclidean distance transform
d = distance_transform_edt(b)
print(d.round(2))
[[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 0. 0. 0. 0. 0. ]
[0. 0. 0. 0. 0. 1. 1. 1. 1. 1. ]
[1. 1. 1. 1. 1. 1.41 2. 2. 2. 2. ]
[2. 2. 2. 2. 2. 2.24 2.83 3. 3. 3. ]
[3. 3. 3. 3. 3. 3.16 3.61 4. 4. 4. ]
[4. 4. 4. 4. 4. 4.12 4.47 5. 5. 5. ]]
A color map could then be defined for the range of values in d.
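As a minimal sketch of that last step, the distances can be rescaled to the range [0, 1] inside the shape, so that any color map (or a plain grayscale ramp) can be applied to the result. The names shape_mask, dist and gradient are illustrative only, and the shape is the same dummy threshold image as above.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

# Same dummy shape as above: True inside the shape, False outside.
shape_mask = np.arange(100).reshape(10, 10) > 54

# Distance of every foreground pixel to the nearest background pixel.
dist = distance_transform_edt(shape_mask)

# Rescale to [0, 1] inside the shape; the background stays at 0.
gradient = np.zeros_like(dist)
gradient[shape_mask] = dist[shape_mask] / dist.max()

print(gradient.round(2))
```

Use 1 - gradient inside the mask instead if the edge should be bright and the interior dark.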

Related

numpy append() function doesn't change my ndarray?

I want to add something to a ndarray, what am I doing wrong?
import numpy as np
sequence =np.repeat(1, 4)
print(sequence)
np.append(sequence, 7)
print(sequence)
Expected result in console:
[1 1 1 1]
[1 1 1 1 7]
Actual result:
[1 1 1 1]
[1 1 1 1]
np.append() does not modify its argument in place; it returns a new array. Assign the result back to the variable: sequence = np.append(sequence, 7)
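A short sketch of the difference, reusing the array from the question (the name appended is only for illustration):

```python
import numpy as np

sequence = np.repeat(1, 4)

# np.append builds and returns a new array; `sequence` itself is untouched.
appended = np.append(sequence, 7)
print(sequence)   # the original four ones
print(appended)   # the original values followed by 7

# Rebinding the name keeps the result.
sequence = np.append(sequence, 7)
print(sequence)
```

Because every call copies the whole array, appending many values in a loop is usually faster with a Python list that is converted once with np.array at the end.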

improve performance of double loop in pandas

I have a dataframe consisting of numeric and categorical fields:
import pandas as pd
df2=pd.DataFrame({'col1':[1,2,3,4],'col2':[5,6,7,8], 'col3':['cat','cat','dog','bird']})
df2
And am calculating how similar each row is with the following code:
import numpy as np
# calculate distance matrix comparing how similar two rows are
vals = []
for i in range(len(df2)):
    for j in range(len(df2)):
        if j <= i:
            continue
        a = df2.iloc[i, :]
        b = df2.iloc[j, :]
        d0 = (a.iloc[0] - b.iloc[0])**2
        d1 = (a.iloc[1] - b.iloc[1])**2
        # a category mismatch counts as a distance of 10
        d2 = np.where(a.iloc[2] == b.iloc[2], 0, 10)**2
        row_values = (i, j, (d0 + d1 + d2)**0.5)
        vals.append(row_values)
new_df = pd.DataFrame(vals, columns=['Row1', 'Row2', 'Difference'])
new_df
This works fine for a small dataframe, but when I apply the same approach to a dataframe with 10k rows and 10 columns, it takes a very long time to compute.
Are there any suggestions on how to improve the performance of this code?
I start with:
col1 col2 col3
0 1 5 cat
1 2 6 cat
2 3 7 dog
3 4 8 bird
and end up with:
Row1 Row2 Difference
0 0 1 1.414214
1 0 2 10.392305
2 0 3 10.862780
3 1 2 10.099505
4 1 3 10.392305
5 2 3 10.099505
I am calculating the distance between each row of data.
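This is a distance matrix problem, so we can use distance_matrix and broadcasting. Note that this builds the full n x n matrix in memory, so it only works when your data is not too large: for 10k rows, a single float64 matrix already takes about 800 MB.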
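import numpy as np
from scipy.spatial import distance_matrix
# squared Euclidean distance on the numeric columns
d01 = distance_matrix(df2[['col1','col2']].values, df2[['col1','col2']].values)**2
# category distance: True where the two rows have different categories
x = df2['col3'].values[:,None] != df2['col3'].values
# the matrix (a category mismatch contributes 10**2 = 100)
dist_mat = np.sqrt(d01 + x*100)
# we only care about the pairs with row < col, i.e. the upper triangle
np.triu(dist_mat)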
Output:
array([[ 0. , 1.41421356, 10.39230485, 10.86278049],
[ 0. , 0. , 10.09950494, 10.39230485],
[ 0. , 0. , 0. , 10.09950494],
[ 0. , 0. , 0. , 0. ]])
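To get back to the long Row1/Row2/Difference layout from the question, the upper triangle can be read out with np.triu_indices. The following is a sketch under the same assumptions as above (two numeric columns, one categorical column with a mismatch penalty of 10). It uses plain NumPy broadcasting in place of distance_matrix, and the intermediate names are illustrative.

```python
import numpy as np
import pandas as pd

df2 = pd.DataFrame({'col1': [1, 2, 3, 4],
                    'col2': [5, 6, 7, 8],
                    'col3': ['cat', 'cat', 'dog', 'bird']})

# Squared differences of the numeric columns for every pair of rows.
num = df2[['col1', 'col2']].to_numpy(dtype=float)
sq_num = ((num[:, None, :] - num[None, :, :]) ** 2).sum(axis=-1)

# Category penalty: 10**2 wherever the two rows disagree.
cat = df2['col3'].to_numpy()
sq_cat = (cat[:, None] != cat[None, :]) * 10 ** 2

dist_mat = np.sqrt(sq_num + sq_cat)

# Indices of the strict upper triangle (row < col), in row-major order.
rows, cols = np.triu_indices(len(df2), k=1)
new_df = pd.DataFrame({'Row1': rows,
                       'Row2': cols,
                       'Difference': dist_mat[rows, cols]})
print(new_df)
```

The pairs come out in the same order as the double loop, so the result can be compared directly with the original new_df.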

Confusion matrix order in tensorflow

I have 6 classes and I used tf-slim in TensorFlow to obtain a confusion matrix such as
[[41 2 0 0 0 0]
[ 1 11 4 1 0 0]
[ 0 1 12 0 0 0]
[ 0 0 0 22 1 0]
[ 0 0 0 0 7 0]
[ 0 0 0 0 0 20]]
My question is: what is the order of the confusion matrix in the above table? Is it right to say that the columns represent the predicted labels, while the rows represent the true labels? Some references say the opposite.
Did you use tf.confusion_matrix(labels, predictions)?
If so, the columns represent the predicted labels, whereas the rows represent the real labels.
The usual representation is
            PREDICTED
    [[41  2  0  0  0  0]
 T   [ 1 11  4  1  0  0]
 R   [ 0  1 12  0  0  0]
 U   [ 0  0  0 22  1  0]
 E   [ 0  0  0  0  7  0]
     [ 0  0  0  0  0 20]]
As pointed out by M. Rath (+1), this is also what Tensorflow does. This means for 41 samples you correctly predicted class 0. For 2 samples, you predicted class 1, but it actually was class 0.
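The row = true, column = predicted convention can be illustrated with a small NumPy sketch. This is not TensorFlow's implementation; the helper confusion_matrix and the toy labels are made up for the illustration.

```python
import numpy as np

def confusion_matrix(true_labels, predicted_labels, num_classes):
    """Count samples so that cm[t, p] = number of samples of true class t predicted as p."""
    cm = np.zeros((num_classes, num_classes), dtype=int)
    # np.add.at accumulates correctly even when the same (t, p) pair repeats.
    np.add.at(cm, (true_labels, predicted_labels), 1)
    return cm

true_labels = np.array([0, 0, 0, 1, 1, 2])
predicted_labels = np.array([0, 0, 1, 1, 2, 2])

cm = confusion_matrix(true_labels, predicted_labels, num_classes=3)
print(cm)
# Row sums give the number of samples per true class,
# column sums give the number of samples per predicted class.
print(cm.sum(axis=1), cm.sum(axis=0))
```

Here cm[0, 1] counts the one sample whose true class is 0 but which was predicted as class 1, matching the reading of the table above.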
Please note that you can also manipulate the order for visualizations. So instead of
class 0, class 1, class 2
you could have (for both, prediction and true value) the order
class 0, class 2, class 1
This contains the same information, but a visualization might convey a different story. See my master's thesis "Analysis and Optimization of Convolutional Neural Network Architectures", page 48 (Confusion Matrix Ordering), especially figures 5.12 and 5.13.
An implementation can be found in the tool clana.
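Reordering the classes amounts to applying the same permutation to the rows and the columns. A minimal NumPy sketch with a made-up 3-class matrix (this is not how clana does it, just the basic operation):

```python
import numpy as np

# Made-up confusion matrix: rows are true classes, columns are predicted classes.
cm = np.array([[5, 1, 0],
               [2, 3, 4],
               [0, 1, 6]])

# New class order: class 0, class 2, class 1.
order = [0, 2, 1]

# np.ix_ builds an index that permutes rows and columns together.
reordered = cm[np.ix_(order, order)]
print(reordered)
```

The diagonal still holds the correct predictions and the total count is unchanged; only the position of each class moves.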

Given a tensor [5,4,3,4], how to generate a constant tensor where each row has n ones and m zeros, n=5,4,3,4, and m=0,1,2,1.

Given a tensor A: [5,4,3,4], I want to create a tensor B:
[[1,1,1,1,1],
[1,1,1,1,0],
[1,1,1,0,0],
[1,1,1,1,0]]
Each row of B has n ones where n = 5,4,3,4 according to A. The remaining positions are filled with zeros.
Can I realize this in tensorflow, and how?
You can use tf.sequence_mask for this.
import tensorflow as tf
A = tf.constant([5,4,3,4], dtype=tf.int32)
max_len = tf.reduce_max(A)
B = tf.sequence_mask(A, max_len, dtype=tf.int32)
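# tf.Session is the TensorFlow 1.x API; with eager execution in 2.x, print(B.numpy()) works directly
with tf.Session() as sess:
    print(sess.run(B))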
Prints:
[[1 1 1 1 1]
[1 1 1 1 0]
[1 1 1 0 0]
[1 1 1 1 0]]
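The idea behind such a mask can be sketched in plain NumPy with broadcasting: compare each column position with the length for that row. This is only an illustration of the logic, not TensorFlow's implementation.

```python
import numpy as np

A = np.array([5, 4, 3, 4])
max_len = A.max()

# Column positions 0..max_len-1 compared against each row's length:
# position j in row i is 1 exactly when j < A[i].
B = (np.arange(max_len) < A[:, None]).astype(np.int32)
print(B)
```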

Numpy indexing in 3 dimensions

In [93]: a = np.arange(24).reshape(2, 3, 4)
In [94]: a[0, 1, ::2]
Out[94]: array([4, 6])
Can someone explain what '::2' means here?
Thanks!
::2 means: in this dimension, take every element with an even index (starting from 0, counting by 2).
Here it means: get the elements at a[0, 1, 0] and a[0, 1, 2] and put them into one array.
Each index position (you have 3 in this sample) can be indexed or sliced. Perhaps you have seen slices like [start:stop] before on normal lists. Slices can also have a third value, the "step".
So [a:b:c] means [startPosition:endPosition:step], where endPosition is not included.
So ::2 means start=0, end=the end of that dimension, step=2.
The last dimension has 4 elements (see your reshape line), so the indices selected are 0 and 2; 1 and 3 are skipped, 3 being the last valid index.
0 0 0 => 0
0 0 1 => 1
0 0 2 => 2
0 0 3 => 3
0 1 0 => 4 -> (0, 1, 0) is selected by the slice
0 1 1 => 5
0 1 2 => 6 -> (0, 1, 2) is selected by the slice
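To check this reading, the shorthand can be compared with the fully spelled-out slice and with an explicit slice object (the variable names are illustrative):

```python
import numpy as np

a = np.arange(24).reshape(2, 3, 4)

short = a[0, 1, ::2]                        # start and stop omitted
explicit = a[0, 1, 0:4:2]                   # same slice with start=0, stop=4 written out
with_slice = a[0, 1, slice(None, None, 2)]  # what ::2 is turned into internally

print(short, explicit, with_slice)
```

All three expressions select indices 0 and 2 of the last dimension.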