Radial Interpolation of a DataFrame

Radial Interpolation of a DataFrame - dataframe

I have a dataframe (120,238), with 12 values spread across it. I am trying to use radial interpolation to fill up the remaining empty points. For that I created an list with the coordinates of the points, and another list with the values of each of these points.
for i in range(238):
col.append('')
df_map = pd.DataFrame(columns = col, index = range(120))
x_rbf = [8, 227, 19, 116, 11, 223, 5, 231, 116, 116, 13, 222] #x represents the columns
y_rbf = [59, 59, 102, 111, 17, 17, 9, 9, 62, 17, 7, 7] #y represents the rows
z_rbf = [16.2,15.99,16.2,16.3,15.7,15,14.2,14.2,16.4,16.4,13,11]
y = x_rbf, y_rbf
f = scipy.interpolate.RBFInterpolator(y,z_rbf)
However, when I run this code, I get the following error'
ValueError: Expected the first axis of `d` to have length 2.
Does anyone know how to go around this?

After countless tries, I figured out the issue with utilizing the RBF Interpolator. The x and y coordinates have to be flattened (using np.ravel()), and then stacked into one array
for i in range(238):
col.append('')
df_map = pd.DataFrame(columns = col, index = range(120))
x_rbf = [8, 227, 19, 116, 11, 223, 5, 231, 116, 116, 13, 222] #x represents the columns
y_rbf = [59, 59, 102, 111, 17, 17, 9, 9, 62, 17, 7, 7] #y represents the rows
z_rbf = [16.2,15.99,16.2,16.3,15.7,15,14.2,14.2,16.4,16.4,13,11]
sp = np.stack([y_rbf.ravel(),x_rbf.ravel()],-1)
f = scipy.interpolate.RBFInterpolator(sp,z_rbf.ravel(), kernel = 'linear')
Should work this way

Related

Feeding Word Embedding Matrix into a Pytorch LSTM Model

I have a LSTM model I am using to predict the unemployment rate from federal reserve filings. It uses glove vectors and vocab2index embedding and the training went as planned. However, upon attempting to feed a word embedding into the model for prediction testing it keeps throwing various errors.
Here is the model:
def load_glove_vectors(glove_file= glove_embedding_vectors_text_file):
"""Load the glove word vectors"""
word_vectors = {}
with open(glove_file) as f:
for line in f:
split = line.split()
word_vectors[split[0]] = np.array([float(x) for x in split[1:]])
return word_vectors
def get_emb_matrix(pretrained, word_counts, emb_size = 300):
""" Creates embedding matrix from word vectors"""
vocab_size = len(word_counts) + 2
vocab_to_idx = {}
vocab = ["", "UNK"]
W = np.zeros((vocab_size, emb_size), dtype="float32")
W[0] = np.zeros(emb_size, dtype='float32') # adding a vector for padding
W[1] = np.random.uniform(-0.25, 0.25, emb_size) # adding a vector for unknown words
vocab_to_idx["UNK"] = 1
i = 2
for word in word_counts:
if word in word_vecs:
W[i] = word_vecs[word]
else:
W[i] = np.random.uniform(-0.25,0.25, emb_size)
vocab_to_idx[word] = i
vocab.append(word)
i += 1
return W, np.array(vocab), vocab_to_idx
word_vecs = load_glove_vectors()
pretrained_weights, vocab, vocab2index = get_emb_matrix(word_vecs, counts)
Unfortunately when I feed this array
[array([ 3, 10, 6287, 6, 113, 271, 3, 6639, 104, 5105, 7525,
104, 7526, 9, 23, 9, 10, 11, 24, 7527, 7528, 104,
11, 24, 7529, 7530, 104, 11, 24, 7531, 7530, 104, 11,
24, 7532, 7530, 104, 11, 24, 7533, 7534, 24, 7535, 7536,
104, 7537, 104, 7538, 7539, 7540, 6643, 7541, 7354, 7542, 7543,
7544, 9, 23, 9, 10, 11, 24, 25, 8, 10, 11,
24, 3, 10, 663, 168, 9, 10, 290, 291, 3, 4909,
198, 10, 1478, 169, 15, 4621, 3, 3244, 3, 59, 1967,
113, 59, 520, 198, 25, 5105, 7545, 7546, 7547, 7546, 7548,
7549, 7550, 1874, 10, 7551, 9, 10, 11, 24, 7552, 6287,
7553, 7554, 7555, 24, 7556, 24, 7557, 7558, 7559, 6, 7560,
323, 169, 10, 7561, 1432, 6, 3134, 3, 7562, 6, 7563,
1862, 7144, 741, 3, 3961, 7564, 7565, 520, 7566, 4833, 7567,
7568, 4901, 7569, 7570, 4901, 7571, 1874, 7572, 12, 13, 7573,
10, 7574, 7575, 59, 7576, 59, 638, 1620, 7577, 271, 6488,
59, 7578, 7579, 7580, 7581, 271, 7582, 7583, 24, 669, 5932,
7584, 9, 113, 271, 3764, 3, 5930, 3, 59, 4901, 7585,
793, 7586, 7587, 6, 1482, 520, 7588, 520, 7589, 3246, 7590,
13, 7591])
into torch.LongTensor() I keep getting the following error:
TypeError: can't convert np.ndarray of type numpy.object_. The only supported types are: float64, float32, float16, complex64, complex128, int64, int32, int16, int8, uint8, and bool.
Any ideas on how to remedy? I am fairly new to AI in general, and I am an economist by trade so I am almost certain I have made a boneheaded error.

How to merge classes in multiclass image segmentation

I am performing an image segmentation with a u-net model.
My mask has classes from 0-50.
I also have a text file dictionary with codes representing each class.
For example -
{1: '1234', 2:'5678', 3:'1245'} etc.
How do I combine when the 2 first string characters are the same so for example above key 1 and 3 are the same because they both start with "12".
How can I do this for all classes?

firstTwoCharDict = {}
for key, value in dictionary.items():
if key == 0:
value == value
firstTwoCharDict[key] = value
else:
value = value[:2]
firstTwoCharDict[key] = value
newDict = {}
for key, value in firstTwoCharDict.items():
if value not in newDict:
newDict[value] = [key]
else:
newDict[value].append(key)
This provides this
{'62': [1, 39],
'90': [2, 5, 9, 20, 32, 42, 47, 72, 88, 91, 95],
'97': [3, 49, 55],
'98': [4, 24, 34, 40, 53, 76, 81, 90, 96],
'31': [6, 17, 30, 48, 83],
'69': [7, 13, 15, 16, 27, 44, 51, 54, 56, 75],
'79': [8, 50],
'71': [10, 19, 22, 35, 61, 63, 65],
'99': [11, 12, 21, 46, 52, 69, 78, 84, 89],
'48': [14, 36, 74],
'60': [18],
'64': [23, 38, 66, 97]
```
Now i have an 2d array with integers, how do I replace them with they keys if the array values are equal to the values in the dict?

Numpy array changes shape when accessing with indices

I have a small matrix A with dimensions MxNxO
I have a large matrix B with dimensions KxMxNxP, with P>O
I have a vector ind of indices of dimension Ox1
I want to do:
B[1,:,:,ind] = A
But, the lefthand of my equation
B[1,:,:,ind].shape
is of dimension Ox1xMxN and therefore I can not broadcast A (MxNxO) into it.
Why does accessing B in this way change the dimensions of the left side?
How can I easily achieve my goal?
Thanks

There's a feature, if not a bug, that when slices are mixed in the middle of advanced indexing, the sliced dimensions are put at the end.
Thus for example:
In [204]: B = np.zeros((2,3,4,5),int)
In [205]: ind=[0,1,2,3,4]
In [206]: B[1,:,:,ind].shape
Out[206]: (5, 3, 4)
The 3,4 dimensions have been placed after the ind, 5.
We can get around that by indexing first with 1, and then the rest:
In [207]: B[1][:,:,ind].shape
Out[207]: (3, 4, 5)
In [208]: B[1][:,:,ind] = np.arange(3*4*5).reshape(3,4,5)
In [209]: B[1]
Out[209]:
array([[[ 0, 1, 2, 3, 4],
[ 5, 6, 7, 8, 9],
[10, 11, 12, 13, 14],
[15, 16, 17, 18, 19]],
[[20, 21, 22, 23, 24],
[25, 26, 27, 28, 29],
[30, 31, 32, 33, 34],
[35, 36, 37, 38, 39]],
[[40, 41, 42, 43, 44],
[45, 46, 47, 48, 49],
[50, 51, 52, 53, 54],
[55, 56, 57, 58, 59]]])
This only works when that first index is a scalar. If it too were a list (or array), we'd get an intermediate copy, and couldn't set the value like this.
https://docs.scipy.org/doc/numpy-1.15.0/reference/arrays.indexing.html#combining-advanced-and-basic-indexing
It's come up in other SO questions, though not recently.
weird result when using both slice indexing and boolean indexing on a 3d array

Extracting the indices of outliers in Linear Regression

The following script computes R-squared value between two numpy arrays(x and y).
The R-squared value is very low due to outliers in the data. How can I extract the indices of those outliers?
import numpy as np, matplotlib.pyplot as plt, scipy.stats as stats
x = np.random.random_integers(1,50,50)
y = np.random.random_integers(1,50,50)
r2 = stats.linregress(x, y) [3]**2
print r2
plt.scatter(x, y)
plt.show()

An outlier is defined as: value-mean > 2*standard deviation.
You can do this with the line
[i for i in range(len(x)) if (abs(x[i] - np.mean(x)) > 2*np.std(x))]
What is does:
A list is constructed from the indices of x, where the element at that index satisfies the condition described above.
A quick test:
x = np.random.random_integers(1,50,50)
this gives me the array:
array([16, 6, 13, 18, 21, 37, 31, 8, 1, 48, 4, 40, 9, 14, 6, 45, 20,
15, 14, 32, 30, 8, 19, 8, 34, 22, 49, 5, 22, 23, 39, 29, 37, 24,
45, 47, 21, 5, 4, 27, 48, 2, 22, 8, 12, 8, 49, 12, 15, 18])
Now I add some outliers manually as there are none initially:
x[4] = 200
x[15] = 178
lets test:
[i for i in range(len(x)) if (abs(x[i] - np.mean(x)) > 2*np.std(x))]
result:
[4, 15]
Is this what you was looking for?
EDIT:
I added the abs() function in the line above, because when you are working with negative numbers this might end bad. The abs() function takes the absolute value.

I think Sander's approach is the correct one, but if you must see R2 without those outliers before making a decision here is a way to do it.
Setup data and introduce outlier:
In [1]:
import numpy as np, scipy.stats as stats
np.random.seed(123)
x = np.random.random_integers(1,50,50)
y = np.random.random_integers(1,50,50)
y[5] = 100
Calculate R2 taking out one y value at a time (along with matching x value):
m = np.eye(y.shape[0])
r2 = np.apply_along_axis(lambda a: stats.linregress(np.delete(x, a.argmax()), np.delete(y, a.argmax()))[3]**2, 0, m)
Get index of the biggest outlier:
r2.argmax()
Out[1]:
5
Get R2 when this outlier is taken out:
In [2]:
r2[r2.argmax()]
Out[2]:
0.85892084723588935
Get the value of the outlier:
In [3]:
y[r2.argmax()]
Out[3]:
100
To get top n outliers:
In [4]:
n = 5
sorted_index = r2.argsort()[::-1]
sorted_index[:n]
Out [4]:
array([ 5, 27, 34, 0, 17], dtype=int64)

implementation of Hierarchial Agglomerative clustering

i am newbie and just want to implement Hierarchical Agglomerative clustering for RGB images. For this I extract all values of RGB from an image. And I process image.Next I find its distance and then develop the linkage. Now from linkage I want to extract my original data (i.e RGB values) on specified indices with indices id. Here is code I have done so far.
image = Image.open('image.jpg')
image = image.convert('RGB')
im = np.array(image).reshape((-1,3))
rgb = list(im.getdata())
X = pdist(im)
Y = linkage(X)
I = inconsistent(Y)
based on the 4th column of consistency. I opt minimum value of the cutoff in order to get maximum clusters.
cutoff = 0.7
cluster_assignments = fclusterdata(Y, cutoff)
# Print the indices of the data points in each cluster.
num_clusters = cluster_assignments.max()
print "%d clusters" % num_clusters
indices = cluster_indices(cluster_assignments)
ind = np.array(enumerate(rgb))
for k, ind in enumerate(indices):
print "cluster", k + 1, "is", ind
dendrogram(Y)
I got results like this
cluster 6 is [ 6 11]
cluster 7 is [ 9 12]
cluster 8 is [15]
Means cluster 6 contains the indices of 6 and 11 leafs. Now at this point I stuck in how to map these indices to get original data(i.e rgb values). indices of each rgb values to each pixel in the image. And then I have to generate codebook to implement Agglomeration Clustering. I have no idea how to approach this task. Read a lot of stuff but nothing clued.

Here is my solution:
import numpy as np
from scipy.cluster import hierarchy
im = np.array([[54,101,9],[ 67,89,27],[ 67,85,25],[ 55,106,1],[ 52,108,0],
[ 55,78,24],[ 19,57,8],[ 19,46,0],[ 95,110,15],[112,159,57],
[ 67,118,26],[ 76,127,35],[ 74,128,30],[ 25,62,0],[100,120,9],
[127,145,61],[ 48,112,25],[198,25,21],[203,11,10],[127,171,60],
[124,173,45],[120,133,19],[109,137,18],[ 60,85,0],[ 37,0,0],
[187,47,20],[127,170,52],[ 30,56,0]])
groups = hierarchy.fclusterdata(im, 0.7)
idx_sorted = np.argsort(groups)
group_sorted = groups[idx_sorted]
im_sorted = im[idx_sorted]
split_idx = np.where(np.diff(group_sorted) != 0)[0] + 1
np.split(im_sorted, split_idx)
output:
[array([[203, 11, 10],
[198, 25, 21]]),
array([[187, 47, 20]]),
array([[127, 171, 60],
[127, 170, 52]]),
array([[124, 173, 45]]),
array([[112, 159, 57]]),
array([[127, 145, 61]]),
array([[25, 62, 0],
[30, 56, 0]]),
array([[19, 57, 8]]),
array([[19, 46, 0]]),
array([[109, 137, 18],
[120, 133, 19]]),
array([[100, 120, 9],
[ 95, 110, 15]]),
array([[67, 89, 27],
[67, 85, 25]]),
array([[55, 78, 24]]),
array([[ 52, 108, 0],
[ 55, 106, 1]]),
array([[ 54, 101, 9]]),
array([[60, 85, 0]]),
array([[ 74, 128, 30],
[ 76, 127, 35]]),
array([[ 67, 118, 26]]),
array([[ 48, 112, 25]]),
array([[37, 0, 0]])]

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas

Radial Interpolation of a DataFrame - dataframe

Related

Feeding Word Embedding Matrix into a Pytorch LSTM Model

How to merge classes in multiclass image segmentation

Numpy array changes shape when accessing with indices

Extracting the indices of outliers in Linear Regression

implementation of Hierarchial Agglomerative clustering

Categories

Resources