Creating dictionary from a numpy array "ValueError: too many values to unpack" - numpy

I am trying to create a dictionary from a relatively large numpy array. I tried using the dictionary constructor like so:
elements =dict((k,v) for (a[:,0] , a[:,-1]) in myarray)
I am assuming I am doing this incorrectly since I get the error: "ValueError: too many values to unpack"
The numPy array looks like this:
[ 2.01206281e+13 -8.42110000e+04 -8.42110000e+04 ..., 0.00000000e+00
3.30000000e+02 -3.90343147e-03]
I want the first column 2.01206281e+13 to be the key and the last column -3.90343147e-03 to be the value for each row in the array
Am I on the right track/is there a better way to go about doing this?
Thanks
Edit: to be clearer, I want the first column to be the key and the last column to be the value, for every row in the numpy array.
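For reference, the error comes from the loop itself: iterating over a 2-D array yields one row at a time, and unpacking a row that holds more than two values into two targets raises the ValueError. A minimal sketch of the row-wise version, assuming myarray is a 2-D NumPy array (the small array below is made-up stand-in data, not the asker's):

```python
import numpy as np

# Hypothetical stand-in for myarray: key in column 0, value in the last column
myarray = np.array([[20120628.0, 1.0, 2.0, -0.5],
                    [20120629.0, 3.0, 4.0, 0.25],
                    [20120630.0, 5.0, 6.0, 0.75]])

# What the original loop effectively does: unpack a 4-value row into 2 targets
message = None
try:
    for first, last in myarray:
        pass
except ValueError as exc:
    message = str(exc)  # e.g. "too many values to unpack ..."

# Row-wise fix: index into each row instead of unpacking it
elements = {row[0]: row[-1] for row in myarray}
```

If the key column contains duplicates, later rows silently overwrite earlier ones.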

This is kind of a hard question to answer without knowing what exactly myarray is, but this might help you get started.
>>> import numpy as np
>>> a = np.random.randint(0, 10, size=(3, 2))
>>> a
array([[1, 6],
[9, 3],
[2, 8]])
>>> dict(a)
{1: 6, 2: 8, 9: 3}
or
>>> a = np.random.randint(0, 10, size=(3, 5))
>>> a
array([[9, 7, 4, 4, 6],
[8, 9, 1, 6, 5],
[7, 5, 3, 4, 7]])
>>> dict(a[:, [0, -1]])
{7: 7, 8: 5, 9: 6}
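The same column selection should carry over to the float array in the question. A sketch with made-up values shaped like the question's rows, using zip over the first and last columns and tolist() so that keys and values are plain Python floats rather than NumPy scalars:

```python
import numpy as np

# Made-up float data shaped like the question's array (key first, value last)
a = np.array([[2.01206281e+13, -8.4211e+04, 3.3e+02, -3.90343147e-03],
              [2.01206282e+13, -8.4212e+04, 3.4e+02, 1.5e-03]])

# Pair column 0 with column -1; tolist() converts to built-in floats
elements = dict(zip(a[:, 0].tolist(), a[:, -1].tolist()))
```

dict(a[:, [0, -1]]) gives the same mapping, only with NumPy scalar keys and values.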

elements = dict(zip(*[iter(myarray)] * 2))
What we see here is that we create an iterator over myarray, put it in a list and repeat that list, so the same iterator is bound to both the first and the second position. Unpacking that list into zip makes zip pull alternating items from the one iterator, producing pairs of consecutive elements for the dict constructor. Note that this pairs consecutive items of a flat sequence, not the first and last column of each row.
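A small sketch of what the repeated-iterator idiom does, on a made-up flat array and on a made-up 2-D array (where the paired "items" are whole rows, which cannot be dict keys):

```python
import numpy as np

flat = np.array([1, 10, 2, 20, 3, 30])

# The same iterator appears twice, so zip takes items 0 and 1, then 2 and 3, ...
pairs = dict(zip(*[iter(flat)] * 2))

# On a 2-D array, iteration yields rows, and arrays are unhashable
grid = np.arange(8).reshape(4, 2)
error = None
try:
    dict(zip(*[iter(grid)] * 2))
except TypeError as exc:
    error = str(exc)
```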

Related

pytorch tensor indices are confusing [duplicate]

I am trying to index a PyTorch tensor with a matrix of indices, and I recently came across this bit of code whose failure I cannot explain.
The code below is split into two parts. The first half works, whilst the second raises an error, and I fail to see why. Could someone shed some light on this?
import torch
import numpy as np
a = torch.rand(32, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # WORKS for a torch.tensor of size M >= 32. It doesn't work otherwise.
a = torch.rand(16, 16)
m, n = a.shape
xx, yy = np.meshgrid(np.arange(m), np.arange(m))
result = a[xx] # IndexError: too many indices for tensor of dimension 2
If I instead use a = np.random.rand(16, 16) (a NumPy array rather than a tensor), it works as well.
To whoever comes looking for an answer: it looks like it's a bug in PyTorch.
Indexing a tensor with NumPy arrays is not well defined; it only works reliably when tensors are indexed with tensors. So, in my example code, this works flawlessly:
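import torch
M, N = 16, 16  # example sizes; 16 is the size that failed above
a = torch.rand(M, N)
m, n = a.shape
# Build the index grid with torch instead of numpy (indexing='xy' mirrors np.meshgrid)
xx, yy = torch.meshgrid(torch.arange(m), torch.arange(m), indexing='xy')
result = a[xx]  # WORKS: shape (m, m, n)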
I made a gist to check it, and it's available here
First, let me give you a quick insight into the idea of indexing a tensor with a numpy array and another tensor.
Example: this is our target tensor to be indexed
import numpy as np
import torch
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3]])  # numpy array
tensor_indices = torch.tensor([[0, 1, 2, 7],
[0, 1, 2, 3]]) # 2D tensor
t = torch.tensor([[1, 2, 3, 4], # targeted tensor
[5, 6, 7, 8],
[9, 10, 11, 12],
[13, 14, 15, 16],
[17, 18, 19, 20],
[21, 22, 23, 24],
[25, 26, 27, 28],
[29, 30, 31, 32]])
numpy_result = t[numpy_indices]
tensor_result = t[tensor_indices]
Indexing with a 2D numpy array: the rows of the index array are read as per-dimension indices, so each column forms a (row, column) pair, i.e. t[0,0], t[1,1], t[2,2] and t[7,3].
print(numpy_result) # tensor([ 1, 6, 11, 32])
Indexing with a 2D tensor: the index tensor is walked row by row and every value is taken as a row index into the target tensor, e.g. [[t[0], t[1], t[2], t[7]], [t[0], t[1], t[2], t[3]]].
As the example below shows, the shape of tensor_result after indexing is (tensor_indices.shape[0], tensor_indices.shape[1], t.shape[1]) = (2, 4, 4).
print(tensor_result) # tensor([[[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12],
# [29, 30, 31, 32]],
# [[ 1, 2, 3, 4],
# [ 5, 6, 7, 8],
# [ 9, 10, 11, 12],
# [ 13, 14, 15, 16]]])
If you add a third row to numpy_indices, you get the same error you have, because each index is now read as a 3-tuple, e.g. (0,0,0) ... (7,3,3), which is too many indices for a 2D tensor:
numpy_indices = np.array([[0, 1, 2, 7],
                          [0, 1, 2, 3],
                          [0, 1, 2, 3]])
numpy_result = t[numpy_indices]  # IndexError: too many indices for tensor of dimension 2
Indexing with a tensor holding the same three rows does not fail; the result is simply bigger, with shape (3, 4, 4).
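NumPy itself makes the same distinction between a single index array and a tuple of per-axis index arrays, so the two behaviours described above can be reproduced without PyTorch. The sketch below is NumPy-only; the claim that a small NumPy index falls into the tuple reading in some PyTorch versions is this answer's interpretation, not something the sketch checks:

```python
import numpy as np

t = np.arange(1, 33).reshape(8, 4)   # same values as the target tensor t
idx = np.array([[0, 1, 2, 7],
                [0, 1, 2, 3]])

as_array = t[idx]          # one index array on axis 0 -> shape (2, 4, 4)
as_tuple = t[tuple(idx)]   # one row per axis -> t[0,0], t[1,1], t[2,2], t[7,3]

# Three per-axis index rows for a 2-D array: too many indices
idx3 = np.vstack([idx, [0, 1, 2, 3]])
error = None
try:
    t[tuple(idx3)]
except IndexError as exc:
    error = str(exc)
```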
Finally, as you can see, the two kinds of indexing produce completely different outputs. To solve your problem, convert the index to a tensor before indexing:
xx = torch.tensor(xx).long()  # convert the numpy array to a tensor
What happens with advanced indexing when numpy_indices has more than three rows, as in your situation, is still ambiguous and unresolved; you can check 1, 2, 3.

Create new pandas dataframe with fixed distance using interpolate

I have a dataframe of the following form.
import pandas as pd
df = pd.DataFrame({'X': [0, 3, 6, 7, 8, 11],
                   'Y1': [8, 5, 4, 3, 2, 1.5],
                   'Y2': [1, 2, 4, 5, 5, 5]})
I would like to create a new dataframe, using interpolation, in which 'X' advances in fixed steps [0, 2, 4, 6, 8, 10].
To find the new 'Y' values I could fit f(x) = Y1 and evaluate it at each step in X, but since I have many Y columns I think there must be a more clever way to do this.
The solution I found was the following:
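import numpy as np
import pandas as pd
# b is the full dataframe (it also has 'StepNo' and 'PointNo' columns)
step_size = 0.25
no_steps = int(np.floor(b['X'].max() / step_size))
# One new row per step; DataFrame.append was removed in pandas 2.0, so build the rows and concat once
new_rows = pd.DataFrame({'X': step_size * np.arange(no_steps + 1),
                         'StepNo': 10,
                         'PointNo': 23 + np.arange(no_steps + 1)})
b = pd.concat([b, new_rows], ignore_index=True)
b = b.sort_values(['X'])
b = b.set_index(['X'])
c = b.interpolate('index')  # linear interpolation against the X values in the index
c = c.reset_index()
c = c.sort_values(['PointNo'])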
So first I define the step size and calculate the number of steps. Then I append the new X positions to the dataframe, sort it and set X as the index, so that interpolate can use the index values ('index' method).
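An alternative sketch that avoids the Python loop, assuming the small df from the question (rebuilt here so the snippet stands alone) and the target steps [0, 2, ..., 10]: reindex on the union of the old and new X values, then interpolate every Y column at once.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'X': [0, 3, 6, 7, 8, 11],
                   'Y1': [8, 5, 4, 3, 2, 1.5],
                   'Y2': [1, 2, 4, 5, 5, 5]})

# Target X positions in fixed steps of 2: 0, 2, 4, 6, 8, 10
new_x = np.arange(0, 11, 2)

s = df.set_index('X')
# Rows for the new X values start as NaN; method='index' fills them
# linearly using the X values stored in the index
filled = s.reindex(s.index.union(new_x)).interpolate(method='index')

# Keep only the fixed-step rows and turn X back into a column
result = filled.loc[new_x].rename_axis('X').reset_index()
```

Because interpolate works on every column, this handles any number of Y columns without fitting each f(x) separately.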

numpy one dimensional array in case of lists of different lengths

First of all, I am sorry if this is a very silly question; I am very new to NumPy.
Question:
Scenario 1 :
import numpy as np
data=[1,2,3,4]
type(data)
array=np.array(data)
array
array.ndim
array.shape
OUTPUT:
array
Out[63]: array([1, 2, 3, 4])
array.ndim
Out[64]: 1
array.shape
Out[65]: (4,)
My question is: what is the meaning of (4,)? Does it mean a single row with 4 elements? Can we say it is a row vector with one row and 4 columns?
If so, that creates confusion in the second scenario.
Scenario 2 :
data1 = [[1,2,3,4],[5,6,7]]
array1 = np.array(data1, dtype=object)  # dtype=object is required for ragged lists in NumPy >= 1.24
array1
array1.ndim
array1.shape
OUTPUT:
array1
Out[67]: array([[1, 2, 3, 4], [5, 6, 7]], dtype=object)
array1.ndim
Out[68]: 1
array1.shape
Out[69]: (2,)
Here I expected array1.shape to be (7,), since the array has 1 dimension.
How many rows and how many columns does it have, and why is the output (2,)?
data is a list of numbers:
In [166]: data = [1,2,3,4]
arr is a 1-dimensional array made from that list, with 4 elements:
In [167]: arr = np.array(data)
In [168]: arr
Out[168]: array([1, 2, 3, 4])
In [169]: arr.ndim
Out[169]: 1
In [170]: arr.shape
Out[170]: (4,)
Note that the display of arr has the same [] as the original list.
When talking about 1d arrays, don't try to use row and column ideas. You did not start with a list of lists, and the array does not have rows. It is 1d.
(4,) is Python notation for a single element tuple.
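Row and column vectors are 2d shapes, so if you need one you have to add the second axis yourself. A short sketch (the names row and col are only for illustration):

```python
import numpy as np

arr = np.array([1, 2, 3, 4])
row = arr.reshape(1, -1)   # explicit row vector: 1 row, 4 columns
col = arr[:, np.newaxis]   # explicit column vector: 4 rows, 1 column

shapes = (arr.shape, row.shape, col.shape)  # ((4,), (1, 4), (4, 1))
```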
data1 is a list of lists (of different length):
In [172]: data1 =[[1,2,3,4],[5,6,7]]
In [173]: data1
Out[173]: [[1, 2, 3, 4], [5, 6, 7]]
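arr1 is a 1d array containing 2 lists. Note the object dtype (since NumPy 1.24, ragged input like this raises a ValueError unless you pass dtype=object explicitly):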
In [174]: arr1 = np.array(data1)
In [175]: arr1
Out[175]: array([list([1, 2, 3, 4]), list([5, 6, 7])], dtype=object)
To get a 1d array of 7 numbers, we have to concatenate the sublists:
In [176]: np.hstack(data1)
Out[176]: array([1, 2, 3, 4, 5, 6, 7])
That operation is similar to a list join:
In [177]: data1[0] + data1[1]
Out[177]: [1, 2, 3, 4, 5, 6, 7]
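The two outcomes side by side, using np.concatenate (which, like np.hstack, joins the sublists) and passing dtype=object explicitly as recent NumPy versions require for ragged input:

```python
import numpy as np

data1 = [[1, 2, 3, 4], [5, 6, 7]]

# Ragged lists: a 1d object array whose 2 elements are the lists themselves
arr1 = np.array(data1, dtype=object)

# Joining the sublists instead gives a 1d array of 7 numbers
flat = np.concatenate(data1)
```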
If the sublists have equal length then we can make a 2d array - with 2 rows and 4 columns, shape (2,4):
In [178]: data2 =[[1,2,3,4],[5,6,7,8]]
In [179]: np.array(data2)
Out[179]:
array([[1, 2, 3, 4],
[5, 6, 7, 8]])

Default value when indexing outside of a numpy array, even with non-trivial indexing

Is it possible to look up entries from an nd array without throwing an IndexError?
I'm hoping for something like:
>>> a = np.arange(10) * 2
>>> a[[-4, 2, 8, 12]]
IndexError
>>> wrap(a, default=-1)[[-4, 2, 8, 12]]
[-1, 4, 16, -1]
>>> wrap(a, default=-1)[200]
-1
Or possibly more like get_with_default(a, [-4, 2, 8, 12], default=-1)
Is there some builtin way to do this? Can I ask numpy not to throw the exception and return garbage, which I can then replace with my default value?
np.take with mode='clip' sort of does this:
In [155]: a
Out[155]: array([ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18])
In [156]: a.take([-4,2,8,12],mode='raise')
...
IndexError: index 12 is out of bounds for size 10
In [157]: a.take([-4,2,8,12],mode='wrap')
Out[157]: array([12, 4, 16, 4])
In [158]: a.take([-4,2,8,12],mode='clip')
Out[158]: array([ 0, 4, 16, 18])
Except you don't have much control over the return value: here index 12 returns 18, the last value, and -4 is treated as out of bounds in the other direction, returning 0.
One way of adding the defaults is to pad a first:
In [174]: a = np.arange(10) * 2
In [175]: ind=np.array([-4,2,8,12])
In [176]: np.pad(a, [1,1], 'constant', constant_values=-1).take(ind+1, mode='clip')
Out[176]: array([-1, 4, 16, -1])
Not exactly pretty, but a start.
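Another 1-D sketch along the same lines: build a validity mask, clip the indices so the lookup itself can never raise, then overwrite the invalid slots with np.where. The helper name lookup_or_default is hypothetical, and like the question it treats negative indices as out of range:

```python
import numpy as np

def lookup_or_default(a, ind, default=-1):
    """Hypothetical helper: 1-D lookup returning `default` for out-of-range indices.

    Assumes `a` is a non-empty 1-D array; negative indices count as out of range.
    """
    ind = np.asarray(ind)
    valid = (ind >= 0) & (ind < len(a))
    # Clip so fancy indexing never raises, then replace the invalid positions
    safe = np.clip(ind, 0, len(a) - 1)
    return np.where(valid, a[safe], default)

a = np.arange(10) * 2
out = lookup_or_default(a, [-4, 2, 8, 12])  # array([-1,  4, 16, -1])
```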
This is my first post on any Stack Exchange site, so forgive me for any stylistic errors (hopefully there are only stylistic errors). I am interested in the same feature but could not find anything in numpy better than the np.take mentioned by hpaulj, and np.take still doesn't do exactly what's needed. Alfe's answer works but would need some elaboration to handle n-dimensional inputs. The following is another workaround that generalizes to the n-dimensional case. The basic idea is similar to the one used by Alfe: create a new index with the out-of-bounds indices masked out (in my case) or disguised (in Alfe's case), and use it to index the input array without raising an error.
import numpy as np
def take(a, indices, default=0):
    # indices: one index array per axis of a, e.g. (rows,) or (rows, cols)
    indices = [np.asarray(ind) for ind in indices]
    # initialize mask; will broadcast to the shape of indices[0] in the first iteration
    mask = True
    for i, ind in enumerate(indices):
        # each element of the mask is only True if all indices at that position are in bounds
        mask = mask & (0 <= ind) & (ind < a.shape[i])
    # create in-bound indices
    in_bound = [ind[mask] for ind in indices]
    # initialize result with the default value
    result = np.full(np.shape(mask), default, dtype=a.dtype)
    # set elements indexed by in_bound to their appropriate values in a
    result[mask] = a[tuple(in_bound)]
    return result
And here is the output from Eric's sample problem:
>>> a = np.arange(10)*2
>>> indices = (np.array([-4,2,8,12]),)
>>> take(a,indices,default=-1)
array([-1, 4, 16, -1])
You can restrict the indices to the valid range of the array you are indexing into, using np.maximum() and np.minimum().
Example:
I have a heatmap like
h = np.array([[ 2, 3, 1],
[ 3, -1, 5]])
and I have a palette of RGB values I want to use to color the heatmap. The palette only names colors for the values 0..4:
p = np.array([[0, 0, 0], # black
[0, 0, 1], # blue
[1, 0, 1], # purple
[1, 1, 0], # yellow
[1, 1, 1]]) # white
Now I want to color my heatmap using the palette:
p[h]
Currently this leads to an error because of the values -1 and 5 in the heatmap:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
IndexError: index 5 is out of bounds for axis 0 with size 5
But I can limit the range of the heatmap:
p[np.maximum(np.minimum(h, 4), 0)]
This works and gives me the result:
array([[[1, 0, 1],
[1, 1, 0],
[0, 0, 1]],
[[1, 1, 0],
[0, 0, 0],
[1, 1, 1]]])
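The same clamping can be written with a single np.clip call, which bounds from below and above at once. A short sketch reusing the heatmap and palette above:

```python
import numpy as np

h = np.array([[2, 3, 1],
              [3, -1, 5]])
p = np.array([[0, 0, 0], [0, 0, 1], [1, 0, 1], [1, 1, 0], [1, 1, 1]])

# np.clip(h, 0, len(p) - 1) equals np.maximum(np.minimum(h, 4), 0) here
colored = p[np.clip(h, 0, len(p) - 1)]
```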
If you really need to have a special value for the indexes which are out of bound, you could implement your proposed get_with_default() like this:
def get_with_default(values, indexes, default=-1):
    return np.concatenate([[default], values, [default]])[
        np.maximum(np.minimum(indexes, len(values)), -1) + 1]
a = np.arange(10) * 2
get_with_default(a, [-4, 2, 8, 12], default=-1)
Will return:
array([-1, 4, 16, -1])
as wanted.

how to vectorize an operation on a 1 dimensional array to produce 2 dimensional matrix in numpy

I have a 1d array of values
i = np.arange(0,7,1)
and a function
# Returns a column matrix
def fn(i):
    return np.matrix([[i*2,i*3]]).T
fnv = np.vectorize(fn)
then writing
fnv(i)
gives me an error
File "<stdin>", line 1, in <module>
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1872, in __call__
return self._vectorize_call(func=func, args=vargs)
File "c:\Python33\lib\site-packages\numpy\lib\function_base.py",
line 1942, in _vectorize_call
copy=False, subok=True, dtype=otypes[0])
ValueError: setting an array element with a sequence.
The result I am looking for is a matrix with two rows and as many columns as in the input array. What is the best notation in numpy to achieve this?
For example i would equal
[1,2,3,4,5,6]
and the output would equal
[[2,4,6,8,10,12],
[3,6,9,12,15,18]]
EDIT
You should try to avoid using vectorize, because it gives the illusion of numpy efficiency, but inside it's all python loops.
If you really have to deal with user-supplied functions that take ints and return a matrix of shape (2, 1), then there probably isn't much you can do, but that seems like a really weird use case. If you can replace that with a list of functions that each take an int and return an int, using ufuncs where needed (e.g. np.sin instead of math.sin), you can do the following:
def vectorize2(funcs):
    def fnv(arr):
        return np.vstack([f(arr) for f in funcs])
    return fnv
f2 = vectorize2((lambda x: 2 * x, lambda x: 3 * x))
>>> f2(np.arange(10))
array([[ 0, 2, 4, 6, 8, 10, 12, 14, 16, 18],
[ 0, 3, 6, 9, 12, 15, 18, 21, 24, 27]])
Just for your reference, I have timed this vectorization against your proposed one:
f = vectorize(fn)
>>> timeit.timeit('f(np.arange(10))', 'from __main__ import np, f', number=1000)
0.28073329263679625
>>> timeit.timeit('f2(np.arange(10))', 'from __main__ import np, f2', number=1000)
0.023139129945661807
>>> timeit.timeit('f(np.arange(10000))', 'from __main__ import np, f', number=10)
2.3620706288432984
>>> timeit.timeit('f2(np.arange(10000))', 'from __main__ import np, f2', number=10)
0.002757072593169596
So there is an order-of-magnitude speed-up even for small arrays, growing to roughly 850x in these timings for larger arrays, available almost for free.
ORIGINAL ANSWER
Don't use vectorize unless there is no way around it; it's slow. See the following examples:
>>> a = np.array(range(7))
>>> a
array([0, 1, 2, 3, 4, 5, 6])
>>> np.vstack((a, a+1))
array([[0, 1, 2, 3, 4, 5, 6],
[1, 2, 3, 4, 5, 6, 7]])
>>> np.vstack((a, a**2))
array([[ 0, 1, 2, 3, 4, 5, 6],
[ 0, 1, 4, 9, 16, 25, 36]])
Whatever your function is, if it can be built from numpy's ufuncs, you can do something like np.vstack((a, f(a))) and get what you want.
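For the concrete example in the question (multiply by 2 and by 3), plain broadcasting builds the 2-row result directly; np.outer is an equivalent spelling. A sketch:

```python
import numpy as np

i = np.array([1, 2, 3, 4, 5, 6])
factors = np.array([2, 3])

# A (2, 1) column times a (6,) row broadcasts to a (2, 6) result
result = factors[:, np.newaxis] * i
same = np.outer(factors, i)  # identical array
```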
A simple reimplementation of vectorize gives me what I want:
def vectorize(fn):
    def do_it(array):
        # Pass a list: stacking a generator is deprecated since NumPy 1.16 and rejected by recent versions
        return np.column_stack([fn(p) for p in array])
    return do_it
If this is not performant or there is a better way then let me know.
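If a per-element Python function really is unavoidable, np.vectorize accepts a signature argument that lets each call return a vector. A sketch, assuming fn is rewritten to return a plain 1-D array instead of an np.matrix; as noted above, this still loops in Python internally, so it is a convenience rather than a speed-up:

```python
import numpy as np

def fn(x):
    # Same idea as the question's fn, but returns a 1-D array of length 2
    return np.array([x * 2, x * 3])

# '()->(n)': scalar in, length-n vector out; output shape is (len(i), n)
fnv = np.vectorize(fn, signature='()->(n)')

i = np.arange(0, 7, 1)
result = fnv(i).T  # transpose to 2 rows x len(i) columns
```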