I'm trying out an inventory system in python 3.8 using functions and numpy.
While I am new to numpy, I haven't found anything in the manuals for numpy for this problem.
My problem is this specifically:
I have a 2D list, in this case the unequipped inventory:
unequippedinv = [[""], [""], [""], [""], ["Iron greaves", 15, 10, 10]]
I have an if statement to ensure that the item selected is acceptable. I'm now trying to remove the entire index ["Iron greaves", 15, 10, 10] using unequippedinv.pop(unequippedinv.index(item)) but I keep getting the error ValueError: "'Iron greaves', 15, 10, 10" is not in list
I've tried using numpy's where and argwhere but instead just got [] as the outcome.
Is there a way to search for an entire row in a 2D array, similar to SQL's SELECT * FROM y WHERE x = b, but one that gives me the index of the entire row?
Note: I have now found out that it has something to do with easygui's choicebox, which, I assume, turns the chosen list into a string, which is why the lookup raises an error.
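A minimal sketch of one way to look up and remove the row, assuming (as the note above suspects) that the chooser hands back the string form of the selected row rather than the list itself. The helper name find_row is hypothetical, and the choice string is simulated here instead of coming from easygui.

```python
unequippedinv = [[""], [""], [""], [""], ["Iron greaves", 15, 10, 10]]

def find_row(inventory, choice):
    """Return the index of the first row that equals `choice`,
    or whose string form equals `choice`; return -1 if none matches."""
    for i, row in enumerate(inventory):
        if row == choice or str(row) == choice:
            return i
    return -1

# Simulated selection: the string form of the row (an assumption about
# what the chooser returns, not taken from easygui's documentation).
choice = str(["Iron greaves", 15, 10, 10])

idx = find_row(unequippedinv, choice)
removed = unequippedinv.pop(idx) if idx != -1 else None
print(idx, removed)
```

Comparing against str(row) sidesteps the ValueError, because list.index() only matches elements that compare equal, and a string never equals a list.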
I have a numpy array that I need to change the order of the axis.
To do that I am using the moveaxis() function, which only returns a view of the input array by changing the strides of the array.
However, this does not change the order in which the data are stored in memory. This is problematic for me because I need to pass this reordered array to C code in which the storage order of the data matters.
import numpy as np
a=np.arange(12).reshape((3,4))
a=np.moveaxis(a,1,0)
In this example, a is originally stored contiguously in memory as [0,1,2,...,11].
I would like to have it stored as [0,4,8,1,5,9,2,6,10,3,7,11], and obviously moveaxis() did not do the trick.
How can I force numpy to rewrite the array in memory the way I want? Note that, contrary to my simple example, I am manipulating 3D or 4D data, so I cannot simply change the ordering from row-major to column-major when I create it.
Thanks!
The order parameter of the numpy.reshape(...,order='F') function does exactly what you want
a=np.arange(12).reshape((4,3),order='F')
a.flatten()
array([ 0, 4, 8, 1, 5, 9, 2, 6, 10, 3, 7, 11])
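One caveat worth checking: reshape(..., order='F') on a 1D range changes how the values are indexed, and the flatten() call is what produces a reordered copy, so the buffer of a itself is not rewritten. A hedged alternative, not part of the answer above, is to copy the moveaxis() view into a fresh C-contiguous array with np.ascontiguousarray, which works for any number of dimensions:

```python
import numpy as np

a = np.arange(12).reshape((3, 4))
view = np.moveaxis(a, 1, 0)       # shape (4, 3), still a view on a's buffer
b = np.ascontiguousarray(view)    # new C-ordered copy in the new axis order

print(view.flags['C_CONTIGUOUS'])  # the view is not C-contiguous
print(b.flags['C_CONTIGUOUS'])     # the copy is
print(b.ravel(order='K'))          # elements listed in memory order
```

The array b can then be handed to C code that expects row-major data in the reordered layout.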
This is what my dataframe looks like. The first column is a single int. The second column is a single list of 512 ints.
IndexID Ids
1899317 [0, 47715, 1757, 9, 38994, 230, 12, 241, 12228...
22861131 [0, 48156, 154, 6304, 43611, 11, 9496, 8982, 1...
2163410 [0, 26039, 41156, 227, 860, 3320, 6673, 260, 1...
15760716 [0, 40883, 4086, 11, 5, 18559, 1923, 1494, 4, ...
12244098 [0, 45651, 4128, 227, 5, 10397, 995, 731, 9, 3...
I saved it to hdf and tried opening it using
df.to_hdf('test.h5', key='df', data_columns=True)
h3 = h5py.File('test.h5')
I see 4 keys when I list the keys
h3['df'].keys()
KeysViewHDF5 ['axis0', 'axis1', 'block0_items', 'block0_values']
axis1 seems to contain the values for the first column
h3['df']['axis1'][0:5]
array([ 1899317, 22861131, 2163410, 15760716, 12244098])
However, there doesn't seem to be any data from the second column. There is another dataset with other data
h3['df']['block0_values'][0][0:5]
But that doesn't seem to correspond to any of the data in the second column
array([128, 4, 149, 1, 0], dtype=uint8)
Purpose
I am eventually trying to create a datastore that's memory mapped, that retrieves data using particular indices.
So something like
h3['df']['workingIndex'][22861131, 15760716]
would retrieve
[0, 48156, 154, 6304, 43611, 11, 9496, 8982, 1...],
[0, 40883, 4086, 11, 5, 18559, 1923, 1494, 4, ...
The problem is you're trying to serialize a Pandas Series of Python lists and it is not rectangular (it is jagged).
Pandas and HDF5 are largely used for rectangular (cube, hypercube, etc) data, not for jagged lists-of-lists.
Did you see this warning when you called to_hdf()?
PerformanceWarning:
your performance may suffer as PyTables will pickle object types that it cannot
map directly to c-types [inferred_type->mixed,key->block0_values] [items->['Ids']]
What it's trying to tell you is that lists-of-lists are not supported in an intuitive, high-performance way. And if you run an HDF5 visualization tool like h5dump on your output file, you'll see what's wrong. The index (which is well-behaved) looks like this:
DATASET "axis1" {
DATATYPE H5T_STD_I64LE
DATASPACE SIMPLE { ( 5 ) / ( 5 ) }
DATA {
(0): 1899317, 22861131, 2163410, 15760716, 12244098
}
ATTRIBUTE "CLASS" {
DATA {
(0): "ARRAY"
}
}
But the values (lists of lists) look like this:
DATASET "block0_values" {
DATATYPE H5T_VLEN { H5T_STD_U8LE}
DATASPACE SIMPLE { ( 1 ) / ( H5S_UNLIMITED ) }
DATA {
(0): (128, 5, 149, 164, ...)
}
ATTRIBUTE "CLASS" {
DATA {
(0): "VLARRAY"
}
}
ATTRIBUTE "PSEUDOATOM" {
DATA {
(0): "object"
}
}
What's happening is exactly what the PerformanceWarning warned you about:
> PyTables will pickle object types that it cannot map directly to c-types
Your list-of-lists is being pickled and stored as H5T_VLEN which is just a blob of bytes.
Here are some ways you could fix this:
Store each row under a separate key in HDF5. That is, each list will be stored as an array, and they can all have different lengths. This is no problem with HDF5, because it supports any number of keys in one file.
Change your data to be rectangular, e.g. by padding the shorter lists with zeros. See: Pandas split column of lists into multiple columns
Use h5py to write the data in whatever format you like. It's much more flexible and creates simpler (and yet more powerful) HDF5 files than Pandas/PyTables. Here's one example (which shows h5py can actually store jagged arrays, though it's not pretty): Storing multidimensional variable length array with h5py
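To illustrate the second option, here is a minimal NumPy-only sketch that pads lists to a common width and looks rows up by their IndexID. The short lists are hypothetical stand-ins for the real 512-int lists, and the names table, row_of and lookup are made up for this example.

```python
import numpy as np

# Hypothetical sample data standing in for the real DataFrame columns.
index_ids = [1899317, 22861131, 2163410]
ids = [[0, 47715, 1757], [0, 48156], [0, 26039, 41156, 227]]

# Pad every list with zeros up to the longest length.
width = max(len(row) for row in ids)
table = np.zeros((len(ids), width), dtype=np.int64)
for i, row in enumerate(ids):
    table[i, :len(row)] = row

# Map each IndexID to its row number for direct lookups.
row_of = {index_id: i for i, index_id in enumerate(index_ids)}

def lookup(*wanted):
    """Return the padded rows for the given IndexIDs, in request order."""
    return table[[row_of[w] for w in wanted]]

result = lookup(22861131, 2163410)
print(result)
```

Once the data is a plain 2D integer array like table, it maps directly to a c-type and no longer needs to be pickled. If every list really has 512 entries, no padding is needed at all.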
Why does line 3 raise ValueError: 'matrix must be 2-dimensional'?
import numpy as np
np.mat([[[1],[2]],[[10],[1,3]]])
np.mat([[[1],[2]],[[10],[1]]])
The reason why this code raises an error is because NumPy tries to determine the dimensionality of your input using nesting levels (nesting levels -> dimensions).
If, at some level, some elements do not have the same length (i.e. they are incompatible), it will create the array using the deepest nesting it can, using the objects as the elements of the array.
For this reason:
np.mat([[[1],[2]],[[10],[1,3]]])
Will give you a matrix of objects (lists), while:
np.mat([[[1],[2]],[[10],[1]]])
would result in a 3D array of numbers which np.mat() does not want to squeeze into a matrix.
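Also, please avoid using np.mat() in your code, as NumPy no longer recommends the matrix class.
Use np.array() instead.
Incidentally, np.array() handles the second case and gives you a (2, 2, 1)-shaped array of int, which you could np.squeeze() into a 2D array if you like. For the first, ragged input it can only build a (2, 2) array of list objects, and recent NumPy versions require dtype=object for that.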
However, it would be better to start from nesting level of 2 if all you want is a matrix:
np.array([[1, 2], [10, 1]])
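A small sketch of the shapes involved, using only the regular (non-ragged) input from the question; the variable names are illustrative:

```python
import numpy as np

# Three nesting levels with consistent lengths give a 3D array.
nested = np.array([[[1], [2]], [[10], [1]]])
print(nested.shape)

# Dropping the trailing length-1 axis yields an ordinary 2D array.
flat = np.squeeze(nested, axis=-1)
print(flat.shape)

# Starting from two nesting levels gives the same 2D result directly.
direct = np.array([[1, 2], [10, 1]])
print(np.array_equal(flat, direct))
```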
I tried to create a 3D array using numpy via this line:
self.dark_median_roi=np.median(self.dark_roi, axis=3)
where self.dark_roi is a multidimensional array and I got this error:
IndexError: axis 3 out of bounds (2)
I'm guessing I went about creating a 3D array the wrong way. What is the correct way to create a median numpy array? This will be running on a Raspberry Pi, so I would rather avoid using loops, especially with arrays.
Edit:
I corrected some mistakes from earlier in the code that weren't noticeable until I started adding print statements, so this is the error I'm getting now:
IndexError: axis 3 out of bounds (3)
I also tried changing the axis argument to 2, and it created a 2D array.
You have 3 axes: 0, 1 and 2.
If you mean the last one, pass axis=2.
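A short sketch of why the result comes out 2D: np.median reduces (removes) the axis it is computed over. The array below is a hypothetical stand-in for self.dark_roi.

```python
import numpy as np

# Hypothetical 3D data with axes 0, 1 and 2.
dark_roi = np.arange(24).reshape((2, 3, 4))

# The median over the last axis collapses it, leaving a 2D array.
dark_median_roi = np.median(dark_roi, axis=2)
print(dark_median_roi.shape)

# keepdims=True keeps the reduced axis with length 1 if a 3D result is needed.
kept = np.median(dark_roi, axis=-1, keepdims=True)
print(kept.shape)
```

No Python loop is involved, so this should suit a Raspberry Pi as well.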
I know how to fill a 100-element array with zeros:
np.zeros(100)
But what if I want to fill it with 9?
You can use:
a = np.empty(100)
a.fill(9)
or also if you prefer the slicing syntax:
a[...] = 9
np.empty is called the same way as np.ones, np.zeros, etc., but can be a bit faster since it doesn't initialize the data.
In newer numpy versions (1.8 or later), you also have:
np.full(100, 9)
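A quick comparison of the two approaches. Note the dtype difference, which assumes a reasonably recent NumPy where np.full infers the dtype from the fill value:

```python
import numpy as np

# empty + fill: np.empty defaults to float64, so the 9s are floats.
a = np.empty(100)
a.fill(9)

# np.full infers the dtype from the fill value (an integer here).
b = np.full(100, 9)

# Pass dtype explicitly to get floats from np.full as well.
c = np.full(100, 9, dtype=float)

print(a.dtype, b.dtype, c.dtype)
```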
If you just want the same value throughout the array and you never want to change it, you could trick numpy by setting the stride to zero. This way, you only use memory for a single value, yet you get a virtual numpy array of any size and shape.
>>> import numpy as np
>>> from numpy.lib import stride_tricks
>>> arr = np.array([10])
>>> stride_tricks.as_strided(arr, (10, ), (0, ))
array([10, 10, 10, 10, 10, 10, 10, 10, 10, 10])
But, note that if you modify any one of the elements, all the values in the array would get modified.
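As a hedged alternative to calling as_strided directly, np.broadcast_to builds the same kind of zero-stride view but marks it read-only, which guards against the accidental modification described above:

```python
import numpy as np

# A zero-stride, read-only view that repeats a single value.
arr = np.broadcast_to(10, (10,))
print(arr)
print(arr.strides)

# Writing is rejected because the view is read-only.
try:
    arr[0] = 1
    raised = False
except ValueError:
    raised = True
print(raised)
```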
This question has been discussed at some length before; see NumPy array initialization (fill with identical values), which also covers which method is fastest.
As far as I can see, older numpy versions have no dedicated function to fill an array with 9s. There you would create an empty (i.e. uninitialized) array using np.empty(100) and fill it with 9s or whatever you need, preferably with a.fill(9) rather than a loop.