Multiple 3D arrays to one 2D array - numpy

I have many 3D arrays in different files. I want to turn each of them into a 2D array and then join them into one array.
I managed to get a 2D array, but not in the layout I want.
Ex:
Original 3D array of (4x2x2):
[[[ 0 1]
[ 2 3]]
[[ 4 5]
[ 6 7]]
[[ 8 9]
[10 11]]
[[12 13]
[14 15]]]
I want it to become 2D (2x8):
[[0 1 4 5 8 9 12 13]
[2 3 6 7 10 11 14 15]]
This is my code:
import numpy as np
x=np.arange(16).reshape((4,2,2)) #Depth, Row, Column
y=x.reshape((x.shape[1], -1), order='F')
If there is a better way to do this, please feel free to improve my code.

You can use np.swapaxes to swap the first two axes and then reshape, like so -
y = x.swapaxes(0,1).reshape(x.shape[1],-1)
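As a quick check, the sketch below applies the swapaxes-then-reshape step to the question's array and prints the resulting 2x8 layout. The second part joins several converted arrays with np.hstack. The helper name to_2d and the list arrays are made up for illustration, and joining side by side is only an assumption about what "join them" means in the question.

```python
import numpy as np

def to_2d(a):
    # Hypothetical helper: bring the row axis to the front,
    # then flatten depth and column into one axis.
    return a.swapaxes(0, 1).reshape(a.shape[1], -1)

x = np.arange(16).reshape((4, 2, 2))  # depth, row, column
y = to_2d(x)
print(y)

# Stand-ins for arrays loaded from different files
# (assumed to share the same number of rows).
arrays = [x, x + 16]
joined = np.hstack([to_2d(a) for a in arrays])
print(joined.shape)
```

If the arrays should be stacked below each other instead, np.vstack would be the counterpart, provided the flattened widths match.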

Numpy vs Pandas axis

Why does axis differ between Numpy and Pandas?
Example:
If I want to get rid of column in Pandas I could do this:
df.drop("column", axis = 1, inplace = True)
Here, we are using axis = 1 to drop a column (vertically in a DF).
In Numpy, if I want to sum a matrix A vertically I would use:
A.sum(axis = 0)
Here I use axis = 0.
axis isn't used that often in pandas. A dataframe has 2 dimensions, which are often treated quite differently. In drop the axis definition is well documented, and actually corresponds to the numpy usage.
Make a simple array and data frame:
In [180]: x = np.arange(9).reshape(3,3)
In [181]: df = pd.DataFrame(x)
In [182]: df
Out[182]:
0 1 2
0 0 1 2
1 3 4 5
2 6 7 8
Delete a row from the array, or a column:
In [183]: np.delete(x, 1, 0)
Out[183]:
array([[0, 1, 2],
[6, 7, 8]])
In [184]: np.delete(x, 1, 1)
Out[184]:
array([[0, 2],
[3, 5],
[6, 8]])
Drop does the same thing for the same axis:
In [185]: df.drop(1, axis=0)
Out[185]:
0 1 2
0 0 1 2
2 6 7 8
In [186]: df.drop(1, axis=1)
Out[186]:
0 2
0 0 2
1 3 5
2 6 8
In sum, the definitions are the same as well:
In [188]: x.sum(axis=0)
Out[188]: array([ 9, 12, 15])
In [189]: df.sum(axis=0)
Out[189]:
0 9
1 12
2 15
dtype: int64
In [190]: x.sum(axis=1)
Out[190]: array([ 3, 12, 21])
In [191]: df.sum(axis=1)
Out[191]:
0 3
1 12
2 21
dtype: int64
The pandas sums are Series, which are the pandas equivalent of a 1d array.
Visualizing what axis does with reduction operations like sum is a bit tricky - especially with 2d arrays. Is the axis kept or removed? It can help to think about axis for 1d arrays (the only axis is removed), or 3d arrays, where one axis is removed leaving two.
When you drop a column, its name is looked up along axis 1, the horizontal axis that holds the column labels. When you sum along axis 0, you sum vertically, down the rows.
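To make the "is the axis kept or removed?" point concrete, here is a small sketch with a 3d array. The shape (2, 3, 4) is an arbitrary choice for illustration; it only shows which axis disappears from the result shape.

```python
import numpy as np

x = np.arange(24).reshape(2, 3, 4)

# Summing over an axis removes that axis from the shape.
shapes = {axis: x.sum(axis=axis).shape for axis in range(x.ndim)}
print(shapes)

# keepdims=True keeps the reduced axis, with length 1.
kept = x.sum(axis=1, keepdims=True).shape
print(kept)
```

The same reading applies in 2d: axis=0 collapses the rows, leaving one value per column.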

Trying to understand shuffle within mini-batch in tensorflow Dataset

From here I understand what shuffle, batch and repeat do. I'm working on Medical image data where each mini-batch has slices from one patient record. I'm looking for a way to shuffle within the minibatch while training. I cannot increase the buffer size because I don't want slices from different records to get mixed up. Could someone please explain how this can be done?
dataset = tf.data.Dataset.from_tensor_slices(tf.range(1, 20))
data = dataset.batch(5).shuffle(5).repeat(1)
for element in data.as_numpy_iterator():
    print(element)
Current Output :
[ 6 7 8 9 10]
[1 2 3 4 5]
[11 12 13 14 15]
[16 17 18 19]
Expected Output :
[ 6 8 9 7 10]
[3 4 1 5 2]
[15 12 11 14 13]
[16 17 19 20 17]
I just realized there is no need to shuffle within the mini-batch, as shuffling within the mini-batch doesn't contribute to improving training in any way. I'd appreciate it if anyone has other views on this.
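For anyone who still wants the behaviour described in the question, the idea can be sketched in plain NumPy: batch first, then permute each batch on its own so nothing crosses a batch boundary. This is only an illustration of the logic, not a tf.data pipeline, and the seed is arbitrary. In tf.data the same idea would presumably be written as dataset.batch(5).map(tf.random.shuffle), which is not tested here.

```python
import numpy as np

rng = np.random.default_rng(0)  # seed chosen arbitrarily
data = np.arange(1, 20)

# Cut into consecutive batches of 5 (the last one is shorter), like batch(5).
batches = [data[i:i + 5] for i in range(0, len(data), 5)]

# Permute each batch on its own; no element crosses a batch boundary.
shuffled = [rng.permutation(b) for b in batches]

# Optionally also randomize the order in which the batches are visited.
order = rng.permutation(len(shuffled))
for i in order:
    print(shuffled[i])
```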

How to reorganize tensor

Without loss of generality, I have the following 3d tensor
shape (2, 3, 3)
[[[1 2 3]
[4 5 6]
[7 8 9]]
[[-1 -2 -3]
[-4 -5 -6]
[-7 -8 -9]]]
I need to reorganize the above tensor as follow
[[[1 2 3]
[-1 -2 -3]]
[[4 5 6]
[-4 -5 -6]]
[[7 8 9]
[-7 -8 -9]]]
That is, a tensor of shape (3 x 2 x 3). How should I go about doing this in tensorflow?
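tf.transpose is exactly what you want.
If you want to transpose a with shape [3,4,5] to [4,5,3], you can use tf.transpose(a, [1, 2, 0]). For the tensor in the question, swapping the first two axes with tf.transpose(a, [1, 0, 2]) gives the (3, 2, 3) result.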
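The permutation for the question's tensor can be checked with NumPy, whose np.transpose takes the same kind of axis permutation as tf.transpose. The sketch uses NumPy only so that it runs without TensorFlow; using it as a stand-in for the TensorFlow call is an assumption of this example.

```python
import numpy as np

a = np.array([[[1, 2, 3], [4, 5, 6], [7, 8, 9]],
              [[-1, -2, -3], [-4, -5, -6], [-7, -8, -9]]])  # shape (2, 3, 3)

# Swap the first two axes and leave the last one in place.
b = np.transpose(a, (1, 0, 2))
print(b.shape)
print(b)
```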

Saving with numpy savetxt. Array elements as columns

I am pretty new to Python and trying to kick my Matlab addiction. I am converting a lot of my lab's machine vision code over to Python but I am stuck on one aspect of the saving. On each iteration of the code we save 6 variables in an array. I'd like these to be written as the 6 columns of a txt file with numpy.savetxt. Each iteration of the tracking loop would then add the same variables for that frame as the next row in the txt file.
But I keep getting a single column that just grows with every loop. I've attached some simple code to show my problem. As it loops through, a variable called output is generated. I would like this to be the three columns of the txt file, with each iteration of the loop adding a new row. Is there any easy way to do this?
import numpy as np
dataFile_Path = "dataFile.txt"
dataFile_id = open(dataFile_Path, 'w+')
for x in range(0, 9):
    variable = np.array([2,3,4])
    output = x*variable+1
    output.astype(float)
    print(output)
    np.savetxt(dataFile_id, output, fmt="%d")
dataFile_id.close()
In [160]: for x in range(0, 9):
     ...:     variable = np.array([2,3,4])
     ...:     output = x*variable+1
     ...:     output.astype(float)
     ...:     print(output)
     ...:
[1 1 1]
[3 4 5]
[5 7 9]
[ 7 10 13]
[ 9 13 17]
[11 16 21]
[13 19 25]
[15 22 29]
[17 25 33]
So you are writing one row at a time. savetxt normally is used to write a 2d array.
Notice that the print is still integers - astype returns a new array, it does not change things inplace.
But because you are giving it 1d arrays it writes those as columns:
In [177]: f = open('txt','bw+')
In [178]: for x in range(0, 9):
     ...:     variable = np.array([2,3,4])
     ...:     output = x*variable+1
     ...:     np.savetxt(f, output, fmt='%d')
     ...:
In [179]: f.close()
In [180]: cat txt
1
1
1
3
4
5
5
7
9
If instead I give savetxt a 2d array ((1,3) shape), it writes each one as a row:
In [181]: f = open('txt','bw+')
In [182]: for x in range(0, 9):
     ...:     variable = np.array([2,3,4])
     ...:     output = x*variable+1
     ...:     np.savetxt(f, [output], fmt='%d')
     ...:
     ...:
In [183]: f.close()
In [184]: cat txt
1 1 1
3 4 5
5 7 9
7 10 13
9 13 17
11 16 21
13 19 25
15 22 29
17 25 33
But a better approach is to construct the 2d array, and write that with one savetxt call:
In [185]: output = np.array([2,3,4])*np.arange(9)[:,None]+1
In [186]: output
Out[186]:
array([[ 1, 1, 1],
[ 3, 4, 5],
[ 5, 7, 9],
[ 7, 10, 13],
[ 9, 13, 17],
[11, 16, 21],
[13, 19, 25],
[15, 22, 29],
[17, 25, 33]])
In [187]: np.savetxt('txt', output, fmt='%10d')
In [188]: cat txt
1 1 1
3 4 5
5 7 9
7 10 13
9 13 17
11 16 21
13 19 25
15 22 29
17 25 33
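When the rows really do come out of a loop one at a time, as in the tracking code described in the question, one option is to collect them in a list and write once at the end. This is a minimal sketch under that assumption. The list name rows and the temporary directory are only for illustration, and the file is read back with np.loadtxt to confirm the layout.

```python
import os
import tempfile

import numpy as np

rows = []
for x in range(9):
    variable = np.array([2, 3, 4])
    rows.append(x * variable + 1)  # one 1d row per iteration

table = np.vstack(rows)  # shape (9, 3): one row per iteration

# Write to a throwaway location so the sketch leaves no files behind.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "dataFile.txt")
    np.savetxt(path, table, fmt="%d")
    loaded = np.loadtxt(path, dtype=int)

print(loaded.shape)
```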

Pandas dataframe without copy

How can I avoid taking a copy of the dictionary supplied when creating a Pandas DataFrame?
>>> a = np.arange(10)
>>> b = np.arange(10.0)
>>> df1 = pd.DataFrame(a)
>>> a[0] = 100
>>> df1
0
0 100
1 1
2 2
3 3
4 4
5 5
6 6
7 7
8 8
9 9
>>> d = {'a':a, 'b':b}
>>> df2 = pd.DataFrame(d)
>>> a[1] = 200
>>> d
{'a': array([100, 200, 2, 3, 4, 5, 6, 7, 8, 9]), 'b': array([ 0., 1., 2., 3., 4., 5., 6., 7., 8., 9.])}
>>> df2
a b
0 100 0
1 1 1
2 2 2
3 3 3
4 4 4
5 5 5
6 6 6
7 7 7
8 8 8
9 9 9
If I create the dataframe from just a then changes to a are reflected in df (and vice versa).
Is there any way of making this work when supplying a dictionary?
It is possible to initialize a dataframe without copying the data. To understand how, you need to understand the BlockManager, which is the underlying data structure used by DataFrame. It tries to group data of the same dtype together and hold their memory in a single block -- it does not function as a collection of columns, as the documentation says. If the data is already provided as a single block, for example you initialize from a matrix:
a = np.zeros((100,20))
a.flags['WRITEABLE'] = False
df = pd.DataFrame(a, copy=False)
assert_read_only(df[df.columns[0]].iloc)
... then the DataFrame will usually just reference the ndarray.
However, this ain't gonna work if you're starting with multiple arrays or have heterogeneous types.
In which case, you can monkey patch the BlockManager to force it not to consolidate same-typed data columns.
However, if you initialize your dataframe with non-numpy arrays, then pandas will immediately copy it.
There is no way to 'share' a dict and have the frame update based on the dict changes. The copy argument is not relevant for a dict, data is always copied, because it is transformed to an ndarray.
However, there is a way to get this type of dynamic behavior in a limited way.
In [9]: arr = np.array(np.random.rand(5,2))
In [10]: df = DataFrame(arr)
In [11]: arr[0,0] = 0
In [12]: df
Out[12]:
0 1
0 0.000000 0.192056
1 0.847185 0.609028
2 0.833997 0.422521
3 0.937638 0.711856
4 0.047569 0.033282
Thus the DataFrame will, at construction time, be a view onto the passed ndarray. Depending on how you operate on the DataFrame you could trigger a copy (e.g. if you assign a new column, or change a column's dtype). This will also only work for a single-dtype frame.
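The difference between the two constructions can be probed directly. The sketch below is hedged: it checks that a frame built from a dict with default arguments does not follow later changes to the source array, and it only reports (rather than assumes) whether a frame built from a single ndarray shares memory, since that can depend on the pandas version and its copy-on-write settings.

```python
import numpy as np
import pandas as pd

a = np.arange(10)
b = np.arange(10.0)

# Dict input with default arguments: the columns are copied into the frame.
df2 = pd.DataFrame({'a': a, 'b': b})
a[1] = 200
print(df2.loc[1, 'a'])  # the frame keeps the value it was built with

# Single ndarray input: sharing depends on the pandas version, so just report it.
arr = np.random.rand(5, 2)
df = pd.DataFrame(arr)
shared = np.shares_memory(arr, df.to_numpy())
print("shares memory with arr:", shared)
```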