How to transpose each element of a Numpy Matrix [duplicate] - numpy

I'm starting off with a numpy array of an image.
In[1]:img = cv2.imread('test.jpg')
The shape is what you might expect for a 640x480 RGB image.
In[2]:img.shape
Out[2]: (480, 640, 3)
However, this image that I have is a frame of a video, which is 100 frames long. Ideally, I would like to have a single array that contains all the data from this video such that img.shape returns (480, 640, 3, 100).
What is the best way to add the next frame -- that is, the next set of image data, another 480 x 640 x 3 array -- to my initial array?

A dimension can be added to a numpy array as follows:
image = image[..., np.newaxis]

Alternatively to
image = image[..., np.newaxis]
in #dbliss' answer, you can also use numpy.expand_dims like
image = np.expand_dims(image, <your desired dimension>)
For example (taken from the link above):
x = np.array([1, 2])
print(x.shape) # prints (2,)
Then
y = np.expand_dims(x, axis=0)
yields
array([[1, 2]])
and
y.shape
gives
(1, 2)

You could just create an array of the correct size up-front and fill it:
frames = np.empty((480, 640, 3, 100))
for k in xrange(nframes):
frames[:,:,:,k] = cv2.imread('frame_{}.jpg'.format(k))
if the frames were individual jpg file that were named in some particular way (in the example, frame_0.jpg, frame_1.jpg, etc).
Just a note, you might consider using a (nframes, 480,640,3) shaped array, instead.

Pythonic
X = X[:, :, None]
which is equivalent to
X = X[:, :, numpy.newaxis] and
X = numpy.expand_dims(X, axis=-1)
But as you are explicitly asking about stacking images,
I would recommend going for stacking the list of images np.stack([X1, X2, X3]) that you may have collected in a loop.
If you do not like the order of the dimensions you can rearrange with np.transpose()

You can use np.concatenate() use the axis parameter to specify the dimension that should be concatenated. If the arrays being concatenated do not have this dimension, you can use np.newaxis to indicate where the new dimension should be added:
import numpy as np
movie = np.concatenate((img1[:,np.newaxis], img2[:,np.newaxis]), axis=3)
If you are reading from many files:
import glob
movie = np.concatenate([cv2.imread(p)[:,np.newaxis] for p in glob.glob('*.jpg')], axis=3)

Consider Approach 1 with reshape method and Approach 2 with np.newaxis method that produce the same outcome:
#Lets suppose, we have:
x = [1,2,3,4,5,6,7,8,9]
print('I. x',x)
xNpArr = np.array(x)
print('II. xNpArr',xNpArr)
print('III. xNpArr', xNpArr.shape)
xNpArr_3x3 = xNpArr.reshape((3,3))
print('IV. xNpArr_3x3.shape', xNpArr_3x3.shape)
print('V. xNpArr_3x3', xNpArr_3x3)
#Approach 1 with reshape method
xNpArrRs_1x3x3x1 = xNpArr_3x3.reshape((1,3,3,1))
print('VI. xNpArrRs_1x3x3x1.shape', xNpArrRs_1x3x3x1.shape)
print('VII. xNpArrRs_1x3x3x1', xNpArrRs_1x3x3x1)
#Approach 2 with np.newaxis method
xNpArrNa_1x3x3x1 = xNpArr_3x3[np.newaxis, ..., np.newaxis]
print('VIII. xNpArrNa_1x3x3x1.shape', xNpArrNa_1x3x3x1.shape)
print('IX. xNpArrNa_1x3x3x1', xNpArrNa_1x3x3x1)
We have as outcome:
I. x [1, 2, 3, 4, 5, 6, 7, 8, 9]
II. xNpArr [1 2 3 4 5 6 7 8 9]
III. xNpArr (9,)
IV. xNpArr_3x3.shape (3, 3)
V. xNpArr_3x3 [[1 2 3]
[4 5 6]
[7 8 9]]
VI. xNpArrRs_1x3x3x1.shape (1, 3, 3, 1)
VII. xNpArrRs_1x3x3x1 [[[[1]
[2]
[3]]
[[4]
[5]
[6]]
[[7]
[8]
[9]]]]
VIII. xNpArrNa_1x3x3x1.shape (1, 3, 3, 1)
IX. xNpArrNa_1x3x3x1 [[[[1]
[2]
[3]]
[[4]
[5]
[6]]
[[7]
[8]
[9]]]]

a = np.expand_dims(a, axis=-1)
or
a = a[:, np.newaxis]
or
a = a.reshape(a.shape + (1,))

There is no structure in numpy that allows you to append more data later.
Instead, numpy puts all of your data into a contiguous chunk of numbers (basically; a C array), and any resize requires allocating a new chunk of memory to hold it. Numpy's speed comes from being able to keep all the data in a numpy array in the same chunk of memory; e.g. mathematical operations can be parallelized for speed and you get less cache misses.
So you will have two kinds of solutions:
Pre-allocate the memory for the numpy array and fill in the values, like in JoshAdel's answer, or
Keep your data in a normal python list until it's actually needed to put them all together (see below)
images = []
for i in range(100):
new_image = # pull image from somewhere
images.append(new_image)
images = np.stack(images, axis=3)
Note that there is no need to expand the dimensions of the individual image arrays first, nor do you need to know how many images you expect ahead of time.

You can use stack with the axis parameter:
img.shape # h,w,3
imgs = np.stack([img1,img2,img3,img4], axis=-1) # -1 = new axis is last
imgs.shape # h,w,3,nimages
For example: to convert grayscale to color:
>>> d = np.zeros((5,4), dtype=int) # 5x4
>>> d[2,3] = 1
>>> d3.shape
Out[30]: (5, 4, 3)
>>> d3 = np.stack([d,d,d], axis=-2) # 5x4x3 -1=as last axis
>>> d3[2,3]
Out[32]: array([1, 1, 1])

I followed this approach:
import numpy as np
import cv2
ls = []
for image in image_paths:
ls.append(cv2.imread('test.jpg'))
img_np = np.array(ls) # shape (100, 480, 640, 3)
img_np = np.rollaxis(img_np, 0, 4) # shape (480, 640, 3, 100).

This worked for me:
image = image[..., None]

This will help you add axis anywhere you want
import numpy as np
signal = np.array([[0.3394572666491664, 0.3089068053925853, 0.3516359279582483], [0.33932706934615525, 0.3094755563319447, 0.3511973743219001], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256], [0.3394407172182317, 0.30889042266755573, 0.35166886011421256]])
print(signal.shape)
#(4,3)
print(signal[...,np.newaxis].shape) or signal[...:none]
#(4, 3, 1)
print(signal[:, np.newaxis, :].shape) or signal[:,none, :]
#(4, 1, 3)

there is three-way for adding new dimensions to ndarray .
first: using "np.newaxis" (something like #dbliss answer)
np.newaxis is just given an alias to None for making it easier to
understand. If you replace np.newaxis with None, it works the same
way. but it's better to use np.newaxis for being more explicit.
import numpy as np
my_arr = np.array([2, 3])
new_arr = my_arr[..., np.newaxis]
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)
second: using "np.expand_dims()"
Specify the original ndarray in the first argument and the position
to add the dimension in the second argument axis.
my_arr = np.array([2, 3])
new_arr = np.expand_dims(my_arr, -1)
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)
third: using "reshape()"
my_arr = np.array([2, 3])
new_arr = my_arr.reshape(*my_arr.shape, 1)
print("old shape", my_arr.shape)
print("new shape", new_arr.shape)
>>> old shape (2,)
>>> new shape (2, 1)

Related

numpy.atleast(), RamdonForestClassifier and numpy.hstack functions

I have this doubt about what is the exactly way of working of numpy.atleast(), RamdonForestClassifier and numpy.hstack functions.
I've read documentation about the purpose of all of those functions I mention above, but it is still not clear.
Can someone please help me?!
The method I am dealing with is the one below:
def fit(self, X, Y):
X, Y = map(np.atleast_2d, (X, Y))
assert X.shape[0] == Y.shape[0]
Ny = Y.shape[1]
self.clfs = []
for i in range(Ny):
clf = RandomForestClassifier(*self.args, **self.kwargs,n_jobs=-1)
Xi = np.hstack([X, Y[:, :i]])
yi = Y[:, i]
self.clfs.append(clf.fit(Xi, yi))
so let me explain step by step.
So, np.atleast_2d() converts any input to a 2D array.
np.atleast_2d(1)
Out[5]: array([[1]])
np.atleast_2d([1,2])
Out[6]: array([[1, 2]])
So, as you can see it is converting them to a 2D array, similarly in your code,
X, Y = map(np.atleast_2d, (X, Y)), this maps the function np.atleast_2d to these inputs X and Y such that given inputs X, Y, it will convert them to to 2-D arrays.
Next, regarding RandomForestClassifier
clf = RandomForestClassifier(*self.args, **self.kwargs,n_jobs=-1) line initializes the model or the classifier for you.
a = np.arange(0,4,1).reshape(2,2)
Y = np.arange(5,9,1).reshape(2,2)
res = np.hstack([a,Y])
res
Out[10]:
array([[0, 1, 5, 6],
[2, 3, 7, 8]])
res.shape
Out[11]: (2, 4)
See 2 rows and 4 columns
np.hstack just horizontally stacks the input arrays.
Xi = np.hstack([X, Y[:, :i]]), this line basically stacks the inputs X with the probable labels.
clf.fit(Xi, yi), this function fits the data to your model. Its like I initialized a black box or a system and now I am passing the data into that system to train the system to adapt to that data. Feel free to ask if you have more questions.

how to use tf.scatter_nd "without" accumulation?

Let's say I have a (None, 2)-shape tensor indices, and (None,)-shape tensor values. These actual row # and values will be determined at runtime.
I would like to set a 4x5 tensor t that each element of indices has values of values. I found that I can use tf.scatter_nd like this:
t = tf.scatter_np(indices, values, [4, 5])
# E.g., indices = [[1,2],[2,3]], values = [100, 200]
# t[1,2] <-- 100; t[2,3] <-- 200
My problem is that: when indices has duplicates, the values will be accumulated.
# E.g., indices = [[1,2],[1,2]], values = [100, 200]
# t[1,2] <-- 300
I would like to assign only one, i,e, either ignorance (so, the first value) or overwriting (so, the last value).
I feel like I need to check duplicates in indices, or I need to use tensorflow loop. Could anyone please advise? (hopefully a minimal example code?)
You can use tf.unique: the only issue is that this op requires a 1D tensor.
Thus, to overcome this I decided to use the Cantor pairing function.
In short, it exists a bijective function that maps a tuple (in this case a pair of values, but it works for any N-dimensional tuple) to a single value.
Once the coordinates have been reduced to a 1-D tensor of scalar, then tf.unique can be used to find the indices of the unique numbers.
The Cantor pairing function is invertible, thus now we know not only the indices of the non-repeated values within the 1-D tensor, but we can also go back to the 2-D space of the coordinates and use scatter_nd to perform the update without the problem of the accumulator.
TL;DR:
import tensorflow as tf
import numpy as np
# Dummy values
indices = np.array([[1, 2], [2, 3]])
values = np.array([100, 200])
# Placeholders
indices_ = tf.placeholder(tf.int32, shape=(2, 2))
values_ = tf.placeholder(tf.float32, shape=(2))
# Use the Cantor tuple to create a one-to-one correspondence between the coordinates
# and a single value
x = tf.cast(indices_[:, 0], tf.float32)
y = tf.cast(indices_[:, 1], tf.float32)
z = (x + y) * (x + y + 1) / 2 + y # shape = (2)
# Collect unique indices, treated as single values
# Drop the indices position into z because are useless
unique_cantor, _ = tf.unique(z)
# Go back from cantor numbers to pairs of values
w = tf.floor((tf.sqrt(8 * unique_cantor + 1) - 1) / 2)
t = (tf.pow(w, 2) + w) / 2
y = z - t
x = w - y
# Recreate a batch of coordinates that are uniques
unique_indices = tf.cast(tf.stack([x, y], axis=1), tf.int32)
# Update without accumulator
go = tf.scatter_nd(unique_indices, values_, [4, 5])
with tf.Session() as sess:
print(sess.run(go, feed_dict={indices_: indices, values_: values}))
Apply tf.scatter_nd also to a matrix of ones. That will give you the number of elements that are accumulated and you can just divide your result by that to get an average. (But watch out for zeros, for those you should divide by one).
counter = tf.ones(tf.shape(values))
t = tf.scatter_nd(indices,values,shape)
t_counter = tf.scatter_nd(indices,counter,shape)
Then divide t by t_counter (but only where t_counter is not zero).
This may not be the best solution, I used tf.unsorted_segment_max to avoid accumulation
with tf.Session() as sess:
# #########
# Examples:
# ##########
width, height, depth = [3, 3, 2]
indices = tf.cast([[0, 1, 0], [0, 1, 0], [1, 1, 1]], tf.int32)
values = tf.cast([1, 2, 3], tf.int32)
# ########################
# Filter duplicated indices
# #########################
flatten = tf.matmul(indices, [[height * depth], [depth], [1]])
filtered, idx = tf.unique(tf.squeeze(flatten))
# #####################
# Obtain updated result
# #####################
def reverse(index):
"""Map from 1-D to 3-D """
x = index / (height * depth)
y = (index - x * height * depth) / depth
z = index - x * height * depth - y * depth
return tf.stack([x, y, z], -1)
# This will pick the maximum value instead of accumulating the result
updated_values = tf.unsorted_segment_max(values, idx, tf.shape(filtered_idx)[0])
updated_indices = tf.map_fn(fn=lambda i: reverse(i), elems=filtered)
# Now you can scatter_nd without accumulation
result = tf.scatter_nd(updated_indices,
updated_values,
tf.TensorShape([3, 3, 2]))

Digitize a colormap

Consider the following image:
I'd like to print it as a grayscale image. I can do the conversion with scikit-image:
from skimage.io import imread
from matplotlib import pyplot as plt
from skimage.color import rgb2gray
img = imread('image.jpg')
plt.grid(which = 'both')
plt.imshow(rgb2gray(img), cmap=plt.cm.gray)
I get:
which is obviously not what I want.
My question is: Is there a way with scikit-image or with raw numpy and/or mathplotlib to digitize the image so that I get a 3D array (first dimension: X index, second dimension: Y index, third dimension: value according to the colormap). Then I can easily change to colormap to something that turns out to have better results when printing in grayscale?
The example below demonstrates a simple way to undo a colormap's value -> RGB mapping.
def unmap_nearest(img, rgb):
""" img is an image of shape [n, m, 3], and rgb is a colormap of shape [k, 3]. """
d = np.sum(np.abs(img[np.newaxis, ...] - rgb[:, np.newaxis, np.newaxis, :]), axis=-1)
i = np.argmin(d, axis=0)
return i / (rgb.shape[0] - 1)
This function works by taking the RGB value of each pixel and looking up the index of the best matching color in the colormap. Some trickery with indexing and broadcasting allows for efficient vectorization (at the cost of memory spent on temporary arrays):
img[np.newaxis, ...] converts the image from shape [n, m, 3] to [1, n, m, 3]
rgb[:, np.newaxis, np.newaxis, :] converts the colormap from shape [k, 3] to [k, 1, 1, 3].
subtracting the resulting arrays leads to an array of shape [k, n, m, 3] that contians the difference between each colormap index k and pixel n, m for each color component.
sum(abs(..), axis=-1) takes the absolute value of the differences and sums over all color components (the last dimension) to get the total difference between all pixels and color map entries (array of shape [k, n, m]).
i = np.argmin(d, axis=0) finds the index of the minimum element along the first dimension. The result is the index of the best matching color map entry of each pixel [n, m].
return i / (rgb.shape[0] - 1) finally returns the indices normalized by the color map size so that the result is in range 0-1.
There are a faw caveats with this approach:
It cannot reconstruct the original value range.
It will treat all pixels as part of the color map (i.e. continent contuors will also be mapped).
If you use the wrong color map it will fail hilariously.
.
import numpy as np
import matplotlib.pyplot as plt
from skimage.color import rgb2gray
def unmap_nearest(img, rgb):
""" img is an image of shape [n, m, 3], and rgb is a colormap of shape [k, 3]. """
d = np.sum(np.abs(img[np.newaxis, ...] - rgb[:, np.newaxis, np.newaxis, :]), axis=-1)
i = np.argmin(d, axis=0)
return i / (rgb.shape[0] - 1)
cmap = plt.cm.jet
rgb = cmap(np.linspace(0, 1, cmap.N))[:, :3]
original = (np.arange(10)[:, None] + np.arange(10)[None, :])
plt.subplot(2, 2, 1)
plt.imshow(original, cmap='gray')
plt.colorbar()
plt.title('original')
plt.subplot(2, 2, 2)
rgb_img = cmap(original / 18)[..., :-1]
plt.imshow(rgb_img)
plt.title('color-mapped')
plt.subplot(2, 2, 3)
wrong = rgb2gray(rgb_img)
plt.imshow(wrong, cmap='gray')
plt.title('rgb2gray')
plt.subplot(2, 2, 4)
reconstructed = unmap_nearest(rgb_img, rgb)
plt.imshow(reconstructed, cmap='gray')
plt.colorbar()
plt.title('reconstructed')
plt.show()
Building on #kazemakmakase's answer, if you're digitizing a figure, you probably are dealing with a copy of the original that's been converted, or maybe even printed and scanned at some point. Those things can distort colors from the "true" colormap that was originally used.
You can deal with this by using a slice through the figure's colorbar as the 'pattern' (rgb) to match against. Specifically, crop the figure down to just the color ramp (in landscape orientation in this example), then replace the rgb variable in #kazemakmakase's example with:
cmapimg = plt.imread('cropped_colorbar.png')
rgb = cmapimg[cmapimg.shape[0]/2,:,:3]

TensorFlow: numpy.repeat() alternative

I want to compare the predicted values yp from my neural network in a pairwise fashion, and so I was using (back in my old numpy implementation):
idx = np.repeat(np.arange(len(yp)), len(yp))
jdx = np.tile(np.arange(len(yp)), len(yp))
s = yp[[idx]] - yp[[jdx]]
This basically create a indexing mesh which I then use. idx=[0,0,0,1,1,1,...] while jdx=[0,1,2,0,1,2...]. I do not know if there is a simpler manner of doing it...
Anyhow, TensorFlow has a tf.tile(), but it seems to be lacking a tf.repeat().
idx = np.repeat(np.arange(n), n)
v2 = v[idx]
And I get the error:
TypeError: Bad slice index [ 0 0 0 ..., 215 215 215] of type <type 'numpy.ndarray'>
It also does not work to use a TensorFlow constant for the indexing:
idx = tf.constant(np.repeat(np.arange(n), n))
v2 = v[idx]
-
TypeError: Bad slice index Tensor("Const:0", shape=TensorShape([Dimension(46656)]), dtype=int64) of type <class 'tensorflow.python.framework.ops.Tensor'>
The idea is to convert my RankNet implementation to TensorFlow.
You can achieve the effect of np.repeat() using a combination of tf.tile() and tf.reshape():
idx = tf.range(len(yp))
idx = tf.reshape(idx, [-1, 1]) # Convert to a len(yp) x 1 matrix.
idx = tf.tile(idx, [1, len(yp)]) # Create multiple columns.
idx = tf.reshape(idx, [-1]) # Convert back to a vector.
You can simply compute jdx using tf.tile():
jdx = tf.range(len(yp))
jdx = tf.tile(jdx, [len(yp)])
For the indexing, you could try using tf.gather() to extract non-contiguous slices from the yp tensor:
s = tf.gather(yp, idx) - tf.gather(yp, jdx)
According to tf api document, tf.keras.backend.repeat_elements() does the same work with np.repeat() . For example,
x = tf.constant([1, 3, 3, 1], dtype=tf.float32)
rep_x = tf.keras.backend.repeat_elements(x, 5, axis=0)
# result: [1. 1. 1. 1. 1. 3. 3. 3. 3. 3. 3. 3. 3. 3. 3. 1. 1. 1. 1. 1.]
Just for 1-d tensors, I've made this function
def tf_repeat(y,repeat_num):
return tf.reshape(tf.tile(tf.expand_dims(y,axis=-1),[1,repeat_num]),[-1])
It looks like your question is so popular that people refer it on TF tracker. Sadly the same function is not still implemented in TF.
You can implement it by combining tf.tile, tf.reshape, tf.squeeze. Here is a way to convert examples from np.repeat:
import numpy as np
import tensorflow as tf
x = [[1,2],[3,4]]
print np.repeat(3, 4)
print np.repeat(x, 2)
print np.repeat(x, 3, axis=1)
x = tf.constant([[1,2],[3,4]])
with tf.Session() as sess:
print sess.run(tf.tile([3], [4]))
print sess.run(tf.squeeze(tf.reshape(tf.tile(tf.reshape(x, (-1, 1)), (1, 2)), (1, -1))))
print sess.run(tf.reshape(tf.tile(tf.reshape(x, (-1, 1)), (1, 3)), (2, -1)))
In the last case where repeats are different for each element you most probably will need loops.
Just in case anybody is interested for a 2D method to copy the matrices. I think this could work:
TF_obj = tf.zeros([128, 128])
tf.tile(tf.expand_dims(TF_obj, 2), [1, 1, 2])
import numpy as np
import tensorflow as tf
import itertools
x = np.arange(6).reshape(3,2)
x = tf.convert_to_tensor(x)
N = 3 # number of repetition
K = x.shape[0] # for here 3
order = list(range(0, N*K, K))
order = [[x+i for x in order] for i in range(K)]
order = list(itertools.chain.from_iterable(order))
x_rep = tf.gather(tf.tile(x, [N, 1]), order)
Results from:
[0, 1],
[2, 3],
[4, 5]]
To:
[[0, 1],
[0, 1],
[0, 1],
[2, 3],
[2, 3],
[2, 3],
[4, 5],
[4, 5],
[4, 5]]
If you want:
[[0, 1],
[2, 3],
[4, 5],
[0, 1],
[2, 3],
[4, 5],
[0, 1],
[2, 3],
[4, 5]]
Simply use tf.tile(x, [N, 1])
So I have found that tensorflow has one such method to repeat the elements of an array. The method tf.keras.backend.repeat_elements is what you are looking for. Anyone who comes at a later point of time can save lot of their efforts. This link offers an explanation to the method and specifically says
Repeats the elements of a tensor along an axis, like np.repeat
I have included a very short example which proves that the elements are copied in the exact way as np.repeat would do.
import numpy as np
import tensorflow as tf
x = np.random.rand(2,2)
# print(x) # uncomment this line to see the array's elements
y = tf.convert_to_tensor(x)
y = tf.keras.backend.repeat_elements(x, rep=3, axis=0)
# print(y) # uncomment this line to see the results
You can simulate missing tf.repeat by tf.stacking the value with itself:
value = np.arange(len(yp)) # what to repeat
repeat_count = len(yp) # how many times
repeated = tf.stack ([value for i in range(repeat_count)], axis=1)
I advice using this only on small repeat counts.
Though many clean and working solutions have been given, they seem to all be based on producing the set of indices from scratch each iteration.
While the cost to produce these node's isn't typically significant during training, it may be significant if using your model for inference.
Repeating tf.range (like your example) has come up a few times so I built the following function creator. Given the maximum number of times something will be repeated and the maximum number of things that will need repeating, it returns a function which produces the same values as np.repeat(np.arange(len(multiples)), multiples).
import tensorflow as tf
import numpy as np
def numpy_style_repeat_1d_creator(max_multiple=100, max_to_repeat=10000):
board_num_lookup_ary = np.repeat(
np.arange(max_to_repeat),
np.full([max_to_repeat], max_multiple))
board_num_lookup_ary = board_num_lookup_ary.reshape(max_to_repeat, max_multiple)
def fn_to_return(multiples):
board_num_lookup_tensor = tf.constant(board_num_lookup_ary, dtype=tf.int32)
casted_multiples = tf.cast(multiples, dtype=tf.int32)
padded_multiples = tf.pad(
casted_multiples,
[[0, max_to_repeat - tf.shape(multiples)[0]]])
return tf.boolean_mask(
board_num_lookup_tensor,
tf.sequence_mask(padded_multiples, maxlen=max_multiple))
return fn_to_return
#Here's an example of how it can be used
with tf.Session() as sess:
repeater = numpy_style_repeat_1d_creator(5,4)
multiples = tf.constant([4,1,3])
repeated_values = repeater(multiples)
print(sess.run(repeated_values))
The general idea is to store a repeated tensor and then mask it, but it may help to see it visually (this is for the example given above):
In the example above the following Tensor is produced:
[[0,0,0,0,0],
[1,1,1,1,1],
[2,2,2,2,2],
[3,3,3,3,3]]
For multiples [4,1,3] it will collect the non-X values:
[[0,0,0,0,X],
[1,X,X,X,X],
[2,2,2,X,X],
[X,X,X,X,X]]
resulting in:
[0,0,0,0,1,2,2,2]
tl;dr: To avoid producing the indices each time (can be costly), pre-repeat everything and then mask that tensor each time
A relatively fast implementation was recently added with RaggedTensor utilities from 1.13, but it's not a part of the officially exported API. You can still use it, but there's a chance it might disappear.
from tensorflow.python.ops.ragged.ragged_util import repeat
From the source code:
# This op is intended to exactly match the semantics of numpy.repeat, with
# one exception: numpy.repeat has special (and somewhat non-intuitive) behavior
# when axis is not specified. Rather than implement that special behavior, we
# simply make `axis` be a required argument.
Tensorflow 2.10 has implemented np.repeat feature.
tf.repeat([1, 2, 3], repeats=[3, 1, 2], axis=0)
<tf.Tensor: shape=(6,), dtype=int32, numpy=array([1, 1, 1, 2, 3, 3], dtype=int32)>

Turn 2D NumPy array into 1D array for plotting a histogram

I'm trying to plot a histogram with matplotlib.
I need to convert my one-line 2D Array
[[1,2,3,4]] # shape is (1,4)
into a 1D Array
[1,2,3,4] # shape is (4,)
How can I do this?
Adding ravel as another alternative for future searchers. From the docs,
It is equivalent to reshape(-1, order=order).
Since the array is 1xN, all of the following are equivalent:
arr1d = np.ravel(arr2d)
arr1d = arr2d.ravel()
arr1d = arr2d.flatten()
arr1d = np.reshape(arr2d, -1)
arr1d = arr2d.reshape(-1)
arr1d = arr2d[0, :]
You can directly index the column:
>>> import numpy as np
>>> x2 = np.array([[1,2,3,4]])
>>> x2.shape
(1, 4)
>>> x1 = x2[0,:]
>>> x1
array([1, 2, 3, 4])
>>> x1.shape
(4,)
Or you can use squeeze:
>>> xs = np.squeeze(x2)
>>> xs
array([1, 2, 3, 4])
>>> xs.shape
(4,)
reshape will do the trick.
There's also a more specific function, flatten, that appears to do exactly what you want.
the answer provided by mtrw does the trick for an array that actually only has one line like this one, however if you have a 2d array, with values in two dimension you can convert it as follows
a = np.array([[1,2,3],[4,5,6]])
From here you can find the shape of the array with np.shape and find the product of that with np.product this now results in the number of elements. If you now use np.reshape() to reshape the array to one length of the total number of element you will have a solution that always works.
np.reshape(a, np.product(a.shape))
>>> array([1, 2, 3, 4, 5, 6])
Use numpy.flat
import numpy as np
import matplotlib.pyplot as plt
a = np.array([[1,0,0,1],
[2,0,1,0]])
plt.hist(a.flat, [0,1,2,3])
The flat property returns a 1D iterator over your 2D array. This method generalizes to any number of rows (or dimensions). For large arrays it can be much more efficient than making a flattened copy.