Normalize numpy ndarray data - numpy

My data is numpy ndarray with shape(2,3,4) following this:
I've try to normalize 0-1 scale for each column through sklearn normalization.
from sklearn.preprocessing import normalize
x = np.array([[[1, 2, 3, 4],
[2, 2, 3, 4],
[3, 2, 3, 4]],
[[4, 2, 3, 4],
[5, 2, 3, 4],
[6, 2, 3, 4]]])
x.shape ==> ( 2,3,4)
x = normalize(x, norm='max', axis=0, )
However, I catch the error :
ValueError: Found array with dim 3. the normalize function expected <= 2.
How do I solve this problem?
Thank you.

It seems scikit-learn expects ndarrays with at most two dims. So, to solve it would be to reshape to 2D, feed it to normalize that gives us a 2D array, which could be reshaped back to original shape -
from sklearn.preprocessing import normalize
normalize(x.reshape(x.shape[0],-1), norm='max', axis=0).reshape(x.shape)
Alternatively, it's much simpler with NumPy that works fine with generic ndarrays -
x/np.linalg.norm(x, ord=np.inf, axis=0, keepdims=True)

Related

Pandas dataframe to 2D numpy array

I have the following dataframe:
d = {'histogram' : [[1,2],[3,4],[5,6]]}
df = pd.DataFrame(d)
The length of the histograms are always the same (2 in this example case).
and I would like to convert the 'histogram' column into a 2D numpy array to feed into a neural net. The preferred output is:
output_array = np.array(d["histogram"])
i.e.:
array([[1, 2],
[3, 4],
[5, 6]])
however when I try:
df["histogram"].to_numpy()
the results is an array of lists instead of numpy array of arrays:
array([list([1, 2]), list([3, 4]), list([5, 6])], dtype=object)
this is problematic for neural nets as I have to specify the dimensions/shape.
I try to solve the issue by casting as numpy array:
df["histogram_arrays"] = df["histogram"].apply(lambda x: np.array(x))
df["histogram_arrays"].to_numpy()
which returns a 1D array of arrays and not the 2D array.
array([array([1, 2]), array([3, 4]), array([5, 6])], dtype=object)
How can I get the histograms into a 2D array?
Try this:
np.vstack(df['histogram'])
Your question is essentially: how do I convert a NumPy array of (identically-sized) lists to a two-dimensional NumPy array.
That makes it a (near) duplicate of this SO question, but since your actual question is somewhat hidden, I'll put an answer here anyway.
Use numpy.vstack:
>>> data = df['histogram'].to_numpy()
>>> data
array([list([1, 2]), list([3, 4]), list([5, 6])], dtype=object)
>>> data = np.vstack(data)
>>> data.dtype, data.shape
(dtype('int64'), (3, 2))
>>> data
array([[1, 2],
[3, 4],
[5, 6]])

Element-wise assignment in tensorflow

In numpy, it could be easily done as
>>> img
array([[1, 2, 3],
[4, 5, 6],
[7, 8, 9]], dtype=int32)
>>> img[img>5] = [1,2,3,4]
>>> img
array([[1, 2, 3],
[4, 5, 1],
[2, 3, 4]], dtype=int32)
However, there seems not exist similar operation in tensorflow.
You can never assign a value to a tensor in tensorflow as the change in tensor value is not traceable by backpropagation, but you can still get another tensor from origin tensor, here is a solution
import tensorflow as tf
tf.enable_eager_execution()
img = tf.constant(list(range(1, 10)), shape=[3, 3])
replace_mask = img > 5
keep_mask = tf.logical_not(replace_mask)
keep = tf.boolean_mask(img, keep_mask)
keep_index = tf.where(keep_mask)
replace_index = tf.where(replace_mask)
replace = tf.random_uniform((tf.shape(replace_index)[0],), 0, 10, tf.int32)
updates = tf.concat([keep, replace], axis=0)
indices = tf.concat([keep_index, replace_index], axis=0)
result = tf.scatter_nd(tf.cast(indices, tf.int32), updates, shape=tf.shape(img))
Actually there is a way to achieve this. Very similar to #Jie.Zhou's answer, you can replace tf.constant with tf.Variable, then replace tf.scatter_nd with tf.scatter_nd_update

what TensorFlow hash_bucket_size matters

I am creating a DNNclassifier with sparse columns. The training data looks like this,
samples col1 col2 price label
eg1 [[0,1,0,0,0,2,0,1,0,3,...] [[0,0,4,5,0,...] 5.2 0
eg2 [0,0,...] [0,0,...] 0 1
eg3 [0,0,...]] [0,0,...] 0 1
The following snippet can run successfully,
import tensorflow as tf
sparse_feature_a = tf.contrib.layers.sparse_column_with_hash_bucket('col1', 3, dtype=tf.int32)
sparse_feature_b = tf.contrib.layers.sparse_column_with_hash_bucket('col2', 1000, dtype=tf.int32)
sparse_feature_a_emb = tf.contrib.layers.embedding_column(sparse_id_column=sparse_feature_a, dimension=2)
sparse_feature_b_emb = tf.contrib.layers.embedding_column(sparse_id_column=sparse_feature_b, dimension=2)
feature_c = tf.contrib.layers.real_valued_column('price')
estimator = tf.contrib.learn.DNNClassifier(
feature_columns=[sparse_feature_a_emb, sparse_feature_b_emb, feature_c],
hidden_units=[5, 3],
n_classes=2,
model_dir='./tfTmp/tfTmp0')
# Input builders
def input_fn_train(): # returns x, y (where y represents label's class index).
features = {'col1': tf.SparseTensor(indices=[[0, 1], [0, 5], [0, 7], [0, 9]],
values=[1, 2, 1, 3],
dense_shape=[3, int(250e6)]),
'col2': tf.SparseTensor(indices=[[0, 2], [0, 3]],
values=[4, 5],
dense_shape=[3, int(100e6)]),
'price': tf.constant([5.2, 0, 0])}
labels = tf.constant([0, 1, 1])
return features, labels
estimator.fit(input_fn=input_fn_train, steps=100)
However, I have a question from this sentence,
sparse_feature_a = tf.contrib.layers.sparse_column_with_hash_bucket('col1', 3, dtype=tf.int32)
where 3 means hash_bucket_size=3, but this sparse tensor includes 4 non-zero values,
'col1': tf.SparseTensor(indices=[[0, 1], [0, 5], [0, 7], [0, 9]],
values=[1, 2, 1, 3],
dense_shape=[3, int(250e6)])
It seems has_bucket_size does nothing here. No matter how many non-zero values you have in your sparse tensor, you just need to set it with an integer > 1 and it works correctly.
I know my understanding may not be right. Could anyone explain how has_bucket_size works? Thanks a lot!
hash_bucket_size works by taking the original indices, hashing them into a space of the specified size, and using the hashed indices as features.
This means you can specify your model before knowing the full range of possible indices, at the cost of some indices maybe colliding.

Repeat element from a 2D matrix to a 3D matrix with numpy

I have a 2-D numpy matrix, an example
M = np.matrix([[1,2],[3,4],[5,6]])
I would like, starting from M, to have a matrix like:
M = np.matrix([[[1,2],[1,2],[1,2]],[[3,4],[3,4],[3,4]],[[5,6],[5,6],[5,6]]])
thus, the new matrix has 3 dimensions. How can I do?
NumPy matrix class can't hold 3D data. So, assuming you are okay with NumPy array as output, we can extend the array version of it to 3D with None/np.newaxis and then use np.repeat -
np.repeat(np.asarray(M)[:,None],3,axis=1)
Sample run -
In [233]: M = np.matrix([[1,2],[3,4],[5,6]])
In [234]: np.repeat(np.asarray(M)[:,None],3,axis=1)
Out[234]:
array([[[1, 2],
[1, 2],
[1, 2]],
[[3, 4],
[3, 4],
[3, 4]],
[[5, 6],
[5, 6],
[5, 6]]])
Alternatively, with np.tile -
np.tile(np.asarray(M),3).reshape(-1,3,M.shape[-1])
This should work for you:
np.array([list(np.array(i)) * 3 for i in M])
as another answerer already said, the matrix can't be three-dimensional.
instead of it, you can make 3-dimensional np.array like below.
import numpy as np
M = np.matrix([[1,2],[3,4],[5,6]])
M = np.array(M)
M = np.array([ [x, x, x] for x in M])
M

How to flatten along 3rd dimension in numpy?

I have a 3d array in numpy that I want to flatten into a 1d array. I want to flatten each 2d "layer" of the array, copying each successive layer into the 1d array.
e.g., for an array with arr[:, :, 0] = [[1, 2], [3, 4]] and arr[:, :, 1] = [[5, 6], [7, 8]], I want the output to be [1, 2, 3, 4, 5, 6, 7, 8].
Currently I have the following code:
out = np.empty(arr.size)
for c in xrange(arr.shape[2]):
layer = arr[:, :, c]
out[c * layer.size:(c + 1) * layer.size] = layer.ravel()
Is there a way to accomplish this efficiently in numpy (without using a for loop)? I have tried messing around with reshape, transpose, and flatten to no avail.
I figured it out:
out = arr.transpose((2, 0, 1)).flatten()
Or (the last axe will be first) : np.rollaxis(a,-1).ravel()