numpy concatenate function does not work when handling a large matrix

I am trying to concatenate a 3-by-n matrix of 3D coordinates called VTrans with a 1-by-n vector of ones called lr, to augment the coordinates into a 4-by-n homogeneous matrix. In my case n is the number of vertices, 141669, which is fairly large.
The code below fails, although it works on a very small dataset.
lr = np.ones(vertexNum).reshape((1, vertexNum))
VtransAppend = np.concatenate((VTrans, lr), axis=0)
Update 2:
I found the problem: my vertexNum was wrong. It is actually 47223, not 141669; 141669 is VTrans.size (3 × 47223), the total number of elements. All the solutions work, and I will accept the first one. Thank you all!
The error says "all the input array dimensions except for the concatenation axis must match exactly".
I also checked that lr and VTrans have the same size by printing it:
print(lr.size)
print(VTrans.size)
Has anyone run into this problem before and knows how to solve it?
Here is the update:
My VTrans matrix is attached, where vertexNum is 141669.
This is the code following YXD's suggestion, but the issue still exists:
vertexNum = VTrans.size # Total vertex in current model
lr = np.ones(vertexNum)
VtransAppend = np.concatenate((VTrans, lr.reshape(1, -1)), axis=0)

You have to reshape lr so that it has the same number of dimensions as vTrans:
>>> n = 4
>>> vTrans = np.random.random_sample((3, n))
>>> lr = np.ones(n)
>>> np.concatenate((vTrans, lr.reshape(1, -1)), axis=0)
array([[ 0.65769116,  0.41008341,  0.66046706,  0.86501781],
       [ 0.51584699,  0.60601466,  0.93800371,  0.25077702],
       [ 0.16696658,  0.41839794,  0.0938594 ,  0.48484606],
       [ 1.        ,  1.        ,  1.        ,  1.        ]])
>>>
i.e. after the reshape, the non-concatenation dimension matches vTrans
>>> lr.shape
(4,)
>>> lr.reshape(1, -1).shape
(1, 4)
>>>

Try vstack instead of concatenate:
a = np.random.random((3,5))
b = np.random.random(5)
np.vstack((a, b))
Alternatively:
np.concatenate((a, b[None,:]))
The None adds an axis to the 1D array b.
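The root cause reported in the question's second update was using the array's size where the number of columns was needed. A minimal sketch of that distinction follows; the small VTrans array here is a hypothetical stand-in for the real data, and vertex_num is an illustrative name.

```python
import numpy as np

# Hypothetical stand-in for the asker's 3-by-n coordinate matrix
n = 5
VTrans = np.arange(3 * n, dtype=float).reshape(3, n)

# .size counts every element (3 * n); .shape[1] is the number of columns (n)
assert VTrans.size == 3 * n
vertex_num = VTrans.shape[1]

# Build the 1-by-n row of ones from the column count, then append it
lr = np.ones((1, vertex_num))
VTransAppend = np.concatenate((VTrans, lr), axis=0)
print(VTransAppend.shape)
```

Deriving the row length from VTrans.shape[1] keeps lr consistent with the matrix even if the vertex count changes.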

Related

making numpy binary file data to two decimal [duplicate]

I have a numpy array, something like below:
data = np.array([ 1.60130719e-01, 9.93827160e-01, 3.63108206e-04])
and I want to round each element to two decimal places.
How can I do so?
Numpy provides two identical methods to do this. Either use
np.round(data, 2)
or
np.around(data, 2)
as they are equivalent.
See the documentation for more information.
Examples:
>>> import numpy as np
>>> a = np.array([0.015, 0.235, 0.112])
>>> np.round(a, 2)
array([0.02, 0.24, 0.11])
>>> np.around(a, 2)
array([0.02, 0.24, 0.11])
>>> np.round(a, 1)
array([0. , 0.2, 0.1])
If you want the output to be
array([1.6e-01, 9.9e-01, 3.6e-04])
the problem is not really a missing feature of NumPy, but rather that this sort of rounding is not a standard thing to do. You can make your own rounding function which achieves this like so:
def my_round(value, N):
    exponent = np.ceil(np.log10(value))
    return 10**exponent*np.round(value*10**(-exponent), N)
For a general solution handling 0 and negative values as well, you can do something like this:
def my_round(value, N):
    value = np.asarray(value).copy()
    zero_mask = (value == 0)
    value[zero_mask] = 1.0
    sign_mask = (value < 0)
    value[sign_mask] *= -1
    exponent = np.ceil(np.log10(value))
    result = 10**exponent*np.round(value*10**(-exponent), N)
    result[sign_mask] *= -1
    result[zero_mask] = 0.0
    return result
It is worth noting that the accepted answer will round small floats down to zero as demonstrated below:
>>> import numpy as np
>>> arr = np.asarray([2.92290007e+00, -1.57376965e-03, 4.82011728e-08, 1.92896977e-12])
>>> print(arr)
[ 2.92290007e+00 -1.57376965e-03 4.82011728e-08 1.92896977e-12]
>>> np.round(arr, 2)
array([ 2.92, -0. , 0. , 0. ])
You can use set_printoptions and a custom formatter to fix this and get a more numpy-esque printout with fewer decimal places:
>>> np.set_printoptions(formatter={'float': "{0:0.2e}".format})
>>> print(arr)
[2.92e+00 -1.57e-03 4.82e-08 1.93e-12]
This way, you get the full versatility of format and maintain the precision of numpy's datatypes.
Also note that this only affects printing, not the actual precision of the stored values used for computation.
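If changing the global print options is undesirable, one possible variation is to scope the formatter with the np.printoptions context manager (available in NumPy 1.15 and later). This is a sketch under that assumption; the variable name short is illustrative.

```python
import numpy as np

arr = np.array([2.92290007e+00, -1.57376965e-03, 4.82011728e-08, 1.92896977e-12])

# The formatter applies only inside the with-block; global options are untouched
with np.printoptions(formatter={'float': "{0:0.2e}".format}):
    short = str(arr)

print(short)
# The stored values keep their full precision
print(arr[2] == 4.82011728e-08)
```

Outside the with-block, printing arr uses the default formatting again.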

Facing an IndexError: only integers, slices (`:`), ellipsis (`...`), numpy.newaxis (`None`) and integer or boolean arrays are valid indices

I have been working on a link prediction problem in which the data set, a numpy array, has to be parsed and stored into another numpy array. When I try to do this, line 9 throws an IndexError: only integers, slices (:), ellipsis (...), numpy.newaxis (None) and integer or boolean arrays are valid indices. I even tried casting the indices with int, but that does not seem to work. What am I missing here?
1. train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)
2. out_dim = int(W_out.shape[1])
3. in_dim = int(W_in.shape[1])
4. train_x = np.zeros((len(train_edges), (out_dim + in_dim) * 2))
5. train_y = np.zeros((len(train_edges), 1))
6. for i, edge in enumerate(train_edges):
7.     u = edge[0]
8.     v = edge[1]
9.     train_x[int(i), : int(out_dim)] = W_out[u]
10.     train_x[int(i), int(out_dim): int(out_dim + in_dim)] = W_in[u]
11.     train_x[i, out_dim + in_dim: out_dim * 2 + in_dim] = W_out[v]
12.     train_x[i, out_dim * 2 + in_dim:] = W_in[v]
13.     if edge[2] > 0:
14.         train_y[i] = 1
15.     else:
16.         train_y[i] = -1
EDIT:
For reference, W_out is a 64-dimensional tuple that looks like this:
print(W_out[0])
type(W_out.shape[1])
Output:
[[0.10160154 0. 0.70414263 0.6772633 0.07685234 0.75205046
0.421092 0.1776721 0.8622188 0.15669271 0. 0.40653425
0.5768579 0.75861764 0.6745151 0.37883565 0.18074909 0.73928916
0.6289512 0. 0.33160248 0.7441727 0. 0.8810399
0.1110919 0.53732747 0. 0.33330196 0.36220717 0.298112
0.10643011 0.8997948 0.53510064 0.6845873 0.03440218 0.23005858
0.8097505 0.7108275 0.38826624 0.28532124 0.37821335 0.3566149
0.42527163 0.71940386 0.8075657 0.5775364 0.01444144 0.21734199
0.47439903 0.21176265 0.32279345 0.00187511 0.43511534 0.4302601
0.39407462 0.20941389 0.199842 0.8710182 0.2160332 0.30246672
0.27159846 0.19009161 0.32349357 0.08938174]]
int
And edge is a tuple from the training data set, holding source, destination and sign. It looks like this:
train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)
for i, edge in enumerate(train_edges):
    print(edge)
    print(i)
    type(i)
    type(edge)
Output:
Streaming output truncated to the last 5000 lines.
2936
['16936', '17031', '1']
2937
['15307', '14904', '1']
2938
['22852', '13045', '1']
2939
['14291', '96703', '1']
2940
Any help/suggestion is highly appreciated.
Your syntax is causing the error.
It looks like accessing the edge object may be the issue. Debug by checking type() and len() of edge to see what triggers the IndexError.
Explicitly casting with int(i) is not needed, so the issue is either in the assignment to train_x[...] or in your enumeration logic.
As mentioned by #indigo_4_alpha, the error is caused by the edge[0] element, which is a string.
Code for checking the train_edges
train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)
for i, edge in enumerate(train_edges):
    print(edge)
    print(i)
    print(edge[0], edge[1],edge[2])
    print(type(edge[0]))
Output
['11635' '22046' '1']
2608
11635 22046 1
<class 'str'>
After observing the output, I noticed that edge[0] on its own is a string. Then I realized that casting the other indices with int() has no effect when u itself is a string, because W_out[u] is still indexed with a string.
So I changed u = edge[0] to u = int(edge[0]), and likewise for v, in lines 7 and 8 of the code, as shown below.
Master code for Train and test data split
1. train_edges, test_edges, = train_test_split(edgeL,test_size=0.3,random_state=16)
2. out_dim = int(W_out.shape[1])
3. in_dim = int(W_in.shape[1])
4. train_x = np.zeros((len(train_edges), (out_dim + in_dim) * 2))
5. train_y = np.zeros((len(train_edges), 1))
6. for i, edge in enumerate(train_edges):
7.     u = int(edge[0])
8.     v = int(edge[1])
Thank you one and all for sparing your time and giving me your valuable suggestions.
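A small reproduction of the string-index problem may make the fix easier to see. The arrays below are hypothetical stand-ins for W_out and train_edges, and converting the whole edge array with astype(int) is one possible alternative to casting inside the loop, not necessarily what the asker ended up using.

```python
import numpy as np

# Hypothetical stand-ins: a small embedding table and string-typed edges
W_out = np.arange(12, dtype=float).reshape(4, 3)
train_edges = np.array([['0', '2', '1'], ['3', '1', '-1']])

try:
    W_out[train_edges[0][0]]  # indexing with a string element
except IndexError as exc:
    print("IndexError:", exc)

# Convert the whole edge array once instead of casting inside the loop
edges_int = train_edges.astype(int)
for i, (u, v, sign) in enumerate(edges_int):
    label = 1 if sign > 0 else -1
    print(i, W_out[u], W_out[v], label)
```

Converting once up front also makes the sign comparison in the loop operate on integers rather than strings.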

How to concatenate two tensors with intervals in tensorflow?

I want to concatenate two tensors in a checkerboard pattern in TensorFlow 2, as in the examples below:
example 1:
a = [[1,1],[1,1]]
b = [[0,0],[0,0]]
concated_a_and_b = [[1,0,1,0],[0,1,0,1]]
example 2:
a = [[1,1,1],[1,1,1],[1,1,1]]
b = [[0,0,0],[0,0,0],[0,0,0]]
concated_a_and_b = [[1,0,1,0,1,0],[0,1,0,1,0,1],[1,0,1,0,1,0]]
Is there a decent way in tensorflow2 to concatenate them like this?
A bit of background for this:
I first split a tensor c with a checkerboard mask into two halves, a and b. After some transformation I have to concatenate them back into the original shape and order.
What I mean by checkerboard-ly:
Step 1: Generate a matrix with alternated values
You can do this by first concatenating into [1, 0] pairs, and then by applying a final reshape.
Step 2: Reverse some rows
I split the matrix into two parts, reverse the second part, and then rebuild the full matrix by picking rows alternately from the first and second part.
Code sample:
import math
import numpy as np
import tensorflow as tf
a = tf.ones(shape=(3, 4))
b = tf.zeros(shape=(3, 4))
x = tf.expand_dims(a, axis=-1)
y = tf.expand_dims(b, axis=-1)
paired_ones_zeros = tf.concat([x, y], axis=-1)
alternated_values = tf.reshape(paired_ones_zeros, [-1, a.shape[1] + b.shape[1]])
num_samples = alternated_values.shape[0]
middle = math.ceil(num_samples / 2)
is_num_samples_odd = middle * 2 != num_samples
# Gather first part of the matrix, don't do anything to it
first_elements = tf.gather_nd(alternated_values, [[index] for index in range(middle)])
# Gather second part of the matrix and reverse its elements
second_elements = tf.reverse(tf.gather_nd(alternated_values, [[index] for index in range(middle, num_samples)]), axis=[1])
# Pick alternatively between first and second part of the matrix
indices = np.concatenate([[[index], [index + middle]] for index in range(middle)], axis=0)
if is_num_samples_odd:
    indices = indices[:-1]
output = tf.gather_nd(
    tf.concat([first_elements, second_elements], axis=0),
    indices
)
print(output)
I know this is not an elegant approach, since it costs extra time and space, but it solves the problem above:
def concat(tf1, tf2):
    result = []
    for index, (tf_item1, tf_item2) in enumerate(zip(tf1, tf2)):
        item = []
        for subitem1, subitem2 in zip(tf_item1, tf_item2):
            # Even rows start with the element from tf1, odd rows with tf2
            if index % 2 == 0:
                item.append(subitem1)
                item.append(subitem2)
            else:
                item.append(subitem2)
                item.append(subitem1)
        result.append(item)
    return result
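The same row-parity logic as the loop above can be expressed without Python loops. The sketch below is written in NumPy rather than TensorFlow so that it runs without a TensorFlow install; the function name checkerboard_interleave is hypothetical, and the assumption is that tf.stack, tf.reshape, tf.range and tf.where could be substituted for their NumPy counterparts.

```python
import numpy as np

def checkerboard_interleave(a, b):
    """Interleave the columns of a and b, swapping the order on odd rows."""
    a = np.asarray(a)
    b = np.asarray(b)
    rows, cols = a.shape
    # Row i becomes a[i,0], b[i,0], a[i,1], b[i,1], ...
    a_first = np.stack([a, b], axis=-1).reshape(rows, 2 * cols)
    # Same interleaving, but each pair starts with the element from b
    b_first = np.stack([b, a], axis=-1).reshape(rows, 2 * cols)
    # Odd rows take the b-first version, even rows the a-first version
    odd_row = (np.arange(rows) % 2 == 1)[:, None]
    return np.where(odd_row, b_first, a_first)

a = np.ones((3, 3), dtype=int)
b = np.zeros((3, 3), dtype=int)
print(checkerboard_interleave(a, b))
```

Building both interleavings and selecting per row trades some extra memory for a fully vectorized computation.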

Tensorflow stack vectors out of loop to create a matrix

I have a for loop that creates vectors (tf tensors) of equal length, say
a1 = [0, 2, 4 ... ]
a2 = [1, 4, 6 ... ]
...
and I want to concatenate these vectors into a matrix, along the 0th axis
matrix = [[0,2,4...] , [1,4,6...] ... ]
I can do a
matrix = tf.concat(0, [matrix, a])
inside the for loop. However the first iteration does not work, since matrix does not exist and if I initialize it to a vector, I'm stuck with that vector at the top of the end matrix. Is there a quick way of doing this?
You can use tf.stack:
matrix = tf.stack([a1, a2, ...])

Row-wise Histogram

Given a 2-dimensional tensor t, what's the fastest way to compute a tensor h where
h[i, :] = tf.histogram_fixed_width(t[i, :], vals, nbins)
I.e. where tf.histogram_fixed_width is called per row of the input tensor t?
It seems that tf.histogram_fixed_width is missing an axis parameter that works like, e.g., tf.reduce_sum's axis parameter.
tf.histogram_fixed_width works on the entire tensor indeed. You have to loop through the rows explicitly to compute the per-row histograms. Here is a complete working example using TensorFlow's tf.while_loop construct :
import tensorflow as tf  # TensorFlow 1.x graph-mode API
t = tf.random_uniform([2, 2])
i = 0
# Start from an empty [0, 5] tensor and append one histogram row per iteration
hist = tf.constant(0, shape=[0, 5], dtype=tf.int32)
def loop_body(i, hist):
    h = tf.histogram_fixed_width(t[i, :], [0.0, 1.0], nbins=5)
    # tf.concat_v2 became tf.concat(values, axis) in TensorFlow 1.0
    return i+1, tf.concat([hist, tf.expand_dims(h, 0)], axis=0)
i, hist = tf.while_loop(
    lambda i, _: i < 2, loop_body, [i, hist],
    shape_invariants=[tf.TensorShape([]), tf.TensorShape([None, 5])])
sess = tf.InteractiveSession()
print(hist.eval())
Inspired by keveman's answer, and because the number of rows of t is fixed and rather small, I chose to use tf.gather to split the rows and tf.pack (renamed tf.stack in TensorFlow 1.0) to join them. It looks simple and works; I will see whether it is efficient.
t_histo_rows = [
    tf.histogram_fixed_width(
        tf.gather(t, [row]),
        vals, nbins)
    for row in range(t_num_rows)]
# tf.stack is the TensorFlow 1.0+ name for tf.pack
t_histo = tf.stack(t_histo_rows, axis=0)
I would like to propose another implementation.
This implementation can also handle multi axes and unknown dimensions (batching).
import numpy as np
import tensorflow as tf

def histogram(tensor, nbins=10, axis=None):
    value_range = [tf.reduce_min(tensor), tf.reduce_max(tensor)]
    if axis is None:
        return tf.histogram_fixed_width(tensor, value_range, nbins=nbins)
    else:
        if not hasattr(axis, "__len__"):
            axis = [axis]
        other_axis = [x for x in range(0, len(tensor.shape)) if x not in axis]
        swap = tf.transpose(tensor, [*other_axis, *axis])
        flat = tf.reshape(swap, [-1, *np.take(tensor.shape.as_list(), axis)])
        count = tf.map_fn(lambda x: tf.histogram_fixed_width(x, value_range, nbins=nbins), flat, dtype=(tf.int32))
        return tf.reshape(count, [*np.take([-1 if a is None else a for a in tensor.shape.as_list()], other_axis), nbins])
The only slow part here is tf.map_fn, but it is still faster than the other solutions mentioned.
If someone knows an even faster implementation, please comment, since this operation is still very expensive.
The answers above are still slow on a GPU. Here is another option, which is faster (at least in my environment) but limited to values in the range 0 to 1 (you can normalize the values first). train_equal_mask_nbin can be defined once in advance.
def histogram_v3_nomask(tensor, nbins, row_num, col_num):
    # Build the masks: one [row_num, col_num] tensor per bin, filled with the bin index
    equal_mask_list = []
    for i in range(nbins):
        equal_mask_list.append(tf.ones([row_num, col_num], dtype=tf.int32) * i)
    # Shape [nbins, row_num, col_num]:
    # [0, :, :] is all 0, [1, :, :] is all 1, and so on
    train_equal_mask_nbin = tf.stack(equal_mask_list, axis=0)
    # Convert the floats to integer bin indices of equal width
    int_input = tf.cast(tensor * (nbins), dtype=tf.int32)
    # Copy the input nbins times: [row_num, col_num] -> [nbins, row_num, col_num]
    int_input_nbin_copy = tf.reshape(tf.tile(int_input, [nbins, 1]), [nbins, row_num, col_num])
    # Count matches per bin and row, then transpose to [row_num, nbins]
    histogram = tf.transpose(tf.count_nonzero(tf.equal(train_equal_mask_nbin, int_input_nbin_copy), axis=2))
    return histogram
With the advent of tf.math.bincount, I believe the problem has become much simpler.
Something like this should work:
def hist_fixed_width(x,st,en,nbins):
    x=(x-st)/(en-st)
    x=tf.cast(x*nbins,dtype=tf.int32)
    x=tf.clip_by_value(x,0,nbins-1)
    return tf.math.bincount(x,minlength=nbins,axis=-1)
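To check the scale, cast, clip and count steps above without TensorFlow, here is a rough NumPy sketch of the same idea. The function name rowwise_hist is hypothetical, and because np.bincount has no axis argument, the sketch offsets each row's bin indices so that a single flat bincount covers all rows; that offset trick is an addition for illustration, not part of the answer above.

```python
import numpy as np

def rowwise_hist(x, start, end, nbins):
    """Fixed-width histogram of each row of a 2-D array."""
    x = np.asarray(x, dtype=np.float64)
    # Map [start, end] onto bin indices and clip out-of-range values
    idx = np.floor((x - start) / (end - start) * nbins).astype(np.int64)
    idx = np.clip(idx, 0, nbins - 1)
    rows = x.shape[0]
    # Shift row r by r * nbins so every (row, bin) pair gets a unique id
    flat = idx + nbins * np.arange(rows)[:, None]
    counts = np.bincount(flat.ravel(), minlength=rows * nbins)
    return counts.reshape(rows, nbins)

t = np.array([[0.1, 0.1, 0.9],
              [0.5, 0.5, 0.5]])
print(rowwise_hist(t, 0.0, 1.0, 5))
```

Comparing the result against np.histogram applied to each row is a quick way to validate the binning on real data.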