How to populate a queue from preloaded data with variable length sequences? - tensorflow

Say I have input data with variable length sequences loaded into memory:
sentences = [
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5, 6, 7],
[0, 1, 2, 3, 4, 5 ],
[0, 1, 2, 3, 4, 5 ]
]
How can I use this to fill a queue? E.g. something like:
padding_q = tf.PaddingFIFOQueue(
capacity=len(sentences),
dtypes=[tf.int32], shapes=[[None]])
qr = tf.train.QueueRunner(padding_q, [the_wanted_op])
How does the_wanted_op look like? It should enqueue one sentence yet four enqueues must have enqueued each sentence once.

Related

Numpy array value change via two index sets

I am trying to achieve the following:
# Before
raw = np.array([0, 1, 2, 3, 4, 5, 6, 7, 8, 9])
# Set values to 10
indice_set1 = np.array([0, 2, 4])
indice_set2 = np.array([0, 1])
raw[indice_set1][indice_set2] = 10
# Result
print(raw)
[0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
But the raw values remain exactly the same.
Expecting this:
# After
raw = np.array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])
After doing raw[indice_set1] you get a new array, which is the one you modify with the second slicing, not raw.
Instead, slice the slicer:
raw[indice_set1[indice_set2]] = 10
Modified raw:
array([10, 1, 10, 3, 4, 5, 6, 7, 8, 9])

Numpy: Fast approach to extract rows from an array using an 2D index array

I have 2 arrays a and b:
N,D,V,W = 2,3,4,5
a = np.random.randint(0,V,N*D).reshape(N,D)
a
array([[2, 3, 3],
[2, 0, 3]])
b = np.random.randint(0,10,V*W).reshape(V,W)
b
array([[0, 1, 0, 5, 5],
[0, 3, 6, 8, 7],
[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1]])
What I need to do is to replace every element of array a with a row from array b using the array a element value as the row index of array b.
At the moment I'm doing it this way which works fine:
b[a.ravel(),:].reshape(*a.shape,-1)
array([[[8, 8, 9, 0, 9],
[4, 6, 3, 3, 1],
[4, 6, 3, 3, 1]],
[[8, 8, 9, 0, 9],
[0, 1, 0, 5, 5],
[4, 6, 3, 3, 1]]])
However it seems this approach is a bit slow.
I tested it with:
N,D,V,W = 20000,64,100,256
and it took an average of 674ms on my laptop(8 core, 16 ram)
Can someone please recommend an faster yet still simple approach?

tensorflow expand counts into ranges

We have a Tensor of unknown length N, containing some int32 values.
How can we generate another Tensor that will contain N ranges concatenated together, each one between 0 and the int32 value from the original tensor ?
For example, if we have [4, 4, 5, 3, 1], the output Tensor should look like [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0].
Thank you for any advice.
You can make this work with a tensor as input by using a tf.RaggedTensor which can contain dimensions of non-uniform length.
# Or any other N length tensor
tf_counts = tf.convert_to_tensor([4, 4, 5, 3, 1])
tf.print(tf_counts)
# [4 4 5 3 1]
# Create a ragged tensor, each row is a sequence of length tf_counts[i]
tf_ragged = tf.ragged.range(tf_counts)
tf.print(tf_ragged)
# <tf.RaggedTensor [[0, 1, 2, 3], [0, 1, 2, 3], [0, 1, 2, 3, 4], [0, 1, 2], [0]]>
# Read values
tf.print(tf_ragged.flat_values, summarize=-1)
# [0 1 2 3 0 1 2 3 0 1 2 3 4 0 1 2 0]
For this 2-dimensional case the ragged tensor tf_ragged is a “matrix“ of rows with varying length:
[[0, 1, 2, 3],
[0, 1, 2, 3],
[0, 1, 2, 3, 4],
[0, 1, 2],
[0]]
Check tf.ragged.range for more options on how to create the sequences on each row: starts for inclusive lower limits, limits for exclusive upper limit, deltas for increment. Each may vary for each sequence.
Also mind that the dtype of the tf_counts tensor will propagate to the final values.
If you want to have everything as a tensorflow object, then use tf.range() along with tf.concat().
In [88]: vals = [4, 4, 5, 3, 1]
In [89]: tf_range = [tf.range(0, limit=item, dtype=tf.int32) for item in vals]
# concat all `tf_range` objects into a single tensor
In [90]: concatenated_tensor = tf.concat(tf_range, 0)
In [91]: concatenated_tensor.eval()
Out[91]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
There're other approaches to do this as well. Here, I assume that you want a constant tensor but you can construct any tensor once you have the full range list.
First, we construct the full range list using a list comprehension, make a flat list out of it, and then construct a tensor.
In [78]: from itertools import chain
In [79]: vals = [4, 4, 5, 3, 1]
In [80]: range_list = list(chain(*[range(item) for item in vals]))
In [81]: range_list
Out[81]: [0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
In [82]: const_tensor = tf.constant(range_list, dtype=tf.int32)
In [83]: const_tensor.eval()
Out[83]: array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)
On the other hand, we can also use tf.range() but then it returns an array when you evaluate it. So, you'd have to construct the list from the arrays and then make a flat list out of it and finally construct the tensor as in the following example.
list_of_arr = [tf.range(0, limit=item, dtype=tf.int32).eval() for item in vals]
range_list = list(chain(*[arr.tolist() for arr in list_of_arr]))
# output
[0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0]
const_tensor = tf.constant(range_list, dtype=tf.int32)
const_tensor.eval()
#output tensor as numpy array
array([0, 1, 2, 3, 0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 0], dtype=int32)

Pandas - Indexing by not in index

Googled around a bit and couldn't seem to find anything on this.
Is there an option to access data in a pandas data frame using "not index"?
So something like
df_index = asdf = pandas.MultiIndex(levels=[
['2014-10-19', '2014-10-20', '2014-10-21', '2014-10-22', '2014-10-30'],
[u'after_work', u'all_day', u'breakfast', u'lunch', u'mid_evening']],
labels=[[0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 4, 4, 4, 4],
[4, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 2, 0, 1, 3, 4]],
names=[u'start_date', u'time_group'])
And then I would like to be able to call the following to get everything not in df_index
df.ix[~df_index]
I know you can do it for logical indexing within pandas. Just curious if I could do it using an Index Object
you can use df.drop(df_index, errors="ignore").

Select elements of a numpy array based on the elements of a second array

Consider a numpy array A of shape (7,6)
A = array([[0, 1, 2, 3, 5, 8],
[4, 100, 6, 7, 8, 7],
[8, 9, 10, 11, 5, 4],
[12, 13, 14, 15, 1, 2],
[1, 3, 5, 6, 4, 8],
[12, 23, 12, 24, 4, 3],
[1, 3, 5, 7, 89, 0]])
together with a second numpy array r of the same shape which contains the radius of A starting from a central point A(3,2)=0:
r = array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
I would like to pick up all the elements of A which are located at the position 1 of r, i.e. [9,10,11,15,4,6,5,13], all the elements of A located at position 2 of r and so on. I there some numpy function to do that?
Thank you
You can select a section of A by doing something like A[r == 1], to get all the sections as a list you could do [A[r == i] for i in range(r.max() + 1)]. This will work, but may be inefficient depending on how big the values in r go because you need to compute r == i for every i.
You could also use this trick, first sort A based on r, then simply split the sorted A array at the right places. That looks something like this:
r_flat = r.ravel()
order = r_flat.argsort()
A_sorted = A.ravel()[order]
r_sorted = r_flat[order]
edges = r_sorted.searchsorted(np.arange(r_sorted[-1] + 1), 'right')
sections = []
start = 0
for end in edges:
sections.append(A_sorted[start:end])
start = end
I get a different answer to the one you were expecting (3 not 4 from the 4th row) and the order is slightly different (strictly row then column), but:
>>> A
array([[ 0, 1, 2, 3, 5, 8],
[ 4, 100, 6, 7, 8, 7],
[ 8, 9, 10, 11, 5, 4],
[ 12, 13, 14, 15, 1, 2],
[ 1, 3, 5, 6, 4, 8],
[ 12, 23, 12, 24, 4, 3],
[ 1, 3, 5, 7, 89, 0]])
>>> r
array([[3, 3, 3, 3, 3, 4],
[2, 2, 2, 2, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 1, 0, 1, 2, 3],
[2, 1, 1, 1, 2, 3],
[2, 2, 2, 2, 2, 3],
[3, 3, 3, 3, 3, 4]])
>>> A[r==1]
array([ 9, 10, 11, 13, 15, 3, 5, 6])
Alternatively, you can get column then row ordering by transposing both arrays:
>>> A.T[r.T==1]
array([ 9, 13, 3, 10, 5, 11, 15, 6])