Lua: Dimensions of a table - numpy

This seems like a really easy, "google it for me" kind of question, but I can't seem to get an answer to it. How do I find the dimensions of a table in Lua, similar to NumPy's .shape attribute?
E.g. blah = '2 x 3 table'; blah.lua_equivalent_of_shape = {2,3}

Tables in Lua are sets of key-value pairs and do not have dimensions.
You can implement 2D arrays with Lua tables. In that case, the dimensions are given by #t x #t[1], as in the example below:
t = {
  {11, 12, 13},
  {21, 22, 23},
}
print(#t, #t[1]) -- prints 2  3

Numpy's arrays are contiguous in memory, while Lua's tables are hash tables, so they don't inherently have a notion of shape. Tables can be used to implement ragged arrays, sets, objects, etc.
That said, to find the length of a table t that uses consecutive integer indices 1..n, use #t:
t = {1, 2, 3}
print(#t) -- prints 3
You could implement an object that behaves more like a NumPy array and add a shape attribute, or implement it in C and write Lua bindings.
t = {{1, 0}, {2, 3}, {3, 1}, shape={3, 2}}
print(t.shape[1], t.shape[2]) -- prints 3  2
print("dims", #t.shape) -- prints dims  2
If you really miss NumPy's functionality, you can use Torch's torch.Tensor for efficient, NumPy-like arrays in Lua.

Related

Dask scatter broadcast a list

What is the appropriate way to scatter-broadcast a list using Dask distributed?
case 1 - wrapping the list:
[future_list] = client.scatter([my_list], broadcast=True)
case 2 - not wrapping the list:
future_list = client.scatter(my_list, broadcast=True)
In the Dask documentation I have seen both examples: 1. wrapping (see the bottom example there) and 2. not wrapping. In my experience, case 1 is the better approach; with case 2, constructing the Dask graph (which is large in my use case) takes much longer.
What could explain the difference in graph construction time? Is this expected behaviour?
Thanks in advance.
Thomas
If you call scatter with a list then Dask will assume that each element of that list should be scattered independently.
a, b, c = client.scatter([1, 2, 3], ...)
If you don't want this, and you actually just want your list to be moved around as a single piece of data, then you should wrap it in another list:
[future] = client.scatter([[1, 2, 3]], ...)
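A minimal sketch contrasting the two calls (assuming a local dask.distributed cluster; the data values are placeholders for illustration):
from dask.distributed import Client

client = Client()  # assumes a local cluster; adjust for your deployment

# Not wrapping: each element is scattered as its own future.
a, b, c = client.scatter([1, 2, 3], broadcast=True)

# Wrapping: the whole list travels as a single future.
[future_list] = client.scatter([[1, 2, 3]], broadcast=True)
print(future_list.result())  # [1, 2, 3]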

TensorFlow: Can data sets contain string category values?

From TensorFlow's examples, it is easy to see that data sets can contain numeric values. For example:
x_train = [1, 2, 3, 4]
y_train = [0, -1, -2, -3]
However, does it also work with string category values? For example:
x_train = ["sunny", "rainy", "sunny", "cloudy"]
y_train = ["go outside", "stay inside", "go outside", "go outside"]
If it does not, I must assume that TensorFlow has a methodology for working with categorical values, perhaps via some clever trick such as systematically converting them to numeric values.
Yes, TensorFlow does support datasets with categorical features. Perhaps the easiest way to work with them is to use the Feature Column API, which provides methods such as tf.feature_column.categorical_column_with_vocabulary_list() (for dealing with small, known sets of categories) and tf.feature_column.categorical_column_with_hash_bucket() (for dealing with large and potentially unbounded sets of categories).
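A minimal sketch of the vocabulary-list approach using the (TF 1.x-era) Feature Column API named above; the feature name "weather" and its vocabulary are made up for illustration:
import tensorflow as tf

# Map each known string category to an integer id.
weather = tf.feature_column.categorical_column_with_vocabulary_list(
    key="weather",
    vocabulary_list=["sunny", "rainy", "cloudy"])

# An indicator column turns the ids into a one-hot vector a model can consume.
weather_onehot = tf.feature_column.indicator_column(weather)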

Fastest way to apply arithmetic operations to System.Array in IronPython

I would like to add (arithmetically) two large System.Arrays element-wise in IronPython and store the result in the first array, like this:
for i in range(arrA.Length):
    arrA.SetValue(arrA.GetValue(i) + arrB.GetValue(i), i)  # SetValue takes (value, index)
However, this seems very slow. Having a C background, I would like to use pointers or iterators, but I do not know the corresponding fast IronPython idiom. I cannot use Python lists, as my objects are strictly of type System.Array. The type is 3d float.
What is the fastest (or at least a fast) way to perform this computation?
Edit:
The number of elements is approximately 256^3.
3d float means that the array can be accessed like this: array.GetValue(indexX, indexY, indexZ). I am not sure how the respective memory is organized in IronPython's System.Array.
Background: I wrote an interface to an IronPython API, which gives access to data in a simulation software tool. I retrieve 3d scalar data and accumulate it to a temporal array in my IronPython script. The accumulation is performed 10,000 times and should be fast, so that the simulation does not take ages.
Is it possible to use the numpy library developed for IronPython?
https://pytools.codeplex.com/wikipage?title=NumPy%20and%20SciPy%20for%20.Net
It appears to be supported, and as far as I know it is as close as you can get in Python to C-style pointer functionality with arrays and such.
Create an array:
import numpy as np

x = np.array([[1, 2, 3], [4, 5, 6]], np.int32)
Multiply all elements by 3 in place:
x *= 3
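Applied to the question, the accumulation loop then collapses to one vectorized statement. A sketch, assuming the System.Array data has already been copied into NumPy arrays of the stated size:
import numpy as np

# Placeholders standing in for the converted simulation data.
arrA = np.zeros((256, 256, 256), dtype=np.float32)
arrB = np.ones((256, 256, 256), dtype=np.float32)

arrA += arrB  # element-wise add, in place, replacing the explicit loop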

Matrices with different row lengths in numpy

Is there a way of defining a matrix (say m) in numpy with rows of different lengths, but such that m stays 2-dimensional (i.e. m.ndim = 2)?
For example, if you define m = numpy.array([[1,2,3], [4,5]]), then m.ndim = 1. I understand why this happens, but I'm interested in whether there is any way to trick numpy into viewing m as 2D. One idea would be padding with a dummy value so that the rows become equally sized, but I have lots of such matrices and it would take up too much space. The reason I really need m to be 2D is that I am working with Theano, and the tensor that will be given the value of m expects a 2D value.
Here is some very new information about Theano: there is a new TypedList type, which allows a Python list whose elements all have the same type, such as 1-D ndarrays. Everything is done except the documentation.
The functionality available for typed lists is limited, but they were added to allow looping over them with scan. This is not yet fully integrated with scan, but you can already use it like this:
import theano
import theano.typed_list

a = theano.typed_list.TypedListType(theano.tensor.fvector)()
s, _ = theano.scan(fn=lambda i, tl: tl[i].sum(),
                   non_sequences=[a],
                   sequences=[theano.tensor.arange(2, dtype='int64')])
f = theano.function([a], s)
f([[1, 2, 3], [4, 5]])
One limitation is that the output of scan must be an ndarray, not a typed list.
No, this is not possible. NumPy arrays need to be rectangular in every pair of dimensions. This is due to the way they map onto memory buffers: each array is described by a pointer, an itemsize, and a strides tuple.
As for this taking up space: np.array([[1,2,3], [4,5]]) actually takes up more space than a 2×3 array, because it's an array of two pointers to Python lists (and even if the elements were converted to arrays, the memory layout would still be inefficient).
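A short sketch of what that call actually produces (note that recent NumPy versions require an explicit dtype=object for ragged input; otherwise the call raises an error):
import numpy as np

m = np.array([[1, 2, 3], [4, 5]], dtype=object)
print(m.ndim)   # 1 -- a 1-D array holding two Python list objects
print(m.dtype)  # object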

What does it mean to flatten an iterator?

I would like to know what it means to flatten something, e.g. to flatten an iterator of iterators. Can you tell me? Are there any C/Java/Python idioms for it?
In this context, to flatten means to remove nesting. For instance, an array of arrays (an array where each element is an array) of integers is nested; if we flatten it we get an array of integers which contains the same values in the same order, but next to each other in a single array, rather than split into several arrays: [[1 2] [3 4]] -> [1 2 3 4]. Same difference with iterators, other collections, and deeper nesting (array of array of sets of iterators of strings).
As for idioms, there aren't really many -- it's not a common task, and often simple. Note that in the case of regular arrays (all nested arrays are of the same size), nested[i][j] is equivalent to nested[i * INNER_ARRAY_SIZE + j]. This is sometimes used to avoid nesting, especially in languages which treat arrays as reference types and thus require many separately-allocated arrays if you nest them. In Python, you can flatten iterables with itertools.chain(*iterable_of_iterables).
Flattening means to remove nesting of sequence types. Python provides itertools.chain(*iterables) for this purpose (http://docs.python.org/library/itertools.html#itertools.chain).
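A short sketch of the idioms described above (the data is a made-up example):
import itertools

nested = [[1, 2], [3, 4], [5, 6]]

# chain(*nested) unpacks the outer iterable eagerly;
# chain.from_iterable is the lazier variant for an iterator of iterators.
flat = list(itertools.chain(*nested))                # [1, 2, 3, 4, 5, 6]
flat2 = list(itertools.chain.from_iterable(nested))  # same result

# With rectangular nesting, nested[i][j] == flat[i * 2 + j] (inner size 2 here).
assert nested[1][0] == flat[1 * 2 + 0]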