I have been looking for ways to transfer matrix data from one block to another, and I was wondering whether that is possible. What I've thought of so far is converting the numpy matrix to a list and sending the list through, with the number of rows and columns appended at the end. After receiving it, I would reshape the list back into a numpy matrix and process it as required. But from what I understand, the length of a list must be known when the blocks are created.
I'd like to know if it's possible to implement this, or if I'll have to look at it in some other way.
GNU Radio doesn't care what your items actually represent, only their size in bytes.
Therefore, you can define arbitrary item sizes, and put multiple numbers in one item. In fact, that is exactly what the stream_to_vector and vector_to_stream blocks do.
You'd use an output_signature = gr.io_signature(1, 1, gr.sizeof_gr_complex * N_elements), with N_elements being the number of matrix entries.
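As a rough illustration of the "one matrix per item" idea, here is a minimal NumPy-only sketch. It does not use GNU Radio itself; the fixed shape (ROWS, COLS) and the helper names pack_matrix and unpack_item are assumptions made for this example. It only shows that a matrix of known shape flattens into one fixed-size item and can be restored on the receiving side.

```python
import numpy as np

ROWS, COLS = 3, 4                 # assumed fixed matrix shape, known to both blocks
N_ELEMENTS = ROWS * COLS          # number of complex entries per stream item

def pack_matrix(matrix):
    """Flatten a ROWS x COLS matrix into one vector item of N_ELEMENTS values."""
    return np.asarray(matrix, dtype=np.complex64).reshape(N_ELEMENTS)

def unpack_item(item):
    """Restore the matrix from one received vector item."""
    return np.asarray(item, dtype=np.complex64).reshape(ROWS, COLS)

matrix = (np.arange(N_ELEMENTS) * (1 + 1j)).reshape(ROWS, COLS)
item = pack_matrix(matrix)
restored = unpack_item(item)
item_size_bytes = item.nbytes     # 8 bytes per complex64 value times N_ELEMENTS
```

Because the shape is fixed, there is no need to append the row and column counts to every item.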
As a side note: exchanging matrices does reek of things like channel estimates or equalization; these are often more elegantly handled by asynchronous message passing than by item streams.
I have a large quantity of missing values that appear at random in my data. Unfortunately, I cannot simply drop observations with missing data as I am grouping observations by a feature and cannot drop NaNs without affecting the entire group.
I was hoping to simply mask features that were missing. So a single group might have 8 items in it, and each item may have 0 to N features, depending on how many got masked due to being missing.
I have been experimenting a lot with RaggedTensors, but have encountered a lot of issues ranging from not being able to flatten the RaggedTensor, not being able to concatenate it with regular tensors of uniform shape, and Dense layers requiring the last dimension of their input to be known, aka the number of features.
Does anybody know if there is a way to do this?
Could someone please clarify what the benefit is of using a Numba typed list over an ND array? Also, how do the two compare in terms of speed, and in what context would it be recommended to use the typed list?
Typed lists are useful when you need to append a sequence of elements but do not know the total number of elements and cannot even find a reasonable bound. Such a data structure is significantly more expensive than a 1D array (both in memory space and computation time).
1D arrays cannot be resized efficiently: a new array needs to be created and a copy must be performed. However, indexing a 1D array is very cheap. Numpy also provides many functions that can natively operate on them (lists are implicitly converted to arrays when passed to a Numpy function, and this process is expensive). Note that if the number of items can be bounded by a reasonable size (i.e. not much higher than the actual number of elements), you can create a big array, then add the elements and finally work on a sub-view of the array.
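The "big array plus sub-view" approach can be sketched as follows. This is a plain NumPy illustration, not Numba-specific code; the function name collect_even_squares and the filter it applies are made up for the example.

```python
import numpy as np

def collect_even_squares(values, max_items):
    """Store the squares of the even inputs without resizing any array."""
    buffer = np.empty(max_items, dtype=np.int64)   # allocated once, at the upper bound
    count = 0
    for v in values:
        if v % 2 == 0:
            buffer[count] = v * v
            count += 1
    return buffer[:count]    # a view onto the filled prefix, not a copy

result = collect_even_squares(range(10), max_items=10)
```

The returned slice shares memory with the preallocated buffer, so no copy is made at the end.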
ND arrays cannot be directly compared with lists. Note that lists of lists are similar to jagged arrays (they can contain lists of different sizes), while ND arrays are like a (fixed-size) N x ... x M table. Lists of lists are very inefficient and often not needed.
As a result, use ND arrays when you can and you do not need to often resize them (or append/remove elements). Otherwise, use typed lists.
I've got the following task: there are two outputs from DAQ, namely speed and the raw data acquired along with this speed. I'd like to use speed as a parameter to define certain number of bins, and fit the raw data which corresponds to the speed into the specific bin. I am not sure how to do this in LabVIEW - because when I check the histogram function, it seems that it only requires one input (1D array of values).
Many thanks, any help is much appreciated. Aileen
The Histogram VI takes an array of data and the number of bins you want, and determines the boundaries of the bins automatically. It sounds like that's the one you're looking at.
The General Histogram VI allows you to specify the bins yourself. If you can't find it, perhaps you only have the LabVIEW Base Package development system, as it's only present in the Full Development System and above.
If you don't have General Histogram and you need to create a histogram using your own bin boundaries, it wouldn't be too hard to create. Without writing the code for you, you could do something like:
Create a 1D array containing your bin boundaries in ascending order.
Use a For loop to index through the array of bin boundaries
In the loop, use (e.g.) >, <=, and And functions to get a Boolean array which contains True for each value in the data array that should be in the current bin
Use Boolean to (0,1) and Add Array Elements to count the number of True values.
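For readers who prefer text code, the steps above can be sketched in Python. This is only an analogue of the LabVIEW logic, not LabVIEW code; the function name count_per_bin and the sample numbers are assumptions for illustration.

```python
def count_per_bin(data, boundaries):
    """Count values v with lower < v <= upper for each pair of adjacent boundaries."""
    counts = []
    for lower, upper in zip(boundaries[:-1], boundaries[1:]):
        in_bin = [lower < v <= upper for v in data]   # Boolean array for this bin
        counts.append(sum(in_bin))                    # each True counts as 1
    return counts

boundaries = [0, 10, 20, 30]
data = [1, 5, 10, 12, 25, 30, 31]
counts = count_per_bin(data, boundaries)
```

Values outside the outermost boundaries (31 here) are simply not counted.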
If any of that's unclear, please edit your question with more details and perhaps an example of some input data and what you want the output to be.
This is an implementation of nekomatic's description.
The first SubVi just creates the 1D array containing your bin boundaries.
X_in and Y_in are the independent and dependent input datasets. Both have to be of equal length but need not be sorted. The inner For loop checks whether each X_in value fits into the current bin. If so, X_in and the corresponding Y_in value are stored in temporary arrays, which are averaged afterwards.
Maybe it is not the most efficient code, but at least it does not seem to be slower than the General Histogram VI.
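The same bin-then-average logic, written as a small Python sketch rather than a block diagram. The name mean_y_per_bin, the choice of lower < x <= upper as the bin test, and the sample data are assumptions, not taken from the VI.

```python
def mean_y_per_bin(x_in, y_in, boundaries):
    """Average the y values whose x falls in each bin (lower < x <= upper)."""
    if len(x_in) != len(y_in):
        raise ValueError("x_in and y_in must have the same length")
    means = []
    for lower, upper in zip(boundaries[:-1], boundaries[1:]):
        selected = [y for x, y in zip(x_in, y_in) if lower < x <= upper]
        # An empty bin has no mean, so None is stored for it.
        means.append(sum(selected) / len(selected) if selected else None)
    return means

speed = [12, 3, 7, 18, 14]          # independent data (unsorted)
raw = [2.0, 1.0, 3.0, 6.0, 4.0]     # dependent data, same length
means = mean_y_per_bin(speed, raw, boundaries=[0, 10, 20, 30])
```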
I am trying to get deterministic behaviour from tf.train.shuffle_batch(). I could, instead, use tf.train.batch() which works fine (always the same order of elements), but I need to get examples from multiple tf-records and so I am stuck with shuffle_batch().
I am using:
random.seed(0)
np.random.seed(0)
tf.set_random_seed(0)
data_entries = tf.train.shuffle_batch(
    [data], batch_size=batch_size, num_threads=1, capacity=512,
    seed=57, min_after_dequeue=32)
But every time I restart my script I get slightly different results (not completely different, but about 20% of the elements are in the wrong order).
Is there anything I am missing?
Edit: Solved it! See my answer below!
Maybe I misunderstood something, but you can collect multiple tf-records in a queue with tf.train.string_input_producer(), then read the examples into tensors and finally use tf.train.batch().
Take a look at CIFAR-10 input.
Answering my own question:
First, the reason shuffle_batch is non-deterministic:
The time until I request a batch is inherently random.
In that time, a random number of tensors are available.
Tensorflow calls a shuffle operation that is seeded but depending on the number of items, it will return a different order.
So no matter the seeding, the order is always different unless the number of elements is constant. So the solution is to keep the number of elements constant, but how do we do it?
By setting capacity=min_after_dequeue+batch_size. This will force Tensorflow to fill up the queue until it reaches full capacity before dequeuing an item. Therefore, at the time of the shuffle operation, we have capacity many items which is a constant number.
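The core of the argument, that a seeded shuffle still depends on how many items it is given, can be illustrated with the standard library alone. This is not TensorFlow's implementation; seeded_order is a hypothetical helper, and the item counts 64 and 60 are arbitrary.

```python
import random

def seeded_order(num_items, seed=57):
    """Shuffle the items 0..num_items-1 with a fixed seed and return the order."""
    items = list(range(num_items))
    random.Random(seed).shuffle(items)
    return items

same_a = seeded_order(64)   # queue always holds 64 items: reproducible order
same_b = seeded_order(64)
fewer = seeded_order(60)    # same seed, but 4 items had not arrived yet

# Order in which items 0..59 come out of the full 64-item shuffle.
common_in_full = [x for x in same_a if x < 60]
```

With a constant item count the order repeats exactly; with a different count the same seed yields a different ordering of the shared items.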
So why are we doing this? Because one tf.record contains many examples but we want examples from multiple tf.records. With a normal batch we would first get all the examples of one record and then of the next one. This also means we should set min_after_dequeue to something larger than the number of items in one tf.record. In my example, I have 50 examples in one file so I set min_after_dequeue=2048.
Alternatively, we can also shuffle the examples before creating the tf.records, but this was not possible for me because I read tf.records from multiple directories (each with their own dataset).
Last note: you should also use a batch size of 1 to be super safe.
Seeing this answer, I am wondering whether the following ways of creating a flattened view of X are essentially the same, as long as I know that the number of axes in X is 3:
A = X.ravel()
s0, s1, s2 = X.shape
B = X.reshape(s0*s1*s2)
C = X.reshape(-1) # thanks to @hpaulj below
I'm not asking if A and B and C are the same.
I'm wondering if the particular use of ravel and reshape in this situation are essentially the same, or if there are significant differences, advantages, or disadvantages to one or the other, provided that you know the number of axes of X ahead of time.
The second method takes a few microseconds, but that does not seem to be size dependent.
Look at their __array_interface__ and do some timings. The only difference that I can see is that ravel is faster.
.flatten() has a more significant difference - it returns a copy.
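A quick way to check the view-versus-copy behaviour is np.shares_memory. The sketch below uses a small example array (the shape is arbitrary) and also shows a non-contiguous case in which neither ravel nor reshape can return a view.

```python
import numpy as np

X = np.arange(24).reshape(2, 3, 4)   # C-contiguous example array

A = X.ravel()
B = X.reshape(-1)
F = X.flatten()

views_share = np.shares_memory(X, A) and np.shares_memory(X, B)   # both are views
flatten_copies = not np.shares_memory(X, F)                       # flatten copies

# On a transposed (non-contiguous) array a flat C-order view is impossible,
# so both calls fall back to copying.
Xt = X.transpose(2, 1, 0)
ravel_copied = not np.shares_memory(Xt, Xt.ravel())
reshape_copied = not np.shares_memory(Xt, Xt.reshape(-1))
```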
A.reshape(-1)
is a simpler way to use reshape.
You could study the respective docs, and see if there is something else. I haven't explored what happens when you specify order.
I would use ravel if I just want it to be 1d. I use .reshape most often to change a 1d (e.g. arange()) to nd.
e.g.
np.arange(10).reshape(2,5).ravel()
Or choose the one that makes your code most readable.
reshape and ravel are defined in numpy C code:
In https://github.com/numpy/numpy/blob/0703f55f4db7a87c5a9e02d5165309994b9b13fd/numpy/core/src/multiarray/shape.c
PyArray_Ravel(PyArrayObject *arr, NPY_ORDER order) requires nearly 100 lines of C code. And it punts to PyArray_Flatten if the order changes.
In the same file, reshape punts to newshape. That in turn returns a view if the shape doesn't actually change, tries _attempt_nocopy_reshape, and as a last resort returns a PyArray_NewCopy.
Both make use of PyArray_Newshape and PyArray_NewFromDescr - depending on how shapes and order mix and match.
So identifying where reshape (to 1d) and ravel are different would require careful study.
Another way to do this ravel is to make a new array, with a new shape, but the same data buffer:
np.ndarray((24,),buffer=A.data)
It times the same as reshape. Its __array_interface__ is the same. I don't recommend using this method, but it may clarify what is going on with these reshape/ravel functions. They all make a new array, with a new shape, but with shared data (if possible). Timing differences are the result of different sequences of function calls, in Python and C, not of different handling of the data.
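A small sketch of that buffer-sharing construction, assuming a float64 example array of 24 elements. The dtype is passed explicitly here because np.ndarray otherwise defaults to float and would misread a buffer of another type.

```python
import numpy as np

X = np.arange(24, dtype=np.float64).reshape(2, 3, 4)
A = X.ravel()

# New array object, new shape, same underlying data buffer.
D = np.ndarray((24,), dtype=A.dtype, buffer=A.data)

D[0] = -1.0   # writing through D changes X as well
same_pointer = D.__array_interface__["data"][0] == X.__array_interface__["data"][0]
```

Comparing the "data" pointer in __array_interface__ is the same check the answer suggests for ravel and reshape.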