How do you iterate backward over a circular buffer without a conditional?

Iterating forward through a circular buffer without using a conditional is easy with the remainder operator...
iterator = (iterator + 1) % buffer_size;
I can't for the life of me figure out the reverse operation, iterating backward.

Does iterator = (iterator + buffer_size - 1) % buffer_size work for you? Go one less than all the way around.

Borealid's answer works (note: iterator is set to 0 initially).
Another solution is
iterator = buffer_size - 1 - (buffer_size - iterator) % buffer_size
with iterator set to buffer_size initially.
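A quick sanity check of both decrements (an illustrative sketch of my own, not from the answers above; buffer_size = 4 is an arbitrary choice):

buffer_size = 4

# Borealid's decrement: start at 0 and add buffer_size - 1 each step.
it = 0
seen = []
for _ in range(5):
    it = (it + buffer_size - 1) % buffer_size
    seen.append(it)
print(seen)   # [3, 2, 1, 0, 3]

# The alternative decrement: start at buffer_size.
it = buffer_size
seen = []
for _ in range(5):
    it = buffer_size - 1 - (buffer_size - it) % buffer_size
    seen.append(it)
print(seen)   # [3, 2, 1, 0, 3]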


Is it possible to compute the sign of a permutation in linear time?

I was just wondering if there's a way to compute the sign of a permutation in linear (or at least better than n^2) time.
For example, let's say I have an array of n numbers and I swap two elements within this array, which flips the sign of the permutation. I have a function that can compute this in n^2 time, but it seems there should be a more efficient algorithm.
I've attached a minimal reproducible example of computing it in quadratic time:
import numpy as np

vals = np.arange(1, 6, 1)
pvals = np.arange(1, 6, 1)
pvals[0], pvals[1] = pvals[1], pvals[0]  # swap

def quadratic(vals):
    sgn_matrix = np.sign(np.expand_dims(vals, -1) - np.expand_dims(vals, -2))
    return np.prod(np.tril(np.ones_like(sgn_matrix)) + np.triu(sgn_matrix, 1))

def sub_quadratic(vals):
    # algorithm quicker than quadratic time?
    pass

sgn = quadratic(vals)
print(sgn)   # prints +1
psgn = quadratic(pvals)
print(psgn)  # prints -1 (because of the one swap)
I have had a look around SO (here, for example) and people keep talking about cyclic permutations, which apparently allow computing this in linear time, but it's something I'm completely unfamiliar with and can't find much about myself.
TL;DR Does anyone know of a method for computing the sign of a permutation in sub-quadratic time?
Just decompose it into transpositions and check whether you needed an even or odd number of them. Since every swap below puts at least one element into its final position, this runs in O(n) time overall:
def permutation_sign(perm):
    parity = 1
    perm = perm.copy()
    for i in range(len(perm)):
        while perm[i] != i + 1:
            parity *= -1
            j = perm[i] - 1
            # Note: if you try to inline the j computation into the next line,
            # you'll get evaluation order bugs.
            perm[i], perm[j] = perm[j], perm[i]
    return parity
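For example, with the permutations from the question (a quick illustrative check; this expects a permutation of 1..n, e.g. as a plain Python list):

print(permutation_sign([1, 2, 3, 4, 5]))  # 1
print(permutation_sign([2, 1, 3, 4, 5]))  # -1 after one swap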

How to find the size of a tensorflow dataset object?

I have created a tensorflow dataset object and I would like to know the size of this dataset.
Sadly tf.data.Dataset doesn't have a fully defined length.
One workaround is to iterate over it once and count the elements:
def get_ds_length(dataset):
    count = 0
    for _ in dataset:
        count += 1
    return count
Obviously this would be slow for large datasets and ones that use heavy preprocessing.
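For a small in-memory dataset this looks like the following (an illustrative sketch assuming TF 2.x eager execution; depending on your version, dataset.cardinality() may also report the size without iterating, though it can return an unknown value for e.g. filtered or generator-based datasets):

import tensorflow as tf

ds = tf.data.Dataset.range(10).filter(lambda x: x % 2 == 0)
print(get_ds_length(ds))  # 5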

Time complexity and integer inputs

I came across a question asking to describe the computational complexity in Big O of the following code:
i = 1;
while (i < N) {
    i = i * 2;
}
I found this Stack Overflow question asking for the answer, with the most voted answer saying it is Log2(N).
On first thought that answer looks correct, however I remember learning about pseudo-polynomial runtimes, and how computational complexity measures difficulty with respect to the length of the input, rather than its value.
So for integer inputs, the complexity should be in terms of the number of bits in the input.
Therefore, shouldn't this function be O(N)? Because every iteration of the loop increases the number of bits in i by 1, until it reaches around the same bits as N.
This code might be found in a function like the one below:
function FindNextPowerOfTwo(N) {
    i = 1;
    while (i < N) {
        i = i * 2;
    }
    return i;
}
Here, the input can be thought of as a k-bit unsigned integer, which we might as well imagine as a string of k bits. The input size is therefore k = floor(log(N)) + 1 bits.
The assignment i = 1 should be interpreted as creating a new bit string and assigning it the length-one bit string 1. This is a constant time operation.
The loop condition i < N compares the two bit strings to see which represents the larger number. If implemented intelligently, this will take time proportional to the length of the shorter of the two bit strings, which will always be i's. As we will see, the length of i's bit string begins at 1 and increases by 1 until it is greater than or equal to the length of N's bit string, k. When N is not a power of two, the length of i's bit string will reach k + 1. Thus, the time taken by evaluating the condition over the whole run is proportional to 1 + 2 + ... + (k + 1) = (k + 1)(k + 2)/2 = O(k^2) in the worst case.
Inside the loop, we multiply i by two over and over. The complexity of this operation depends on how multiplication is to be interpreted. Certainly, it is possible to represent our bit strings in such a way that multiplying by two is just a bit shift that inserts a zero at the end; this could be made a constant-time operation. If we are oblivious to that optimization and perform standard long multiplication, we scan i's bit string once to write out a row of 0s and again to write out i with an extra 0, and then we perform regular addition with carry by scanning both of these strings. The time taken by each of these steps is proportional to the length of i's bit string (plus a constant), so each multiplication is proportional to i's bit-string length. Since the bit-string length of i assumes values 1, 2, ..., (k + 1), the total time is 2 + 3 + ... + (k + 2) = (k + 2)(k + 3)/2 - 1 = O(k^2).
Returning i is a constant time operation.
Taking everything together, the runtime is bounded from above and from below by functions of the form c * k^2, so the worst-case complexity is Theta(k^2) = Theta(log(N)^2).
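Here is a rough empirical sketch of that cost model (my own illustration, not part of the answer): charge one unit per bit of i for each comparison and each doubling, then compare the total against k^2.

def bit_cost_of_loop(N):
    k = N.bit_length()           # input size in bits
    i, cost = 1, 0
    while True:
        cost += i.bit_length()   # evaluating i < N scans i's bits
        if i >= N:
            break
        cost += i.bit_length()   # doubling i scans its bits once more
        i *= 2
    return k, cost

for N in (10, 1000, 10**6, 10**9):
    k, cost = bit_cost_of_loop(N)
    print(N, k, cost, k * k)     # cost grows on the order of k^2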
In the given example you are not increasing i by 1 each time, but doubling it, so each iteration halves the remaining ratio N/i. The loop therefore runs about log_2(N) times, and (counting each multiplication as one step) the complexity of your program is log_2(N).
If you were instead doing
i = i * 3;
the complexity of your program would be log_3(N).
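A quick empirical check of those iteration counts (an illustrative sketch of my own, not from the answer):

import math

def iterations(N, factor):
    i, count = 1, 0
    while i < N:
        i *= factor
        count += 1
    return count

for N in (10, 1000, 10**6):
    print(N,
          iterations(N, 2), round(math.log2(N), 1),
          iterations(N, 3), round(math.log(N, 3), 1))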
It depends on an important question: is multiplication a constant-time operation?
In the real world it is usually treated as constant, because you have fixed 32- or 64-bit numbers and multiplying them always takes the same (constant) time.
On the other hand, you then have the limitation that N must fit into 32/64 bits (or whatever width you use).
In theory, where you do not treat multiplication as a constant operation, or for special algorithms where N can grow too large to ignore the cost of multiplying, you are right: you have to start thinking about the complexity of the multiplication itself.
Multiplying by a constant number (in this case 2) means going through each bit, and there are up to log_2(N) bits.
And you have to do this log_2(N) times before i reaches N,
which gives a complexity of log_2(N) * log_2(N) = O(log_2^2(N)).
PS: Akash has a good point that multiplying by 2 can be written as a constant-time operation, because the only thing you need to do in binary is "append a zero" (similar to multiplying by 10 in decimal: 4333 * 10 = 43330).
However, if multiplication is not that simple (you have to go through all the bits), the previous answer is correct.
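To illustrate that last point (a small sketch of my own, not from the answer):

x = 0b1011            # 11
print(bin(x << 1))    # 0b10110 -> 22; doubling just appends a zero bit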

Elegant Way to Select one Element per Row in Tensorflow

Given...
a Matrix A of shape [m, n]
a tensor I of shape [m]
I want to get a list J of elements from A where
J[i] = A[i, I[i]].
That is, I holds the index of the element to select from each row in A.
Context: I already have the argmax(A, 1) and now I also want the max.
I know that I can just use reduce_max.
And after trying around for a bit I also came up with this:
J = tf.gather_nd(A,
                 tf.transpose(tf.pack([tf.to_int64(tf.range(A.get_shape()[0])), I])))
Where the to_int64 is needed because range only produces int32 and argmax only produces int64.
Neither of the two strikes me as particularly elegant.
One has runtime overhead (probably about a factor of n) and the other has an unknown amount of cognitive overhead. Am I missing something here?
The gather() function provides a way to do it:
r = tf.random.uniform([4,5],0, 9, dtype=tf.int32)
i = tf.random.uniform([4], 0, 4, dtype=tf.int32)
tf.gather(r, i, axis=1, batch_dims=1)
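A small concrete sketch of the same call (assuming TF 2.x eager execution; the values are arbitrary):

import tensorflow as tf

A = tf.constant([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])
I = tf.constant([2, 0, 1])

J = tf.gather(A, I, axis=1, batch_dims=1)
print(J.numpy())  # [12 20 31]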
This is a rather late answer, but could doing
mask = tf.one_hot(I, depth=n, dtype=tf.bool, on_value=True, off_value=False)
elements = tf.boolean_mask(A, mask)
accomplish what you're looking for?
edit: I should point out that this is NOT a good idea if A is already a very large tensor, as this ends up making a dense matrix.
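With a small example (an illustrative sketch of my own; n = 3 here):

import tensorflow as tf

A = tf.constant([[10, 11, 12],
                 [20, 21, 22],
                 [30, 31, 32]])
I = tf.constant([2, 0, 1])

mask = tf.one_hot(I, depth=3, dtype=tf.bool, on_value=True, off_value=False)
print(tf.boolean_mask(A, mask).numpy())  # [12 20 31]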
The link provided by #yaroslav-bulatov mentions this solution:
def get_elements(data, indices):
    flat_indices = tf.range(0, tf.shape(indices)[0]) * data.shape[1] + indices
    return tf.gather(tf.reshape(data, [-1]), flat_indices)
Your solution is not currently differentiable (because gradients for tf.gather_nd are not currently supported).
Hopefully, data[:, indices] will be introduced soon.
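For what it's worth, a quick check of get_elements above (an illustrative sketch assuming TF 2.x eager execution and a statically known data.shape[1]):

data = tf.constant([[10, 11, 12],
                    [20, 21, 22]])
indices = tf.constant([2, 0])
print(get_elements(data, indices).numpy())  # [12 20]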

What is a reduction variable? Could anyone give me some examples?

What is a reduction variable?
Could anyone give me some examples?
Here's a simple example in a C-like language of computing the sum of an array:
int x = 0;
for (int i = 0; i < n; i++) {
    x += a[i];
}
In this example,
i is an induction variable - in each iteration it changes by some constant. It can be +1 (as in the above example) or *2 or /3 etc., but the key is that the constant is the same in every iteration.
In other words, in each iteration i_new = i_old op constant, where op is +, *, etc., and neither op nor constant change between iterations.
x is a reduction variable - it accumulates data from one iteration to the next. It always has some initialization (x = 0 in this case), and while the data accumulated can be different in each iteration, the operator remains the same.
In other words, in each iteration x_new = x_old op data, and op remains the same in all iterations (though data may change).
In many languages there's a special syntax for performing something like this - often called a "fold" or "reduce" or "accumulate" (it has other names too) - but in the context of LLVM IR, a reduction variable is represented by a phi node in a loop, sitting between a binary operation inside the loop and the initialization value before it.
Commutative* operations in reduction variables (such as addition) are particularly interesting for an optimizing compiler because they appear to show a stronger dependency between iterations than there really is; for instance the above example could be rewritten into a vectorized form - adding, say, 4 numbers at a time, followed by a small loop to sum the final vector into a single value (see the sketch below).
* there are actually more conditions that the reduction variable has to fulfill before a vectorization like this can be applied, but that's really outside the scope here
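A rough Python sketch of that vectorized rewrite (my own illustration, not from the answer; a real compiler would use SIMD registers rather than a list of lanes):

def vectorized_sum(a):
    lanes = [0, 0, 0, 0]
    n4 = len(a) - len(a) % 4
    for i in range(0, n4, 4):        # main loop: accumulate 4 partial sums per step
        for lane in range(4):
            lanes[lane] += a[i + lane]
    total = 0
    for s in lanes:                  # small loop: fold the lanes into one value
        total += s
    for x in a[n4:]:                 # leftover elements that didn't fill a group of 4
        total += x
    return total

print(vectorized_sum(list(range(10))))  # 45, same as sum(range(10))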