Compute all possible pairs of two lists efficiently - time complexity

Let's say I have two sets of numbers and I want to build all pairs of the values inside of them. For example:
A = {1, 2}
B = {3, 4}
Out = {(1,3), (1,4), (2,3), (2,4)}
My sets have a lookup time in O(1). It should be possible to compute my output in O(|A| + |B|), even if the sets do not have the same size, but I can't find any solution for this [an easy solution would be two for loops, but that is O(n^2)]. Can you please give me a hint on how to compute this in the given complexity?

No, you cannot do better than two for loops. Think about it this way: for every element in A you have to output |B| elements, so your running time will always be some multiple of |A|*|B|.
Let's modify your example above for A to have 3 elements
A = {1, 2, 3}
B = {3, 4}
Out = {(1,3), (1,4), (2,3), (2,4), (3,3), (3,4)}
So you have 6 elements of output with initial sets |A| = 3 and |B| = 2. You claim your output should take |A| + |B| = 5 steps, yet there are already 6 pairs to write down. Therefore your initial assumption is not true.
Your best optimization would be to make sure you can enumerate the elements of the sets in O(1) time per element.
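For what it's worth, here is a minimal Python sketch of that unavoidable pairwise enumeration (itertools.product is just the two nested loops written for you):
from itertools import product

A = {1, 2}
B = {3, 4}

# Every pair has to be emitted, so this is Theta(|A| * |B|) no matter how it is written.
out = set(product(A, B))
print(out)  # {(1, 3), (1, 4), (2, 3), (2, 4)} -- order may vary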

Optimizing specific numbers to reach value

I'm trying to make a program that, when given specific values (let's say 1, 4 and 10), will figure out how many of each value are needed to reach a certain amount, say 19.
It will always try to use as many high values as possible, so in this case, the result should be 10*1, 4*2, 1*1.
I tried thinking about it, but couldn't come up with an algorithm that works...
Any help or hints would be welcome!
Here is a Python solution that tries all the choices until one is found. If you pass the values it can use in descending order, the first solution found will be the one that uses as many high values as possible:
def solve(left, idx, nums, used):
    # left: amount still to reach; idx: first value index we may still use
    if left == 0:
        return True
    for i in range(idx, len(nums)):
        j = left // nums[i]  # largest count of nums[i] that still fits
        while j > 0:
            used.append((nums[i], j))
            if solve(left - j * nums[i], i + 1, nums, used):
                return True
            used.pop()  # backtrack and try a smaller count
            j -= 1
    return False

solution = []
solve(19, 0, [10, 4, 1], solution)
print(solution)  # will print [(10, 1), (4, 2), (1, 1)]
If anyone needs a simple algorithm, one way I found was (see the sketch after this list):
sort the values, in descending order
keep track of how many of each value are kept
for each value, do:
if the sum is equal to the target, stop
if it isn't the first value, remove one of the previous values
while the total sum of values is smaller than the objective:
add the current value once
Have a nice day!
(As juviant mentioned, this won't work if it skips larger numbers and only uses smaller ones! I'll try to improve it and post a new version when I get it to work.)
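For illustration, here is a minimal Python sketch of the plain greedy idea (take as many of each value as fits, largest first), together with a case showing the flaw juviant pointed out:
def greedy(target, values):
    # plain greedy: take as many of each value as fits, largest first
    left = target
    used = []
    for v in sorted(values, reverse=True):
        count = left // v
        if count:
            used.append((v, count))
            left -= count * v
    return used if left == 0 else None  # None: greedy got stuck

print(greedy(19, [1, 4, 10]))  # [(10, 1), (4, 2), (1, 1)]
print(greedy(6, [4, 3]))       # None, although (3, 2) reaches 6 exactly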

Octave: summing indexed elements

The easiest way to describe this is via example:
data = [1, 5, 3, 6, 10];
indices = [1, 2, 2, 2, 4];
result = zeros(1, 5);
I want result(1) to be the sum of all the elements in data whose index is 1, result(2) to be the sum of all the elements in data whose index is 2, etc.
This works but is really slow when applied (changing 5 to 65535) to 64K element vectors:
result = result + arrayfun(@(x) sum(data(indices==x)), 1:5);
I think it's creating 64K vectors with 64K elements each, and that's what is taking up the time. Is there a faster way to do this? Or do I need to figure out a completely different approach? A plain loop does the job:
for i = [1:5]
idx = indices(i);
result(idx) = result(idx) + data(i);
endfor
But that's a very non-octave-y way to do it.
Seeing how MATLAB is very similar to Octave, I will provide an answer that was tested on MATLAB R2016b. Looking at the documentation of Octave 4.2.1 the syntax should be the same.
All you need to do is this:
result = accumarray(indices(:), data(:), [5 1]).'
Which gives:
result =
1 14 0 10 0
Reshaping to a column vector (arrayName(:) ) is necessary because of the expected inputs to accumarray. Specifying the size as [5 1] and then transposing the result was done to avoid some MATLAB error.
accumarray is also described in depth in the MATLAB documentation.
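As an aside, if you ever port this to Python, NumPy's bincount does the same accumulate-by-index job; a minimal sketch (note the shift to 0-based bin indices):
import numpy as np

data = np.array([1, 5, 3, 6, 10])
indices = np.array([1, 2, 2, 2, 4])

# bincount sums data[k] into bin indices[k]; subtract 1 because NumPy is 0-based
result = np.bincount(indices - 1, weights=data, minlength=5)
print(result)  # [ 1. 14.  0. 10.  0.]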

torch logical indexing of tensor

I'm looking for an elegant way to select a subset of a torch tensor which satisfies some constraints.
For example, say I have:
A = torch.rand(10,2)-1
and S is a 10x1 tensor,
sel = torch.ge(S,5) -- this is a ByteTensor
I would like to be able to do logical indexing, as follows:
A1 = A[sel]
But that doesn't work.
So there's the index function which accepts a LongTensor but I could not find a simple way to convert S to a LongTensor, except the following:
sel = torch.nonzero(sel)
which returns a K x 2 tensor (K being the number of values of S >= 5). So then I have to convert it to a 1 dimensional array, which finally allows me to index A:
A:index(1,torch.squeeze(sel:select(2,1)))
This is very cumbersome; in e.g. Matlab all I'd have to do is
A(S>=5,:)
Can anyone suggest a better way?
One possible alternative is:
sel = S:ge(5):expandAs(A) -- now you can use this mask with the [] operator
A1 = A[sel]:unfold(1, 2, 2) -- unfold to get back a 2D tensor
Example:
> A = torch.rand(3,2)-1
-0.0047 -0.7976
-0.2653 -0.4582
-0.9713 -0.9660
[torch.DoubleTensor of size 3x2]
> S = torch.Tensor{{6}, {1}, {5}}
6
1
5
[torch.DoubleTensor of size 3x1]
> sel = S:ge(5):expandAs(A)
1 1
0 0
1 1
[torch.ByteTensor of size 3x2]
> A[sel]
-0.0047
-0.7976
-0.9713
-0.9660
[torch.DoubleTensor of size 4]
> A[sel]:unfold(1, 2, 2)
-0.0047 -0.7976
-0.9713 -0.9660
[torch.DoubleTensor of size 2x2]
There are two simpler alternatives:
Use maskedSelect:
result=A:maskedSelect(your_byte_tensor)
Use a simple element-wise multiplication, for example
result=torch.cmul(A,S:gt(0))
The second one is very useful if you need to keep the shape of the original matrix (i.e. A), for example to select neurons in a layer at backprop. However, since it puts zeros in the resulting matrix wherever the condition dictated by the ByteTensor doesn't apply, you can't use it to compute a product (or median, etc.). The first one only returns the elements that satisfy the condition, so that's what I'd use to compute products, medians, or anything else where I don't want zeros.
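The same contrast exists in NumPy, if that helps make the trade-off concrete; a minimal sketch (not Torch code):
import numpy as np

A = np.array([[-0.1, -0.8],
              [-0.3, -0.5],
              [-0.9, -0.9]])
S = np.array([6, 1, 5])
mask = S >= 5

print(A[mask])            # masked select: only the matching rows, shape changes
print(A * mask[:, None])  # zeroing multiply: shape kept, non-matching rows become 0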

Picking random binary flag

I have defined the following:
typedef enum {
    none = 0,
    alpha = 1,
    beta = 2,
    delta = 4,
    gamma = 8,
    omega = 16
} Greek;
Greek t = beta | delta | gamma;
I would like to be able to pick one of the flags set in t randomly. The value of t can vary (it could be anything from the enum).
One thought I had was something like this:
r = 0;
while ( !( t & ( 1 << r ) ) ) { r = rand(0,4); }
Anyone got any more elegant ideas?
If it helps, I want to do this in ObjC...
Assuming I've correctly understood your intent, if your definition of "elegant" includes table lookups, the following should do the trick pretty efficiently. I've written enough to show how it works, but didn't fill out the entire table. Also, for Objective-C I recommend arc4random over using rand.
First, construct an array whose indices are the possible t values and whose elements are arrays of t's underlying Greek values. I ignored none, but that's a trivial addition to make if you want it. I also found it easiest to specify the lengths of the subarrays. Alternatively, you could do this with NSArrays and have them self-report their lengths:
int myArray[8][4] = {
    {0},
    {1},
    {2},
    {1,2},
    {4},
    {4,1},
    {4,2},
    {4,2,1}
};
int length[] = {1,1,1,2,1,2,2,3};
Then, for any given t you can randomly select one of its elements using:
int r = myArray[t][arc4random_uniform(length[t])];
Once you get past the setup, the actual random selection is efficient, with no acceptance/rejection looping involved.
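If maintaining the table is a concern, another common approach (not the lookup above) is to collect the set bits first and then pick one uniformly; a Python sketch of that idea, using the flag values from the question:
import random

def random_flag(t):
    # collect the individual flags set in t, then pick one uniformly
    flags = [b for b in (1, 2, 4, 8, 16) if t & b]
    return random.choice(flags) if flags else 0

t = 2 | 4 | 8  # beta | delta | gamma
print(random_flag(t))  # prints 2, 4 or 8, each with probability 1/3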

What is a data structure for quickly finding non-empty intersections of a list of sets?

I have a set of N items, each of which is a set of integers; let's assume it's ordered and call it I[1..N].
So, for example, if:
I = [{1,2}, {2,3}, {4,5}]
I'm looking to define valid_items(items, candidate), such that:
valid_items(I, {1}) == {1}
valid_items(I, {2}) == {1, 2}
valid_items(I, {3,4}) == {2, 3}
I'm trying to optimize for one given I and variable candidate sets. Currently I am doing this by caching items_containing[n] = {the sets which contain n}. In the above example, that would be:
items_containing = [{}, {1}, {1,2}, {2}, {3}, {3}]
That is, 0 is contained in no items, 1 is contained in item 1, 2 is contained in items 1 and 2, 3 is contained in item 2, and 4 and 5 are contained in item 3.
That way, I can define valid_items(I, candidate) = union(items_containing[n] for n in candidate).
Is there any more efficient data structure (of a reasonable size) for caching the result of this union? The obvious example of space 2^N is not acceptable, but N or N*log(N) would be.
I think your current solution is optimal big-O wise, though there are micro-optimization techniques that could improve its actual performance, such as using bitwise operations when merging the chosen entry of items_containing into the valid-items set.
i.e. you store items_containing as this:
items_containing = [0b0000, 0b0001, 0b0011, 0b0010, 0b0100, 0b0100]
and your valid_items can use bit-wise OR to merge like this:
int valid_items(Set I, Set candidate) {
    // if you need more than 32 items, use int[] for valid
    // and int[][] for items_containing
    int valid = 0b0000;
    for (int item : candidate) {
        // bit-wise OR
        valid |= items_containing[item];
    }
    return valid;
}
but they don't really change the Big-O performance.
One representation that might help is storing each set in I as a vector V of size n whose entry V(i) is 0 when i is not in the set and positive otherwise. Then to take the intersection of two vectors you multiply them elementwise, and to take the union you add them elementwise.
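A minimal Python sketch of that vector representation, assuming a universe {0, ..., n-1} with n = 6 to cover the example sets:
n = 6  # size of the integer universe (an assumption for this example)

def to_vec(s):
    return [1 if i in s else 0 for i in range(n)]

a, b = to_vec({1, 2}), to_vec({2, 3})

intersection = [x * y for x, y in zip(a, b)]  # nonzero exactly where both sets contain i
union = [x + y for x, y in zip(a, b)]         # nonzero where either set contains i

print(intersection)  # [0, 0, 1, 0, 0, 0] -> {2}
print(union)         # [0, 1, 2, 1, 0, 0] -> {1, 2, 3}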