Hash function to iterate through a matrix - objective-c

Given an NxN matrix and a (row,column) position, what is a method to select a different position in a random (or pseudo-random) order, trying to avoid collisions as much as possible?
For example: consider a 5x5 matrix and start from (1,2)
0 0 0 0 0
0 0 X 0 0
0 0 0 0 0
0 0 0 0 0
0 0 0 0 0
I'm looking for a method like
(x,y) hash (x,y);
to jump to a different position in the matrix, avoiding collisions as much as possible
(do not care how to return two different values, it doesn't matter, just think of an array).
Of course, I can simply use
row = rand()%N;
column = rand()%N;
but it's not that good to avoid collisions.
I thought I could apply twice a simple hash method for both row and column and use the results as new coordinates, but I'm not sure this is a good solution.
Any ideas?

Can you determine the order of the walk before you start iterating? If your matrices are large, this approach isn't space-efficient, but it is straightforward and collision-free. I would do something like:
Generate an array of all of the coordinates. Remove the starting position from the list.
Shuffle the list (there's sample code for a Fisher-Yates shuffle here)
Use the shuffled list for your walk order.
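The three steps above can be sketched in Python as follows. This is a minimal illustration, not the answerer's code; the function name `shuffled_walk` and the `seed` parameter are assumptions made for the example.

```python
import random

def shuffled_walk(n, start, seed=None):
    """Return every cell of an n x n matrix except `start`, in shuffled order."""
    # Step 1: all coordinates, with the starting position left out.
    cells = [(r, c) for r in range(n) for c in range(n) if (r, c) != start]
    # Step 2: shuffle in place (CPython's shuffle is a Fisher-Yates shuffle).
    random.Random(seed).shuffle(cells)
    # Step 3: the shuffled list is the walk order.
    return cells

order = shuffled_walk(5, (1, 2), seed=42)
print(len(order))  # 24 cells, each visited exactly once
```

The list costs O(N^2) memory, which is the space trade-off mentioned above.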

Edit 2 & 3: A modular approach: Given s array elements, choose a prime p of form 2+3*n, p>s. For i=1 to p, use cells (i*i*i)%p when that value is in range 1...s-1. (For row-length r, cell #c subscripts are c%r, c/r.)
Effectively, this method uses H(i) = (i*i*i) mod p as a hash function. The reference shows that as i ranges from 1 to p, H(i) takes on each of the values from 0 to p-1, exactly one time each.
For example, with s=25 and p=29 or 47, this uses cells in following order:
p=29: 1 8 6 9 13 24 19 4 14 17 22 18 11 7 12 3 15 10 5 16 20 23 2 21 0
p=47: 1 8 17 14 24 13 15 18 7 4 10 2 6 21 3 22 9 12 11 23 5 19 16 20 0
according to bc code like
s=25;p=29;for(i=1;i<=p;++i){t=(i^3)%p; if(t<s){print " ",t}}
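For readers who do not use bc, the same walk can be sketched in Python. The names `cubic_walk` and `to_subscripts` are made up for this example; like the bc line, it keeps every value below s, including 0.

```python
def cubic_walk(s, p):
    """Yield cell numbers in the order H(i) = i**3 mod p, skipping values >= s."""
    for i in range(1, p + 1):
        t = pow(i, 3, p)  # three-argument pow computes (i**3) % p
        if t < s:
            yield t

def to_subscripts(c, r):
    """Convert cell number c to subscripts (c % r, c // r) for row length r."""
    return c % r, c // r

order = list(cubic_walk(25, 29))
print(order[:7])  # [1, 8, 6, 9, 13, 24, 19]
print([to_subscripts(c, 5) for c in order[:3]])
```

Because p is prime and p mod 3 = 2, cubing is a bijection modulo p, so each cell number below s appears exactly once.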
The text above shows the suggestion I made in Edit 2 of my answer. The text below shows my first answer.
Edit 0: (This is the suggestion to which Seamus's comment applied): A simple method to go through a vector in a "random appearing" way is to repeatedly add d (d>1) to an index. This will access all elements if d and s are coprime (where s=vector length). Note, my example below is in terms of a vector; you could do the same thing independently on the other axis of your matrix, with a different delta for it, except a problem mentioned below would occur. Note, "coprime" means that gcd(d,s)=1. If s is variable, you'd need gcd() code.
Example: Say s is 10. gcd(s,x) is 1 for x in {1,3,7,9} and is not 1 for x in {2,4,5,6,8,10}. Suppose we choose d=7, and start with i=0. i will take on values 0, 7, 14, 21, 28, 35, 42, 49, 56, 63, 70, which modulo 10 is 0, 7, 4, 1, 8, 5, 2, 9, 6, 3, 0.
Edit 1 & 3: Unfortunately this will have a problem in the two-axis case; for example, if you use d=7 for x axis, and e=3 for y-axis, while the first 21 hits will be distinct, it will then continue repeating the same 21 hits. To address this, treat the whole matrix as a vector, use d with gcd(d,s)=1, and convert cell numbers to subscripts as above.
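A small Python sketch of the coprime-stride idea, treating the whole matrix as one vector as suggested. The function name `stride_walk` is hypothetical, and the gcd check is added here so that a bad delta fails loudly.

```python
from math import gcd

def stride_walk(s, d, start=0):
    """Yield all s indices by repeatedly adding d modulo s (requires gcd(d, s) == 1)."""
    if gcd(d, s) != 1:
        raise ValueError("d and s must be coprime to reach every index")
    i = start
    for _ in range(s):
        yield i
        i = (i + d) % s

print(list(stride_walk(10, 7)))  # [0, 7, 4, 1, 8, 5, 2, 9, 6, 3]

# For a 5x5 matrix, walk the 25 cells as one vector and convert to subscripts.
cells = [(c % 5, c // 5) for c in stride_walk(25, 7)]
print(len(set(cells)))  # 25
```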

If you just want to iterate through the matrix, what is wrong with row++; if (row == N) {row = 0; column++}?
If you iterate through the row and the column independently, and each cycles back to the beginning after N steps, then the (row, column) pair will iterate through only N of the N^2 cells of the matrix.
If you want to iterate through all of the cells of the matrix in pseudo-random order, you could look at questions here on random permutations.

This is a companion answer to address a question about my previous answer: How to find an appropriate prime p >= s (where s = the number of matrix elements) to use in the hash function H(i) = (i*i*i) mod p.
We need to find a prime of form 3n+2, where n is any odd integer such that 3*n+2 >= s. Note that n odd gives 3n+2 = 3(2k+1)+2 = 6k+5 where k need not be odd. In the example code below, p = 5+6*(s/6); initializes p to be a number of form 6k+5, and p += 6; maintains p in this form.
The code below shows that half-a-dozen lines of code are enough for the calculation. Timings are shown after the code, which is reasonably fast: 12 us at s=half a million, 200 us at s=half a billion, where us denotes microseconds.
// timing how long to find primes of form 2+3*n by division
// jiw 20 Sep 2011
#include <stdlib.h>
#include <stdio.h>
#include <sys/time.h>
double ttime(double base) {
    struct timeval tod;
    gettimeofday(&tod, NULL);
    return tod.tv_sec + tod.tv_usec/1e6 - base;
}
int main(int argc, char *argv[]) {
    int d, s, p, par=0;
    double t0=ttime(0);
    ++par; s=5000; if (argc > par) s = atoi(argv[par]);
    p = 5+6*(s/6);
    while (1) {
        for (d=3; d*d<p; d+=2)
            if (p%d==0) break;
        if (d*d >= p) break;
        p += 6;
    }
    printf ("p = %d after %.6f seconds\n", p, ttime(t0));
    return 0;
}
Timing results on 2.5GHz Athlon 5200+:
qili ~/px > for i in 0 00 000 0000 00000 000000; do ./divide-timing 500$i; done
p = 5003 after 0.000008 seconds
p = 50021 after 0.000010 seconds
p = 500009 after 0.000012 seconds
p = 5000081 after 0.000031 seconds
p = 50000021 after 0.000072 seconds
p = 500000003 after 0.000200 seconds
qili ~/px > factor 5003 50021 500009 5000081 50000021 500000003
5003: 5003
50021: 50021
500009: 500009
5000081: 5000081
50000021: 50000021
500000003: 500000003
Update 1: Of course, timing is not deterministic (i.e., it can vary substantially depending on the value of s, other processes on the machine, etc.); for example:
qili ~/px > time for i in 000 004 010 058 070 094 100 118 184; do ./divide-timing 500000$i; done
p = 500000003 after 0.000201 seconds
p = 500000009 after 0.000201 seconds
p = 500000057 after 0.000235 seconds
p = 500000069 after 0.000394 seconds
p = 500000093 after 0.000200 seconds
p = 500000099 after 0.000201 seconds
p = 500000117 after 0.000201 seconds
p = 500000183 after 0.000211 seconds
p = 500000201 after 0.000223 seconds
real 0m0.011s
user 0m0.002s
sys 0m0.004s
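For comparison, the same trial-division search can be sketched in Python. The function name `next_prime_6k5` is an assumption for this example; the logic mirrors the C loop (start at 5+6*(s/6), step by 6, test odd divisors up to the square root).

```python
def next_prime_6k5(s):
    """Return the smallest prime p >= s of the form 6k+5 (equivalently 3n+2 with n odd)."""
    p = 5 + 6 * (s // 6)          # first number of form 6k+5 that is >= s
    while True:
        d = 3
        # p is odd, so only odd divisors need testing.
        while d * d < p and p % d != 0:
            d += 2
        if d * d >= p:            # no divisor found below sqrt(p): p is prime
            return p
        p += 6                    # stay in the form 6k+5

print(next_prime_6k5(5000))   # 5003
print(next_prime_6k5(50000))  # 50021
```

The strict `d * d < p` test is safe here because a number of the form 6k+5 is never a perfect square.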

Consider using a double hash function to get a better distribution inside the matrix.
However, given that you cannot avoid collisions entirely, I suggest using an array of sentinels
to mark the positions you visit; this way you are sure to visit each cell only once.
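A minimal sketch of the sentinel idea in Python. Plain random probing stands in for the hash function here, and the function name `probe_walk` is hypothetical. Collisions still happen, but the sentinel array makes them harmless.

```python
import random

def probe_walk(n, seed=None):
    """Visit every cell of an n x n matrix once, re-drawing whenever a cell was already visited."""
    rng = random.Random(seed)
    visited = [[False] * n for _ in range(n)]  # sentinel array
    order = []
    while len(order) < n * n:
        r, c = rng.randrange(n), rng.randrange(n)
        if not visited[r][c]:      # a collision is simply skipped
            visited[r][c] = True
            order.append((r, c))
    return order

order = probe_walk(5, seed=0)
print(len(order), len(set(order)))  # 25 25
```

The number of wasted draws grows as the matrix fills up, which is the cost of not precomputing a permutation.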

Sorting/Optimization problem with rows in a pandas dataframe [duplicate]

So if I was given a sorted list/array i.e. [1,6,8,15,40], the size of the array, and the requested number..
How would you find the minimum number of values required from that list to sum to the requested number?
For example, given the array [1,6,8,15,40] and the requested number 23, it would take 2 values from the list (8 and 15) to equal 23. The function would then return 2 (# of values). Furthermore, there are an unlimited number of 1s in the array (so the function will always return a value).
Any help is appreciated
The NP-complete subset-sum problem trivially reduces to your problem: given a set S of integers and a target value s, we construct set S' having values (n+1) xk for each xk in S and set the target equal to (n+1) s. If there's a subset of the original set S summing to s, then there will be a subset of size at most n in the new set summing to (n+1) s, and such a set cannot involve extra 1s. If there is no such subset, then the subset produced as an answer must contain at least n+1 elements since it needs enough 1s to get to a multiple of n+1.
So, the problem will not admit any polynomial-time solution without a revolution in computing. With that disclaimer out of the way, you can consider some pseudopolynomial-time solutions to the problem which work well in practice if the maximum size of the set is small.
Here's a Python algorithm that will do this:
import functools

S = [1, 6, 8, 15, 40]  # must contain only positive integers

@functools.lru_cache(maxsize=None)  # memoizing decorator
def min_subset(k, s):
    # returns the minimum size of a subset of S[:k] summing to s,
    # including any extra 1s needed to get there
    best = s  # use all ones
    for i, j in enumerate(S[:k]):
        if j <= s:
            sz = min_subset(i, s - j) + 1
            if sz < best:
                best = sz
    return best

print(min_subset(len(S), 23))  # prints 2
This is tractable even for fairly large lists (I tested a random list of n=50 elements), provided their values are bounded. With S = [random.randint(1, 500) for _ in range(50)], min_subset(len(S), 8489) takes less than 10 seconds to run.
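The same pseudopolynomial idea can also be written bottom-up, which avoids recursion. This is a sketch under the same assumptions (positive integers, each list value usable once, unlimited extra 1s); the function name `min_values` is made up for the example.

```python
def min_values(values, target):
    """Minimum count of values (each used at most once, plus unlimited 1s) summing to target."""
    # best[t] starts at t: the sum t can always be paid for with t ones.
    best = list(range(target + 1))
    for v in values:
        # Go downwards so that each list value is used at most once.
        for t in range(target, v - 1, -1):
            best[t] = min(best[t], best[t - v] + 1)
    return best[target]

print(min_values([1, 6, 8, 15, 40], 23))  # 2
```

Running time is O(len(values) * target), so it stays practical as long as the target is bounded.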
There may be a simpler solution, but if your lists are sufficiently short, you can just try every set of values, i.e.:
1 --> Not 23
6 --> Not 23
...
1 + 6 = 7 --> Not 23
1 + 8 = 9 --> Not 23
...
1 + 40 = 41 --> Not 23
6 + 8 = 14 --> Not 23
...
8 + 15 = 23 --> Oh look, it's 23, and we added 2 values
If you know your list is sorted, you can skip some tests, since if 6 + 20 > 23, then there's no need to test 6 + 40.
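A brute-force sketch of this enumeration in Python, trying smaller subsets first so the first hit is a minimum. The name `min_values_brute` is hypothetical, and for simplicity it ignores the unlimited extra 1s, returning None when no subset matches.

```python
from itertools import combinations

def min_values_brute(values, target):
    """Try every subset by increasing size; return (size, subset) for the first exact match."""
    for size in range(1, len(values) + 1):
        for combo in combinations(values, size):
            if sum(combo) == target:
                return size, combo
    return None  # no subset of the list sums to target

print(min_values_brute([1, 6, 8, 15, 40], 23))  # (2, (8, 15))
```

This examines up to 2^n subsets, so it is only reasonable for short lists.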

SQL Max Consecutive Values in a number set using recursion

The following SQL query is supposed to return the max consecutive numbers in a set.
WITH RECURSIVE Mystery(X,Y) AS (SELECT A AS X, A AS Y FROM R)
UNION (SELECT m1.X, m2.Y
FROM Mystery m1, Mystery m2
WHERE m2.X = m1.Y + 1)
SELECT MAX(Y-X) + 1 FROM Mystery;
This query on the set {7, 9, 10, 14, 15, 16, 18} returns 3, because {14 15 16} is the longest chain of consecutive numbers and there are three numbers in that chain. But when I try to work through this manually I don't see how it arrives at that result.
For example, given the number set above I could create two columns:
m1.x  m2.y
7     7
9     9
10    10
14    14
15    15
16    16
18    18
If we are working on rows and columns, not the actual data, as I understand it WHERE m2.X = m1.Y + 1 takes the value from the next row in Y and puts it in the current row of X, like so
m1.X  m2.Y
9     7
10    9
14    10
15    14
16    15
18    16
18    Null?
The main part on which I am uncertain is where in the SQL recursion actually happens. According to Denis Lukichev recursion is the R part - or in this case the RECURSIVE Mystery(X,Y) - and stops when the table is empty. But if the above is true, how would the table ever empty?
Since I don't know how to proceed with the above, let me try a different direction. If WHERE m2.X = m1.Y + 1 is actually a comparison, the result should be:
m1.X  m2.Y
14    14
15    15
16    16
But at this point, it seems that it should continue recursively on this until only two rows are left (nothing else to compare). If it stops here to get the correct count of 3 rows (2 + 1), what is actually stopping the recursion?
I understand that for the above example the MAX(Y-X) + 1 effectively returns the actual number of recursion steps and adds 1.
But if I have 7 consecutive numbers and the recursion flows down to 2 rows, should this not end up with an incorrect 3 as the result? I understand recursion in C++ and other languages, but this is confusing to me.
Full disclosure, yes it appears this is a common university question, but I am retired, discovered this while researching recursion for my use, and need to understand how it works to use similar recursion in my projects.
Based on this db<>fiddle shared previously, you may find it instructive to alter the CTE to include an iteration number as follows, and then to show the content of the CTE rather than the output of final SELECT. Here's an amended CTE and its content after the recursion is complete:
Amended CTE
WITH RECURSIVE Mystery(X,Y) AS ((SELECT A AS X, A AS Y, 1 as Z FROM R)
UNION (SELECT m1.X, m2.A, Z+1
FROM Mystery m1
JOIN R m2 ON m2.A = m1.Y + 1))
CTE Content
x   y   z
7   7   1
9   9   1
10  10  1
14  14  1
15  15  1
16  16  1
18  18  1
9   10  2
14  15  2
15  16  2
14  16  3
The Z field holds the iteration count. Where Z = 1 we've simply got the rows from the table R. The values X and Y are both from the field A. In terms of what we are attempting to achieve, these represent sequences of consecutive numbers, which start at X and continue to (at least) Y.
Where Z = 2, the second iteration, we find all the rows of the first iteration where there is a value in R which is one higher than our Y value, or one higher than the last member of our sequence of consecutive numbers. That becomes the new highest number, and we add one to the number of iterations. As only three numbers in our original data set have successors within the set, there are only three rows output in the second iteration.
Where Z = 3, the third iteration, we find all the rows of the second iteration (note we are not considering all the rows of the first iteration again), where there is, again, a value in R which is one higher than our Y value, or one higher than the last member of our sequence of consecutive numbers. That, again, becomes the new highest number, and we add one to the number of iterations.
The process will attempt a fourth iteration, but as there are no rows in R where the value is one more than the Y values from our third iteration, no extra data gets added to the CTE and recursion ends.
Going back to the original db<>fiddle, the process then searches our CTE content to output MAX(Y-X) + 1, which is the maximum difference between the first and last values in any consecutive sequence, plus one. This finds its value from the record produced in the third iteration, using ((16-14) + 1), which has a value of 3.
For this specific piece of code, the output is always equivalent to the value in the Z field as every addition of a row through the recursion adds one to Z and adds one to Y.
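To make the stopping rule concrete, the iteration described above can be imitated in plain Python. This is only a simulation of how the amended CTE is evaluated, not database code; the function name `simulate_cte` is made up for the example.

```python
def simulate_cte(r_values):
    """Imitate the amended recursive CTE: each pass extends only the rows added by the previous pass."""
    r = set(r_values)
    new_rows = {(a, a, 1) for a in r}        # iteration 1: the anchor SELECT
    all_rows = set(new_rows)
    while new_rows:                          # recursion ends when a pass adds nothing
        new_rows = {(x, y + 1, z + 1)
                    for (x, y, z) in new_rows
                    if y + 1 in r} - all_rows
        all_rows |= new_rows
    return all_rows

rows = simulate_cte([7, 9, 10, 14, 15, 16, 18])
for row in sorted(rows, key=lambda t: (t[2], t[0])):
    print(row)
print(max(y - x for x, y, _ in rows) + 1)  # 3
```

The loop condition is the whole answer to "what stops the recursion": the fourth pass produces an empty set, so nothing more is added.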

The King's March

You’re given a chess board with dimension n x n. There’s a king at the bottom right square of the board marked with s. The king needs to reach the top left square marked with e. The rest of the squares are labeled either with an integer p (marking a point) or with x marking an obstacle. Note that the king can move up, left and up-left (diagonal) only. Find the maximum points the king can collect and the number of such paths the king can take in order to do so.
Input Format
The first line of input consists of an integer t. This is the number of test cases. Each test case contains a number n which denotes the size of board. This is followed by n lines each containing n space separated tokens.
Output Format
For each case, print in a separate line the maximum points that can be collected and the number of paths available in order to ensure maximum, both values separated by a space. If e is unreachable from s, print 0 0.
Sample Input
3
3
e 2 3
2 x 2
1 2 s
3
e 1 2
1 x 1
2 1 s
3
e 1 1
x x x
1 1 s
Sample Output
7 1
4 2
0 0
Constraints
1 <= t <= 100
2 <= n <= 200
1 <= p <= 9
I think this problem could be solved using dynamic programming. We could use dp[i,j] to store the best number of points you can obtain by going from the bottom right corner to position i,j. We can calculate dp[i,j], for a valid i,j, based on dp[i+1,j], dp[i,j+1] and dp[i+1,j+1], if these are valid positions (not outside the matrix or marked as x), by adding to them the points obtained in cell i,j. You should compute from the bottom right corner to the top left, row by row, beginning from the last column of each row.
For the number of ways you can add a new matrix ways and use it to store the number of ways.
This is an example code to show the idea:
dp[i,j] = dp[i+1,j+1] + board[i,j]
ways[i,j] = ways[i+1,j+1]
if dp[i,j] < dp[i+1,j] + board[i,j]:
    dp[i,j] = dp[i+1,j] + board[i,j]
    ways[i,j] = ways[i+1,j]
elif dp[i,j] == dp[i+1,j] + board[i,j]:
    ways[i,j] += ways[i+1,j]
# check for i,j+1
This assumes all positions are valid.
The final result is stored in dp[0,0] and ways[0,0].
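A fuller sketch of this DP in Python, including the handling of obstacles and unreachable cells that the fragment above leaves out. The function name `kings_march` and the use of -1 as an "unreachable" marker are choices made for this example; the board is passed as a list of token rows.

```python
def kings_march(board):
    """Return (max points, number of best paths) from 's' (bottom right) to 'e' (top left)."""
    n = len(board)
    UNREACHABLE = -1
    # One extra row and column of padding so that i+1 and j+1 never go out of range.
    dp = [[UNREACHABLE] * (n + 1) for _ in range(n + 1)]
    ways = [[0] * (n + 1) for _ in range(n + 1)]
    dp[n - 1][n - 1], ways[n - 1][n - 1] = 0, 1   # the start square scores nothing

    for i in range(n - 1, -1, -1):
        for j in range(n - 1, -1, -1):
            cell = board[i][j]
            if cell == 'x' or (i, j) == (n - 1, n - 1):
                continue
            best, count = UNREACHABLE, 0
            # The king arrives from below, from the right, or from the lower-right diagonal.
            for di, dj in ((1, 0), (0, 1), (1, 1)):
                prev = dp[i + di][j + dj]
                if prev > best:
                    best, count = prev, ways[i + di][j + dj]
                elif prev == best and prev != UNREACHABLE:
                    count += ways[i + di][j + dj]
            if best != UNREACHABLE:
                dp[i][j] = best + (0 if cell == 'e' else int(cell))
                ways[i][j] = count

    return (dp[0][0], ways[0][0]) if dp[0][0] != UNREACHABLE else (0, 0)

print(kings_march([row.split() for row in ("e 2 3", "2 x 2", "1 2 s")]))  # (7, 1)
```

For n = 200 this is 40,000 cells with three lookups each, well within the stated constraints.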
Brief Overview:
This problem can be solved with a recursive method call, starting from cell (n,n) and continuing until it reaches cell (0,0), which is the king's destination.
For a detailed explanation and a solution to this problem, check it out here -> https://www.callstacker.com/detail/algorithm-1

Why does Perl 6 try to evaluate an infinite list only in one of two similar situations?

Suppose I define a lazy, infinite array using a triangular reduction at the REPL, with a single element pasted onto the front:
> my @s = 0, |[\+] (1, 2 ... *)
[...]
I can print out the first few elements:
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)
I'd like to move the zero element inside the reduction like so:
> my @s = [\+] (0, |(1, 2 ... *))
However, in response to this, the REPL hangs, presumably by trying to evaluate the infinite list.
If I do it in separate steps, it works:
> my @s = 0, |(1, 2 ... *)
[...]
> ([\+] @s)[^10]
(0 1 3 6 10 15 21 28 36 45)
Why doesn't the way that doesn't work...work?
Short answer:
It is probably a bug.
Long answer:
(1, 2 ... *) produces a lazy sequence because it is obviously infinite, but somehow that does not lead to the resulting sequence being marked as lazy.
Putting a sequence into an array @s causes it to be eagerly evaluated unless it is marked as being lazy.
Quick fix:
Prepend lazy to the list.
> my @s = [\+] lazy 0, |(1, 2 ... *)
[...]
> @s[^10]
(0 1 3 6 10 15 21 28 36 45)

Calculate amount of combinations with conditions

I'd like to calculate how many different variations of a certain amount of numbers are possible. The number of elements is variable.
Example:
I have 5 elements and each element can vary between 0 and 8. Only the first element is a bit more defined and can only vary between 1 and 8. So far I'd say I have 8*9^4 possibilities. But I have some more conditions. As soon as one of the elements gets zero the next elements should be automatically zero as well.
E.G:
6 5 4 7 8 is ok
6 3 6 8 0 is ok
3 6 7 0 5 is not possible and would turn to 3 6 7 0 0
Would somebody show me how to calculate the number of combinations for this case and also in general, because I'd like to be able to calculate it for 4 or 8 or 9 etc. elements as well. Later on I'd like to calculate this number in VBA to be able to give the user a forecast of how long my calculations will take.
Since once a 0 is present in the sequence, all remaining numbers in the sequence will also be 0, these are all of the possibilities: (where # below represents any digit from 1 to 8):
##### (accounts for 8^5 combinations)
####0 (accounts for 8^4 combinations)
...
#0000 (accounts for 8^1 combinations)
Therefore, the answer is (in pseudocode):
int sum = 0;
for (int x = 1; x <= 5; x++)
{
sum = sum + 8^x;
}
Or equivalently,
int prod = 0;
for (int x = 1; x <= 5; x++)
{
prod = 8*(prod+1);
}
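As a sanity check, both the sum and a direct enumeration can be sketched in Python. The function names are made up for this example; the brute-force version simply counts every sequence that satisfies the stated rules and is only practical for small element counts.

```python
from itertools import product

def count_by_formula(n):
    """Sum 8^1 + 8^2 + ... + 8^n, one term per possible number of nonzero leading elements."""
    return sum(8 ** x for x in range(1, n + 1))

def count_by_enumeration(n):
    """Count sequences of n digits 0..8 whose first digit is nonzero and where zeros only trail."""
    count = 0
    for seq in product(range(9), repeat=n):
        if seq[0] == 0:
            continue
        first_zero = seq.index(0) if 0 in seq else n
        if all(v == 0 for v in seq[first_zero:]):
            count += 1
    return count

print(count_by_formula(5))      # 37448
print(count_by_enumeration(3))  # 584
```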
great thank you.
Sub test()
    Dim sum As Single
    Dim x As Integer
    For x = 1 To 6
        sum = sum + 8 ^ x
    Next
    Debug.Print sum
End Sub
With this code and the loop running to 5, I get exactly 37448. I also tried it with e.g. 6 elements (as shown above) and it worked as well. Now I can try to estimate the calculation time.