Checkerboard indexing in CUDA - indexing

So, here's the question. I want to do a computation in CUDA where I have a large 1D array (which represents a lattice), I partition it into subarrays of length #part, and I want each thread to do a couple of computations on each subarray.
More specifically, let's say that we have a number of threads, #threads, and a number of blocks, #blocks. The array is of size N = 2 * #part * #threads * #blocks. If we number the subarrays from 1 to 2*#blocks*#threads, we want to first use the #threads*#blocks threads to do computation on the subarrays with an even number and then the same number of threads to do computation on the subarrays with an odd number.
I thought that I could have a local index in each thread which would denote from where it's subarray would start.
So, I used the following index :
localIndex = #part * (2 * threadIdx.x + var) + 2 * #part * #Nthreads * blockIdx.x;
var is either 1 or 0, depending on if we want to have the thread do computation on an subarray with an even or an odd number.
I've tried to run it and it seems that something goes wrong when I use more than one blocks. Have I done something wrong with the indexing?
Thanks.

Why is it important that the threads collectively do first even, then the odd subarrays, since block and thread execution is not guaranteed to be in order there is no benefit?
Assuming you index only using x-dimension for your kernel dimension setup:
subArrayIndexEven = 2 * (blockIdx.x * blockDim.x + threadIdx.x) * part
subArrayIndexOdd = subArrayIndexEven + part
Prove:
BLOCK_SIZE = 3
NUM_OF_BLOCKS = 2
PART = 4
N = 2 * 3 * 2 * 4 = 48
T(threadIdx.x, blockIdx.x)
T(0, 1) -> even = 2 * (1 * 3 + 0) * 4 = 24, odd = 28
T(1, 1) -> even = 2 * (1 * 3 + 1) * 4 = 32, odd = 36
T(2, 1) -> even = 2 * (1 * 3 + 2) * 4 = 40, odd = 44

idx = threads_per_block*blockIdx.x + threadIdx.x;
int my_even_offset, my_odd_offset, my_even_idx, my_odd_idx;
int my_offset = floor(float(idx)/float(num_part));
my_even_offset = 2*my_offset*num_part;
my_odd_offset = (2*my_offset+1)*num_part;
my_even_idx = idx + my_even_offset;
my_odd_idx = idx + my_odd_offset;
//Do stuff with the indices.

Related

How do I properly calculate this time complexity

I'm examining this code as preparation for my test and I'm having some problems figuring out what is the correct time complexity:
a = 1;
while (a < n) {
b = 1;
while (b < a^2)
b++;
a = a*2;
}
The values for a are as follows : 1, 2, 4, 8, ... , 2^(logn) = n
Therefore we have logn iterations for the outer loop.
In every nested loop, there are a^2 iterations, so basically what I've come up with is:
T(n) = 1 + 4 + 16 + 64 + ... + (2^logn)^2
I'm having problems finding the general term of this series and therefore getting to a final result.
(maybe due to being completely off in my calculations though)
Would appreciate any help, thank you.
Your calculations are alright, you are correct with your analysis of the inner while-loop. We can demonstrate this with a small table:
This is basically the sum of a geometric progression with:
a1 = 1, q = 4, #n = lg n + 1
Where #n is the number of elements.
We have: Sn = 1 * (4^(lgn +1) - 1)/(4-3) = (4*n^2 - 1)/3
Thus we can say your code running complexity is Θ(n^2)
Mathematical explanation in LaTeX:

Maximise the array sum modulo M under the given condition

Given an array=[a, b, c, ...].
You have to find the maximum value of [a * k1 + b * k2 + c * k3 +
...]%M. Where k1,k2,k3.. are any desired non-negative integers you
can choose. M is known.
Language- C++
Example
Input
arr=[7, 3, 2].
M=10
Output
9
Explanation
You can choose to make the array [7 * 1 + 3 * 0 + 2 * 1 ] % 10.
Here k1 = 1, k2 = 0, k3 = 1 .
So you get 9 as the answer(max value).
Edit-
I am trying for a C++ Solution.
My attempt:
I know that the ans will range from 0 to M-1. But I'm not getting the idea how to proceed.
Edit
My attempt
int ans=arr[0];
for(int i=1; i<arr.length(); i++){
ans=max((ans+arr[i])%M,ans);
}
return ans;
Here I traverse the array from left to right going on updating the ans. Is this correct?

Understanding Remainder operator

Just doing some basic modulo operations and trying to wrap my head around the below operations with questions marks.
0%5 // 0 - Totally understand
1%5 // 1 ?
2%5 // 2 ?
3%5 // 3 ?
4%5 // 4 ?
5%5 // 0 - Totally understand
Perhaps I'm thinking in the wrong way. For example 1/5 would return a Double of 0.2 and not a single integer so how does it return a remainder of 1?
I understand these. It makes sense but the above I can't wrap my head around.
9%4 // 1
10%2 // 0
10%6 // 4
Be great if someone could explain this. Seems I'm having a brain fart. Source of learning.
From the same Basic Operators page that you link to:
The remainder operator (a % b) works out how many multiples of b will fit inside a and returns the value that is left over (known as the remainder).
Specifically for 1 % 5:
5 doesn't fit in 1, so it fits 0 times.
This means that 1 can be described as
1 = (5 * multiplier) + remainder
Since the multiplier is 0, the remainder is 1
1 = (5 * 0) + remainder
1 = remainder
If we instead look at 6 % 5 the remainder is also 1. This is because 5 fit in 6 one time:
6 = (5 * multiplier) + remainder
6 = (5 * 1) + remainder
6-5 = remainder
1 = remainder
This / the division operator when you say 1/5 if division is in integer it'll give 0 , but this 1.0/0.5 when you make it in Double , it'll give 0.2
but % the modulo operator when you say 1%5 = 1 because you have 1 = 0*5 + 1 which means that 1 has zero number of 5 and the reminder is 1

Running time of nested while loops

Function f(n)
s = 0
i = 1
while i < 7n^1/2 do
j = i
while j > 5 do
s = s + i -j
j = j -2
end
i = 5i
end
return s
end f
I am trying to solve the running time for big theta with the code above. I have been looking all over the place for something to help me with an example, but everything is for loops or only one while loop. How would you go about this problem with nested while loops?
Let's break this down into two key points:
i starts from 1, and is self-multiplied by 5, until it is greater than or equal to 7 sqrt(n). This is an exponential increase with logarithmic number of steps. Thus we can change the code to the following equivalent:
m = floor(log(5, 7n^(1/2)))
k = 0
while k < m do
j = 5^k
// ... inner loop ...
end
For each iteration of the outer loop, j starts from i, and decreases in steps of 2, until it is less than or equal to 5. Note that in the first execution of the outer loop i = 1, and in the second i = 5, so the inner loop is not executed until the third iteration. The loop limit means that the final value of j is 7 if k is odd, and 6 if even (you can check this with pen and paper).
Combining the above steps, we arrive at:
First loop will do 7 * sqrt(n) iterations. Exponent 1/2 is the same as sqrt() of a number.
Second loop will run m - 2 times since first two values of i are 1 and 5 respectively, not passing the comparison.
i is getting an increment of 5i.
Take an example where n = 16:
i = 1, n = 16;
while( i < 7 * 4; i *= 5 )
//Do something
First value of i = 1. It runs 1 time. Inside loop will run 0 times.
Second value of i = 5. It runs 2 times. Inside loop will run 0 times.
Third value of i = 25. It runs 3 times. Inside loop will run 10 times.
Fourth value of i = 125. It stops.
Outer iterations are n iterations while inner iterations are m iterations, which gives O( 7sqrt(n) * (m - 2) )
IMO, is complex.

Shuffle data in a repeatable way (ability to get the same "random" order again)

This is the opposite of what most "random order" questions are about.
I want to select data from a database in random order. But I want to be able to repeat certain selects, getting the same order again.
Current (random) select:
SELECT custId, rand() as random from
(
SELECT DISTINCT custId FROM dummy
)
Using this, every key/row gets a random number. Ordering those ascending results in a random order.
But I want to repeat this select, getting the very same order again. My idea is to calculate a random number (r) once per session (e.g. "4") and use this number to shuffle the data in some way.
My first idea:
SELECT custId, custId * 4 as random from
(
SELECT DISTINCT custId FROM dummy
)
(in real life "4" would be something like 4005226664240702)
This results in a different number for each line but the same ones every run. By changing "r" to 5 all numbers will change.
The problem is: multiplication is not sufficient here. It just increases the numbers but keeps the order the same. Therefore I need some other kind of arithmetic function.
More abstract
Starting with my data (A-D). k is the key and r is the random number currently used:
k r
A = 1 4
B = 2 4
C = 3 4
D = 4 4
Doing some calculation using k and r in every line I want to get something like:
k r
A = 1 4 --> 12
B = 2 4 --> 13
C = 3 4 --> 11
D = 4 4 --> 10
The numbers can be whatever they want, but when I order them ascending I want to get a different order than the initial one. In this case D, C, A, B, E.
Setting r to 7 should result in a different order (C, A, B, D):
k r
A = 1 7 --> 56
B = 2 7 --> 78
C = 3 7 --> 23
D = 4 7 --> 80
Every time I use r = 7 should result in the same numbers => same order.
I'm looking for a mathematical function to do the calculation with k and r. Seeding the RAND() function is not suitable because it's not supported by some databases we support
Please note that r is already a randomly generated number
Background
One Table - Two data consumers. One consumer will get random 5% of the table, the other one the other 95%. They don't just get the data but a generated SQL. So there are two SQL's which must not select the same data twice but still random.
You could try and implement the Multiply-With-Carry PseudoRandomNumberGenerator. The C version goes like this (source: Wikipedia):
m_w = <choose-initializer>; /* must not be zero, nor 0x464fffff */
m_z = <choose-initializer>; /* must not be zero, nor 0x9068ffff */
uint get_random()
{
m_z = 36969 * (m_z & 65535) + (m_z >> 16);
m_w = 18000 * (m_w & 65535) + (m_w >> 16);
return (m_z << 16) + m_w; /* 32-bit result */
}
In SQL, you could create a table Random, with two columns to contain w and z, and one ID column to identify each session. Perhaps your vendor supports variables and you need not bother with the table.
Nonetheless, even if we use a table, we immediately run into trouble cause ANSI SQL doesn't support unsigned INTs. In SQL Server I could switch to BIGINT, unsure if your vendor supports that.
CREATE TABLE Random (ID INT, [w] BIGINT, [z] BIGINT)
Initialize a new session, say number 3, by inserting 1 into z and the seed into w:
INSERT INTO Random (ID, w, z) VALUES (3, 8921, 1);
Then each time you wish to generate a new random number, do the computations:
UPDATE Random
SET
z = (36969 * (z % 65536) + z / 65536) % 4294967296,
w = (18000 * (w % 65536) + w / 65536) % 4294967296
WHERE ID = 3
(Note how I have replaced bitwise operands with div and mod operations and how, after computing, you need to mod 4294967296 to stay within the proper 32 bits unsigned int range.)
And select the new value:
SELECT(z * 65536 + w) % 4294967296
FROM Random
WHERE ID = 3
SQLFiddle demo
Not sure if this applies in non-SQL Server, but typically when you use a RAND() function, you can specify a seed. Everytime you specify the same seed, the randomization will be the same.
So, it sounds like you just need to store the seed number and use that each time to get the same set of random numbers.
MSDN Article on RAND
Each vendor has solved this in its own way. Creating your own implementation will be hard, since random number generation is difficult.
Oracle
dbms_random can be initialized with a seed: http://docs.oracle.com/cd/B19306_01/appdev.102/b14258/d_random.htm#i998255
SQL Server
First call to RAND() can provide a seed: http://technet.microsoft.com/en-us/library/ms177610.aspx
MySql
First call to RAND() can provide a seed: http://dev.mysql.com/doc/refman/4.1/en/mathematical-functions.html#function_rand
Postgresql
Use SET SEED or SELECT setseed() : http://www.postgresql.org/docs/8.3/static/sql-set.html