Understanding the DFS portion of Number of Islands in Kotlin

Here's the question:
"Given an m x n 2D binary grid grid which represents a map of '1's (land) and '0's (water), return the number of islands.
An island is surrounded by water and is formed by connecting adjacent lands horizontally or vertically. You may assume all four edges of the grid are all surrounded by water.
Example 1:
Input: grid = [
["1","1","1","1","0"],
["1","1","0","1","0"],
["1","1","0","0","0"],
["0","0","0","0","0"]
]
Output: 1
Example 2:
Input: grid = [
["1","1","0","0","0"],
["1","1","0","0","0"],
["0","0","1","0","0"],
["0","0","0","1","1"]
]
Output: 3 "
So this was an answer I saw for the question "Number of Islands". I get most of the code except for the last part of it (the directional recursion part)
fun numIslands(grid: Array<CharArray>): Int {
    var count = 0
    for (i in grid.indices) {
        for (j in grid[0].indices) {
            if (grid[i][j] == '1') {
                dfs(grid, i, j)
                count++
            }
        }
    }
    return count
}

private fun dfs(grid: Array<CharArray>, i: Int, j: Int) {
    // Stop at the edges of the grid, or when the cell is water
    if (i < 0 || j < 0 || i >= grid.size || j >= grid[0].size || grid[i][j] == '0') {
        return
    }
    // Directional recursion: sink this land cell, then visit its four neighbours
    grid[i][j] = '0'
    dfs(grid, i + 1, j) // down
    dfs(grid, i, j + 1) // right
    dfs(grid, i - 1, j) // up
    dfs(grid, i, j - 1) // left
}
My question is what is going on in that part? Is the recursive call accounting for all sides of the 2D array surrounding the given char? Or is it something else entirely? Any help is appreciated. Thank You.

The nested loops in numIslands() identify one cell from each island, and then call dfs() to remove that whole island from the grid. (So that the next land cell it finds will be a different island, and it's counting whole islands.)
dfs() works by setting the given cell to 0 (water), and then recursively looking at the four adjacent cells. (Which then go on to remove their adjacent cells, and so on, sweeping outward until it has removed the entire island.) The clever bit there is that the if stops it when it hits water (or the edge of the grid) — which means that although it will revisit cells it has already visited, by that point they'll be water, and so it'll ignore them and not keep going over the same cells endlessly.
(Whenever you write recursive code, you always need to be thinking about how it terminates — if not, there's a real risk that it won't! In this case, recursion only happens for cells that were land, and only after setting them to water. Since the number of land cells is always reducing, and since it can't go below zero, it's guaranteed to terminate after a finite number of steps.)
dfs() is not a very helpful name! If you were writing this code, I'd suggest renaming it to something more meaningful, such as removeIslandAt(). (I'd also suggest making it an extension function on Array<CharArray>.)
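For instance, a sketch of that suggestion (same logic as the original, just renamed and turned into an extension):

// Sketch: dfs() renamed and made an extension function on Array<CharArray>
private fun Array<CharArray>.removeIslandAt(i: Int, j: Int) {
    if (i < 0 || j < 0 || i >= size || j >= this[0].size || this[i][j] == '0') {
        return
    }
    this[i][j] = '0'
    removeIslandAt(i + 1, j)
    removeIslandAt(i, j + 1)
    removeIslandAt(i - 1, j)
    removeIslandAt(i, j - 1)
}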
To understand code like this, I always find it useful to be able to see what's going on. You might write a little function to display the current state of the grid; then you could call that at the start of dfs() (along with displaying the i and j co-ordinates). That should make it much clearer how it works. (If you really wanted, you could make an animation out of it, which would be clearer still!)
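Something like this minimal sketch would do (printGrid is a hypothetical helper name):

// Sketch: print the current co-ordinates and the grid state, e.g. at the start of dfs()
private fun printGrid(grid: Array<CharArray>, i: Int, j: Int) {
    println("dfs($i, $j):")
    for (row in grid) {
        println(String(row))
    }
    println()
}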

Related

Ranking Big O Functions By Complexity

I am trying to rank these functions — 2^n, n^100, (n + 1)^2, n·lg(n), 100n, n!, lg(n), and n^99 + n^98 — so that each function is the big-O of the next function, but I do not know a method of determining if one function is the big-O of another. I'd really appreciate it if someone could explain how I would go about doing this.
Assuming you have some programming background, say you have the code below:
void SomeMethod(int x)
{
    for (int i = 0; i < x; i++)
    {
        // Do Some Work
    }
}
Notice that the loop runs for x iterations. Generalizing, we say that you will get the solution after N iterations (where N is the value of x, e.g. the number of items in the array/input).
So this type of implementation/algorithm is said to have a time complexity of order N, written as O(n).
Similarly, a nested for (2 loops) is O(n squared) => O(n^2).
If you make binary decisions, reducing the possibilities into halves and picking only one half for the solution, then the complexity is O(log n).
Found this link to be interesting.
For: Himanshu
While the link explains very well how the log2(N) complexity comes into the picture, let me put the same in my own words.
Suppose you have a pre-sorted list like:
1, 2, 3, 4, 5, 6, 7, 8, 9, 10
Now you are asked to find whether 10 exists in the list. The first solution that comes to mind is to loop through the list and find it, which means O(n). Can it be made better?
Approach 1:
Since we know the list is already sorted in ascending order:
Break the list at the center (say at 5).
Compare the value at the center (5) with the search value (10).
If center value == search value => item found.
If center value < search value => repeat the above steps for the right half of the list.
If center value > search value => repeat the above steps for the left half of the list.
For this simple example we will find 10 after 3 or 4 breaks (at 5, then 8, then 9), depending on how you implement it.
That means that for N = 10 items, the search took 3 (or 4) lookups. Putting some mathematics on it:
2^3 + 2 = 10; for simplicity's sake let's say
2^3 = 10 (nearly equal; this is just to keep the base-2 logarithm simple)
This can be rewritten as:
log2(10) = 3 (again, nearly)
We know 10 was the number of items and 3 was the number of breaks/lookups we had to do to find the item. It becomes
log N = K
That is the complexity of the algorithm above: O(log N).
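To make that concrete, here is a minimal Kotlin sketch of the binary search just described (the lookup counter is only there for illustration):

// Sketch: binary search over a sorted IntArray, counting lookups
fun binarySearch(sorted: IntArray, target: Int): Int {
    var low = 0
    var high = sorted.size - 1
    var lookups = 0
    while (low <= high) {
        val mid = (low + high) / 2
        lookups++
        when {
            sorted[mid] == target -> { println("found after $lookups lookups"); return mid }
            sorted[mid] < target -> low = mid + 1
            else -> high = mid - 1
        }
    }
    return -1 // not found; lookups is at most about log2(n)
}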
Generally, when a loop is nested, we multiply the values: O(outer loop max value * inner loop max value), and so on. E.g. for (i to n) { for (j to k) { } }: for i = 1, j runs from 1 to k (that is 1 * k steps); for i = 2, j again runs from 1 to k; so the total is O(max(i) * max(j)), which implies O(n * k). Further, if you want to find the order, recall the basic relationships: O(n + n) (addition) < O(n * n) (multiplication), and taking a logarithm minimizes the value inside it, so O(log n) < O(n) < O(n + n) < O(n * n), and so on. This way you can rank the other functions as well.
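As a quick sketch of the nested-loop case (n and k are arbitrary bounds; the counter just tallies iterations):

// Sketch: nested loops run n * k times in total => O(n * k)
fun nestedWork(n: Int, k: Int): Int {
    var steps = 0
    for (i in 0 until n) {
        for (j in 0 until k) {
            steps++ // "do some work"
        }
    }
    return steps // equals n * k
}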
A better approach is to first generalize the equation for calculating the time complexity. Like n! = n * (n - 1) * (n - 2) * ... * (n - (n - 1)), so something like O(n^k) would be the generalized worst-case form; this way you can compare, e.g. if k = 2 then O(n^k) = O(n * n).

Trade off between Linear and Binary Search

I have a list of elements to be searched in a dataset of variable lengths. I have tried binary search and I found it is not always efficient when the objective is to search a list of elements.
I did the following study and concluded that if the number of elements to be searched is less than 5% of the data, binary search is efficient; otherwise linear search is better.
Below are the details
Number of elements : 100000
Number of elements to be searched: 5000
Number of iterations (binary search) =
log2(N) × SearchCount = log2(100000) × 5000 ≈ 83,048
A further increase in the number of search elements leads to more iterations than the linear search.
Any thoughts on this?
I am calling the below function only if the number of elements to be searched is less than 5%.
private int SearchIndex(ref List<long> entitylist, ref long[] DataList, int i, int len, ref int listcount)
{
    int Start = i;
    int End = len - 1;
    int mid;
    while (Start <= End)
    {
        mid = (Start + End) / 2;
        long target = DataList[mid];
        if (target == entitylist[listcount])
        {
            i = mid;
            listcount++;
            return i;
        }
        else
        {
            if (target < entitylist[listcount])
            {
                Start = mid + 1;
            }
            if (target > entitylist[listcount])
            {
                End = mid - 1;
            }
        }
    }
    listcount++;
    return -1; // if the element in the list is not in the dataset
}
In the code I return the index rather than the value because I need to work with the index in the calling function. If i = -1, the calling function resets the value to the previous i and calls the function again with a new element to search.
In your problem you are looking for M values in an N-long array, N > M, but M can be quite large.
Usually this is approached as M independent binary searches (or even with the slight optimization of using the previous result as a starting point): you are going to get O(M * log(N)).
However, using the fact that the M values are also sorted, you can find all of them in one pass, with a linear (merge-like) search. In this case your problem is O(N). In fact this is better than O(M * log(N)) for large M.
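A minimal Kotlin sketch of that one-pass idea (assuming both arrays are sorted in ascending order; the names are my own illustration):

// Sketch: walk both sorted arrays once, like the merge step of merge sort
fun findAllSorted(data: LongArray, queries: LongArray): IntArray {
    val result = IntArray(queries.size) { -1 } // -1 = not found
    var d = 0
    for (q in queries.indices) {
        while (d < data.size && data[d] < queries[q]) d++
        if (d < data.size && data[d] == queries[q]) result[q] = d
    }
    return result // total work is O(N + M)
}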
But you have a third option: since the M values are sorted, binary-split M too, and every time you find a value, you can limit the subsequent searches to the ranges on the left and on the right of the found index.
The first lookup is over all N values, the next two over (on average) N/2, then 4 over N/4 values, ... I think this scales as O(log(M) * log(N)). Not sure of it, comments welcome!
However, here is some test code - I have slightly modified your code, but without altering its functionality.
In case you have M = 100000 and N = 1000000, the "M binary searches" approach takes about 1.8M iterations, which is more than the 1M needed to scan the N values linearly. But with what I suggest it takes just 272K iterations.
Even when the M values are very "collapsed" (e.g. they are consecutive) and the linear search is in its best condition (100K iterations would be enough to get all of them, see the comments in the code), the algorithm performs very well.
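A rough Kotlin sketch of that third option (my own illustration of the idea; the answer's actual test code is not shown here):

// Sketch: search for the middle query first, then recurse into the left and
// right halves of the queries, restricting each to its half of the data.
fun splitSearch(data: LongArray, queries: LongArray,
                qLo: Int, qHi: Int, dLo: Int, dHi: Int) {
    if (qLo > qHi || dLo > dHi) return
    val qMid = (qLo + qHi) / 2
    // Plain binary search for queries[qMid] within data[dLo..dHi]
    var lo = dLo
    var hi = dHi
    var found = -1
    while (lo <= hi && found < 0) {
        val mid = (lo + hi) / 2
        when {
            data[mid] == queries[qMid] -> found = mid
            data[mid] < queries[qMid] -> lo = mid + 1
            else -> hi = mid - 1
        }
    }
    // found >= 0 means queries[qMid] sits at data[found]; record it as needed.
    val leftEnd = if (found >= 0) found - 1 else lo - 1
    val rightStart = if (found >= 0) found + 1 else lo
    splitSearch(data, queries, qLo, qMid - 1, dLo, leftEnd)
    splitSearch(data, queries, qMid + 1, qHi, rightStart, dHi)
}

An initial call would be splitSearch(data, queries, 0, queries.size - 1, 0, data.size - 1).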

Find all pairs of consecutive numbers in BST

I need to write a code that will find all pairs of consecutive numbers in BST.
For example: let's take the BST T with key 9, T.left.key = 8, T.right.key = 19. There is only one pair - (8, 9).
The naive solution that I thought about is to do any traversal (pre, in, post) on the BST and for each node find its successor and predecessor, and if one or both of them are consecutive to the node, print them. But the problem is that it will be O(n^2), because we have n nodes and for each one of them we use a function that takes O(h), where in the worst case h ~ n.
Second solution is to copy all the elements to an array, and to find the consecutive numbers in the array. Here we use O(n) additional space, but the runtime is better - O(n).
Can you help me find an efficient algorithm to do it? I'm trying to think of an algorithm that doesn't use additional space and whose runtime is better than O(n^2).
*The required output is the number of those pairs (No need to print the pairs).
*any 2 consecutive integers in the BST is a pair.
*The BST contains only integers.
Thank you!
Why don't you just do an inorder traversal and count pairs on the fly? You'll need a global variable to keep track of the last number, and you'll need to initialize it to something which is not one less than the first number (e.g. the root of the tree). I mean:
// Last item
int last;

// Recursive function for in-order traversal
int countPairs (whichever_type treeRoot)
{
    int r = 0; // Return value
    if (treeRoot.leftChild != null)
        r = r + countPairs (treeRoot.leftChild);
    if (treeRoot.value == last + 1)
        r = r + 1;
    last = treeRoot.value;
    if (treeRoot.rightChild != null)
        r = r + countPairs (treeRoot.rightChild);
    return r; // Edit 2016-03-02: This line was missing
}

// Main function
main (whichever_type treeRoot)
{
    int r;
    if (treeRoot == null)
        r = 0;
    else
    {
        last = treeRoot.value; // to make sure this is not one less than the lowest element
        r = countPairs (treeRoot);
    }
    // Done. Now the variable r contains the result
}

Please explain this code for Merkle–Hellman knapsack cryptosystem?

This is the code snippet from a program that implements Merkle–Hellman knapsack cryptosystem.
// Generates keys based on input data size
private void generateKeys(int inputSize)
{
    // Generating values for w
    // This first value of the private key (w) is set to 1
    w.addNode(new BigInteger("1"));
    for (int i = 1; i < inputSize; i++)
    {
        w.addNode(nextSuperIncreasingNumber(w));
    }
    // Generate value for q
    q = nextSuperIncreasingNumber(w);
    // Generate value for r
    Random random = new Random();
    // Generate a value of r such that r and q are coprime
    do
    {
        r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
    }
    while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
    // Generate b such that b = w * r mod q
    for (int i = 0; i < inputSize; i++)
    {
        b.addNode(w.get(i).getData().multiply(r).mod(q));
    }
}
Just tell me what is going on in the following lines:
do
{
    r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
}
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
(1) Why is random number generated with upper bound 1000?
(2) Why is it subtracted from q?
The code is searching for a value that is co-prime with the already selected value q. In my opinion, it's doing so rather poorly, but you mention it's a simulator? I'm not sure what that means, but maybe it just means the code is quick and dirty rather than slow and secure.
Answering your questions directly:
Why is random number generated with upper bound 1000?
The Merkle-Hellman algorithm does indicate that r should be 'random'. The implementation here is pretty haphazard; that might be what's thrown you off. The code is not technically an algorithm because the loop is not guaranteed to terminate: in theory, the pseudo-random candidate selection of r could produce an arbitrarily long sequence of numbers which aren't co-prime to q, resulting in an infinite loop.
The upper bound of 1000 could be to ensure that the chosen r is sufficiently large. In general, large keys are harder to break than small keys, so if q is large, then this code will only find large r.
A more deterministic way to get a random co-prime would be to test each number lower than q, generating a list of co-primes, and select one at random. This would probably be more secure, as an attacker knowing that q and r are within 1000 of each other has a significantly reduced search space.
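For illustration, a Kotlin sketch of that deterministic approach (randomCoprimeBelow is a made-up name, and a real implementation would need q small enough to enumerate):

import java.math.BigInteger
import kotlin.random.Random

// Sketch: list every value below q that is co-prime with q, then pick one at
// random. Always terminates, but only practical for small q.
fun randomCoprimeBelow(q: BigInteger): BigInteger {
    val coprimes = mutableListOf<BigInteger>()
    var candidate = BigInteger.ONE
    while (candidate < q) {
        if (q.gcd(candidate) == BigInteger.ONE) coprimes.add(candidate)
        candidate = candidate.add(BigInteger.ONE)
    }
    return coprimes[Random.nextInt(coprimes.size)] // assumes q > 1
}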
Why is it subtracted from q?
The subtraction is important because r must be less than q; the Merkle-Hellman algorithm specifies it that way. I'm not convinced that it needs to be that way: the public key is generated by multiplying each element in w by r and taking the modulus q. If r were very large, larger than q, it seems like it would further obfuscate q and each element in w.
The decryption step of Merkle-Hellman, on the other hand, depends on the modular inverse of each encrypted value, a * r^(-1) mod q. This operation might be hampered by having r > q; it seems like it could still work out.
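(As an aside, BigInteger provides that modular inverse directly; a small Kotlin sketch of undoing the r multiplication on a single term:)

import java.math.BigInteger

// Sketch: (w_i * r mod q) * r^(-1) mod q recovers w_i, for w_i < q
fun decryptTerm(term: BigInteger, r: BigInteger, q: BigInteger): BigInteger {
    val rInv = r.modInverse(q) // throws ArithmeticException if gcd(r, q) != 1
    return term.multiply(rInv).mod(q)
}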
However, if nextInt can return 0, that iteration of the loop is wasted, as q and r must be different (gcd(a, a) is just a).
Breaking down the code:
do
Try it at least once. r is probably null or undefined before the method is called.
r = q.subtract(new BigInteger(random.nextInt(1000) + ""));
Find a candidate value between q - 999 and q (since nextInt(1000) returns a value from 0 to 999).
while ((r.compareTo(new BigInteger("0")) > 0) && (q.gcd(r).intValue() != 1));
Keep going while the candidate r is:
Greater than 0: r.compareTo(new BigInteger("0")) > 0, and
Not yet co-prime with q: q.gcd(r).intValue() != 1. Obviously, a randomly selected number is not guaranteed to be co-prime with any other number, so the randomly generated candidate might not work for this q.
Does that clear it up? I have to admit that I'm not an expert on Merkle-Hellman.

Algorithm - find the minimal subtraction between sum of two arrays

I am job hunting now and doing many algorithm exercises. Here is my problem:
Given two arrays a and b of the same length, the goal is to make |sum(a) - sum(b)| minimal by swapping elements between a and b.
Here is my though:
Assume we swap a[i] and b[j]. Set Delt = sum(a) - sum(b) and x = a[i] - b[j].
Then Delt2 = sum(a) - a[i] + b[j] - (sum(b) - b[j] + a[i]) = Delt - 2*x,
and the change |Delt| - |Delt2| has the same sign as |Delt|^2 - |Delt2|^2 = 4*x*(Delt - x).
Based on the thought above I got the following code:
var delt = a.sum() - b.sum()
var done = false
while (!done) {
    done = true
    for (i in a.indices) {
        for (j in b.indices) {
            val x = a[i] - b[j]
            val change = x * (delt - x)
            if (change > 0) {
                a[i] = b[j].also { b[j] = a[i] } // swap a[i] and b[j]
                delt -= 2 * x
                done = false
            }
        }
    }
}
However, does anybody have a much better solution? If you do, please tell me; I would be very grateful!
This problem is basically the optimization version of the Partition Problem with an extra constraint of equal-size parts. I'll prove that adding this constraint doesn't make the problem easier.
NP-Hardness proof:
Assume there was an algorithm A that solves this problem in polynomial time; then we can solve the Partition Problem in polynomial time:
Partition(S):
    for i in range(|S|):
        S += {0}
        result <- A(S\2, S\2)      // arbitrarily split S into 2 parts
        if result is a partition:  // simple to check, since Partition is in NP
            return true
    return false  // no partition
Correctness:
If there is a partition, denote it (S1, S2) [assume S2 has more elements]; then on iteration |S2| - |S1| [i.e. when adding |S2| - |S1| zeros], the input to A will contain enough zeros so we can return two equal-length arrays, S2 and S1 + {0, 0, ..., 0}, which form a partition of S, and the algorithm will yield true.
If the algorithm yields true at iteration k, we had two arrays S2, S1 with the same number of elements and equal sums; by removing the k zeros from the arrays we get a partition of the original S, so S had a partition.
Polynomial:
Assume A takes P(n) time; then the algorithm we produced takes n * P(n) time, which is also polynomial.
Conclusion:
If this problem were solvable in polynomial time, so would the Partition Problem be, and thus P = NP. Based on this, this problem is NP-Hard.
Because this problem is NP-Hard, for an exact solution you will probably need an exponential algorithm. One such is simple backtracking [I leave it as an exercise to the reader to implement a backtracking solution].
EDIT: as mentioned by @jpalecek: by simply creating a reduction S -> S + (0, 0, ..., 0) [k times 0], one can prove NP-Hardness directly by reduction. Polynomiality is trivial, and correctness is very similar to the above partition correctness proof [if there is a partition, adding 'balancing' zeros is possible; the other direction is simply trimming those zeros].
Just a comment: through all this swapping you can basically arrange the contents of both arrays however you like, so it is unimportant which array the values are in at the start.
I can't work it out in my head, but I'm pretty sure there is a constructive solution. I think you'd sort all the values first and then deal them out according to some rule, something along the lines of: if value > 0 and sum(a) > sum(b), then insert into a, else into b.