VB.NET Array Intersection - vb.net

This could be terribly trivial, but I'm having trouble finding an answer that executes in less than n^2 time. Let's say I have two string arrays and I want to know which strings exist in both arrays. How would I do that, efficiently, in VB.NET or is there a way to do this other than a double loop?

The simple way (assuming no .NET 3.5) is to dump the strings from one array in a hashtable, and then loop through the other array checking against the hashtable. That should be much faster than an n^2 search.

If you sort both arrays, you can then walk through them each once to find all the matching strings.
Pseudo-code:
while(index1 < list1.Length && index2 < list2.Length)
{
if(list1[index1] == list2[index2])
{
// You've found a match
index1++;
index2++;
} else if(list1[index1] < list2[index2]) {
index1++;
} else {
index2++;
}
}
Then you've reduced it to the time it takes to do the sorting.

If one of the arrays is sorted you can do a binary search on it in the inner loop, this will decrease the time to O(n log n)

Sort both lists. Then you can know with certainty that if the next entry in list A is 'cobble' and the next entry in list B is 'definite', then 'cobble' is not in list B. Simply advance the pointer/counter on whichever list has the lower ranked result and ascend the rankings.
For example:
List 1: D,B,M,A,I
List 2: I,A,P,N,D,G
sorted:
List 1: A,B,D,I,M
List 2: A,D,G,I,N,P
A vs A --> match, store A, advance both
B vs D --> B
D vs D --> match, store D, advance both
I vs G --> I>G, advance 2
I vs I --> match, store I, advance both
M vs N --> M
List 1 has no more items, quit.
List of matches is A,D,I
2 list sorts O(n log(n)), plus O(n) comparisons makes this O(n(log(n) + 1)).

Related

How can I improve Insertionsort by the following argument ? The correct answer is b. Can someone CLEARLY explain every answer?

A person claims that they can improve InsertionSort by the following argument. In the innermost loop of InsertionSort, instead of looping over all entries in the already sorted array in order to insert the j’th observed element, simply perform BinarySearch in order to sandwich the j’th element in its correct position in the list A[1, ... , j−1]. This person claims that their resulting insertion sort is asymptotically as good as mergesort in the worst case scenario. True or False and why? Circle the one correct answer from the below:
a. True: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards across the median elements and so this shift will still require log(n) in the worst case scenario. Adding up, Insertion Sort will significantly improve in this case to continue to require n log(n) in the worst case scenario like mergesort.
b. False: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require n in the worst case scenario. Adding up, Insertion Sort will continue to require n² in the worst case scenario which is orders of magnitude worse than mergesort.
c. False: In this version, the while loop will iterate n, but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require log(n) in the worst case scenario. Adding up, Insertion Sort will continue to require n log(n) in the worst case scenario which is orders of magnitude worse than mergesort.
d. True: In this version, the while loop will iterate log(n), but in each such iteration elements in the left side of the list have to be shifted to allow room for the key to propagate downwards and so this shift will still require n in the worst case scenario. Adding up, Insertion Sort will continue to require n log(n) in the worst case scenario which is orders of magnitude worse than mergesort.
b is correct, with some assumptions about compiler optimizations.
Consider a reverse sorted array,
8 7 6 5 4 3 2 1
and that insertion sort is half done so it is
5 6 7 8 4 3 2 1
The next step:
normal insertion sort sequence assuming most recent value read kept in register:
t = a[4] = 4 1 read
compare t and a[3] 1 read
a[4] = a[3] = 8 1 write
compare t and a[2] 1 read
a[3] = a[2] = 7 1 write
compare t and a[1] 1 read
a[2] = a[1] = 6 1 write
compare t and a[0] 1 read
a[1] = a[0] = 5 1 write
a[0] = t = 4 1 write
---------------
5 read 5 write
binary search
t = a[4] 1 read
compare t and a[1] 1 read
compare t and a[0] 1 read
a[4] = a[3] 1 read 1 write
a[3] = a[2] 1 read 1 write
a[2] = a[1] 1 read 1 write
a[1] = a[0] 1 read 1 write
a[0] = t 1 write
----------------
7 read 5 write
If a compiler re-read data with normal insertion sort it would be
9 read 5 write
In which case the binary search would save some time.
The expected answer to this question is b), but the explanation is not precise enough:
locating the position where to insert the j-th element indeed requires log(j) comparisons instead of j comparisons for regular Insertion Sort.
inserting the elements requires j element moves in the worst case for both implementations (reverse sorted array).
Summing these over the whole array produces:
n log(n) comparisons for this modified Insertion Sort idea in all cases vs: n2 comparisons in the worst case (already sorted array) for the classic implementation.
n2 element moves in the worst case in both implementations (reverse sorted array).
note that in the classic implementation the sum of the number of comparisons and element moves is constant.
Merge Sort on the other hand uses approximately n log(n) comparisons and n log(n) element moves in all cases.
Therefore the claim the resulting insertion sort is asymptotically as good as mergesort in the worst case scenario is False, indeed because the modified Insertion Sort method still performs n2 element moves in the worst case, which is asymptotically much worse than n log(n) moves.
Note however that depending on the relative cost of comparisons and element moves, the performance of this modified Insertion Sort approach may be much better than the classic implementation, for example sorting an array of string pointers containing URLs to the same site, the cost of comparing strings with a long initial substring is much greater than moving a single pointer.

Ranking Big O Functions By Complexity

I am trying to rank these functions — 2n, n100, (n + 1)2, n·lg(n), 100n, n!, lg(n), and n99 + n98 — so that each function is the big-O of the next function, but I do not know a method of determining if one function is the big-O of another. I'd really appreciate if someone could explain how I would go about doing this.
Assuming you have some programming background. Say you have below code:
void SomeMethod(int x)
{
for(int i = 0; i< x; i++)
{
// Do Some Work
}
}
Notice that the loop runs for x iterations. Generalizing, we say that you will get the solution after N iterations (where N will be the value of x ex: number of items in array/input etc).
so This type of implementation/algorithm is said to have Time Complexity of Order of N written as O(n)
Similarly, a Nested For (2 Loops) is O(n-squared) => O(n^2)
If you have Binary decisions made and you reduce possibilities into halves and pick only one half for solution. Then complexity is O(log n)
Found this link to be interesting.
For: Himanshu
While the Link explains how log(base2)N complexity comes into picture very well, Lets me put the same in my words.
Suppose you have a Pre-Sorted List like:
1,2,3,4,5,6,7,8,9,10
Now, you have been asked to Find whether 10 exists in the list. The first solution that comes to mind is Loop through the list and Find it. Which means O(n). Can it be made better?
Approach 1:
As we know that List of already sorted in ascending order So:
Break list at center (say at 5).
Compare the value of Center (5) with the Search Value (10).
If Center Value == Search Value => Item Found
If Center < Search Value => Do above steps for Right Half of the List
If Center > Search Value => Do above steps for Left Half of the List
For this simple example we will find 10 after doing 3 or 4 breaks (at: 5 then 8 then 9) (depending on how you implement)
That means For N = 10 Items - Search time was 3 (or 4). Putting some mathematics over here;
2^3 + 2 = 10 for simplicity sake lets say
2^3 = 10 (nearly equals --- this is just to do simple Logarithms base 2)
This can be re-written as:
Log-Base-2 10 = 3 (again nearly)
We know 10 was number of items & 3 was the number of breaks/lookup we had to do to find item. It Becomes
log N = K
That is the Complexity of the alogorithm above. O(log N)
Generally when a loop is nested we multiply the values as O(outerloop max value * innerloop max value) n so on. egfor (i to n){ for(j to k){}} here meaning if youll say for i=1 j=1 to k i.e. 1 * k next i=2,j=1 to k so i.e. the O(max(i)*max(j)) implies O(n*k).. Further, if you want to find order you need to recall basic operations with logarithmic usage like O(n+n(addition)) <O(n*n(multiplication)) for log it minimizes the value in it saying O(log n) <O(n) <O(n+n(addition)) <O(n*n(multiplication)) and so on. By this way you can acheive with other functions as well.
Approach should be better first generalised the equation for calculating time complexity. liken! =n*(n-1)*(n-2)*..n-(n-1)so somewhere O(nk) would be generalised formated worst case complexity like this way you can compare if k=2 then O(nk) =O(n*n)

Trade off between Linear and Binary Search

I have a list of elements to be searched in a dataset of variable lengths. I have tried binary search and I found it is not always efficient when the objective is to search a list of elements.
I did the following study and conclude that if the number of elements to be searched is less than 5% of the data, binary search is efficient, other wise the Linear search is better.
Below are the details
Number of elements : 100000
Number of elements to be searched: 5000
Number of Iterations (Binary Search) =
log2 (N) x SearchCount=log2 (100000) x 5000=83048
Further increase in the number of search elements lead to more iterations than the linear search.
Any thoughts on this?
I am calling the below function only if the number elements to be searched is less than 5%.
private int SearchIndex(ref List<long> entitylist, ref long[] DataList, int i, int len, ref int listcount)
{
int Start = i;
int End = len-1;
int mid;
while (Start <= End)
{
mid = (Start + End) / 2;
long target = DataList[mid];
if (target == entitylist[listcount])
{
i = mid;
listcount++;
return i;
}
else
{
if (target < entitylist[listcount])
{
Start = mid + 1;
}
if (target > entitylist[listcount])
{
End = mid - 1;
}
}
}
listcount++;
return -1; //if the element in the list is not in the dataset
}
In the code I retun the index rather than the value because, I need to work with Index in the calling function. If i=-1, the calling function resets the value to the previous i and calls the function again with a new element to search.
In your problem you are looking for M values in an N long array, N > M, but M can be quite large.
Usually this can be approached as M independent binary searches (or even with the slight optimization of using the previous result as a starting point): you are going to O(M*log(N)).
However, using the fact that also the M values are sorted, you can find all of them in one pass, with linear search. In this case you are going to have your problem O(N). In fact this is better than O(M*log(N)) for M large.
But you have a third option: since M values are sorted, binary split M too, and every time you find it, you can limit the subsequent searches in the ranges on the left and on the right of the found index.
The first look-up is on all the N values, the second two on (average) N/2, than 4 on N/4 data,.... I think that this scale as O(log(M)*log(N)). Not sure of it, comments welcome!
However here is a test code - I have slightly modified your code, but without altering its functionality.
In case you have M=100000 and N=1000000, the "M binary search approach" takes about 1.8M iterations, that's more that the 1M needed to scan linearly the N values. But with what I suggest it takes just 272K iterations.
Even in case the M values are very "collapsed" (eg, they are consecutive), and the linear search is in the best condition (100K iterations would be enough to get all of them, see the comments in the code), the algorithm performs very well.

Find all pairs of consecutive numbers in BST

I need to write a code that will find all pairs of consecutive numbers in BST.
For example: let's take the BST T with key 9, T.left.key = 8, T.right.key = 19. There is only one pair - (8, 9).
The naive solution that I thought about is to do any traversal (pre, in, post) on the BST and for each node to find its successor and predecessor, and if one or two of them are consecutive to the node - we'll print them. But the problem is that it'll will the O(n^2), because we have n nodes and for each one of them we use function that takes O(h), that in the worst case h ~ n.
Second solution is to copy all the elements to an array, and to find the consecutive numbers in the array. Here we use O(n) additional space, but the runtime is better - O(n).
Can you help me to find an efficient algorithm to do it? I'm trying to think about algorithm that don't use additional space, and its runtime is better than O(n^2)
*The required output is the number of those pairs (No need to print the pairs).
*any 2 consecutive integers in the BST is a pair.
*The BST containts only integers.
Thank you!
Why don't you just do an inorder traversal and count pairs on the fly? You'll need a global variable to keep track of the last number, and you'll need to initialize it to something which is not one less than the first number (e.g. the root of the tree). I mean:
// Last item
int last;
// Recursive function for in-order traversal
int countPairs (whichever_type treeRoot)
{
int r = 0; // Return value
if (treeRoot.leftChild != null)
r = r + countPairs (treeRoot.leftChild);
if (treeRoot.value == last + 1)
r = r + 1;
last = treeRoot.value;
if (treeRoot.rightChild != null)
r = r + countPairs (treeRoot.rightChild);
return r; // Edit 2016-03-02: This line was missing
}
// Main function
main (whichever_type treeRoot)
{
int r;
if (treeRoot == null)
r = 0;
else
{
last = treeRoot.value; // to make sure this is not one less than the lowest element
r = countPairs (treeRoot);
}
// Done. Now the variable r contains the result
}

How to write an objective C code to check if the sum of any three numbers in an array/list matches a given number? [duplicate]

This question already has answers here:
Closed 10 years ago.
Possible Duplicate:
Finding three elements in an array whose sum is closest to an given number
How can I write an Objective C code to check if the sum of any three numbers in an array/list matches a given number?
step 1: sort, O(nlgn)
step 2: iterate every number, say A,(this costs O(n)), then check whether the sum of any two numbers equals to the given number minus A(this is a classic problem which costs O(n))
total complexity: O(n^2)
Here is another way
X,Y,Z are indices of array and P is given Number .
If conditions is X+Y=P
then we sort the array
and then We pick each element and then search P-Y in remaining array .If searching is successful then fine are else return False .
So searching takes log(n) time(binary search) so for n elements it takes O(nlog(n)) time .
Now Our condition is X+Y+Z=P
We deduce it to X+Y=P-Z
Now Pick a number Z and calculate P-Z and let it be R .
Now the problem is deduce to X+Y=R .So time complexity is O(nlog(n))
Since R varies n times for n picks in array so complexity is O((N^2)log(n))) .
Here's a brute-force solution in Python, valuable only for its succinctness, not at all for its efficiency:
import itertools
def anyThreeEqualTo(list, value):
return any([sum(c) == value for c in itertools.combinations(list, 3)])
Another idea:
import itertools
def anyThreeEqualTo(list, value):
for c in itertools.combinations(list, 3)])
if sum(c) == value:
return True
return False
These solutions try each of the triplets in turn until one is found with the desired sum.