Array search NP complete [closed]

Given an unsorted array of size n, it's obvious that finding whether an element exists in the array takes O(n) time.
If we let m = log n then it takes O(2^m) time.
Notice that if the array is sorted, a binary search takes only O(m) time (which is polynomial), but binary search cannot be applied to an unsorted array.
Is it possible to prove that the problem of finding an element in an array (yes or no) is NP-complete in terms of m? If so, what problem should I reduce from, and how?
Any idea would be appreciated.
EDIT:
My description above probably did not express clearly what I was trying to say.
Let's reword the problem in the following way.
We have an oracle, which is a binary tree of height h with each node holding an arbitrary value, i.e., a tree that does NOT have the search-tree property that all values in the left subtree of a node are smaller than the node's value and all values in the right subtree are greater. However, every node in the oracle tree is guaranteed to hold a value between 0 and 2^h - 1.
The input is a number to be searched. The input is guaranteed to have value between 0 and 2^h-1. (The input has h bits)
(Let's say we are searching through the same array every time and hence we have the same oracle every time so the tree is not a part of input.)
The output is YES or NO, indicating whether the input is in a node of the tree or not.
Question: is this problem NP-complete or not, in terms of h?
This problem is in NP because, if a path to a node containing the input value is given as a certificate, it can be verified in O(h) time (see the sketch below).
(Note that if the oracle tree has the property that all values in the left subtree of a node are smaller than the node's value and all values in the right subtree are greater, then the problem is NOT NP-complete, because binary search can be applied.)
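To make that O(h) verification step concrete, here is a minimal sketch; the Node class with left/right/value attributes is an assumption, not part of the problem statement:
# Hypothetical certificate check: path is a list of 'L'/'R' moves
# of length at most h, so the whole check runs in O(h) time.
def verify(root, path, target):
    node = root
    for move in path:
        if node is None:
            return False
        node = node.left if move == 'L' else node.right
    return node is not None and node.value == target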

Finding an element in an array is NOT NP-complete, since it can be done in linear time (assuming P ≠ NP, a problem solvable in polynomial time cannot be NP-complete).
In fact, the naive brute-force search algorithm you mentioned in your question is a linear time algorithm!
When we talk about the complexity of a computational problem, we always measure time with respect to the size of the input. You claimed the input size is m = log(n), but in our case the size of the input is determined by the number of elements in the array, which is n.
For your reference, testing whether a given number n is prime is an example of a computational problem that takes an input of size log(n). The input of the problem is n, and it is of size log(n) because we need log(n) bits to represent n in binary form.
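A quick sketch of that distinction in Python (the value 1000003 is just an arbitrary example):
n = 1000003
print(n.bit_length())   # 20: about log2(n) bits are needed to write n in binary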
Update
A deterministic search algorithm requires Ω(n) time on an unsorted array.
Any correct search algorithm must read the entire input (i.e., all n entries of the array). We can prove this by contradiction.
Suppose there is a search algorithm that does not read all n input entries; then there is some entry it never reads. We can then construct an instance in which the search item sits exactly at that unread entry, which violates the correctness of the algorithm. Hence no such algorithm exists.
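For concreteness, the naive linear scan referred to above, as a minimal sketch:
def contains(arr, target):
    # Reads up to all n entries; by the adversary argument above,
    # no correct deterministic algorithm can avoid this in the worst case.
    for value in arr:
        if value == target:
            return True
    return False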

Related

Time Complexity Analysis of Recursive Tree Traversal Algorithm

I am supposed to design a recursive algorithm that traverses a tree and sets the cardinality property for every node. The cardinality property is the number of nodes that are in the subtree where the currently traversed node is the root.
Here's my algorithm in pseudo/Python code:
def SetCardinality(node):
    if node is not None:
        # this node plus the sizes of its two subtrees
        node.card = 1 + SetCardinality(node.left_child) + SetCardinality(node.right_child)
        return node.card
    else:
        return 0
I'm having a hard time coming up with the recurrence relation that describes this function. I figured out that the worst-case input would be a degenerate tree of height n. I saw on the internet that a recurrence relation for such a tree in this algorithm might be:
T(n) = T(n-1) + n
but I don't know how the n in the relation corresponds to the algorithm.
You have to ask yourself: How many nodes does the algorithm visit? You will notice that if you run your algorithm on the root node, it will visit each node exactly once, which is expected as it is essentially a depth-first search.
Therefore, if the rest of your algorithm is constant-time operations, we have a time complexity of O(n) for the total number of nodes n.
Now, if you want to express it in terms of the height of the tree, you need to know more about the given tree. If it's a complete binary tree then the height h is O(log n) and therefore the time complexity would be O(2^h). But expressing it in terms of total nodes is simpler. Notice also that the shape of the tree does not really matter for your time complexity, as you will be visiting each node exactly once regardless.
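A minimal usage sketch (the Node class here is an assumption, matching the left_child/right_child names in the question's code):
class Node:
    def __init__(self, left_child=None, right_child=None):
        self.left_child = left_child
        self.right_child = right_child
        self.card = 0

# 5 nodes: a root, two children, and two grandchildren on the left
root = Node(Node(Node(), Node()), Node())
print(SetCardinality(root))   # 5 -- each node is visited exactly once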

Can I represent time-complexity as a summation (complexity of elements of different length)

Let's say I have to iterate over every character in an array of strings, in which every string has a different length, so arr[0].length != arr[1].length and so on. For example:
# prints every char in the whole array
for s in arr:
    for c in s:
        print(c)
How should the time complexity of an algorithm of this nature be expressed? As a summation of the lengths of all the elements in the array, or just as O(N*M), taking N as the number of elements and M as the maximum string length, which overbounds accordingly?
There is a precise mathematical theory called complexity theory which answers your question and many more. In complexity theory, we have what is called a Turing machine, which is a type of computer. The time complexity of a Turing machine performing a computation is defined as the function f on the natural numbers such that f(n) is the worst-case running time of the machine on inputs of length n. In your case, the machine just needs to copy its input somewhere else, which clearly has O(n) time complexity (n here is the combined length of your array). Since NM is greater than n, a Turing machine running the algorithm you described will not run longer than some constant times NM, but it may halt sooner due to irregularities in the lengths of the elements of the array.
If you are interested in learning about complexity theory, I recommend the book Introduction to the Theory of Computation by Michael Sipser, which explains these concepts from scratch.
There are many ways you could do this. Your bound of O(NM) is a conservative upper bound. You could also define a parameter L indicating the total length of all the strings and say that the runtime is Θ(N + L), which is essentially your sum idea made a bit cleaner by assigning a name to the summation. That’s a more precise bound that more clearly indicates where the work is being done.
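A quick sketch of that idea in Python (arr is the array of strings from the question):
L = sum(len(s) for s in arr)   # the summation, given a name
# The nested loop does Θ(N + L) work: N outer iterations plus
# exactly one print per character across all inner iterations.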

When analyzing the worst case time complexity of search algorithms using Big O notation, why is the variable representing the input nonexistent?

Thanks for your willingness to help.
Straight to the point, I'm confused with the use of Big O notation when analyzing the worst case time complexity of search algorithms.
For example, the worst-case time complexity of Alpha-Beta Pruning is O(b^d), where ^ means "to the power of", b represents the average branching factor, and d represents the depth of the search tree.
I do get that the worst-case time complexity would be less than or equal to a positive constant multiplied by b^d, but why is the use of big O notation permitted here? Where did the variable n, the input size, go? I do know that inputs of the same size can differ significantly in an algorithm's running time.
All of the research I've done only explains the use of big O notation in the analysis of worst-case time complexity in terms of the growth function, a function whose y variable is time complexity and whose x variable is input size. There are also formal definitions of big O notation, which make me even more confused about the question above.
Any attempts to answer my question would be greatly appreciated.
The input size you refer to here as n is, in this case, d. If n is the number of entries in your tree, d can be calculated as log2(n), assuming your tree is a balanced binary tree.
Big O notation implies that you are discussing what the runtime would be for a very large input. In the case you noted, O(b^d), the variable that changes with input size is d, so here d plays the role of your n. As you've found, some bounds make use of several variables.
n is just a general term for the number of elements, but the runtime can depend on many factors: the depth of a tree, or a different list entirely. For example, to traverse nested lists like this:
for n in firstList:
    for k in secondList:
        pass  # do stuff with n and k
the cost would be O(n*k), where n and k here stand for the lengths of firstList and secondList.

Finding Shortest Path using BFS search on a Undirected Graph, knowing the length of the SP

I was asked an interview question today and was not able to solve it at the time.
The question is to determine the minimum time complexity of finding the shortest path from node S to node T in a graph G where:
G is undirected and unweighted
The connection factor of G is given as B
The length of shortest path from S to T is given as K
The first thing I thought was that in the general case, BFS is the fastest way to get the SP from S to T, in O(V+E) time. Then how can we use B and K to reduce the time? I wasn't sure what a connection factor is, so I asked the interviewer, and he told me that on average a node has B edges to other nodes. So I was thinking that if K = 1, then the time complexity should be O(B). But wait, it is "on average", which means it could still be O(V+E), e.g. where the graph is like a star and all other nodes are connected to S.
If we assume that B is a strict upper limit, then the first round of BFS is O(B), the second is O(B*B), and so on, like a tree. Some of the nodes in a lower layer may already have been visited in a previous round and therefore should not be added. Still, the worst-case scenario is that the graph is huge and none of the nodes has been visited yet. The time complexity is then
O(B) + O(B^2) + O(B^3) ... O(B^K)
Using the formula for the sum of a geometric series, the sum is O(B(B^K - 1)/(B - 1)), which is O(B^K) for B > 1. But this SUM should not exceed O(V+E).
So, is the time complexity O(min(SUM, V+E))?
I have no idea how to correctly solve this problem. Any help is appreciated.
Your analysis seems correct. Please refer to the following references.
http://axon.cs.byu.edu/~martinez/classes/312/Slides/Paths.pdf
https://courses.engr.illinois.edu/cs473/sp2011/lectures/03_class.pdf
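For illustration, a minimal sketch of a BFS that stops expanding at depth K (the graph is assumed to be an adjacency-list dict; all names here are illustrative):
from collections import deque

def bfs_bounded(graph, s, t, k):
    # Explores at most B + B^2 + ... + B^k nodes when every node
    # has at most B neighbours, but never does more than O(V + E) work.
    visited = {s}
    frontier = deque([(s, 0)])
    while frontier:
        node, depth = frontier.popleft()
        if node == t:
            return depth
        if depth == k:
            continue    # the shortest path has length k, so stop expanding here
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.add(neighbor)
                frontier.append((neighbor, depth + 1))
    return None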

How to compute kolmogorov complexity of an algorithm?

Suppose that for various input strings an algorithm generates a binary string with the same number of 0's and 1's. The output for two different input strings may or may not be the same. Can we say anything about the space complexity of the algorithm?
The question isn't quite right.
Kolmogorov complexity K(x) doesn't apply to programs; it applies to a string x.
More specifically, the Kolmogorov complexity of a string x is the minimum length of a program needed to compute x.
It has been formally proven that the Kolmogorov complexity of a string cannot be computed. In practice, you can approximate it via an upper bound.
The following paper by Ferbus-Zanda and Grigorieff gives you the theory: http://arxiv.org/abs/1010.3201
An intuitive way of thinking about such an approximate upper bound is to consider the length of a compression program that can decompress to a particular string.
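As a rough illustration, a compressed length (plus the fixed size of the decompressor) upper-bounds K(x); zlib here is just a stand-in for any compressor:
import zlib

x = b"01" * 5000                        # a highly regular string of 0s and 1s
print(len(x), len(zlib.compress(x)))    # 10000 vs. far fewer bytes: an upper-bound proxy for K(x)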
Applying this to your problem: the string you describe is a random binary string, doubled. The input string acts as a seed for the random number generator.
Ignoring the Kolmogorov complexity part of your question and just looking at the space complexity (i.e., memory footprint) aspect, as @templatetypedef did: the criteria you mention are so loose that all you can say is that the lower space bound for the algorithm is O(1) and the upper bound is O(n), where n is the length of the output.
No, I don't believe so. Consider the algorithm "print 01", which requires Θ(1) space, and the algorithm "double the length of the input string, then print 01", which requires Θ(n) space. Both algorithms meet the criteria you've provided, so given just those criteria you can't say anything about the space complexity of the algorithm.
Hope this helps!