BST search problem - OOP

I've implemented a BST that stores in its n nodes an object of type Point (x, y); the tree is ordered by the X values.
I need to implement a function that takes as input a range of X values (x_left, x_right)
and whose output is
(EDIT): the sum of the Y coordinates of all points whose X lies in the range (inclusive).
It's not that hard to do it by "walking" through all the nodes; the problem is that I'm asked to do it in O(log n) complexity.
I thought about adding fields for ranges and sums of Y's, but somehow it doesn't work out with the insertion and deletion functions.
Any ideas?

Finding each end of the range is O(log n) complexity. There is no need to look at all nodes in the tree.
If you are required to sum the numbers in a range, then taking the high boundary minus the low boundary (assuming a consecutive, linear range) is a constant-time operation.
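A minimal sketch of that boundary-difference idea, combined with the sum field the question already hints at (Python; the ysum field and the function names are my own invention): each node stores the sum of Y over its whole subtree, a prefix sum up to either boundary then costs O(height), and the range sum is the difference of the two.

    class Node:
        def __init__(self, x, y):
            self.x, self.y = x, y
            self.left = self.right = None
            self.ysum = y                    # sum of y over this node's whole subtree

    def insert(root, x, y):
        # plain BST insert keyed on x; ysum is maintained on the way down
        if root is None:
            return Node(x, y)
        root.ysum += y
        if x < root.x:
            root.left = insert(root.left, x, y)
        else:
            root.right = insert(root.right, x, y)
        return root

    def sum_below(node, x, inclusive):
        # sum of y over all nodes with key < x (or <= x), in O(height):
        # whole left subtrees are absorbed via ysum, so only one path is walked
        if node is None:
            return 0
        if node.x < x or (inclusive and node.x == x):
            left = node.left.ysum if node.left else 0
            return left + node.y + sum_below(node.right, x, inclusive)
        return sum_below(node.left, x, inclusive)

    def range_ysum(root, x_left, x_right):
        # "high boundary minus low boundary": a prefix sum at each end of the range
        return sum_below(root, x_right, True) - sum_below(root, x_left, False)

As for the insertion/deletion trouble mentioned in the question: every structural change, including the rebalancing rotations that the O(log n) bound requires, just has to recompute ysum as node.y plus the children's ysum values.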

Related

How do I determine "Big-oh" time complexity given an Excel sheet showing various input size values vs. their corresponding run time values

I have been given a list of input sizes and their corresponding runtime values for a given algorithm A. How should I go about computing the "Big-oh" time complexity of algorithm A given these values?
Try playing around with the numbers and see if they approximately fit one of the "standard" complexity functions, e.g. n, n^2, n^3, 2^n, log(n).
For example, if the ratio between value and input is nearly constant, it's likely O(n). If the ratio between value and input grows linearly (or doubling the input quadruples the value etc.), it is O(n^2). If it grows quadratically, it's O(n^3). If adding a constant to the input results in multiplicative change in its value, it's exponential. And if it's the reverse relationship, it's log(n).
If it's just slightly but consistently growing more quickly than a line, it's probably O(n log(n)).
You can also plot the graph of your values (input numbers vs runtime values) in Excel and overlay it with the graph of the function you guessed may fit, and then try to tweak the parameters (e.g. for O(n^2), plot a graph of a*x^2 + b, and tweak a and b).
To make it more precise (e.g. to calculate the uncertainty), you could apply regression analysis (search for non-linear regression analysis in Excel).
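As a rough illustration of the ratio test (Python rather than Excel; the runtime numbers below are made up):

    import math

    # (input size, measured runtime) pairs -- fabricated for illustration
    samples = [(1000, 0.012), (2000, 0.049), (4000, 0.198), (8000, 0.801)]

    candidates = {
        "n":       lambda n: n,
        "n log n": lambda n: n * math.log(n),
        "n^2":     lambda n: n ** 2,
        "n^3":     lambda n: n ** 3,
    }

    # For the right guess, runtime / f(n) is roughly constant across sizes.
    for name, f in candidates.items():
        ratios = [t / f(n) for n, t in samples]
        spread = max(ratios) / min(ratios)
        print(f"{name:8s} spread = {spread:.2f}  (closer to 1.0 is better)")

Here doubling the input roughly quadruples the runtime, so the n^2 row comes out with a spread near 1.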

Generating Matched Pairs for Statistical Analysis

In my study, a person is represented as a pair of real numbers (x, y). x is on [30, 80] and y is on [60, 120]. There are two types of people, A and B. I have ~300 of each type. How can I generate the largest (or even just a large) set of pairs of one person from A with one from B: ((xA, yA), (xB, yB)) such that each pair of points is close? Two points are close if abs(x1-x2) < dX and abs(y1-y2) < dY. Similar constraints are acceptable. (That is, this constraint is roughly a Manhattan metric, but Euclidean etc. is ok too.) Not all points need be used, but no point can be reused.
You're looking for the Hungarian Algorithm.
Suggested formulation: A are rows, B are columns, each cell (i, j) contains a distance metric between Ai and Bj, e.g. abs(X(Ai)-X(Bj)) + abs(Y(Ai)-Y(Bj)). (You can normalize the X and Y values to [0,1] if you want distances to be proportional to the range of each variable.)
Then use the Hungarian Algorithm to minimize matching weight.
You can filter out matches with distances over your threshold. If you're worried that this filtering might cause the approach to be sub-optimal, you could set distances over your threshold to a very high number.
There are many implementations of this algorithm. A short search turns up one in every conceivable language, including VBA for Excel, as well as some online solvers (not sure they'd handle a 300x300 matrix, though).
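For instance, in Python, scipy's assignment solver handles this size of problem easily; the random data and the dX/dY values below are placeholders for your actual study:

    import numpy as np
    from scipy.optimize import linear_sum_assignment

    rng = np.random.default_rng(0)
    A = np.column_stack([rng.uniform(30, 80, 300), rng.uniform(60, 120, 300)])
    B = np.column_stack([rng.uniform(30, 80, 300), rng.uniform(60, 120, 300)])

    # Manhattan distance between every A-row and B-column, normalized per axis.
    dx = np.abs(A[:, 0:1] - B[:, 0]) / 50.0   # x spans 80 - 30 = 50
    dy = np.abs(A[:, 1:2] - B[:, 1]) / 60.0   # y spans 120 - 60 = 60
    cost = dx + dy

    rows, cols = linear_sum_assignment(cost)  # minimum total-weight matching

    # Filter out matched pairs that violate the closeness thresholds.
    dX, dY = 5.0, 8.0                         # example thresholds
    keep = (np.abs(A[rows, 0] - B[cols, 0]) < dX) & \
           (np.abs(A[rows, 1] - B[cols, 1]) < dY)
    pairs = list(zip(rows[keep], cols[keep]))

(Setting over-threshold cells of cost to a very high number before solving, as suggested above, steers the solver away from them instead of merely filtering afterwards.)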
Hungarian algorithm did it, thanks Etov.
Source code available here: http://www.filedropper.com/stackoverflow1

Do numbers in an array contain sides of a valid triangle

Check if an array of n integers contains 3 numbers which can form a triangle (i.e. the sum of any two of them is bigger than the third).
Apparently, this can be done in O(n) time.
(the obvious O(n log n) solution is to sort the array so please don't)
It's difficult to imagine N numbers (where N is moderately large) such that there is no triangle triplet. But let's try:
Consider a growing sequence where each next value sits at the limit N[i] = N[i-1] + N[i-2]. This is nothing other than the Fibonacci sequence. Approximately, it can be seen as a geometric progression whose factor is the golden ratio (GRf ≈ 1.618).
It can be seen that if N_largest < N_smallest * (GRf**(N-1)), then there is sure to be a triangle triplet. This criterion is quite fuzzy because of floating point versus integers, and because GRf is a limit, not an actual geometric factor. Still, carefully implemented, it gives an O(n) test that can confirm that a triplet surely exists. If it can't, we have to perform some other tests (still thinking).
EDIT: A direct conclusion from the Fibonacci idea is that for integer input (as specified in the question), a solution is guaranteed to exist whenever the size of the array exceeds log_GRf(MAX_INT), which is 47 for 32 bits or 93 for 64 bits. Actually, we can use the largest value from the input array to tighten the bound.
This gives us the following algorithm:
Step 1) Find MAX_VAL from input data :O(n)
Step 2) Compute the minimum array size that would guarantee the existence of the solution:
N_LIMIT = log_base_GRf(MAX_VAL) : O(1)
Step 3.1) if N > N_LIMIT : return true : O(1)
Step 3.2) else sort and use the direct method (check each three consecutive elements of the sorted array): O(n*log(n))
Because for large values of N (and that's the only case when the complexity matters) it is O(n) (or even O(1) when N > log_base_GRf(MAX_INT)), we can say it's O(n).
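A sketch of that algorithm in Python (assuming positive integers; the +2 slack covers the fuzziness discussed above, since the n-th Fibonacci number is at least GRf^(n-2)):

    import math

    PHI = (1 + math.sqrt(5)) / 2   # the golden ratio, GRf

    def has_triangle(a):
        # True if some 3 elements of a (positive integers) can form a triangle.
        n = len(a)
        if n < 3:
            return False
        max_val = max(a)                       # Step 1: O(n)
        n_limit = math.log(max_val, PHI) + 2   # Step 2: O(1), +2 = safety slack
        if n > n_limit:                        # Step 3.1: a Fibonacci-style
            return True                        # counterexample can't fit below max_val
        # Step 3.2: the array is short (at most ~log_PHI(max_val) + 2 elements),
        # so sort it and test consecutive triples.
        a = sorted(a)
        return any(a[i] + a[i + 1] > a[i + 2] for i in range(n - 2))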

How to calculate continuous effect of gravitational pull between simulated planets

So I am making a simple simulation of different planets, each with its own velocity, flying around space and orbiting each other.
I plan to simulate their pull on each other by considering each planet as projecting its own "gravity vector field." Each time step I'm going to add up the vectors output by each planet's individual vector field equation (V = -xj + (-yj) or some notation like it), except the field of the planet being affected, and use the affected planet's position as input to the equations.
However, this would be inaccurate, and it does not treat the gravitational pull as continuous and constant. How do I calculate the movement of my planets if each is continuously affecting the others?
Thanks!
In addition to what Blender writes about using Newton's equations, you need to consider how you will be integrating over your "acceleration field" (as you call it in the comment to his answer).
The easiest way is to use Euler's Method. The problem with that is it rapidly diverges, but it has the advantage of being easy to code and to be reasonably fast.
If you are looking for better accuracy, and are willing to sacrifice some performance, one of the Runge-Kutta methods (probably RK4) would ordinarily be a good choice. I'll caution you that if your "acceleration field" is dynamic (i.e. it changes over time ... perhaps as a result of planets moving in their orbits) RK4 will be a challenge.
Update (Based on Comment / Question Below):
If you want to calculate the force vector Fi(tn) at some time step tn applied to a specific object i, then you need to compute the force contributed by all of the other objects within your simulation using the equation Blender references. That is, for each object i, you figure out how all of the other objects pull on it (apply force), and those vectors, when summed, give the aggregate force vector applied to i. Algorithmically this looks something like:
for each object i
    Fi(tn) = 0
    for each object j ≠ i
        r = pj(tn) - pi(tn)                        // vector pointing from i toward j
        Fi(tn) = Fi(tn) + G * mi * mj * r / |r|^3  // magnitude G*mi*mj/|r|^2, directed along r
Where pi(tn) and pj(tn) are the positions of objects i and j at time tn respectively, and | | is the standard Euclidean (l2) norm, so |r| is the Euclidean distance between the two objects. Also, G is the gravitational constant.
Euler's Method breaks the simulation into discrete time slices. It looks at the current state and in the case of your example, considers all of the forces applied in aggregate to all of the objects within your simulation and then applies those forces as a constant over the period of the time slice. When using
ai(tn) = Fi(tn)/mi
(ai(tn) = acceleration vector at time tn applied to object i, Fi(tn) is the force vector applied to object i at time tn, and mi is the mass of object i), the force vector (and therefore the acceleration vector) is held constant for the duration of the time slice. In your case, if you really have another method of computing the acceleration, you won't need to compute the force and can instead directly compute the acceleration. In either event, with the acceleration held constant, the position pi(tn+1) and velocity vi(tn+1) of the object at time tn+1 will be given by:
pi(tn+1) = 0.5*ai(tn)*(tn+1-tn)^2 + vi(tn)*(tn+1-tn) + pi(tn)
vi(tn+1) = ai(tn)*(tn+1-tn) + vi(tn)
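Putting the force loop and the Euler update together, one step of the simulation might look like this Python sketch (the Body class and its field names are my own):

    import math

    G = 6.674e-11   # gravitational constant

    class Body:
        def __init__(self, mass, pos, vel):
            self.mass = mass
            self.pos = list(pos)   # [x, y]
            self.vel = list(vel)

    def euler_step(bodies, dt):
        # Compute every force vector from the *current* positions first,
        # so no body sees a half-updated state.
        forces = []
        for i, bi in enumerate(bodies):
            fx = fy = 0.0
            for j, bj in enumerate(bodies):
                if i == j:
                    continue
                rx, ry = bj.pos[0] - bi.pos[0], bj.pos[1] - bi.pos[1]
                r = math.hypot(rx, ry)
                f = G * bi.mass * bj.mass / r**2   # magnitude
                fx += f * rx / r                   # direction: toward body j
                fy += f * ry / r
            forces.append((fx, fy))
        # Hold a = F/m constant over the slice and update p and v.
        for bi, (fx, fy) in zip(bodies, forces):
            ax, ay = fx / bi.mass, fy / bi.mass
            bi.pos[0] += bi.vel[0] * dt + 0.5 * ax * dt * dt
            bi.pos[1] += bi.vel[1] * dt + 0.5 * ay * dt * dt
            bi.vel[0] += ax * dt
            bi.vel[1] += ay * dt

Calling euler_step in a loop with a small dt is the whole simulation; shrinking dt trades speed for accuracy.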
The RK4 method fits the driver of your system to a 2nd-degree polynomial, which better approximates its behavior. The details are at the wikipedia site I referenced above, and there are a number of other resources you should be able to locate on the web. The basic idea is that instead of picking a single force value for a particular time slice, you compute four force vectors at specific times and then fit the force vector to the 2nd-degree polynomial. That's fine if your field of force vectors doesn't change between time slices. If you're using gravity to derive the vector field, and the objects which are the gravitational sources move, then you need to compute their positions at each of the four sub-intervals in order to compute the force vectors. It can be done, but your performance is going to be quite a bit poorer than with Euler's method. On the plus side, you get more accurate motion of the objects relative to each other. So it's a challenge in the sense that it's computationally expensive, and it's a bit of a pain to figure out where all the objects are supposed to be for your four samples during the time slice of each iteration.
There is no such thing as "continuous" when dealing with computers, so you'll have to approximate continuity with very small intervals of time.
That being said, why are you using a vector field? What's wrong with Newton?
The gravitational force between two objects is F = G*m1*m2/r^2, and Newton's second law says F = ma; the sum of the forces on an object is that first equation applied once per other object. Equate the two and solve for a.
So you'll just have to loop over all the objects one by one and find the acceleration on each.

Search optimization problem

Suppose you have a list of 2D points with an orientation assigned to them. Let the set S be defined as:
S={ (x,y,a) | (x,y) is a 2D point, a is an orientation (an angle) }.
Given an element s of S, we will indicate with s_p the point part and with s_a the angle part. I would like to know if there exists an efficient data structure that, given a query element q, is able to return all the elements s in S such that
(dist(q_p, s_p) < threshold_1) AND (angle_diff(q_a, s_a) < threshold_2) (1)
where dist(p1,p2), with p1,p2 2D points, is the Euclidean distance, and angle_diff(a1,a2), with a1,a2 angles, is the difference between the angles (taken to be the smallest one). The data structure should be efficient w.r.t. insertion/deletion of elements and the search as defined above. The number of vectors can grow to 10,000 and more, but take this with a grain of salt.
Now suppose we change the above requirement: instead of using condition (1), let's request all the elements s of S such that, given a distance function d, d(q,s) < threshold. If I remember correctly, this last setup is called range search. I don't know if the first case can be transformed into the second.
For the distance search I believe the accepted best method is a Binary Space Partition tree. This can be stored as a series of bits. Each two bits (for a 2D tree) or three bits (for a 3D tree) subdivides the space one more level, increasing resolution.
Using a BSP, locating a set of objects to compare distances with is pretty easy. Just find the smallest set of squares or cubes which contain the edges of your distance box.
For the angle, I don't know of anything. I suppose that you could store each object in a second list or tree sorted by its angle. Then you would find every object at the proper distance using the BSP, every object at the proper angles using the angle tree, then do a set intersection.
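A hypothetical sketch of that two-index approach in Python, with scipy's k-d tree standing in for the BSP and a sorted list for the angles (wraparound at 2*pi is ignored for brevity; a real version needs two slices near 0 and 2*pi):

    import bisect
    from scipy.spatial import cKDTree

    # toy data: (x, y, a) triples with a in [0, 2*pi)
    items = [(1.0, 2.0, 0.30), (4.0, 1.0, 3.10), (1.2, 2.2, 0.25)]

    tree = cKDTree([(x, y) for x, y, a in items])
    by_angle = sorted(range(len(items)), key=lambda i: items[i][2])
    angles = [items[i][2] for i in by_angle]

    def query(qx, qy, qa, thr_pos, thr_ang):
        near_pos = set(tree.query_ball_point((qx, qy), thr_pos))
        lo = bisect.bisect_left(angles, qa - thr_ang)
        hi = bisect.bisect_right(angles, qa + thr_ang)
        near_ang = set(by_angle[lo:hi])
        return near_pos & near_ang   # the set intersection described above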
You have effectively described a "three-dimensional cylindrical space", i.e. a space that is locally three-dimensional but where one dimension is topologically cyclic. In other words, it is locally flat and may be modeled as the boundary of a four-dimensional object C4 in (x, y, z, w) defined by
z^2 + w^2 = 1
where
a = arctan(w/z)
With this model, the space defined by your constraints is a 2-dimensional cylinder wrapped "lengthwise" around a cross section wedge, where the wedge wraps around the 4-d cylindrical space with an angle of 2 * threshold_2. This can be modeled using a "modified k-d tree" approach (modified 3-d tree), where the data structure is not a tree but actually a graph (it has cycles). You can still partition this space into cells with hyperplane separation, but traveling along the curve defined by (z, w) in the positive direction may encounter a point encountered in the negative direction. The tree should be modified to actually lead to these nodes from both directions, so that the edges are bidirectional (in the z-w curve direction - the others are obviously still unidirectional).
These cycles do not change the effectiveness of the data structure in locating nearby points or allowing your constraint search. In fact, for the most part, those algorithms are only slightly modified (the simplest approach being to hold a visited node data structure to prevent cycles in the search - you test the next neighbors about to be searched).
This will work especially well for your criteria, since the region you define is effectively bounded by the axis-aligned, hyperplane-bounded cells of a k-d tree, and so the search termination will leave a region populated, on average, around pi/4 (about 79%) of the area.
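One practical way to get the cyclic dimension without writing a custom tree is to embed the angle on the unit circle, exactly as in the z^2 + w^2 = 1 model above, and feed the resulting 4-d points to an off-the-shelf k-d tree. A sketch (scipy; the scale factor s is my choice and trades angular against spatial resolution):

    import numpy as np
    from scipy.spatial import cKDTree

    def embed(pts, s):
        # (x, y, a) -> (x, y, s*cos a, s*sin a): the angle becomes a point on a
        # circle of radius s, so chord distance stands in for angle_diff.
        x, y, a = pts[:, 0], pts[:, 1], pts[:, 2]
        return np.column_stack([x, y, s * np.cos(a), s * np.sin(a)])

    S = np.array([[0.0, 0.0, 0.10], [5.0, 5.0, 3.00], [0.2, 0.1, 0.15]])
    s = 2.0
    tree = cKDTree(embed(S, s))

    def query(q, thr_pos, thr_ang):
        # The 4-d ball mixes the two thresholds, so search a radius wide
        # enough to contain both constraints, then filter exactly.
        r = np.hypot(thr_pos, 2 * s * np.sin(thr_ang / 2))  # chord of thr_ang
        cand = tree.query_ball_point(embed(q[None, :], s)[0], r)
        out = []
        for i in cand:
            d = np.hypot(*(S[i, :2] - q[:2]))
            da = abs((S[i, 2] - q[2] + np.pi) % (2 * np.pi) - np.pi)
            if d < thr_pos and da < thr_ang:
                out.append(i)
        return out

    print(query(np.array([0.1, 0.05, 0.12]), 0.5, 0.1))  # -> indices 0 and 2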