Optimization in Typed Racket... Is this going too far?

A question about typed/racket. I'm currently working my way through the Project Euler problems to better learn Racket. Some of my solutions are really slow, especially when dealing with primes and factors. So for some problems I've tried to make a typed/racket version, and I find no improvement in speed, quite the opposite. (I try to minimize the impact of overhead by using really big numbers, so calculations take around 10 seconds.)
I know from the Racket docs that the best optimizations happen when using Floats/Flonums. So... yeah, I've tried to make float versions of problems dealing with integers, as in this problem, where I have a Racket version using integers and a typed/racket one artificially turning integers into floats. I have to use tricks: checking equality between two numbers actually means checking that they are "close enough", as in this function which checks whether x can be divided by y:
(: divide? (-> Flonum Flonum Boolean))
(define (divide? x y)
  (let ([r (/ x y)])
    (< (- r (floor r)) 1e-6)))
It works (well... the solution is correct) and I have a 30%-40% speed improvement.
How acceptable is this? Do people actually do that in real life? If not, what is the best way to optimize typed/racket solutions when using integers? Or should typed/racket be abandoned altogether when dealing with integers and reserved for problems with float calculations?

In most cases the solution is to use better algorithms rather than converting to Typed Racket.
Since most problems at Project Euler concern integers, here are a few tips and tricks:
The division operator / needs to compute the greatest common divisor of the numerator and the denominator in order to cancel out common factors. This makes / a bad choice if you only want to know whether one number divides another. Use (= (remainder n m) 0) to check whether m divides n. Also: use quotient rather than / when you know the division has a zero remainder.
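For example, an exact-integer counterpart of the divide? predicate from the question might look like this (a minimal untyped sketch; the name divides? is just for illustration):

(define (divides? x y)
  ;; y divides x exactly when the remainder is zero: no floats, no epsilon
  (= (remainder x y) 0))

;; (divides? 84 7) => #t
;; and when the remainder is known to be zero, prefer (quotient 84 7) over (/ 84 7)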
Use memoization to avoid recomputation. I.e. use a vector to store already computed results. Example: https://github.com/racket/math/blob/master/math-lib/math/private/number-theory/eulerian-number.rkt
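A minimal sketch of the idea (not the code from the linked file): precompute the sum of proper divisors of every n up to some limit once, so that later queries are plain vector lookups instead of recomputations.

(define limit 10000)
(define divisor-sum-table
  ;; divisor-sum-table[n] holds the sum of the proper divisors of n,
  ;; filled sieve-style: each d is added to all of its multiples
  (let ([v (make-vector (add1 limit) 0)])
    (for* ([d (in-range 1 (add1 (quotient limit 2)))]
           [m (in-range (* 2 d) (add1 limit) d)])
      (vector-set! v m (+ (vector-ref v m) d)))
    v))

(define (div-sum n) (vector-ref divisor-sum-table n))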
First implement a naive algorithm. Then consider how to reduce the number of cases. A rule of thumb: brute force works best if you can reduce the number of cases to 1-10 million.
To reduce the number of cases, look for parametrizations of the search space. Example: if you need to find a Pythagorean triple, loop over numbers m and n and compute a = m^2 - n^2, b = 2mn, and c = m^2 + n^2. This will be faster than looping over a, b, and c and skipping those triples where a^2 + b^2 = c^2 is not true.
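A sketch of that parametrization (assuming, purely for illustration, the classic task of finding a triple with a + b + c = 1000):

(for* ([m (in-range 2 50)]
       [n (in-range 1 m)])
  ;; every (a, b, c) generated this way satisfies a^2 + b^2 = c^2
  (let ([a (- (* m m) (* n n))]
        [b (* 2 m n)]
        [c (+ (* m m) (* n n))])
    (when (= (+ a b c) 1000)
      (printf "a=~a b=~a c=~a\n" a b c))))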
Look for tips and tricks in the source of math/number-theory.

Not aspiring to be a real answer since I can't provide any general tips soegaard hasn't posted already, but since I recently did "Amicable numbers, Problem 21", I thought I might as well leave you my solution here (sadly, not many Lisp solutions get posted on Euler...).
(define (divSum n)
  (define (aux i sum)
    (if (> (sqr i) n)
        (if (= (sqr (sub1 i)) n) ; final check if n is a perfect square
            (- sum (sub1 i))
            sum)
        (aux (add1 i) (if (= (modulo n i) 0)
                          (+ sum i (/ n i))
                          sum))))
  (aux 2 1))

(define (amicableSum n)
  (define (aux a sum)
    (if (>= a n)
        sum
        (let ([b (divSum a)])
          (aux (add1 a)
               (if (and (> b a) (= (divSum b) a))
                   (+ sum a b)
                   sum)))))
  (aux 2 0))
> (time (amicableSum 10000))
cpu time: 47 real time: 46 gc time: 0
When dealing with divisors one can often stop at the square root of n, as here in divSum. And when you find an amicable pair, you may as well add both members to the sum at once, which saves an unnecessary computation of (divSum b) in my code.

Related

What exactly does small-oh notation mean? [duplicate]

What is the difference between Big-O notation O(n) and Little-O notation o(n)?
f ∈ O(g) says, essentially
For at least one choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) <= k g(x) holds for all x > a.
Note that O(g) is the set of all functions for which this condition holds.
f ∈ o(g) says, essentially
For every choice of a constant k > 0, you can find a constant a such that the inequality 0 <= f(x) < k g(x) holds for all x > a.
Once again, note that o(g) is a set.
In Big-O, it is only necessary that you find a particular multiplier k for which the inequality holds beyond some minimum x.
In Little-o, it must be that there is a minimum x after which the inequality holds no matter how small you make k, as long as it is not negative or zero.
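Written out with quantifiers, the two definitions above differ only in the quantifier on k:

f ∈ O(g):  ∃ k > 0, ∃ a, such that 0 ≤ f(x) ≤ k·g(x) for all x > a
f ∈ o(g):  ∀ k > 0, ∃ a, such that 0 ≤ f(x) < k·g(x) for all x > a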
These both describe upper bounds, although somewhat counter-intuitively, Little-o is the stronger statement. There is a much larger gap between the growth rates of f and g if f ∈ o(g) than if f ∈ O(g).
One illustration of the disparity is this: f ∈ O(f) is true, but f ∈ o(f) is false. Therefore, Big-O can be read as "f ∈ O(g) means that f's asymptotic growth is no faster than g's", whereas "f ∈ o(g) means that f's asymptotic growth is strictly slower than g's". It's like <= versus <.
More specifically, if the value of g(x) is a constant multiple of the value of f(x), then f ∈ O(g) is true. This is why you can drop constants when working with big-O notation.
However, for f ∈ o(g) to be true, g must grow strictly faster than any constant multiple of f, and so the relative separation between f(x) and g(x) must actually get larger as x gets larger.
To use purely math examples (rather than referring to algorithms):
The following are true for Big-O, but would not be true if you used little-o:
x² ∈ O(x²)
x² ∈ O(x² + x)
x² ∈ O(200 * x²)
The following are true for little-o:
x² ∈ o(x³)
x² ∈ o(x!)
ln(x) ∈ o(x)
Note that if f ∈ o(g), this implies f ∈ O(g). e.g. x² ∈ o(x³) so it is also true that x² ∈ O(x³), (again, think of O as <= and o as <)
Big-O is to little-o as ≤ is to <. Big-O is an inclusive upper bound, while little-o is a strict upper bound.
For example, the function f(n) = 3n is:
in O(n²), o(n²), and O(n)
not in O(lg n), o(lg n), or o(n)
Analogously, the number 1 is:
≤ 2, < 2, and ≤ 1
not ≤ 0, < 0, or < 1
Here's a table, showing the general idea:
(Note: the table is a good guide but its limit definition should be in terms of the superior limit instead of the normal limit. For example, 3 + (n mod 2) oscillates between 3 and 4 forever. It's in O(1) despite not having a normal limit, because it still has a lim sup: 4.)
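The table itself is an image and is not reproduced here; a plausible reconstruction of what it conveys, using the comparison analogy above and lim sup / lim inf as the note suggests:

f(n) ∈ o(g(n))   ~   "<"   ~   lim f(n)/g(n) = 0
f(n) ∈ O(g(n))   ~   "≤"   ~   lim sup f(n)/g(n) < ∞
f(n) ∈ Θ(g(n))   ~   "="   ~   0 < lim inf f(n)/g(n) ≤ lim sup f(n)/g(n) < ∞
f(n) ∈ Ω(g(n))   ~   "≥"   ~   lim inf f(n)/g(n) > 0
f(n) ∈ ω(g(n))   ~   ">"   ~   lim f(n)/g(n) = ∞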
I recommend memorizing how the Big-O notation converts to asymptotic comparisons. The comparisons are easier to remember, but less flexible because you can't say things like n^O(1) = P.
I find that when I can't conceptually grasp something, thinking about why one would use X is helpful to understand X. (Not to say you haven't tried that, I'm just setting the stage.)
Stuff you know: A common way to classify algorithms is by runtime, and by citing the big-Oh complexity of an algorithm, you can get a pretty good estimation of which one is "better" -- whichever has the "smallest" function in the O! Even in the real world, O(N) is "better" than O(N²), barring silly things like super-massive constants and the like.
Let's say there's some algorithm that runs in O(N). Pretty good, huh? But let's say you (you brilliant person, you) come up with an algorithm that runs in O(N / log log log log N). YAY! It's faster! But you'd feel silly writing that over and over again when you're writing your thesis. So you write it once, and you can say "In this paper, I have proven that algorithm X, previously computable in time O(N), is in fact computable in o(N)."
Thus, everyone knows that your algorithm is faster --- by how much is unclear, but they know it's faster. Theoretically. :)
In general
Asymptotic notation is something you can understand as: how do functions compare when zooming out? (A good way to test this is simply to use a tool like Desmos and play with your mouse wheel). In particular:
f(n) ∈ o(n) means: at some point, the more you zoom out, the more f(n) will be dominated by n (it will progressively diverge from it).
g(n) ∈ Θ(n) means: at some point, zooming out will not change how g(n) compares to n (if we removed the ticks from the axes, you couldn't tell the zoom level).
Finally, h(n) ∈ O(n) means that the function h can be in either of these two categories: it can either look a lot like n, or it can become smaller and smaller relative to n as n increases. Basically, both f(n) and g(n) are also in O(n).
I think this Venn diagram (adapted from this course) could help:
It's the exact same as what we use for comparing numbers:
In computer science
In computer science, people will usually prove that a given algorithm admits both an upper bound O and a lower bound Ω. When the two bounds meet, that means we have found an asymptotically optimal algorithm for that particular problem, with complexity Θ.
For example, if we prove that the complexity of an algorithm is both in O(n) and Ω(n), it implies that its complexity is in Θ(n). (That's the definition of Θ, and it more or less translates to "asymptotically equal".) This also means that no algorithm can solve the given problem in o(n). Again, roughly speaking: "this problem can't be solved in (strictly) fewer than n steps".
Usually the o is used within lower-bound proofs to derive a contradiction. For example:
Suppose algorithm A can find the minimum value in an array of size n in o(n) steps. Since A ∈ o(n), for large enough n it can't inspect all items of the input. In other words, there is at least one item x which A never saw. Algorithm A can't tell the difference between two similar input instances where only x's value changes. If x is the minimum in one of these instances and not in the other, then A will fail to find the minimum on (at least) one of these instances. In other words, finding the minimum in an array is in Ω(n) (no algorithm in o(n) can solve the problem).
Details about lower/upper bound meanings
An upper bound of O(n) simply means that even in the worst case, the algorithm will terminate in at most n steps (ignoring all constant factors, both multiplicative and additive). A lower bound of Ω(n) is a statement about the problem itself: it says that we built some example(s) where the given problem couldn't be solved by any algorithm in fewer than n steps (ignoring multiplicative and additive constants). The number of steps is at most n and at least n, so this problem's complexity is "exactly n". Instead of saying "ignoring constant multiplicative/additive factors" every time, we just write Θ(n) for short.
The big-O notation has a companion called small-o notation. The big-O notation says that one function is asymptotically no more than another. To say that one function is asymptotically less than another, we use small-o notation. The difference between the big-O and small-o notations is analogous to the difference between <= (less than or equal to) and < (less than).

How can I compare the time-complexity O(n^2) with O(N+log(M))?

My Lua function:
for y=userPosY+radius,userPosY-radius,-1 do
  for x=userPosX-radius,userPosX+radius,1 do
    local oneNeighborFound = redis.call('lrange', userPosZone .. x .. y, '0', '0')
    if next(oneNeighborFound) ~= nil then
      table.insert(neighborsFoundInPosition, userPosZone .. x .. y)
      neighborsFoundInPositionCount = neighborsFoundInPositionCount + 1
    end
  end
end
Which leads to this formula: (2n+1)^2
If I understand it correctly, that would be a time complexity of O(n^2).
How can I compare this to the time complexity of the GEORADIUS (Redis) with O(N+log(M))? https://redis.io/commands/GEORADIUS
Time complexity: O(N+log(M)) where N is the number of elements inside the bounding box of the circular area delimited by center and radius and M is the number of items inside the index.
My time complexity does not have an M. I do not know how many items are in the index (M), because I do not need to know that. My index changes often, almost with every request, and can be large.
Which time complexity is better, and in which cases?
Assuming N and M are independent variables, I would treat O(N + log M) the same way you treat O(N³ - 7N² - 12N + 42): the latter becomes O(N³) simply because that's the term that has the most effect on the outcome.
This is especially true as time complexity analysis is not really a matter of measuring runtime. Runtime has to take the lesser terms into account for specific ranges of N. For example, if your algorithm's runtime can be expressed as runtime = N² + 9999999999·N, and N is always in the range [1, 4], it's the second term that's more important, not the first.
It's better to think of complexity analysis as what happens as N approaches infinity. With the O(N + log M) one, think about what happens when you:
double N?
double M?
The first has a much greater impact, so I would simply convert the complexity to O(N).
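Concretely, compare the effect of each doubling on N + log M:

(2N + log M) - (N + log M) = N
(N + log(2M)) - (N + log M) = log 2

Doubling N adds a term that keeps growing with N, while doubling M only ever adds the constant log 2.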
However, you'll hopefully have noticed the use of the word "independent" in my first paragraph. The only sticking point to my suggestion would be if M was actually some function of N, in which case it may become the more important term.
Any function that reversed the impact of the log M would do this, such as the equality M = 10^(10^(10^N)).

Comparing complexity of O(n+m) and O(max(n,m))

I had a job interview today, and I was asked about the complexity of std::set_intersection. When I was answering, I mentioned that
O(n+m)
is equal to:
O(max(n,m))
I was told that this is incorrect. I was unsuccessfully trying to show equivalence with:
O(0.5*(n+m)) ≤ O(max(n,m)) ≤ O(n+m)
My question is: am I really incorrect?
For all m, n ≥ 0 it holds that max(m, n) ≤ m + n, hence max(m, n) ∈ O(m + n); and m + n ≤ 2·max(m, n), hence m + n ∈ O(max(m, n)).
Thus O(max(m, n)) = O(m + n).
ADDENDUM: If f belongs to O(m + n), then a constant D > 0 exists such that f(n, m) < D·(m + n) for m and n large enough. Thus f(n, m) < 2·D·max(m, n), and so O(m + n) must be a subset of O(max(m, n)). The proof that O(max(m, n)) is a subset of O(m + n) is made analogously.
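In one line, the key inequality chain (for m, n ≥ 0) is

max(m, n) ≤ m + n ≤ 2·max(m, n)

and multiplying through by any constant D > 0 preserves it, which is all the addendum uses.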
You are totally right that O(n+m) is equal to O(max(n,m)); even more precisely, we can prove Θ(n+m) = Θ(max(n,m)), which is tighter and proves your statement. The mathematical proof (for both big-O and Θ) is very simple and easy to understand with common sense, and a mathematical proof is a way of saying something in a well-defined, strict way that doesn't leave any ambiguity.
Though you were (wrongly) told that this is incorrect, because if we want to be very precise, "is equal" is not the appropriate mathematical way to express that the order of max(m,n) is the same as that of m+n. You used the words "is equal" referring to big-O notation, but what is the definition of big-O notation?
It refers to sets. Saying max(n,m) belongs to O(m+n) is the most correct way to put it, and vice versa: m+n belongs to O(max(m,n)). In big-O notation it is also commonly used and accepted to write m+n = O(max(n,m)).
The problem is that you didn't refer to the order of a function, as in "f is O(g)", but instead tried to compare the sets O(f) and O(g). But proving that two infinite sets are equal is not easy (and that may have confused the interviewer).
We can say that sets A and B are identical (or equal) when they contain the same elements (we do not try to compare them but instead refer to the elements they contain, so they must be finite). And even that kind of identification can't easily be applied when talking about big-O sets.
Big O of F is used to denote the set that contains all functions with order less than or equal to that of F. How many functions are there? Infinitely many, since F+c is contained and c can take infinitely many values. How could you say two different sets are identical (or equal) when they are infinite? Well, it is not that simple.
So I understand what you are thinking, that n+m and max(n,m) have the same order, but the right way to express that is by saying n+m is O(max(n,m)) and max(n,m) is O(m+n) (saying that O(m+n) is equal to O(max(m,n)) really requires a proof).
One more thing: we said that these functions have the same order, and this is absolutely mathematically correct, but when optimizing an algorithm you may need to take some lower-order factors into account; those may give you slightly different results, even though the asymptotic behavior is proved to be the same.
CONCLUSION
As you can read on Wikipedia (and in all CS courses at every university, and in every algorithms book), the big O/Θ/Ω/ω/o notations help us compare functions and find their order of growth, not compare sets of functions, and this is why you were told you were wrong. Though it is easy to say that O(n) is a subset of O(n^2), it is very difficult to compare infinite sets to decide whether two of them are identical. Cantor worked on categorizing infinite sets; for example, we know that the natural numbers are countably infinite and the real numbers are uncountably infinite, so there are more real numbers than natural numbers even though both sets are infinite. It gets very complicated when trying to order and categorize infinite sets, and it would be more of a research topic in maths than a way of comparing functions.
UPDATE
It turns out you can in fact simply prove that O(n+m) equals O(max(n,m)):
For every function F which belongs to O(n+m), there are constants c and k such that:
F <= c(n+m) for every n>=k and m>=k
then it also holds that:
F <= c(n+m) <= 2c*max(n,m)
so F belongs to O(max(n,m)), and as a result O(m+n) is a subset of O(max(n,m)).
Now suppose F belongs to O(max(n,m)); then there are constants c and k such that:
F <= c*max(n,m) for every n>=k and m>=k
and we also have:
F <= c*max(n,m) <= 2c(m+n) for every n>=k and m>=k
so there is c' = 2c, and with the same k, by definition, F is O(m+n); as a result, O(max(n,m)) is a subset of O(n+m). Because we also proved that O(m+n) is a subset of O(max(n,m)), we have proved that O(max(m,n)) and O(m+n) are equal, and this mathematical proof shows that you were totally right, without any doubt.
Finally, note that proving that m+n is O(max(n,m)) and max(n,m) is O(m+n) does not by itself prove that the sets are equal (we still need the argument above); it only proves that the functions have the same order. Though it is easy to see that, in the general case, if f is O(g) and g is O(f), then you can prove the equality of the big-O sets exactly as we did in the previous paragraph.
We'll show by rigorous Big-O analysis that you are indeed correct, given one possible choice of parameter of growth in your analysis. However, this does not necessarily mean that the viewpoint of the interviewer is incorrect, rather that his/her choice of parameter of growth differs. His/her prompt that your answer was outright incorrect, however, is questionable: you've possibly simply used two slightly different approaches to analyzing the asymptotic complexity of std::set_intersection, both leading to the general consensus that the algorithm runs in linear time.
Preparations
Let's start by looking at the reference for std::set_intersection at cppreference (emphasis mine):
http://en.cppreference.com/w/cpp/algorithm/set_intersection
Parameters
first1, last1 - the first range of elements to examine
first2, last2 - the second range of elements to examine
Complexity
At most 2·(N1+N2-1) comparisons, where
N1 = std::distance(first1, last1)
N2 = std::distance(first2, last2)
std::distance itself is naturally linear (worst case: no random access)
std::distance
...
Returns the number of elements between first and last.
We'll proceed to briefly recall the basics of Big-O notation.
Big-O notation
We loosely state the definition of a function or algorithm f being in O(g(n)) (to be picky, O(g(n)) being a set of functions, hence f ∈ O(...), rather than the commonly misused f(n) ∈ O(...)).
If a function f is in O(g(n)), then c · g(n) is an upper
bound on f(n), for some non-negative constant c such that f(n) ≤ c · g(n)
holds, for sufficiently large n (i.e. , n ≥ n0 for some constant
n0).
Hence, to show that f ∈ O(g(n)), we need to find a set of (non-negative) constants (c, n0) that fulfils
f(n) ≤ c · g(n), for all n ≥ n0, (+)
We note, however, that this set is not unique; the problem of finding the constants (c, n0) such that (+) holds is degenerate. In fact, if any such pair of constants exists, there will exist infinitely many different such pairs.
We proceed with the Big-O analysis of std::set_intersection, based on the already known worst case number of comparisons of the algorithm (we'll consider one such comparison a basic operation).
Applying Big-O asymptotic analysis to the set_intersection example
Now consider two ranges of elements, say range1 and range2, and assume that the number of elements contained in these two ranges are m and n, respectively.
Note! Already at this initial stage of the analysis we make a choice: we choose to study the problem in terms of two different parameters of growth (or rather, focusing on the largest one of the two). As we shall see ahead, this will lead to the same asymptotic complexity as the one stated by the OP. However, we could just as well choose to let k = m+n be the parameter of choice: we would still conclude that std::set_intersection has linear-time complexity, but in terms of k (which is m+n, not max(m, n)) rather than the largest of m and n. These are simply the preconditions we freely choose to set prior to proceeding with our Big-O notation/asymptotic analysis, and it's quite possible that the interviewer had a preference for analyzing the complexity using k as the parameter of growth rather than the largest of its two components.
Now, from above we know that as worst case, std::set_intersection will run 2 · (m + n - 1) comparisons/basic operations. Let
h(n, m) = 2 · (m + n - 1)
Since the goal is to find an expression of the asymptotic complexity in terms of Big-O (upper bound), we may, without loss of generality, assume that n > m, and define
f(n) = 2 · (n + n - 1) = 4n - 2 > h(n,m) (*)
We proceed to analyze the asymptotic complexity of f(n), in terms of Big-O notation. Let
g(n) = n
and note that
f(n) = 4n - 2 < 4n = 4 · g(n)
Now (choose to) let c = 4 and n0 = 1, and we can state the fact that:
f(n) < 4 · g(n) = c · g(n), for all n ≥ n0, (**)
Given (**), we know from (+) that we've now shown that
f ∈ O(g(n)) = O(n)
Furthermore, since (*) holds, naturally
h ∈ O(g(n)) = O(n), assuming n > m (i)
holds.
If we switch our initial assumption and assume that m > n, re-tracing the analysis above will, conversely, yield the similar result
h ∈ O(g(m)) = O(m), assuming m > n (ii)
Conclusion
Hence, given two ranges range1 and range2 holding m and n elements, respectively, we've shown that the asymptotic complexity of std::set_intersection applied to these two ranges is indeed
O(max(m, n))
where we've chosen the largest of m and n as the parameter of growth in our analysis.
This is, however, not really valid notation (or at least not common) when speaking about Big-O. When we use Big-O notation to describe the asymptotic complexity of some algorithm or function, we do so with regard to a single parameter of growth (not two of them).
Rather than answering that the complexity is O(max(m, n)) we may, without loss of generality, assume that n describes the number of elements in the range with the most elements, and given that assumption, simply state than an upper bound for the asymptotic complexity of std::set_intersection is O(n) (linear time).
A speculation as to the interview feedback: as mentioned above, it's possible that the interviewer simply had a firm view that the Big-O notation/asymptotic analysis should've been based on k = m+n as parameter of growth rather than the largest of its two components. Another possibility could, naturally, be that the interviewer simply confusingly queried about the worst case of actual number of comparisons of std::set_intersection, while mixing this with the separate matter of Big-O notation and asymptotic complexity.
Final remarks
Finally, note that the analysis of the worst-case complexity of std::set_intersection is not at all representative of the commonly studied non-ordered set intersection problem: the former is applied to ranges that are already sorted (see the quote from Boost's set_intersection below: the origin of std::set_intersection), whereas in the latter we study the computation of the intersection of non-ordered collections.
Boost: set_intersection
Description
set_intersection constructs a sorted range that is the intersection
of the sorted ranges rng1 and rng2. The return value is the
end of the output range.
As an example of the latter, Python's set intersection method applies to non-ordered collections; applied to, say, sets s and t, it has an average-case and a worst-case complexity of O(min(len(s), len(t))) and O(len(s) * len(t)), respectively. The huge difference between average and worst case in this implementation stems from the fact that hash-based solutions generally work very well in practice but can, for some applications, theoretically have very poor worst-case performance.
For additional details of the latter problem, see e.g.
Intersection of two unsorted sets or lists # SE-CSTheory

Practical difference between O(n) and O(1 + n)?

Isn't O(n) an improvement over O(1 + n)?
This is my conception of the difference:
O(n):
for i=0 to n do ; print i ;
O(1 + n):
a = 1;
for i=0 to n do ; print i+a ;
... which would just reduce to O(n) right?
If the target time complexity is O(1 + n), but I have a solution in O(n),
does this mean I'm doing something wrong?
Thanks.
O(1+n) and O(n) are mathematically identical, as you can straightforwardly prove from the formal definition or using the standard rule that O( a(n) + b(n) ) is equal to the bigger of O(a(n)) and O(b(n)).
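Spelled out with the formal definition: for all n ≥ 1,

n ≤ 1 + n ≤ 2n

so f(n) ≤ c·(1 + n) implies f(n) ≤ 2c·n, and f(n) ≤ c·n implies f(n) ≤ c·(1 + n); the two sets therefore contain exactly the same functions.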
In practice, of course, if you do n+1 things it'll (usually, dependent on compiler optimizations/etc) take longer than if you only do n things. But big-O notation is the wrong tool to talk about those differences, because it explicitly throws away differences like that.
It's not an improvement, because Big-O doesn't describe the exact running time of your algorithm but rather its growth rate. Big-O therefore describes a class of functions, not a single function. O(n^2) doesn't mean that your algorithm will run in 4 operations for an input of size 2; it means that if you were to plot the running time of your application as a function of n, it would be asymptotically upper-bounded by c*n^2 starting at some n0. This is nice because we know how the running time scales as the input grows, but we don't really know exactly how fast the algorithm will be. Why the c? Because, as I said, we don't care about exact numbers but about the shape of the function, and when we multiply by a constant factor the shape stays the same.
Isn't O(n) an improvement over O(1 + n)?
No, it is not. Asymptotically these two are identical. In fact, O(n) is identical to O(n+k) where k is any constant value.

Little-oh notation in detail, CS homework, excluding actual assignment

I'm sitting here with this assignment in a course on algorithms with massive data sets and the use of Little-Oh notation has got me confused, although I'm perfectly confident with Big-Oh.
I do not want a solution to the assignment, and as such I will not present it. Instead my question is how I interpret the time complexity o(log n)?
I know from the definition that an algorithm A must grow asymptotically slower than log n to be in o(log n), but I'm uncertain as to whether this means that the algorithm must run in constant time, or whether it is still allowed to be log n under certain conditions, such as c = 1, or whether it is really log(n-1).
Say an algorithm has a running time of O(log n) but in fact does two iterations, so that c = 2; 2·log n is still O(log n). Am I right when I say that this does not hold for little-o?
Any help is greatly appreciated and if strictly needed for clarification, I will provide the assignment
To say that f is "little-oh of g", f = o(g), means that the quotient
|f(x)/g(x)|
approaches 0 as x approaches infinity. Referring to your example of o(log n), that class contains functions like log x / log (log x), sqrt(log x) and many more, so o(log x) definitely doesn't imply O(1). On the other hand, log (x/2) = log x - log 2, so
log (x/2) / log x = 1 - log 2 / log x -> 1
and log (x/2) is not in the class o(log x).
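For instance, for the sqrt(log x) example mentioned above, the quotient is

sqrt(log x) / log x = 1 / sqrt(log x) -> 0,

so sqrt(log x) really is in o(log x) even though it is certainly not O(1).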
For Little-Oh, f(x) does not have to be smaller than g(x) for all x. It has to be smaller only after a certain value of x. (For your question, it is still allowed to be log n under certain conditions.)
For example:
let f(x) = 5000x and g(x) = x^2
The limit of f(x) / g(x) as x approaches infinity is 0, so f(x) is little-o of g(x). However, at x = 1, f(x) is greater than g(x). Only after x becomes greater than 5000 will g(x) be bigger than f(x).
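In symbols:

f(x) / g(x) = 5000x / x^2 = 5000 / x -> 0 as x approaches infinity,

and 5000x < x^2 exactly when x > 5000, which is where g(x) overtakes f(x).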
What little-o really tells us is that g(x) always grows at a faster rate than f(x). For example, look how much f(x) grew between x = 1 and x = 2:
f(1) = 5000
f(2) = 10000 (f(x) became twice as big)
Now look at g(x) on the same interval:
g(1) = 1
g(2) = 4 (g(x) became four times bigger)
This rate difference will become even more pronounced at bigger values of x. Now, since g(x) increases at a faster rate, and because we take x to infinity, at some point it will become larger than f(x). But this is not what little-o is concerned with; it's all about the rate of change.