Finding the longest subsequence according to requirements (Dynamic Programming)

"How to determine the longest increasing subsequence using dynamic programming?" didn't help me enough to do it on my own, so I am asking for your help.
I have a sequence of integers: (-2, 4, 1, 1, 5, -2, 3, 3, -1, 1). I want to find the longest subsequence of it that meets these requirements, using dynamic programming (Xi here is a number, i its index):
The numbers have to stay in order, i.e. their indexes keep increasing.
If the index i is odd, this requirement has to be met: Xi <= Xi+1; if the index is even, this requirement has to be met: Xi >= Xi+1.
For example, the longest such subsequence would be (-2, 4, 1, 5, -2, 3, -1, 1). Any help is greatly appreciated; I have been working on this for the whole day!
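Not from the original thread, but here is a minimal C++ sketch of one standard O(n^2) dynamic-programming formulation, assuming the "index" in the requirement refers to the position inside the chosen subsequence (which is what the example suggests); the function and array names are my own.
#include <algorithm>
#include <iostream>
#include <vector>

// oddLen[i]:  length of the longest valid subsequence ending at a[i] with a[i]
//             in an odd position (1st, 3rd, ...); always at least 1 (a[i] alone).
// evenLen[i]: same, but with a[i] in an even position (2nd, 4th, ...);
//             0 means no such subsequence exists yet.
int longestAlternating(const std::vector<int>& a) {
    int n = static_cast<int>(a.size()), best = 0;
    std::vector<int> oddLen(n, 1), evenLen(n, 0);
    for (int i = 0; i < n; ++i) {
        for (int j = 0; j < i; ++j) {
            // odd position j -> even position i requires a[j] <= a[i]
            if (a[j] <= a[i])
                evenLen[i] = std::max(evenLen[i], oddLen[j] + 1);
            // even position j -> odd position i requires a[j] >= a[i]
            if (evenLen[j] > 0 && a[j] >= a[i])
                oddLen[i] = std::max(oddLen[i], evenLen[j] + 1);
        }
        best = std::max({best, oddLen[i], evenLen[i]});
    }
    return best;
}

int main() {
    std::vector<int> a = {-2, 4, 1, 1, 5, -2, 3, 3, -1, 1};
    std::cout << longestAlternating(a) << "\n";   // 8, matching the example above
}
To recover the subsequence itself, additionally store for each position which predecessor the maximum came from and walk those links backwards.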

Related

Maximum contiguous subsequence sum of x elements

I came up with a question that I've searched for but found no answer to: what's the best (and by best, I mean fastest) way to get the maximum contiguous subsequence sum of x elements?
Imagine that I've: A[] = {2, 4, 1, 10, 40, 50, 22, 1, 24, 12, 40, 11, ...}.
And then I ask:
"What is the maximum contigous subsequence on array A with 3 elements?"
Please imagine this in a array with more than 100000 elements... Can someone help me?
Thank you for your time and you help!
I Googled it and found this:
Using a divide-and-conquer approach, we can find the maximum subarray sum in O(n log n) time. Kadane's algorithm for this problem takes O(n) time, therefore Kadane's algorithm is better than the divide-and-conquer approach.
See the code:
Initialize:
    max_so_far = 0
    max_ending_here = 0
Loop for each element of the array:
    (a) max_ending_here = max_ending_here + a[i]
    (b) if (max_ending_here < 0)
            max_ending_here = 0
    (c) if (max_so_far < max_ending_here)
            max_so_far = max_ending_here
return max_so_far
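For reference, here is the pseudocode above as a small C++ sketch (my own transcription). Note that Kadane's algorithm finds the maximum sum over subarrays of any length; it does not restrict the window to exactly x elements, which is what the question actually asks about.
#include <iostream>
#include <vector>

// Kadane's algorithm: running sum that is reset whenever it drops below zero.
long long maxSubarraySum(const std::vector<int>& a) {
    long long maxSoFar = 0, maxEndingHere = 0;
    for (int x : a) {
        maxEndingHere += x;
        if (maxEndingHere < 0) maxEndingHere = 0;         // drop a prefix with negative sum
        if (maxSoFar < maxEndingHere) maxSoFar = maxEndingHere;
    }
    return maxSoFar;                                      // 0 if every element is negative
}

int main() {
    std::vector<int> a = {2, 4, 1, 10, 40, 50, 22, 1, 24, 12, 40, 11};
    std::cout << maxSubarraySum(a) << "\n";
}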

Create a new generation using replication and crossover in a genetic algorithm

Hi all, I am studying genetic algorithms and how to create a new generation. I am stuck on the following problem:
This question refers to Genetic Algorithms. Assume you have a population made of 10 individuals. Each individual is made of 5 bits. Here is the initial population.
x1 = (1, 0, 0, 1, 1)
x2 = (1, 1, 0, 0, 1)
x3 = (1, 1, 0, 1, 1)
x4 = (1, 1, 1, 1, 1)
x5 = (0, 0, 0, 1, 1)
x6 = (0, 0, 1, 1, 1)
x7 = (0, 0, 0, 0, 1)
x8 = (0, 0, 0, 0, 0)
x9 = (1, 0, 1, 1, 1)
x10 = (1, 0, 0, 1, 0)
Individuals are ranked according to fitness value (x1 has the greatest fitness value, x2 the second best, etc.). Assume that when sampling, you get individuals in the same order as they are ranked. Create a new generation of solutions assuming the following:
Replication is 20%. Crossover is 80% (assume a crossover mask as follows: 11100; pair examples in the same order as ranked). No mutation is done.
My solution: replication is 20%, which means the first two individuals are carried over unchanged. Next, given the crossover mask 11100, I start with x3 and x4: I keep the first 3 bits of both x3 and x4 the same and swap the remaining last two bits between x3 and x4 to generate new individuals. I follow the same rule for x5 and x6, x7 and x8, and x9 and x10. I am not sure whether this answer is correct or wrong. Can anybody help me, please?
I don't know the background of the implementation you are using so I may not be correct, but from a genetic algorithm point of view most of your answer seems correct.
As far as I can see, the only issue in your reasoning is with the crossover. After replication has taken place, you use only the remaining chromosomes for crossover. This seems inherently flawed from a genetic-algorithms point of view: genetic algorithms generally use the best chromosomes in crossover. You've already saved the best and then seem to exclude them from any recombination, which goes against the idea of genetic algorithms, namely to evolve the population by recombining the fittest individuals. At the very least, the fittest chromosomes should be included.
Generally, most implementations involve an element of randomness in selection with the fittest chromosomes given more weighting. Since your question explicitly states that pairs are selected in order of ranking, and therefore no randomness, I'd assume crossover is to be performed on chromosomes 1 to 8.
Your understanding of the crossover mask seems correct from the question.
Again, I know nothing of the implementation in question so I'm not sure how good my understanding is. I'd be interested to know the source since the genetic algorithm seems highly unusual.
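For illustration only - this sketch is mine, not from either post - masked crossover on a pair of 5-bit chromosomes is straightforward to write down; with mask 11100 each child keeps its own first three bits and takes the last two from the other parent, as described in the question.
#include <array>
#include <cstddef>
#include <iostream>
#include <utility>

using Chromosome = std::array<int, 5>;

// Single-mask crossover: where the mask bit is 1 the first child copies parent a,
// where it is 0 it copies parent b (and vice versa for the second child).
std::pair<Chromosome, Chromosome> crossover(const Chromosome& a,
                                            const Chromosome& b,
                                            const Chromosome& mask) {
    Chromosome c1{}, c2{};
    for (std::size_t i = 0; i < a.size(); ++i) {
        c1[i] = mask[i] ? a[i] : b[i];
        c2[i] = mask[i] ? b[i] : a[i];
    }
    return {c1, c2};
}

int main() {
    Chromosome x9{1, 0, 1, 1, 1}, x10{1, 0, 0, 1, 0}, mask{1, 1, 1, 0, 0};
    auto [c1, c2] = crossover(x9, x10, mask);
    for (int bit : c1) std::cout << bit;   // 10110: x9's first three bits, x10's last two
    std::cout << "\n";
    for (int bit : c2) std::cout << bit;   // 10011: x10's first three bits, x9's last two
    std::cout << "\n";
}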

How to solve Project Euler Problem 303 faster?

The problem is:
For a positive integer n, define f(n) as the least positive multiple of n that, written in base 10, uses only digits ≤ 2.
Thus f(2)=2, f(3)=12, f(7)=21, f(42)=210, f(89)=1121222.
To solve it in Mathematica, I wrote a function f which calculates f(n)/n :
f[n_] := Module[{i}, i = 1;
  While[Mod[FromDigits[IntegerDigits[i, 3]], n] != 0, i = i + 1];
  Return[FromDigits[IntegerDigits[i, 3]]/n]]
The principle is simple: enumerate all numbers made only of the digits 0, 1 and 2, by counting in the ternary numeral system, until one of those numbers is divisible by n.
It correctly gives 11363107 for 1~100, and I tested for 1~1000 (calculation took roughly a minute, and gives 111427232491), so I started to calculate the answer of the problem.
However, this method is too slow. The computer has been calculating the answer for two hours and hasn't finished computing.
How can I improve my code to calculate faster?
hammar's comment makes it clear that the calculation time is disproportionately spent on values of n that are a multiple of 99. I would suggest finding an algorithm that targets those cases (I have left this as an exercise for the reader) and using Mathematica's pattern matching to direct the calculation to the appropriate one.
f[n_Integer?Positive]/; Mod[n,99]==0 := (* magic here *)
f[n_] := (* case for all other numbers *) Module[{i}, i = 1;
  While[Mod[FromDigits[IntegerDigits[i, 3]], n] != 0, i = i + 1];
  Return[FromDigits[IntegerDigits[i, 3]]/n]]
Incidentally, you can speed up the fast, easy ones by doing it a slightly different way, but that is of course a second-order improvement. You could perhaps set the code up to use ff initially, breaking the While loop if i reaches a certain point, and then switching to the f function you have already provided. (Notice I'm returning n i, not i, here - that was just for illustrative purposes.)
ff[n_] :=
Module[{i}, i = 1; While[Max[IntegerDigits[n i]] > 2, i++];
Return[n i]]
Table[Timing[ff[n]], {n, 80, 90}]
{{0.000125, 1120}, {0.001151, 21222}, {0.001172, 22222}, {0.00059, 11122},
 {0.000124, 2100}, {0.00007, 1020}, {0.000655, 12212}, {0.000125, 2001},
 {0.000119, 2112}, {0.04202, 1121222}, {0.004291, 122220}}
This is at least a little faster than your version (reproduced below) for the short cases, but it's much slower for the long cases.
Table[Timing[f[n]], {n, 80, 90}]
{{0.000318, 14}, {0.001225, 262}, {0.001363, 271}, {0.000706, 134},
 {0.000358, 25}, {0.000185, 12}, {0.000934, 142}, {0.000316, 23},
 {0.000447, 24}, {0.006628, 12598}, {0.002633, 1358}}
A simple thing that you can do is compile your function to C and make it parallelizable.
Clear[f, fCC]
f[n_Integer] := f[n] = fCC[n]
fCC = Compile[{{n, _Integer}}, Module[{i = 1},
While[Mod[FromDigits[IntegerDigits[i, 3]], n] != 0, i++];
Return[FromDigits[IntegerDigits[i, 3]]]],
Parallelization -> True, CompilationTarget -> "C"];
Total[ParallelTable[f[i]/i, {i, 1, 100}]]
(* Returns 11363107 *)
The problem is that eventually your integers will be larger than a machine long integer and Mathematica will revert to the non-compiled arbitrary-precision arithmetic. (I don't know why the Mathematica compiler does not include an arbitrary-precision C library...)
As ShreevatsaR commented, the Project Euler problems are often designed to run quickly if you write smart code (and think about the math), but take forever if you try to brute-force them. See the about page. Also, spoilers posted on their message boards are removed and it's considered bad form to post spoilers on other sites.
Aside:
You can test that the compiled code is using 32-bit longs by running
In[1]:= test = Compile[{{n, _Integer}}, {n + 1, n - 1}];
In[2]:= test[2147483646]
Out[2]= {2147483647, 2147483645}
In[3]:= test[2147483647]
During evaluation of In[53]:= CompiledFunction::cfn: Numerical error encountered at instruction 1; proceeding with uncompiled evaluation. >>
Out[3]= {2147483648, 2147483646}
In[4]:= test[2147483648]
During evaluation of In[52]:= CompiledFunction::cfsa: Argument 2147483648 at position 1 should be a machine-size integer. >>
Out[4]= {2147483649, 2147483647}
and similar for the negative numbers.
I am sure there must be better ways to do this, but this is as far as my inspiration got me.
The following code finds all values of f[n] for n from 1 to 10,000 except the most difficult one, which happens to be n = 9999. I stop the loop when we get there.
ClearAll[f];
i3 = 1;
divNotFound = Range[10000];
While[Length[divNotFound] > 1,
  i10 = FromDigits[IntegerDigits[i3++, 3]];
  divFound = Pick[divNotFound, Divisible[i10, divNotFound]];
  divNotFound = Complement[divNotFound, divFound];
  Scan[(f[#] = i10) &, divFound]
] // Timing
Divisible can take lists for both arguments, and we make good use of that here. The whole routine takes about 8 min.
For 9999 a bit of thinking is necessary. It is not brute-forceable in a reasonable time.
Let P be the factor we are looking for and T (consisting only of 0's, 1's and 2's) the result of multiplying P by 9999, that is,
9999 P = T
then
P(10,000 - 1) = 10,000 P - P = T
==> 10,000 P = P + T
Let P1, ..., PL be the digits of P and T1, ..., TL+4 the digits of T. Writing 10,000 P = P + T as a long addition (P right-aligned under T), we have

    T1 T2 T3 T4 T5 ... TL   TL+1 TL+2 TL+3 TL+4
 +              P1 ... PL-4 PL-3 PL-2 PL-1 PL
 ----------------------------------------------
    P1 P2 P3 P4 P5 ... PL   0    0    0    0

The last four zeros in the sum originate of course from the multiplication by 10,000. Hence TL+1, ..., TL+4 and PL-3, ..., PL are each other's complement. Where the former consist only of 0, 1 and 2, the latter allow:
last4 = IntegerDigits[#][[-4 ;; -1]] & /@ (10000 - FromDigits /@ Tuples[{0, 1, 2}, 4])
==> {{0, 0, 0, 0}, {9, 9, 9, 9}, {9, 9, 9, 8}, {9, 9, 9, 0}, {9, 9, 8, 9},
{9, 9, 8, 8}, {9, 9, 8, 0}, {9, 9, 7, 9}, ..., {7, 7, 7, 9}, {7, 7, 7, 8}}
There are only 81 allowable sets, with 7's, 8's, 9's and 0's (not all possible combinations of them) instead of 10,000 numbers, a speed gain of a factor of 120.
One can see that P1-P4 can only be ternary digits, being the sum of a ternary digit and nothing. You can also see that there can be no carry-over from the addition of T5 and P1. A further reduction can be gained by realizing that P1 cannot be 0 (the first digit must be something), and if it were a 2, multiplication by 9999 would cause an 8 or 9 (if a carry occurs) in the result T, which is not allowed either. It must then be a 1. Twos may also be excluded for P2-P4.
Since P5 = P1 + T5, it follows that P5 < 4, as T5 < 3; the same holds for P6-P8.
Since P9 = P5 + T9, it follows that P9 < 6; the same holds for P10-P11.
In all these cases the additions don't need to include a carry-over, as one can't occur (Pi + Ti is always < 8). This may not be true for P12 if L = 16: in that case we can have a carry-over from the addition of the last 4 digits. So P12 < 7. This also excludes P12 from being in the last block of 4 digits. The solution must therefore be at least 16 digits long.
Combining all this we are going to try to find a solution for L=16:
Do[
  If[Max[IntegerDigits[
        9999 FromDigits[{1, 1, 1, 1, i5, i6, i7, i8, i9, i10, i11, i12}~Join~l4]]] < 3,
    Return[FromDigits[{1, 1, 1, 1, i5, i6, i7, i8, i9, i10, i11, i12}~Join~l4]]
  ],
  {i5, 0, 3}, {i6, 0, 3}, {i7, 0, 3}, {i8, 0, 3}, {i9, 0, 5},
  {i10, 0, 5}, {i11, 0, 5}, {i12, 0, 6}, {l4, last4}
] // Timing
==> {295.372, 1111333355557778}
and indeed 1,111,333,355,557,778 x 9,999 = 11,112,222,222,222,222,222
We could have guessed this as
f[9] = 12,222
f[99] = 1,122,222,222
f[999] = 111,222,222,222,222
The pattern apparently is that the number of 1's increases by 1 at each step and the number of consecutive 2's by 4.
With 13 min, this is over the 1-minute limit for Project Euler. Perhaps I'll look into it some time soon.
Try something smarter.
Build a function F(N) which finds the smallest number made only of the digits {0, 1, 2} that is divisible by N.
So for a given N, the number we are looking for can be written as SUM = 10^n * dn + 10^(n-1) * dn-1 + ... + 10^1 * d1 + 1 * d0 (where the di are the digits of the number).
So you have to find the digits such that SUM % N == 0.
Basically, each digit contributes to SUM % N with (10^i * di) % N.
I am not giving any more hints, but the next hint would be to use DP. Try to figure out how to use DP to find the digits.
For all numbers between 1 and 10,000 it took under 1 sec in C++ (in total).
Good luck.
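To make the hint concrete, here is a small C++ sketch (my own, with made-up names) of one common way to realize it: a breadth-first search over the remainders mod N, appending the digits 0, 1, 2. Candidates are generated in increasing numeric order and each remainder is visited at most once, so the first candidate with remainder 0 is the smallest multiple of N that uses only digits <= 2.
#include <iostream>
#include <queue>
#include <string>
#include <utility>
#include <vector>

// Smallest positive multiple of n whose decimal digits are all <= 2.
// BFS over remainders mod n; the queue always holds candidates in increasing
// numeric order, and a remainder is only ever kept for the smallest number
// that reaches it.
std::string smallestMultipleDigitsLE2(int n) {
    std::vector<bool> seen(n, false);
    std::queue<std::pair<int, std::string>> q;   // (remainder, digit string)
    for (int d = 1; d <= 2; ++d) {               // the leading digit must be 1 or 2
        int r = d % n;
        if (r == 0) return std::string(1, char('0' + d));
        if (!seen[r]) {
            seen[r] = true;
            q.push({r, std::string(1, char('0' + d))});
        }
    }
    while (!q.empty()) {
        auto [r, s] = q.front();
        q.pop();
        for (int d = 0; d <= 2; ++d) {           // append a digit 0, 1 or 2
            int nr = (r * 10 + d) % n;
            if (seen[nr]) continue;              // a smaller number already reaches this remainder
            std::string ns = s + char('0' + d);
            if (nr == 0) return ns;              // first hit is f(n)
            seen[nr] = true;
            q.push({nr, ns});
        }
    }
    return "";                                   // not reached for n >= 1
}

int main() {
    // Spot checks against the problem statement: f(42) = 210, f(89) = 1121222.
    std::cout << smallestMultipleDigitsLE2(42) << "\n";
    std::cout << smallestMultipleDigitsLE2(89) << "\n";
}
Each of the n remainders is expanded at most once, so a single call is O(n) queue operations; note that f(n) can exceed 64 bits (f(9999) has 20 digits), which is why the sketch keeps the digits in a string.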

Modulo operator in Objective-C returns the wrong result

I'm a little freaked out by the results I'm getting when I do modulo arithmetic in Objective-C. -1 % 3 is coming out to be -1, which isn't the right answer: according to my understanding, it should be 2. -2 % 3 is coming out to -2, which also isn't right: it should be 1.
Is there another method I should be using besides the % operator to get the correct result?
Objective-C is a superset of C99 and C99 defines a % b to be negative when a is negative. See also the Wikipedia entry on the Modulo operation and this StackOverflow question.
Something like (a >= 0) ? (a % b) : ((a % b) + b) (which hasn't been tested and probably has unnecessary parentheses) should give you the result you want.
Spencer, there is a simple way to think about mods (the way it's defined in mathematics, not programming). It's actually rather straightforward:
Take all the integers:
...-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9 ...
Now let's think about multiples of 3 (if you are considering mod 3). Let's start with 0 and the positive multiples of 3 (shown in brackets):
...-9, -8, -7, -6, -5, -4, -3, -2, -1, [0], 1, 2, [3], 4, 5, [6], 7, 8, [9] ...
These are all the numbers that have a remainder of zero when divided by 3, i.e. these are all the ones that mod to zero.
Now let's shift this whole group up by one.
...-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, [1], 2, 3, [4], 5, 6, [7], 8, 9 ...
These are all the numbers that have a remainder of 1 when divided by 3, i.e. these are all the ones that mod to 1.
Now let's shift this whole group up again by one.
...-9, -8, -7, -6, -5, -4, -3, -2, -1, 0, 1, [2], 3, 4, [5], 6, 7, [8], 9 ...
These are all the numbers that have a remainder of 2 when divided by 3, i.e. these are all the ones that mod to 2.
You'll notice that in each of these cases, the selected numbers are spaced out by 3. We always take every third number because we're considering modulo 3. (If we were doing mod 5, we'd take every fifth number).
So, you can carry this pattern backwards into the negative numbers. Just keep the spacing of 3. You'll get these three congruence classes (a special type of equivalence classes, as they're called in mathematics):
...[-9], -8, -7, [-6], -5, -4, [-3], -2, -1, [0], 1, 2, [3], 4, 5, [6], 7, 8, [9] ...
...-9, [-8], -7, -6, [-5], -4, -3, [-2], -1, 0, [1], 2, 3, [4], 5, 6, [7], 8, 9 ...
...-9, -8, [-7], -6, -5, [-4], -3, -2, [-1], 0, 1, [2], 3, 4, [5], 6, 7, [8], 9 ...
The standard mathematical representation of all of these equivalent numbers is to use the residue of the class, which just means take the smallest non-negative number.
So usually, when I'm thinking about mods and I'm dealing with a negative number, I just think of successively adding the modulo number again and again until I get the first 0 or positive number:
If we're doing mod 3, then with -1, just add 3 once: -1 + 3 = 2.
With -4, add 3 twice because once isn't enough: if we add +3 once, we get -4 + 3 = -1, which is still negative, so we'll add +3 again: -1 + 3 = 2.
Let's try a larger negative number, like -23. If you keep adding +3, you'll get:
-23, -20, -17, -14, -11, -8, -5, -2, 1. We got a positive number, so we stop. The residue is 1, and this is the form that mathematicians typically use.
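As a tiny sketch of that "keep adding the modulus" idea (mine, not from the answer above; it assumes the modulus b is positive):
#include <iostream>

// Shift a up by b until it is non-negative, then reduce once in case a started out >= b.
int residueByAdding(int a, int b) {
    while (a < 0) a += b;
    return a % b;
}

int main() {
    std::cout << residueByAdding(-1, 3) << "\n";    // 2
    std::cout << residueByAdding(-23, 3) << "\n";   // 1
}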
ANSI C99 6.5.5 Multiplicative operators-
6.5.5.5: The result of the / operator is the quotient from the division of the first operand by the second; the result of the % operator is the remainder. In both operations, if the value of the second operand is zero, the behavior is undefined.
6.5.5.6: When integers are divided, the result of the / operator is the algebraic quotient with any fractional part discarded (*90). If the quotient a/b is representable, the expression (a/b)*b + a%b shall equal a.
*90: This is often called "truncation toward zero".
The type of modulo behavior you're thinking of is called "modular arithmetic" or "number theory" style modulo / remainder. Using the modular arithmetic / number theory definition of the modulo operator, it is non-sensical to have a negative result. This is (obviously) not the style of modulo behavior defined and used by C99. There's nothing "wrong" with the C99 way, it's just not what you were expecting. :)
I had the same problem, but I worked it out! All you need to do is check whether the number is positive or negative, and if it's negative, add the modulus (here 7) to the result:
// neg
// -6 % 7 = 1
int testCount = (4 - 10);
if (testCount < 0) {
    int moduloInt = (testCount % 7) + 7; // add 7
    NSLog(@"\ntest modulo: %d", moduloInt);
}
else {
    int moduloInt = testCount % 7;
    NSLog(@"\ntest modulo: %d", moduloInt);
}

// pos
// 1 % 7 = 1
int testCount = (6 - 5);
if (testCount < 0) {
    int moduloInt = (testCount % 7) + 7; // add 7
    NSLog(@"\ntest modulo: %d", moduloInt);
}
else {
    int moduloInt = testCount % 7;
    NSLog(@"\ntest modulo: %d", moduloInt);
}
Hope that helps! A.
An explicit function that will give you the correct answer is at the end, but first, here is an explanation of some of the other ideas that were discussed:
Actually, (a >= 0) ? (a % b) : ((a % b) + b) will only result in the correct answer if the negative number, a, is within one multiple of b.
In other words: If you want to find: -1 % 3, then sure, (a >= 0) ? (a % b) : ((a % b)+ b) will work because you added back at the end in ((a % b) + b).
-1 % 3 = -1 and
-1 + 3 = 2, which is the correct answer.
However, if you try it with a = -4 and b = 3, then it won't work:
-4 % 3 = -4 but
-4 + 3 = -1.
While this is technically also equivalent to 2 (modulo 3), I don't think this is the answer you are looking for. You're probably expecting the canonical form, in which the answer is always a non-negative number between 0 and n-1.
You'd have to add +3 twice to get the answer:
-4 + 3 = -1
-1 + 3 = 2
Here is an explicit way to do it:
a - floor((float) a/b)*b
** Be careful! Make sure you keep the (float) cast in there. Otherwise, it will divide a/b as integers and you'll get an unexpected answer for negatives. Of course this means that your result will be a float, too. It will be an integer written as a float, like 2.000000, so you might want to convert the whole answer back to an integer.
(int) (a - floor((float) a/b)*b)
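If you would rather stay in integer arithmetic, here is a small sketch of an alternative (my own, assuming b > 0) that relies on the C99 rule quoted earlier in this thread: % gives a result with the sign of the dividend and of magnitude less than b, so one conditional add lands the result in the canonical range 0 .. b-1.
#include <iostream>

// Mathematical (always non-negative) modulo for b > 0, without any floating-point cast.
int mathMod(int a, int b) {
    int r = a % b;               // in C99 / Objective-C this is negative when a < 0 and not a multiple of b
    return (r < 0) ? r + b : r;  // shift negative remainders into 0 .. b-1
}

int main() {
    std::cout << mathMod(-1, 3) << "\n";   // 2
    std::cout << mathMod(-4, 3) << "\n";   // 2
    std::cout << mathMod(-6, 7) << "\n";   // 1
    std::cout << mathMod(5, 3) << "\n";    // 2
}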

SQL efficient nearest neighbour query

I'm having trouble coming up with an efficient SQL query to handle the following situation:
Assume we have a table with two columns
groupId : int
value : float
The table is huge (several million rows). There is a varying number of "values" per "groupId" - say between 100 and 50,000. All float values are greater or equal to zero but are otherwise unbounded.
For a given groupId the query should return all other groups sorted by decreasing similarity, where "similar" is defined as the minimum Euclidean distance between all possible pairs of 30 values in the two groups.
That definition of similarity is what kills me. I think that for calculating similarity as defined above, the naive algorithm is O(n^2). Now I'm looking for ideas to either redefine "similarity" or find an efficient implementation of the above. I could imagine a solution involving k-nearest neighbours, something like PostGIS geometrical nearest neighbours, or maybe a longest common subsequence algorithm (although I'd need a "fuzzy" implementation of the latter, because "values" will hardly ever compare exactly equal).
We are currently on MySQL, in case it matters.
cheers,
Sören
Could you verify that I got the question right?
Your table represents vectors identified by the groupId. Every vector has a dimension of somewhere between 100 and 50,000, but there is no order defined on the dimensions. That is, a vector from the table is actually a representative of an equivalence class.
Now you define the similarity of two equivalence classes as the minimum Euclidean distance of the projections of any two representatives of the equivalence classes onto the subspace of the first 30 dimensions.
Examples for projection to two dimensions:
A = <1, 2, 3, 4>
B = <5, 6, 7, 8, 9, 10>
A represents the following equivalence class of vectors.
<1, 2, 3, 4> <2, 1, 3, 4> <3, 1, 2, 4> <4, 1, 2, 3>
<1, 2, 4, 3> <2, 1, 4, 3> <3, 1, 4, 2> <4, 1, 3, 2>
<1, 3, 2, 4> <2, 3, 1, 4> <3, 2, 1, 4> <4, 2, 1, 3>
<1, 3, 4, 2> <2, 3, 4, 1> <3, 2, 4, 1> <4, 2, 3, 1>
<1, 4, 2, 3> <2, 4, 1, 3> <3, 4, 1, 2> <4, 3, 1, 2>
<1, 4, 3, 2> <2, 4, 3, 1> <3, 4, 2, 1> <4, 3, 2, 1>
The projection of all representatives of this equivalence class onto the first two dimensions yields:
<1, 2> <1, 3> <1, 4>
<2, 1> <2, 3> <2, 4>
<3, 1> <3, 2> <3, 4>
<4, 1> <4, 2> <4, 3>
B represents an equivalence class with 720 elements. The projection onto the first two dimensions yields 30 elements:
< 5, 6> < 5, 7> < 5, 8> < 5, 9> < 5, 10>
< 6, 5> < 6, 7> < 6, 8> < 6, 9> < 6, 10>
< 7, 5> < 7, 6> < 7, 8> < 7, 9> < 7, 10>
< 8, 5> < 8, 6> < 8, 7> < 8, 9> < 8, 10>
< 9, 5> < 9, 6> < 9, 7> < 9, 8> < 9, 10>
<10, 5> <10, 6> <10, 7> <10, 8> <10, 9>
So the distance of A and B is the square root of 8, because this is the minimum distance of two vectors from the projections. For example <3, 4> and <5, 6> yield this distance.
So, am I right with my understanding of the problem?
A really naive algorithm for n vectors with m components each would have to calculate (n - 1) distances. For each distance the algorithm would calculate the distances between m! / (m - 30)! projections for each vector. So for 100 dimensions (your lower bound) there are 2.65*10^32 possible projections for a vector. This requires calculating about 7*10^64 distances between projections and finding the minimum to obtain the distance between two vectors. And then repeat this n times.
I hope that I misunderstood you or made a mistake. Otherwise this sounds somewhere between really challenging and not feasible.
Something I thought about is ordering the vector components and trying to match them. Using Manhattan distance - if possible - may help to simplify the solution.
Here are some nice approximations:
You could calculate the center of mass of each group and then compare groups based on the distance between their centers of mass.
Another way you could do it is to hash the coordinates of each row; rows that hash to the same location are considered similar, and the two groups' similarity is updated accordingly.
Some more information would be helpful, such as:
Is the information constantly being updated, and if so, at what interval?
How up to date and how accurate does it need to be?
The naive version would be something like this: (not run through query analyser)
select groupid, min(distance) as mindist
from
  (select other.groupid as groupid,
          abs(other.value - us.value) as distance
   from g us
   join g other on other.groupid != us.groupid
   where us.groupid = ?) as d
group by groupid
order by mindist
Then, to take advantage of indicies:
select groupid, min(abs(value - usvalue)) as mindist
from
  (select other.groupid as groupid,
          max(other.value) as value,
          us.value as usvalue
   from g us
   join g other on other.groupid != us.groupid and other.value <= us.value
   where us.groupid = ?
   group by other.groupid, us.value
   union
   select other.groupid as groupid,
          min(other.value) as value,
          us.value as usvalue
   from g us
   join g other on other.groupid != us.groupid and other.value >= us.value
   where us.groupid = ?
   group by other.groupid, us.value) as nearest
group by groupid
order by mindist
This should hopefully allow MySQL to use an index to quickly find the nearest neighbors on the join.
There might be errors in this, but hopefully this line of thought will help.
All float values are greater or equal to zero but are otherwise unbounded.
If you want to do KNN on floats, use the btree_gist module for PostgreSQL and create a GIST index.
Also, for data types for which there is a natural distance metric, btree_gist defines a distance operator <->, and provides GiST index support for nearest-neighbor searches using this operator. Distance operators are provided for int2, int4, int8, float4, float8, timestamp with time zone, timestamp without time zone, time without time zone, date, interval, oid, and money.
float8 is double precision.