CodeChef C_HOLIC2 Solution Find the smallest N whose factorial produces P Trailing Zeroes - series

For CodeChef problem C_HOLIC2, I tried iterating over elements: 5, 10, 15, 20, 25,... and for each number checking the number of trailing zeros using the efficient technique as specified over here, but got TLE.
What is the fastest way to solve this using formula method?
Here is the Problem Link

As we know for counting the number of trailing zeros in factorial of a number, the trick used is:
The number of multiples of 5 that are less than or equal to 500 is 500÷5=100
Then, the number of multiples of 25 is 500÷25=20
Then, the number of multiples of 125 is 500÷125=4
The next power of 5 is 625, which is > than 500.
Therefore, the number of trailing zeros of is 100+20+4=124
For detailed explanation check this page
Thus, this count can be represented as:
Using this trick, given a number N you can determine the no. of trailing zeros count in its factorial. Codechef Problem Link
Now, suppose we are given the no. of trailing zeros, count and we are asked the smallest no. N whose factorial has count trailing zeros Codechef Problem Link
Here the question is how can we split count into this representation?
This is a problem because in the following examples, as we can see it becomes difficult.
The count jumps even though the no is increasing by the same amount.
As you can see from the following table, count jumps at values whose factorials have integral powers of 5 as factors e.g. 25, 50, ..., 125, ...
+-------+-----+
| count | N |
+-------+-----+
| 1 | 5 |
+-------+-----+
| 2 | 10 |
+-------+-----+
| 3 | 15 |
+-------+-----+
| 4 | 20 |
+-------+-----+
| 6 | 25 |
+-------+-----+
| 7 | 30 |
+-------+-----+
| 8 | 35 |
+-------+-----+
| 9 | 40 |
+-------+-----+
| 10 | 45 |
+-------+-----+
| 12 | 50 |
+-------+-----+
| ... | ... |
+-------+-----+
| 28 | 120 |
+-------+-----+
| 31 | 125 |
+-------+-----+
| 32 | 130 |
+-------+-----+
| ... | ... |
+-------+-----+
You can see this from any brute force program for this task, that these jumps occur frequently i.e. at 6, 12, 18, 24 in case of numbers whose factorials have 25.(Interval = 6=1×5+1)
After N=31 factorials will also have a factor of 125. Thus, these jumps corresponding to 25 will still occur with the same frequency i.e. at 31, 37, 43, ...
Now the next jump corresponding to 125 will be at 31+31 which is at 62. Thus jumps corresponding to 125 will occur at 31, 62, 93, 124.(Interval =31=6×5+1)
Now the jump corresponding to 625 will occur at 31×5+1=155+1=156
Thus you can see there exists a pattern. We need to find the formula for this pattern to proceed.
The series formed is 1, 6, 31, 156, ...
which is 1 , 1+5 , 1+5+52 , 1+5+52+53 , ...
Thus, nth term is sum of n terms of G.P. with a = 1, r = 5
Thus, the count can be something like 31+31+6+1+1, etc.
We need to find this tn which is less than count but closest to it. i.e.
Say the number is count=35, then using this we identify that tn=31 is closest. For count=63 we again see that using this formula, we get tn=31 to be the closest but note that here, 31 can be subtracted twice from count=63. Now we go on finding this n and keep on subtracting tn from count till count becomes 0.
The algorithm used is:
count=read_no()
N=0
while count!=0:
n=floor(log(4*count+1,5))
baseSum=((5**n)-1)/4
baseOffset=(5**n)*(count/baseSum) // This is integer division
count=count%baseSum
N+=baseOffset
print(N)
Here, 5**n is 5n
Let's try working this out for an example:
Say count = 70,
Iteration 1:
Iteration 2:
Iteration 3:
Take another example. Say count=124 which is the one discussed at the beginning of this page:
Iteration 1:
PS: All the images are completely owned by me. I had to use images because StackOverflow doesn't allow MathJax to be embedded. #StackOverflowShouldAllowMathJax

Related

FormulaArray not averaging out all the specified entries

Table 1:
G H I J K
| Lane | Bowler | Score | Score | Score | 1
|:-----------|------------:|:------------:|:------------:|:------------:|
| Lane 1 | Thomas| 100 | 100 | 100 | 2
| Lane 2 | column | 200 | 200 | 100 | 3
| Lane 3 | Mary | 300 | 300 | 100 | 4
| Lane 1 | Cool | 150 | 400 | 100 | 5
| Lane 2 | right | 160 | 500 | 100 | 6
| Lane 9 | Susan | 170 | 600 | 100 | 7
say I want to find the average for each Lane that appeared in table 2 and put them in column O:
Table 2:
N O
| Lane | Average | 1
|:-----------|------------:|
| Lane 1 | | 2
| Lane 2 | | 3
| Lane 3 | | 4
I would put
=AVERAGE(IF(N2=$G$2:$G$7, $I$2:$K$7 )) for lane 1 (put this formula on cell "O2")
=AVERAGE(IF(N3=$G$2:$G$7, $I$2:$K$7 )) for Lane 2 ("O3")
=AVERAGE(IF(N4=$G$2:$G$7, $I$2:$K$7 )) for Lane 2 ("O4")
My first question is
What if I want to find the Average of ALL the lane together that appear in table 2. So average of Lane 1, Lane 2 and Lane 3 together (but not other lane, such as lane 9).
My attempt:
= Average(IF(G2:G7 = N2:N4, I2:K:7)) why doesn't this work?
My second question is
I have done the "average of each individual Lane" using vba:
.
Dim i As Integer
For i = 2 To 4
Cells(i, 15).FormulaArray = "=AVERAGE(IF(RC[-1]=R2C7:R7C7,R2C9:R7C12))"
Next i
.
What if I have done it using vba without the .formula method
For Lane 1 only:
pseudo code:
Loop from G2 to G7
If cell (N1) = Gx then //x: 2 to 7
Sum = Sum + Ix + Jx + Kx
}
Average = Sum/totalEntries
Would this be slower than if I were to use the build in .formula? is there a advanage to doing it this way instead?
The answer to the first question about why this FormulaArray
= Average(IF(G2:G7 = N2:N4, I2:K7)) doesn't work?
Is implicit on how this other FormulaArray works:
= AVERAGE( IF( $G$7:$G$12 = $N7, $I$7:$K$12 ) )
Let’s see how each part of this “single-cell formula array” works:
1st part: $G$7:$G$12 = $N7
The first part of the formula generates an array with the records from range $G$7:$G$12 complying with the condition = $N7. Fig. 1 shows the first part of the FormulaArray in as a “multi-cell formula array”.
2nd Part: $I$7:$K$12
The result of the first part is applied to the second part to obtain the range of scores complying with the condition = $N7 (see Fig. 2)
3rd part: AVERAGE
Finally the last part of the formula calculates the average of the scores complying with the condition = $N7
Now let’s try to apply the same analysis to the formula:
= AVERAGE( IF( G2:G7 = N2:N4, I2:K7 ) )
Unfortunately, we cannot go beyond the first part G2:G7 = N2:N4 as it fails trying to compare two arrays of different dimensions thus resulting in #N/A (see Fig. 3)
However, even if the arrays have same dimension the result would not have shown the duplicated values, as the members are compared one to one (see Fig. 4)
To obtain the average for Lanes 1 to 3 use this FormulaArray
=AVERAGE( IF(
( $G$7:$G$12 = $N7 ) + ( $G$7:$G$12 = $N8 ) + ( $G$7:$G$12 = $N9 ),
$I$7:$K$12 ) )
It generates an array with the records complying with the conditions = $N7 + = $N8 + = $N9 (+ equivalent to operator OR)
As regards the second question:
Performance is intrinsically associated to maintenance and efficiency.
The sample procedure just enters a formula which is hard coded and only works for this particular case, for example:
If needed to change the formulas to expand the ranges, the macro has to be updated, it may still have to change the formula but no need to open the VBA editor.
If any of the columns before column G get deleted as it becomes obsolete, the macro needs to be updated, while the formulas will not require any maintenance as they are automatically updated.
In reference to the macro without the .Formula method
I found this redundant, as it’s like writing an algorithm to do something that can be done efficiently and accurately with an existing function, as such a macro will not bring anything that's it's not there actually.
I'll consider the advantage of writing such a procedure in a situation in which the workbook is very large and it heavily uses resource significantly slowing down the performance of the workbook, however the advantages to be delivered by the procedure will not reside and just writing the formulas but it must calculate the results and enter the values resulting from the formulas instead of the formulas thus making the workbook light, fast and smooth to the end user.
To get the average of them all, just use
=AVERAGE(I2:K7)
As to the VBA, as it is all done on the same lines, could you just use
For i = 2 To 7
Cells(i,"O").Value = Application.Sum(Range(Cells(i,"I"),Cells(i,"K")))
Next i

Ransack search- select rows whose sum adds up to a given value

Im using ransack search with ruby on rails and trying to output random rows between 1-6, whose time adds up to a given value specified by the search.
For example search for rows whose time value adds up to 40. In this case id 12 and 14 will be returned. Any combination between 1-6 can be randomly outputted.
If a combination of 3 rows meet the criteria then 3 rows should be outputted. likewise 1,2,3,4,5,6. If no single row or combination can be found then the output should return nil
id | title | time
----+-------------------------+-----------
26 | example | 10
27 | example | 26
14 | example | 20
28 | example | 50
12 | example | 20
20 | example | 6
Note - Not sure if ransack search is the best to perform this type of query
Thanks in advance

Tabulate Command Stata

I don't know if Stata can do this but I use the tabulate command a lot in order to find frequencies. For instance, I have a success variable which takes on values 0 to 1 and I would like to know the success rate for a certain group of observations ie tab success if group==1. I was wondering if I can do sort of the inverse of this operation. That is, I would like to know if I can find a value of "group" for which the frequency is greater than or equal to 15% for example.
Is there a command that does this?
Thanks
As an example
sysuse auto
gen success=mpg<29
Now I want to find the value of price such that the frequency of the success variable is greater than 75% for example.
According to #Nick:
ssc install groups
sysuse auto
count
74
#return list optional
local nobs=r(N) # r(N) gives total observation
groups rep78, sel(f >(0.15*`r(N)')) #gives the group for which freq >15 %
+---------------------------------+
| rep78 Freq. Percent % <= |
|---------------------------------|
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
+---------------------------------+
groups rep78, sel(f >(0.10*`nobs'))# more than 10 %
+----------------------------------+
| rep78 Freq. Percent % <= |
|----------------------------------|
| 2 8 11.59 14.49 |
| 3 30 43.48 57.97 |
| 4 18 26.09 84.06 |
| 5 11 15.94 100.00 |
+----------------------------------+
I'm not sure if I fully understand your question/situation, but I believe this might be useful. You can egen a variable that is equal to the mean of success, by group, and then see which observations have the value for mean(success) that you're looking for.
egen avgsuccess = mean(success), by(group)
tab group if avgsuccess >= 0.15
list group if avgsuccess >= 0.15
Does that accomplish what you want?

Luke reveals unknown term values for numeric fields in index

We use Lucene.net for indexing. One of the fields that we index, is a numeric field with the values 1 to 6 and 9999 for not set.
When using Luke to explore the index, we see terms that we do not recognize. The index contains a total of 38673 documents, and Luke shows the following top ranked terms for this field:
Term | Rank | Field | Text | Text (decoded as numeric-int)
1 | 38673 | Axis | x | 0
2 | 38673 | Axis | p | 0
3 | 38673 | Axis | t | 0
4 | 38673 | Axis | | | 0
5 | 19421 | Axis | l | 0
6 | 19421 | Axis | h | 0
7 | 19421 | Axis | d# | 0
8 | 19252 | Axis | ` N | 9999
9 | 19252 | Axis | l | 8192
10 | 19252 | Axis | h ' | 9984
11 | 19252 | Axis | d# p | 9984
12 | 18209 | Axis | ` | 4
13 | 950 | Axis | ` | 1
14 | 116 | Axis | ` | 5
15 | 102 | Axis | ` | 6
16 | 26 | Axis | ` | 3
17 | 18 | Axis | ` | 2
We find the same pattern for other numeric fields.
Where does the unknown values come from?
NumericFields are indexed using a trie structure. The terms you see are part of it, but will not return results if you query for them.
Try indexing your NumericField with a precision step of Int32.MaxValue and the values will go away.
NumericField documentation
... Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can use the expert constructor NumericField(String,int,Field.Store,boolean) if you'd like to change the value. Note that you must also specify a congruent value when creating NumericRangeQuery or NumericRangeFilter. For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use Integer.MAX_VALUE, which produces one term per value. ...
More details on the precision step available in the NumericRangeQuery documentation:
Good values for precisionStep are depending on usage and data type:
• The default for all data types is 4, which is used, when no
precisionStep is given.
• Ideal value in most cases for 64 bit data
types (long, double) is 6 or 8.
• Ideal value in most cases for 32 bit
data types (int, float) is 4.
• For low cardinality fields larger
precision steps are good. If the cardinality is < 100, it is fair to use •Integer.MAX_VALUE (see below).
• Steps ≥64 for long/double and
≥32 for int/float produces one token per value in the index and
querying is as slow as a conventional TermRangeQuery. But it can be
used to produce fields, that are solely used for sorting (in this case
simply use Integer.MAX_VALUE as precisionStep). Using NumericFields
for sorting is ideal, because building the field cache is much faster
than with text-only numbers. These fields have one term per value and
therefore also work with term enumeration for building distinct lists
(e.g. facets / preselected values to search for). Sorting is also
possible with range query optimized fields using one of the above
precisionSteps.
EDIT
little sample, the index produced by this will show terms with value 8192, 9984, 1792, etc in luke, but using a range that would include them in the query doesnt produce results:
NumericField number = new NumericField("number", Field.Store.YES, true);
Field regular = new Field("normal", "", Field.Store.YES, Field.Index.ANALYZED);
IndexWriter iw = new IndexWriter(FSDirectory.GetDirectory("C:\\temp\\testnum"), new StandardAnalyzer(), true);
Document doc = new Document();
doc.Add(number);
doc.Add(regular);
number.SetIntValue(1);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(2);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(13);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(2000);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(9999);
regular.SetValue("one");
iw.AddDocument(doc);
iw.Commit();
IndexSearcher searcher = new IndexSearcher(iw.GetReader());
NumericRangeQuery rangeQ = NumericRangeQuery.NewIntRange("number", 1, 2, true, true);
var docs = searcher.Search(rangeQ);
Console.WriteLine(docs.Length().ToString()); // prints 2
rangeQ = NumericRangeQuery.NewIntRange("number", 13, 13, true, true);
docs = searcher.Search(rangeQ);
Console.WriteLine(docs.Length().ToString()); // prints 1
rangeQ = NumericRangeQuery.NewIntRange("number", 9000, 9998, true, true);
docs = searcher.Search(rangeQ);
Console.WriteLine(docs.Length().ToString()); // prints 0
Console.ReadLine();

How to represent and insert into an ordered list in SQL?

I want to represent the list "hi", "hello", "goodbye", "good day", "howdy" (with that order), in a SQL table:
pk | i | val
------------
1 | 0 | hi
0 | 2 | hello
2 | 3 | goodbye
3 | 4 | good day
5 | 6 | howdy
'pk' is the primary key column. Disregard its values.
'i' is the "index" that defines that order of the values in the 'val' column. It is only used to establish the order and the values are otherwise unimportant.
The problem I'm having is with inserting values into the list while maintaining the order. For example, if I want to insert "hey" and I want it to appear between "hello" and "goodbye", then I have to shift the 'i' values of "goodbye" and "good day" (but preferably not "howdy") to make room for the new entry.
So, is there a standard SQL pattern to do the shift operation, but only shift the elements that are necessary? (Note that a simple "UPDATE table SET i=i+1 WHERE i>=3" doesn't work, because it violates the uniqueness constraint on 'i', and also it updates the "howdy" row unnecessarily.)
Or, is there a better way to represent the ordered list? I suppose you could make 'i' a floating point value and choose values between, but then you have to have a separate rebalancing operation when no such value exists.
Or, is there some standard algorithm for generating string values between arbitrary other strings, if I were to make 'i' a varchar?
Or should I just represent it as a linked list? I was avoiding that because I'd like to also be able to do a SELECT .. ORDER BY to get all the elements in order.
As i read your post, I kept thinking 'linked list'
and at the end, I still think that's the way to go.
If you are using Oracle, and the linked list is a separate table (or even the same table with a self referencing id - which i would avoid) then you can use a CONNECT BY query and the pseudo-column LEVEL to determine sort order.
You can easily achieve this by using a cascading trigger that updates any 'index' entry equal to the new one on the insert/update operation to the index value +1. This will cascade through all rows until the first gap stops the cascade - see the second example in this blog entry for a PostgreSQL implementation.
This approach should work independent of the RDBMS used, provided it offers support for triggers to fire before an update/insert. It basically does what you'd do if you implemented your desired behavior in code (increase all following index values until you encounter a gap), but in a simpler and more effective way.
Alternatively, if you can live with a restriction to SQL Server, check the hierarchyid type. While mainly geared at defining nested hierarchies, you can use it for flat ordering as well. It somewhat resembles your approach using floats, as it allows insertion between two positions by assigning fractional values, thus avoiding the need to update other entries.
If you don't use numbers, but Strings, you may have a table:
pk | i | val
------------
1 | a0 | hi
0 | a2 | hello
2 | a3 | goodbye
3 | b | good day
5 | b1 | howdy
You may insert a4 between a3 and b, a21 between a2 and a3, a1 between a0 and a2 and so on. You would need a clever function, to generate an i for new value v between p and n, and the index can become longer and longer, or you need a big rebalancing from time to time.
Another approach could be, to implement a (double-)linked-list in the table, where you don't save indexes, but links to previous and next, which would mean, that you normally have to update 1-2 elements:
pk | prev | val
------------
1 | 0 | hi
0 | 1 | hello
2 | 0 | goodbye
3 | 2 | good day
5 | 3 | howdy
hey between hello & goodbye:
hey get's pk 6,
pk | prev | val
------------
1 | 0 | hi
0 | 1 | hello
6 | 0 | hi <- ins
2 | 6 | goodbye <- upd
3 | 2 | good day
5 | 3 | howdy
the previous element would be hello with pk=0, and goodbye, which linked to hello by now has to link to hey in future.
But I don't know, if it is possible to find a 'order by' mechanism for many db-implementations.
Since I had a similar problem, here is a very simple solution:
Make your i column floats, but insert integer values for the initial data:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
Then, if you want to insert something in between, just compute a float value in the middle between the two surrounding values:
pk | i | val
------------
1 | 0.0 | hi
0 | 2.0 | hello
2 | 3.0 | goodbye
3 | 4.0 | good day
5 | 6.0 | howdy
6 | 2.5 | hey
This way the number of inserts between the same two values is limited to the resolution of float values but for almost all cases that should be more than sufficient.