Dependent types over datatypes - dependent-type

I struggled for a bit with writing a function that could only be passed certain days of the week. I expected that this would work:
datatype days = sunday | monday | tuesday | wednesday | thursday | friday | saturday
fn only_sunday{d:days | d == sunday}(d: days(d)): days(d) = d
but of course, days(d) was never even defined. That only seemed like it would work because ATS has so many built-in types: int, and also int(int), etc.
This also doesn't work, but perhaps it's just the syntax that's wrong?
typedef only_sunday = { d:days | d == sunday }
fn only_sunday(d: only_sunday): only_sunday = d
After revisiting the Dependent Types chapter of Introduction to Programming in ATS, I realized that this works:
datatype days(int) =
| sunday(0)
| monday(1)
| tuesday(2)
| wednesday(3)
| thursday(4)
| friday(5)
| saturday(6)
fn print_days{n:int}(d: days(n)): void =
print_string(case+ d of
| sunday() => "sunday"
| monday() => "monday"
| tuesday() => "tuesday"
| wednesday() => "wednesday"
| thursday() => "thursday"
| friday() => "friday"
| saturday() => "saturday")
overload print with print_days
fn sunday_only{n:int | n == 0}(d: days(n)): days(n) = d
implement main0() =
println!("this typechecks: ", sunday_only(sunday))
but of course, it's a little less clear that n == 0 means "the day must be sunday" than it would be with code like d == sunday. And although it doesn't seem that unusual to map days to numbers, consider:
datatype toppings(int) = lettuce(0) | swiss_cheese(1) | mushrooms(2) | ...
In this case the numbers are completely senseless, such that you can only understand any {n:int | n != 1} ... toppings(n) code as anti-swiss-cheese if you have the datatype definition at hand. And if you were to edit in a new topping
datatype toppings(int) = lettuce(0) | tomatoes(1) | swiss_cheese(2) | ...
then it would be quite a chore to find the 1s that mean Swiss cheese (and only those) and update them to 2.
Is there a more symbolic way to use dependent types?

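You could try something like this, giving each index a symbolic static name with stadef so that constraints can refer to it by name (e.g. {n:int | n != swiss_cheese}):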
stadef lettuce = 0
stadef swiss_cheese = 1
stadef mushrooms = 2
datatype
toppings(int) =
| lettuce(lettuce)
| swiss_cheese(swiss_cheese)
| mushrooms(mushrooms)

Confusing matching behaviour of pandas extract(all)

I have a strange problem. But first, some context: I want to match a hierarchy-based string against the values of a column in a pandas DataFrame and count the occurrences of the current node and all of its children.
| index | hierarchystr |
| ----- | --------------------- |
| 0 | level0level00level000|
| 1 | level0level01 |
| 2 | level0level02level021|
| 3 | level0level02level021|
| 4 | level0level02level020|
| 5 | level0level02level021|
| 6 | level1level02level021|
| 7 | level1level02level021|
| 8 | level1level02level021|
| 9 | level2level02level021|
Assume that there are 300k rows. Each node can have multiple children, which can again have multiple children, and so on (represented here by the level0-2 strings). I extract the hierarchy strings from a separate hierarchy. Now to the problem:
import pandas as pd
hstrs = ["level0", "level1", "level0level01", "level0level02", "level0level02level021"]
pat = "|".join(hstrs)
# a single capture group: extract() yields one label per row
s = df.hierarchystr.str.extract('(' + pat + ')', expand=True)[0]
df1 = df.groupby(s).size().reset_index(name='Count')
df1 = df1[df1['Count'] > 200]  # filter on the count column, not the whole frame
size = len(df1)
The number of matched substrings occurring more than 200 times differs on every run! "level0" should match every row whose hierarchy string contains level0, forming one group with all its subchildren, and that group's size needs to be greater than 200.
Edit:// levelX is just an example; I have thousands of nodes with different names and, again, thousands of different subchildren. The hstrs strings do not include each other, except for parent nodes. (E.g. "parent1" is included in "parent1subchild1" and "parent1subchild2".)
I traced it back to the order of the hierarchy strings in hstrs, which differs between runs. So I changed the code to check each substring individually:
result = []
for hstr in hstrs:
    s = df.hierarchystr.str.extract('(' + hstr + ')', expand=True)
    s2 = s.count()  # number of rows in which hstr was found
    s3 = s2.values[0]
    if s3 > 200:
        result.append(hstr)
This is slow as hell, but the result stays the same no matter which order hstrs has. For efficiency, is it possible to do the same with a single regex match, for all hstrs at once?
Edit://
The expected output would be:
|index| 0 | Count |
|-----|---------------------|-------|
|0 |level0 | 5 |
|1 |level1 | 3 |
|2 |level0level01 | 1 |
|3 |level0level02 | 4 |
|4 |level0level02level021| 3 |
Edit2://
It has something to do with the ordering of hstrs. I think it is the behavior of the extract method, which stops after the first match. If the ordering is different, the hierarchy strings in pat are matched differently, which results in different sizes for each group. A high hierarchy level (short string) is matched first, and the lower hierarchy levels in the same pat won't be matched again. But I don't know what to do about this behavior.
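The ordering effect can be reproduced with Python's re module, which pandas uses for str.extract: the engine reports the match at the leftmost position, and among the alternatives that match at that position it takes the first one listed. A minimal sketch (first_match is a hypothetical helper; re.escape is an added assumption so that node names are matched literally):

```python
import re


def first_match(alternatives, text):
    """Return what one '(a|b|...)' group captures in text, as str.extract does per row."""
    # re.escape makes each hierarchy string match literally
    pattern = "(" + "|".join(re.escape(a) for a in alternatives) + ")"
    m = re.search(pattern, text)
    return m.group(1) if m else None


row = "level0level02level021"
print(first_match(["level0", "level0level02", "level0level02level021"], row))  # level0
print(first_match(["level0level02level021", "level0level02", "level0"], row))  # level0level02level021
```

Sorting hstrs (e.g. longest first) would make the result deterministic, but a single capture group still yields one label per row, so a row can never be counted for both a node and its parent this way.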
Edit3://
An alternative, which is also slow as hell:
beforeset = []
for hstr in hstrs:
    # regex=False: treat hstr as a literal substring, not a pattern
    s = df[df.hierarchystr.str.contains(hstr, regex=False)]
    if len(s) > 200:  # number of rows containing hstr
        beforeset.append(hstr)
Edit4://
I think what I'm looking for is a way to do a "group_by" with "contains" or "is in" for the hstrs. I'm glad for any idea. :)
Edit5://
I found a simple but unsatisfying alternative (though faster than the previous attempts):
from collections import Counter
import pandas as pd
containing = [item for hierarchystr in df.hierarchystr for item in hstrs if item in hierarchystr]
containing = Counter(containing)
df1 = pd.DataFrame([containing]).T
nodeNamesWithOver200 = df1[df1 > 200].dropna().index.values
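A possible refinement of the Edit5 approach, sketched under the assumption that many of the 300k rows share the same hierarchystr: count each distinct string once with collections.Counter, then add its multiplicity to every hstr it contains. Unlike a single extract pattern, this credits a row to every matching node (parents included), so the result does not depend on the order of hstrs. count_containing is a hypothetical helper name.

```python
from collections import Counter


def count_containing(values, hstrs):
    """For each hierarchy string in hstrs, count the values that contain it."""
    distinct = Counter(values)  # each distinct string -> number of rows
    counts = Counter({h: 0 for h in hstrs})
    for value, n in distinct.items():
        for h in hstrs:
            if h in value:  # every matching node is credited, parents included
                counts[h] += n
    return counts
```

With a DataFrame this would be called as count_containing(df.hierarchystr, hstrs), and the nodes over the threshold are [h for h, n in counts.items() if n > 200]. Note that plain substring matching also finds "level0" inside "level1level02level021", because "level02" contains "level0", so with the sample table every one of the 10 rows counts toward "level0". If the hstrs are full paths from the root (an assumption), checking value.startswith(h) instead of h in value would avoid such accidental matches.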

Do While statement not progressing to second result set in laravel using PDO

I have a stored procedure that returns multiple result sets. The results contain stats for a player; each result set represents a year for which we have stats for him.
Stored procedure output from the mysql CLI:
+----------+-----------+---------+------+
| compperc | passyards | passtds | ints |
+----------+-----------+---------+------+
| 61.4 | 319 | 2 | 1 |
| 85.7 | 76 | 0 | 0 |
| 20.0 | 9 | 0 | 1 |
| 57.1 | 30 | 0 | 0 |
| 100.0 | 59 | 1 | 0 |
| 66.7 | 21 | 0 | 0 |
| 50.0 | 86 | 1 | 0 |
| 60.0 | 38 | 0 | 0 |
+----------+-----------+---------+------+
8 rows in set (0.00 sec)
+----------+-----------+---------+------+
| compperc | passyards | passtds | ints |
+----------+-----------+---------+------+
| 80.0 | 40 | 0 | 0 |
| 0.0 | 0 | 0 | 0 |
| 100.0 | 40 | 0 | 0 |
+----------+-----------+---------+------+
3 rows in set (0.00 sec)
I'm using Laravel 4.1.2 and call the procedure in my Player Model with a raw PDO prepared statement:
$statDB = DB::connection('mysql')->getPdo();
$statDB->setAttribute(PDO::ATTR_EMULATE_PREPARES, true);
$results = $statDB->prepare("CALL fullstats(:id);");
$results->execute(array(':id'=> $id));
The previous block of code pulls in the proper result sets (manually iterating with if($results->nextRowset()) { $statArr[] = $results->fetchAll(PDO::FETCH_ASSOC); } works), but when I try to iterate through them with a do-while statement, it never gets to the second result set.
$statArr = array();
do
{
$statArr[] = $results->fetchAll(PDO::FETCH_ASSOC);
} while ($results->nextRowset());
I can add dd($statArr); immediately after I set the initial $statArr[] and it will return the set of stats for the first year. I can also add dd($results->nextRowset()); after I set the $statArr[] and it returns true, so it theoretically should move through the additional result sets. If I let the statement execute, I get a generic error from Laravel: PDOException SQLSTATE[HY000]: General error. It provides no additional details as to what's going wrong. I've tried the same do-while statement in a raw PHP file (on a different domain but the same server) using PDO, and it works without a problem.
Is there some configuration option that I need to set to get this to work? I've been beating my head against the problem for an entire day and can't figure out why this isn't working. Any help is greatly appreciated.
Update:
The PDOException SQLSTATE[HY000]: General error is coming from the second $statArr[] = $results->fetchAll(); so it's entering the second result set, it just won't fetch the data. I also removed the PDO::FETCH_ASSOC from fetchAll() as I've read it doesn't work properly but the issue persists.
It turns out this is a bug in the PDO/mysql driver. The nextRowset() method doesn't return false when it's out of rowsets, so the fetch statement tries to access data that isn't there. This causes the nondescript general error from the fetchAll() method and in turn breaks the script.
The PHP bug report:
https://bugs.php.net/bug.php?id=62820
Since the nextRowset() method wasn't working properly, I added a query that gives me the number of result sets I should expect, plus a counter in the loop with an if statement that breaks out of the loop once the loop count equals the count returned by the query.
Here's the working function:
public function allstats($id) {
    $statDB = DB::connection('mysql')->getPdo();
    $statDB->setAttribute(PDO::ATTR_EMULATE_PREPARES, true);
    // Number of result sets (one per year) the procedure should return
    $years = $statDB->prepare('select count(distinct year) from pass_stats where player_id = :id');
    $years->execute(array(':id'=> $id));
    $years = (int) $years->fetchColumn();
    $results = $statDB->prepare("CALL fullstats(:id);");
    $results->execute(array(':id'=> $id));
    $statArr = array();
    $i = 1; // Set to one to mimic sql count
    do
    {
        foreach($results->fetchAll(PDO::FETCH_ASSOC) as $g) {
            $statArr[$g['year']][] = $g;
        }
        // Stop after the last expected result set: nextRowset() may not return
        // false when it runs out of rowsets, and fetching past the end fails
        if($i >= $years) { break; }
        $i++;
    } while ($results->nextRowset()); // Check for next rowset
    return $statArr;
}
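The workaround can be sketched language-neutrally with Python's DB-API method names (fetchall, nextset). BuggyCursor is a hypothetical stand-in that mimics the behaviour described above: nextset() never returns False, and fetching past the last result set raises. The loop stops as soon as the expected number of result sets has been read instead of trusting nextset():

```python
class BuggyCursor:
    """Hypothetical cursor mimicking the reported driver bug."""

    def __init__(self, result_sets):
        self._sets = result_sets
        self._pos = 0

    def fetchall(self):
        if self._pos >= len(self._sets):
            raise RuntimeError("SQLSTATE[HY000]: General error")
        return self._sets[self._pos]

    def nextset(self):
        self._pos += 1
        return True  # never reports that the result sets are exhausted


def fetch_expected_sets(cursor, expected):
    """Read exactly `expected` result sets (the count comes from a separate query)."""
    sets = []
    if expected <= 0:
        return sets
    while True:
        sets.append(cursor.fetchall())
        if len(sets) == expected:  # checked after fetching, so the last set is kept
            break
        if not cursor.nextset():
            break
    return sets
```

Checking the counter after the fetch rather than before it is what keeps the final expected result set; checking before the fetch would stop one set early.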

Optimizing working scheduling MiniZinc code - constraint programming

Please can you help optimize this working MiniZinc code:
Task: There is a conference with 6 time slots. There are 3 speakers attending the conference, each available only in certain slots. Each speaker will present for a predetermined number of slots.
Objective: Produce the schedule in which the last speaker finishes as early as possible.
Example: Speakers A, B & C. Talk durations = [1, 2, 1]
Speaker availability:
+---+------+------+------+
| | Sp.A | Sp.B | Sp.C |
+---+------+------+------+
| 1 | | Busy | |
| 2 | Busy | Busy | Busy |
| 3 | Busy | Busy | |
| 4 | | | |
| 5 | | | Busy |
| 6 | Busy | Busy | |
+---+------+------+------+
Link to working MiniZinc code: http://pastebin.com/raw.php?i=jUTaEDv0
What I'm hoping to optimize:
% ensure allocated slots don't overlap and the allocated slot is free for the speaker
constraint
forall(i in 1..num_speakers) (
ending_slot[i] = starting_slot[i] + app_durations[i] - 1
) /\
forall(i,j in 1..num_speakers where i < j) (
no_overlap(starting_slot[i], app_durations[i], starting_slot[j], app_durations[j])
) /\
forall(i in 1..num_speakers) (
forall(j in 1..app_durations[i]) (
starting_slot[i]+j-1 in speaker_availability[i]
)
)
;
Expected solution:
+---+----------+----------+----------+
| | Sp.A | Sp.B | Sp.C |
+---+----------+----------+----------+
| 1 | SELECTED | Busy | |
| 2 | Busy | Busy | Busy |
| 3 | Busy | Busy | SELECTED |
| 4 | | SELECTED | |
| 5 | | SELECTED | Busy |
| 6 | Busy | Busy | |
+---+----------+----------+----------+
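For an instance this small, the expected solution can be double-checked by brute force. A minimal Python sketch, assuming the free slots are read off the availability table above (A: 1, 4, 5; B: 4, 5; C: 1, 3, 4, 6), that a talk of duration d starting at s occupies slots s..s+d-1, and that no two talks may share a slot; best_schedule is a hypothetical helper, not part of the MiniZinc model:

```python
from itertools import product


def best_schedule(num_slots, durations, available):
    """Return (z, starts) minimising the last occupied slot z, or None if infeasible."""
    best = None
    # a talk of duration d may start at slots 1 .. num_slots - d + 1
    ranges = [range(1, num_slots - d + 2) for d in durations]
    for starts in product(*ranges):
        occupied = [set(range(s, s + d)) for s, d in zip(starts, durations)]
        # every slot of a talk must be free for its speaker
        if any(not occ <= free for occ, free in zip(occupied, available)):
            continue
        # no two talks may share a slot
        if sum(len(occ) for occ in occupied) != len(set().union(*occupied)):
            continue
        z = max(s + d - 1 for s, d in zip(starts, durations))
        if best is None or z < best[0]:
            best = (z, list(starts))
    return best


# free slots per speaker, read off the availability table
AVAILABLE = [{1, 4, 5}, {4, 5}, {1, 3, 4, 6}]
print(best_schedule(6, [1, 2, 1], AVAILABLE))  # (5, [1, 4, 3])
```

This enumerates every combination of start slots, so it is only a sanity check for tiny instances; the constraint model is what scales.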
I'm hakank (author of the original model). If I understand it correctly, your question now is how to present the table for the optimal solution, not really about finding the solution itself (all FlatZinc solvers I tested solved it in no time).
One way of creating the table is to have a helper matrix ("m") that records whether a speaker is selected (1), busy (-1), or free but not selected (0):
array[1..num_slots, 1..num_speakers] of var -1..1: m;
Then the matrix must be connected to the other decision variables ("starting_slot" and "ending_slot"):
% connect to matrix m
constraint
forall(t in 1..num_slots) (
forall(s in 1..num_speakers) (
(not(t in speaker_availability[s]) <-> m[t,s] = -1)
/\
((t >= starting_slot[s] /\ t <= ending_slot[s]) <-> m[t,s] = 1)
)
)
;
Then the matrix "m" can be printed like this:
% ...
++
[
if s = 1 then "\n" else " " endif ++
if fix(m[t,s]) = -1 then
"Busy "
elseif fix(m[t,s]) = 1 then
"SELECTED"
else
" "
endif
| t in 1..num_slots, s in 1..num_speakers
]
;
As always, there is more than one way of doing this, but I settled on this one since it's quite direct.
Here's the complete model:
http://www.hakank.org/minizinc/scheduling_speakers_optimize.mzn
Update: Adding the output of the model:
Starting: [1, 4, 3]
Durations: [1, 2, 1]
Ends: [1, 5, 3]
z: 5
SELECTED Busy
Busy Busy Busy
Busy Busy SELECTED
SELECTED
SELECTED Busy
Busy Busy
----------
==========
Update 2:
Another way is to use cumulative/4 instead of no_overlap/4 which should be more effective, i.e.
constraint
forall(i in 1..num_speakers) (
ending_slot[i] = starting_slot[i] + app_durations[i] - 1
)
% /\ % use cumulative instead (see below)
% forall(i,j in 1..num_speakers where i < j) (
% no_overlap(starting_slot[i], app_durations[i], starting_slot[j], app_durations[j])
% )
/\
forall(i in 1..num_speakers) (
forall(j in 1..app_durations[i]) (
starting_slot[i]+j-1 in speaker_availability[i]
)
)
/\ cumulative(starting_slot, app_durations, [1 | i in 1..num_speakers], 1)
;
Here's the altered version (which gives the same result):
http://www.hakank.org/minizinc/scheduling_speakers_optimize2.mzn
(I've also skipped the presentation matrix "m" and do all presentation in the output section.)
For this simple problem instance, there is no discernible difference, but for larger instances this should be faster. (And for larger instances, one might want to test different search heuristics instead of "solve minimize z".)
As I commented on your previous question, "Constraint Programming: Scheduling speakers in shortest time", the cumulative constraint is appropriate for this. I don't have MiniZinc code handy, but here is the model in ECLiPSe (http://eclipseclp.org):
:- lib(ic).
:- lib(ic_edge_finder).
:- lib(branch_and_bound).
solve(JobStarts, Cost) :-
AllUnavStarts = [[2,6],[1,6],[2,5]],
AllUnavDurs = [[2,1],[3,1],[1,1]],
AllUnavRess = [[1,1],[1,1],[1,1]],
JobDurs = [1,2,1],
Ressources = [1,1,1],
length(JobStarts, 3),
JobStarts :: 1..9,
% the jobs must not overlap with each other
cumulative(JobStarts, JobDurs, Ressources, 1),
% for each speaker, no overlap of job and unavailable periods
(
foreach(JobStart,JobStarts),
foreach(JobDur,JobDurs),
foreach(UnavStarts,AllUnavStarts),
foreach(UnavDurs,AllUnavDurs),
foreach(UnavRess,AllUnavRess)
do
cumulative([JobStart|UnavStarts], [JobDur|UnavDurs], [1|UnavRess], 1)
),
% Cost is the maximum end date
( foreach(S,JobStarts), foreach(D,JobDurs), foreach(S+D,JobEnds) do true ),
Cost #= max(JobEnds),
minimize(search(JobStarts,0,smallest,indomain,complete,[]), Cost).
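To see what the cumulative calls in this model require of a fixed assignment, here is a Python checker sketch. It only tests a given solution and does none of the propagation a CP solver performs. Following the ECLiPSe model, a task occupies the half-open interval [start, start + duration), and each speaker's unavailable periods are added as fixed dummy tasks (data copied from AllUnavStarts and AllUnavDurs) so that a capacity of 1 forbids overlapping them. cumulative_ok and speaker_ok are hypothetical names.

```python
def cumulative_ok(starts, durations, resources, capacity):
    """Check a fixed assignment: at every time point, the tasks running then
    use at most `capacity`. Task i occupies [starts[i], starts[i] + durations[i])."""
    end = max(s + d for s, d in zip(starts, durations))
    for t in range(min(starts), end):
        load = sum(r for s, d, r in zip(starts, durations, resources) if s <= t < s + d)
        if load > capacity:
            return False
    return True


def speaker_ok(start, dur, unav_starts, unav_durs):
    """One speaker's talk plus their unavailable periods as fixed dummy tasks, capacity 1."""
    n = 1 + len(unav_starts)
    return cumulative_ok([start] + unav_starts, [dur] + unav_durs, [1] * n, 1)


# data copied from the ECLiPSe model
ALL_UNAV_STARTS = [[2, 6], [1, 6], [2, 5]]
ALL_UNAV_DURS = [[2, 1], [3, 1], [1, 1]]
JOB_DURS = [1, 2, 1]

job_starts = [1, 4, 3]
print(cumulative_ok(job_starts, JOB_DURS, [1, 1, 1], 1))  # True: talks never overlap
print(all(speaker_ok(s, d, us, ud) for s, d, us, ud
          in zip(job_starts, JOB_DURS, ALL_UNAV_STARTS, ALL_UNAV_DURS)))  # True
print(max(s + d for s, d in zip(job_starts, JOB_DURS)))  # 6, the ECLiPSe Cost
```

Because the ECLiPSe cost uses S+D (an exclusive end), the schedule [1, 4, 3] has cost 6 there, whereas the MiniZinc model's inclusive ending_slot reports z = 5 for the same schedule.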

Dynamically select a column from a generic list

I have a table that is 200 columns wide and need to return the data of a specific row and column but I won't know the column until runtime. I can easily get the row I want into either a list, an individual strongly typed object, or an Array through LINQ but I can't for the life of me figure out how to find the column I need.
So, for instance (on a smaller scale), my table looks like this:
GrowerKey | day1 | day2 | day3 | day4 |
-----------------------------------------
3 | 1 | 3 | 2 | 2 |
4 | 6 | 1 | 9 | 1 |
5 | 8 | 8 | 2 | 4 |
and I can get the row I want with something simple like this
Dim CleanRecord As List(Of Grower_Clean_Schedule) = (From key In eng.Grower_Clean_Schedules
Where key.Grower_Key = Grower_Key).ToList
How do I then return only the value of a specific column of that row (say, the value stored in "day2") when I won't know which column until runtime?
Something like this (starting with CleanRecord which you defined in your question):
Imports System.Reflection ' needed for BindingFlags (at the top of the file)
Dim matchingRow = CleanRecord.First()
Dim props = matchingRow.GetType().GetProperties( _
    BindingFlags.Instance Or BindingFlags.Public)
Dim myReturnVal = (From prop In props _
                   Where prop.Name = "day2" _
                   Select prop.GetValue(matchingRow, Nothing)).FirstOrDefault()
Return myReturnVal
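The same look-up-by-name idea, sketched in Python for comparison: getattr plays the role of the reflection call PropertyInfo.GetValue. GrowerCleanSchedule and column_value are hypothetical stand-ins for the LINQ entity and query in the question, using the small table above as data.

```python
from dataclasses import dataclass


@dataclass
class GrowerCleanSchedule:
    """Hypothetical stand-in for the Grower_Clean_Schedule entity."""
    grower_key: int
    day1: int
    day2: int
    day3: int
    day4: int


ROWS = [
    GrowerCleanSchedule(3, 1, 3, 2, 2),
    GrowerCleanSchedule(4, 6, 1, 9, 1),
    GrowerCleanSchedule(5, 8, 8, 2, 4),
]


def column_value(rows, grower_key, column):
    """Value of `column` (chosen at runtime) in the row with grower_key, or None."""
    row = next((r for r in rows if r.grower_key == grower_key), None)
    if row is None:
        return None
    return getattr(row, column, None)


print(column_value(ROWS, 4, "day2"))  # 1
```

Returning None for an unknown column mirrors FirstOrDefault in the VB.NET answer; unlike CleanRecord.First(), a missing row also yields None here rather than an exception.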

Luke reveals unknown term values for numeric fields in index

We use Lucene.net for indexing. One of the fields that we index is a numeric field with the values 1 to 6, and 9999 for "not set".
When using Luke to explore the index, we see terms that we do not recognize. The index contains a total of 38673 documents, and Luke shows the following top ranked terms for this field:
Term | Rank | Field | Text | Text (decoded as numeric-int)
1 | 38673 | Axis | x | 0
2 | 38673 | Axis | p | 0
3 | 38673 | Axis | t | 0
4 | 38673 | Axis | | | 0
5 | 19421 | Axis | l | 0
6 | 19421 | Axis | h | 0
7 | 19421 | Axis | d# | 0
8 | 19252 | Axis | ` N | 9999
9 | 19252 | Axis | l | 8192
10 | 19252 | Axis | h ' | 9984
11 | 19252 | Axis | d# p | 9984
12 | 18209 | Axis | ` | 4
13 | 950 | Axis | ` | 1
14 | 116 | Axis | ` | 5
15 | 102 | Axis | ` | 6
16 | 26 | Axis | ` | 3
17 | 18 | Axis | ` | 2
We find the same pattern for other numeric fields.
Where do the unknown values come from?
NumericFields are indexed using a trie structure. The terms you see are part of it, but will not return results if you query for them.
Try indexing your NumericField with a precision step of Int32.MaxValue and the values will go away.
NumericField documentation
... Within Lucene, each numeric value is indexed as a trie structure, where each term is logically assigned to larger and larger pre-defined brackets (which are simply lower-precision representations of the value). The step size between each successive bracket is called the precisionStep, measured in bits. Smaller precisionStep values result in larger number of brackets, which consumes more disk space in the index but may result in faster range search performance. The default value, 4, was selected for a reasonable tradeoff of disk space consumption versus performance. You can use the expert constructor NumericField(String,int,Field.Store,boolean) if you'd like to change the value. Note that you must also specify a congruent value when creating NumericRangeQuery or NumericRangeFilter. For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use Integer.MAX_VALUE, which produces one term per value. ...
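The numbers Luke decodes can be reproduced with a short Python sketch of the trie scheme described above, assuming a 32-bit int field and the default precisionStep of 4: one term is indexed per shift 0, 4, ..., 28, holding the value with its lowest shift bits cleared. This shows decoded values only; Lucene's actual prefix-coded term bytes (the odd characters in the Text column) are not modelled. trie_terms is a hypothetical helper.

```python
def trie_terms(value, precision_step=4, bits=32):
    """(shift, decoded value) for each trie term of a numeric value:
    one term per shift 0, precision_step, ... below `bits`, with the
    lowest `shift` bits of the value cleared."""
    terms = []
    for shift in range(0, bits, precision_step):
        mask = (1 << shift) - 1
        terms.append((shift, value & ~mask))
    return terms


for v in (4, 2000, 9999):
    print(v, [decoded for _, decoded in trie_terms(v)])
# 4 [4, 0, 0, 0, 0, 0, 0, 0]
# 2000 [2000, 2000, 1792, 0, 0, 0, 0, 0]
# 9999 [9999, 9984, 9984, 8192, 0, 0, 0, 0]
```

For 9999 this gives 9999, 9984, 9984 and 8192, the values in the question's table, and the seven terms decoded as 0 are consistent with the document counts: values 1 to 6 decode to 0 at shifts 4 through 28, 9999 only at shifts 16 through 28, so four of those terms appear in all 38673 documents and three only in the 19421 documents with values 1 to 6. With a precisionStep of at least the bit width (e.g. Int32.MaxValue), only the shift-0 term remains.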
More details on the precision step available in the NumericRangeQuery documentation:
Good values for precisionStep are depending on usage and data type:
• The default for all data types is 4, which is used, when no precisionStep is given.
• Ideal value in most cases for 64 bit data types (long, double) is 6 or 8.
• Ideal value in most cases for 32 bit data types (int, float) is 4.
• For low cardinality fields larger precision steps are good. If the cardinality is < 100, it is fair to use Integer.MAX_VALUE (see below).
• Steps ≥64 for long/double and ≥32 for int/float produces one token per value in the index and querying is as slow as a conventional TermRangeQuery. But it can be used to produce fields, that are solely used for sorting (in this case simply use Integer.MAX_VALUE as precisionStep). Using NumericFields for sorting is ideal, because building the field cache is much faster than with text-only numbers. These fields have one term per value and therefore also work with term enumeration for building distinct lists (e.g. facets / preselected values to search for). Sorting is also possible with range query optimized fields using one of the above precisionSteps.
EDIT
A little sample: the index produced by this will show terms with values 8192, 9984, 1792, etc. in Luke, but a range query that would include them doesn't produce results:
NumericField number = new NumericField("number", Field.Store.YES, true);
Field regular = new Field("normal", "", Field.Store.YES, Field.Index.ANALYZED);
IndexWriter iw = new IndexWriter(FSDirectory.GetDirectory("C:\\temp\\testnum"), new StandardAnalyzer(), true);
Document doc = new Document();
doc.Add(number);
doc.Add(regular);
number.SetIntValue(1);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(2);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(13);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(2000);
regular.SetValue("one");
iw.AddDocument(doc);
number.SetIntValue(9999);
regular.SetValue("one");
iw.AddDocument(doc);
iw.Commit();
IndexSearcher searcher = new IndexSearcher(iw.GetReader());
NumericRangeQuery rangeQ = NumericRangeQuery.NewIntRange("number", 1, 2, true, true);
var docs = searcher.Search(rangeQ);
Console.WriteLine(docs.Length().ToString()); // prints 2
rangeQ = NumericRangeQuery.NewIntRange("number", 13, 13, true, true);
docs = searcher.Search(rangeQ);
Console.WriteLine(docs.Length().ToString()); // prints 1
rangeQ = NumericRangeQuery.NewIntRange("number", 9000, 9998, true, true);
docs = searcher.Search(rangeQ);
Console.WriteLine(docs.Length().ToString()); // prints 0
Console.ReadLine();