Translation from Datalog to SQL

I am still working out how to translate the recursion in a Datalog program into SQL, for example:
P(x,y) <- Q(x,y).
Q(x,y) <- P(x,z), A(y).
where A/1 is an EDB predicate. Thus, there is a mutual dependency between P and Q. How can this be handled for longer programs?
Moreover, is there any system that completely implements the translation? If so, which system or paper should I refer to?

If you adopt an approach of "tabling" previous conclusions and forward-chain reasoning on these to infer new conclusions, no recursive "depth" is required.
Bear in mind that Datalog places some restrictions on rules and variables that ensure finite termination and hence finitely many conclusions. Variables must have a finite range of possible values, for example.
Let's assume your example refers to constants rather than to variables:
P(x,y) <- Q(x,y).
Q(x,y) <- P(x,z), A(y).
One wrinkle is that you want A/1 to be implemented as an extended stored procedure or external code. For that I would propose tabling all the results of calling A on all possible arguments (finitely many). These are after all among the conclusions (provable statements) of your system.
Once that is done, the forward-chaining inference proceeds iteratively rather than recursively. At each step, consider each rule and apply it to premises (right-hand sides) drawn from previously tabled conclusions whenever doing so yields a new conclusion. If no rule produces a new conclusion in the current step, halt; the proof procedure is then complete.
In your example the proofs stop after all the A facts are adduced, because there are no conclusions sufficient to apply either rule to get new conclusions.
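The iteration described above can be sketched in Python as a naive bottom-up loop over sets of tuples. This is only an illustration: the function name forward_chain and the seed relation p0 (extra base facts for P, which the question does not have) are assumptions added so that the loop has something to derive.

```python
def forward_chain(p0, a):
    """Naive bottom-up evaluation of the rules
         P(x,y) <- P0(x,y).        (hypothetical seed facts, not in the question)
         P(x,y) <- Q(x,y).
         Q(x,y) <- P(x,z), A(y).
    p0 is a set of (x, y) pairs, a is a set of values for A/1.
    """
    p, q = set(p0), set()
    while True:
        # Apply each rule to the conclusions tabled so far.
        new_p = q - p
        new_q = {(x, y) for (x, _z) in p for y in a} - q
        if not new_p and not new_q:
            return p, q  # no rule produced anything new: halt
        p |= new_p
        q |= new_q


p, q = forward_chain({(1, 2)}, {7, 8})
print(sorted(p), sorted(q))
```

With an empty p0, as in the question, the loop halts at the first step with no P or Q facts, which matches the remark that the proofs stop once the A facts are tabled.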

A possible approach is to use recursive CTEs in SQL, which provide the power of transitive closure. Relational algebra + transitive closure = Datalog.
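As a rough sketch of the recursive-CTE idea, the snippet below uses Python's sqlite3 module. It assumes a hypothetical seed table p0 holding base facts for P (the question has none), and it handles the P/Q co-dependency by substituting Q's body into P's rule, so a single recursive CTE suffices. This is one possible encoding, not a general translation scheme.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE p0(x, y);   -- hypothetical seed facts for P
    CREATE TABLE a(y);       -- the EDB predicate A/1
    INSERT INTO p0 VALUES (1, 2);
    INSERT INTO a VALUES (7), (8);
""")

# P(x,y) <- P0(x,y).
# P(x,y) <- Q(x,y).  Q(x,y) <- P(x,z), A(y).   (Q inlined into P)
QUERY = """
WITH RECURSIVE p(x, y) AS (
    SELECT x, y FROM p0
    UNION                      -- UNION drops duplicates, so the fixpoint is reached
    SELECT p.x, a.y FROM p CROSS JOIN a
)
SELECT x, y FROM p ORDER BY x, y
"""
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Q itself can then be recovered with a non-recursive query over p and a.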

Logica does something like this. It translates a Datalog-like language into SQL for Google BigQuery, PostgreSQL and SQLite.

Related

Genetic algorithms for guillotine cut optimization

I've been revisiting genetic algorithms: encoding, optimizing and decoding. My first attempt was the travelling salesman problem with ordered crossover, which worked great. I then found an article that optimizes a more complex genome for a 2D packing problem.
The author encodes the problem in reverse Polish notation, which made sense. A genome combines parts with either V or H as operators,
e.g. 34H5V
When decoding, the stack has to resolve to a single element, which is the final layout. For that to work, the number of operators up to any point must be at most one less than the number of parts up to the same point, and exactly one less over the whole genome. The author then states that he used a mixed crossover: ordered crossover on the parts and binary crossover on the operators.
I mulled this over, but I cannot understand how he separates the parts and operators before crossing over and then recombines them before evaluating fitness; the article offers little detail. One idea is to run the binary crossover with the parts replaced by an "X" to keep the relative positions so they can be recombined afterwards, but then the relationship between operators and parts no longer holds.
Does anyone have a resource that deals with a similar scenario, or has anyone used this successfully?
This looked far more difficult than it actually was. When the original population is generated, you need to respect the constraints of postfix notation. When a crossover occurs, you simply build a mask of the parent,
e.g. xxxxooxoxx
where x is an object and o is an operator. Once the mask holds the positions, you can create one string of only operators and one of only objects. The operators can be recombined with a binary crossover and the objects with a partially mapped crossover. Once done, you fill the mask with the values in the order they appear in each group. Since the mask was valid, the offspring is valid too.
The only issue is reaching all possible arrangements, because without further variation everything stays limited to the masks in the initial population. He solves this with a swap mutation governed by the mutation rate:
Select an item at random.
If the item is an operator, then
A. Switch the operator to the other kind.
B. Select another item. If it is an object, make sure the postfix requirements are still met, and if so, swap the two.
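A minimal sketch of the mask-based crossover, in Python. All names here are hypothetical, the genomes are lists of single-character tokens, and two details are assumptions rather than the article's method: a simple ordered crossover is used for the parts (a partially mapped crossover would slot in the same way), and "binary crossover" is read as a uniform per-position choice between the parents' operators.

```python
import random

OPERATORS = {"H", "V"}


def is_valid_postfix(genome):
    """True if the genome reduces to exactly one stack element."""
    depth = 0
    for gene in genome:
        if gene in OPERATORS:
            depth -= 1          # an operator pops two items and pushes one
            if depth < 1:
                return False
        else:
            depth += 1
    return depth == 1


def ordered_crossover(p1, p2, rng):
    """Keep a random slice of p1 in place, fill the rest in p2's order."""
    i, j = sorted(rng.sample(range(len(p1) + 1), 2))
    middle = p1[i:j]
    rest = [g for g in p2 if g not in middle]
    return rest[:i] + middle + rest[i:]


def uniform_crossover(o1, o2, rng):
    """Pick each operator from one of the two parents at random."""
    return [rng.choice(pair) for pair in zip(o1, o2)]


def mask_crossover(parent1, parent2, rng=random):
    """Cross parts and operators separately, then refill parent1's mask."""
    mask = [g in OPERATORS for g in parent1]
    parts1 = [g for g in parent1 if g not in OPERATORS]
    parts2 = [g for g in parent2 if g not in OPERATORS]
    ops1 = [g for g in parent1 if g in OPERATORS]
    ops2 = [g for g in parent2 if g in OPERATORS]
    parts = iter(ordered_crossover(parts1, parts2, rng))
    ops = iter(uniform_crossover(ops1, ops2, rng))
    return [next(ops) if is_op else next(parts) for is_op in mask]


child = mask_crossover(list("34H5V"), list("53V4H"), random.Random(0))
print("".join(child), is_valid_postfix(child))
```

Because the child reuses the first parent's mask, it is valid postfix whenever that parent is, regardless of what the two crossovers do to the contents.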

Why is the existential necessary in strongest postconditions?

Every formulation of the strongest postcondition predicate transformer I have seen presents the assignment rule as follows:
sp(X:=E, P) = ∃v. (X=E[v/X] ∧ P[v/X])
I am wondering, why is the existential (and thus the existentially quantified variable "v") necessary in the above rule? It seems to me the strongest postconditions predicate transformer is almost identical to symbolic evaluation, in that you maintain a state (a mapping from variables to values) and a path condition (a predicate that must be true at a particular point in the program). Yet, symbolic evaluation does not rely on an existential quantifier.
So, I think I must be missing something here. Any help is appreciated!
I will give an intuitive description, since you have some knowledge of symbolic evaluation.
If you have an arbitrary map from variables to values, you cannot say anything about future state changes in the program before looking at them during the analysis.
Symbolic evaluation remembers each chosen path (as a separation of the state space), so the path does not need to be contained in the formula being solved.
Here, however, you argue about every possible path and thus need an arbitrary formula to describe the behavior.
If you kept the variable in the formula, you would argue about only one of the possible runs. If you know that your variable does not induce other paths, you can simplify this behavior.
With the weakest liberal precondition, however, you know from which possible path you start and wrap all paths together to prove properties about your system.
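A small worked instance may make the role of v more concrete. Take the assignment x := x + 1 with precondition x > 0 (an example chosen here for illustration, not taken from the question), and apply the rule quoted above:

```latex
\begin{align*}
sp(x := x + 1,\; x > 0)
  &= \exists v.\ \bigl(x = (x + 1)[v/x] \land (x > 0)[v/x]\bigr) \\
  &= \exists v.\ (x = v + 1 \land v > 0) \\
  &\equiv x > 1
\end{align*}
```

Here v stands for the value x held before the assignment. Without it, the formula would read x = x + 1 ∧ x > 0, which is unsatisfiable because it confuses the old and new values of x. A symbolic evaluator avoids the quantifier by giving the old value a fresh symbol in its state map, which plays the same role as v.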

Optimization of Lisp function calls

If my code is calling a function, and one of the function's arguments will vary based on a certain condition, is it more efficient to have the conditional statement as an argument of the function, or to call the function multiple times in the conditional statement?
Example:
(if condition (+ 4 3) (+ 5 3))
(+ (if condition 4 5) 3)
Obviously this is just an example: in the real scenario the numbers would be replaced by long, complex expressions, full of variables. The if might instead be a long cond statement.
Which would be more efficient in terms of speed, space etc?
Don't
What you care about is not performance (in this case the difference will be trivial) but code readability.
Remember,
"... a computer language is not just a way of getting a computer to
perform operations, but rather ... it is a novel formal medium for
expressing ideas about methodology"
Abelson/Sussman "Structure and
Interpretation of Computer Programs".
You are writing code primarily for others (and you yourself next year!) to read. The fact that the computer can execute it is a welcome fringe benefit.
(I am exaggerating, of course, but much less than you think).
Okay...
Now that you skipped the harangue (if you claim you did not, close your eyes and tell me which specific language I mention above), let me try to answer your question.
If you profiled your program and found that this place is the bottleneck, you should first make sure that you are using the right algorithm.
E.g., using a linearithmic sort (merge/heap) instead of a quadratic sort (bubble/insertion) will make a much bigger difference than micro-optimizations like the one you are contemplating.
Then you should disassemble both versions of your code; the shorter version is (ceteris paribus) likely to be marginally faster.
Finally, you can waste a couple of hours of machine time repeatedly running both versions on the same input on an otherwise idle box to discover that there is no statistically significant difference between the two approaches.
I agree with everything in sds's answer (except using a trick question -_-), but I think it might be nice to add an example. The code you've given doesn't have enough context to be transparent. Why 5? Why 4? Why 3? When should each be used? Should there always be only two options? The code you've got now is sort of like:
(defun compute-cost (fixed-cost transaction-type)
  (+ fixed-cost
     (if (eq transaction-type 'discount) ; hardcoded magic numbers
         3                               ; and conditions are brittle
         4)))
Remember, if you need these magic numbers (3 and 4) here, you might need them elsewhere. If you ever have to change them, you'll have to hope you don't miss any cases. It's not fun. Instead, you might do something like this:
(defun compute-cost (fixed-cost transaction-type)
  (+ fixed-cost
     (variable-cost transaction-type)))
(defun variable-cost (transaction-type)
  (case transaction-type
    ((employee) 2) ; oh, an extra case we'd forgotten about!
    ((discount) 3)
    (t 4)))
Now there's an extra function call, it's true, but computation of the magic addend is pulled out into its own component, and can be reused by anything that needs it, and can be updated without changing any other code.

Can antlr do type-dependent parsing?

Does antlr3 accept a grammar like the following example?
For the input x + y * z,
it is parsed as x+(y*z) if each of x, y, z is a number;
it is parsed as (x+y)*z if each of x, y, z is an object of a particular type T.
Also, are such grammars used sometimes, or only very rarely, in computer languages?
Thank you very much.
In general, parsers (produced by parser generators) only check syntax.
A parser (produced by any means) that can explore multiple parses (I believe ANTLR does this by backtracking; other parsing engines [GLR, Earley] do it by parallel exploration of possible parses), if augmented with semantic checking information, could reject parses that didn't meet semantic constraints.
People tend not to build such parsers in my experience, partly because the behavior is hard to explain to users. If they don't get it, your parser isn't successful; your example is especially bad IMHO in terms of explainability. They also tend not to do this because they need that type information, and that's not always convenient to collect as you parse. The GCC parsers famously do just this to parse statements such as
X*T;
and the parser is a bit of a mess because of the need to parse and collect this type information as it goes.
I suspect ANTLR can check semantic predicates. How easy it is to get type information of the kind you discuss to those semantic checks is another question; I have no experience here.
The GLR parsing engine used by our DMS Software Reengineering Toolkit does have "semantic" predicates. It isn't particularly easy to get real semantic type information to those predicates by architectural design; we wanted such predicates to be driven off of "syntax". But then, everything (including type inference) is driven off syntax. So we stick to information purely local to the reduction being proposed. This is particularly handy in (not) recognizing as separate types of parses, the following peculiar FORTRAN construct for shared-do-termination vs. nested-do-termination:
DO 10 I=1,10,1
DO 10 J=1,10,1
A(I,J)=0
10 CONTINUE
20 CONTINUE
vs.
DO 20 I=1,10,1
DO 10 J=1,10,1
A(I,J)=0
10 CONTINUE
20 CONTINUE
To the parser, at the pure syntax level, both of these look like:
DO <INT> <VAR>=...
DO <INT> <VAR>=...
<STMTS>
<INT> CONTINUE
<INT> CONTINUE
How can one determine which CONTINUE statement belongs to which DO construct with only this information? You can't.
The DMS FORTRAN parser does exactly this by having two sets of rules for DO loops, one for unshared continues, and one for shared continues. They differentiate using semantic predicates that check that the CONTINUE statement label matches the DO loop's designated label. And thus the DMS FORTRAN parser gets the loop nesting right as it parses. AFAIK, all the other FORTRAN compilers parse the statements individually, and then stitch the DO loop nests together in a post pass.
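For comparison, the post-pass approach can be sketched in a few lines of Python. This is not how DMS or any particular compiler does it; the statement representation (label, kind, target) and the function name nest_do_loops are assumptions made for the illustration.

```python
def nest_do_loops(statements):
    """Match DO statements to their terminating statements by label.

    statements: list of (label, kind, target) tuples, where label is the
    statement's own label (or None), kind is e.g. "DO" or "CONTINUE", and
    target is the terminating label named by a DO (None otherwise).
    Returns (do_index, end_index) pairs, innermost loops first.
    """
    open_loops = []  # stack of (index of DO, terminating label)
    closed = []
    for idx, (label, kind, target) in enumerate(statements):
        if kind == "DO":
            open_loops.append((idx, target))
            continue
        # A labelled statement closes every open loop naming its label,
        # innermost first; several DOs may share one terminator.
        while label is not None and open_loops and open_loops[-1][1] == label:
            do_idx, _ = open_loops.pop()
            closed.append((do_idx, idx))
    return closed


shared = [(None, "DO", 10), (None, "DO", 10), (None, "ASSIGN", None),
          (10, "CONTINUE", None), (20, "CONTINUE", None)]
nested = [(None, "DO", 20), (None, "DO", 10), (None, "ASSIGN", None),
          (10, "CONTINUE", None), (20, "CONTINUE", None)]
print(nest_do_loops(shared))
print(nest_do_loops(nested))
```

In the shared case both DO statements end at the statement labelled 10; in the nested case the outer loop ends at the statement labelled 20.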
And yes, while FORTRAN has this (confusing) construct, no other modern language that I know copied it.

Optimization of Function Calls in Haskell

Not sure what exactly to google for this question, so I'll post it directly to SO:
Variables in Haskell are immutable
Pure functions should result in same values for same arguments
From these two points it's possible to deduce that if you call somePureFunc somevar1 somevar2 in your code twice, it only makes sense to compute the value during the first call. The resulting value can be stored in some sort of a giant hash table (or something like that) and looked up during subsequent calls to the function. I have two questions:
Does GHC actually do this kind of optimization?
If it does, what is the behaviour in the case when it's actually cheaper to repeat the computation than to look up the results?
Thanks.
GHC doesn't do automatic memoization. See the GHC FAQ on Common Subexpression Elimination (not exactly the same thing, but my guess is that the reasoning is the same) and the answer to this question.
If you want to do memoization yourself, then have a look at Data.MemoCombinators.
Another way to get the effect of memoization is to exploit laziness. For example, you can define a list in terms of itself. The definition below is an infinite list of all the Fibonacci numbers (taken from the Haskell Wiki)
fibs = 0 : 1 : zipWith (+) fibs (tail fibs)
Because the list is realized lazily, it's similar to having precomputed (memoized) previous values. For example, evaluating fibs !! 10 forces the first eleven elements, so a later fibs !! 11 is much faster.
Saving every function call result (cf. hash consing) is valid but can be a giant space leak and in general also slows your program down a lot. It often costs more to check if you have something in the table than to actually compute it.