What is the DLR's three-level caching strategy? - .net-4.0

I just heard that the DLR has a three-level caching strategy, but what is it? A simple explanation with a simple example would be very helpful.
Thanks

This is how I understand it: the idea of the caching is to reuse expressions wherever possible, to reduce the overhead of dynamic expression evaluation relative to static evaluation.
Imagine a dynamic expression:
>> a + b
The first time this is worked out, an expression/syntax tree needs to be created (if one doesn't already exist). It is of the form:
if a is an int and not null and b is an int and not null then result = a + b
This is essentially a rule that can be evaluated, and if it holds, the expression can be used. Hence we have a level 1 cache.
Level 2 is similar but uses a more complex rule, probably along the lines of:
if a is an int and not null and b is an int and not null then result = a + b
if a is string and b is an int then do Int.Parse(a) + b
etc...
Level 3 is more complex still.
If no expression can be found, a new one is created and added to one of the caches (though I don't know the details of that).
As I understand it, L1 holds 1 rule, L2 about 10 rules, and L3 about 100 rules.
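To make the shape of this concrete, here is a rough Python sketch of a rule-based call-site cache. This is only an analogy for the mechanism described above, not the actual DLR implementation; all class and method names are invented for illustration.

# Conceptual sketch of a rule-based call-site cache (not the real DLR code).
# A "rule" pairs a guard (the applicability test) with a compiled action.
class Rule:
    def __init__(self, guard, action):
        self.guard = guard    # e.g. "both operands are non-null ints"
        self.action = action  # e.g. the specialised a + b implementation

class CallSite:
    def __init__(self):
        self.l1 = []  # level 1: the rule(s) this particular call site used last

    def invoke(self, a, b):
        # Level 1: try the rules cached at this specific call site first.
        for rule in self.l1:
            if rule.guard(a, b):
                return rule.action(a, b)
        # Levels 2 and 3 would consult progressively larger shared caches here;
        # only when nothing matches is a brand new rule built and cached.
        rule = self._bind(a, b)
        self.l1.append(rule)
        return rule.action(a, b)

    def _bind(self, a, b):
        # Build a new rule for the operand types actually observed.
        if isinstance(a, int) and isinstance(b, int):
            return Rule(lambda x, y: isinstance(x, int) and isinstance(y, int),
                        lambda x, y: x + y)
        raise TypeError("no binding rule for %r + %r" % (type(a), type(b)))

site = CallSite()
print(site.invoke(1, 2))  # builds a rule and caches it at the site
print(site.invoke(3, 4))  # level 1 hit: the guard passes, no rebinding needed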
I got all this from reading around the subject on Google:
- http://dotnetslackers.com/articles/csharp/Dissecting-C-Sharp-4-0-Dynamic-Programming.aspx
- http://msdn.microsoft.com/en-us/magazine/cc163344.aspx
and some others I cannot recall now.

Related

What will have a better performance on large databases - IN or OR? [duplicate]

Which operator in Oracle gives better performance, IN or OR?
ex:
select * from table where y in (1,2,3)
or
select * from table where y = 1 or y = 2 or y = 3
You'd want to do an explain plan to be sure, but I would expect the performance to be identical.
The two statements are equivalent; the optimizer will generate the same access path. Therefore you should choose the one that is most readable (I'd say the first one).
I would hesitate to use OR like that. You need to be careful when you add additional criteria: for instance, adding an AND requires you to remember to add parentheses.
eg:
select * from table where y = 1 or y = 2 or y = 3
gets changed to:
select * from table where ( y = 1 or y = 2 or y = 3 ) AND x = 'value'
It is quite easy to forget to include the parentheses and inject a difficult-to-diagnose bug. For maintainability alone I would strongly suggest using IN instead of OR.
In a simple query like yours, the optimizer is smart enough to treat them both 100% the same so they are identical.
HOWEVER, that is potentially not 100% the case.
E.g. when optimizing large, complex joins, it is plausible that the optimizer will not equate the two approaches as intelligently, thus choosing the wrong plan. I have observed a somewhat similar problem on Sybase, although I don't know if it exists in Oracle (hence my "potentially" qualifier).
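If you want to check what the optimizer does with the two forms, comparing the plans directly is the quickest way. The question is about Oracle (EXPLAIN PLAN FOR ...); the sketch below uses Python's built-in sqlite3 instead, purely to illustrate the idea of eyeballing the two plans side by side:

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table t (y integer)")
conn.execute("create index t_y on t (y)")
conn.executemany("insert into t values (?)", [(i,) for i in range(1000)])

for sql in ("select * from t where y in (1, 2, 3)",
            "select * from t where y = 1 or y = 2 or y = 3"):
    print(sql)
    for row in conn.execute("explain query plan " + sql):
        print("   ", row)  # compare the access paths chosen for each form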

What is the use case that makes EAVT index preferable to EATV?

From what I understand, EATV (which Datomic does not have) would be a great fit for as-of queries. On the other hand, I see no use case for EAVT.
This is analogous to row/primary key access. From the docs: "The EAVT index provides efficient access to everything about a given entity. Conceptually this is very similar to row access style in a SQL database, except that entities can possess arbitrary attributes rather than being limited to a predefined set of columns."
The immutable time/history side of Datomic is a motivating use case for it, but in general, it's still optimized around typical database operations, e.g. looking up an entity's attributes and their values.
Update:
Datomic stores datoms (in segments) in the index tree. So you navigate to a particular E's segment using the tree and then retrieve the datoms about that E in the segment, which are EAVT datoms. From your comment, I believe you're thinking of this as the navigation of more b-tree like structures at each step, which is incorrect. Once you've navigated to the E, you are accessing a leaf segment of (sorted) datoms.
You are not looking for a single value at a specific point in time. You are looking for a set of values up to a specific point in time T. History is on a per value basis (not attribute basis).
For example, assert X, retract X then assert X again. These are 3 distinct facts over 3 distinct transactions. You need to compute that X was added, then removed and then possibly added again at some point.
You can do this with SQL:
create table Datoms (
  E  bigint not null,
  A  bigint not null,
  V  varbinary(1536) not null,
  T  bigint not null,
  Op bit not null  -- assert/retract
)

select E, A, V
from Datoms
where E = 1 and T <= 42
group by E, A, V
having 0 < sum(case Op when 1 then +1 else -1 end)
The fifth component Op of the datom tells you whether the value is asserted (1) or retracted (0). By summing over this value (as +1/-1) we arrive at either 1 or 0.
Asserting the same value twice does nothing, and you always retract the old value before you assert a new one. The latter is a prerequisite for the algorithm to work out this nicely.
With an EAVT index, this is a very efficient query and it's quite elegant. You can build a basic Datomic-like system in just 150 lines of SQL like this. It is the same pattern repeated for any permutation of EAVT index that you want.
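To see that query in action, here is a minimal sketch using Python's built-in sqlite3 (the column types are simplified and the datoms are made up for illustration):

import sqlite3

# Op = 1 asserts a value, Op = 0 retracts it; the HAVING clause keeps only
# the values still asserted as of transaction T <= 42.
conn = sqlite3.connect(":memory:")
conn.execute("create table Datoms (E integer, A integer, V text, T integer, Op integer)")
conn.executemany("insert into Datoms values (?, ?, ?, ?, ?)", [
    (1, 10, "X", 1, 1),   # assert X
    (1, 10, "X", 2, 0),   # retract X
    (1, 10, "X", 3, 1),   # assert X again
    (1, 11, "Y", 4, 1),   # assert Y
    (1, 11, "Y", 50, 0),  # retracted, but only after T = 42
])
rows = conn.execute("""
    select E, A, V
    from Datoms
    where E = 1 and T <= 42
    group by E, A, V
    having 0 < sum(case Op when 1 then +1 else -1 end)
""").fetchall()
print(rows)  # both X and Y are reported as still asserted as of T = 42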

Using Real numbers for explicit sorting in sql database

I'm facing a recurring problem: I have to let a user reorder a list that is stored in a database.
The first straightforward approach I can think of is to have a "position" column with the ordering saved as an integer, e.g.
Data, Order
A 1
B 2
C 3
D 4
The problem here is that if I have to insert FOO at position 2, my table becomes
Data, Order
A 1
FOO 2
B 3
C 4
D 5
So to insert a new row, I have to do one INSERT and three UPDATEs on a table of five elements.
So my new idea is to use real numbers instead of integers; my new table becomes
Data, Order
A 1.0
B 2.0
C 3.0
D 4.0
If I want to insert an element FOO after A, this becomes
Data, Order
A 1.0
FOO 1.5
B 2.0
C 3.0
D 4.0
With only one SQL query executed.
This would work fine with theoretical real numbers, but floating-point numbers have limited precision, and I am wondering how feasible this is, and whether and how I can optimize it to avoid exceeding double precision after a reasonable number of modifications.
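To put a number on the precision concern: if every insert lands between the same two neighbouring keys (the worst case), an IEEE double runs out of room after roughly 52 bisections, as this little experiment shows:

# Worst case for float keys: keep inserting next to the same neighbour and
# count the steps until the midpoint is no longer strictly between them.
lo, hi = 1.0, 2.0
steps = 0
while True:
    mid = lo + (hi - lo) / 2
    if not (lo < mid < hi):
        break
    hi = mid      # every new row goes right next to the same existing row
    steps += 1
print(steps)      # 52 for IEEE doubles in this scenario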
Edit:
This is how I have implemented it for now in Python:
from decimal import Decimal

@classmethod
def get_middle_priority(cls, p, n):
    # Find the decimal with the fewest places that falls strictly between p and n.
    p = Decimal(str(p))
    n = Decimal(str(n))
    m = p + ((n - p) / 2)
    i = 0
    while True:
        m1 = round(m, i)
        if m1 > p and m1 < n:
            return m1
        else:
            i += 1

@classmethod
def create(cls, data, user):
    prev = data.get('prev')
    if prev is None or len(prev) < 1:
        # No predecessor given: place the new row before the current first row.
        first = cls.list().first()
        if first is None:
            priority = 1.0
        else:
            priority = first.priority - 1.0
    else:
        prev = cls.list().filter(Rotator.codice == prev).first()
        next = cls.list().filter(Rotator.priority > prev.priority).first()
        if next is None:
            priority = prev.priority + 1.0
        else:
            priority = cls.get_middle_priority(prev.priority, next.priority)
    r = cls(data.get('codice'), priority)
    DBSession.add(r)
    return r
If you want to control the position and there is no ORDER BY solution, then a rather simple and robust approach is to have each row point to the next or to the previous row. Updates/inserts/deletes (other than for the first and last rows) will require 3 operations:
Insert the new item
Update the item prior to the new item
Update the item after the new item
After you have that established, you can use a recursive CTE (with a UNION ALL) to produce the sorted list, with no limit on its length.
I have seen rather large implementations of this that were done via Triggers to keep the list in perfect form. I however am not a fan of triggers and would just put the logic for the entire operation in a stored procedure.
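Here is a small sketch of that linked-list idea with a recursive CTE, using Python's built-in sqlite3 (table and column names are made up; a real implementation would wrap the insert plus pointer update in the stored procedure mentioned above):

import sqlite3

# Each row stores the id of the row that precedes it (prev_id); a recursive
# CTE walks the chain to produce the sorted list.
conn = sqlite3.connect(":memory:")
conn.execute("create table items (id integer primary key, data text, prev_id integer)")
conn.executemany("insert into items values (?, ?, ?)", [
    (1, "A", None),  # head of the list
    (2, "B", 1),
    (3, "C", 2),
    (4, "FOO", 1),   # new row inserted after A ...
])
conn.execute("update items set prev_id = 4 where id = 2")  # ... so B now follows FOO

rows = conn.execute("""
    with recursive ordered(id, data, pos) as (
        select id, data, 1 from items where prev_id is null
        union all
        select i.id, i.data, o.pos + 1
        from items i join ordered o on i.prev_id = o.id
    )
    select data from ordered order by pos
""").fetchall()
print([r[0] for r in rows])  # ['A', 'FOO', 'B', 'C']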
You may use a string rather than numbers:
item order
A ffga
B ffgaa
C ffgb
Here, the problem of finite precision is handled by the possibility of growing the string. String storage in the database is theoretically unlimited, bounded only by the size of the storage device. There is no better solution for absolute ordering of items. Relative ordering, like a linked list, might work better (but then you can't do an ORDER BY query).
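As a sketch of the growing-string idea, here is one way to compute a key that sorts strictly between two existing keys. It is an illustration only, with two assumptions: the two keys are in order, and stored keys never end in 'a' (pass an empty string for "no bound on this side"):

def key_between(prev, nxt):
    # Return a lowercase a-z key that sorts strictly between prev and nxt.
    key = ""
    i = 0
    while True:
        p = ord(prev[i]) - ord("a") if i < len(prev) else 0   # pad prev with the lowest digit
        n = ord(nxt[i]) - ord("a") if i < len(nxt) else 26    # pad nxt with one past the highest
        if n - p > 1:
            return key + chr(ord("a") + (p + n) // 2)         # room for a midpoint digit
        key += chr(ord("a") + p)                              # copy prev's digit and go one deeper
        i += 1

print(key_between("ffga", "ffgb"))  # 'ffgan' -- sorts between the two
print(key_between("", "ffga"))      # a key before everything currently stored
print(key_between("ffgb", ""))      # a key after everything currently stored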
The linked list idea is neat, but it's expensive to pull the data out in order. If you have a database which supports it, you can use something like CONNECT BY to pull it out; linked list in SQL is a question dedicated to that problem.
If you don't, I was thinking of how one can achieve an infinitely divisible range, and thought of sections in a book. What about storing the list initially as
1
2
3
and then, to insert between 1 and 2, you insert a "subsection under 1" so that your list becomes
1
1.1
2
3
If you want to insert another one between 1.1 and 2 you place a second subsection under 1 and get
1
1.1
1.2
2
3
and lastly if you want to add something between 1.1 and 1.2 you need to introduce a subsubsection and get
1
1.1
1.1.1
1.2
2
3
Maybe using letters instead of numbers would be less confusing.
I'm not sure if there is any standard lexicographic ordering in SQL databases which could sort this type of list correctly, but I think you could roll your own with some ORDER BY CASE and substringing. Edit: I found a question pertaining to this: linky
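Just to make the sorting pitfall concrete (shown client-side in Python rather than with the ORDER BY CASE / substring approach): plain string comparison puts 1.10 before 1.2, while splitting the key into integer parts gives the intended order.

keys = ["1", "1.1", "1.1.1", "1.2", "1.10", "2"]
print(sorted(keys))                                                 # '1.10' sorts before '1.2'
print(sorted(keys, key=lambda k: [int(p) for p in k.split(".")]))   # intended section order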
Another downside is that in the worst case the field size of this solution keeps growing with every insert (you could get long rows like 1.1.1.1.1.1, etc.), while in the best case it stays short or almost constant (rows like 1.934856.1).
This solution is also quite close to what you already had in mind, and I'm not sure that it's an improvement. A decimal number using the binary partitioning strategy that you mentioned will probably increase the number of decimal places by one with each insert, right? So you would get
1,2 -> 1,1.5,2 -> 1,1.25,1.5,2 -> 1,1.125,1.25,1.5,2
So the best case of the subsectioning strategy seems better, but the worst case is a lot worse.
I'm also not aware of any infinite-precision decimal types for SQL databases, but you could of course save your number as a string, in which case this solution becomes even more similar to your original one.
Set all rows to a unique number starting at 1 and incrementing by 1 at the start. When you insert a new row, set its Order to count(*) of the table + 1 (there are a variety of ways of doing this).
When the user updates the Order of a row, always update it by calling a stored procedure with the Id (PK) of the row to update and the new order. In the stored procedure:
update tableName set Order = Order + 1 where Order >= @updatedRowOrder;
update tableName set Order = @updatedRowOrder where Id = @pk;
That guarantees that there will always be space and a continuous sequence with no duplicates. I haven't worked out what would happen if you put in silly new Order numbers for a row (e.g. <= 0), but probably bad things; that's for the front-end app to prevent.
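A tiny demo of that procedure's logic, using Python's built-in sqlite3 in place of the stored procedure (Order is quoted because it is a keyword; names are for illustration only):

import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute('create table items (id integer primary key, data text, "Order" integer)')
conn.executemany("insert into items values (?, ?, ?)",
                 [(1, "A", 1), (2, "B", 2), (3, "C", 3), (4, "D", 4)])

def move(pk, new_order):
    # Shift everything at or past the target position up by one, then
    # drop the moved row into the gap -- the same two UPDATEs as above.
    conn.execute('update items set "Order" = "Order" + 1 where "Order" >= ?', (new_order,))
    conn.execute('update items set "Order" = ? where id = ?', (new_order, pk))

move(4, 2)  # move D to position 2
print(conn.execute('select data from items order by "Order"').fetchall())
# [('A',), ('D',), ('B',), ('C',)]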
Cheers -

SQL and logical operators and null checks

I've got a vague, possibly cargo-cult memory from years of working with SQL Server that when you've got a possibly-null column, it's not safe to write "WHERE" clause predicates like:
... WHERE the_column IS NULL OR the_column < 10 ...
It had something to do with the fact that SQL rules don't stipulate short-circuiting (and in fact that's kind-of a bad idea possibly for query optimization reasons), and thus the "<" comparison (or whatever) could be evaluated even if the column value is null. Now, exactly why that'd be a terrible thing, I don't know, but I recall being sternly warned by some documentation to always code that as a "CASE" clause:
... WHERE 1 = CASE WHEN the_column IS NULL THEN 1 WHEN the_column < 10 THEN 1 ELSE 0 END ...
(the goofy "1 = " part is because SQL Server doesn't/didn't have first-class booleans, or at least I thought it didn't.)
So my questions here are:
Is that really true for SQL Server (or perhaps back-rev SQL Server 2000 or 2005) or am I just nuts?
If so, does the same caveat apply to PostgreSQL? (8.4 if it matters)
What exactly is the issue? Does it have to do with how indexes work or something?
My grounding in SQL is pretty weak.
I don't know SQL Server so I can't speak to that.
Given an expression a L b for some logical operator L, there is no guarantee that a will be evaluated before or after b or even that both a and b will be evaluated:
Expression Evaluation Rules
The order of evaluation of subexpressions is not defined. In particular, the inputs of an operator or function are not necessarily evaluated left-to-right or in any other fixed order.
Furthermore, if the result of an expression can be determined by evaluating only some parts of it, then other subexpressions might not be evaluated at all.
[...]
Note that this is not the same as the left-to-right "short-circuiting" of Boolean operators that is found in some programming languages.
As a consequence, it is unwise to use functions with side effects as part of complex expressions. It is particularly dangerous to rely on side effects or evaluation order in WHERE and HAVING clauses, since those clauses are extensively reprocessed as part of developing an execution plan.
As far as an expression of the form:
the_column IS NULL OR the_column < 10
is concerned, there's nothing to worry about since NULL < n is NULL for all n, even NULL < NULL evaluates to NULL; furthermore, NULL isn't true so
null is null or null < 10
is just a complicated way of saying true or null and that's true regardless of which sub-expression is evaluated first.
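You can check that three-valued logic quickly; here it is with Python's built-in sqlite3 (the question mentions SQL Server and PostgreSQL, but this particular NULL behaviour is standard SQL):

import sqlite3

conn = sqlite3.connect(":memory:")
print(conn.execute("select null < 10").fetchone())                  # (None,) -> NULL
print(conn.execute("select null is null or null < 10").fetchone())  # (1,)    -> true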
The whole "use a CASE" sounds mostly like cargo-cult SQL to me. However, like most cargo-cultism, there is a kernel a truth buried under the cargo; just below my first excerpt from the PostgreSQL manual, you will find this:
When it is essential to force evaluation order, a CASE construct (see Section 9.16) can be used. For example, this is an untrustworthy way of trying to avoid division by zero in a WHERE clause:
SELECT ... WHERE x > 0 AND y/x > 1.5;
But this is safe:
SELECT ... WHERE CASE WHEN x > 0 THEN y/x > 1.5 ELSE false END;
So, if you need to guard against a condition that will raise an exception or have other side effects, then you should use a CASE to control the order of evaluation as a CASE is evaluated in order:
Each condition is an expression that returns a boolean result. If the condition's result is true, the value of the CASE expression is the result that follows the condition, and the remainder of the CASE expression is not processed. If the condition's result is not true, any subsequent WHEN clauses are examined in the same manner.
So given this:
case when A then Ra
when B then Rb
when C then Rc
...
A is guaranteed to be evaluated before B, B before C, etc. and evaluation stops as soon as one of the conditions evaluates to a true value.
In summary, a CASE short-circuits but neither AND nor OR short-circuits, so you only need to use a CASE when you need to protect against side effects.
Instead of
the_column IS NULL OR the_column < 10
I'd do
isnull(the_column,0) < 10
or for the first example
WHERE 1 = CASE WHEN isnull(the_column,0) < 10 THEN 1 ELSE 0 END ...
I've never heard of such a problem, and this bit of SQL Server 2000 documentation uses WHERE advance < $5000 OR advance IS NULL in an example, so it must not have been a very stern rule. My only concern with OR is that it has lower precedence than AND, so you might accidentally write something like WHERE the_column IS NULL OR the_column < 10 AND the_other_column > 20 when that's not what you mean; but the usual solution is parentheses rather than a big CASE expression.
I think that in most RDBMSes, indices don't include null values, so an index on the_column wouldn't be terribly useful for this query; but even if that weren't the case, I don't see why a big CASE expression would be any more index-friendly.
(Of course, it's hard to prove a negative, and maybe someone else will know what you're referring to?)
Well, I've repeatedly written queries like the first example since about forever (heck, I've written query generators that generate queries like that), and I've never had a problem.
I think you may be remembering some admonishment somebody gave you sometime against writing funky join conditions that use OR. In your first example, the conditions joined by the OR restrict the same one column of the same table, which is OK. If your second condition was a join condition (i.e., it restricted columns from two different tables), then you could get into bad situations where the query planner just has no choice but to use a Cartesian join (bad, bad, bad!!!).
I don't think your CASE function is really doing anything there, except perhaps hamper your query planner's attempts at finding a good execution plan for the query.
But more generally, just write the straightforward query first and see how it performs for realistic data. No need to worry about a problem that might not even exist!
Nulls can be confusing. The "... WHERE 1 = CASE ..." is useful if you are trying to pass either a null or a value as a parameter, e.g. WHERE the_column = @parameter. This post may be helpful: Passing Null using OLEDB.
Another example where CASE is useful is when using date functions on varchar columns. Adding ISDATE before using, say, convert(colA, datetime) might not work, and when colA has non-date data the query can error out.

Fox-Goat-Cabbage Transportation

My question is about an old transportation problem: carrying three items across a river with a boat capable of transferring only one item at a time. A constraint is that certain items cannot be left together, such as the cabbage with the goat, or the wolf with the goat, etc. This problem should be solvable using integer programming, or another optimization approach. The cost function is all items being on the other side of the river, and the trips required to get there could be the output from Simplex (?) as it tries out different feasible solutions. I was wondering if anyone has the integer programming (or linear programming) formulation of this problem, and/or MATLAB, Octave, or Python based code that can offer the solution programmatically, including a trace of Simplex trying out all paths -- our boat rides.
There was some interesting stuff here
http://www.zib.de/Publications/Reports/SC-95-27.pdf
Thanks,
I recommend using binary variables x_i,t to model the positions of your items, i.e. they are zero if the item is located on the left shore after trip t and one otherwise. At most one of these variables can change during a trip. This can be modeled by
x_wolf,1 + x_cabbage,1 + x_goat,1 <= 1 + x_wolf,0 + x_cabbage,0 + x_goat,0 and
x_wolf,1 >= x_wolf,0
x_cabbage,1 >= x_cabbage,0
x_goat,1 >= x_goat,0
Similar constraints are required for trips in the other direction.
Furthermore, after an odd number of trips you need constraints to check the items on the left shore, and similarly you have to check the right shore after an even number of trips. For instance:
x_wolf,1 + x_goat,1 >= 1 and
x_wolf,2 + x_goat,2 <= 1 ...
Use an upper bound for t, such that a solution is surely possible.
Finally, introduce the binary variable z_t and let
z_t <= 1/3 (x_wolf,t + x_cabbage,t + x_goat,t)
and maximize sum_t (z_t).
(Most probably maximizing sum_t (x_wolf,t + x_cabbage,t + x_goat,t) should work too.)
You are right that this formulation will require integer variables. The traditional way of solving a problem like this would be to formulate a binary variable model and pass the formulation onto a solver. MATLAB in this case would not work unless you have access to the Optimization Toolbox.
http://www.mathworks.com/products/optimization/index.html
In your formulation you would need to address the following:
Decision Variables
In your case this would look something like:
x_it (choose [yes=1 no=0] to transport item i during boat trip number t)
Objective Function
I'm not quite sure what this is from your description but there should be a cost, c_t, associated with each boat trip. If you want to minimize total time, each trip would have a constant cost of 1. So your objective should look something like:
minimize SUM((i,t),c_t*x_it) (so you are minimizing the total cost over all trips)
Constraints
This is the tricky part for your problem. The complicating constraint is the exclusivity that you identified. Remember, x_it is binary.
For each pair of items (i1,i2) that conflict with each other you have a constraint that looks like this
x_(i1 t) + x_(i2 t) <= 1
For example:
x_("cabbage" "1") + x_("goat" "1") <= 1
x_("wolf" "1") + x_("goat" "1") <= 1
x_("cabbage" "2") + x_("goat" "2") <= 1
x_("wolf" "2") + x_("goat" "2") <= 1
etc.
You see how this prevents conflict. A boat schedule that assigns "cabbage" and "goat" to the same trip will violate this binary exclusivity constraint since "1+1 > 1"
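As a rough illustration of such a binary model (not the asker's MATLAB/Octave route), here is a sketch using the PuLP package in Python, assuming it is installed (pip install pulp). The trip-parity bookkeeping, the z_t variable that relaxes the far-bank check once the puzzle is finished, and all names are my own additions to make the sketch self-contained; it follows the spirit of the formulations above rather than reproducing either exactly.

from pulp import LpProblem, LpVariable, LpMaximize, lpSum, LpStatus

# x[(i, t)] = 1 if item i is on the far bank after trip t. The boat starts on
# the near bank and crosses on every trip, so it is on the far bank after odd
# trips and back on the near bank after even trips.
items = ["wolf", "goat", "cabbage"]
conflicts = [("wolf", "goat"), ("goat", "cabbage")]
T = 8  # upper bound on the number of trips

prob = LpProblem("river_crossing", LpMaximize)
x = LpVariable.dicts("x", [(i, t) for i in items for t in range(T + 1)], cat="Binary")
z = LpVariable.dicts("z", range(T + 1), cat="Binary")  # z[t] = 1 once everything is across

for i in items:
    prob += x[(i, 0)] == 0  # everything starts on the near bank

for t in range(1, T + 1):
    if t % 2 == 1:  # boat goes near -> far: items may only move to the far bank
        for i in items:
            prob += x[(i, t)] >= x[(i, t - 1)]
        prob += lpSum(x[(i, t)] - x[(i, t - 1)] for i in items) <= 1  # at most one item per trip
        for a, b in conflicts:  # near bank is now unattended
            prob += x[(a, t)] + x[(b, t)] >= 1
    else:           # boat returns far -> near
        for i in items:
            prob += x[(i, t)] <= x[(i, t - 1)]
        prob += lpSum(x[(i, t - 1)] - x[(i, t)] for i in items) <= 1
        for a, b in conflicts:  # far bank is unattended, unless the puzzle is already solved
            prob += x[(a, t)] + x[(b, t)] <= 1 + z[t]

for t in range(T + 1):
    prob += 3 * z[t] <= lpSum(x[(i, t)] for i in items)  # z[t] can be 1 only if all items are across

prob += lpSum(z[t] for t in range(T + 1))  # objective: finish as early as possible
prob.solve()

print(LpStatus[prob.status])
for t in range(T + 1):
    print("after trip", t, "on far bank:", [i for i in items if x[(i, t)].value() > 0.5])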
Tools like GAMS, AMPL and GLPK will allow you to express this group of constraints very concisely.
Hope that helps.