Model clause in Oracle - sql

I have recently become interested in Oracle, and the more I look into it, the more it attracts me.
I recently came across the MODEL clause, but to be honest I don't understand how it behaves. Could anyone explain it to me with some examples?
Thanks in advance

Some examples of MODEL are given here.
Personally I've looked at MODEL several times and not yet succeeded in finding a use case for it. While it first appears to be useful, there are a lot of places where only literals work (rather than binds or variables), which restricts its flexibility. For example, on inter-row calculations, you can't readily refer to the 'previous' or 'next' row, but have to be able to identify it absolutely by its attributes. So you can't say 'take the value of the row with the same date in the previous month' but can only code a specific date.
It might be used (internally) by some analytical tools. But as an end-user tool, I never 'got' it. I have previously recommended that, if you ever find a problem you think can be solved by the application of the MODEL clause, go and have a lie down until the feeling wears off.

I think the MODEL clause is quite simple to understand when you slowly read the official whitepaper. In my opinion, the whitepaper nicely explains the MODEL clause step by step, adding one feature at a time to the examples and leaving the most advanced features to the official documentation.
From the whitepaper, I also find it easy to understand when to actually use the MODEL clause. In some examples, it is a lot simpler to do "Excel-spreadsheet-like" operations using MODEL rather than, for instance, using window functions, CONNECT BY, or subquery factoring. Think about Excel. Whenever you want to define a complex rule set for Excel columns, use the MODEL clause. Example Excel spreadsheet rules:
A10 = A9 + A8
B10 = A10 * 5
C10 = MAX(A1:A9)
D10 = C10 / A10
In other words, MODEL is a very powerful SQL spreadsheet!
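To make the spreadsheet analogy concrete, here is a rough Python model of the four rules above: cells are addressed by a (column, row) key, and each rule assigns a new cell from existing ones. This is only an analogy, not how Oracle evaluates MODEL; the function name and the input data are hypothetical.

```python
def apply_rules(cells):
    """Apply the four spreadsheet-style rules to a dict keyed by (column, row)."""
    c = dict(cells)  # work on a copy and add the computed cells to it
    c[("A", 10)] = c[("A", 9)] + c[("A", 8)]               # A10 = A9 + A8
    c[("B", 10)] = c[("A", 10)] * 5                        # B10 = A10 * 5
    c[("C", 10)] = max(c[("A", r)] for r in range(1, 10))  # C10 = MAX(A1:A9)
    c[("D", 10)] = c[("C", 10)] / c[("A", 10)]             # D10 = C10 / A10
    return c

# Hypothetical input: A1..A9 hold the values 1..9.
cells = {("A", r): r for r in range(1, 10)}
result = apply_rules(cells)
print(result[("A", 10)], result[("B", 10)], result[("C", 10)])  # 17 85 9
```

Loosely, in a MODEL query the row number would play the part of a DIMENSION BY column, the letters A to D the part of MEASURES, and the four assignments the part of RULES.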

The best explanation is in the official white paper. It uses the SH demo schema and you really need it installed.
http://www.oracle.com/technetwork/middleware/bi-foundation/10gr1-twp-bi-dw-sqlmodel-131067.pdf
I don't think they do a very good job of explaining this. It basically lets you load data into an array and then loop through the array using straight SQL, instead of having to write procedural logic. A lot of the terms are based on spreadsheet terms (they are used in the Excel Help), so if you haven't used them in Excel, this can be confusing.
They should have drawn a picture for each of the queries, shown the array that gets created, and then shown how you loop through the array. The syntax looks to be based on Excel syntax; I'm not sure whether this is common to all spreadsheet tools.
It has uses; bin fitting is the most common. See the 2nd example in the article below. This is basically a complex GROUP BY where you are grouping by a range, but that range can change, so it requires procedural logic. The example gives three ways to do it, one of which is the MODEL clause.
http://www.oracle.com/technetwork/issue-archive/2012/12-mar/o22asktom-1518271.html
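The procedural core of one common bin-fitting variant is a running total that resets whenever the next item would overflow the current bin: each item's bin depends on the decisions already made for earlier items, which a simple aggregate cannot express. A minimal Python sketch, assuming items are processed in a fixed order with a hypothetical capacity; the linked article's exact problem may differ.

```python
def fill_bins(items, capacity):
    """Assign (name, size) items, in order, to consecutive bins of limited capacity.

    A new bin is opened whenever adding the next item would exceed `capacity`;
    an item larger than `capacity` ends up alone in its own bin.
    """
    bins, current, total = [], [], 0
    for name, size in items:
        if current and total + size > capacity:
            bins.append(current)  # close the full bin
            current, total = [], 0
        current.append(name)
        total += size  # running total that restarts for every new bin
    if current:
        bins.append(current)
    return bins

print(fill_bins([("a", 4), ("b", 3), ("c", 5), ("d", 2)], capacity=7))
# [['a', 'b'], ['c', 'd']]
```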
I think people (often managers) who do complex spreadsheet calculations may have an easier time seeing uses for this and getting the lingo.


How to constrain dtw from dtw-python library?

Here is what I want to do:
keep a reference curve unchanged (only shift and stretch a query curve)
constrain how many elements are duplicated
keep both start and end open
I tried:
dtw(ref_curve,query_curve,step_pattern=asymmetric,open_end=True,open_begin=True)
but I cannot constrain how the query curve is stretched
dtw(ref_curve,query_curve,step_pattern=mvmStepPattern(10))
it didn’t do anything to the curves!
dtw(ref_curve,query_curve,step_pattern=rabinerJuangStepPattern(4, "c"),open_end=True, open_begin=True)
I liked this one the most but in some cases it shifts the query curve more than needed...
I read the paper (https://www.jstatsoft.org/article/view/v031i07) and the API but still don't quite understand how to achieve what I want. Any other options to constrain number of elements that are duplicated? I would appreciate your help!
To clarify: we are talking about functions provided by the DTW suite packages at dynamictimewarping.github.io. The question is in fact language-independent (and may be more suited to the Cross-validated Stack Exchange).
The pattern rabinerJuangStepPattern(4, "c") you have found does in fact satisfy your requirements:
it's asymmetric, and each step advances the reference by exactly one step
it's slope-limited between 1/2 and 2
it's type "c", so it can be normalized in a way that allows open-begin and open-end
If you haven't already, check out dtw.rabinerJuangStepPattern(4, "c").plot().
It goes without saying that in all cases what you are getting is the optimal alignment, i.e. the one with the least accumulated distance among all allowed paths.
As an alternative, you may consider the simpler asymmetric recursion -- as your first attempt above -- constrained with a global warping window: see dtw.window and the window_type argument. This provides constraints of a different shape (and flexible size), which might suit your specific case.
PS: edited to add that the asymmetricP2 recursion is also similar to RJ-4c, but with a more constrained slope.
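To see how an asymmetric recursion, a global warping window and open begin/end interact, here is a from-scratch sketch in plain Python. It is an illustrative reimplementation, not dtw-python's code: the function name is hypothetical, the local cost is assumed to be |x[i] - y[j]|, nothing is normalized, and the window is a simplified band |i - j| <= window that ignores unequal lengths.

```python
import math

def asymmetric_dtw(x, y, window=None, open_begin=False, open_end=False):
    """Accumulated cost of the best alignment under an asymmetric step pattern.

    Every step advances x by exactly one element and y by 0, 1 or 2 elements,
    so each element of x is matched exactly once (the shape of the asymmetric
    pattern, not dtw-python's implementation). Local cost is |x[i] - y[j]|.

    window:     optional band half-width; cells with |i - j| > window are forbidden.
    open_begin: the alignment may start at any element of y.
    open_end:   the alignment may end at any element of y.
    """
    n, m = len(x), len(y)
    g = [[math.inf] * m for _ in range(n)]
    for i in range(n):
        for j in range(m):
            if window is not None and abs(i - j) > window:
                continue  # outside the warping window
            d = abs(x[i] - y[j])
            if i == 0:
                # First element of x: start at y[0], or anywhere if open_begin.
                g[i][j] = d if (open_begin or j == 0) else math.inf
                continue
            # Predecessors (i-1, j), (i-1, j-1), (i-1, j-2): y advances by 0, 1 or 2.
            prev = min(g[i - 1][j - s] for s in (0, 1, 2) if j - s >= 0)
            g[i][j] = prev + d  # stays inf if no allowed predecessor exists
    return min(g[n - 1]) if open_end else g[n - 1][m - 1]

print(asymmetric_dtw([0, 1, 2], [0, 0, 1, 1, 2, 2]))                 # inf
print(asymmetric_dtw([0, 1, 2], [0, 0, 1, 1, 2, 2], open_end=True))  # 0
```

Because every path in this sketch visits exactly len(x) cells, the raw accumulated cost can be compared across different start and end points without further normalization. In dtw-python itself, the corresponding knobs are the window_type argument (see dtw.window) and the open_begin/open_end flags already used in the question.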

UniData - record count of all files / tables

Looking for a shortcut here. I am pretty adept with SQL database engines and ERPs. I should clarify: I mean databases like MS SQL, MySQL, PostgreSQL, etc.
One of the things that I like to do when I am working on a new project is to get a feel for what is being utilized and what isn't. In T-SQL this is pretty easy. I just query the information schema and get a row count of all the tables and filter out the ones having rowcount = 0. I know this isn't truly a precise row count, but it does give me an idea of what is in use.
So I recently started at a new company and one of their systems is running on UniData. This is a pretty radical shift from mainstream databases and there isn't a lot of help out there. I was wondering if anybody knew of a command to do the same thing listed above in UniBasic/UniQuery/whatever else.
Which tables, files, are heavily populated and which ones are not?
You can start with a special "table" (or file in Unidata terminology) named VOC - it will have a list of all the other files that are in your current "database" (aka account), as well as a bunch of other things.
To get a list of files in (or pointed to) the current account:
:SORT VOC WITH F1 = "F]" "L]" "DIR" F1 F2
Try HELP CREATE.FILE if you're curious about the difference between F and LF and DIR.
Once you have a list of files, weed out the ones named *TEMP* or *WORK* and start digging into the ones that seem important. There are other ways to get at what's important (e.g. using triggers or timestamps), but browsing isn't a bad idea to see what conventions are used.
Once you have a file that looks interesting (let's say CUSTOMERS), you can look at the dictionary of that file to see which fields it defines:
:SORT DICT CUSTOMERS F1 F2 BY F1 BY F2 USING DICT VOC
It can help to create something like F2.LONG in DICT VOC to increase the display size up from 15 characters.
Now that you have a list of "columns" (aka fields or attributes), look for the D-type attributes, which tell you what data columns are stored in the file. V- and I-types are calculations.
https://github.com/ianmcgowan/SCI.BP/blob/master/PIVOT is helpful with profiling when you see an attribute that looks interesting and you want to see what the data looks like.
http://docs.rocketsoftware.com/nxt/gateway.dll/RKBnew20/unidata/previous%20versions/v8.1.0/unidata_userguide_v810.pdf has some generally good information on the concepts and there are many other online manuals available there. It can take a lot of reading to get to the right thing if you don't know the terminology.

VBA: Efficient Vlookup from another Workbook

I need to do a VLOOKUP from another workbook on about 400000 cells with VBA. These cells are all in one column, and the results should be written into one column. I already know how VLOOKUP works, but my runtime is much too high when using AutoFill. Do you have a suggestion for how I can improve it?
Don't use VLOOKUP; use INDEX/MATCH: http://www.randomwok.com/excel/how-to-use-index-match/
If you are able to adjust what the data looks like slightly, you may be interested in using a binary search. It's been a while since I last used one (I wrote one for a group-exercise check-in program). https://www.khanacademy.org/computing/computer-science/algorithms/binary-search/a/implementing-binary-search-of-an-array was helpful in setting up the idea behind it.
If you are able to sort the data in some order, say by last name (I'm not sure what data you are working with), then number the rows in that order so you can use them for the binary search.
Edit:
The reason for using a binary search is computational time. Finding a value by scanning can take up to 400000 comparisons, while a binary search needs about log2(400000) ≈ 18.6, i.e. at most 19 comparisons. As you can see, the more data you have, the more time a binary search saves.
This would only be a beneficial way if you are able to manipulate the data in such a way that would allow you to use a binary search.
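The sort-once, search-many idea can be sketched in Python (rather than VBA) as an exact-match lookup, i.e. the equivalent of VLOOKUP with FALSE as the last argument. The function name and the (key, value) table layout are assumptions for illustration.

```python
import bisect

def binary_search_lookup(table, keys):
    """Exact-match lookup of each key in `table`, a list of (key, value) pairs.

    Returns the value for each key, or None where VLOOKUP would give #N/A.
    """
    rows = sorted(table, key=lambda row: row[0])  # sort once; stable, so the first match wins
    row_keys = [k for k, _ in rows]
    out = []
    for key in keys:
        i = bisect.bisect_left(row_keys, key)  # about log2(len(rows)) comparisons
        if i < len(row_keys) and row_keys[i] == key:
            out.append(rows[i][1])
        else:
            out.append(None)
    return out

print(binary_search_lookup([("b", 2), ("a", 1)], ["a", "c", "b"]))  # [1, None, 2]

# Worst-case probes for a binary search over n sorted rows is floor(log2(n)) + 1,
# which equals n.bit_length() for n >= 1.
print((400_000).bit_length())  # 19
```

The sort costs O(n log n) once, after which each of the 400000 lookups is O(log n) instead of a full scan.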
So, if you can give us a bit more background on what data you are using and any restrictions you have with that data we would be able to give more constructive feedback.

Cplex/OPL local search

I have a model implemented in OPL. I want to use this model to implement a local search in Java. I want to initialize solutions with some heuristics and give these initial solutions to CPLEX so it can find a better solution based on the model, but I also want to limit the search to a specific neighborhood. Any idea how to do it?
Also, how can I limit the range of all variables? And which is best: implementing these heuristics and the local search in OPL itself, in Java, or even in C++?
Thanks in advance!
Just to add some related observations:
Re Ram's point 3: We have had a lot of success with approach b. In particular, it is simple to add constraints that fix some of the variables to values from a known solution, and then re-solve for the rest of the variables in the problem. More generally, you can add constraints that keep the values similar to a previous solution, like:
var >= previousValue - 1
var <= previousValue + 2
This is no use for binary variables of course, but for general integer or continuous variables can work well. This approach can be generalised for collections of variables:
sum(i in indexSet) var[i] >= (sum(i in indexSet) value[i]) - 2
sum(i in indexSet) var[i] <= (sum(i in indexSet) value[i]) + 2
This can work well for sets of binary variables. For an array of 100 binary variables of which maybe 10 had the value 1, we would be looking for a solution where at least 8 have the value 1, but not more than 12. Another variant is to limit something like the Hamming distance (assume that the vars are all binary here):
dvar int changed[indexSet] in 0..1;
forall(i in indexSet) {
  if (previousValue[i] <= 0.5) {
    changed[i] == (var[i] >= 0.5); // was zero before
  } else {
    changed[i] == (var[i] <= 0.5); // was one before
  }
}
sum(i in indexSet) changed[i] <= 2;
Here we would be saying that out of an array of e.g. 100 binary variables, only a maximum of two would be allowed to have a different value from the previous solution.
Of course you can combine these ideas. For example, add simple constraints to fix a large part of the problem to previous values, while leaving some other variables to be re-solved, and then add constraints on some of the remaining free variables to limit the new solution to be near to the previous one. You will notice of course that these schemes get more complex to implement and maintain as we try to be more clever.
To make the local search work well you will need to think carefully about how you construct your local neighbourhoods - too small and there will be too little opportunity to make the improvements you seek, while if they are too large they take too long to solve, so you don't get to make so many improvement steps.
A related point is that each neighbourhood needs to be reasonably internally connected. We have done some experiments where we fixed the values of maybe 99% of the variables in a model and solved for the remaining 1%. When the 1% was clustered together in the model (e.g. all the allocation variables for a subset of resources) we got good results, while in comparison we got nowhere by just choosing 1% of the variables at random from anywhere in the model.
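To make "fix most variables, re-solve the rest, and stay within a Hamming distance of the previous solution" concrete, here is a toy Python sketch on a hypothetical 0/1 knapsack. Brute-force enumeration of the freed block stands in for CPLEX, and picking a contiguous block of indices is only a stand-in for a "clustered" neighbourhood; all function names are hypothetical. With OPL/CPLEX you would instead add the fixing constraints and the changed[] constraint shown earlier, then re-solve.

```python
import itertools
import random

def knapsack_value(x, values, weights, capacity):
    """Objective of a 0/1 knapsack solution x, or None if it breaks the capacity."""
    if sum(w for w, xi in zip(weights, x) if xi) > capacity:
        return None
    return sum(v for v, xi in zip(values, x) if xi)

def relax_block(x, block, values, weights, capacity, max_changes):
    """Re-optimize only the variables in `block`; all others stay fixed to x.

    Brute force stands in for the MIP solver, and `max_changes` plays the
    role of the Hamming-distance constraint sum(changed) <= k.
    """
    best, best_val = list(x), knapsack_value(x, values, weights, capacity)
    for bits in itertools.product((0, 1), repeat=len(block)):
        if sum(b != x[i] for b, i in zip(bits, block)) > max_changes:
            continue  # too far from the previous solution
        cand = list(x)
        for b, i in zip(bits, block):
            cand[i] = b
        val = knapsack_value(cand, values, weights, capacity)
        if val is not None and (best_val is None or val > best_val):
            best, best_val = cand, val
    return best, best_val

def local_search(x0, values, weights, capacity, block_size=4, max_changes=2,
                 rounds=20, seed=0):
    """Repeatedly free a contiguous block of variables and re-solve it."""
    rng = random.Random(seed)
    x = list(x0)
    for _ in range(rounds):
        start = rng.randrange(len(x) - block_size + 1)
        block = list(range(start, start + block_size))
        x, _ = relax_block(x, block, values, weights, capacity, max_changes)
    return x, knapsack_value(x, values, weights, capacity)

values = [6, 5, 4, 3, 7, 2, 8, 1]
weights = [5, 4, 3, 2, 6, 1, 7, 1]
x, val = local_search([0] * 8, values, weights, capacity=12)
print(x, val)
```

Each step keeps the current solution unless it finds a strictly better one inside the neighbourhood, so the objective never gets worse; a small block_size or max_changes keeps every step cheap but limits how much can improve per step.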
An often overlooked idea is to invert these same limits on the model, as a way of forcing some changes into the solution to achieve a degree of diversification. So you could add a constraint to force a specific value to be different from a previous solution, or ensure that at least two out of an array of 100 binary variables have a different value from the previous solution. We have used this approach to get a sort-of tabu search with a hybrid matheuristic model.
Finally, we have mainly done this in C++ and C#, but it would work perfectly well from Java. We have not tried it much from OPL, but it should be fine too. The key for us was being able to traverse the problem structure and use problem knowledge to choose the sets of variables we freeze or relax; we found that easier and faster to code in a language like C#, but then the modelling stuff is more difficult to write and maintain. We are maybe a bit "old-school" and like to have detailed, fine-grained control of what we are doing. In OPL we find we need to create many more arrays and index sets to achieve what we want, while in a language like C# we can achieve the same effect with more intelligent loops etc. without creating so many data structures.
Those are several questions. So here are some pointers and suggestions:
In Cplex, you give your model an initial solution with the use of IloOplCplexVectors()
Here's a good example in IBM's documentation of how to alter CPLEX's solution.
Within OPL, you can do the same. You basically set a series of values for your variables, and hand those over to CPLEX. (See this example.)
Limiting the search to a specific neighborhood: There is no easy way to respond without knowing the details. But there are two ways that people do this:
a. change the objective to favor that 'neighborhood' and make other areas unattractive.
b. Add constraints that weed out other neighborhoods from the search space.
Regarding limiting the range of variables in OPL, you can do it directly:
dvar int supply in minQty..maxQty;
Or for a whole array of decision variables, you can do something along the lines of:
range CreditsAllowed = 3..12;
dvar int credits[student] in CreditsAllowed;
Hope this helps you move forward.

Complex derived attributes in Django models

What I want to do is implement submission scoring for a site with users voting on the content, much like in e.g. reddit (see the 'hot' function in http://code.reddit.com/browser/sql/functions.sql). Edit: Ultimately I want to be able to retrieve an arbitrarily filtered list of arbitrary length of submissions ranked according to their score.
My submission model currently keeps track of up and down vote totals. Currently, when a user votes I create and save a related Vote object and then use F() expressions to update the Submission object's voting totals. The problem is that I want to update the score for the submission at the same time, but F() expressions are limited to only simple operations (it's missing support for log(), date_part(), sign() etc.)
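For reference, here is a Python transcription of the commonly quoted form of reddit's 'hot' formula, which shows why a simple F() update is not enough: it needs log(), sign() and the submission's epoch timestamp. The constants 1134028003 and 45000 are as widely quoted; check them against the linked functions.sql before relying on them.

```python
import math
from datetime import datetime, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)

def epoch_seconds(date):
    """Seconds since the Unix epoch for a timezone-aware datetime."""
    return (date - EPOCH).total_seconds()

def hot(ups, downs, date):
    """Reddit-style 'hot' score: log-scaled net votes plus a time bonus."""
    s = ups - downs
    order = math.log10(max(abs(s), 1))          # log(): each 10x more net votes adds 1
    sign = 1 if s > 0 else -1 if s < 0 else 0   # sign()
    seconds = epoch_seconds(date) - 1134028003  # epoch timestamp minus a fixed offset
    return round(sign * order + seconds / 45000, 7)
```

With this formula, a submission posted 45000 seconds (12.5 hours) later needs ten times the net votes to reach the same score.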
From my limited experience with Django I can see 5 options here:
extend F() somehow (haven't looked at the code yet) to support the missing SQL functions; this is my preferred option and seems to fit within the Django framework the best
define a scoring function (much like reddit's 'hot' function) in my database, and have Django use the value of that function for the value of the score field; as far as I can tell, #2 is not possible
wrap my two step voting process in a suitably isolated transaction so that I can calculate the voting totals in Python and then update the Submission's voting totals without fear that another vote against the submission could be added/changed in the meantime; I'm hesitant to take this route because it seems overly complex - what is a "suitably isolated transaction" in this case anyway?
use raw SQL; I would prefer to avoid this entirely -- what's the point of an ORM if I have to revert to SQL for such a common use case as this! (Note that this is coming from somebody who loves sprocs, but is using Django for ease of development.)
(edit: added this after further discussion) compute the score using an extra select parameter containing a call to my function; this would work but impose unnecessary load on the DB (it would be forced to calculate the score for every submission ever made every time the query ran; caching could help here, but it still seems like a bit of a lame workaround)
Before I embark on this mission to extend F() (which I'm not sure is even possible), am I about to reinvent the wheel? Is there a more standard way to do this? It seems like such a common use case and yet in an hour of searching I have yet to find a common solution...
EDIT: There is another option: set the default value of the field in the database script to be an expression containing my function. This is not as flexible as #1, but probably the quickest and cleanest approach to solving the problem (although my initial investigation into extending F() looks promising).
Why can't you just denormalize the score and reconstruct it from the Vote objects every once in a while?
If you can't do that, it is very easy to make a 'property' function that acts as an object attribute for scoring.
@property
def score(self):
    score = ...  # calculate the score from the related Vote objects
    return score
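I've never used F() on a property like this, and I doubt it works: F() expressions refer to database fields, so a plain Python property can't be used inside one (nor in filter() or order_by()).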
If you are using django-voting (which I recommend), you can put #3 in the manager's record_vote function since that's how all vote transactions take place.