Multipart identifiers in Microsoft.SqlServer.TransactSql.ScriptDom that are not tables

I'm trying to use Microsoft.SqlServer.TransactSql.ScriptDom to check that an expression is a scalar constant.
Here is such an expression:
DATEADD(YEAR, -21, CURRENT_TIMESTAMP)
Here is not such an expression:
DATEADD(YEAR, -21, DateOfBirth)
It is not a constant because it references the column DateOfBirth.
How can I determine this?
What I didn't expect -- and why I've run into trouble -- is that Microsoft.SqlServer.TransactSql.ScriptDom thinks that YEAR is a ColumnReferenceExpression.

(too long for comment)
ScriptDom does not compile; it only parses, and it treats every unrecognized name as a possible column name. For example, in IF (MAGICNAME = 0) it will detect a "column" named MAGICNAME. If you want more than that, you have to add the intelligence yourself.
This can be done by writing additional visitor classes that act as nested parsers, and by storing lists of "known magic words" relevant to specific cases. In the given case this may lead to code which:
catches the function call (udf)
checks whether it is one of the well-known functions
invokes a nested visitor class which understands more about this specific function
With this approach, a specific visitor for DATEADD (or for all the date-handling functions) might hold the list of words YEAR, MONTH and so on, and change the interpretation of the first argument from "possible column" to "known static magic word".
The task can hardly be accomplished in general, for every possible case, but it looks like many cases can be handled correctly. One idea is a "duck typing" approach:
detect expressions which can possibly be scalar and "constant", and take a deeper look at those only
in the deeper look, recursively apply the same approach to all of the expression's arguments
if none of them violates your understanding of a "scalar constant expression", then it is one
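The recursive check described above can be sketched on a toy expression tree. This is Python, not ScriptDom: the node classes (Literal, Name, Call), the word lists and the function is_scalar_constant are all hypothetical stand-ins for whatever your visitor collects, and the lists are deliberately incomplete.

```python
from dataclasses import dataclass

# Hypothetical toy AST, standing in for the nodes a real visitor would see.
@dataclass
class Literal:
    value: object

@dataclass
class Name:
    text: str  # what the parser reports as a "possible column"

@dataclass
class Call:
    function: str
    args: list

# Assumed, incomplete lists of "known magic words".
DATEPARTS = {"YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"}
CONSTANT_NAMES = {"CURRENT_TIMESTAMP"}
# Known functions -> argument positions that hold a datepart keyword.
DATEPART_POSITIONS = {"DATEADD": {0}, "DATEDIFF": {0}}

def is_scalar_constant(node):
    """Return True only if nothing in the tree looks like a real column."""
    if isinstance(node, Literal):
        return True
    if isinstance(node, Name):
        return node.text.upper() in CONSTANT_NAMES
    if isinstance(node, Call):
        keyword_positions = DATEPART_POSITIONS.get(node.function.upper())
        if keyword_positions is None:
            return False  # unknown function: stay conservative
        for index, arg in enumerate(node.args):
            is_datepart = (
                index in keyword_positions
                and isinstance(arg, Name)
                and arg.text.upper() in DATEPARTS
            )
            if not is_datepart and not is_scalar_constant(arg):
                return False
        return True
    return False

constant = Call("DATEADD", [Name("YEAR"), Literal(-21), Name("CURRENT_TIMESTAMP")])
not_constant = Call("DATEADD", [Name("YEAR"), Literal(-21), Name("DateOfBirth")])
print(is_scalar_constant(constant), is_scalar_constant(not_constant))
```

Unknown functions and unknown names are rejected by default, so the check errs on the side of "not constant" whenever the lists do not cover a case.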

Related

Arity of BETWEEN expression

What is the arity of the sql BETWEEN expression? I thought it was three (ternary) since the expression usually looks like:
WHERE...
1 BETWEEN 2 AND 3
But it's listed as binary on BigQuery's documentation, and I assume other places as well.
Source: Operators.
What is the arity of the BETWEEN expression and why? I think the answer is 3 from the following example:
select
~ (SELECT -1 AS expr_1) AS 'bitwise_arity_1',
(SELECT 1 AS expr_1) * (SELECT 2 AS expr_2) AS 'times_arity_2',
(SELECT 1 AS expr_1) BETWEEN
(SELECT 2 AS expr_2) AND (SELECT 3 AS expr_3) AS 'bitwise_arity_3?'
I suppose one way to interpret it might just be that the grammar is:
expr 'BETWEEN' logicalAndExpr
And so the two expressions in the logicalAnd are just grouped into one. Is that a correct understanding?
SQLFiddle: http://www.sqlfiddle.com/#!9/b28da2/2156
It's binary, in syntactic terms. See below for a discussion of syntax vs. semantics, where I note that a better syntactic term is "infix".
Similarly, function calls and array subscripting are postfix unary operators and the C family's conditional operator (often misnamed "the ternary operator" as though it were the only such thing) is also infix. The reason is that the interior operands (the operands between BETWEEN...AND, (...), [...], and ?...:, respectively) are fenced off from the rest of the syntax by the pair of surrounding terminal tokens which function as a syntactic barrier, like parentheses. Precedence does not penetrate to the enclosed operands; only the outer operand(s) remain floating in the syntax.
The semantic view is quite different, of course. BETWEEN...AND and ?...: are certainly three-argument functions, although since the latter is short-circuiting, only two of the three arguments are ever evaluated, which makes it hard to discuss in strict mathematical terms [Note 1]. Moreover, the semantic view is complicated by the fact that there is not just a single way to look at what an argument is. As noted in a comment, you can always curry functions into a series of unary applications of higher-order functions. Although you might be tempted to try to redefine "arity" as the length of that sequence, you will soon find higher-order functions which have different sequence lengths depending on the values of their arguments. Also, in most programming languages (unlike SQL) the function being called is a full expression which does not need to be evaluated at compile-time, and since different functions have different argument counts, there is no good way to describe the arity of a function call unless you respecify the call to be the application of a list-of-arguments object to a callable object. That's often done, but it's a bit unsatisfying because (in most languages), the list object does not really exist and cannot be observed as an object.
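To make the semantic view concrete, here is a small Python sketch of the range test as a three-argument function, together with a curried version built from unary applications. The names between and between_curried are made up for illustration, and SQL's NULL handling is ignored.

```python
def between(x, low, high):
    """Semantic view: `x BETWEEN low AND high` as a 3-argument function (inclusive)."""
    return low <= x <= high

def between_curried(x):
    """The same function curried into a chain of unary applications."""
    return lambda low: lambda high: low <= x <= high

print(between(1, 2, 3))           # the question's example: 1 BETWEEN 2 AND 3
print(between_curried(2)(1)(3))   # three unary applications instead of one ternary call
```

Both forms compute the same thing, which is why counting arguments alone does not settle what "arity" should mean.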
I'd suggest taking the Wikipedia article on arity with a good-sized saline dosage, because it completely misses the distinction between semantics and syntactic structure, giving rise to the confusing ambiguity between the semantic and syntactic view of SQL's range operator or C's conditional operator. Personally, I prefer to reserve "arity" for the semantic meaning, using "fixity" or "valence" for the syntactic feature. (The advantage of "fixity" is that it encourages the distinction between prefix and postfix, which is a real distinction hidden by calling both cases "unary operators".)
Notes
[Note 1] BETWEEN...AND could short-circuit, too, but standard SQL doesn't guarantee short-circuiting, as far as I know (although some SQL implementations do).

Report Earliest Item in List

I am using Snap! to try to find the earliest item in a list. For instance, in list [3,1,2], I would like to report "1." I would like the solution to work for words as well (for instance, given list [Bob, George, Ari] report "Ari").
I tried to use recursion to solve the problem, and the solution works. However, I cannot find a way to do so recursively without the second "if else" statement. Is there a way to use recursion to solve this problem without the "if 0= length of..." statement?
Play with it here.
I don't see a way to do this without two if...else statements. You need two checks:
Is the list exhausted?
Is the first element less than all the following elements?
In some languages, you can use the conditional ternary operator ?:, but I don't think Snap! supports that. It's really just syntactic sugar for an if...else anyway.
You can do some clean-up on this function, though.
I recommend explicitly handling the case of a zero-length list.
"Earliest" is confusing. I recommend the term "least", since you're checking with the "less than" operator.
Don't call keep items such that [] from [] multiple times. This is inefficient and potentially a bug if someone modifies one line but forgets to modify the other. Instead, save the result in a script variable.
Don't compare the current first element to every element in the list. This gives the function an O(n^2) run time. Instead, compare it only to the least element so far. This reduces the run time to O(n).
Some of these changes are implemented here:
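In text form, the recommendations above might look like the following Python sketch. Python stands in for the Snap! blocks here, and the function name least and its start parameter are assumptions, not part of the original project.

```python
def least(items, start=0):
    """Return the least element of items[start:], or None for an empty list."""
    # Check 1: is the list exhausted? (explicit zero-length case)
    if start >= len(items):
        return None
    # Compute the least of the remaining elements once and reuse it.
    least_of_rest = least(items, start + 1)
    first = items[start]
    # Check 2: compare the first element only with the least of the rest.
    if least_of_rest is None or first < least_of_rest:
        return first
    return least_of_rest

print(least([3, 1, 2]))
print(least(["Bob", "George", "Ari"]))
```

Each element is compared once, so the number of comparisons grows linearly with the list length. Passing an index instead of slicing avoids copying the list on every call, although Python's recursion limit still caps the list length this sketch can handle.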

SQL DIFFERENCE function with names bringing too many results

I have a function that uses the SQL DIFFERENCE function to see if the name of a client is similar to a client already in the database
SELECT ID FROM People p
WHERE DIFFERENCE(p.FullName, #fullName) = 4
Here #fullName is a variable passed to the function. The issue I'm having is that if I pass "pedro sanchez" as a parameter, the query returns all the Peters in the database, and if I enter "pablo sanchez", it returns the record "PEOPLE'S CREDIT UNION".
As I understand it, the DIFFERENCE function should return 4 when the two strings are almost identical, but the results I'm getting say otherwise.
Is there a way to tighten the resemblance required by the DIFFERENCE function, or maybe another approach to finding similar names?
Difference() is based on soundex(), which in turn -- to be frank -- is a lousy system for comparing strings. Let me add a caveat: it is pretty good for the purpose it was designed for, which is matching last names of people in English. You can read about the rules here and you can try it out here. Using the latter link, you can see that "Pablo" and "People" have the same code, P-140 (just as "Pedro" and "Peter" share P-360).
Soundex encodes only consonants and produces a four-character code: the first letter followed by digits for, basically, the next three consonants on the list it cares about. (Some languages, such as Hawaiian and other Polynesian languages, are rather light in consonants. One assumes the designers were not thinking about names in such languages.)
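To see why the names from the question collide, here is a minimal Python sketch of the standard American Soundex rules. It is an illustration only: the function name soundex and the table SOUNDEX_CODES are my own, and SQL Server's SOUNDEX() may differ from it in edge cases.

```python
# Digit assigned to each consonant group; vowels, H, W and Y get no digit.
SOUNDEX_CODES = {}
for digit, group in {"1": "BFPV", "2": "CGJKQSXZ", "3": "DT",
                     "4": "L", "5": "MN", "6": "R"}.items():
    for letter in group:
        SOUNDEX_CODES[letter] = digit

def soundex(word):
    """Return the 4-character Soundex code of a single word."""
    letters = [c for c in word.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    previous = SOUNDEX_CODES.get(first, "")
    for c in letters[1:]:
        code = SOUNDEX_CODES.get(c, "")
        # Skip letters without a digit and repeats of the previous digit.
        if code and code != previous:
            digits.append(code)
        # H and W do not separate letters with the same digit; vowels do.
        if c not in "HW":
            previous = code
    # Pad with zeros and truncate to the first letter plus three digits.
    return (first + "".join(digits) + "000")[:4]

for name in ("Pablo", "People", "Pedro", "Peter"):
    print(name, soundex(name))
```

Because vowels are dropped and only a handful of consonant groups are kept, very different-looking words end up with identical codes.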
When you are looking for proximity among written strings, Levenshtein distance is a common metric. Unfortunately, SQL Server does not have this functionality built-in, but you can easily find implementations on the web. For most real applications, Levenshtein distance is too slow. Happily, the functionality of the full text search component is usually sufficient for most purposes.
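For reference, Levenshtein distance can be computed with the usual dynamic-programming table. The Python sketch below keeps only two rows of that table; the function name levenshtein is an assumption, and a SQL Server implementation would have to be written as a user-defined function or in application code.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions and
    substitutions needed to turn string a into string b."""
    # previous[j] holds the distance between the processed prefix of a and b[:j].
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(
                previous[j] + 1,         # delete char_a
                current[j - 1] + 1,      # insert char_b
                previous[j - 1] + cost,  # substitute, or keep if equal
            ))
        previous = current
    return previous[-1]

print(levenshtein("kitten", "sitting"))
```

The running time is proportional to the product of the two string lengths, which is part of why comparing one input against every row of a large table gets expensive.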

How to check the type of an MDX expression?

I am creating a tool to dynamically generate MDX queries. In part of the query I am generating, I need to check whether an expression is a member expression or a tuple expression and apply different logic to it. Does anyone have any idea how I can check the MDX expression type at run time using only MDX?
To know the exact type of an MDX expression, you would have to write an MDX parser (at least for the expressions that can appear).
There are some rules: something like (x, y) is probably a tuple; and the result of all methods that return a tuple (like StrToTuple, Root, or Item, the latter only if applied to a set) is a tuple, and the result of all methods that return a member (like Ancestor or DefaultMember, but also Item if applied to a tuple) is a member. See http://msdn.microsoft.com/en-us/library/ms145970.aspx for a list of functions classified by type. But you see already the difficulty of the Item method which can either deliver a tuple, or a member, depending on context.
And I would not think that you can easily write an MDX statement that tests the type, as Analysis Services uses automatic type casting which converts a member to a tuple whenever the context needs one.
The best approach from my point of view would be to use syntax that allows both a member and a tuple being used, and to avoid having to know the type.
Another approach that could work without explicitly checking the data type, just checking whether some construct is valid, would be to use the VBA function IsError as follows:
IIf(IsError(x.Level), <something avoiding the Level function>, <use x.Level>)

Efficient method for storing simple regular expressions

I have a list of simple regular expressions:
ABC.+DE.+FHIJ.+
.+XY.+Z.+AB
.+KLM.+NO.+J.+
QRST.+UV
They all have alternating patterns of .+ and some text (which I will call "words") repeated some number of times. A pattern may or may not begin or end with .+. These regular expressions are all mutually exclusive. When another regex is added, I want to remove any other matching regular expressions and add one regular expression that combines the added one with all of its matches. For example, adding:
.+J.+
would match,
ABC.+DE.+FHIJ.+
.+KLM.+NO.+J.+
and thus, these would be removed and replaced with the added regular expression, resulting in:
.+J.+
.+XY.+Z.+AB
QRST.+UV
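The replacement rule illustrated above can be sketched in a few lines of Python. This is a heuristic that treats each stored pattern as literal text and tests it against the new regex, and it assumes the "words" contain only letters and digits. The function name add_pattern is hypothetical, and the linear scan only illustrates the rule; it does not address efficient storage.

```python
import re

def add_pattern(patterns, new_pattern):
    """Drop every stored pattern that the new regex matches, then add the new one."""
    compiled = re.compile(new_pattern)
    # A stored pattern is treated as plain text here; its ".+" pieces are
    # simply characters that the new pattern's ".+" is able to consume.
    kept = [p for p in patterns if not compiled.fullmatch(p)]
    kept.append(new_pattern)
    return kept

stored = ["ABC.+DE.+FHIJ.+", ".+XY.+Z.+AB", ".+KLM.+NO.+J.+", "QRST.+UV"]
print(add_pattern(stored, ".+J.+"))
```

On the example list this keeps .+XY.+Z.+AB and QRST.+UV and appends .+J.+, at the cost of testing every stored pattern on each insertion.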
I need to store these patterns either in some data structure or (preferably) in a database in an efficient manner. I first tried a tree of dictionaries, only to realize that when a regex starts with .+, it has to search the entire tree for the next word, which is O(2^n). Unfortunately (unless I am mistaken), it appears that neither SQLite (which I am using) nor any other relational database that I have used supports "regular expression" as a data type. My question is: is there an efficient method for storing and retrieving such simple regular expressions? If there is no canned method, is there some data structure that would be relatively efficient (say, at worst amortized polynomial time)?
Could you please explain what you are using these regular expressions for? That would make it easier to provide a better answer. In particular, when I see the way you are splitting your regular expressions, I wonder whether a trie or a directed acyclic word graph would be a better fit.
From there you may find your answer is as simple as providing better normalization or finding an alternative NoSQL database made specifically for your problem area.