When we have multiple conditions for locating an element, we can either use a single predicate with the logical and operator inside it, or use multiple predicates with a single condition in each.
For example, on this page we can locate links to questions whose href contains selenium with this XPath:
"//a[@class='s-link'][contains(@href,'selenium')]"
and with this one:
"//a[@class='s-link' and contains(@href,'selenium')]"
I'm wondering if there are any differences between these 2 approaches?
The expressions are equivalent provided that neither of the predicates is positional. A predicate is positional if (a) its value is numeric, or (b) its value depends on the current position in the dynamic context (for practical purposes, that means if it's an expression that calls the position() function).
Assuming this condition is met, there is no good reason for preferring one expression over the other. It's possible of course that some XPath processor might evaluate one of them faster than the other, but it's very unlikely to be a significant difference, and it could equally well favour either of the two.
There are minor differences in the degree of freedom offered to processors in how far they can go in optimising the constructs in ways that affect error handling, and these rules vary slightly between XPath versions, but again this is very unlikely to be significant in practice.
The equivalence doesn't apply to positional predicates because (a) when you write A[X][Y], the value of position() within Y is different from its value within X, and (b) if X or Y is numeric then it is interpreted as position()=X (or position()=Y), and this doesn't apply to the operands of and: you can't rewrite A[@code][1] as A[@code and 1].
There may be differences in the algorithm that evaluates these expressions, but the result is the same. Things would be different if the second condition contained position() or last():
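//a[@class='s-link'][position() > 1] gives all s-link anchors except the first (because position() is the position in the node-set already filtered by @class='s-link'), whereas
//a[@class='s-link' and position() > 1] gives all s-link anchors that are not the first anchor of any class (because position() is the position among all a elements, before the class filter is applied). Strictly speaking, //a abbreviates /descendant-or-self::node()/a, so in both cases "first" means first among the a children of the same parent rather than first in the whole document.
Also, you can select the first s-link anchor with //a[@class='s-link'][1], but not with //a[@class='s-link' and 1].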
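To make the renumbering of position() concrete, here is a minimal Python sketch that models a predicate as a filter receiving each node together with its 1-based position. It is not an XPath engine: the apply_predicate helper and the sample anchors are made up for illustration, and the anchors are assumed to be siblings under one parent.

```python
def apply_predicate(nodes, predicate):
    """Keep the nodes for which predicate(node, position) is true.

    Positions are 1-based and are counted within the list passed in,
    which mimics how each XPath predicate sees its own context positions.
    """
    return [n for pos, n in enumerate(nodes, start=1) if predicate(n, pos)]


# Hypothetical sibling <a> elements, reduced to the attributes we care about.
anchors = [
    {"class": "nav", "href": "/home"},
    {"class": "s-link", "href": "/q/selenium-wait"},
    {"class": "s-link", "href": "/q/python-lists"},
    {"class": "s-link", "href": "/q/selenium-xpath"},
]


def is_s_link(node, pos):
    return node["class"] == "s-link"


def has_selenium(node, pos):
    return "selenium" in node["href"]


def after_first(node, pos):
    return pos > 1


# Non-positional: a[@class='s-link'][contains(@href,'selenium')]
chained = apply_predicate(apply_predicate(anchors, is_s_link), has_selenium)
# Non-positional: a[@class='s-link' and contains(@href,'selenium')]
combined = apply_predicate(
    anchors, lambda n, p: is_s_link(n, p) and has_selenium(n, p)
)

# Positional: a[@class='s-link'][position() > 1]
chained_pos = apply_predicate(apply_predicate(anchors, is_s_link), after_first)
# Positional: a[@class='s-link' and position() > 1]
combined_pos = apply_predicate(
    anchors, lambda n, p: is_s_link(n, p) and after_first(n, p)
)

print(chained == combined)          # the non-positional forms agree
print(chained_pos == combined_pos)  # the positional forms do not
```

In the chained positional form the second filter counts positions in the already filtered list, so the first s-link is dropped; in the combined form positions are counted over all four anchors, so every s-link survives.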
Related
What is the arity of the SQL BETWEEN expression? I thought it was three (ternary) since the expression usually looks like:
WHERE...
1 BETWEEN 2 AND 3
But it's listed as binary on BigQuery's documentation, and I assume other places as well.
Source: Operators.
What is the arity of the BETWEEN expression and why? I think the answer is 3 from the following example:
select
~ (SELECT -1 AS expr_1) AS 'bitwise_arity_1',
(SELECT 1 AS expr_1) * (SELECT 2 AS expr_2) AS 'times_arity_2',
(SELECT 1 AS expr_1) BETWEEN
(SELECT 2 AS expr_2) AND (SELECT 3 AS expr_3) AS 'bitwise_arity_3?'
I suppose one way to interpret it might just be that the grammar is:
expr 'BETWEEN' logicalAndExpr
And so the two expressions in the logicalAnd are just grouped into one. Is that a correct understanding?
SQLFiddle: http://www.sqlfiddle.com/#!9/b28da2/2156
It's binary, in syntactic terms. See below for a discussion of syntax vs. semantics, where I note that a better syntactic term is "infix".
Similarly, function calls and array subscripting are postfix unary operators and the C family's conditional operator (often misnamed "the ternary operator" as though it were the only such thing) is also infix. The reason is that the interior operands (the operands between BETWEEN...AND, (...), [...], and ?...:, respectively) are fenced off from the rest of the syntax by the pair of surrounding terminal tokens which function as a syntactic barrier, like parentheses. Precedence does not penetrate to the enclosed operands; only the outer operand(s) remain floating in the syntax.
The semantic view is quite different, of course. BETWEEN...AND and ?...: are certainly three-argument functions, although since the latter is short-circuiting, only two of the three arguments are ever evaluated, which makes it hard to discuss in strict mathematical terms [Note 1]. Moreover, the semantic view is complicated by the fact that there is not just a single way to look at what an argument is. As noted in a comment, you can always curry functions into a series of unary applications of higher-order functions. Although you might be tempted to try to redefine "arity" as the length of that sequence, you will soon find higher-order functions which have different sequence lengths depending on the values of their arguments. Also, in most programming languages (unlike SQL) the function being called is a full expression which does not need to be evaluated at compile-time, and since different functions have different argument counts, there is no good way to describe the arity of a function call unless you respecify the call to be the application of a list-of-arguments object to a callable object. That's often done, but it's a bit unsatisfying because (in most languages), the list object does not really exist and cannot be observed as an object.
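As a small illustration of the semantic view, the sketch below writes the range test first as a three-argument function and then in curried form, as a chain of one-argument functions. Python is used only for illustration, and the names between and between_curried are made up here.

```python
def between(value, low, high):
    """Three-argument view: an inclusive range test, like SQL's BETWEEN."""
    return low <= value <= high


def between_curried(value):
    """Curried view: three nested applications of one-argument functions."""
    return lambda low: lambda high: low <= value <= high


print(between(1, 2, 3))           # 1 BETWEEN 2 AND 3
print(between_curried(2)(1)(3))   # 2 BETWEEN 1 AND 3, one argument at a time
```

Both forms compute the same predicate, which is why counting arguments alone does not settle what "arity" should mean.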
I'd suggest taking the Wikipedia article on arity with a good-sized saline dosage, because it completely misses the distinction between semantics and syntactic structure, giving rise to the confusing ambiguity between the semantic and syntactic view of SQL's range operator or C's conditional operator. Personally, I prefer to reserve "arity" for the semantic meaning, using "fixity" or "valence" for the syntactic feature. (The advantage of "fixity" is that it encourages the distinction between prefix and postfix, which is a real distinction hidden by calling both cases "unary operators".)
Notes
[Note 1] BETWEEN...AND could short-circuit, too, but standard SQL doesn't guarantee short-circuiting, as far as I know (although some SQL implementations do).
I'm trying to use Microsoft.SqlServer.TransactSql.ScriptDom to check that an expression is a scalar constant.
Here is such an expression:
DATEADD(YEAR, -21, CURRENT_TIMESTAMP)
Here is not such an expression:
DATEADD(YEAR, -21, DateOfBirth)
It is not a constant because it references the column DateOfBirth.
How can I determine this?
What I didn't expect -- and why I've run into trouble -- is that Microsoft.SqlServer.TransactSql.ScriptDom thinks that YEAR is a ColumnReferenceExpression.
(too long for comment)
ScriptDom does not compile; it only parses, and it treats every unrecognized name as a possible column name. For example, in IF (MAGICNAME = 0) it will detect a "column" named MAGICNAME. If you want more than that, you have to add the extra intelligence yourself.
This can be done by writing additional visitor classes that act as nested parsers, and by storing lists of "known magic words" relevant to specific cases. In this case that may lead to code which:
catches a function call (UDF)
checks whether it is one of the well-known functions
invokes a nested visitor class that understands more about this specific function
With this approach, a specific visitor for DATEADD (or for all the date-handling functions) might hold the list of words YEAR, MONTH and so on, and use it to change the interpretation of the first argument from "possible column" to "known static magic word".
The task can hardly be accomplished in general, for every possible case, but it looks like many cases can be handled correctly. One idea is to implement a "duck typing" approach:
detect expressions that could possibly be scalar and "constant", and take a deeper look only at those
in the deeper look, recursively apply this approach to all of the expression's arguments
if none of them violates your understanding of "scalar constant expression", then it is one
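The recursive check described above can be sketched independently of ScriptDom. The Python below uses a tiny made-up expression tree (Literal, Name, Call) in place of the real ScriptDom node types, and its keyword lists are illustrative assumptions, not a complete catalogue of T-SQL.

```python
from dataclasses import dataclass


@dataclass
class Literal:
    value: object


@dataclass
class Name:
    """What a pure parser would report as a possible column reference."""
    name: str


@dataclass
class Call:
    func: str
    args: list


# Assumed lists of "known magic words"; extend them for real use.
DATEPART_FUNCS = {"DATEADD", "DATEDIFF", "DATEPART"}
DATEPARTS = {"YEAR", "MONTH", "DAY", "HOUR", "MINUTE", "SECOND"}
CONSTANT_NAMES = {"CURRENT_TIMESTAMP"}


def is_scalar_constant(node):
    """Return True only if nothing in the expression can be a column."""
    if isinstance(node, Literal):
        return True
    if isinstance(node, Name):
        # A bare name is a column unless it is a known constant keyword.
        return node.name.upper() in CONSTANT_NAMES
    if isinstance(node, Call):
        if node.func.upper() not in DATEPART_FUNCS or not node.args:
            return False  # unknown function: stay conservative
        first, *rest = node.args
        # The first argument must be a datepart keyword, not a column.
        if not (isinstance(first, Name) and first.name.upper() in DATEPARTS):
            return False
        return all(is_scalar_constant(arg) for arg in rest)
    return False


constant_expr = Call("DATEADD", [Name("YEAR"), Literal(-21), Name("CURRENT_TIMESTAMP")])
column_expr = Call("DATEADD", [Name("YEAR"), Literal(-21), Name("DateOfBirth")])

print(is_scalar_constant(constant_expr))
print(is_scalar_constant(column_expr))
```

Anything the checker does not recognize is reported as non-constant, which matches the conservative spirit of the approach: only expressions that pass every rule count as scalar constants.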
I am using Snap! to try to find the earliest item in a list. For instance, in list [3,1,2], I would like to report "1." I would like the solution to work for words as well (for instance, given list [Bob, George, Ari] report "Ari").
I tried to use recursion to solve the problem
and the solution works. However, I cannot find a way to do so recursively without the second "if else" statement. Is there a way to use recursion to solve this problem without the "if 0= length of..." statement?
Play with it here.
I don't see a way to do this without two if...else statements. You need two checks:
Is the list exhausted?
Is the first element less than all the following elements?
In some languages, you can use the conditional ternary operator ?:, but I don't think Snap! supports that. It's really just syntactic sugar for an if...else anyway.
You can do some clean-up on this function, though.
I recommend explicitly handling the case of a zero-length list.
"Earliest" is confusing. I recommend the term "least", since you're checking with the "less than" operator.
Don't call keep items such that [] from [] multiple times. This is inefficient and potentially a bug if someone modifies one line but forgets to modify the other. Instead, save the result in a script variable.
Don't compare the current first element to every element in the list. This gives the function an O(n^2) run time. Instead, compare it only to the least element so far. This reduces the run time to O(n).
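Since Snap! scripts are blocks rather than text, here is a rough Python analogue of the clean-up points above. It is a sketch under assumptions, not the Snap! solution itself; the function name least and the start parameter are made up for illustration.

```python
def least(items, start=0):
    """Return the least element of items[start:], recursively.

    Each element is compared once, against the least of the remainder,
    so the number of comparisons grows linearly with the list length.
    """
    if start >= len(items):
        # Explicitly handle the zero-length case.
        raise ValueError("least() of an empty list is undefined")
    if start == len(items) - 1:
        return items[start]
    # Compute the least of the rest once and keep it in a local variable.
    least_of_rest = least(items, start + 1)
    return items[start] if items[start] < least_of_rest else least_of_rest


print(least([3, 1, 2]))
print(least(["Bob", "George", "Ari"]))
```

The two checks from the answer are still present: one for the exhausted list and one for the comparison. They are just no longer nested around a repeated filter of the whole list.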
Some of these changes are implemented here:
What are the operator precedence rules for the DB2 RDBMS engine?
I am looking for explicit rules which mention actual operators instead of precedence relations between groups of operators.
I found this document http://www-01.ibm.com/support/knowledgecenter/SSEPEK_10.0.0/com.ibm.db2z10.doc.sqlref/src/tpc/db2z_precedenceofoperations.dita via googling, but I am looking for something that goes into more detail, e.g. precedence relations between AND, OR and NOT
If you're looking for search-condition operators, then see the Information Center section on Search conditions. Additional details are in the topic, but the basic rules are:
Search conditions within parentheses are evaluated first. If the order
of evaluation is not specified by parentheses, NOT is applied before
AND, and AND is applied before OR. The order in which operators at the
same precedence level are evaluated is undefined to allow for
optimization of search conditions.
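The NOT, then AND, then OR ordering can be checked experimentally. The sketch below uses Python's built-in sqlite3 module purely as a stand-in, because it needs no server; it assumes SQLite follows the same ordering for these three operators and proves nothing about DB2 itself. The evaluate helper is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate a boolean SQL expression and return its value (0 or 1)."""
    return conn.execute(f"SELECT {expr}").fetchone()[0]


# If AND binds tighter than OR, this parses as 1 OR (0 AND 0).
and_before_or = evaluate("1 OR 0 AND 0")
# Parentheses force the other grouping for comparison.
or_forced_first = evaluate("(1 OR 0) AND 0")

# If NOT binds tighter than AND, this parses as (NOT 0) AND 0.
not_before_and = evaluate("NOT 0 AND 0")
# Parentheses force NOT to apply to the whole conjunction.
not_forced_last = evaluate("NOT (0 AND 0)")

print(and_before_or, or_forced_first, not_before_and, not_forced_last)
```

Each unparenthesized expression gives a different result from its explicitly regrouped counterpart, so the default grouping is observable directly.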
From DB2 for z/OS 10.0.0>DB2 reference information>DB2 SQL>Language elements>Expressions>Precedence of operations
Expressions within parentheses are evaluated first. When the order of
evaluation is not specified by parentheses, prefix operators are
applied before multiplication and division, and multiplication,
division, and concatenation are applied before addition and
subtraction. Operators at the same precedence level are applied from
left to right.
I used a case in my order by and it seemed to work
SELECT * FROM TABLE WHERE KEY IN (KEY1 , KEY2) ORDER BY
CASE
WHEN KEY = KEY1 THEN 1
ELSE 2
END
I haven't checked with more than 2 keys.
It's over here. A simple Google search could help you.
I have a list of simple regular expressions:
ABC.+DE.+FHIJ.+
.+XY.+Z.+AB
.+KLM.+NO.+J.+
QRST.+UV
They all have alternating patterns of .+ and some text (which I will call "words") repeated some number of times. A pattern may or may not begin or end in .+. These regular expressions are all mutually exclusive. When another regex is added, I want to remove any other matching regular expressions and add one regular expression that combines the added one with all of its matches. For example, adding:
.+J.+
would match,
ABC.+DE.+FHIJ.+
.+KLM.+NO.+J.+
and thus these would be removed and replaced with the added regular expression, resulting in:
.+J.+
.+XY.+Z.+AB
QRST.+UV
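As a baseline for the behaviour just described, here is a naive Python sketch that scans the stored patterns linearly. It assumes that "matching" can be tested by running the new regex against the text of each stored pattern, which works for this example but is not a general subsumption test, and it makes no attempt at the efficient storage the question asks about. The add_pattern helper is made up for illustration.

```python
import re


def add_pattern(stored, new_pattern):
    """Drop every stored pattern whose text the new regex fully matches,
    then add the new pattern itself. Runs in time linear in len(stored)."""
    regex = re.compile(new_pattern)
    survivors = [p for p in stored if regex.fullmatch(p) is None]
    return survivors + [new_pattern]


patterns = [
    "ABC.+DE.+FHIJ.+",
    ".+XY.+Z.+AB",
    ".+KLM.+NO.+J.+",
    "QRST.+UV",
]

updated = add_pattern(patterns, ".+J.+")
print(updated)
```

Any smarter data structure would need to reproduce this result while avoiding the full scan on every insertion.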
I need to store these patterns either in some data structure or (preferably) in a database in an efficient manner. I first tried a tree of dictionaries, only to realize that when a regex starts with a .+ it has to search the entire tree for the next word, which is order O(2^n). Unfortunately (unless I am mistaken), it appears that neither SQLite (which I am using) nor any other relational database that I have used supports "regular expression" as a data type. My question is: is there an efficient method for storing and retrieving such simple regular expressions? If there is no canned method, is there some data structure that would be relatively efficient (say, at worst amortized polynomial time)?
Could you please explain what you are using these regular expressions for? That would make it easier to provide a better answer. In particular, when I see the way you are splitting your regular expressions, I wonder whether a trie or a directed acyclic word graph would be a better fit.
From there you may find that your answer is as simple as better normalization, or an alternative NoSQL database made specifically for your problem area.