Difference between sequential and combined predicates - selenium

In Selenium I have written a xpath and both of them retrieves the same result.
//a[#role='tab'][text()=' Assets']
//a[#role='tab' and text()=' Assets']
Does both of them have the same meaning?

In most cases a[b][c] has exactly the same effect as a[b and c]. There are two exceptions to be aware of:
They are not equivalent if either predicate is numeric, or has a dependency on position() or last() (I call these positional predicates). For example a[#x][1] selects the first a element that has an #x attribute, while a[1][#x] selects the first a element provided it has an #x attribute (and selects nothing otherwise). By contrast a[1 and #x] converts the integer 1 to the boolean true(), so it just means a[#x].
There may be differences in behaviour if evaluation of b or c fails with a dynamic error. The precise rules here depend on which version of XPath you are using, and to be honest the rules leave implementations some leeway, but you need to exercise care if you want to be sure that in the event of b being false, c is not evaluated. (This hardly matters in XPath 1.0 because very few expressions throw dynamic errors.)

When you add Square Brackets ([]) to XPath you are adding a condition, so
first row adding 2 conditions
Which produce similar results as adding condition with and
Normally you don't use first row, because it less readable,
Mainly because this syntax represent in other languages a Matrix
// return a random m-by-n matrix with values between 0 and 1
public static double[][] random(int m, int n) {
See tutorial:
5 XPaths with predicates
A predicate is an expression that can be true or false
It is appended within [...] to a given location path and will refine results
More than one predicate can be appended to and within (!) a location path

The first one is a predicate, which means it checks if a[#role='tab'] is true then it proceeds to [text()=' Assets']
The second one is a just using an and operator so it validates both are true.

Related

Fast lookup of tree with placeholders?

For an application I'm considering, there would be a large (100,000+) 'database' of trees (think expressions in a programming language, or S-expressions), and I would need to query that database for expressions that match a specific given expression.
Before giving the details of what I'd like to have, note that I'd appreciate any information related to indexing a large set of trees for optimizing lookup by a subtree.
In my specific situation (which would be for a backend to be used by Metamath proof assistants), expressions have the following structure (in Haskell-like notation):
data Expression = Placeholder Id | VarName Id | ConstName Id [Expression]
or as a BNF for an S-expression form:
Expression = '?' Id | Id | '(' Id Expression* ')'
where Id is some kind of identifier.
For example, I could have a database with expressions like
(equiv ?ph ?ps)
(not (in (appl (sqrt) (2)) (Q)))
(equiv (eq ?A ?B) (forall ?x (equiv (in ?x ?A) (in ?x ?B))))
In this context, two expressions match if they can be made equal by substitution of expressions for placeholders. So looking up (equiv (eq A (emptyset)) ?ph) in the above mini-database would result in the first and last expressions.
So again: how would I implement fast lookups in a large set of (expression) trees with placeholders? What kind of index data structure could I use?
I would implement the lookup with a trie. Each key would consist of one of the following:
ConstName Identifier
Variable w/ context info
ConstValue
Placeholder
These should be ordered in some fashion- possibly Placeholder, then all ConstNames (alphabetical), then variables (scope ordering, then argument order), then ConstValues (numerical order). As long as there's a concrete ordering for usage in the trie, you're fine.
Traverse the expression's tree, injecting the appropriate keys into the trie as they are encountered. Do this for all the expressions you want to insert into your data structure. When it comes time to query it, you can traverse the trie in a similar fashion, but with a few new rules.
Everything matches a placeholder node. If it matches some other key as well, then you'll need to explore both branches (easily done via a recursive DFS-like approach).
A placeholder matches everything. This is not equivalent to the previous point- we are talking about placeholders in the query here, the previous bullet is regarding placeholders as trie keys.
Now, this does mean that the search space can somewhat "explode" as you encounter placeholders, but there is one thing you can do to try to mitigate this in practice. Traverse the expression's tree in a breadth-first fashion (both in construction of the trie, and querying). This means if one of the arguments is a placeholder, you won't have to full-depth search every single subtree that matches that expression so far- instead you jump ahead to the next argument- which may not be a placeholder, and will thus greatly prune the search space (compared to matching "everything").
For completeness sake, lets take one of your examples
(not (in (appl (sqrt) (2)) (Q)))
and make a trie entry from that-
not -> in -> apply -> "Q" -> sqrt -> 2
adding (not (in ?ph E)) to this would result in-
not -> in -> apply -> "Q" -> sqrt -> 2
\-> ?ph -> "E"
Continue in this fashion injecting expressions into the trie. Also traverse in this fashion for querying until you reach the ends of your searches into the trie, and return those that matched.
Note- the uniqueness of these entries is based on the assumption you do not have to support variadic functions. If you do, attach to each key some context info (read the next paragraphs for info on how to do this) to distinguish which arguments go to which functions
There is one detail I glossed over- variables. If you only want it to match if they are the exact same variable name, then no work is necessary. But this likely isn't what you want; you probably want it to match generic variables as long as they are "consistent" with each other. The way to do this is to assign each variable an identifier that represents the scope of which it was first defined.
The easiest way to do this is just compose an identifier from the concatenation of the argument ordering of its ancestors. That is, if a variable is first defined as the second argument to a function which is the fifth argument to the root function, then we might label it as (5, 2) or (2, 5), whichever makes more sense intuitively. Either way, this will ensure the variable is given a consistent identifier regardless of other variables / functions elsewhere. Then proceed as normal with this new variable name.

SonarLint - questions about some of the rules for VB.NET

The large majority of SonarLint rules that I've come across in Java seemed plausible and justified. However, ever since I've started using SonarLint for VB.NET, I've come across several rules that left me questioning their usefulness or even whether or not they are working correctly.
I'd like to know if this is simply a problem of me using some VB.NET constructs in a suboptimal way or whether the rule really is flawed.
(Apologies if this question is a little longer. I didn't know if I should create a separate question for each individual rule.)
The following rules I found to leave some cases unconsidered that would actually turn up as false-positives:
S1871: Two branches in the same conditional structure should not have exactly the same implementation
I found this one to bring up a lot of false-positives for me, because sometimes the order in which the conditions are checked actually does matter. Take the following pseudo code as example:
If conditionA() Then
doSomething()
ElseIf conditionB() AndAlso conditionC() Then
doSomethingElse()
ElseIf conditionD() OrElse conditionE() Then
doYetAnotherThing()
'... feel free to have even more cases in between here
Else Then
doSomething() 'Non-compliant
End If
If I wanted to follow this Sonar rule and still make the code behave the same way, I'd have to add the negated version of each ElseIf-condition to the first If-condition.
Another example would be the following switch:
Select Case i
Case 0 To 40
value = 0
Case 41 To 60
value = 1
Case 61 To 80
value = 3
Case 81 To 100
value = 5
Case Else
value = 0 'Non-compliant
There shouldn't be anything wrong with having that last case in a switch. True, I could have initialized value beforehand to 0 and ignored that last case, but then I'd have one more assignment operation than necessary. And the Java ruleset has conditioned me to always put a default case in every switch.
S1764: Identical expressions should not be used on both sides of a binary operator
This rule does not seem to take into account that some functions may return different values every time you call them, for instance collections where accessing an element removes it from the collection:
stack.Push(stack.Pop() / stack.Pop()) 'Non-compliant
I understand if this is too much of an edge case to make special exceptions for it, though.
The following rules I am not actually sure about:
S3385: "Exit" statements should not be used
While I agree that Return is more readable than Exit Sub, is it really bad to use a single Exit For to break out of a For or a For Each loop? The SonarLint rule for Java permits the use of a single break; in a loop before flagging it as an issue. Is there a reason why the default in VB.NET is more strict in that regard? Or is the rule built on the assumption that you can solve nearly all your loop problems with LINQ extension methods and lambdas?
S2374: Signed types should be preferred to unsigned ones
This rule basically states that unsigned types should not be used at all because they "have different arithmetic operators than signed ones - operators that few developers understand". In my code I am only using UInteger for ID values (because I don't need negative values and a Long would be a waste of memory in my case). They are stored in List(Of UInteger) and only ever compared to other UIntegers. Is this rule even relevant to my case (are comparisons part of these "arithmetic operators" mentioned by the rule) and what exactly would be the pitfall? And if not, wouldn't it be better to make that rule apply to arithmetic operations involving unsigned types, rather than their declaration?
S2355: Array literals should be used instead of array creation expressions
Maybe I don't know VB.NET well enough, but how exactly would I satisfy this rule in the following case where I want to create a fixed-size array where the initialization length is only known at runtime? Is this a false-positive?
Dim myObjects As Object() = New Object(someOtherList.Count - 3) {} 'Non-compliant
Sure, I could probably just use a List(Of Object). But I am curious anyway.
Thanks for raising these points. Note that not all rules apply every time. There are cases when we need to balance between false positives/false negatives/real cases. For example with identical expressions on both sides of an operator rule. Is it a bug to have the same operands? No it's not. If it was, then the compiler would report it. Is it a bad smell, is it usually a mistake? Yes in many cases. See this for example in Roslyn. Should we tune this rule to exclude some cases? Yes we should, there's nothing wrong with 2 << 2. So there's a lot of balancing that needs to happen, and we try to settle for an implementation that brings the most value for the users.
For the points you raised:
Two branches in the same conditional structure should not have exactly the same implementation
This rule generally states that having two blocks of code match exactly is a bad sign. Copy-pasted code should be avoided for many reasons, for example if you need to fix the code in one place, you'll need to fix it in the other too. You're right that adding negated conditions would be a mess, but if you extract each condition into its own method (and call the negated methods inside them) with proper names, then it would probably improves the readability of your code.
For the Select Case, again, copy pasted code is always a bad sign. In this case you could do this:
Select Case i
...
Case 0 To 40
Case Else
value = 0 ' Compliant
End Select
Or simply remove the 0-40 case.
Identical expressions should not be used on both sides of a binary operator
I think this is a corner case. See the first paragraph of the answer.
"Exit" statements should not be used
It's almost always true that by choosing another type of loop, or changing the stop condition, you can get away without using any "Exit" statements. It's good practice to have a single exit point from loops.
Signed types should be preferred to unsigned ones
This is a legacy rule from SonarQube VB.NET, and I agree with you that it shouldn't be enabled by default in SonarLint. I created the following ticket in our JIRA: https://jira.sonarsource.com/browse/SLVS-1074
Array literals should be used instead of array creation expressions
Yes, it seems to be a false positive, we shouldn't report on array creations when the size is explicitly specified. https://jira.sonarsource.com/browse/SLVS-1075

Semantic predicates fail but don't go to the next one

I tried to use ANTLR4 to identify a range notation like <1..100>, and here is my attempt:
#parser::members {
def evalRange(self, minnum, maxnum, num):
if minnum <= num <= maxnum:
return True
return False
}
range_1_100 : INT { self.evalRange(1, 100, $INT.int) }? ;
But it does not work for more than one range like:
some_rule : range_1_100 | range_200_300 ;
When I input a number (200), it just stops at the first rule:
200
line 3:0 rule range_1_100 failed predicate: { self.evalRange(1, 100, $INT.int) }?
(top (range_1_100 200))
It is not as I expected. How can I make the token match the next rule (range_200_300)?
Here's an excerpt from the docs (emphasis mine):
Predicates can appear anywhere within a parser rule just like actions can, but only those appearing on the left edge of alternatives can affect prediction (choosing between alternatives).
[...]
ANTLR's general decision-making strategy is to find all viable alternatives and then ignore the alternatives guarded with predicates that currently evaluate to false. (A viable alternative is one that matches the current input.) If more than one viable alternative remains, the parser chooses the alternative specified first in the decision.
Which basically means your predicate must be the first item in the alternation to be taken into account during the prediction phase.
Of course, you won't be able to use $INT as it wasn't matched yet at this point, but you can replace it with something like _input.LA(1) instead (lookahead of one token) - the exact syntax depends on your language target.
As a side note, I'd advise you to not validate the input through the grammar, it's easier and better to perform a separate validation pass after the parse. Let the grammar handle the syntax, not the semantics.

What is the XQuery equivalent of the ESQL COALESCE function?

I'm trying convert WMB 7 mapping nodes to IIB 9 nodes. The auto-convert process turns some ESQL functions to XQuery functions.
Specifically, it turns the ESQL function
COALESCE (var0, var1)
(which returns the first non-null value, as in if var0 = null then var1 else var0) into
XQUERY (var0,var1)
Is it a correct conversion?
If it is, can someone provide a link to API? I couldn't find this on XQuery syntax and operators manuals.
XQuery is not an API, but a standard, and the full syntax can be found online: XQuery 1.0 and XQuery 3.0 (there is no 2.0). You'll also find many manuals, tutorials etc.
XQuery relies on XPath, which is even wider used than XQuery and can be found in libraries for almost every general purpose language.
Your expression is correct XQuery, in that it considers everything a sequence, and the comma concatenates (and flattens) two sequences.
XPath does not know NULL, but it knows xsi:nil and (), the latter being the empty sequence. An empty sequence is removed from the result.
I am not sure what XQuery processor is used underneath, but the correct expression should be ($var0, $var1)[1]2, which works the same way as your COALESCE operation1. In XPath and XQuery, variables are referenced with the $ sign. The number of variables or expressions separated by the comma is unbounded. If all are the empty sequence (null), the result is the empty sequence.
Without [1], it will return all items that are non-null and discard the rest. You can use another index, like [3] to get the third non-null value. If no such value exists, it will return null (empty sequence).
1 which behaves not exactly the way you described it. I believe it behaves like if var0 == null then var1 else var0, it selects the first non-null value (I've updated the OP).
2 as Florent has explained in the comments, a warning with this expression is in place. If you have $var1 := (1, 2) and $var2 := (3, 4), the expression $var1, $var2)[1] will return 1, not (1, 2), because sequences cannot contain subsequences, and indexing a sequence with [x] will return the xth value of the flattened sequence. You can safe-guard your expression with (zero-or-one($var1), zero-or-one($var2))[1].

How to tell if an identifier is being assigned or referenced? (FLEX/BISON)

So, I'm writing a language using flex/bison and I'm having difficulty with implementing identifiers, specifically when it comes to knowing when you're looking at an assignment or a reference,
for example:
1) A = 1+2
2) B + C (where B and C have already been assigned values)
Example one I can work out by returning an ID token from flex to bison, and just following a grammar that recognizes that 1+2 is an integer expression, putting A into the symbol table, and setting its value.
examples two and three are more difficult for me because: after going through my lexer, what's being returned in ex.2 to bison is "ID PLUS ID" -> I have a grammar that recognizes arithmetic expressions for numerical values, like INT PLUS INT (which would produce an INT), or DOUBLE MINUS INT (which would produce a DOUBLE). if I have "ID PLUS ID", how do I know what type the return value is?
Here's the best idea that I've come up with so far: When tokenizing, every time an ID comes up, I search for its value and type in the symbol table and switch out the ID token with its respective information; for example: while tokenizing, I come across B, which has a regex that matches it as being an ID. I look in my symbol table and see that it has a value of 51.2 and is a DOUBLE. So instead of returning ID, with a value of B to bison, I'm returning DOUBLE with a value of 51.2
I have two different solutions that contradict each other. Here's why: if I want to assign a value to an ID, I would say to my compiler A = 5. In this situation, if I'm using my previously described solution, What I'm going to get after everything is tokenized might be, INT ASGN INT, or STRING ASGN INT, etc... So, in this case, I would use the former solution, as opposed to the latter.
My question would be: what kind of logical device do I use to help my compiler know which solution to use?
NOTE: I didn't think it necessary to post source code to describe my conundrum, but I will if anyone could use it effectively as a reference to help me understand their input on this topic.
Thank you.
The usual way is to have a yacc/bison rule like:
expr: ID { $$ = lookupId($1); }
where the the lookupId function looks up a symbol in the symbol table and returns its type and value (or type and storage location if you're writing a compiler rather than a strict interpreter). Then, your other expr rules don't need to care whether their operands come from constants or symbols or other expressions:
expr: expr '+' expr { $$ = DoAddition($1, $3); }
The function DoAddition takes the types and values (or locations) for its two operands and either adds them, producing a result, or produces code to do the addition at run time.
If possible redesign your language so that the situation is unambiguous. This is why even Javascript has var.
Otherwise you're going to need to disambiguate via semantic rules, for example that the first use of an identifier is its declaration. I don't see what the problem is with your case (2): just generate the appropriate code. If B and C haven't been used yet, a value-reading use like this should be illegal, but that involves you in control flow analysis if taken to the Nth degree of accuracy, so you might prefer to assume initial values of zero.
In any case you can see that it's fundamentally a language design problem rather than a coding problem.