I have a docx document with some formulas, e.g.
{IF "Name" = "Foo" "Foo" "Bar"}
which should print "Bar" at the end.
In Word I have to press "F9" to get the expression evaluated.
Now I am using docx4j. Can I somehow tell docx4j to do the evaluation?
I'm afraid not. You can get the expression of course (see this discussion about some classes which help), but there is currently nothing in docx4j to evaluate an IF field for you.
If the objective is to include/exclude text, you'll be able to achieve the same end with an OpenDoPE conditional content control (based on whether an XPath evaluates to true or false). (docx4j can evaluate these; they can also be nested, to support complex content)
What is the order of operations for boolean operators? Left to right? Right to left? Specific operators have higher priority?
For example, if I search for:
jakarta OR apache AND website
What do I get? Is it
Anything with "jakarta" as well as anything with both "apache" and "website"?
Anything with "website" that also has either "jakarta" or "apache"?
Something else?
Short answer:
In Lucene, the AND operator takes precedence over the OR operator. So, you are effectively doing this:
jakarta OR (apache AND website)
You can verify this for yourself by parsing your query string and seeing how it converts AND and OR to the "required" and "optional" operators.
And, since we are discussing precedence, the NOT operator takes precedence over the AND operator.
But you need to be very careful when dealing with Lucene's so-called "boolean" operators, as they do not behave the way you may expect based on their collective name ("boolean").
(Unfortunately I have never seen any official documentation which provides a citation for these precedence rules - but instead I am relying on empirical observations. See below for more about that. If the documentation for this does exist, that would be great to see.)
Longer Answer
One key thing to understand is that Lucene boolean operators are not really "boolean" in the sense that you may think, based on Boolean algebra, where you use parentheses to help avoid ambiguity (or where you need to know what rules a programming language may be applying) - and where everything evaluates to TRUE or FALSE.
Lucene boolean operators serve a subtly different purpose.
They are not purely concerned with TRUE/FALSE inclusion/exclusion, but also concerned with how to score results so that the more relevant results have higher scores than less relevant results.
The Lucene query jakarta OR apache AND website is equivalent to the following:
jakarta +apache +website
This means the document's field must contain apache and website, but may also include jakarta (for a higher relevance score).
You can see this for yourself by taking your original query string and parsing it:
Query query = parser.parse(queryString);
...and then printing the resulting string representation of the query. The + operator is the "required" operator. It:
requires that the term after the "+" symbol exist somewhere in the field
And the lack of a + operator means the default of "may" as in "may contain" - meaning the term is optional: it does not need to be present, if there is some other clause in the query which does match a document.
The use of AND forces the terms on either side of the AND to be required.
You can encounter some potentially surprising situations.
Consider this:
foo AND bar OR baz AND bat
This parses to the following:
+foo +bar +baz +bat
This is because the AND operators are transformed to + operators for every term, rendering the OR redundant.
It's the same result as if you had written this:
foo AND bar AND baz AND bat
But not the same as this:
(foo AND bar) OR (baz AND bat)
which is parsed to this, where the parentheses are retained:
(+foo +bar) (+baz +bat)
Bottom Line:
Use parentheses to explicitly make your intentions clear, when using AND and OR and also NOT.
Regarding NOT, since we mentioned it: that takes precedence over AND.
The query:
foo AND bar NOT baz AND bat
Is parsed as:
+foo +bar -baz +bat
So, a document field must contain foo, bar and bat - and must not contain baz.
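The three conversions shown above can be reproduced with a small model. This is only a sketch of the observed behaviour for flat queries (no parentheses, phrases, or explicit + and - prefixes), assuming the default operator is OR. It is not Lucene code, and the function name to_required_optional is made up for illustration.

```python
def to_required_optional(query: str) -> str:
    """Rewrite a flat AND/OR/NOT query into +term / -term / term notation.

    Simplified model of the behaviour described above, not Lucene's parser.
    """
    clauses = []     # each entry is [prefix, term]; prefix is "", "+" or "-"
    conj = None      # conjunction (AND/OR) seen just before the current term
    negate = False   # True when NOT immediately precedes the current term
    for tok in query.split():
        if tok in ("AND", "OR"):
            conj = tok
        elif tok == "NOT":
            negate = True
        else:
            # AND also makes the preceding clause required,
            # unless that clause is already prohibited.
            if conj == "AND" and clauses and clauses[-1][0] != "-":
                clauses[-1][0] = "+"
            if negate:
                prefix = "-"
            elif conj == "AND":
                prefix = "+"
            else:
                prefix = ""   # optional ("may contain")
            clauses.append([prefix, tok])
            conj, negate = None, False
    return " ".join(prefix + term for prefix, term in clauses)


print(to_required_optional("jakarta OR apache AND website"))
print(to_required_optional("foo AND bar OR baz AND bat"))
print(to_required_optional("foo AND bar NOT baz AND bat"))
```

The model makes the key point visible: OR never changes anything on its own, while each AND marks the terms on both of its sides as required.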
Why does this situation exist?
I don't know, but I think Lucene originally did not include AND, OR and NOT - but instead used + (must include), - (must not include) and "nothing" (may include). The so-called boolean operators AND, OR, NOT were added later on, as a kind of "syntactic sugar" for these original operators - introduced for people who were more familiar with AND, OR and NOT from other contexts. I'm basing this on the following thread:
Getting a Better Understanding of Lucene's Search Operators
A summary of that thread is included in this answer about the NOT operator.
I want SpaCy matcher to match keywords (multi-word entities) in a document irrespective of their case.
Token.lemma is case sensitive... So, with this code, I can only find "product preferences" rather than "PRODUCT PREFERENCES" or "Product Preferences" in my document.
pat_piece = ({"LEMMA": token.lemma_.lower()} if is_final_token(token, tmpdoc)
else {"LOWER": token.lower_})
Can someone suggest how I can edit my code to match ALL cases for keywords (i.e., entities)?
With the provided attributes you can only match LOWER or LEMMA, not "lowercase lemma". So if you generate this pattern:
{"LEMMA": "product"}
for a token whose lemma is PRODUCT, it simply won't match.
If you want to match lowercase lemmas, some options:
postprocess the docs to lowercase lemmas before running the matcher (either separately in your script or with a custom pipeline component)
use a custom lemmatizer that produces lowercase lemmas
use a custom extension with a getter to return the lowercase form of the lemma for use with a "_" matcher pattern (a "property extension" as described here: https://spacy.io/usage/processing-pipelines#description)
If your only concern is matching lowercase lemmas, I'd suggest the first option as the easiest to implement and fastest to run in the matcher.
I am load testing an application that has a link that looks like this:
https://example.com/myapp/table?qid=1434e99d-5b7c-4e74-b64e-c24e9564514d&rsid=5c94ddc7-e2e4-4e69-8547-49572486f4d1
I need to get the dynamic value of the rsid so I can use it later in my script.
So far I have tried using the regex extractor and I am probably doing it wrong.
I have tried things like:
name = myvar
regular expression = rsid=(.*?) # didn't work
regular expression = <a href=".*?rsid=(.*?)"> # didn't work
Template = $1$
I have one extractor set up to get the csrf value and that one works as expected but that is also because the csrf value is in the page source.
The above link is NOT in the page source as far as I can see, but it DOES show up when I inspect the link. I don't know if that is obfuscation or something else.
How can I extract the value of the rsid? Is the regular expression extractor the right one to use for this?
Should I be using something else?
Is it just a formula issue?
Thanks in advance.
Try something like:
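rsid=([0-9A-Fa-f\-]{36})
The above regular expression should match a GUID-like structure, and your rsid seems to be an instance of it. The parentheses form the capture group that the $1$ template refers to. Your first attempt, rsid=(.*?), captures nothing because a lazy group with no text after it is satisfied by an empty match.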
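To see the difference outside JMeter, here is a minimal sketch using Python's re module against the URL from the question. JMeter uses a different regex engine, so treat this as an illustration of the pattern logic only; the GUID pattern is written with a capture group so that group 1 holds the value.

```python
import re

url = ("https://example.com/myapp/table"
       "?qid=1434e99d-5b7c-4e74-b64e-c24e9564514d"
       "&rsid=5c94ddc7-e2e4-4e69-8547-49572486f4d1")

# Lazy group with nothing after it: the shortest possible match is empty.
lazy = re.search(r"rsid=(.*?)", url)
lazy_value = lazy.group(1)

# Exactly 36 hex digits or hyphens, captured as group 1.
guid = re.search(r"rsid=([0-9A-Fa-f\-]{36})", url)
rsid = guid.group(1)

print(repr(lazy_value))
print(rsid)
```

Note that this only helps if the link text is actually present in the response body that the extractor is attached to.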
Also be aware of the Boundary Extractor: you only need to specify "left" and "right" boundaries and it will extract everything in between. In general, coming up with boundaries is much easier than writing a regular expression, the result is more readable, and JMeter processes Boundary Extractors much faster. More information: The Boundary Extractor vs. the Regular Expression Extractor in JMeter
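The idea behind boundary-based extraction can be sketched in a few lines of Python. This is not JMeter's implementation; the helper name extract_between and the sample HTML are hypothetical and only show what "everything between a left and a right boundary" means.

```python
from typing import Optional


def extract_between(text: str, left: str, right: str) -> Optional[str]:
    """Return the text between the first `left` and the next `right`, or None."""
    start = text.find(left)
    if start == -1:
        return None
    start += len(left)
    end = text.find(right, start)
    if end == -1:
        return None
    return text[start:end]


# Hypothetical response fragment containing the link from the question.
html = ('<a href="/myapp/table?qid=1434e99d-5b7c-4e74-b64e-c24e9564514d'
        '&rsid=5c94ddc7-e2e4-4e69-8547-49572486f4d1">table</a>')

value = extract_between(html, "rsid=", '"')
print(value)
```

Here the left boundary is rsid= and the right boundary is the closing double quote of the href attribute.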
I am trying to extract information that starts partway through an element's value.
And this value is only displayed this way
see image:
It would be the value "info_se"
You need to escape the ? character (as well as other metacharacters) in your regular expression with a backslash, so the whole expression would be something like:
a href="#" data-url="Cervello/Release.aspx\?info_s=(.+?)"
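A quick way to see why the escape matters is to run both variants against a sample element. The sketch below uses Python's re module rather than JMeter, and the HTML snippet and the value abc123 are made up for illustration.

```python
import re

# Hypothetical element; only the data-url shape follows the expression above.
html = '<a href="#" data-url="Cervello/Release.aspx?info_s=abc123">Release</a>'

# Unescaped: "x?" means "an optional x", so the literal "?" in the text
# is never consumed and the pattern fails to match.
unescaped = re.search(r'data-url="Cervello/Release.aspx?info_s=(.+?)"', html)

# Escaped: "\?" matches a literal question mark.
escaped = re.search(r'data-url="Cervello/Release\.aspx\?info_s=(.+?)"', html)

print(unescaped)
print(escaped.group(1))
```

The dot is escaped here as well; an unescaped dot still matches, but only because it matches any character.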
References:
JMeter Regular Expressions
Using RegEx (Regular Expression Extractor) with JMeter
Perl 5 Regex Cheat sheet
Use a Regular Expression Extractor under the sampler that returns the full HTML. Set the reference name to info and then use it later as ${info}.
info_s=(\S+)"
Template $1$
Match No. 1
My error turned out to be something else: the "info_s" value appears decoded in the HTML, but the system needs it in encoded form.
I found the field in the HTML that stores the "info_s" value in decoded form and extracted it from there.
Then, in the parameters of the HTTP Request, I enabled the encode option for the extracted value, and with that I was able to send the correct value.
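Why the encoding step matters can be shown with Python's urllib.parse. This is a sketch with a made-up value (k9+Zx/Q==); the exact encoding scheme JMeter applies is not asserted here, only the general effect of sending reserved characters unencoded.

```python
from urllib.parse import quote_plus, unquote_plus

# Hypothetical decoded value as it might appear in the HTML.
decoded = "k9+Zx/Q=="

# Encoded form that is safe to send as a query parameter value.
encoded = quote_plus(decoded)

# What a server sees after decoding each variant.
received_if_encoded = unquote_plus(encoded)
received_if_raw = unquote_plus(decoded)   # "+" is read as a space

print(encoded)
print(repr(received_if_encoded))
print(repr(received_if_raw))
```

Sending the raw value silently changes it on the server side, which is consistent with the symptom described above.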
I want to find any element with a given text on my page, but when I pass it to find without an element it gives me back an error
find(material.attachment_filename) #material.attachment_filename is "01. pain killer.mp3"
But if I do:
find('a',text: material.attachment_filename)
it works fine. The error from the first call is:
Selenium::WebDriver::Error::InvalidSelectorError:
Given css selector expression "01. pain killer.mp3" is invalid: SyntaxError: An invalid or illegal string was specified
Capybara's find takes 3 arguments (a Capybara selector type, a locator string, and an options hash). If the selector type isn't specified it defaults to :css which means the locator string needs to be a CSS selector.
This means that find(material.attachment_filename) in your case is equivalent to
find(:css, "01. pain killer.mp3")
which will raise an error as you've seen because "01. pain killer.mp3" isn't valid CSS. If you want to find any element containing the text you could do something like
find('*', text: "01. pain killer.mp3")
which will find any element containing the text. However, that will also match all of its ancestor elements, since they contain the text too. So what you'd probably want is to use a regex with the text option to make sure the element contains only that content
find('*', text: /\A#{Regexp.escape(material.attachment_filename)}\z/)
which should be interpreted as
find('*', text: /\A01\. pain killer\.mp3\z/)
Note: That is going to be pretty slow if your page has anything more than simple content on it because it means transferring all the elements from selenium to capybara to check the text content.
A more performant solution would be to use XPath which has support for finding elements by text content (CSS does not)
find(:xpath, XPath.descendant[XPath.string.n.is(material.attachment_filename)]) #using the XPath library - contains (assuming Capybara.exact == false)
find(:xpath, XPath.descendant[XPath.string.n.is(material.attachment_filename)], exact: true) #using the XPath library - equals (you could also pass exact:false to force contains)
If the text won't contain XPath special characters (ex. apostrophes) that need escaping, you can use a string to define the XPath:
find(:xpath, ".//*[contains(., '#{material.attachment_filename}')]") #contains the text
find(:xpath, ".//*[text()='#{material.attachment_filename}']") #equals the text
If the element is actually a link you're looking for though then you would probably want to use
find_link("01. pain killer.mp3")
or
find(:link, "01. pain killer.mp3")