Find string that is not between specific html tag - regex-lookarounds

I'm being required to use regular expressions to parse HTML. I do realize regular expressions are bad for HTML matching.
I would like to find a specific string and evaluate whether or not it's between two strings.
In this example ® must be immediately between <sup> and </sup>
Example:
<sup>®</sup>
I believe this would involve using negative lookaheads and lookbehinds. My first thought would be:
(?<!<sup>)®(?!<\/sup>)
Unfortunately this fails as I don't believe you can do a lookahead and lookbehind in this combination.
Just using the negative-lookahead does work and is probably good enough for my purposes...
®(?!<\/sup>)
...but I'd like to know if it's possible to combine a lookahead and lookbehind in this way. Or is there another technique I should be using?
Thanks in advance

Your initial regex (i.e. (?<!<sup>)®(?!<\/sup>)) is correct, as demonstrated in the example usage at https://www.debuggex.com/r/WyY9y0Zq2Krz_3Xm
However, it works in Python and PCRE, but not in JavaScript on that page (you can check by choosing each of them in the dropdown). JavaScript had no lookbehind support until ES2018 added it, so older engines reject the pattern.
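A minimal Python sketch of the combined pattern, assuming Python's built-in re module (which supports fixed-width lookbehind). The sample string and the helper name unwrapped_positions are made up for illustration:

```python
import re

# Matches a ® that is neither preceded by <sup> nor followed by </sup>.
UNWRAPPED = re.compile(r"(?<!<sup>)®(?!</sup>)")

def unwrapped_positions(text):
    """Return the index of every ® the pattern considers unwrapped."""
    return [m.start() for m in UNWRAPPED.finditer(text)]

sample = "Acme® and Acme<sup>®</sup>"
print(unwrapped_positions(sample))  # only the first ® is reported
```

Note that both lookarounds must succeed, so a ® that has only one of the two tags next to it (for example "<sup>® x") is not reported either.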

How to write xpath for following example?

For example, I have div tag that has two attributes.
class='hello#123' text='321#he#321llo#321'
<div> class='hello#123' text='321#he#321llo#321'></div>
Here, I want to write xpath for both class and text attributes but numbers may change dynamically. ie., "hello#123" may become "345" when we reload. "321#he#321llo#321" may become "567#he#456llo#321".
Note: Need to write xpath in single line not separately.
Assuming that you have the (corrected) two-attribute-HTML
<div class='hello#123' text='321#he#321llo#321'>...</div>
you can select it using the following, for example:
Using the contains() function
//div[contains(@class,'hello') and contains(@text,'#he#')]
This is quite specific and only applicable if the "hello" is always split in the same way
Using the translate() function to mask everything except the chars for "hello"
//div[translate(@class,'#0123456789','')='hello' and translate(@text,'#0123456789','')='hello']
This removes all # chars and digits and checks if the remaining string is "hello"
I guess combining these two approaches you will be able to create your own XPath expression fitting your needs. The patterns you provided were not fully clear, so this may only approach a good enough solution.
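To see what the translate() predicate computes, here is a rough Python sketch. It mirrors the check in plain Python rather than running the XPath itself, because the standard library's ElementTree does not support XPath functions such as contains() or translate(). The sample document and the helper name matches are hypothetical:

```python
import xml.etree.ElementTree as ET

# Equivalent of translate(value, '#0123456789', ''): delete '#' and all digits.
STRIP = str.maketrans("", "", "#0123456789")

def matches(div):
    """True if both attributes reduce to 'hello' once '#' and digits are removed."""
    cls = div.get("class", "")
    txt = div.get("text", "")
    return cls.translate(STRIP) == "hello" and txt.translate(STRIP) == "hello"

doc = ET.fromstring(
    "<root>"
    "<div class='hello#123' text='321#he#321llo#321'/>"
    "<div class='hello#777' text='567#he#456llo#321'/>"
    "<div class='other#1' text='9#x#9'/>"
    "</root>"
)
hits = [d.get("class") for d in doc.iter("div") if matches(d)]
print(hits)
```

The first two div elements pass even though their numbers differ, while the third is rejected.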

JMeter - get value from href

I am load testing an application that has a link that looks like this:
https://example.com/myapp/table?qid=1434e99d-5b7c-4e74-b64e-c24e9564514d&rsid=5c94ddc7-e2e4-4e69-8547-49572486f4d1
I need to get the dynamic value of the rsid so I can use it later in my script.
So far I have tried using the regex extractor and I am probably doing it wrong.
I have tried things like:
name = myvar
regular expression = rsid=(.*?) # didnt work
regular expression = <a href=".*?rsid=(.*?)"> # didnt work
Template = $1$
I have one extractor set up to get the csrf value and that one works as expected but that is also because the csrf value is in the page source.
The above link is NOT in the page source as far as I can see, but it DOES show up when I inspect the link. I don't know if that is obfuscation or something else.
How can I extract the value of the rsid? Is the regular expression extractor the right one to use for this?
Should I be using something else?
Is it just a formula issue?
Thanks in advance.
Try something like:
rsid=([0-9A-Fa-f\-]{36})
The regular expression above should match a GUID-like structure, and your rsid seems to be an instance of one. The parentheses form the capture group that the $1$ template refers to.
Also be aware of the Boundary Extractor, it's sufficient to specify "left" and "right" boundaries and it will extract everything in-between. In general coming up with "boundaries" is much easier than creating a regular expression, it's more readable and JMeter processes the Boundary Extractors much faster. More information: The Boundary Extractor vs. the Regular Expression Extractor in JMeter
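To sanity-check the GUID pattern outside JMeter, a short Python sketch like the following could be used. It runs the pattern, with a capture group around the GUID, against the URL from the question. The variable names are illustrative only:

```python
import re

url = ("https://example.com/myapp/table"
       "?qid=1434e99d-5b7c-4e74-b64e-c24e9564514d"
       "&rsid=5c94ddc7-e2e4-4e69-8547-49572486f4d1")

# Group 1 holds the 36-character GUID that follows "rsid=".
RSID = re.compile(r"rsid=([0-9A-Fa-f\-]{36})")

match = RSID.search(url)
rsid = match.group(1) if match else None
print(rsid)
```

This only confirms the expression itself. Whether the extractor finds anything in JMeter still depends on the link actually being present in the response the extractor is attached to.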

Usage of Regular Expression Extractor JMeter?

Using Regular Extractor in JMeter, I need to get the value of "fullBkupUNIXTime" from the below response,
{"fullBackupTimeString":["Mon 10 Apr 2017 14:14:36"],"fullBkupUNIXTime":["1491833676"],"fullBackupDirName":["10_04_2017_0636"]}
I tried with Ref Name as time and
Regular Expression: "fullBkupUNIXTime": "([0-9])" and "(.+?)"
and pass them as input for 2nd request ${time}
Neither of the two expressions above works for me.
Please help me out with this.
First of all: why not just use this thing?
Then, if you are set on going the regular-expression route:
The first expression is not going to work because you've defined it to match exactly one [0-9] character.
Add the appropriate repetition quantifier, like "fullBkupUNIXTime": "([0-9]+)".
It also makes sense to tell the engine to stop at the shortest match: "fullBkupUNIXTime": "([0-9]+?)"
Next, make sure you handle any whitespace around the colon between key and value properly. It is better to mark it explicitly, if present, with \s
And last but not least: make sure you handle multiple lines properly (if appropriate, of course) by adding the (?m) modifier to your expression.
Or use (?im) to make it case-insensitive as well.
[ is a reserved character in regex, so you need to escape it. In your case use:
Regular Expression: fullBkupUNIXTime":\["(\d+)
Template: $1$
Match No.: 1
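As a quick check outside JMeter, this Python sketch runs both the original attempt and the escaped-bracket expression against the response from the question. The variable names are illustrative:

```python
import re

response = ('{"fullBackupTimeString":["Mon 10 Apr 2017 14:14:36"],'
            '"fullBkupUNIXTime":["1491833676"],'
            '"fullBackupDirName":["10_04_2017_0636"]}')

# Original attempt: expects a space and a bare quoted single digit, so it finds nothing.
original = re.search(r'"fullBkupUNIXTime": "([0-9])"', response)

# Escaped "[" plus a repeated digit class captures the whole timestamp.
fixed = re.search(r'fullBkupUNIXTime":\["(\d+)', response)
unix_time = fixed.group(1) if fixed else None
print(original, unix_time)
```

The original expression fails for two reasons visible here: the value is wrapped in [ ] with no space after the colon, and [0-9] without a quantifier matches only one digit.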

IEnumString searching substrings - possible?

I've implemented auto completion to a combobox like this article shows. Is it possible to make it search for substrings instead of just the beginning of the words?
http://www.codeproject.com/Articles/2371/IAutoComplete-and-custom-IEnumString-implementatio
I haven't found any way to customize how IEnumString/IAutoComplete compares the strings. Is it possible?
The built-in search options help a bit, but the behavior is chaotic. To find in-string matches you need to set the flag AcoWordFilter, but this prevents numbers from being matched! There is a trick to get numbers to match, though: precede them with a double quote, as in "3, to find a string containing or starting with 3. More chaos: with AcoWordFilter you also need to prefix other characters that are not considered part of a "word". For example, you need to prefix parentheses with a ", but then you will not find parentheses at the first position!
So the solution is either to create your own implementation of IAutoComplete or offer the user to switch between the modes (a bit awkward).
I don't think the MS engineers are especially proud of such chaos. How about one more option: AcoSearchAnywhere?
After retrieving the Edit control's IAutoComplete interface, query it for an IAutoComplete2 interface. Calling its SetOptions member you can disable prefix filtering by specifying the ACO_NOPREFIXFILTERING AUTOCOMPLETEOPTIONS.
This is available on Windows Vista and later. If you need a solution that works with pre-Vista versions, you'll have to write your own.

Change Url using Regex

I have url, for example:
http://i.myhost.com/myimage.jpg
I want to change this url to
http://i.myhost.com/myimageD.jpg.
(Add D after image name and before point)
i.e I want add some words after image name and before point using regex.
What is the best way do it using regex?
Try using ^(.*)\.([a-zA-Z]{3,5}) and replacing with \1D.\2 (the dot has to be put back because the pattern matches it outside both groups). I'm assuming the extension is 3-5 letters, but you can modify it to suit. E.g. if it's just jpg images then you can put that instead of the [a-zA-Z]{3,5}.
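A small Python sketch of this substitution, for illustration only (the question appears to target NSRegularExpression, whose replacement syntax differs). The function name add_suffix is hypothetical, and the replacement re-inserts the dot that sits between the two groups:

```python
import re

# Group 1: everything before the last dot; group 2: a 3-5 letter extension.
PATTERN = re.compile(r"^(.*)\.([a-zA-Z]{3,5})$")

def add_suffix(url, suffix="D"):
    """Insert suffix between the file name and its extension."""
    # \g<1> is used instead of \1 so the group reference stays unambiguous.
    return PATTERN.sub(r"\g<1>" + suffix + r".\g<2>", url)

result = add_suffix("http://i.myhost.com/myimage.jpg")
print(result)
```

A URL without a matching extension is returned unchanged, since re.sub leaves non-matching input alone.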
This sounds like a homework question, given that the solution must use a regex. On that assumption, here is an outline to get you going.
If all you have is a URL then @mathematical.coffee's solution will suit. However, if you have a chunk of text containing one or more URLs and you have to locate and change just those, then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see @mathematical.coffee's solution for an example.
The best way to learn regexes is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.
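The outline above could be sketched in Python roughly as follows. The sub-patterns are deliberately simplified assumptions (only http, https and ftp, plain host names, a 3-5 letter extension) and are not a complete URL grammar. The sample text is made up:

```python
import re

PROTOCOL = r"(?:https?|ftp)://"
ADDRESS = r"[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+"   # at least one dot
DIRS = r"(?:/[^/\s]+)*"                          # zero or more "directories"
NAME = r"/[^/\s.]+"                              # final item, before its dot
EXT = r"[A-Za-z]{3,5}"

# Group 1: URL up to the file name; group 2: the extension.
URL = re.compile(r"(" + PROTOCOL + ADDRESS + DIRS + NAME + r")\.(" + EXT + r")\b")

def tag_urls(text, suffix="D"):
    """Append suffix to the file name of every URL found in text."""
    return URL.sub(r"\g<1>" + suffix + r".\g<2>", text)

text = "See http://i.myhost.com/myimage.jpg and ftp://files.example.org/a/b/photo.png today."
changed = tag_urls(text)
print(changed)
```

Building the expression from named parts keeps each piece easy to adjust, for instance to allow digits in the extension or to accept more protocols.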