Strip HTML from string in SSRS 2005 (VB.NET) - vb.net

my SSRS DataSet returns a field with HTML, e.g.
<b>blah blah </b><i> blah </i>.
how do i strip all the HTML tags? has to be done with inline VB.NET
Changing the data in the table is not an option.
Solution found ... = System.Text.RegularExpressions.Regex.Replace(StringWithHTMLtoStrip, "<[^>]+>","")

Thanx to Daniel, but I needed it to be done inline ... here's the solution:
= System.Text.RegularExpressions.Regex.Replace(StringWithHTMLtoStrip, "<[^>]+>","")
Here are the links:
http://weblogs.asp.net/rosherove/archive/2003/05/13/6963.aspx
http://msdn.microsoft.com/en-us/library/ms157328.aspx

Here's a good example using Regular Expressions: https://web.archive.org/web/20210619174622/https://www.4guysfromrolla.com/webtech/042501-1.shtml

If you know the HTML is well-formed enough, you could, if you make sure it has a root node, convert the data in that field into a System.Xml.XmlDocument and then get the InnerText value from it.
Again, you will have to make sure the text has a root node, which you can add yourself if needs be, since it will not matter, and make sure the HTML is well formed.

If you don't want to use regular expressions (for example if you need better performance) you could try a small method I wrote a while ago, posted at CodeProject.

I would go to Report Properties and then code and add the following
Dim mRemoveTagRegex AS NEW System.Text.RegularExpressions.Regex("<(.|\n)+?>", System.Text.RegularExpressions.RegexOptions.Compiled)
Function RemoveHtml(ByVal text As string) AS string
If text IsNot Nothing Then
Return mRemoveTagRegex.Replace(text, "")
End If
End Function
Then you can use Code.RemoveHtml(Fields!Content.Value) to remove the html tags.
In my opinion this is preferable then having multiple copies of the regex.

Related

How to write xpath for following example?

For example, I have div tag that has two attributes.
class='hello#123' text='321#he#321llo#321'
<div> class='hello#123' text='321#he#321llo#321'></div>
Here, I want to write xpath for both class and text attributes but numbers may change dynamically. ie., "hello#123" may become "345" when we reload. "321#he#321llo#321" may become "567#he#456llo#321".
Note: Need to write xpath in single line not separately.
Assuming that you have the (corrected) two-attribute-HTML
<div class='hello#123' text='321#he#321llo#321'>...</div>
you can select it using the following, for example:
Using the contains() function
//div[contains(#class,'hello') and contains(#text,'#he#')]
This is quite specific and only applicable if the "hello" is always split in the same way
Using the translate() function to mask everything except the chars for "hello"
//div[translate(#class,'#0123456789','')='hello' and translate(#text,'#0123456789','')='hello']
This removes all # chars and digits and checks if the remaining string is "hello"
I guess combining these two approaches you will be able to create your own XPath expression fitting your needs. The patterns you provided were not fully clear, so this may only approach a good enough solution.

JMeter - get value from href

I am load testing an application that has a link that looks like this:
https://example.com/myapp/table?qid=1434e99d-5b7c-4e74-b64e-c24e9564514d&rsid=5c94ddc7-e2e4-4e69-8547-49572486f4d1
I need to get the dynamic value of the rsid so I can use it later in my script.
So far I have tried using the regex extractor and I am probably doing it wrong.
I have tried things like:
name = myvar
regular expression = rsid=(.*?) # didnt work
regular expression = <a href=".*?rsid=(.*?)"> # didnt work
Template = $1$
I have one extractor set up to get the csrf value and that one works as expected but that is also because the csrf value is in the page source.
The above link is NOT in the page source as far as I can see but it DOES show up when I inspect the link. I dont know if that is obfuscation or something else?
How can I extract the value of the rsid? Is the regular expression extractor the right one to use for this?
Should I be using something else?
Is it just a formula issue?
Thanks in advance.
Try something like:
rsid=[0-9A-Fa-f\-]{36}
the above regular expression should match a GUID-like structure and your rsid seems to be an instance of it.
Demo:
Also be aware of the Boundary Extractor, it's sufficient to specify "left" and "right" boundaries and it will extract everything in-between. In general coming up with "boundaries" is much easier than creating a regular expression, it's more readable and JMeter processes the Boundary Extractors much faster. More information: The Boundary Extractor vs. the Regular Expression Extractor in JMeter

Dynamic, Nested Replace

I'm using SQL Server 2008 and need to strip out quite a bit of data within a string. Because of the nature and variability of the string, I think I'm needing to use multiple, nested REPLACE commands. The problem is each REPLACE needs to build on the previous one. Here is a sample of what I'm looking at:
<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Essentially, I need it to return just the text outside of the <> brackets so for this example it would be:
Treatment by test.
Also, I wanted to mention that the strings inside the <> brackets can vary quite a bit for each row both by content and length, but it isn't relevant for what I'm needing other than making it more complex for replacing.
Here is what I've tried:
REPLACE(note,substring(note,patindex('<%>',note),CHARINDEX('>',note) - CHARINDEX('<',note) + 1),'')
And it returns:
<Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>
Somehow I need to keep going with replacing each of the <> brackets but don't know how to proceed. Any help or guidance would be greatly appreciated!!!
Depending on how you have that string holding the HTML fragment available you could try to use something like:
SELECT convert(xml, '<Paragraph><Replacement Id="40B"><Le><Run Foreground="#FFFF0000">Treatment by </Run></Le><Op isFreeText="True"><Run Foreground="#FFFF0000">test</Run></Op><Tr><Run Foreground="#FFFF0000">. </Run></Tr></Replacement></Paragraph>').value('/', 'varchar(255)') as stripped
You convert it to XML and then use the built in xml parser function "value".

Split and find specific text?

ok so i've made a HTTPWEBREQUEST and i've made the source of the result show in a richtextbox, Now say i have this in the richtextbox
<p>Short URL: <code>http://URL.me/u/eywnp</code></p>
How would i go about just getting the "http://URL.me/u/eywnp" ive tried split but didnt work, guess i'm doing it wrong?
NOTE the URL will be different everytime
Split isn’t the right tool for the job. It will result in a rather complex piece of code that’s quite brittle (meaning it will break as soon as there’s the slightest change in the input).
For a robust, well-written solution you need to parse the HTML properly. Luckily there exist canned solutions for that: The HtmlAgilityPack library.
Dim doc As New HtmlDocument()
doc.LoadHtml(yourCode)
Dim result = doc.DocumentElement.SelectNodes("//a[#href]")(0)("href")
The only complicated part here is the string "//a[#href]". This is an XPath string. XPath strings are a mini-language that is used to address elements in an HTML or XML document. They are conceptually similar to file paths (like C:\Users\foo\Documents\file.txt) but with a slightly different syntax.
The XPath simply selects all the <a> elements having a href attribute from your document. Then you can grab the first of that collection and retrieve the href attribute’s value.
Thanks for all your help, i did find a solution and i used
Dim iStartIndex, iEndIndex As Integer
With RichTextBox1.Text
iStartIndex = .IndexOf("<p>Short URL: <code><a href=") + 29
iEndIndex = .IndexOf(""">", iStartIndex)
Clipboard.SetText(.Substring(iStartIndex, iEndIndex - iStartIndex))
End With
works perfect so far

Change Url using Regex

I have url, for example:
http://i.myhost.com/myimage.jpg
I want to change this url to
http://i.myhost.com/myimageD.jpg.
(Add D after image name and before point)
i.e I want add some words after image name and before point using regex.
What is the best way do it using regex?
Try using ^(.*)\.([a-zA-Z]{3,5}) and replacing with \1D\2. I'm assuming the extension is 3-5 alphanumeric numbers but you can modify it to suit. E.g. if it's just jpg images then you can put that instead of the [a-zA-Z]{3,5}.
Sounds like a homework question given the solution must use a regex, on that assumption here is an outline to get you going.
If all you have is a URL then #mathematical.coffee's solution will suit. However if you have a chunk of text within which is one or more URLs and you have to locate and change just those then you'll need something a little more involved.
Look at the structure of a URL: {protocol}{address}{item}; where
{protocol} is "http://", "ftp://" etc.;
{address} is a name, e.g. "www.google.com", or a number, e.g. "74.125.237.116" - there will always be at least one dot in the address; and
{item} is "/name" where name is quite flexible - there will be zero or more items, you can think of them as directories and a file but this isn't strictly true. Also the sequence of items can end in a "/" (including when there are zero of them).
To make a regex which matches a URL start by matching each part. In the case of the items you'll want to match the last in the sequence separately - you'll have zero or more "directories" and one "file", the latter must be of the form "name.extension".
Once you have regexes for each part you just concatenate them to produce a regex for the whole. To form the replacement pattern you can surround parts of your regex with parentheses and refer to those parts using \number in the replacement string - see #mathematical.coffee's solution for an example.
The best way to learn regexs is to use an editor which supports them and just experiment. The exact syntax may not be the same as NSRegularExpression but they are mostly pretty similar for the basic stuff and you can translate from one to another easily.