Split and find specific text? - vb.net

OK, so I've made an HttpWebRequest and I'm showing the source of the response in a RichTextBox. Now say I have this in the RichTextBox:
<p>Short URL: <code>http://URL.me/u/eywnp</code></p>
How would I go about getting just "http://URL.me/u/eywnp"? I've tried Split but it didn't work; I guess I'm doing it wrong?
Note: the URL will be different every time.

Split isn’t the right tool for the job. It will result in a rather complex piece of code that’s quite brittle (meaning it will break as soon as there’s the slightest change in the input).
For a robust, well-written solution you need to parse the HTML properly. Luckily, there's a ready-made solution for that: the HtmlAgilityPack library.
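Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(yourCode)
' SelectNodes returns Nothing if no <a> element has an href attribute
Dim links = doc.DocumentNode.SelectNodes("//a[@href]")
Dim result = If(links IsNot Nothing, links(0).GetAttributeValue("href", ""), Nothing)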
The only complicated part here is the string "//a[@href]". This is an XPath string. XPath strings are a mini-language used to address elements in an HTML or XML document. They are conceptually similar to file paths (like C:\Users\foo\Documents\file.txt) but with a slightly different syntax.
The XPath simply selects all <a> elements in your document that have an href attribute. You then take the first element of that collection and read its href attribute's value.
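For comparison, here is a minimal sketch of the same parse-instead-of-split idea using only Python's standard library: html.parser collects the href of every <a> element, which is roughly what //a[@href] selects. It assumes the raw page source contains an anchor inside the <code> element (as the poster's own IndexOf solution suggests); HrefCollector is a hypothetical helper name.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect href values of <a> elements, roughly what //a[@href] selects."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lower-cased
        if tag == "a":
            for name, value in attrs:
                # A bare "href" with no value is reported as None; skip it
                if name == "href" and value is not None:
                    self.hrefs.append(value)


source = '<p>Short URL: <code><a href="http://URL.me/u/eywnp">http://URL.me/u/eywnp</a></code></p>'
collector = HrefCollector()
collector.feed(source)
collector.close()
print(collector.hrefs[0])  # http://URL.me/u/eywnp
```

Because the parser only cares about tags and attributes, extra whitespace, attribute order, or a different URL does not break it the way a fixed character offset would.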

Thanks for all your help. I did find a solution; I used:
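Dim iStartIndex, iEndIndex As Integer
' Text that immediately precedes the URL, including the opening quote of href
Const marker As String = "<p>Short URL: <code><a href="""
With RichTextBox1.Text
    ' Skip past the marker (29 characters)
    iStartIndex = .IndexOf(marker) + marker.Length
    ' The URL ends at the closing quote just before ">"
    iEndIndex = .IndexOf(""">", iStartIndex)
    Clipboard.SetText(.Substring(iStartIndex, iEndIndex - iStartIndex))
End With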
Works perfectly so far.
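The IndexOf/Substring approach can also be sketched in Python. One thing worth guarding against: when the start marker is missing, IndexOf returns -1, and -1 + 29 silently points into the middle of the text. The hypothetical between helper below returns None in that case instead.

```python
def between(text, start_marker, end_marker):
    """Return the text between start_marker and the next end_marker, or None."""
    start = text.find(start_marker)
    if start == -1:
        return None  # marker not found: don't slice from a bogus offset
    start += len(start_marker)  # skip past the marker itself
    end = text.find(end_marker, start)  # search only after the start
    if end == -1:
        return None
    return text[start:end]


page = '<p>Short URL: <code><a href="http://URL.me/u/eywnp">http://URL.me/u/eywnp</a></code></p>'
print(between(page, '<p>Short URL: <code><a href="', '">'))  # http://URL.me/u/eywnp
```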

Related

Using XPath in a For Loop

I'm trying to solve a simple problem in Excel VBA that I can't figure out.
I have a For loop and an XPath that matches 24 elements, and I want to get the text from each element, but when I run it, it tells me the XPath is wrong. I understand that I put i inside the XPath string, and that's what's wrong. I've tried different approaches, like:
Name.Add findApp.FindElementByXPath("(//span[@class='offer-item-title'])["&"i]").Text
but nothing seems to work. Can someone help me solve this? Thank you so much :)
Code:
For i = 0 To 23
    Name.Add findApp.FindElementByXPath("(//span[@class='offer-item-title'])[i]").Text
Next i
XPath doesn't know that i is the name of a VB variable, it thinks it is the name of an element in your source document.
You can construct an expression like this:
FindElementByXPath("(//span[@class='offer-item-title'])[" & i & "]")
Even better (though I don't know whether VBA offers this capability) is to pass a parameter into the XPath expression. Ideally you want to compile the XPath expression only once rather than 24 times, because compiling it typically takes many times longer than executing it.
But for this particular example, it would be better to construct an expression that reads everything you want in one go, rather than making 24 separate calls. Incidentally, XPath indexing starts at 1, so the call with i = 0 will select nothing (the loop should run from 1 to 24). Given that this is XPath 1.0, you can write:
(//span[@class='offer-item-title'])[position() <= 24]
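A Python sketch of both fixes, since the reasoning above is language-neutral: the loop variable's value is concatenated into the string, the index runs from 1, and a single expression covers the first 24 matches. Python's built-in ElementTree only supports a subset of XPath (no position()), so the one-call version there uses findall plus a slice; the sample markup is made up for the demo.

```python
import xml.etree.ElementTree as ET

BASE = "//span[@class='offer-item-title']"


def nth_title_xpath(i):
    # Concatenate the loop variable's *value* into the expression
    return f"({BASE})[{i}]"


# One expression per item: XPath positions start at 1, so i runs 1..24
per_item = [nth_title_xpath(i) for i in range(1, 25)]

# Or a single XPath 1.0 expression for the first 24 matches
first_24 = f"({BASE})[position() <= 24]"

# ElementTree can't evaluate position(), but findall with an attribute
# predicate plus a Python slice returns the same items in one call.
spans = "".join(f"<span class='offer-item-title'>Item {n}</span>" for n in range(1, 31))
doc = ET.fromstring("<div>" + spans + "</div>")
titles = [el.text for el in doc.findall(".//span[@class='offer-item-title']")[:24]]
print(per_item[0])  # (//span[@class='offer-item-title'])[1]
```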

unable to perform search on custom_field(JIRA-Python)

I get the following error when I search on a custom field:
{"errorMessages":["Field 'customfield_10029' does not exist or you do not have permission to view it."],"warningMessages":[]}
But I have sufficient permissions (Admin) to access that field, and I have also made the field visible.
URL = 'https://xyz.atlassian.net/rest/api/2/search?jql=status="In+Progress"+and+customfield_10029=125&fields=id,key,status'
Custom fields in JQL searches are referenced using the abbreviation 'cf' followed by their ID inside square brackets '[id]', so your URL would be:
URL = 'https://xyz.atlassian.net/rest/api/2/search?jql=status="In+Progress"+and+cf[10029]=125&fields=id,key,status'
Make sure the square brackets are properly URL-encoded (as %5B and %5D) using your language's URL-encoding function.
PS: Generally speaking, it's much easier to reference custom fields in JQL searches by their names rather than their IDs. It makes the search URL easier to read and makes it clearer what is being searched for.
I get a 400 response code with the custom-field syntax:
https://domain/rest/api/2/search?maxResults=500&jql=cf[10025]='xxxxxxxxxd'&fields=id,key,issuetype,status,customfield_10025
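A minimal Python sketch of letting the standard library do the encoding instead of building the query string by hand. It only builds the URL (no request is sent, and authentication is left out); the xyz.atlassian.net host and field ID are the placeholders from the question, and build_search_url is a hypothetical helper. Unencoded brackets or quotes are one thing worth ruling out when a hand-built URL like the one above returns 400.

```python
from typing import Sequence
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "https://xyz.atlassian.net/rest/api/2/search"


def build_search_url(jql: str, fields: Sequence[str]) -> str:
    # urlencode percent-encodes reserved characters, including [ ] " and =
    query = urlencode({"jql": jql, "fields": ",".join(fields)})
    return f"{BASE}?{query}"


url = build_search_url('status = "In Progress" AND cf[10029] = 125', ["id", "key", "status"])
print(url)
# Decoding the query string gives back the original JQL unchanged
print(parse_qs(urlparse(url).query)["jql"][0])
```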

Beautiful Soup find string and then the nth element down

In Beautiful Soup is it possible to search for a text string and then from there find the nth element down?
Update
I am trying to target the following code field to grab its text. I tried soup.find and soup.findAll, but I have other pages I want to target that are slightly different, so I need something really robust.
My plan:
1. Go to the page.
2. Find the string "Model name".
3. Find the nth element down, in this case the next anchor tag.
My code to find the string:
model_name = soup.find(string='Model name')
print(model_name)
OK, I got it; it's actually really simple. The solution gets a little messy, but it works.
All you need is the .next attribute, so the code looks like this:
model_name = soup.find(string='Model name').next.text
Chain as many .next attributes as needed until you reach the target.
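Beautiful Soup also has find_next(), so soup.find(string='Model name').find_next('a') jumps straight to the next anchor without counting .next hops, which may hold up better on pages that differ slightly. To illustrate the same "find the string, then walk forward to the nth tag" idea without a third-party dependency, here is a sketch using the standard library's html.parser instead; NthTagAfterText and the sample HTML are hypothetical.

```python
from html.parser import HTMLParser


class NthTagAfterText(HTMLParser):
    """Capture the text of the nth <target_tag> that starts after marker_text."""

    def __init__(self, marker_text, target_tag, nth=1):
        super().__init__()
        self.marker_text = marker_text
        self.target_tag = target_tag
        self.nth = nth
        self.seen_marker = False
        self.count = 0
        self.capturing = False
        self.result = None
        self._buf = []

    def handle_starttag(self, tag, attrs):
        # Only count target tags once the marker text has been seen
        if self.seen_marker and self.result is None and tag == self.target_tag:
            self.count += 1
            if self.count == self.nth:
                self.capturing = True

    def handle_endtag(self, tag):
        if self.capturing and tag == self.target_tag:
            self.capturing = False
            self.result = "".join(self._buf).strip()

    def handle_data(self, data):
        if self.capturing:
            self._buf.append(data)
        elif not self.seen_marker and data.strip() == self.marker_text:
            # Surrounding whitespace is ignored when matching the marker
            self.seen_marker = True


html = "<tr><td>Model name</td><td><a href='/m/1'>XJ-9</a></td></tr>"
p = NthTagAfterText("Model name", "a")
p.feed(html)
p.close()
print(p.result)  # XJ-9
```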

Getting the RIGHT word count of a PDF file

The response in this topic helped me understand why my PDF sometimes fails to find a word and why I keep getting different word counts from different PDF word-count programs. I decided to use xpdf: I converted the PDF to text with the -layout option, opened the resulting text file in Word 2003, and noted the word count. Then, unfortunately, I decided to remove the -layout option, and this time the word count was different.
Why did that option affect the word count? Is there an accurate way to find the word count of a PDF file? I would even pay for such software, as long as it gives me the right number of words.
(I checked another topic but thought I'd find out if the solution I just offered would solve everything. There was another topic where advancedpdf was recommended.)
I'd argue that there is no reliable way to count words. One could, for example, just to make your life harder, put each character of this lovely Stack Overflow answer into its own text object and position those objects so that they form a meaningful paragraph only when rendered for a human reader. Like this:
<html><body><style>
div {float: left;}
</style><div><p>S</p></div><div><p>t</p></div><div><p>a</p></div>
<div><p>c</p></div><div><p>k</p></div>
</body></html>
I would suggest an open-source solution using Java. First, parse the PDF file and extract all the text using Apache Tika.
Then I believe you can simply scan the extracted text and count the words.
Sample code would look like this:
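// Assumes f is the .txt file holding the extracted text and that the enclosing
// method declares "throws IOException" (needs java.io.* imports)
if (f.getName().endsWith(".txt"))
{
    StringBuilder sb = new StringBuilder();
    try (BufferedReader in = new BufferedReader(new FileReader(f))) {
        String s;
        while ((s = in.readLine()) != null)
            sb.append(s).append(' '); // keep a separator so words at line breaks don't merge
    }
    // Drop punctuation, trim, then split into individual terms
    String[] tokenizedTerms = sb.toString().replaceAll("[\\W&&[^\\s]]", "").trim().split("\\W+");
    int wordCount = tokenizedTerms[0].isEmpty() ? 0 : tokenizedTerms.length;
}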
The tokenizedTerms array then holds all the terms (words) of the document, and you can count them with tokenizedTerms.length (a field on arrays, not a method). Hope this was useful. :-)
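A Python sketch of the same tokenize-and-count step, assuming the text has already been extracted (by Tika, pdftotext, or anything else). It also shows one way two tools can disagree: if lines are joined without a separator, the last word of one line fuses with the first word of the next, so the count drops. count_words is a hypothetical helper; note that Python's \w is Unicode-aware by default, unlike Java's, so accented or non-Latin words are kept.

```python
import re


def count_words(text):
    # Drop punctuation: characters that are neither word characters nor whitespace
    cleaned = re.sub(r"[^\w\s]", "", text)
    # str.split() with no argument splits on whitespace runs and ignores leading/trailing blanks
    return len(cleaned.split())


lines = ["The quick brown", "fox can't jump."]
print(count_words(" ".join(lines)))  # 6
print(count_words("".join(lines)))   # 5: "brown" and "fox" fuse into one token
```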

Strip HTML from string in SSRS 2005 (VB.NET)

My SSRS dataset returns a field containing HTML, e.g.
<b>blah blah </b><i> blah </i>.
How do I strip all the HTML tags? It has to be done with inline VB.NET.
Changing the data in the table is not an option.
Thanks to Daniel, but I needed it done inline. Here's the solution:
= System.Text.RegularExpressions.Regex.Replace(StringWithHTMLtoStrip, "<[^>]+>","")
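The same one-line replacement can be tried outside SSRS in Python. The pattern <[^>]+> matches a "<", one or more characters that are not ">", and the closing ">"; because a negated character class also matches newlines, it handles tags that span lines without needing a (.|\n) alternation. The regex only removes tags, so this sketch also decodes entities such as &amp; with html.unescape; strip_tags is a hypothetical helper, and a literal ">" inside an attribute value would still cut a tag short.

```python
import html
import re

# Compiled once at module level and reused by every call
TAG = re.compile(r"<[^>]+>")


def strip_tags(s):
    # Remove every tag, then decode entities such as &amp; -> &
    return html.unescape(TAG.sub("", s))


print(strip_tags("<b>blah blah </b><i> blah </i>."))
```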
Here are the links:
http://weblogs.asp.net/rosherove/archive/2003/05/13/6963.aspx
http://msdn.microsoft.com/en-us/library/ms157328.aspx
Here's a good example using Regular Expressions: https://web.archive.org/web/20210619174622/https://www.4guysfromrolla.com/webtech/042501-1.shtml
If you know the HTML is well-formed, you could load the field's data into a System.Xml.XmlDocument and read its InnerText value.
The fragment must have a single root node; if it doesn't, you can wrap it in one yourself, since the extra element doesn't affect the text. Again, this only works if the HTML is well-formed.
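A Python analogue of the XmlDocument/InnerText approach, using ElementTree in place of System.Xml: wrap the fragment in a root element, parse it, and concatenate all text nodes. inner_text is a hypothetical helper; it raises ET.ParseError when the HTML isn't well-formed XML (for example an unclosed <br> or an &nbsp; entity), which is the limitation described above.

```python
import xml.etree.ElementTree as ET


def inner_text(fragment):
    # Wrap the fragment in a root element so it parses as a single document
    root = ET.fromstring(f"<root>{fragment}</root>")
    # itertext() yields every text node (including tails) in document order, like InnerText
    return "".join(root.itertext())


print(inner_text("<b>blah blah </b><i> blah </i>."))
```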
If you don't want to use regular expressions (for example if you need better performance) you could try a small method I wrote a while ago, posted at CodeProject.
I would go to Report Properties > Code and add the following:
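' Compiled once and shared by every call to RemoveHtml
Dim mRemoveTagRegex As New System.Text.RegularExpressions.Regex("<(.|\n)+?>", System.Text.RegularExpressions.RegexOptions.Compiled)
Function RemoveHtml(ByVal text As String) As String
    ' Pass Nothing (e.g. a NULL field) through unchanged
    If text Is Nothing Then Return Nothing
    ' Replace every tag with an empty string
    Return mRemoveTagRegex.Replace(text, "")
End Function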
Then you can use Code.RemoveHtml(Fields!Content.Value) to remove the html tags.
In my opinion this is preferable to having multiple copies of the regex.