VB.NET: get all attribute values using HtmlAgilityPack

This is the HTML:
<div id="catlist-listview" class="cat-listview cat-listbsize">
<ul>
<li>title1</li>
<li>title2</li>
<li>title3</li>
<li>title4</li>
<li>title5</li>
<li>title6</li>
<li>title7</li>
<li>title8</li>
<li>title9</li>
<li>title10</li>
</ul>
</div>
And my code is:
Dim htmldoc As New HtmlDocument
htmldoc.LoadHtml(source)
For Each link As HtmlNode In htmldoc.DocumentNode.SelectNodes("//*[@id='catlist-listview']/ul")
    textbox3.Text = link.InnerHtml
Next
The output is:
<li>title1</li>
<li>title2</li>
<li>title3</li>
<li>title4</li>
<li>title5</li>
<li>title6</li>
<li>title7</li>
<li>title8</li>
<li>title9</li>
<li>title10</li>
I want to get all, and only, the links http://wantedlink1 to http://wantedlink10.
I tried Attributes("href"), but I get only one link.
I want to list all the links like this:
http://wantedlink1
http://wantedlink2
http://wantedlink3
.
.
.
http://wantedlink10
Any help?

Change the XPath passed to SelectNodes() so that it selects the individual <a> elements instead of the <ul>. From there it is easy to iterate through the result and read each href attribute one by one. You can achieve the same with LINQ, for example:
'select <a> elements
Dim links = htmldoc.DocumentNode.SelectNodes("//*[@id='catlist-listview']/ul/li/a")
'project to an IEnumerable of href attribute values
Dim hrefs = links.Cast(Of HtmlNode)().Select(Function(x) x.GetAttributeValue("href", ""))
'join the hrefs, separated by newlines, into one string
textbox3.Text = String.Join(Environment.NewLine, hrefs)
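The same idea (select the <a> elements directly, then read each href) can be sketched in Python with the standard library's ElementTree, which supports this subset of XPath. This is only an illustration, not HtmlAgilityPack: the markup below is a shortened, hypothetical version of the question's HTML with placeholder hrefs, and ElementTree needs well-formed XML rather than arbitrary HTML.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed stand-in for the page in the question.
source = """
<html><body>
<div id="catlist-listview" class="cat-listview cat-listbsize">
<ul>
<li><a href="http://wantedlink1">title1</a></li>
<li><a href="http://wantedlink2">title2</a></li>
<li><a href="http://wantedlink3">title3</a></li>
</ul>
</div>
</body></html>
""".strip()

root = ET.fromstring(source)

# Select the <a> elements themselves, not the enclosing <ul>.
anchors = root.findall(".//*[@id='catlist-listview']/ul/li/a")

# Project each element to its href attribute ("" if it is missing).
hrefs = [a.get("href", "") for a in anchors]

# Join the values with newlines, as the VB.NET answer does for the textbox.
text = "\n".join(hrefs)
print(text)
```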

Related

Searching for n-th child in n-th parent

There is a strange behaviour when it comes to finding elements by xpath. The situation:
<body>
...
<div class="ingredients-group">
<div class="group-header">
<h3>Title 1</h3>
... other stuff
</div>
</div>
<div class="ingredients-group">
<div class="group-header">
<h3>Title 2</h3>
... other stuff
</div>
</div>
...
I want to check the text of the H3 tag on the second ingredient-group. So I did the following in Selenium:
WebElement group2 = driver.findElement(By.xpath("//div[@class='ingredients-group'][2]"));
WebElement title2 = group2.findElement(By.xpath("//h3"));
String titleText = title2.getText();
The last statement returns "Title 1". I would expect it to return "Title 2".
Strangely, this statement returns "Title 2":
String titleText = driver.findElement(By.xpath("//div[@class='ingredients-group'][2]//h3")).getText();
I would like to use the first option (group2.findElement), because there are several other elements in the containers I would like to refer to without having to write the full xpath.
Any ideas on this?
Thanks
An XPath that starts with // is evaluated against the whole document, even when findElement is called on an element, so group2.findElement(By.xpath("//h3")) returns the first h3 on the page. Start the path with a dot (.//h3) to make it relative to the element. You can also use findElements to get a list of the groups and then access them as you would any other list:
List<WebElement> groups = driver.findElements(By.xpath("//div[@class='ingredients-group']"));
WebElement theH3Tag = groups.get(0).findElement(By.xpath(".//h3")); // relative XPath
String titleText = theH3Tag.getText();
WebElement the2ndH3Tag = groups.get(1).findElement(By.xpath(".//h3")); // relative XPath
String titleText2 = the2ndH3Tag.getText();
Or loop through the list:
List<WebElement> groups = driver.findElements(By.xpath("//div[@class='ingredients-group']"));
for (WebElement group : groups) {
    WebElement h3Tag = group.findElement(By.xpath(".//h3")); // relative XPath
    String titleText = h3Tag.getText();
}
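To see the effect of scoping a search to one container, here is a minimal sketch in Python using the standard library's ElementTree rather than Selenium. The markup is a trimmed, hypothetical copy of the question's page. In ElementTree a path starting with ".//" is relative to the element it is called on, which is the behaviour the relative XPath gives you in Selenium.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical version of the page from the question.
page = """
<body>
<div class="ingredients-group"><div class="group-header"><h3>Title 1</h3></div></div>
<div class="ingredients-group"><div class="group-header"><h3>Title 2</h3></div></div>
</body>
""".strip()

root = ET.fromstring(page)

# All groups, in document order.
groups = root.findall(".//div[@class='ingredients-group']")
group2 = groups[1]

# Relative search: only looks inside the second group.
title2 = group2.find(".//h3").text

# Searching from the document root finds the first h3 on the page instead.
first_in_document = root.find(".//h3").text

print(title2, "|", first_in_document)
```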

WebBrowser1 GetElement By name

I am using:
WebBrowser1.Document.GetElementById("xxxx").SetAttribute("value", "TESTE")
but I don't have the id, so I need to use the name.
Code:
<input class="InputElement is-complete Input" autocomplete="xxxx" autocorrect="off" spellcheck="false" name="xxx" inputmode="numeric" aria-label="xxxx" placeholder="xxxx" aria-placeholder="xxxx" aria-invalid="false" value="xxxx">
Because more than one element can have the same name attribute, getElementsByName will return a node list of all elements with that name.
let x = document.getElementsByName("xxx");
This will set the "value" attribute for all elements with name "xxx".
for (let i = 0; i < x.length; i++) {
    x[i].setAttribute("value", "TESTE");
}
Edit: Sorry I thought you were using javascript.
In VB.NET something like this should work:
Dim allElements As HtmlElementCollection = WebBrowser1.Document.All
For Each Elem As HtmlElement In allElements
    If Elem.GetAttribute("name") = "codigo-date" Then
        Elem.SetAttribute("value", "TESTE")
    End If
Next
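The loop above (walk every element, compare its name attribute, set value on the matches) can be sketched outside the WebBrowser control as well. This Python version uses the standard library's ElementTree on a small hypothetical form; the element names and values are placeholders, not taken from the real page.

```python
import xml.etree.ElementTree as ET

# Hypothetical form with two inputs; only the one named "xxx" should change.
form = ET.fromstring(
    '<form>'
    '<input name="xxx" value="xxxx"/>'
    '<input name="other" value="keep"/>'
    '</form>'
)

# Visit every element and update those whose name attribute matches.
for elem in form.iter():
    if elem.get("name") == "xxx":
        elem.set("value", "TESTE")

# Collect name -> value pairs to check the result.
values = {e.get("name"): e.get("value") for e in form.iter("input")}
print(values)
```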

Extracting data from div tag

I'm scraping data from a website, and it has some data in a div tag
like this:
<div class="search-result__title">\nDonald Duck <span>\xa0|\xa0</span>\n<span class="city state" data-city="city, TX;city, TX;city, TX;city, TX" data-state="TX">STATENAME, CITYNAME\n</span>\n</div>,
I want to scrape the "Donald Duck" part and the state and city name after rel="nofollow".
The site contains a lot of data, so the name and state are different for each entry.
The code that I have written is:
div = soup.find_all('div', {'class':'search-result__title'})
print (div.string)
This gives me an error:
"ResultSet object has no attribute '%s'. You're probably treating a list of items like a single item. Did you call find_all() when you meant to call find()?" % key
First, use .text. Second, find_all() returns a list of elements, so you need to index into it, e.g. print(div[0].text), or, since you will probably have more than one element, just iterate through them:
from bs4 import BeautifulSoup
html = '''<div class="search-result__title">\nDonald Duck <span>\xa0|\xa0</span>\n<span class="city state" data-city="city, TX;city, TX;city, TX;city, TX" data-state="TX">STATENAME, CITYNAME\n</span>\n</div>'''
soup = BeautifulSoup(html, 'html.parser')
div = soup.find_all('div', {'class':'search-result__title'})
print(div[0].text)
# or
for each in div:
    print(each.text)
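Printing .text gives the name and the location as one run of text. If the two need to be separated, one option is to collect the text that sits directly in the div apart from the text inside the span with class "city". The sketch below uses the standard library's html.parser instead of BeautifulSoup so that it has no dependencies; TitleParser and its attributes are hypothetical names introduced for this example, and it assumes the title divs are not nested.

```python
from html.parser import HTMLParser

html = '''<div class="search-result__title">\nDonald Duck <span>\xa0|\xa0</span>\n<span class="city state" data-city="city, TX;city, TX;city, TX;city, TX" data-state="TX">STATENAME, CITYNAME\n</span>\n</div>'''


class TitleParser(HTMLParser):
    """Collects (name, location) for each search-result__title div."""

    def __init__(self):
        super().__init__()
        self.results = []
        self._in_div = False
        self._in_city = False
        self._span_depth = 0
        self._name = []
        self._location = []

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "search-result__title" in classes:
            self._in_div = True
            self._name, self._location = [], []
        elif tag == "span" and self._in_div:
            self._span_depth += 1
            if "city" in classes:
                self._in_city = True

    def handle_endtag(self, tag):
        if tag == "span" and self._in_div:
            self._span_depth -= 1
            self._in_city = False
        elif tag == "div" and self._in_div:
            name = "".join(self._name).strip()
            location = "".join(self._location).strip()
            self.results.append((name, location))
            self._in_div = False

    def handle_data(self, data):
        if not self._in_div:
            return
        if self._in_city:
            self._location.append(data)   # text inside the city span
        elif self._span_depth == 0:
            self._name.append(data)       # text directly inside the div


parser = TitleParser()
parser.feed(html)
for name, location in parser.results:
    print(name, "|", location)
```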

compare the 'class' of container tag

Let's say I extract some classes from some HTML:
p_standards = soup.find_all("p",attrs={'class':re.compile(r"Standard|P3")})
for p_standard in p_standards:
    print(p_standard)
And the output looks like this:
<p class="P3">a</p>
<p class="Standard">b</p>
<p class="P3">c</p>
<p class="Standard">d</p>
And let's say I only wanted to print the text inside the P3 classes so that the output looks like:
a
c
I thought this code below would work, but it didn't. How can I compare the class name of the container tag to some value?
p_standards = soup.find_all("p",attrs={'class':re.compile(r"Standard|P3")})
for p_standard in p_standards:
    if p_standard.get("class") == "P3":
        print(p_standard.get_text())
I'm aware that in my first line, I could have simply done r"P3" instead of r"Standard|P3", but this is only a small fraction of the actual code (not the full story), and I need to leave that first line as it is.
Note: doing something like .find("p", class_ = "P3") only works for descendants, not for the container tag.
OK, so after playing around with the code, it turns out that
p_standard.get("class")[0] == "P3"
works. (I was missing the [0])
So this code works:
p_standards = soup.find_all("p",attrs={'class':re.compile(r"Standard|P3")})
for p_standard in p_standards:
    if p_standard.get("class")[0] == "P3":
        print(p_standard.get_text())
I think the following is more efficient. Use select with the CSS "or" syntax (a comma-separated selector list) to gather the elements that have either class, then test for membership in the class list:
from bs4 import BeautifulSoup as bs
html = '''
<html>
<head></head>
<body>
<p class="P3">a</p>
<p class="Standard">b</p>
<p class="P3">c</p>
<p class="Standard">d</p>
</body>
</html>
'''
soup = bs(html, 'lxml')
p_standards = soup.select('.Standard,.P3')
for p_standard in p_standards:
    if 'P3' in p_standard['class']:
        print(p_standard.text)
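The reason the == "P3" comparison fails is that BeautifulSoup, when used with an HTML parser, treats class as a multi-valued attribute and returns a list such as ["P3"], and a list never equals a string. A small sketch of a membership check that tolerates a list, a plain string, or a missing attribute follows. has_class is a hypothetical helper written for this example, not part of BeautifulSoup, and the values passed to it stand in for what tag.get("class") would return.

```python
def has_class(class_attr, wanted):
    """Return True if `wanted` is one of the classes in `class_attr`.

    `class_attr` may be a list of class names, a whitespace-separated
    string, or None when the tag has no class attribute.
    """
    if class_attr is None:
        return False
    if isinstance(class_attr, str):
        class_attr = class_attr.split()
    return wanted in class_attr


# A list is never equal to a string, which is why the original test failed.
print(["P3"] == "P3")

# Membership works, including for tags with several classes.
print(has_class(["P3"], "P3"))
print(has_class(["Standard", "P3"], "P3"))
print(has_class(["Standard"], "P3"))
```

Unlike indexing with [0], the membership test still matches when P3 is not the first class on the tag.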

Excel VBA Select Option Website Dropdown List

I am trying to automate this website using VBA excel. I am stuck at one point where I need to select value from the drop-down box. I am very much new to this as this is my first such project.
This is what I have coded to select the value:
Set objSelect = objIE.document.getElementById("personTitle")
For Each opt In objSelect.Options
    If opt.Value = "Miss" Then
        'Debug.Print "found!"
        opt.Selected = True
        'opt.Selected = "selected"
    Else
        'Debug.Print "not found!"
        opt.Selected = False
    End If
Next
I have also tried using Debug.Print to check whether the value I am looking for is actually matched, and it turns out that it is.
The only problem I am facing is that the value is not getting set.
Can any of the gurus here please help?
Here is the HTML of that section:
<div class="input-wrap input-wrap__inline">
<div tabindex="-1" class="select is-placeholder"><div class="select_display">Title</div><div class="select_arrow glyphicon glyphicon-chevron-down"></div><dl class="select_list"><dt class="pretend-dd is-hover" data-index="1" data-val="Mr">Mr</dt><dt class="pretend-dd" data-index="2" data-val="Mrs">Mrs</dt><dt class="pretend-dd" data-index="3" data-val="Miss">Miss</dt><dt class="pretend-dd" data-index="4" data-val="Ms">Ms</dt><dt class="pretend-dd" data-index="5" data-val="Dr">Dr</dt></dl></div><select name="personTitle" class="parsley-validated hasCustomSelect .no-change, .bv-dropdown-select is-invisible" id="personTitle" required="" data-required-message="Please select a title">
<option selected="selected" value="">Title</option>
<option value="Mr">Mr</option>
<option value="Mrs">Mrs</option>
<option value="Miss">Miss</option>
<option value="Ms">Ms</option>
<option value="Dr">Dr</option>
</select>
</div>
I think you want a different class. The class to target in that HTML snippet is select_list, followed by the dt tags inside it.
If you look at the CSS selector .select_list dt, where "." means class and " dt" means select all dt tags inside elements of that class, you will see it makes the correct selections.
In the code below, I translate this selector into:
ieDoc.getElementsByClassName("select_list")(0).getElementsByTagName("dt")
This assumes that index 0 is the correct one to use for elements of the class "select_list". You can easily inspect the collection to find the right index if you set it to a variable e.g.
Dim x As Object
Set x = ieDoc.getElementsByClassName("select_list")(0).getElementsByTagName("dt")
Code:
Dim currentOption As Object
For Each currentOption In ieDoc.getElementsByClassName("select_list")(0).getElementsByTagName("dt")
    If InStr(currentOption.innerText, "Miss") > 0 Then
        currentOption.Selected = True
    End If
Next currentOption
Here are a couple of options to try if you haven't already:
If opt.Value = "Miss" Then
    'Debug.Print "found!"
    opt.Click
End If
OR
If opt.Value = "Miss" Then
    'Debug.Print "found!"
    opt.Focus
    opt.FireEvent ("onchange")
End If
If this turns out to be something done in kendoGrid or kendoDropDownList, I might be able to help with that also.