I am writing code in VBA using Selenium to extract information from eCommerce platforms.
I want to capture, for a particular product, information such as the number of sizes, which sizes are available and which are out of stock.
I am able to get the number of sizes and their names, but not the stock-out information.
Source code:
<div class="size-swatch">
<div class="circel-size variant instock">
<span>L</span>
</div>
</div>
<div class="size-swatch">
<div class="circle-size variant oos">
<span>XL</span>
</div>
</div>
VBA code:
```vba
Set results = driver.FindElementsByCSS("circle-size variant oos")
For Each result In results
    Worksheets("Sheet1").Cells(1, 40 + c).Value = results.Text
    c = c + 1
Next
```
driver.FindElementsByCSS("circle-size variant oos")
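For this CSS to work, remove the spaces and put a . in front of each class name (in CSS a space means "descendant of", and a bare word is read as a tag name):
driver.FindElementsByCSS("div.circle-size.variant.oos")
should get the job done. Also, inside the loop use result.Text (the current element) rather than results.Text (the whole collection).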
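To get available and out-of-stock sizes in one pass, the usual pattern is to loop over every variant div and check whether its class list contains oos (the CSS equivalents are div.variant.oos and div.variant:not(.oos)). The sketch below checks that class test offline in Python against the HTML from the question. It assumes oos marks a stock-out, as in the sample markup; SizeParser and classify_sizes are hypothetical names, and the test keys on the variant class so the circel-size spelling in the first div does not matter.

```python
from html.parser import HTMLParser

# HTML copied from the question.
HTML = """
<div class="size-swatch">
<div class="circel-size variant instock">
<span>L</span>
</div>
</div>
<div class="size-swatch">
<div class="circle-size variant oos">
<span>XL</span>
</div>
</div>
"""


class SizeParser(HTMLParser):
    """Collect (size label, in_stock) pairs from divs whose class list contains 'variant'."""

    def __init__(self):
        super().__init__()
        self.sizes = []        # list of (label, in_stock) tuples
        self._in_stock = None  # stock flag of the variant div we are inside, else None

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if tag == "div" and "variant" in classes:
            # Same test as the CSS selector div.variant.oos, inverted
            self._in_stock = "oos" not in classes

    def handle_data(self, data):
        text = data.strip()
        if self._in_stock is not None and text:
            self.sizes.append((text, self._in_stock))

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_stock = None


def classify_sizes(html):
    """Return (number of sizes, available sizes, out-of-stock sizes)."""
    parser = SizeParser()
    parser.feed(html)
    parser.close()
    available = [label for label, ok in parser.sizes if ok]
    out_of_stock = [label for label, ok in parser.sizes if not ok]
    return len(parser.sizes), available, out_of_stock


print(classify_sizes(HTML))  # (2, ['L'], ['XL'])
```

In SeleniumBasic the same idea would be to collect div.variant elements and inspect each element's class attribute.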
The following line selects the first element matched by the CSS selector, but how do I select the next match?
ie.Document.getElementsByTagName("iframe")(0).contentDocument.querySelector("option[value='2']").Selected = True
HTML code:
<div class="col-xs-10 col-sm-9 col-md-6 col-lg-5 form-field input_controls"><input name="sys_original.incident.impact" id="sys_original.incident.impact" type="hidden" value=""><select name="incident.impact" class="form-control" id="incident.impact" style="direction: ltr;" onchange="onChange('incident.impact');" mandatory="true" ng-non-bindable="true"><option value="">-- None --</option><option value="1">1 - High</option><option value="2">2 - Medium</option><option value="3">3 - Low</option></select></div>
The other dropdown:
<div class="form-group is-required" id="element.incident.urgency"><div id="label.incident.urgency" nowrap="true" type="choice" data-type="label" choice="1"><label class=" col-xs-12 col-md-3 col-lg-4 control-label" dir="ltr" onclick="return labelClicked(this);" for="incident.urgency"><span title="" class="required-marker label_description" id="status.incident.urgency" aria-label="Mandatory - must be populated before Submit" data-original-title="Mandatory - must be populated before Submit" oclass="" mandatory="true"></span><span title="" class="label-text" data-original-title="Measure of the business criticality based on the impact and on the business needs of the Customer" data-html="false">Urgency</span></label></div><div class="col-xs-10 col-sm-9 col-md-6 col-lg-5 form-field input_controls"><input name="sys_original.incident.urgency" id="sys_original.incident.urgency" type="hidden" value=""><select name="incident.urgency" class="form-control" id="incident.urgency" style="direction: ltr;" onchange="onChange('incident.urgency');" mandatory="true" ng-non-bindable="true"><option value="">-- None --</option><option value="1">1 - High</option><option value="2">2 - Medium</option><option value="3">3 - Low</option></select></div><div class="col-xs-2 col-sm-3 col-lg-2 form-field-addons"></div></div>
Getting a NodeList of all elements with option tags in the frame:
You should be able to get a NodeList by using the querySelectorAll method of the document, rather than just querySelector. The CSS pattern then switches from specifying an exact attribute value, e.g. value='2', to a more general pattern: elements with an option tag that have a value attribute, i.e.
ie.Document.getElementsByTagName("iframe")(0).contentDocument.querySelectorAll("option[value]")
However, this will probably return more nodes than you need, because the page has multiple dropdowns with option tags. It is hard to say without more HTML to test with.
Being more selective:
Ideally, add an extra selector to the CSS query that narrows the matches down to just the nodes of interest, using an id, class name, etc. Then index into that list, starting from 0, to set the various options.
For example, the first of the two selection lists you showed can be isolated with the following CSS selector, which gives a NodeList of its options (the attribute value must be quoted because it contains a dot):
select[id='incident.impact'] option
N.B. The second list can be obtained with select[id='incident.urgency'] option.
CSS query:
When you apply this CSS selector with the .querySelectorAll method, you should get a NodeList of the matching option elements. You can then access them by index:
Dim aNodeList As Object, i As Long
Set aNodeList = ie.Document.getElementsByTagName("iframe")(0).contentDocument.querySelectorAll("select[id='incident.impact'] option")
For i = 0 To aNodeList.Length - 1
    Debug.Print aNodeList.item(i).outerHTML
Next i
Select a specific node:
aNodeList.item(2).Selected = True '<== index 2 is "2 - Medium" (index 0 is "-- None --")
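To see why scoping the selector matters, the two dropdowns can be reduced to a well-formed XML fragment and queried offline with Python's xml.etree.ElementTree, whose limited XPath support stands in for the CSS selectors here: .//option[@value] plays the role of option[value], and .//select[@id='incident.impact']/option plays the role of select[id='incident.impact'] option. This is only a sketch to check counts and indexes; in IE you would still use querySelectorAll.

```python
import xml.etree.ElementTree as ET

# Well-formed excerpt of the two dropdowns from the question
# (hidden inputs, labels and layout divs omitted so it parses as XML).
FORM = """<form>
<select name="incident.impact" id="incident.impact">
<option value="">-- None --</option><option value="1">1 - High</option><option value="2">2 - Medium</option><option value="3">3 - Low</option>
</select>
<select name="incident.urgency" id="incident.urgency">
<option value="">-- None --</option><option value="1">1 - High</option><option value="2">2 - Medium</option><option value="3">3 - Low</option>
</select>
</form>"""

root = ET.fromstring(FORM)

# Unscoped, like querySelectorAll("option[value]"): options from BOTH dropdowns.
all_options = root.findall(".//option[@value]")

# Scoped to one dropdown, like querySelectorAll("select[id='incident.impact'] option").
# The options are direct children of the select here, so a child step is enough.
impact_options = root.findall(".//select[@id='incident.impact']/option")

print(len(all_options), len(impact_options))  # 8 4
print(impact_options[2].text)                 # 2 - Medium
```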
tl;dr
If you literally want the option after the current one (provided you aren't at the end of the list), you can use an adjacent sibling selector:
select[id='incident.urgency'] option[value='2'] + option
Here, + option selects the option element that immediately follows the option with value '2' inside the select whose id is incident.urgency.
In this case, it would select option 3 - Low.
If you were at option 1, not 2, it would be select[id='incident.urgency'] option[value='1'] + option, etc.
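The adjacent sibling logic can be mimicked offline as a quick check: find the option with the current value and take the element right after it, returning nothing when the current option is the last one (the CSS selector would then match nothing). next_option is a hypothetical helper, and ElementTree is used only because it ships with Python.

```python
import xml.etree.ElementTree as ET

# The urgency dropdown from the question, reduced to well-formed XML.
URGENCY = """<select name="incident.urgency" id="incident.urgency">
<option value="">-- None --</option><option value="1">1 - High</option><option value="2">2 - Medium</option><option value="3">3 - Low</option>
</select>"""


def next_option(select_xml, current_value):
    """Mimic option[value='N'] + option: return the text of the element immediately
    after the option whose value is current_value, or None if there is no such
    option, it is the last one, or the next element is not an option."""
    children = list(ET.fromstring(select_xml))  # element children only, like CSS siblings
    for i, child in enumerate(children):
        if child.tag == "option" and child.get("value") == current_value:
            if i + 1 < len(children) and children[i + 1].tag == "option":
                return children[i + 1].text
            return None
    return None


print(next_option(URGENCY, "2"))  # 3 - Low
print(next_option(URGENCY, "3"))  # None
```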
This question already has answers here:
Scraping data from website using vba
(5 answers)
Closed 4 years ago.
I am trying to get the price details of a particular project from a URL, but I don't know how.
Example: from the URL (https://www.99acres.com/ppc-2515-residential-apartment-mailer), for the project Eden Richmond Enclave I want the price 14.22 to 31.15 Lac in cell A1.
Below is the code I tried:
Sub test()
Set driver = CreateObject("Selenium.FirefoxDriver")
driver.get "https://www.99acres.com/ppc-2515-residential-apartment-mailer"
Range("A1") = driver.FindElementByXPath("//h1[contains(@class,'font-size15')][contains(text(),'Eden Richmond Enclave')][contains(@class,'product-lrg-box')]").Text
End Sub
Below is the HTML code:
<div class="pro-text">
<div class="product-text-box">
<div class="product-heading"><span><img src="https://newprojects.99acres.com/projects/eden_group/eden_richmond_enclave/bb7ttfq9.gif">
<h1 class="font-size15">Eden Richmond Enclave <p>Narendrapur</p>
</h1>
</span> </div>
</div>
<div class="product-text-box">
<ul class="product-lrg-box">
<li> <span><strong><span class="rupee-font">₹ </span>14.22 to 31.15 Lac</strong></span></li>
<li><strong>499-1093 SQFT</strong></li>
<li><strong>1-3 BHK</strong></li>
<li style="width:20% !important;"><strong>December 2020</strong></li>
</ul>
<div id="tabs" class="tab-link tabs-menu tabs-menu-new">
<ul>
<li>e-Brochure</li>
<li>Amenities</li>
<!-- <li style="width:20% !important;">Floor Plan</li>-->
<li style="width:20% !important;">Directions</li>
</ul>
</div>
<span class="enquire-new-bt" id="294015-469203,151100-enquire-new-bt" data-val="2"> I am Interested </span> </div>
</div>
Perhaps try
.FindElementByCSS("ul.product-lrg-box span")
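It depends on whether you intend to repeat this process for other elements.
With the HTML you supplied, this matches the first span inside the list, which holds the price (₹ 14.22 to 31.15 Lac).
Otherwise,
.FindElementByCSS("ul.product-lrg-box")
retrieves the text of the entire list (price, area, BHK and date) as one string.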
Without being able to view the page, and without knowing whether you want to retrieve more elements, you might consider
.FindElementsByCss("ul.product-lrg-box span")
and looping over the collection returned (if valid); or,
Try scraping with IE and using something like:
IE.document.querySelectorAll("ul.product-lrg-box span")
and then looping over the NodeList returned.
To extract the text 14.22 to 31.15 Lac you can use the following locator strategy:
XPath:
//h1[contains(.,'Eden Richmond Enclave')]//following::div[1]/ul[@class='product-lrg-box']/li//span/strong[not(@class='rupee-font')]
I parse my vendors' websites to get current item prices.
I am using VB.NET and a Windows desktop app I wrote in Visual Studio 2015.
One vendor has recently changed the way they display prices.
The item prices are inside a common main div,
but the prices can be inside either a table or a simple div.
I can easily scrape prices from the tables or the divs, but I need to detect which way the price is displayed first and then send it to the proper parsing code.
Notice in the HTML below that this is the main div for both pricing schemes:
<div class="price" id="itemPrice12345">
.. a table or a simple price div are here.
</div>
This is how the div looks when it contains a table
<div class="price" id="itemPrice12345">
<table class="bglt"><tbody>
<tr>
<td class="texttable" nowrap="">1 to 9</td>
<td class="texttable">$9.93</td>
</tr></tbody></table>
</div>
I parse the prices out of a table using this node selection code
Dim tables As HtmlAgilityPack.HtmlNodeCollection
tables = WebPageDocument.DocumentNode.SelectNodes("//*[contains(@id,'itemPrice')]/div[1]/table")
Then I iterate through the table rows "./tr" and table columns "./td" to pick out the prices
The same main div without a table looks like this
<div class="price" id="itemPrice12345">
<div class="price firstprice">$3.50</div>
</div>
I parse the price out of this simple div using this node selection code
Dim PriceNode As HtmlAgilityPack.HtmlNodeCollection
PriceNode = WebPageDocument.DocumentNode.SelectNodes("//*[contains(@id,'itemPrice')]/div[1]/div")
ItemPrice = PriceNode(0).InnerText
My question is, how can I determine if the main item price div contains a table or a simple price div inside?
Once I know that, I can send the parsing to the proper code section.
So I guess I need to check what is inside the main div first, but I am not sure how to do that.
Thanks for any help
Try selecting a single node first and check whether it is Nothing:
Dim _itemPriceSelector As String = "//*[contains(@id,'itemPrice')]/div[1]"
Dim _divSelector As String = _itemPriceSelector & "/div"
Dim _tableSelector As String = _itemPriceSelector & "/table"
If doesNodeExist(_tableSelector) Then
    ' Table layout: iterate ./tr and ./td as before
    Dim tables As HtmlAgilityPack.HtmlNodeCollection
    tables = WebPageDocument.DocumentNode.SelectNodes(_tableSelector)
ElseIf doesNodeExist(_divSelector) Then
    ' Simple layout: a single price div
    Dim PriceNode As HtmlAgilityPack.HtmlNodeCollection
    PriceNode = WebPageDocument.DocumentNode.SelectNodes(_divSelector)
    ItemPrice = PriceNode(0).InnerText
End If
Private Function doesNodeExist(selector As String) As Boolean
    ' SelectSingleNode returns Nothing when the XPath matches no node
    Return WebPageDocument.DocumentNode.SelectSingleNode(selector) IsNot Nothing
End Function
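The same "look inside first, then dispatch" idea can be sketched offline in Python with ElementTree, using the two layouts from the question: test for a table child, fall back to a div child, and only then parse. extract_prices is a hypothetical name, and the sketch treats the itemPrice div itself as the container, matching the HTML shown in the question rather than the extra /div[1] step in the XPath.

```python
import xml.etree.ElementTree as ET

# The two layouts from the question.
TABLE_LAYOUT = """<div class="price" id="itemPrice12345">
<table class="bglt"><tbody>
<tr>
<td class="texttable" nowrap="">1 to 9</td>
<td class="texttable">$9.93</td>
</tr></tbody></table>
</div>"""

DIV_LAYOUT = """<div class="price" id="itemPrice12345">
<div class="price firstprice">$3.50</div>
</div>"""


def extract_prices(fragment):
    """Dispatch on what the itemPrice div contains:
    a table -> list of (quantity range, price) rows; a price div -> [(None, price)]."""
    root = ET.fromstring(fragment)
    table = root.find("table")        # like SelectSingleNode(".../table")
    if table is not None:
        rows = []
        for tr in table.iter("tr"):   # iter() also finds rows nested inside <tbody>
            cells = [(td.text or "").strip() for td in tr.findall("td")]
            if len(cells) >= 2:
                rows.append((cells[0], cells[1]))
        return rows
    price_div = root.find("div")      # like SelectSingleNode(".../div")
    if price_div is not None:
        return [(None, (price_div.text or "").strip())]
    return []


print(extract_prices(TABLE_LAYOUT))  # [('1 to 9', '$9.93')]
print(extract_prices(DIV_LAYOUT))    # [(None, '$3.50')]
```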
There are two divs with a data-ember-action attribute. How can I read the values inside both of them as strings?
```html
<div class="wallet__amount">
6,000
</div>
</div>
<div data-ember-action="1180" class="wallet__item">
<div class="wallet__currency">
BTC
</div>
<div class="wallet__amount">
0.25588524
</div>
</div>
```
There are two "data-ember-action" attributes with different values. The values in the HTML will usually change, so what I want is a button that reads both of them.
I am really confused about how to do this. Please help.
You should not try to parse HTML with string methods or regex. Instead, use an HTML parser such as HtmlAgilityPack, for example with this LINQ version:
Dim doc As New HtmlAgilityPack.HtmlDocument()
doc.LoadHtml(html)
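' FirstOrDefault returns only the first div that has a data-ember-action attribute;
' use .Where(...) and a For Each loop instead to read both wallet items
Dim divDataMember = doc.DocumentNode.Descendants("div").
FirstOrDefault(Function(node) node.Attributes("data-ember-action") IsNot Nothing)
' omitting the If divDataMember IsNot Nothing check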
Dim walletCurrency = divDataMember.Descendants("div").
FirstOrDefault(Function(node) node.GetAttributeValue("class", "").Equals("wallet__currency", StringComparison.InvariantCultureIgnoreCase))?.InnerText?.Trim()
Dim walletAmount = divDataMember.Descendants("div").
FirstOrDefault(Function(node) node.GetAttributeValue("class", "").Equals("wallet__amount", StringComparison.InvariantCultureIgnoreCase))?.InnerText?.Trim()
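To read both wallet items rather than only the first, loop over every div that carries a data-ember-action attribute and collect the currency and amount inside each one. The sketch below shows that loop with Python's html.parser. The question's snippet is cut off, so the first item's attribute value ("0000") and currency ("XXX") are hypothetical placeholders; only its amount and the whole second item come from the question. WalletParser is likewise a hypothetical name.

```python
from html.parser import HTMLParser

# Two wallet items shaped like the question's markup; the first item's
# data-ember-action value and currency are hypothetical placeholders.
HTML = """
<div data-ember-action="0000" class="wallet__item">
<div class="wallet__currency">XXX</div>
<div class="wallet__amount">6,000</div>
</div>
<div data-ember-action="1180" class="wallet__item">
<div class="wallet__currency">
BTC
</div>
<div class="wallet__amount">
0.25588524
</div>
</div>
"""


class WalletParser(HTMLParser):
    """Collect one dict per div that has a data-ember-action attribute."""

    def __init__(self):
        super().__init__()
        self.items = []     # [{'action': ..., 'currency': ..., 'amount': ...}, ...]
        self._field = None  # 'currency' or 'amount' while inside those divs

    def handle_starttag(self, tag, attrs):
        if tag != "div":
            return
        attrs = dict(attrs)
        if "data-ember-action" in attrs:
            self.items.append({"action": attrs["data-ember-action"]})
        elif self.items:
            classes = (attrs.get("class") or "").split()
            if "wallet__currency" in classes:
                self._field = "currency"
            elif "wallet__amount" in classes:
                self._field = "amount"

    def handle_data(self, data):
        if self._field and data.strip():
            self.items[-1][self._field] = data.strip()

    def handle_endtag(self, tag):
        if tag == "div":
            self._field = None


parser = WalletParser()
parser.feed(HTML)
parser.close()
for item in parser.items:
    print(item["action"], item["currency"], item["amount"])
```

In the VB.NET answer the equivalent change is to replace FirstOrDefault with Where and wrap the two currency/amount lookups in a For Each over the result.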
I want to extract some text from a webpage and assign it to a variable with Excel VBA. Here is my VBA code:
Pass = IE.Document.getElementsByClassName("summary_field_value easy-read-display")(0).innerText
This returns text, but not the right text from the webpage. In the HTML code, there are a number of fields that look like this:
<div class="ui-body-b summary_field">
<span class="summary_field_name">Username:</span>
<span class="summary_field_value easy-read-display">TestUser</span>
<div class="ui-body-b summary_field">
<span class="summary_field_name">Password:</span>
<span class="summary_field_value easy-read-display">uhQT65$We2</span>
When my code runs, it returns "TestUser". How can I get it to return "uhQT65$We2", the password, given that the class names are the same (summary_field_value easy-read-display)?
Thanks for the help.
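With the HTML shown, the quickest VBA change is probably to take the second match, getElementsByClassName("summary_field_value easy-read-display")(1), since the collection is in document order. That breaks if the field order changes, so a sturdier approach is to pair each value with the summary_field_name label in front of it and look it up by label. That pairing is sketched below in Python with html.parser; SummaryFields and field_value are hypothetical names, and the outer divs are closed here so the excerpt is complete.

```python
from html.parser import HTMLParser

# Excerpt of the question's markup, with the outer divs closed.
HTML = """
<div class="ui-body-b summary_field">
<span class="summary_field_name">Username:</span>
<span class="summary_field_value easy-read-display">TestUser</span>
</div>
<div class="ui-body-b summary_field">
<span class="summary_field_name">Password:</span>
<span class="summary_field_value easy-read-display">uhQT65$We2</span>
</div>
"""


class SummaryFields(HTMLParser):
    """Map each summary_field_name label to the summary_field_value that follows it."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._role = None   # 'name' or 'value' while inside those spans
        self._label = None  # most recent label text, without the trailing colon

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            classes = (dict(attrs).get("class") or "").split()
            if "summary_field_name" in classes:
                self._role = "name"
            elif "summary_field_value" in classes:
                self._role = "value"

    def handle_data(self, data):
        text = data.strip()
        if not text or self._role is None:
            return
        if self._role == "name":
            self._label = text.rstrip(":")
        elif self._label is not None:
            self.fields[self._label] = text

    def handle_endtag(self, tag):
        if tag == "span":
            self._role = None


def field_value(html, label):
    """Return the value shown next to the given label, or None if absent."""
    parser = SummaryFields()
    parser.feed(html)
    parser.close()
    return parser.fields.get(label)


print(field_value(HTML, "Password"))  # uhQT65$We2
```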