Excel VBA: Scraping HTML data behind script - vba

I have this HTML, and I'm trying to scrape the values behind data-v-88f004c6 after "Compra:" and after "Venta:"
<div class="info-tc" data-v-88f004c6>
<span data-v-88f004c6>Tipo de cambio del dólar hoy en Perú</span>
<span data-v-88f004c6>
<i aria-hidden="true" class="fa fa-info-circle i-help" data-v-88f004c6></i>
</span>
<span data-v-88f004c6>Compra:
<strong data-v-88f004c6></strong>
</span>
<span data-v-88f004c6>Venta:
<strong data-v-88f004c6></strong>
</span>
</div>
When I bring up the developer tools from Chrome I can see that the values appear like this (3.199 and 3.237):
<div class="info-tc" data-v-88f004c6="">
<span data-v-88f004c6="">Tipo de cambio del dólar hoy en Perú</span>
<span data-v-88f004c6="">
<i aria-hidden="true" class="fa fa-info-circle i-help" data-v-88f004c6=""></i>
</span>
<span data-v-88f004c6="">Compra:
<strong data-v-88f004c6="">3.199</strong>
</span>
<span data-v-88f004c6="">Venta:
<strong data-v-88f004c6="">3.237</strong>
</span>
</div>
However when I scrape the values using Excel I get symbols like this for the prices.
Tipo de cambio del dólar hoy en Perú    Compra:   Venta:Â
I'm using a code like this to scrape the web:
Set GetRawHTML = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", urlWebSite, False
.send
'load the raw response into the HTMLDocument
GetRawHTML.body.innerHTML = .responseText
End With
fxprice = GetRawHTML.getElementsByClassName("info-tc").Item(0).innerText
What should I do to scrape those values?

I did this with selenium basic
CSS selectors
#__layout span:nth-child(2) +* 'compra
#__layout span:nth-child(3) +* 'venta
These are contracted versions of full selectors which were:
#__layout > div > main > section.banner > div.central-section > div.col.paddingTop2 > section > div > span:nth-child(2) > strong
#__layout > div > main > section.banner > div.central-section > div.col.paddingTop2 > section > div > span:nth-child(3) > strong
Code:
Option Explicit
Public Sub GetInfoSel()
Dim d As WebDriver, keys As New keys
Set d = New ChromeDriver
Const URL = "https://kambista.com/"
With d
.Start "Chrome"
.Get URL
Debug.Print .FindElementByCss("#__layout span:nth-child(2) +*").Text 'compra
Debug.Print .FindElementByCss("#__layout span:nth-child(3) +*").Text 'venta
.Quit
End With
End Sub
Note:
After installing Selenium Basic and opening Excel, you need to add a reference via VBE > Tools > References > Selenium Type Library.

Related

Robot Framework Selenium - find web element and iterate over nested sub-elements

Hello I have this HTML structure:
<div class="1234" role="article">
<div class="A">
<h2 class="B">
<a class="C" href="https://www.test.it">
</a>
</h2>
</div>
<div class="X">
<span class="Y">"some text"
</span>
</div>
</div>
<div class="1234" role="article">
<div class="A">
<h2 class="B">
<a class="C" href="https://www.test2.it">
</a>
</h2>
</div>
<div class="X">
<span class="Y">"some text2"
</span>
</div>
</div>
My goal is to iterate in each Div with role=article, and gather corresponding href and text
(i.e. https://www.test.it - "some text" for the first one)
I've created a basic for loop:
${elements}=    Get WebElements    xpath://div[contains(@role, 'article')]
FOR    ${element}    IN    @{elements}
    Log To Console    ${element.get_attribute('href')}
END
But I cannot figure out how to get the sub-elements that I need.
Any help is more than appreciated.
Many thanks
Update:
this works for the href, but I'm unable to get the span text
${elements}=    Get WebElements    xpath://div[contains(@role, 'article')]
FOR    ${element}    IN    @{elements}
    ${sub1}=    Set Variable    ${element.find_element_by_xpath(".//h2//a[contains(@class, 'C')]")}
    Log To Console    ${sub1.get_attribute('href')}
END
To iterate in each DIV with role="article" and gather corresponding href and text you can use the following Locator Strategies:
href:
${elements}=    Get WebElements    xpath://div[contains(@role, 'article')]
FOR    ${element}    IN    @{elements}
    ${sub1}=    Set Variable    ${element.find_element_by_xpath(".//h2/a[@class='C']")}
    Log To Console    ${sub1.get_attribute('href')}
END
text:
${elements}=    Get WebElements    xpath://div[contains(@role, 'article')]
FOR    ${element}    IN    @{elements}
    ${sub1}=    Set Variable    ${element.find_element_by_xpath(".//div[@class='X']/span[@class='Y']")}
    Log To Console    ${sub1.get_attribute('innerHTML')}
END
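The relative-XPath idea in the answer above can also be checked outside Robot Framework. Below is a minimal Java sketch, assuming only the JDK's built-in XML and XPath classes rather than Selenium. The class name ArticlePairs, the <root> wrapper and the sample string are illustrative stand-ins for the page in the question, not part of the original setup. It evaluates both sub-element expressions relative to each article node, so each href stays paired with its own text.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class ArticlePairs {
    // Stand-in for the question's markup, wrapped in a hypothetical <root> so it parses as XML.
    static final String SAMPLE =
        "<root>"
        + "<div class='1234' role='article'>"
        + "<div class='A'><h2 class='B'><a class='C' href='https://www.test.it'></a></h2></div>"
        + "<div class='X'><span class='Y'>\"some text\"\n</span></div>"
        + "</div>"
        + "<div class='1234' role='article'>"
        + "<div class='A'><h2 class='B'><a class='C' href='https://www.test2.it'></a></h2></div>"
        + "<div class='X'><span class='Y'>\"some text2\"\n</span></div>"
        + "</div>"
        + "</root>";

    // Returns one "href - text" entry per article element.
    static List<String> pairs(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            NodeList articles = (NodeList) xp.evaluate(
                    "//div[@role='article']", doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < articles.getLength(); i++) {
                Node article = articles.item(i);
                // The leading "." keeps each lookup inside the current article.
                String href = xp.evaluate(".//h2/a[@class='C']/@href", article);
                String text = xp.evaluate(
                        "normalize-space(.//div[@class='X']/span[@class='Y'])", article);
                out.add(href + " - " + text);
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        pairs(SAMPLE).forEach(System.out::println);
    }
}
```

Dropping the leading "." from either inner expression would search the whole document again, so every iteration would return the first article's values.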

Trouble selecting a hidden menu item using SeleniumBasic for vba

I am having some trouble selecting a hidden menu item on a work webpage using SeleniumBasic for VBA. I have tried to use WebDriver.Mouse.MoveTo to hover over each menu option so that I can select the object nested "beneath" it, but after the first hover the object cannot be found.
In the picture below I intend to navigate like this:
Pricing Admin
System Admin
Multi-PAG Upload
To do this, I have to hover over Pricing Admin and subsequently hover over System Admin so that menu appears to click on Multi-PAG Upload. I have successfully gotten the driver to hover over Pricing Admin which brings up first menu list with three items ending in System Admin. However, trying to FindElement() for System Admin so that I can hover on it has proven very difficult.
I tend to get an "object required" error or an "XPath selector invalid" error, depending on the method that I attempt. The problems start at Set systemAdmin =.
Any advice would be welcome!
Public Sub SeleniumTest()
Dim driver As New WebDriver
'open chrome to site
driver.start "chrome"
driver.Get "http://www.website.net"
'login
driver.FindElementByName("j_username").SendKeys ("user")
driver.FindElementByName("j_password").SendKeys ("pass")
driver.FindElementById("submit_button").Click
'hover over Pricing Admin
Dim pricingAdmin As WebElement
Set pricingAdmin = driver.FindElementById("prcngAdmMnuFrm:prcngAdmMnu")
driver.Mouse.MoveTo pricingAdmin
Dim systemAdmin As WebElement
'neither selection method below works properly
' Set systemAdmin = driver.FindElementByXPath("//*[contains(text(),'System Admin')]")
' Set systemAdmin = driver.FindElementByXPath("//div[@id='prcngAdmMnuFrm:prcngAdmMnu']/div/div/ul/li/ul/li[3]/ul/li[4]/a/span/span")
driver.Mouse.MoveTo systemAdmin
Dim multiPagUpload As WebElement
' Set multiPagUpload = driver.FindElement("??")
multiPagUpload.Click
'closes browser window
driver.Quit
End Sub
Here is the (abridged) HTML for the site. I trimmed out a bit of the lists for simplicity's sake but if it's actually necessary (for using javascript, etc) let me know and I can pop more in.
<div id="prcngAdmMnuFrm:prcngAdmMnu" style="">
<div class="ui-widget ui-widget-content wijmo-wijmenu ui-corner-all ui-helper-clearfix wijmo-wijmenu-horizontal" aria-activedescendant="ui-active-menuitem" role="menubar">
<div class="scrollcontainer checkablesupport">
<ul style="display: block;" class="wijmo-wijmenu-list ui-helper-reset" tabindex="0">
<li role="menuitem" class="ui-widget wijmo-wijmenu-item ui-state-default ui-corner-all wijmo-wijmenu-parent" aria-haspopup="true" style="">
<a href="#" class="wijmo-wijmenu-link ui-corner-all" id="">
<span class="wijmo-wijmenu-text">
<span class="wijmo-wijmenu-text">Pricing Admin</span>
</span>
<span class="ui-icon ui-icon-triangle-1-s"></span>
</a>
<ul class="wijmo-wijmenu-list ui-widget-content ui-corner-all ui-helper-clearfix wijmo-wijmenu-child" style="display: none; left: 0px; top: 38px; position: absolute; list-style-type: none;" aria-hidden="true">
<li role="menuitem" class="ui-widget wijmo-wijmenu-item ui-state-default ui-corner-all wijmo-wijmenu-parent" aria-haspopup="true" style="">
<a href="#" class="wijmo-wijmenu-link ui-corner-all ui-state-focus">
<span class="wijmo-wijmenu-text">
<span class="wijmo-wijmenu-text">System Admin</span>
</span>
<span class="ui-icon ui-icon-triangle-1-e"></span>
</a>
<ul class="wijmo-wijmenu-list ui-widget-content ui-corner-all ui-helper-clearfix wijmo-wijmenu-child" style="display: none; left: 215px; top: -1px; position: absolute; list-style-type: none;" aria-hidden="true">
<li role="menuitem" class="ui-widget wijmo-wijmenu-item ui-state-default ui-corner-all">
<a onclick="showProcessingMessage('Loading');;var self = this; setTimeout(function() { var f = function(opt){ice.ace.ab(ice.ace.extendAjaxArgs({"source":"prcngAdmMnuFrm:menu_pad_sa_multi","execute":'@all',"render":'@all',"event":"activate"}, opt));}; f({node:self});}, 10);" style="cursor:pointer;" class="wijmo-wijmenu-link ui-corner-all">
<span class="wijmo-wijmenu-text">
<span class="wijmo-wijmenu-text">Multi-PAG Upload</span>
</span>
</a>
</li>
</ul>
</li>
</ul>
</li>
</ul>
</div>
</div>
<script type="text/javascript">
var widget_prcngAdmMnuFrm_prcngAdmMnu = ice.ace.create("Menubar", ["prcngAdmMnuFrm:prcngAdmMnu", {
"autoSubmenuDisplay": true,
"direction": "auto",
"animation": {
"animated": "fade",
"duration": 400
}
}]);
</script>
</div>
If I've left anything out that you need to troubleshoot, please let me know!
The XPath used in the code is not correct. My suggestion is to find the anchor element and move the mouse over it.
'System Admin menu
'Hover over System Admin
Dim systemAdmin As WebElement
Set systemAdmin = driver.FindElementByXPath("//a[.//span[contains(.,'System Admin')]]")
driver.Mouse.MoveTo systemAdmin
If the mouse hover does not work, we can still try to handle the menu by clicking on the anchor element and then sending keys (keys.Arrow_Right).
'Multi-PAG Upload
Dim multiPagUpload As WebElement
Set multiPagUpload = driver.FindElementByXPath("//a[.//span[contains(.,'Multi-PAG Upload')]]")
multiPagUpload.Click
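To see why the anchor-based XPath picks out exactly one element per menu label, here is a small Java sketch that evaluates the same expression with the JDK's XPath classes instead of SeleniumBasic. The MenuAnchor class and the MENU string are hypothetical; MENU is a heavily trimmed version of the abridged markup in the question, keeping only the nesting of li, a and span elements.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class MenuAnchor {
    // Trimmed stand-in for the menu: each submenu <ul> is a sibling of its parent's <a>.
    static final String MENU =
        "<div id='menu'><ul>"
        + "<li><a href='#'><span><span>Pricing Admin</span></span></a>"
        + "<ul><li><a href='#'><span><span>System Admin</span></span></a>"
        + "<ul><li><a><span><span>Multi-PAG Upload</span></span></a></li></ul>"
        + "</li></ul>"
        + "</li></ul></div>";

    // Counts the anchors whose nested span text contains the given label.
    // Assumes the label contains no single quote.
    static int countAnchors(String label) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(MENU)));
            String expr = "//a[.//span[contains(.,'" + label + "')]]";
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(expr, doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(countAnchors("System Admin"));
        System.out.println(countAnchors("Multi-PAG Upload"));
    }
}
```

Because each submenu list sits next to the anchor rather than inside it, the "Pricing Admin" anchor does not also match "System Admin". This only checks the locator itself; whether the hover makes the hidden item interactable still depends on the live page.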

Unable to find an element using Protractor when elements share the same classes

I have one checkbox and two links with the same classes in the same div. During automated testing with Protractor, I want to click on the checkbox, but the click lands on the links instead.
I am using this code, but it's not working. Please suggest a solution.
Code:
element(by.id('Remember')).click();
Please find the HTML code:
<div class="input-field">
<div class="pas_rembr">
<input name="remember" id="Remember" class="css-checkbox ng-dirty ng-valid-parse ng-touched ng-not-empty ng-valid ng-valid-required" ng-model="rememberMe" type="checkbox" ng-required="true" required="required">
<!-- <label for="Remember" class="css-label">I agree with the <a class="text_link" target="_blank" ng-href="{{baseUrl}}terms-conditions">Terms & Conditions</a>.</label> -->
<label for="Remember" class="css-label">I have read and I agree with <a class="text_link" target="_blank" ng-href="/lmd/terms-conditions" href="/lmd/terms-conditions">Terms and Conditions</a> and the <a class="text_link" target="_blank" ng-href="/lmd/privacy" href="/lmd/privacy">Privacy Policy</a> of this site.</label>
</div>
<span class="errorForm ng-hide" ng-show="(memberForm.remember.$dirty || submitted) && ((memberForm.remember.$error.required))">
<span class="errorForm ng-scope ng-hide" ng-show="memberForm.remember.$error.required" translate="TERAMS_CONDITION_IS_REQUIRED">Terms and condition is required</span>
</span>
</div>
Try this:
var el = element(by.css('label[for="Remember"]'));
browser.actions().mouseMove(el, {x: 20, y: 3}).click().perform();
For your tests, the CSS selector would be
element = $$('label.css-label > a:nth-child(2)');
This should be able to click on the second a child of the label element.

Right syntax in HTML scraping

I have HTML that changes dynamically:
<tbody>
' ------------------- Block 1 ----------------------
<tr class="table-row">
<td class="cell">
<div>18/4/2018</div>
</td>
<td class="cell">
<div>
<form id="idc" method="post" action=""> ' id is dynamic, so I can't use it
<div style=""><input type="hidden" name="idc_hf_0" id="idc_hf_0" /></div> ' id and name are dynamic, so I can't use them
Download all invoice documents as ZIP-file
<span>
<a class="icon zipdownload" title="Download all invoice documents as ZIP-file" href=""></a>
</span>
<span class="has-explanation">
<a class="helper" href="javascript:;" title="The zip-file contains only PDF files of Tax/Fee statements and the Fleet Invoice with all annexes if available.">
<span class="icon question" id="table-header-explanation"></span>
</a>
</span>
</form>
</div>
</td>
<td class="cell">
<div>
<a class="" title="View >>" href="">View >></a>
</div>
</td>
</tr>
' ################### Block1 END #######################
' ------------------- Block 2 ----------------------
<tr class="table-row">
<td class="cell">
<div>13/4/2018</div> ' need this
</td>
<td class="cell">
<div>
<form id="idd" method="post" action="">
<div style=""><input type="hidden" name="idd_hf_0" id="idd_hf_0" /></div>
<div>
<span>Collective Payment Order</span> (<span>2018-500421707</span>)
<span>
<span class="invisible"> | </span><span>
<a class="Download" title="Download" href="">English</a>
</span>
</span>
</div>
<div>
<span>Tax/Fee CSV list</span> <span>
<a class="icon csv" title="Download" href=""></a> ' need this HREF1
</span>
</div>
<div>
<span>Detailed Trip CSV list</span> <span>
<a class="icon csv" title="Download" href=""></a> ' need this HREF2
</span>
</div>
Download all invoice documents as ZIP-file
<span>
<a class="icon zipdownload" title="Download all invoice documents as ZIP-file" href=""></a>
</span>
<span class="has-explanation">
<a class="helper" href="javascript:;" title="The zip-file contains only PDF files of Tax/Fee statements and the Fleet Invoice with all annexes if available.">
<span class="icon question" id="table-header-explanation"></span>
</a>
</span>
</form>
</div>
</td>
<td class="cell">
<div>
<a class="" title="View >>" href="">View >></a>
</div>
</td>
</tr>
' ################### Block2 END #######################
</tbody>
So there are two kinds of blocks, and their order is dynamic. The structure can be, for example:
Block1
Block1
Block2
Block1
Block2
Block2
Block2
Block1
From these blocks I need to get:
Count of Block2
Date of each Block2
HREF1 from class="icon csv"
HREF2 from class="icon csv"
To differentiate between Block 1 and Block 2: Block 1 does not have class="icon csv" or <span>Tax/Fee CSV list</span>.
I'm confused about how to use the getElement... methods. I tried:
Set IeDoc = IeApp.Document
With IeDoc
Set IeTbody = .getElementsByTagName("tbody").getElementsByClassName("table-row")
d = IeTbody.length
For Each stEl In IeTbody
Next stEl
End With
But I got the error "Object does not support this property or method". Would querySelector be a better choice?
How do I get the links?
Logically, it should be something like this:
Set IeDoc = IeApp.Document
With IeDoc
Set Blocks = .getElementsByTagName("tbody")
For Each block In Blocks
Set hasClass = .getElementsByClassName("table-row").getElementsByClassName("cell")(1).getElementsByClassName("icon csv")
if not hasClass is nothing then
b.Date = Blocks(block).getElementsByClassName("table-row").getElementsByClassName("cell")(0).getElementsByTagName("div")(0).innerText()
b.Href1 = Blocks(block).getElementsByClassName("table-row").getElementsByClassName("cell")(1).getElementsByClassName("icon csv")(0)
b.Href2 = Blocks(block).getElementsByClassName("table-row").getElementsByClassName("cell")(1).getElementsByClassName("icon csv")(1)
end if
Next block
End With
So this isn't very robust, but it was a play around with regex and parsing the HTML you gave. A lookbehind would help to pull in the date with the regex split, but I couldn't work that out at present. I have adapted a regex function by @FlorentB:
Public Matches As Object
' Or add in Tools > References > VBScript Reg Exp for early binding
Public Sub testing()
Dim str As String, countOfBlock2 As Long, arr() As String, i As Long
str = Range("A1") 'I am reading in from sheet but this would be your response text
arr = SplitRe(str, "\<div>[\d]+[\/-][\d]+[\/-][\d]+\<\/div>") 'look behind would help
For i = LBound(arr) To UBound(arr)
If InStr(1, arr(i), "class=""icon csv""") > 0 Then
countOfBlock2 = countOfBlock2 + 1 ' "Block 2"
Debug.Print Replace(Replace(Matches(i - 1), "<div>", ""), "</div>", "") 'dates from Block 2
Debug.Print Split(Split(arr(i), """icon csv"" title=""Download"" href=")(1), "></a>")(0)
Debug.Print Split(Split(arr(i), """icon csv"" title=""Download"" href=")(2), "></a>")(0)
End If
Next i
Debug.Print "count of block2 = " & countOfBlock2
End Sub
'https://stackoverflow.com/questions/28107005/splitting-string-in-vba-using-regex?utm_medium=organic&utm_source=google_rich_qa&utm_campaign=google_rich_qa
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
Static re As Object
If re Is Nothing Then
Set re = CreateObject("VBScript.RegExp")
re.Global = True
re.MultiLine = True
End If
re.IgnoreCase = IgnoreCase
re.Pattern = Pattern
SplitRe = Strings.Split(re.Replace(Text, ChrW(-1)), ChrW(-1))
Set Matches = re.Execute(Text)
End Function
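As an alternative to the regex split above (a different technique, not what that answer does), the Block 2 rows can be selected structurally: keep only the rows that contain an a element with class "icon csv", then read the date and both links relative to each row. The Java sketch below assumes the JDK's built-in XPath classes and a trimmed, well-formed stand-in for the table. The Block2Rows class and the href values (tax.csv, trips.csv and so on) are made up for illustration, since the hrefs are empty in the question.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class Block2Rows {
    // Trimmed stand-in for the table: one Block 1 row, then one Block 2 row.
    // The href values are hypothetical.
    static final String SAMPLE =
        "<tbody>"
        + "<tr class='table-row'><td class='cell'><div>18/4/2018</div></td>"
        + "<td class='cell'><a class='icon zipdownload' href='all1.zip'/></td></tr>"
        + "<tr class='table-row'><td class='cell'><div>13/4/2018</div></td>"
        + "<td class='cell'><a class='icon csv' href='tax.csv'/>"
        + "<a class='icon csv' href='trips.csv'/>"
        + "<a class='icon zipdownload' href='all2.zip'/></td></tr>"
        + "</tbody>";

    // Returns "date | href1 | href2" for every row that has csv links (Block 2).
    static List<String> block2(String xml) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            XPath xp = XPathFactory.newInstance().newXPath();
            // Only rows containing at least one "icon csv" link count as Block 2.
            NodeList rows = (NodeList) xp.evaluate(
                    "//tr[@class='table-row'][.//a[@class='icon csv']]",
                    doc, XPathConstants.NODESET);
            List<String> out = new ArrayList<>();
            for (int i = 0; i < rows.getLength(); i++) {
                Node row = rows.item(i);
                StringBuilder line = new StringBuilder(
                        xp.evaluate("normalize-space(td[1]/div)", row));
                NodeList hrefs = (NodeList) xp.evaluate(
                        ".//a[@class='icon csv']/@href", row, XPathConstants.NODESET);
                for (int j = 0; j < hrefs.getLength(); j++) {
                    line.append(" | ").append(hrefs.item(j).getNodeValue());
                }
                out.add(line.toString());
            }
            return out;
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        List<String> rows = block2(SAMPLE);
        System.out.println("count of block2 = " + rows.size());
        rows.forEach(System.out::println);
    }
}
```

The count of Block 2 is simply the size of the returned list. Real pages are rarely well-formed XML, so in the browser the same predicate would be applied through the driver's own XPath or CSS support.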

Dynamic buttons added to the page, don't know how to find the element

I have a complicated page structure and I have no idea how to find the element...
The page contains folders that are created by a user. I need to create a folder and then click on it, but I have no idea how to find the element that I've created. The structure looks like this:
<div class="row-text" style="width: calc(100% - 84px);">
<span class="row-item-name">
<span class="row-item-link">
<a class="grid-row-element-name">Eclipse111</a>
</span>
<span class="row-item-actions hover-child">
<a>Share</a><span> | </span><a watchdox-rename="name" watchdox-save-func="rename(element, name)" class="rename-link"><span translate="">Rename</span></a>
</span>
</span>
<br>
<span class="row-meta-data">
<span class="creation-date-formatted">Today at 10:30 | </span>
<span class="row-email">orgadmin@mailinator.com</span>
</span>
</div>
<div class="grid-row-buttons">
<div class="row-tools">
<div class="btn-group dropdown" uib-dropdown="">
<button type="button" class="btn btn-default uib-dropdown-toggle clear-button dropdown-toggle" uib-dropdown-toggle="" aria-haspopup="true" aria-expanded="false">
<span class="icon-wd-material-menu"></span>
</button>
<ul uib-dropdown-menu="" class="dropdown-menu-highZ contextual-menu dropdown-menu" role="menu">
</ul>
</div>
</div>
</div>
The element with class="grid-row-element-name" contains the name of the folder that was created (each folder has its own)...
I have no idea how to continue with the testing because I am not able to click on the folder...
Thank you.
Try the following: since you said "grid-row-element-name" holds the folder name, try using that class name in a cssSelector.
List<WebElement> elements = driver.findElements(By.cssSelector("a.grid-row-element-name"));
String folderName = "Eclipse111"; // name of the folder which you want to click
for (WebElement ele : elements) { // iterate over the matched links
if (ele.getText().equalsIgnoreCase(folderName)) {
ele.click(); // once the folder you want is found, click it
break; // stop looping; the page may change after the click
}
}
//OR
//To click on the last element try this
elements.get(elements.size()-1).click();
Since you are looking for an element with specific text that you have created (the folder name), I would approach this by looking for an A tag that contains the folder name.
BTW, you didn't tag your question with a language so the below is Java. You should be able to convert it easily to another language by just reusing the XPath, if needed.
String folderName = "Eclipse111";
// create folder
WebElement folder = driver.findElement(By.xpath("//a[text()='" + folderName + "']"));
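The locator string itself can be sanity-checked without a browser. The sketch below is a rough illustration using the JDK's XPath classes, not Selenium. The FolderLink class, its method names, and the second folder name "Other222" are hypothetical additions to a trimmed copy of the markup from the question.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

class FolderLink {
    // Trimmed stand-in for the folder list; "Other222" is a made-up second folder.
    static final String SAMPLE =
        "<div>"
        + "<span class='row-item-link'><a class='grid-row-element-name'>Eclipse111</a></span>"
        + "<span class='row-item-link'><a class='grid-row-element-name'>Other222</a></span>"
        + "<span class='row-item-actions'><a>Share</a></span>"
        + "</div>";

    // Builds the same XPath as the answer; assumes the name contains no single quote.
    static String xpathFor(String folderName) {
        return "//a[text()='" + folderName + "']";
    }

    // Counts how many links in SAMPLE match the XPath for the given folder name.
    static int matches(String folderName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(SAMPLE)));
            NodeList hits = (NodeList) XPathFactory.newInstance().newXPath()
                    .evaluate(xpathFor(folderName), doc, XPathConstants.NODESET);
            return hits.getLength();
        } catch (Exception e) {
            throw new IllegalStateException("could not evaluate XPath", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(xpathFor("Eclipse111") + " -> " + matches("Eclipse111"));
    }
}
```

Note that text()='...' is an exact match on the link text, so a folder named the same as another link on the page (for example "Share") would also match. Adding the class to the expression, as in //a[@class='grid-row-element-name'][text()='...'], narrows it to folder links.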