Right syntax in HTML scraping - vba

I have HTML that changes dynamically:
<tbody>
' ------------------- Block 1 ----------------------
<tr class="table-row">
<td class="cell">
<div>18/4/2018</div>
</td>
<td class="cell">
<div>
<form id="idc" method="post" action=""> ' the id is dynamic, so I can't use it
<div style=""><input type="hidden" name="idc_hf_0" id="idc_hf_0" /></div> ' the id and name are dynamic, so I can't use them
Download all invoice documents as ZIP-file
<span>
<a class="icon zipdownload" title="Download all invoice documents as ZIP-file" href=""></a>
</span>
<span class="has-explanation">
<a class="helper" href="javascript:;" title="The zip-file contains only PDF files of Tax/Fee statements and the Fleet Invoice with all annexes if available.">
<span class="icon question" id="table-header-explanation"></span>
</a>
</span>
</form>
</div>
</td>
<td class="cell">
<div>
<a class="" title="View >>" href="">View >></a>
</div>
</td>
</tr>
' ################### Block1 END #######################
' ------------------- Block 2 ----------------------
<tr class="table-row">
<td class="cell">
<div>13/4/2018</div> ' need this
</td>
<td class="cell">
<div>
<form id="idd" method="post" action="">
<div style=""><input type="hidden" name="idd_hf_0" id="idd_hf_0" /></div>
<div>
<span>Collective Payment Order</span> (<span>2018-500421707</span>)
<span>
<span class="invisible"> | </span><span>
<a class="Download" title="Download" href="">English</a>
</span>
</span>
</div>
<div>
<span>Tax/Fee CSV list</span> <span>
<a class="icon csv" title="Download" href=""></a> ' need this HREF1
</span>
</div>
<div>
<span>Detailed Trip CSV list</span> <span>
<a class="icon csv" title="Download" href=""></a> ' need this HREF2
</span>
</div>
Download all invoice documents as ZIP-file
<span>
<a class="icon zipdownload" title="Download all invoice documents as ZIP-file" href=""></a>
</span>
<span class="has-explanation">
<a class="helper" href="javascript:;" title="The zip-file contains only PDF files of Tax/Fee statements and the Fleet Invoice with all annexes if available.">
<span class="icon question" id="table-header-explanation"></span>
</a>
</span>
</form>
</div>
</td>
<td class="cell">
<div>
<a class="" title="View >>" href="">View >></a>
</div>
</td>
</tr>
' ################### Block2 END #######################
</tbody>
There are two kinds of block, and they can appear in any order, for example:
Block1
Block1
Block2
Block1
Block2
Block2
Block2
Block1
From these blocks I need:
the count of Block 2 rows
the date of each Block 2
HREF1 from the first class="icon csv" link
HREF2 from the second class="icon csv" link
Block 1 can be told apart from Block 2 because it has no class="icon csv" links (and no <span>Tax/Fee CSV list</span>).
I'm confused about how to use the getElement* methods. I tried:
Set IeDoc = IeApp.Document
With IeDoc
Set IeTbody = .getElementsByTagName("tbody").getElementsByClassName("table-row")
d = IeTbody.length
For Each stEl In IeTbody
Next stEl
End With
But I get the error "Object does not support this property or method". Would querySelector be better?
How do I get the links?
Logically it should be something like:
Set IeDoc = IeApp.Document
With IeDoc
Set Blocks = .getElementsByTagName("tbody")
For Each block In Blocks
Set hasClass = .getElementsByClassName("table-row").getElementsByClassName("cell")(1).getElementsByClassName("icon csv")
if not hasClass is nothing then
b.Date = Blocks(block).getElementsByClassName("table-row").getElementsByClassName("cell")(0).getElementsByTagName("div")(0).innerText()
b.Href1 = Blocks(block).getElementsByClassName("table-row").getElementsByClassName("cell")(1).getElementsByClassName("icon csv")(0)
b.Href2 = Blocks(block).getElementsByClassName("table-row").getElementsByClassName("cell")(1).getElementsByClassName("icon csv")(1)
end if
Next block
End With

This isn't very robust; it was a play-around with regex parsing of the HTML you gave. A lookbehind would help to pull the date in with the regex split, but I couldn't work that out at present. I have adapted a regex function by @FlorentB:
Public Matches As Object ' dates matched by the last SplitRe call
' Late binding below; or add Tools > References > Microsoft VBScript Regular Expressions 5.5 for early binding
Public Sub testing()
    Dim str As String, countOfBlock2 As Long, arr() As String, i As Long
    str = Range("A1") ' I am reading in from the sheet, but this would be your response text
    ' Split the HTML on each date <div>; arr(i) is the text that follows Matches(i - 1)
    arr = SplitRe(str, "\<div>[\d]+[\/-][\d]+[\/-][\d]+\<\/div>") ' a lookbehind would help
    For i = LBound(arr) To UBound(arr)
        ' Only Block 2 rows contain "icon csv" links
        If InStr(1, arr(i), "class=""icon csv""") > 0 Then
            countOfBlock2 = countOfBlock2 + 1 ' "Block 2"
            Debug.Print Replace(Replace(Matches(i - 1), "<div>", ""), "</div>", "") ' dates from Block 2
            Debug.Print Split(Split(arr(i), """icon csv"" title=""Download"" href=")(1), "></a>")(0) ' HREF1
            Debug.Print Split(Split(arr(i), """icon csv"" title=""Download"" href=")(2), "></a>")(0) ' HREF2
        End If
    Next i
    Debug.Print "count of block2 = " & countOfBlock2
End Sub

' Adapted from https://stackoverflow.com/questions/28107005/splitting-string-in-vba-using-regex
Public Function SplitRe(Text As String, Pattern As String, Optional IgnoreCase As Boolean) As String()
    Static re As Object
    If re Is Nothing Then
        Set re = CreateObject("VBScript.RegExp")
        re.Global = True
        re.MultiLine = True
    End If
    re.IgnoreCase = IgnoreCase
    re.Pattern = Pattern
    ' Replace every match with a sentinel character, then split on that sentinel
    SplitRe = Strings.Split(re.Replace(Text, ChrW(-1)), ChrW(-1))
    ' Keep the matched separators (the dates) so the caller can pair them with the chunks
    Set Matches = re.Execute(Text)
End Function
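For comparison, here is the same split-on-the-date idea sketched in JavaScript, working on the raw HTML string. JavaScript regexes support zero-width lookahead. Splitting with `(?=<div>date</div>)` therefore keeps each date at the start of the chunk that follows it, which is roughly the effect the lookbehind comment was after, and no separate `Matches` collection is needed.

`extractBlock2Rows` is a hypothetical helper, not part of the answer above. It assumes the markup matches the sample in the question: double-quoted attributes, with `class` written before `href` on the CSV links. Like any regex parsing of HTML, it is fragile.

```javascript
// Hypothetical helper: assumes the table HTML is available as a string and
// uses double-quoted attributes with class before href, as in the sample.
function extractBlock2Rows(html) {
  // Zero-width lookahead: split *before* each date <div>, so every chunk
  // starts with its own date instead of losing it to the separator.
  const chunks = html.split(/(?=<div>\d+[\/-]\d+[\/-]\d+<\/div>)/);
  const rows = [];
  for (const chunk of chunks) {
    const dateMatch = chunk.match(/^<div>(\d+[\/-]\d+[\/-]\d+)<\/div>/);
    // Skip the text before the first date and any Block 1 row (no "icon csv" links).
    if (!dateMatch || !chunk.includes('class="icon csv"')) continue;
    // Collect the href of every <a> whose class is "icon csv", in document order.
    const hrefs = [];
    const linkRe = /<a\s[^>]*class="icon csv"[^>]*href="([^"]*)"/g;
    let m;
    while ((m = linkRe.exec(chunk)) !== null) hrefs.push(m[1]);
    rows.push({ date: dateMatch[1], href1: hrefs[0] ?? null, href2: hrefs[1] ?? null });
  }
  return rows;
}

// Usage (ieDocumentHtml is a placeholder for the page's HTML string):
// const block2 = extractBlock2Rows(ieDocumentHtml);
// console.log(block2.length, block2);
```

Usage: `extractBlock2Rows(html).length` is the Block 2 count. Each element carries that row's date and both CSV links.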

Related

select radio buttons using vba

Sub Mind_stone()
Dim IE As Object
Dim doa As HTMLDocument
Dim HTMLdoc As HTMLDocument
Dim button As HTMLInputButtonElement
Dim i As Integer
Set IE = CreateObject("internetexplorer.application")
IE.Visible = True
IE.navigate "https://eqvm.titusn.co.in/PavbnjsPnrtl/login.jsp"
Do While IE.busy
Application.Wait DateAdd("s", 1, Now)
Loop
IE.document.getElementById("username").Value = "efeqwf"
IE.document.getElementById("password").Value = "*****"
IE.document.getElementsByClassName("button ok")(0).Click
IE.document.querySelector("[title='PSR - Indirect Materials']").Click
'IE.document.getElementsByClassName("dijitReset dojoxMultiSelectItem")(1).Click
'IE.document.querySelector("[title='div_3_1_2_radiogroup']").Click
'itemlength = IE.document.all.Item("div_3_1_2_radiogroup").Length
'IE.document.getElementsByName("div_3_1_2_radiogroup").Item(1).Checked = True
End Sub
<div class="dijitReset dojoxMultiSelectItem BPMMultiSelectFocused dojoxCheckedMultiSelectSelectedOption" id="dojox_form__CheckedMultiSelectItem_7" widgetid="dojox_form__CheckedMultiSelectItem_7" aria-selected="true">
<div class="dijit dijitReset dijitInline dojoxMultiSelectItemBox dijitRadio dijitRadioChecked dijitChecked" role="presentation" widgetid="dijit_form_CheckBox_7">
<input
type="radio"
class="dijitReset dijitCheckBoxInput"
data-dojo-attach-point="focusNode"
data-dojo-attach-event="onclick:_onClick"
value="on"
tabindex="0"
id="dijit_form_CheckBox_7"
aria-labelledby=" dijit_form_CheckBox_7_radio_label"
name="div_3_1_2_radiogroup"
aria-checked="true"
style="user-select: none;"
/>
</div>
<div class="dijitInline dojoxMultiSelectItemLabel" data-dojo-attach-point="labelNode" data-dojo-attach-event="onclick:_onClick" id="dijit_form_CheckBox_7_radio_label">Create New / Create from existing PSR</div>
</div>
I am new to web scraping. I am now trying to click the radio button through VBA, but I can't find a solution.
The first four steps (username, password, the OK button and the title click) work, but I am stuck on the HTML above.
The second one is the one I am trying to click.

How to chain selectors in order to get element inside with webdriverio

I have a page with list of products on it.
This is what the HTML DOM looks like for one product item:
<div class="module card listing-search-card js-product-card " id="product-entry-123" data-product-id="123" data-toggle-status="open" data-out-of-stock="" data-toggle-isbundle="false" data-load-prices-async="false">
<div class="product-entry__wrapper">
<div class="card__header">
<div class="promotion">
<div class="product-entry__right promotion-card__body on-promotion--banner-offer">
</div>
<a href="/Products/p/123" tabindex="-1">
<picture>
<img class="card__image mobile-img lazyload" src="/medias/image-mobile">
<img class="card__image desktop-img lazyloaded" src="/medias/image-desktop">
</picture>
</a>
</div>
</div>
<div class="product-entry__body-actions-wrapper">
<div class="product-entry__body card__body">
<h3 class="card__title">
Schweppes
</h3>
<div class="product-entry__summary card__description-wrapper">
<div class="product-entry__summary__list">
<div class="card__detail-wrapper">
<div class="product-entry__summary__item card__description-product-detail">
33 x 24</div>
<div class="product-entry__summary__item card__description-product-code">
<span class="product-entry__code">
123</span>
</div>
</div>
<div class="container-type">
box</div>
</div>
</div>
</div>
<div class="cta-container">
<div class="card__amount-wrapper ">
<div class="card__amount">
61,83 € <span class="base-unit">HT/CHACUN</span>
<p class="sales-unit-price is-price">
<span>soit</span> 10,00 €
</p>
</div>
</div>
<div class="add-to-cart__footer add-to-cart__action">
<div class="success-overlay">Add to cart</div>
<div class="add-to-cart__action--active">
<div class="form-quantity__wrapper quantity-action quantity-action__wrapper"
data-form-quantity-id="123">
<div class="form-quantity ">
<button class="form-quantity__decrease quantity-action__decr icon-Minus disabled" type="button"
tabindex="-1" aria-label="decrement" data-form-quantity-decrement="">
</button>
<input id="product-123" class="form-quantity__input form-control quantity-action__value js-quantity-input-typing"
name="product-123" type="text" value="1" maxlength="4" data-price-single="10.00" data-price-currency="€"
data-parsley-range="[1,9999]" data-form-quantity-times="1"
data-parsley-multiplerange="1" data-parsley-type="integer" data-parsley-validation-threshold="1"
required="">
<button class="form-quantity__increase quantity-action__incr icon-Add-to-list" type="button"
tabindex="-1" aria-label="increment" data-form-quantity-increment="">
</button>
</div>
<span class="form-quantity__update" data-form-quantity-success=""></span>
</div>
<div class="add-to-cart__total">
<button class="button button--primary js-addToCart" role="button" title="Add to cart"
data-product-id-ref="123" data-modal-trigger="" data-modal-target="#add-to-cart-modal"
data-modal-before-trigger="addToCart" data-component-id="product list" tabindex="-1">
<div class="button__text">
<span class="button__text-add js-added-price">Add</span>
<span class="button__text-to-cart js-added-price">to cart</span>
</div>
<span class="button__text js-added-price mobile-only">Add</span>
</button>
</div>
</div>
</div>
<div class="add-to-template">
<button class="add-to-template--button button js-addToNewTemplate" type="button"
data-modal-trigger="" data-modal-target="#add-to-template-modal"
data-modal-before-trigger="openAddToTemplateModal" data-product-code="123">
<span>Add to list</span>
</button>
</div>
</div>
</div>
</div>
I am calling this function:
isSortedAlphabeticallyAscending($$('div.js-product-card'));
And the function implementation is:
isSortedAlphabeticallyAscending(list) {
for (let i = 0; i < (list.length - 1); i++) {
let outOfStockCurrent = list[i].getAttribute('data-out-of-stock');
let outOfStockNext = list[i + 1].getAttribute('data-out-of-stock');
let idCurrent = list[i].getAttribute('id');
let idNext = list[i + 1].getAttribute('id');
console.log("outOfStockCurrent " + outOfStockCurrent + " " + idCurrent);
console.log("outOfStockNext " + outOfStockNext + " " + idNext);
let productIdCurrent = idCurrent.split('-').pop();
let productIdNext = idNext.split('-').pop();
let currentText = list[i].$('a[href*="' + productIdCurrent + '"]').getText();
let nextText = list[i+1].$('a[href*="'+ productIdNext + '"]').getText();
console.log("currentText " + currentText);
console.log("nextText " + nextText);
if(outOfStockCurrent === "true" || outOfStockNext === "true") continue;
if (currentText > nextText) return false;
}
return true;
}
I ignore out-of-stock products since they are always at the bottom of the page.
But list[i].$('a[href*="' + productIdCurrent + '"]').getText() always returns an empty string.
I would like it to return the product name, i.e. "Schweppes".
Is there a way to chain the .$('a[href ...]') part differently so that it gets the text of the <a> tag inside each product <div>, using WebdriverIO 5?
Thanks!
The selector above, list[i].$('a[href*="' + productIdCurrent + '"]'), matched two elements, and $ returns only the first of them.
What I needed was to go one div deeper and search there:
list[i].$('div.product-entry__body-actions-wrapper').$('a[href*="' + productIdCurrent + '"]').getText()
And voilà, the text appeared :)
Hope it helps someone with a similar issue :D
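The ordering check itself can be tested without a browser if it is separated from the WebdriverIO calls. Below is a minimal sketch under two assumptions. First, the names and stock flags have already been read into plain objects; the `{ name, outOfStock }` shape and the name `areNamesSortedAscending` are hypothetical, with `outOfStock` standing for `getAttribute('data-out-of-stock') === 'true'`. Second, it swaps the original `>` operator, which compares raw UTF-16 code units, for `localeCompare`. Switch back to `>` if you need exactly the original behaviour.

```javascript
// Hypothetical browser-free version of the ordering check. Each product is
// assumed to be a plain object { name, outOfStock }, built beforehand from
// getText() and getAttribute('data-out-of-stock') === 'true'.
function areNamesSortedAscending(products) {
  for (let i = 0; i < products.length - 1; i++) {
    const current = products[i];
    const next = products[i + 1];
    // Out-of-stock products sit at the bottom of the page, so any pair
    // involving one is skipped, as in the original function.
    if (current.outOfStock || next.outOfStock) continue;
    // localeCompare returns a positive number when current sorts after next.
    if (current.name.localeCompare(next.name) > 0) return false;
  }
  return true;
}
```

Keeping the comparison in a pure function means the WebdriverIO part only has to collect names and flags, and the sorting rule can be checked with ordinary unit tests.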

How to access nested input attribute using Jsoup

Could you please help me select the int value from the input's "value" attribute? In this case, "12".
<td class="fit-content">
<div class="add-cart" data-original-title="Ilość w op. zbiorczym: 12 szt.">
<div class="input-group">
<input class="form-control form-control-sm" type="text" value="12">
</div>
</div>
Try
String value = doc.selectFirst("div.add-cart div.input-group input").attr("value");

Selenium automation script for <input> tag working as select tag (drop-down) with checkbox in it for multiple selection?

Can anybody help me find a solution here?
I need a Selenium automation script for an <input> tag that works like a select tag (drop-down) with checkboxes in it for multiple selection.
The values to select are read from an Excel file. Here is the code I tried:
List<WebElement> elements = (List<WebElement>) driver
.findElements(By.id("ctl00_ContentPlaceHolder1_rcbClinicians_Input"));
int numberOfElements = elements.size();
System.out.println("----size----- " + numberOfElements);
for (int i = 0; i < numberOfElements; i++) {
System.out.println(i);
elements = driver.findElements(By.id("ctl00_ContentPlaceHolder1_rcbClinicians_Input"));
elements.get(i).click();
}
Please help me find a solution
<li>Choose Clincians* </li>
<li>
<div id="ctl00_ContentPlaceHolder1_rcbClinicians" class="RadComboBox RadComboBox_Default" style="width:500px;white-space:normal;">
<table summary="combobox" style="border-width:0;border-collapse:collapse;width:100%" class="rcbFocused">
<tbody><tr>
<td style="width:100%;" class="rcbInputCell rcbInputCellLeft"><input name="ctl00$ContentPlaceHolder1$rcbClinicians" type="text" class="rcbInput" id="ctl00_ContentPlaceHolder1_rcbClinicians_Input" value="" autocomplete="off"></td>
<td class="rcbArrowCell rcbArrowCellRight"><a id="ctl00_ContentPlaceHolder1_rcbClinicians_Arrow" style="overflow: hidden;display: block;position: relative;outline: none;">select</a></td>
</tr>
</tbody></table>
<input id="ctl00_ContentPlaceHolder1_rcbClinicians_ClientState" name="ctl00_ContentPlaceHolder1_rcbClinicians_ClientState" type="hidden" autocomplete="off" value="{&quot;logEntries&quot;:[],&quot;value&quot;:&quot;&quot;,&quot;text&quot;:&quot;&quot;,&quot;enabled&quot;:true,&quot;checkedIndices&quot;:[],&quot;checkedItemsTextOverflows&quot;:false}">
</div>
</li>
</ul>
<ul>
<li>Choose CaseManagers* </li>
<li>
<div id="ctl00_ContentPlaceHolder1_rcbCaseManagers" class="RadComboBox RadComboBox_Default" style="width:500px;white-space:normal;">
<table summary="combobox" style="border-width:0;border-collapse:collapse;width:100%" class="">
<tbody><tr>
<td style="width:100%;" class="rcbInputCell rcbInputCellLeft"><input name="ctl00$ContentPlaceHolder1$rcbCaseManagers" type="text" class="rcbInput" id="ctl00_ContentPlaceHolder1_rcbCaseManagers_Input" value="" autocomplete="off"></td>
<td class="rcbArrowCell rcbArrowCellRight"><a id="ctl00_ContentPlaceHolder1_rcbCaseManagers_Arrow" style="overflow: hidden;display: block;position: relative;outline: none;">select</a></td>
</tr>
</tbody></table>
<input id="ctl00_ContentPlaceHolder1_rcbCaseManagers_ClientState" name="ctl00_ContentPlaceHolder1_rcbCaseManagers_ClientState" type="hidden" autocomplete="off" value="{&quot;logEntries&quot;:[],&quot;value&quot;:&quot;&quot;,&quot;text&quot;:&quot;&quot;,&quot;enabled&quot;:true,&quot;checkedIndices&quot;:[],&quot;checkedItemsTextOverflows&quot;:false}">
</div>
</li>
</ul>
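I don't see anything in your code that reads values from Excel. If you want to select values one by one, you can call the method below n times, each time with the value you want to select. If you want multiple selection, wrap the call in a for loop. Note that Selenium's Select class only works on <select> elements (its constructor throws an UnexpectedTagNameException for any other tag), so this method applies only when the list is a real <select>.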
import org.openqa.selenium.WebElement;
import org.openqa.selenium.support.ui.Select;

/**
 * @author mbn217
 * @Date ------
 * @Purpose This method will select an element from a list of
 *          values by the name passed in the parameter
 * @param element --> element of the webpage (the weblist element)
 * @param Name --> The value we want to select
 * @return N/A
 */
public static void selectElementByName(WebElement element, String Name) {
    Select selectitem = new Select(element); // only valid for <select> elements
    selectitem.selectByVisibleText(Name);
}

VBA to click on a button in IE with no ID, Name, ClassName

I need to click a button in IE. I tried a few options without success. Everything I tried relies on TagName/ClassName/Name/ID, but I can't find any of these for the button below.
<FORM method=post name=frmSearch action=searchResultsSummary>
<INPUT type=hidden value=GMT+01:00 name=clitz>
<DIV style="MARGIN-LEFT: 30px">
<DIV style="MARGIN-BOTTOM: 10px; FONT-WEIGHT: bold">Common Criteria</DIV>
<TABLE style="MARGIN-LEFT: 20px" cellSpacing=5 cellPadding=0 border=0>
<TBODY>
<TR>
<TD id=oDocumentCategoryCell style="HEIGHT: 22px" colSpan=3 noWrap>Document category:</TD>
<TD colSpan=3 align=left>
<SELECT style="DISPLAY: block" name=docCategory old_value="2">
<OPTION value=%>Any Category</OPTION>
<OPTION value=FINANCE>Finance Document</OPTION>
<OPTION selected value=INBOUND>Inbound Document</OPTION>
<OPTION value=PROCESS>Internal Process</OPTION>
<OPTION value=OUTBOUND>Outbound Document</OPTION>
</SELECT>
</TD>
<TD style="WIDTH: 15px"></TD>
<TD>
<INPUT onclick="javascript:if (!validateSearchForm()) return;if (!checkCountOfDays()) return; addRefKeysToForm(frmSearch); frmSearch.submit();" style="WIDTH: 100px" type=button value=Search>
</TD></TR>
<TR>
<TD style="HEIGHT: 22px" colSpan=3 noWrap>Document type:</TD>
<TD colSpan=3 align=left>
<SELECT style="DISPLAY: block" name=docType old_value="0">
<OPTION selected value=%>Any Type</OPTION>
<OPTION value=CSN_ORD_VALID>CSN Order Validation</OPTION>
<OPTION value=CREDIT_ORD>Credit Order</OPTION>
<OPTION value=EOI_ORDCPY>EOI Order Copy</OPTION>
<OPTION value=LEASE_ORD>Lease Order</OPTION>
<OPTION value=ORDER_REPORT>Order Load Report</OPTION>
<OPTION value=TRADE_ORD>Trade Order</OPTION>
<OPTION value=WATSON>WATSON Quote</OPTION>
<OPTION value=WATSON_UPDATE>WATSON Update Quote</OPTION>
<OPTION value=WNGQ>WNGQ Request</OPTION></SELECT> </TD></TR>
<TR>
I tried going through the input tags, and it isn't working.
I used the code below. It clicks the button I want, but the values I entered before clicking the submit button are not taken into account.
Dim btnInput As MSHTML.HTMLInputElement
Dim frm As MSHTML.IHTMLElementCollection
Application.ScreenUpdating = True
Set frm = ie.Document.getElementsByName("frmSearch")
For Each btnInput In frm
If btnInput.Value = "frmSearch.submit()" Then
btnInput.submit
Exit For
End If
Next btnInput
Can someone help me click the button?
As @Tim Williams suggested, this should work: document.querySelector("input[value=Search]").
If this still doesn't return the correct input element, the selector can be made more specific by following the DOM tree down to the input you are looking for, e.g. like this:
' Add reference to Microsoft Internet Controls (SHDocVw)
' Add reference to Microsoft HTML Object Library
Dim inputElement As HTMLInputElement
Set inputElement = doc.querySelector( _
"div[class=main] div[id=inner] table input[type=button][value=Search]")
If Not inputElement Is Nothing Then
inputElement.Click
End If
HTML
<div class="main">
<div id="inner">
<table>
<tbody>
<tr>
<td>
<INPUT onclick="alert('This is the correct one');"
style="WIDTH: 100px"
type=button
value=Search>
</td>
</tr>
</tbody>
</table>
</div>
</div>
<INPUT onclick="alert('No this is not the searched one');"
style="WIDTH: 100px"
type=button
value=Search>
EDIT:
I'm not sure I understand your problem correctly.
To select the INPUT element this selector can be used:
Set inputElement = doc.querySelector( _
"form[name=frmSearch] table tbody tr td input[type=button][value=Search]")
To set the value attribute, use the setAttribute() function; to check it, use getAttribute().
If Not inputElement Is Nothing Then
inputElement.setAttribute "value", "some-new-value"
If inputElement.getAttribute("value") = "some-new-value" Then
inputElement.Click
End If
End If