VBA code to scrape the price of particular project from URL [duplicate] - vba

I am trying to get the price of a particular project from a URL, but I don't know how to locate it.
Example: from the URL https://www.99acres.com/ppc-2515-residential-apartment-mailer, for the project Eden Richmond Enclave, I want the price "14.22 to 31.15 Lac" written to range A1.
Below is the code I tried:
Sub test()
    Set driver = CreateObject("Selenium.FirefoxDriver")
    driver.Get "https://www.99acres.com/ppc-2515-residential-apartment-mailer"
    Range("A1") = driver.FindElementByXPath("//h1[contains(@class,'font-size15')][contains(text(),'Eden Richmond Enclave')][contains(@class,'product-lrg-box')]").Text
End Sub
Below is the HTML:
<div class="pro-text">
<div class="product-text-box">
<div class="product-heading"><span><img src="https://newprojects.99acres.com/projects/eden_group/eden_richmond_enclave/bb7ttfq9.gif">
<h1 class="font-size15">Eden Richmond Enclave <p>Narendrapur</p>
</h1>
</span> </div>
</div>
<div class="product-text-box">
<ul class="product-lrg-box">
<li> <span><strong><span class="rupee-font">₹ </span>14.22 to 31.15 Lac</strong></span></li>
<li><strong>499-1093 SQFT</strong></li>
<li><strong>1-3 BHK</strong></li>
<li style="width:20% !important;"><strong>December 2020</strong></li>
</ul>
<div id="tabs" class="tab-link tabs-menu tabs-menu-new">
<ul>
<li>e-Brochure</li>
<li>Amenities</li>
<!-- <li style="width:20% !important;">Floor Plan</li>-->
<li style="width:20% !important;">Directions</li>
</ul>
</div>
<span class="enquire-new-bt" id="294015-469203,151100-enquire-new-bt" data-val="2"> I am Interested </span> </div>
</div>

Perhaps try
.FindElementByCSS("ul.product-lrg-box span")
Which selector is best depends on whether you intend to repeat this process for other elements.
For the above, with the HTML you supplied, this returns the text of the first matching span, i.e. the price.
Otherwise,
.FindElementByCSS("ul.product-lrg-box")
retrieves the entire string.
Without being able to view the page, and not knowing whether you want to retrieve more elements, you might consider
.FindElementsByCss("ul.product-lrg-box span")
and looping over the collection returned (if valid); or
try scraping with IE and using something like:
IE.document.querySelectorAll("ul.product-lrg-box span")
and then looping over the NodeList returned.

To extract the text 14.22 to 31.15 Lac you can use the following locator strategy:
XPath:
//h1[contains(.,'Eden Richmond Enclave')]//following::div[1]/ul[@class='product-lrg-box']/li//span/strong[not(@class='rupee-font')]
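
To sanity-check a locator against the HTML from the question without launching a browser, something like the following Python sketch may help. It is not the Selenium call itself: it uses the standard library's xml.etree.ElementTree, which only supports a small XPath subset, and it assumes a trimmed, well-formed copy of the fragment (the unclosed <img> tag is dropped). The names FRAGMENT and price_for are illustrative only.

```python
import xml.etree.ElementTree as ET

# Trimmed, well-formed copy of the question's HTML (assumption: <img> removed).
FRAGMENT = """
<div class="pro-text">
  <div class="product-text-box">
    <div class="product-heading"><span>
      <h1 class="font-size15">Eden Richmond Enclave <p>Narendrapur</p></h1>
    </span></div>
  </div>
  <div class="product-text-box">
    <ul class="product-lrg-box">
      <li><span><strong><span class="rupee-font">₹ </span>14.22 to 31.15 Lac</strong></span></li>
      <li><strong>499-1093 SQFT</strong></li>
    </ul>
  </div>
</div>
"""

def price_for(root, project):
    """Return the price text for the named project, or None if it is absent."""
    h1 = root.find(".//h1[@class='font-size15']")
    if h1 is None or project not in (h1.text or ""):
        return None
    strong = root.find(".//ul[@class='product-lrg-box']/li//strong")
    if strong is None:
        return None
    rupee = strong.find("span[@class='rupee-font']")
    # The price follows the rupee <span>, so it is stored as that span's tail text.
    text = rupee.tail if rupee is not None else strong.text
    return (text or "").strip()

root = ET.fromstring(FRAGMENT)
print(price_for(root, "Eden Richmond Enclave"))
```

This handles a single project block; a page listing several projects would need the heading and the price list matched per block.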


How to create an Xpath in a tricky section of document (for me) for the purpose of using with Selenium Basic in VBA

I mention Selenium Basic because that is where the XPath will be used. I believe Selenium Basic is built on Selenium version 2, so it may not understand answers that require the latest Selenium; please take that into account if necessary.
There are dynamic classes at play here.
Criteria for selection.
1. Class starting with 'NextToJump__eventWrapper' (the outer one) must be used.
2. Class starting with 'NextToJump__venue' must contain text = 'Ballarat'
3. Class starting with 'NextToJump__race' (and/or span) must contain text = 'Race 2'
I need to be able to click on the <a> tag that contains Points 2 and 3.
The best that I've been able to do (and checked) using ChroPath in Chrome Devtools is...
//div[starts-with(@class,'NextToJump__eventWrapper')]//descendant::*[contains(text(),'Ballarat')]
But note that there are 2 cases of Point 2 in the HTML but only 1 case that satisfies Points 2 and 3.
Thanks
<div class="NextToJump__eventWrapper--13zZJ">
<div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/crayford-am/20200708/race-1-1801951-58544404">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Ballarat</div>
<div class="NextToJump__race--3JydR"><span>Race 1</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">52s</span></div>
</a>
</div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY active" href="/racing-betting/greyhound-racing/rockhampton/20200708/race-4-1799474-58466521" aria-current="page">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Rockhampton</div>
<div class="NextToJump__race--3JydR"><span>Race 4</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">2m 52s</span></div>
</a>
</div>
<div class="NextToJump__raceEvent--bfMON" data-testid="next-to-jump-item">
<a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/ballarat/20200708/race-4-1799454-58465201">
<div class="NextToJump__iconWrapper--1yG60"></div>
<div class="NextToJump__eventDetail--CUzdX">
<div class="NextToJump__venue--1jwWA">Ballarat</div>
<div class="NextToJump__race--3JydR"><span>Race 2</span></div>
</div>
<div class="NextToJump__countdown--EG8mR"><span class="Countdown__countdown--4vRpD Countdown__imminent--2yc2K">5m 52s</span></div>
</a>
</div>
</div>
</div>
The xpath expression you need to use to select your target <a> tag is long and convoluted, but that's life....
[formatted for ease of reading, but you can use that in one line]
//a
  [ancestor::div[starts-with(@class,'NextToJump__eventWrapper')]]
  [.//div[.="Ballarat"]
    [starts-with(@class,'NextToJump__venue-')]
    [./following-sibling::div[.="Race 2"]
      [starts-with(@class,'NextToJump__race-')]
    ]
  ]
Edit:
In "plain English":
Find an <a> node which meets ALL these conditions: (i) it has an ancestor (not just a parent) node which is a <div> whose class attribute value starts with NextToJump__eventWrapper; and (ii) it has a <div> descendant (not just a child) node which has Ballarat as its text AND whose class attribute value starts with NextToJump__venue-, where that <div> descendant itself has a following sibling <div> which has Race 2 as its text AND whose class attribute value starts with NextToJump__race-.
Yes, the word "plain" doesn't really fit here, but that's the closest I could get. I like xpath, and it's very powerful, but sometimes it's very hard to follow... As an aside, it would have been somewhat less cryptic if xquery was used instead of straight xpath.
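
The same conditions can be spelled out step by step in ordinary code, which may make the predicate easier to follow. The Python sketch below is only an illustration: it runs against a trimmed, well-formed copy of the question's HTML (icon and countdown divs omitted) with the standard library's xml.etree.ElementTree, which has no starts-with() function, so the prefix tests are done in Python instead. The helper names are made up for this example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the question's HTML: two "Ballarat" entries, only one is "Race 2".
FRAGMENT = """
<div class="NextToJump__eventWrapper--13zZJ"><div>
  <div class="NextToJump__raceEvent--bfMON">
    <a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/crayford-am/20200708/race-1-1801951-58544404">
      <div class="NextToJump__eventDetail--CUzdX">
        <div class="NextToJump__venue--1jwWA">Ballarat</div>
        <div class="NextToJump__race--3JydR"><span>Race 1</span></div>
      </div>
    </a>
  </div>
  <div class="NextToJump__raceEvent--bfMON">
    <a class="Link__link--9x4YY" href="/racing-betting/greyhound-racing/ballarat/20200708/race-4-1799454-58465201">
      <div class="NextToJump__eventDetail--CUzdX">
        <div class="NextToJump__venue--1jwWA">Ballarat</div>
        <div class="NextToJump__race--3JydR"><span>Race 2</span></div>
      </div>
    </a>
  </div>
</div></div>
"""

def class_starts(el, prefix):
    """True if the element's class attribute value starts with prefix."""
    return el.get("class", "").startswith(prefix)

def text_of(el):
    """All text inside the element, like the XPath string value '.'."""
    return "".join(el.itertext()).strip()

def link_matches(a, venue, race):
    # Condition (ii): a venue <div> that has a following-sibling race <div>.
    for parent in a.iter():  # the <a> itself and every descendant
        siblings = list(parent)
        for i, el in enumerate(siblings):
            if el.tag == "div" and class_starts(el, "NextToJump__venue-") and text_of(el) == venue:
                if any(s.tag == "div" and class_starts(s, "NextToJump__race-")
                       and text_of(s) == race for s in siblings[i + 1:]):
                    return True
    return False

def find_links(root, venue, race):
    hits = []
    # Condition (i): only consider <a> nodes inside an eventWrapper <div>.
    for wrapper in root.iter("div"):
        if class_starts(wrapper, "NextToJump__eventWrapper"):
            hits.extend(a for a in wrapper.iter("a") if link_matches(a, venue, race))
    return hits

root = ET.fromstring(FRAGMENT)
for link in find_links(root, "Ballarat", "Race 2"):
    print(link.get("href"))
```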

Word VBA web scraping-how can I scrape certain classes & skip others

I'm trying to sequentially scrape certain classes from a web page, but not others. However, I cannot work out how to selectively choose the "sub" classes I'm interested in - transcript-question and transcript-answer, but not timestamp, which all seem to be inside transcription-item-wrapper.
Is there an elegant way to do this, or do I need to work with the extracted string and remove unwanted HTML code?
Current code:
Sub ScrapeToWord()
    Const URL = "http://......."
    Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
    Dim topics As Object, posts As Object, topic As Object
    http.Open "GET", URL, False
    http.send
    html.body.innerHTML = http.responseText
    Set topics = html.getElementsByClassName("transcription-item-wrapper")
    For Each posts In topics
        For Each topic In posts.getElementsByClassName("transcript-question")
            ActiveDocument.Tables(1).Cell(1, 1).Range.Text = topic.innerText
        Next topic
    Next posts
End Sub
A snippet of the HTML code:
<div class="transcription-section">
<div class="transcription-section-wrapper">
<div class="transcription-item-wrapper"><div class="transcript-qa"><div class="timestamp"></div><div class="transcript-question">Tape 01</div></div></div><div class="transcription-item-wrapper"><div class="transcript-qa"><div class="timestamp">
<p class="34" id="01003400">01:00:34:00
</p>
<span class="listen"></span>
<span class="watch"></span>
</div><div class="transcript-question">Could begin with a brief overview of your life.</div></div><div class="transcript-qa"><div class="timestamp"></div><div class="transcript-answer">I was born in 1942. I was born on a farm and started school when I was 4 years old.</div></div></div><div class="transcription-item-wrapper"><div class="transcript-qa"><div class="timestamp">
<p class="60" id="01010000">01:01:00:00
</p>
<span class="listen"></span>
<span class="watch"></span>
</div><div class="transcript-question">And then?</div></div><div class="transcript-qa"><div class="timestamp"></div><div class="transcript-answer">During the Depression my father lost the farm then we moved to Sandridge and I went to school there until I was about 8. We then went dairy farming there and</div></div></div><div class="transcription-item-wrapper"><div class="transcript-qa"><div class="timestamp">
<p class="90" id="01013000">01:01:30:00
</p>
<span class="listen"></span>
<span class="watch"></span>
</div><div class="transcript-answer">no machine milking in those days, it was all hand milking. </div></div><div class="transcript-qa"><div class="timestamp">
You can use querySelectorAll to gather a nodeList of items and use the two classes of interest with CSS OR syntax to match on either. Then loop the list to do whatever you want with the matched items.
Dim i As Long, nodeList As Object
Set nodeList = html.querySelectorAll(".transcript-question, .transcript-answer")
For i = 0 To nodeList.Length - 1
    Debug.Print nodeList.item(i).innerText 'do something with each return value e.g. put in a table or print out
Next
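
The "match either class, skip the rest" idea can also be tried offline. The following Python sketch is a rough equivalent, not the VBA above: it uses the standard library's xml.etree.ElementTree on a shortened, well-formed copy of the question's HTML, and the names WANTED and wanted_texts are invented for the example.

```python
import xml.etree.ElementTree as ET

# Shortened, well-formed copy of the question's HTML (answer text truncated).
FRAGMENT = """
<div class="transcription-section-wrapper">
  <div class="transcription-item-wrapper">
    <div class="transcript-qa">
      <div class="timestamp"></div>
      <div class="transcript-question">Tape 01</div>
    </div>
  </div>
  <div class="transcription-item-wrapper">
    <div class="transcript-qa">
      <div class="timestamp"><p class="34" id="01003400">01:00:34:00</p></div>
      <div class="transcript-question">Could begin with a brief overview of your life.</div>
    </div>
    <div class="transcript-qa">
      <div class="timestamp"></div>
      <div class="transcript-answer">I was born in 1942.</div>
    </div>
  </div>
</div>
"""

# The two classes of interest; anything else (e.g. timestamp) is skipped.
WANTED = {"transcript-question", "transcript-answer"}

def wanted_texts(root):
    """Collect the text of every <div> carrying one of the wanted classes, in document order."""
    texts = []
    for div in root.iter("div"):
        classes = set(div.get("class", "").split())
        if classes & WANTED:
            texts.append("".join(div.itertext()).strip())
    return texts

root = ET.fromstring(FRAGMENT)
for text in wanted_texts(root):
    print(text)
```

Because the elements are visited in document order, questions and answers come out interleaved as they appear on the page, and no string clean-up of timestamps is needed.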

ngSwitchWhen doesn't work when duplicate whens are written

I am learning Angular 2 from the ng-book2 book and was playing around with the built-in directives.
While reading about ngSwitch I came across this feature: you can write multiple ngSwitchWhen entries with the same condition, as in the following code:
<ul [ngSwitch]="choice">
<li *ngSwitchWhen="1">First choice</li>
<li *ngSwitchWhen="2">Second choice</li>
<li *ngSwitchWhen="3">Third choice</li>
<li *ngSwitchWhen="4">Fourth choice</li>
<li *ngSwitchWhen="2">Second choice, again</li>
<li *ngSwitchDefault>Default choice</li>
</ul>
which outputs the following:
Second choice
Second choice, again
I wrote code as below:
<div [ngSwitch]="myVar">
<div *ngSwitchWhen="myVar==1">My Var is 1</div>
<div *ngSwitchWhen="myVar==2">My Var is 2</div>
<div *ngSwitchWhen="myVar==3">My Var is 3</div>
<div *ngSwitchWhen="myVar==3">Special feature of ng Swtich</div>
<div *ngSwitchDefault>My Var is {{myVar}}</div>
</div>
which does not print both lines for the repeated condition.
I thought my code was correct, but when I looked at *ngSwitchWhen="myVar==3" I spotted my mistake.
Strangely, it still works properly except for the repeated condition.
Is there any difference between these two conditions?
*ngSwitchWhen="2"
*ngSwitchWhen="myVar==3"
Which one to use?
ngSwitchWhen="2"
This expression compares the switch value with 2, i.e. it checks whether myVar equals 2.
ngSwitchWhen="myVar==3"
Whereas this expression evaluates to myVar==(myVar==3): the part inside the parentheses is evaluated first and yields true if myVar is 3 and false if not, and that boolean is then compared with myVar.

Not finding the Correct xpath

I'm trying to write a Python script to get some information from the Google products listed at the top right of the screen (the usual 6 pictures with price and seller).
I am using Python, PhantomJS and Selenium
Doing a google search for "red shoe" I want my script to return the prices. I get stuck in the step where I try to even find the element containing the products. Am I missing something with my xpath?
import time
from selenium import webdriver

def getTopSongs(object):
    print("Working YETI")
    browser = webdriver.PhantomJS('c:/projects/phantomjs/phantomjs.exe')
    browser.get('http://google.com/search?q=red+shoe')
    time.sleep(5)
    title = browser.find_element_by_xpath('//div[contains(@class, "pla-unit")]/text()[contains(., "red")]/following::b').text
On Google's results page, the element I want sits under a few nested divs:
<div id="rhs">
...
<div class="_Pwb">
<div class="_Ohb">
<div style="width:109px" class="pla-unit">
<div class="_PD">
<div class="pla-unit-img-container">
<div class="_Z5">
<div class="_vT"><a href="http://www.somewebsite.com">
<span class="rhsl4">Nina 'Forbes' Peep Toe Pump <b>Red</b> R...</span>
<span class="rhsg3 rhsl5">Nina 'Forbes' Peep Toe Pum...</span>
<span class="rhsg4">Nina 'Forbes' Peep Toe Pu...</span></a>
</div>
<div class="_QD"><b>$78.95</b></div>
<div class="_mC">
<span class="rhsl4 a">Nordstrom</span>
<span class="rhsg3 rhsl5 a">Nordstrom</span>
<span class="rhsg4 a">Nordstrom</span>
</div>
</div>
*Update:
I added more HTML. In this example I am looking to get the text ($78.95) and (Nordstrom).
*Update
To clarify,
<div id="rhs">
is a unique element.
There are however multiple (6) elements of:
<div style="width:109px" class="pla-unit">
The elements under each category have the same name and follow the same structure and substructures
ie, there are 6
<div class="_PD">
<div class="pla-unit-img-container">
<div class="_Z5">
<div class="_vD">
<div class="_QD">
<div class="_mC">
and so on.
The main objective is to get all of the elements but for purposes of debugging I was asking help to get the first one.
The xpath for a price unit using XPathChecker on Firefox is:
id('rhs_block')/x:div[1]/x:div/x:div/x:div/x:div[1]/x:div[1]/x:div[2]/x:div[2]/x:b
You can use ancestor:: to go back up then following-sibling:: to get elements at the same level that follow it.
I haven't tried this but give it a shot:
title = browser.find_element_by_xpath('//div[contains(@class, "pla-unit")]/text()[contains(., "red")]/ancestor::div/following-sibling::div[1]').text
Then to get to your div with class '_mC' you just change:
following-sibling::div[1]
to
following-sibling::div[2]
and get the text from the spans under that.
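
Since the stated goal is to collect every product, a different approach from the axis-based XPath above is to loop over each pla-unit block and look up the price and seller inside it by class. The Python sketch below illustrates that idea offline with the standard library's xml.etree.ElementTree; it is not the Selenium code. The fragment is a simplified, well-formed copy of the question's HTML with the intermediate wrapper divs removed, and read_units is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed copy of one product block from the question.
FRAGMENT = """
<div id="rhs">
  <div style="width:109px" class="pla-unit">
    <div class="_vT"><a href="http://www.somewebsite.com">
      <span class="rhsl4">Nina 'Forbes' Peep Toe Pump <b>Red</b> R...</span></a>
    </div>
    <div class="_QD"><b>$78.95</b></div>
    <div class="_mC">
      <span class="rhsl4 a">Nordstrom</span>
      <span class="rhsg3 rhsl5 a">Nordstrom</span>
    </div>
  </div>
</div>
"""

def read_units(root):
    """Return a (price, seller) tuple for every pla-unit block."""
    results = []
    for unit in root.iter("div"):
        if "pla-unit" not in unit.get("class", "").split():
            continue
        price = unit.find(".//div[@class='_QD']/b")
        seller = unit.find(".//div[@class='_mC']/span")  # first seller span only
        results.append((
            price.text if price is not None else None,
            seller.text if seller is not None else None,
        ))
    return results

root = ET.fromstring(FRAGMENT)
for price, seller in read_units(root):
    print(price, seller)
```

Scoping each lookup to its own unit keeps a price paired with the right seller even when six units share the same class names.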

accessing list items by selenium web driver and storing them to list

I want to store Element ONE, Element Two, and so on (shown below) in a list using WebDriver. The page source is:
<div class="widget">
<div class="widget">
<h2>Programs</h2>
<div class="w">
<ul class="xoxo blogroll">
<li>
<li><a title="Element ONE" href="http://www.zemtv.com/?s=4+Man+show">Test ONE</a></li>
<li><a title="Element Two" href="http://www.zemtv.com/?s=8pm+with+Fareeha+Idrees">Test Two</a></li>
<li><a title="Element three" href="http://www.zemtv.com/?s=Aaj+kamran+khan">Element Three</a></li>
<li>
<li>
I am using code like the following
List<WebElement> allNames = driver.findElements(By.xpath("//div[@class='xoxo blogroll']/a"));
but it does not find any list items. Can anybody help me with this?
The XPath used in the above code does not point to the anchor tags: the element with class 'xoxo blogroll' is a <ul>, not a <div>, and the <a> elements are children of its <li> items.
//div[@class='xoxo blogroll']/a
should be replaced with
//ul[@class='xoxo blogroll']/li/a
List<WebElement> lst = driver.findElements(By.xpath("//ul[@class='xoxo blogroll']/li/a"));
for (int i = 0; i < lst.size(); i++) {
    System.out.println(lst.get(i).getText());
}
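
The difference between the two paths can be checked offline. The Python sketch below is an illustration only, not the WebDriver code: it uses the standard library's xml.etree.ElementTree on a cleaned-up, well-formed copy of the page source from the question (the stray empty <li> tags are dropped), and its XPath subset needs a leading "." for a relative search.

```python
import xml.etree.ElementTree as ET

# Cleaned-up, well-formed copy of the question's page source.
FRAGMENT = """
<div class="widget">
  <h2>Programs</h2>
  <div class="w">
    <ul class="xoxo blogroll">
      <li><a title="Element ONE" href="http://www.zemtv.com/?s=4+Man+show">Test ONE</a></li>
      <li><a title="Element Two" href="http://www.zemtv.com/?s=8pm+with+Fareeha+Idrees">Test Two</a></li>
      <li><a title="Element three" href="http://www.zemtv.com/?s=Aaj+kamran+khan">Element Three</a></li>
    </ul>
  </div>
</div>
"""

root = ET.fromstring(FRAGMENT)

# Original path: no <div> has class 'xoxo blogroll', so nothing matches.
wrong = root.findall(".//div[@class='xoxo blogroll']/a")

# Corrected path: the class sits on the <ul>, and each <a> is inside an <li>.
right = root.findall(".//ul[@class='xoxo blogroll']/li/a")

titles = [a.get("title") for a in right]  # title attributes
labels = [a.text for a in right]          # visible link text
print(len(wrong), titles, labels)
```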