I need a VBScript that can copy output from different web pages into an Excel sheet.
Example:
A website like truecaller.com lets you search for people by phone number.
Each number is represented by a unique web address, e.g. www.truecaller.com/au/439965324.
I need to build an Excel sheet with two columns: the first is the web address and the second is the related name.
Excel VBA is not the best for web scraping but it can get the job done.
Firstly you'll need to make sure you download the latest Internet Explorer, or at least ensure you have version 9 or above.
Secondly, you'll have to enable some references on your macros (these are analogous to imports in languages like Java). To do this, open your VBA editor, and go to Tools > References. You'll want to tick Microsoft Internet Controls and Microsoft HTML Object Library.
Now you're good to go; the following code should work for you. Not being a member of Truecaller, I only see "-" in the name field, but I imagine it's different if you have an account. The script simply pulls out the name, number and address. I'm sure you won't have a problem looping through your desired URLs and placing the grabbed data where you want it.
Sub Test()
    'to refer to the running copy of Internet Explorer
    Dim ie As InternetExplorer
    'to refer to the HTML document returned
    Dim html As HTMLDocument
    'open Internet Explorer in memory, and go to website
    Set ie = New InternetExplorer
    ie.Visible = False
    ie.navigate "www.truecaller.com/au/439965324"
    'Wait until IE is done loading page
    Application.StatusBar = "Loading page ..."
    Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
        DoEvents
    Loop
    'show text of HTML document returned
    Set html = ie.document
    MsgBox html.DocumentElement.innerHTML
    Dim element As IHTMLElement
    Set element = html.getElementsByClassName("result__details")(0)
    'descriptive names avoid clashing with the VBA Name statement
    Dim personName As String
    Dim phoneNumber As String
    Dim personAddress As String
    personName = element.Children(0).Children(1).innerText
    phoneNumber = element.Children(1).Children(1).innerText
    personAddress = element.Children(2).Children(1).innerText
    MsgBox "Name is " & personName & " with number " & phoneNumber & ". Address: " & personAddress
    'close down IE (Set ... = Nothing alone leaves the hidden process running)
    ie.Quit
    Set ie = Nothing
    'hand the status bar back to Excel
    Application.StatusBar = False
End Sub
If you want to learn more about scraping with VBA then here's a good link:
http://www.wiseowl.co.uk/blog/s393/scraping-websites-vba.htm
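The looping step that the answer leaves to the reader could look roughly like the sketch below. It assumes the URLs sit in column A of the active sheet starting at A2, that the two references mentioned above are ticked, and that each page still exposes a "result__details" element laid out as in the code above. ScrapeNames is a made-up name.

```vba
Option Explicit

' Sketch: visit every URL in column A and write the scraped name to column B.
' Assumes URLs start in A2 and the page layout matches the example above.
Public Sub ScrapeNames()
    Dim ie As InternetExplorer
    Dim details As Object
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    Set ie = New InternetExplorer
    ie.Visible = False

    For r = 2 To lastRow
        ie.navigate CStr(ws.Cells(r, "A").Value)
        Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
            DoEvents
        Loop

        ' Guard against pages where the expected block is missing
        Set details = ie.document.getElementsByClassName("result__details")
        If details.Length > 0 Then
            ws.Cells(r, "B").Value = details(0).Children(0).Children(1).innerText
        Else
            ws.Cells(r, "B").Value = "(not found)"
        End If
    Next r

    ie.Quit
    Set ie = Nothing
End Sub
```

Reusing one IE instance for all rows avoids the cost of starting a new browser per URL.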
I'm new to VBA, but I'd like to create a macro that returns the top 5 search results for the item listed in cell A15, using either Google Chrome or Microsoft Edge. I tried adding the code below and got an error.
Out of the box, VBA can only automate Internet Explorer, through the InternetExplorer COM object. Chrome and Edge do not expose that interface, so this approach will not work with them unless you add a third-party driver such as Selenium.
Below is the sample code that populates Google search results to a worksheet.
Option Explicit
Public Sub GetLink()
Dim ie As New InternetExplorer
Dim url As String
url = "https://google.co.uk/search?q=" & Sheet1.Range("A2").Value
With ie
.Visible = True
.navigate url
While .Busy Or .readyState < 4: DoEvents: Wend
Sheet1.Range("B2").Value = .document.querySelector("#search div.r [href*=http]").href
Sheet1.Range("C2").Value = .document.querySelector("#search div.r [href*=http]").innerText
.Quit
End With
End Sub
Reference:
How to get the first search result link of a google search using VBA?
Below is another helpful thread. Its solution uses the XMLHTTP object.
Using VBA in Excel to Google Search in IE and return the hyperlink of the first result
You can use the example above as a starting point and modify it to fit your own requirements.
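As a rough sketch of extending the sample to the top five results the question asks for: the version below reuses the same CSS selector as the sample and collects every match instead of the first. It assumes the search term is in A15, writes the results to B15:C19 (an arbitrary choice), and only works if Google's markup still matches that selector, which may no longer be the case. GetTopLinks is a made-up name.

```vba
Option Explicit

' Sketch: write up to five result links and their text next to the search term.
' The selector is taken from the sample above and may not match current markup.
Public Sub GetTopLinks()
    Const MAX_RESULTS As Long = 5
    Dim ie As New InternetExplorer
    Dim html As HTMLDocument
    Dim links As Object
    Dim i As Long, n As Long

    With ie
        .Visible = True
        .navigate "https://google.co.uk/search?q=" & Sheet1.Range("A15").Value
        While .Busy Or .readyState < 4: DoEvents: Wend

        Set html = .document
        Set links = html.querySelectorAll("#search div.r [href*=http]")

        n = links.Length
        If n > MAX_RESULTS Then n = MAX_RESULTS

        ' Index into the node list; For Each over it is known to be unreliable
        For i = 0 To n - 1
            Sheet1.Cells(15 + i, "B").Value = links.Item(i).href
            Sheet1.Cells(15 + i, "C").Value = links.Item(i).innerText
        Next i

        .Quit
    End With
End Sub
```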
VBA code to interact with specific IE window that is already open
Above is a thread about finding and attaching to an already open instance of IE using shell applications in VBA. After I find the open IE instance I am looking for, I need to query the tables on that IE page without using its URL. The reason I cannot use its URL is that this page is a generic 'result' page that opens in a separate window after a search on the main website, so if I use the URL of the result page, which is https://a836-acris.nyc.gov/DS/DocumentSearch/BBLResult, it returns an error. Are there any other methods that allow querying tables without a URL connection, like a "getElements" for tables?
K.Davis, Tim William: you are correct in your assumptions. The first part of my code/project opens a search page (objIE.navigate "https://a836-acris.nyc.gov/DS/DocumentSearch/BBL") and submits a search form through it. The second part (outlined in the first paragraph above) opens a result page (pop-up). I am trying to automate retrieving the tables from that page. I tried the QueryTables.Add method, but the only way I know to connect to the data/web page with it requires a URL. If I use the URL of the result page it returns an error, so I am looking for suggestions on how else I could connect. That said, I am able to retrieve elements of the page using the 'getElements' methods, but I am not able to query tables. There are other ways to connect to a data source using QueryTables.Add (see https://learn.microsoft.com/en-us/office/vba/api/excel.querytables.add), but I am not familiar with them. Hope this clarifies things a bit.
I haven't had a problem with this: although there is an intermediate window, the final IE window resolves to the main IE window with focus. I was able to grab the results table with the following code, using the indicated search parameters:
Option Explicit
Public Sub GetInfo()
Dim IE As New InternetExplorer
With IE
.Visible = True
.navigate "https://a836-acris.nyc.gov/DS/DocumentSearch/BBL"
While .Busy Or .readyState < 4: DoEvents: Wend
With .document
.querySelector("option[value='3']").Selected = True
.querySelector("[name=edt_block]").Value = 1
.querySelector("[name=edt_lot]").Value = "0000"
.querySelector("[name=Submit2]").Click
End With
While .Busy Or .readyState < 4: DoEvents: Wend
Dim hTable As HTMLTable
Set hTable = .document.getElementsByTagName("table")(6)
'do stuff with table
.Quit
End With
End Sub
You can copy a table via the clipboard. Any tick Wingdings appear in the right place, but as empty icons.
For an early-bound clipboard object, go to VBE > Tools > References and tick Microsoft Forms 2.0 Object Library.
If you add a UserForm to your project, the library gets added automatically.
Dim clipboard As DataObject
Set clipboard = New DataObject
clipboard.SetText hTable.outerHTML
clipboard.PutInClipboard
ThisWorkbook.Worksheets("Sheet1").Cells(1, 1).PasteSpecial
For late binding, use:
Dim clipboard As Object
Set clipboard = GetObject("New:{1C3B4210-F441-11CE-B9EA-00AA006B1A69}")
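If the clipboard route is not wanted, one alternative is to walk the table's rows and cells and write each value directly. The helper below is only a sketch under the assumption that hTable is the HTMLTable obtained in the code above and that the Microsoft HTML Object Library is referenced. WriteTableToSheet is a hypothetical name, and merged cells (rowspan/colspan) are not handled.

```vba
Option Explicit

' Sketch: copy an HTML table into a worksheet cell by cell, starting at A1.
' WriteTableToSheet is a hypothetical helper; rowspan/colspan are ignored.
Public Sub WriteTableToSheet(ByVal hTable As HTMLTable, ByVal ws As Worksheet)
    Dim tr As Object, td As Object
    Dim r As Long, c As Long

    r = 0
    For Each tr In hTable.Rows
        r = r + 1
        c = 0
        For Each td In tr.Cells
            c = c + 1
            ws.Cells(r, c).Value = td.innerText
        Next td
    Next tr
End Sub
```

It could be called in place of the 'do stuff with table comment, for example as WriteTableToSheet hTable, ThisWorkbook.Worksheets("Sheet1").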
I'm a journalist. I spend countless hours copying brief passages from various webpages, and then pasting those passages - along with attribution to the websites I found them - into web-based articles.
For example, many of my articles have passages which look like this:
The Mexican finance minister wrote:
The euro exchange rate is, strictly speaking, too low for the German economy's competitive position.
I want to use VBA to do the following:
(1) When I highlight the text I want to copy - in the example "The euro exchange rate is, strictly speaking, too low for the German economy's competitive position"
(2) I would also automatically copy the url where the text comes from (in this case, http://www.cnbc.com/2017/02/06/german-finance-minister-agrees-euro-too-low-for-germany.html)
(3) So when I paste the text into my blog, it would automatically paste the text and ALSO the url. In other words, I would end up with what I wrote above.
I think the right property is IE.LocationURL, which returns the URL of the page I'm on. And I know how to launch Internet Explorer and navigate to a web page.
But I don't know how to put the script together.
Here's my attempt:
Const READYSTATE_COMPLETE = 4
' Declare Windows API function for setting active window
Declare Function SetForegroundWindow Lib "user32" _
    Alias "SetForegroundWindow" (ByVal Hwnd As Long) As Long
' Declare Internet Explorer object
Dim IE As SHDocVw.InternetExplorer
Sub Main
    ' create instance of InternetExplorer
    Set IE = New InternetExplorer
    ' using your newly created instance of Internet Explorer
    With IE
        ' show the window before bringing it to the foreground
        .Visible = True
        SetForegroundWindow .HWND
        .Navigate2 "http://www.cnbc.com/2017/02/06/german-finance-minister-agrees-euro-too-low-for-germany.html"
        ' Wait until page we are navigating to is loaded
        Do While .Busy Or .ReadyState <> READYSTATE_COMPLETE
            DoEvents
        Loop
        'Copy the selected text
        SendKeys "^c"
        ' Here's where I'm trying to copy the url
        Debug.Print .LocationURL
    End With
    ' Tidy Up
    Set IE = Nothing
End Sub
I would then run another script to automatically log into my publishing platform and paste the copied text and url info. Ideally, it would be pasted in the format shown at the top of this post (with linked text and then indented quote).
But if I just have the copied text and url, that would still save me a lot of time.
I use Nuance Dragon Naturally Speaking to run my VBA scripts.
But I'm lost. Please help steer me in the right direction! Thanks!
UPDATE: I guess what I really need is a way to store the url as a string. I can then later write the string (and just paste the selected text the old-fashioned way, with control-v.)
So does anyone know how to read and store the url as a string or value?
Playing around with a bunch of possibilities, I think I finally created a script which works.
To do it, I
(1) first have to copy the url manually from the web page by highlighting the url and then copying it to the clipboard using control-c on my keyboard;
(2) select (i.e. manually highlight with my keyboard) the text within the article which I want to copy.
Here's the script:
Sub Main
Dim MyData As DataObject
Dim strClip As String
Set MyData = New DataObject
MyData.GetFromClipboard
strClip = MyData.GetText
Wait 5
SendKeys "^c"
'I have code here to open up the webpage where I'm inserting the information
SendKeys "^v"
' The line above pastes the text I have selected
SendKeys strClip
' The line above sends the url which I previously saved
End Sub
I don't know if it's the most efficient or elegant solution, but it works.
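The manual URL-copy step might be avoidable by reading LocationURL from the Internet Explorer window that is already open. The function below is a sketch of that idea: it walks the windows exposed by Shell.Application and returns the URL of the first one hosted by iexplore.exe. GetOpenIEUrl is a made-up name, and it is an assumption that Dragon's Basic dialect accepts this syntax unchanged.

```vba
' Sketch: return the URL of the first open Internet Explorer window,
' or an empty string if none is found. GetOpenIEUrl is a made-up name.
Function GetOpenIEUrl() As String
    Dim win As Object

    For Each win In CreateObject("Shell.Application").Windows
        If Not win Is Nothing Then
            ' Shell windows also include File Explorer; keep only IE itself
            If InStr(1, win.FullName, "iexplore.exe", vbTextCompare) > 0 Then
                GetOpenIEUrl = win.LocationURL
                Exit Function
            End If
        End If
    Next win
End Function
```

With this, strClip = GetOpenIEUrl() could replace the clipboard read at the top of the script. Note that SendKeys treats characters such as +, %, ~ and parentheses specially, so a URL containing them may need escaping before it is sent.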
I am trying to program a web crawler using Visual Basic. I have a list of links stored in an Excel sheet (column 1). The macro should open each link and add certain information from the website to the Excel file.
Here's the first link (stored in field A2).
The Macro should identify and insert the name of the hotel into column 2 (B2), the rating in column 3 (C2) and the address in column 4 (D2). This process could then be repeated with a loop for all other links (all websites have the same structure).
My code so far (I did not add the loop yet):
Sub Hoteldetails()
Dim IEexp As Object
Set IEexp = CreateObject("InternetExplorer.Application")
IEexp.Visible = True
Range("A2").Select
Selection.Hyperlinks(1).Follow NewWindow:=False, AddHistory:=True
End Sub
How can I "select" the specific data I want and insert it into the excel file? I tried to record the macro via "Add Data", but was not able to import the data from the website. I also tried to do it by using various example codes, but it did not work out for my specific website.
Thanks a lot for any assistance!
tl;dr;
I am not going to do all the work for you but this is fairly easy if the pages have the same structure.
You can issue a browserless XMLHTTP request, to get a nice fast response, and then select the items of interest using either id or classname and collection index.
Here is an example, using the link you provided, which you can adapt into a loop over all links.
VBA:
Option Explicit
Public Sub GetInfo()
Dim sResponse As String, HTML As New HTMLDocument
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "https://www.tripadvisor.co.uk/Hotel_Review-g198832-d236315-Reviews-Grand_Hotel_Kronenhof-Pontresina_Engadin_St_Moritz_Canton_of_Graubunden_Swiss_Alps.html", False
.send
sResponse = StrConv(.responseBody, vbUnicode)
End With
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
With HTML
.body.innerHTML = sResponse
Debug.Print "HotelName: " & .getElementById("HEADING").innerText
Debug.Print "Address: " & .getElementsByClassName("detail")(0).innerText
Debug.Print "Rating: " & .getElementsByClassName("overallRating")(0).innerText
End With
End Sub
References:
VBE > Tools > References > Microsoft HTML Object Library
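Adapting the example into a loop over all links could look something like the sketch below. It assumes the URLs are plain text in column A from row 2 down, that every page uses the same id and class names as the example, and that the Microsoft HTML Object Library is referenced. GetAllHotels and TextOrBlank are hypothetical names.

```vba
Option Explicit

' Sketch: fetch each URL in column A and write name, rating and address
' into columns B, C and D of the same row.
Public Sub GetAllHotels()
    Dim ws As Worksheet
    Dim html As HTMLDocument
    Dim sResponse As String
    Dim r As Long, lastRow As Long

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        With CreateObject("MSXML2.XMLHTTP")
            .Open "GET", CStr(ws.Cells(r, "A").Value), False
            .send
            sResponse = StrConv(.responseBody, vbUnicode)
        End With

        ' A fresh document per page keeps results from leaking between rows
        Set html = New HTMLDocument
        html.body.innerHTML = sResponse

        ws.Cells(r, "B").Value = TextOrBlank(html.getElementById("HEADING"))
        ws.Cells(r, "C").Value = TextOrBlank(html.getElementsByClassName("overallRating")(0))
        ws.Cells(r, "D").Value = TextOrBlank(html.getElementsByClassName("detail")(0))
    Next r
End Sub

' Returns the element's text, or "" when the element was not found.
Private Function TextOrBlank(ByVal el As Object) As String
    If Not el Is Nothing Then TextOrBlank = el.innerText
End Function
```

The TextOrBlank guard is there so that one page with a missing element leaves a blank cell instead of stopping the whole run.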
You have several options:
Option 1: IEObject
Use the getElementBy... methods of the IE object, then use string manipulation to extract the data you need. There are two options for the string extraction:
Extract a top-level element by name or by id, then use string manipulation functions such as Mid, InStr, Left and Right
Use Regex (the VBScript RegExp object, available from VBA) to extract the data (recommended)
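A minimal sketch of the Regex route, assuming the HTML is already available as a string (for example from the document's body.innerHTML). FirstGroup is a made-up helper name; the RegExp object is created late-bound so no extra reference is needed.

```vba
' Sketch: return the first capture group of pattern in source,
' or an empty string when there is no match. FirstGroup is a made-up name.
Function FirstGroup(ByVal source As String, ByVal pattern As String) As String
    Dim re As Object, matches As Object

    Set re = CreateObject("VBScript.RegExp")
    re.Pattern = pattern
    re.IgnoreCase = True
    re.Global = False          ' only the first match is needed

    Set matches = re.Execute(source)
    If matches.Count > 0 Then
        If matches(0).SubMatches.Count > 0 Then
            FirstGroup = matches(0).SubMatches(0)
        End If
    End If
End Function
```

For example, FirstGroup("<h1 id=""HEADING"">Grand Hotel</h1>", "<h1 id=""HEADING""[^>]*>([\s\S]*?)</h1>") returns "Grand Hotel". The pattern uses [\s\S] because "." does not match line breaks in this regex engine.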
Option 2: Scrape HTML Add-In
Some time ago I developed an add-in for Excel that lets you scrape HTML data from within an Excel formula. The process is similar to the above, as you still need to write a suitable Regex. See an example below for TripAdvisor:
The formula in B2 looks like this (A2 is the link, and the second argument is the Regex):
=GetElementByRegex(A2;"<h1 id=""HEADING"".*?>(?:(?:.|\n)*?)</div>((?:.|\n)*?)</h1>")
You can download the AddIn here:
http://www.analystcave.com/excel-tools/excel-scrape-html-add/
I am writing a macro that will scrape my company's internal SAP site for vendor information. For several reasons I have to use VBA to do so. However, I cannot figure out why I keep getting these three errors when I attempt to scrape the page. Is it possible that this has something to do with the UAC integrity model? Or is there something wrong with my code? Is it possible that a webpage served over http is handled differently in Internet Explorer? I can go to any other webpage, even other internal webpages, and scrape each of those just fine. But when I attempt to scrape the SAP page, I get these errors. The error descriptions and when they occur are:
800706B5 - The interface is unknown (occurs when I place breakpoints before running the offending code)
80004005 - Unspecified error (occurs when I don't place any breakpoints and just let the macro run)
80010108 - The object invoked has disconnected from its clients. (I can't seem to get a consistent occurrence of this error; it seems to happen around the time that something in Excel is so corrupted that no page will load and I have to reinstall Excel)
I have absolutely no idea what is going on. The Integrity page didn't make much sense to me, and all the research I found on this talked about connecting to databases and using ADO and COM references. However I am doing everything through Internet Explorer. Here is my relevant code below:
Private Sub runTest_Click()
ie.visible = True
doScrape
End Sub
'The code to run the module
Private Sub doTest()
Dim result As String
result = PageScraper.scrapeSAPPage("<some num>")
End Sub
PageScraper Module
Public Function scrapeSAPPage(num As Long) As String
'Predefined URL that appends num onto end to navigate to specific record in SAP
Dim url As String: url = "<url here>"
Dim ie as InternetExplorer
set ie = CreateObject("internetexplorer.application")
Dim doc as HTMLDocument
ie.navigate url 'Will always successfully open page, regardless of SAP or other
'pauses the execution of the code until the webpage has loaded
Do
'Will always fail on next line when attempting SAP site with error
If Not ie.Busy And ie.ReadyState = 4 Then
Application.Wait (Now + TimeValue("00:00:01"))
If Not ie.Busy And ie.ReadyState = 4 Then
Exit Do
End If
End If
DoEvents
Loop
Set doc = ie.document 'After implementation of Tim Williams changes, breaks here
'Scraping code here, not relevant
End Function
I am using IE9 and Excel 2010 on a Windows 7 machine. Any help or insight you can provide would be greatly appreciated. Thank you.
I do this type of scraping frequently and have found it very difficult to make IE automation work 100% reliably, with errors like those you have found. As they are often timing issues, they can be very frustrating to debug because they don't appear when you step through, only during live runs. To minimize the errors, I do the following:
Introduce more delays; ie.busy and ie.ReadyState don't necessarily give valid answers IMMEDIATELY after an ie.navigate, so introduce a short delay after ie.navigate. For things I'm loading 1 to 2 seconds normally but anything over 500ms seems to work.
Make sure IE is in a clean state by going ie.navigate "about:blank" before going to the target url.
After that you should have a valid IE object and you'll have to look at it to see what you've got inside. Generally I avoid trying to access the entire ie.document and instead use IE.document.all.tags("x") where 'x' is a suitable thing I'm looking for such as td or a.
However, although all these improvements have increased my success rate, I still get errors at random.
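The mitigations above can be sketched as a single helper. NavigateWithDelay, the one- and two-second pauses and the 30-second limit are illustrative assumptions rather than values from a tested setup, and the helper relies on Excel's Application.Wait.

```vba
' Sketch of the mitigations above: start from a blank page, pause after
' navigating, then poll with a timeout. Returns False if the page never
' reports complete. Name and timings are illustrative assumptions.
Public Function NavigateWithDelay(ByVal ie As Object, ByVal url As String) As Boolean
    Dim deadline As Date

    ie.navigate "about:blank"                        ' clean state first
    Application.Wait Now + TimeValue("00:00:01")

    ie.navigate url
    Application.Wait Now + TimeValue("00:00:02")     ' let Busy/ReadyState settle

    deadline = Now + TimeValue("00:00:30")
    Do While ie.Busy Or ie.readyState <> 4
        If Now > deadline Then Exit Function         ' timed out, result stays False
        DoEvents
    Loop

    NavigateWithDelay = True
End Function
```

The caller can then check the return value before touching ie.document, instead of assuming the page is ready.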
My real solution has been to abandon IE and instead do my work using xmlhttp.
If you are parsing out your data using text operations on the document, then it will be a no-brainer to swap over. The xmlhttp object is MUCH more reliable, and you just read its responseText to access the entire HTML of the document.
Here is a simplified version of what I'm using in production now for scraping; it's so reliable that it runs overnight generating millions of rows without error.
Public Sub Main()
Dim obj As MSXML2.ServerXMLHTTP
Dim strData As String
Dim errCount As Integer
' create an xmlhttp object - you will need to reference to the MS XML HTTP library, any version will do
' but I'm using Microsoft XML, v6.0 (c:\windows\system32\msxml6.dll)
Set obj = New MSXML2.ServerXMLHTTP
' Get the url - I set the last param to Async=true so that it returns right away then lets me wait in
' code rather than trust it, but on an internal network "false" might be better for you.
obj.Open "GET", "http://www.google.com", True
obj.send ' this line actually does the HTTP GET
' Wait for a completion up to 10 seconds
errCount = 0
While obj.readyState < 4 And errCount < 10
DoEvents
obj.waitForResponse 1 ' this is an up-to-one-second delay
errCount = errCount + 1
Wend
If obj.readyState = 4 Then ' I do these on two
If obj.Status = 200 Then ' different lines to avoid certain error cases
strData = obj.responseText
End If
End If
obj.abort ' in real code I use some on error resume next, so at this point it is possible I have a failed
' get and so best to abort it before I try again
Debug.Print strData
End Sub
Hope that helps.