Excel VBA - function to parse data

I am trying to parse a webpage but am having difficulty getting the information to come through.
I have a simple button in a worksheet which navigates to a website:
Private Sub Sellit_Click()
Dim IE As Object
Dim HTMLDoc As HTMLDocument
Dim oHTML_Element As IHTMLElement
Set IE = CreateObject("Internetexplorer.Application")
IE.Visible = True
apiShowWindow IE.hwnd, SW_MAXIMIZE
IE.navigate "https://www.yahoo.com/"
Do
Loop Until IE.ReadyState = READYSTATE_COMPLETE
DoEvents
Scrape
End Sub
The function Scrape lives in a standard module:
Function Scrape()
Dim IE As Object
Dim HTMLDoc As HTMLDocument
Dim oHTML_Element As IHTMLElement
MsgBox IE.document.Title
End Function
I think the problem is that the IE object doesn't carry over from the worksheet to the module and vice versa, but I am not quite sure how to fix it.
Your help will be much appreciated.

You'd have to declare the IE object variable publicly, then refer to it using its fully qualified name. To do so:
1. At the top of the Worksheet code module, type Public IE As Object
2. Remove the declarations of IE within the SellIt Click event and the Scrape function. Since IE is declared publicly, it shouldn't be declared again locally; a local declaration would shadow the public one.
3. In the standard module, also remove the IE declaration. (Same reason as step 2.)
4. Change MsgBox IE.document.Title to include the sheet's codename. For example, if the sheet codename is Sheet1 it should read MsgBox Sheet1.IE.document.Title
Let me know if that helps.
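Put together, the steps above might look roughly like the sketch below. It assumes the sheet's codename is Sheet1 and leaves out the apiShowWindow call from the question for brevity. The wait loop also has DoEvents moved inside it, which is a small change from the original code, not part of the steps.

```vba
' --- Worksheet code module (codename assumed to be Sheet1) ---
Option Explicit

' Public member of the sheet's module; other modules reach it as Sheet1.IE
Public IE As Object

Private Sub Sellit_Click()
    Set IE = CreateObject("InternetExplorer.Application")
    IE.Visible = True
    IE.navigate "https://www.yahoo.com/"

    ' 4 = READYSTATE_COMPLETE; DoEvents inside the loop keeps Excel responsive
    Do While IE.Busy Or IE.readyState <> 4
        DoEvents
    Loop

    Scrape
End Sub

' --- Standard module ---
Public Sub Scrape()
    ' No local IE declaration here, so this resolves to the sheet's public variable
    MsgBox Sheet1.IE.document.Title
End Sub
```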

Related

How to enter a website address using VBA and search

I know this may seem easy. I have already written some code to try to get this to work, but ran into one problem. The format of the link below is the same for all cities and states. As long as you can type the name of the city ("City_Search") and the state ("State_Search"), you should be able to access the website with the information as seen below.
I have included the code I am using below. If anyone can assist me with the search I would appreciate it.
Sub SearchBot1()
'dimension (declare or set aside memory for) our variables
Dim objIE As InternetExplorer 'special object variable representing the IE browser
Dim aEle As HTMLLinkElement 'special object variable for an <a> (link) element
Dim HTMLinputs As MSHTML.IHTMLElementCollection
'initiating a new instance of Internet Explorer and assigning it to objIE
Set objIE = New InternetExplorer
'make IE browser visible (False would allow IE to run in the background)
objIE.Visible = True
'navigate IE to this web page (a pretty neat search engine really)
objIE.navigate "https://datausa.io/profile/geo/" & Range("City_Search").Value & "-" & Range("State_Search").Value
'wait here a few seconds while the browser is busy
Do While objIE.Busy = True Or objIE.readyState <> 4: DoEvents: Loop
End Sub
The idea is for me to type any city into Excel and, once I run the macro, have it go to the site and search for the town's data. I have added a link below as an example of the page I am looking to get when I search.
https://datausa.io/profile/geo/hoboken-nj/
You need to hyphenate cities that have spaces in their name. Counties need to be the correct abbreviation, and both are case sensitive, i.e. they need to be all lower case. So you need to add these hyphens, if missing, using a function like Replace in VBA to swap Chr$(32) with "-" or Chr$(45), and potentially LCase$ to convert to lowercase.
You should also fully qualify the range with the worksheet you intend to use.
With data already in correct format in cell:
E.g. with los-angeles-ca or los-angeles-county-ca in a cell.
Option Explicit
Public Sub SearchBot1()
Dim objIE As InternetExplorer, aEle As HTMLLinkElement
Dim HTMLinputs As MSHTML.IHTMLElementCollection
Set objIE = New InternetExplorer
'e.g. https://datausa.io/profile/geo/los-angeles-ca/
With objIE
.Visible = True
.navigate "https://datausa.io/profile/geo/" & Range("City_Search").Value & "-" & Range("State_Search").Value
Do While .Busy = True Or .readyState <> 4: DoEvents: Loop
Stop
' .Quit '<== Uncomment me to close browser at end
End With
End Sub
Adding hyphens:
If you had los angeles, not los-angeles, in a cell:
Replace$(Range("City_Search").Value, Chr$(32), Chr$(45))
Lowercase and hyphen:
To be really safe you could convert to lowercase as well, to handle any upper case letters in the cell you are referencing, e.g.
For Los Angeles use: Replace$(LCase$(Range("City_Search").Value), Chr$(32), Chr$(45))
Option Explicit
Public Sub SearchBot1()
Dim objIE As InternetExplorer, aEle As HTMLLinkElement
Dim HTMLinputs As MSHTML.IHTMLElementCollection, ws As Worksheet
Set ws = ThisWorkbook.Worksheets("Sheet1")
Set objIE = New InternetExplorer
'e.g. https://datausa.io/profile/geo/los-angeles-ca/
With objIE
.Visible = True
.navigate "https://datausa.io/profile/geo/" & Replace$(LCase$(ws.Range("City_Search").Value), Chr$(32), Chr$(45)) & "-" & LCase$(ws.Range("State_Search").Value)
Do While .Busy = True Or .readyState <> 4: DoEvents: Loop
Stop
' .Quit '<== Uncomment me to close browser at end
End With
End Sub
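The hyphen and lowercase handling can also be pulled out into a small function so the navigate line stays readable. This is only a sketch: BuildGeoSlug is a hypothetical helper name, and it assumes single spaces between words (repeated spaces would produce repeated hyphens).

```vba
Option Explicit

' Hypothetical helper: turns "Los Angeles" and "CA" into "los-angeles-ca".
Public Function BuildGeoSlug(ByVal city As String, ByVal state As String) As String
    Dim slug As String

    ' Join city and state, dropping leading/trailing spaces from each part
    slug = Trim$(city) & " " & Trim$(state)

    ' The site expects lower case with hyphens instead of spaces
    slug = LCase$(slug)
    slug = Replace$(slug, Chr$(32), Chr$(45))

    BuildGeoSlug = slug
End Function
```

With this in place, the URL could be built as "https://datausa.io/profile/geo/" & BuildGeoSlug(ws.Range("City_Search").Value, ws.Range("State_Search").Value).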
That gets you to the pages. What you do then......
Did you know that this website has its own data-search API?
You can also extract data using a background object instead of creating an Internet Explorer instance.
For instance:
Sub getCityData()
''' Create a background server connection
Dim myCon As Object: Set myCon = CreateObject("MSXML2.ServerXMLHTTP.6.0")
''' Open a connection string with the DataUSA API and basic request for (geo, place, population)
myCon.Open "GET", "http://api.datausa.io/api/?show=geo&sumlevel=place&required=pop"
myCon.send ''' Send the request
''' Dataset in the ResponseText is HUGE so for demo show first 5000 characters
Sheet1.Range("A1").Value2 = Left(myCon.responseText, 5000)
End Sub
That will pull the ENTIRE DATA SET for every "place" in America with its population for every year from 2013 onwards in about a second. It will place the first 5000 characters of the dataset in to cell A1 on Sheet1 (I recommend putting this in a new Excel file).
I don't have time to learn the site's API, but it seems to have good documentation on GitHub and the responses come back in JSON format. If you really want to make a powerful Excel interface, use their API with background connections; they have so much data for the USA at your fingertips.
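A slightly more defensive variant of the request above checks the HTTP status before writing anything to the sheet. This is a sketch only: the URL is copied from the example and may no longer respond, and GetCityDataChecked is a hypothetical name.

```vba
Option Explicit

Public Sub GetCityDataChecked()
    Dim req As Object
    Set req = CreateObject("MSXML2.ServerXMLHTTP.6.0")

    ' Third argument False = synchronous: send returns once the response has arrived
    req.Open "GET", "http://api.datausa.io/api/?show=geo&sumlevel=place&required=pop", False
    req.send

    If req.Status = 200 Then
        ' The response is large, so only the first 5000 characters are shown
        Sheet1.Range("A1").Value2 = Left$(req.responseText, 5000)
    Else
        MsgBox "Request failed: " & req.Status & " " & req.statusText
    End If
End Sub
```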

getelementbytagname isn't recognized

I'm attempting to have Excel open a web site, populate some fields, submit, and download the resulting data in a file.
My code never gets very far, however, because it looks like Excel doesn't recognize "getelementsbytagname" as an existing operation. I assume this is the issue because it does not correct the case to GetElementsByTagName like it does for everything else.
My References in the editor include Microsoft Internet Controls and Microsoft HTML Object Library. Is there another one that I need to activate?
The code is just a modified version of something found online.
Private Sub IE_automation()
'Retrieve data from Enterprise Reporting with IE
Dim i As Long
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
'Create Internet Explorer Object
Set IE = CreateObject("InternetExplorer.Application")
'Comment out while troubleshooting
'IE.Visible = False
'Send the form data to URL as POST binary request
IE.Navigate "http://corpprddatawhs1/Reports/Pages/Report.aspx?ItemPath=%2fInventory%2fInventory+By+Branch"
'Set statusbar
Application.StatusBar = "Webhost data is loading. Please wait..."
'Wait while IE loading
Do While IE.busy
Application.Wait DateAdd("s", 1, Now)
Loop
'Find 2 input tags:
' 1. Text field
' <input type="text" class="null" name="ct132$ct104$1ct105$txtValue" size="30" value="" />
'
' 2. Button
' <input type="submit" value="View Report" />
Application.StatusBar = "Searching form submission. Please wait..."
Set objCollection = IE.document.getelementsbytagname("input")
"My References in the editor include Microsoft Internet Controls and Microsoft HTML Object Library."
Then you have no need to do this:
Dim IE As Object
Dim objElement As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
Declaring everything As Object is a technique called late-binding. In that paradigm you create a new instance of an object by passing a ProgID to the CreateObject function.
Late-bound code is, by definition, resolved at run-time: the compiler is happy with the Object interface, and the onus is on you to use the correct members.
With late-bound code you can't get IntelliSense or auto-complete, because the compiler has no idea what you're up to: all it's seeing is Object.
When you actually reference a type library, you can use early binding, which means the types and member calls are resolved at compile-time instead of run-time. But for this to work, you need to use the types you're referencing. Otherwise you're late-binding against a type library that's referenced... and late-bound code doesn't require a reference (just that some version of the type library is registered on the machine that's running the code).
Dim browser As InternetExplorer
Set browser = New InternetExplorer
browser.Navigate url
'...
Dim dom As HTMLDocument
Set dom = browser.Document
Dim inputElements As HTMLElementCollection
Set inputElements = dom.getElementsByTagName("input")
And now you're coding against an HTML DOM just like you would against any Excel worksheet, with IntelliSense and parameter info and all the goodies.
This is suspicious:
Do While IE.busy
Application.Wait DateAdd("s", 1, Now)
Loop
That wait-loop isn't even looking at the browser's ReadyState. Try this instead:
Do Until IE.ReadyState = 4 And IE.Busy = False
DoEvents
Loop
See this post for an allegedly fail-safe way to go about it.
Your code otherwise looks fine though, so my money is on the wait-loop.
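Combining the early-bound declarations with the ReadyState wait loop, a complete procedure might look like the sketch below. It assumes both references (Microsoft Internet Controls and Microsoft HTML Object Library) are ticked. The URL is a placeholder and ListInputNames is a hypothetical name.

```vba
Option Explicit

Public Sub ListInputNames()
    Dim browser As InternetExplorer
    Dim dom As HTMLDocument
    Dim inputElements As IHTMLElementCollection
    Dim el As Object

    Set browser = New InternetExplorer
    browser.Visible = True
    browser.navigate "https://example.com/"   ' placeholder URL (assumption)

    ' Wait until the page reports complete and the browser is no longer busy
    Do Until browser.readyState = READYSTATE_COMPLETE And Not browser.Busy
        DoEvents
    Loop

    Set dom = browser.document
    Set inputElements = dom.getElementsByTagName("input")

    ' Print the name attribute of every <input> to the Immediate window
    For Each el In inputElements
        Debug.Print el.getAttribute("name")
    Next el

    browser.Quit
End Sub
```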
Even though VBA is case-insensitive, all the methods/properties under document are case-sensitive.
A safe way to call them with late-binding is to use CallByName:
Set items = CallByName(IE.document, "getElementsByTagName", VbMethod, "input")

Can't download a scorecard from pgatour.com

I've searched throughout the site here to improve my simple code, but continue to get error 424 at the Set allRowOfData line, which I gleaned from here. I'm just trying to load a golfer's scorecard into my Excel spreadsheet. I appreciate any help I can get.
Here's my code:
Note: based on advice from below, I've changed my code to the below, but now get a type mismatch error at Set allRowOfData.
Sub Trial()
Dim IE As InternetExplorer
Dim allRowOfData
Dim document As Object
Set IE = New InternetExplorer
IE.Visible = False
IE.navigate "https://www.pgatour.com/players/player.32757.patton-kizzire.html/scorecards/r457"
Do Until IE.readyState = 4: DoEvents: Loop
Set allRowOfData = IE.document.getElementById("module-1510443455695-953403-17")
Dim myValue As String: myValue = allRowOfData.Cells().innerHTML
IE.Quit
Set IE = Nothing
Range("A1").Value = myValue
End Sub
I think you've set the page element before it gets a chance to load on the page, therefore 'allRowOfData' remains nothing.
Consider replacing this:
Application.Wait Now + TimeSerial(0, 0, 1)
With:
Do Until IE.ReadyState = 4: DoEvents: Loop
Or alternatively (inefficient but can't see any reason why it wouldn't work):
Do Until Not (allRowOfData Is Nothing)
DoEvents
Set allRowOfData = IE.document.getElementById("module-1510443455695-953403-17")
Loop
The above assumes your usage of .getElementByID method is correct. Untested, written on mobile.
Edit 1:
See if changing this
Dim allRowOfData
To this
Dim allRowOfData As IHTMLElement
Works.
Before you re-run code, in VBA editor, click Tools > References and add a reference to the Microsoft HTML Object Library (scroll down and tick it).
Type Mismatch would suggest we're assigning something to a variable of the wrong type, so let's get rid of the implicit variant.
I haven't been able to examine the page's HTML source as I'm on mobile. It may be the case, though, that the specific element you want is created after the page's initial load (in which case you may need to use the inefficient alternative above), or that you need to iterate through an IHTMLElementCollection to get at the information. It all depends on the page's structure and loading behaviour.
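If the element really is created after the initial load, polling for it with a time limit avoids an endless loop when the id never appears. The sketch below is untested against the actual page. WaitForElementById is a hypothetical helper and the 10 second default is an arbitrary choice.

```vba
Option Explicit

' Hypothetical helper: polls doc.getElementById until the element exists
' or timeoutSeconds have elapsed. Returns Nothing on timeout.
Public Function WaitForElementById(ByVal doc As Object, ByVal elementId As String, _
                                   Optional ByVal timeoutSeconds As Long = 10) As Object
    Dim deadline As Date
    Dim found As Object

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do
        DoEvents
        Set found = doc.getElementById(elementId)
        If Not found Is Nothing Then Exit Do
    Loop While Now < deadline

    Set WaitForElementById = found
End Function
```

The caller would then use Set allRowOfData = WaitForElementById(IE.document, "module-1510443455695-953403-17") and test allRowOfData Is Nothing before reading from it.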

VBA cannot find an open IE 11 browser

I have VBA code which uploads data from an Excel sheet to a website. The code works fine on a Win7 system with IE 8, but it does not work on Win8 with IE 11.
Here is part of the code:
Dim objIE As SHDocVw.InternetExplorer
Dim htmlDoc As MSHTML.HTMLDocument
Dim htmlFrame As MSHTML.HTMLFrameElement
Dim frame As HTMLIFrame
Dim htmlElement As HTMLDTElement
Dim myDoc As Object
Set curSheet = ActiveWorkbook.ActiveSheet
Set oShApp = CreateObject("Shell.Application")
For Each oWin In oShApp.Windows
If oWin.Name = "Windows Internet Explorer" Then
Set IE = oWin
Exit For
End If
Next
If IE Is Nothing Then
MsgBox ("Please sign into Avocado, then re-run this macro")
Set IE = New InternetExplorerMedium
IE.Visible = True
IE.navigate "https://www.google.com"
Exit Sub
End If
Sheets("Prepare").Select
fPathName = Cells(5, 5)
Call MakeFolders(fPathName)
Call MakeFolders2(fPathName)
Call MakeFolders3(fPathName)
'fFileName = fPathName & "\*.xls"
fFileName = Dir(fPathName & "\*.xls")
The code gets stuck in a loop when it enters the statement "If IE Is Nothing Then".
Even when the Google site is open, the program keeps showing the MsgBox and reopening the website again and again, and it never reaches the last part, Sheets("Prepare").Select. I am very confused because it works perfectly in IE 8. I am wondering if there is any difference between IE 11 and IE 8 in terms of the VBA IE functions.
Please take a look and give me some ideas on this; your help is greatly appreciated. Thank you very much.
I ran into the identical problem and found that the window name reported by Shell.Application has changed in IE 11. Earlier versions had the name "Windows Internet Explorer", but IE 11 uses only "Internet Explorer".
If you change your If condition accordingly, it will work again.
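One way to cope with both window names is to match on the common suffix rather than the exact string. This is a sketch under that assumption; FindOpenIE is a hypothetical helper name.

```vba
Option Explicit

' Hypothetical helper: returns the first open Internet Explorer window,
' or Nothing if none is found.
Public Function FindOpenIE() As Object
    Dim shellApp As Object
    Dim win As Object

    Set shellApp = CreateObject("Shell.Application")
    For Each win In shellApp.Windows
        ' Matches "Windows Internet Explorer" (older versions) and "Internet Explorer" (IE 11)
        If win.Name Like "*Internet Explorer" Then
            Set FindOpenIE = win
            Exit Function
        End If
    Next win
End Function
```

The original check would then become Set IE = FindOpenIE() followed by If IE Is Nothing Then.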
I think you are using an old version of Internet Explorer. Use the code I provided below and see if it works. Please make sure you have the following references added to your project:
Microsoft HTML Object Library
Microsoft Internet Controls
If you are not sure how to add the references to your project, check out this link: External References
Sub Test()
Dim objIE As InternetExplorer
Dim htmlDoc As HTMLDocument
Dim htmlFrame As HTMLFrameElement
Dim frame As HTMLIFrame
Dim htmlElement As HTMLDTElement
Dim myDoc As Object
Dim curSheet As Worksheet
' Set the variables
Set curSheet = ActiveWorkbook.ActiveSheet
Set objIE = New InternetExplorer
' Make the browser visible and navigate
With objIE
.Visible = True
.navigate "https://www.google.com"
End With
WaitForInternetToLoad objIE
Sheets("Prepare").Select
fPathName = Cells(5, 5).Value
Call MakeFolders(fPathName)
Call MakeFolders2(fPathName)
Call MakeFolders3(fPathName)
'fFileName = fPathName & "\*.xls"
fFileName = Dir(fPathName & "\*.xls")
End Sub
Sub WaitForInternetToLoad(ByRef ie As InternetExplorer)
Do
DoEvents
Loop While Not ie.readyState = READYSTATE_COMPLETE
End Sub
Extra information.
Hi, I see you are having some trouble with this and I want to extend my help.
Let's start with the fact that you are using two external libraries in your code, and you should have references set to them for the code to work properly. See my picture, where I have highlighted the libraries in yellow.
What are the differences:
Microsoft Internet Controls
This is the one that takes care of the Internet Explorer object. It creates an Internet Explorer application and navigates to URLs. Once it finishes navigating to the URL, the Internet Explorer object has a "document". This object cannot do anything else.
Microsoft HTML Object Library
This library takes care of the HTML document. As you probably guessed, you assign the "document" from the previous object (Internet Explorer) to an HTMLDocument variable and then you can do further manipulation.

VBA: not able to pull value within <input> tag using getelementsbyTagname().Value

This is my first post on Stack Overflow :) I've been Googling VBA knowledge and writing some VBA for about a month.
My computer info:
1. Windows 8.1
2. Excel 2013
3. IE 11
My Excel references:
Microsoft Object Library: yes
Microsoft Internet Controls: yes
Microsoft Form 2.0 Object library: yes
Microsoft Script Control 1.0: yes
Issue:
I was trying to retrieve data from Internet Explorer automatically using VBA.
I would like to retrieve the value of an input tag inside the element with id "u_0_1", which is under an element with id "facebook". I am expecting to retrieve the value "AQFFmT0qn1TW" into cell C2. However, this message popped up after I ran the VBA: "Run-time error '91': Object variable or With block variable not set".
I have been trying this for a couple of weeks using different methods such as:
1. getElementsByClassName
2. getElementById
3. getElementsByTagName
But it just doesn't work.
url:
http://coursesweb.net/javascript/getelementsbytagname
Below is my VBA code. Could you guys help me out a little bit please?
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim getThis As String
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
getThis = Trim(Doc.getElementById("u_0_1")(0).getElementsByTagName("input")(0).Value)
Range("c2").Value = getThis
End Sub
Thanks for your help. I had no idea that there is a difference between JS and VBA with respect to the getElementsBy...() methods. I also find the loop method for finding the id very useful.
I still have some issues retrieving a value from a form or input element. I hope that you can help me or give me some suggestions as well.
Expected result:
Retrieve the value "AQFFmT0qn1TW" and copy it to cell C2 automatically.
Actual result:
Nothing is returned to cell C2.
Below are the HTML elements.
<form rel="async" ajaxify="/plugins/like/connect" method="post" action="/plugins/like/connect" onsubmit="return window.Event && Event.__inlineSubmit && Event.__inlineSubmit(this,event)" id="u_0_1">
<input type="hidden" name="fb_dtsg" value="AQFFmT0qn1TW" autocomplete="off">
Below is the VBA code based on your code.
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
Set Elements = Doc.getElementsByTagName("input")
For Each Element In Elements
If Element.name = "fb_dtsg" Then
Range("c2").Value = Element.innerText
End If
Next Element
Set Elements = Nothing
End Sub
Cheers.
First of all, I can't find the tags you were searching for in the source of the website. Anyway, I think you can't chain getElementById.getElementsByTagName as in JS. You have to loop through a collection of document elements.
Private Sub CommandButton1_Click()
Dim ie As Object
Dim Doc As HTMLDocument
Dim Elements As IHTMLElementCollection
Dim Element As IHTMLElement
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
Do
DoEvents
Loop Until ie.readyState = 4
Set Doc = ie.document
Set Elements = Doc.getElementsByTagName("ul")
For Each Element In Elements
If Element.ID = "ex4" Then
Sheets(1).Cells(1, 1).Value = Element.innerText
End If
Next Element
Set Elements = Nothing
End Sub
First I'm getting the collection of "ul" tags, then looping through them looking for the id "ex4". In your case you'd get the collection of "input" elements, then loop for the id you want. Finding an id which is followed by a different id shouldn't be hard, just some If...Then statements.
If you need further assistance, please respond with a URL in which I can find exactly what you're looking for.
Cheers
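Applied to the hidden input from the question, the same loop might look like the sketch below. It reads the element's Value rather than innerText, since an input element carries its content in the value attribute. The loop variable is declared As Object so that Name and Value resolve at run time. This assumes the input sits in the top-level document rather than inside a frame, which could not be confirmed from the page source.

```vba
Private Sub CommandButton1_Click()
    Dim ie As Object
    Dim Doc As HTMLDocument
    Dim Elements As IHTMLElementCollection
    Dim Element As Object   ' late-bound so .Name and .Value resolve at run time

    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = False
    ie.navigate "http://coursesweb.net/javascript/getelementsbytagname"
    Do
        DoEvents
    Loop Until ie.readyState = 4 And Not ie.Busy

    Set Doc = ie.document
    Set Elements = Doc.getElementsByTagName("input")
    For Each Element In Elements
        If Element.Name = "fb_dtsg" Then
            ' An <input> keeps its content in the value attribute, not innerText
            Range("C2").Value = Element.Value
            Exit For
        End If
    Next Element

    ie.Quit
End Sub
```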