I'm trying to detect if a web page has certain text. For example, I want to see if this web page includes the following phrase: "Here is my code"
I can't get it to ever find that the "If Then" condition is satisfied. Here's what I'm trying:
Const READYSTATE_COMPLETE = 4
Declare Function SetForegroundWindow Lib "user32" _
Alias "SetForegroundWindow" (ByVal Hwnd As Long)As Long
' Declare Internet Explorer object
Dim IE As SHDocVw.InternetExplorer
Dim strProgramName As String
Sub Main
' create instance of InternetExplorer
Set IE = New InternetExplorer
' using your newly created instance of Internet Explorer
With IE
SetForegroundWindow IE.HWND
.Visible = True
.Navigate2 "https://stackoverflow.com/questions/38355762/how-do-i-modify-web-scraping-code-to-loop-through-product-bullets-until-it-finds"
' Wait until page we are navigating to is loaded
Do While .Busy
Loop
Do
Loop Until .readyState = READYSTATE_COMPLETE
On Error Resume Next
If Err Then
Else
End If
Wait 2
If InStr(IE.document.body.innerHTML, "Here is my code") > 0 Then
MsgBox "Yessiree Bob"
Else
MsgBox "The text doesn't exist"
End If
Set IE = Nothing
' Tidy Up
End With
End Sub
I've also tried:
FindText = InStr(1, IE.document.body.innerHTML, "Here is my code")
If FindText > 0 Then
And
msg = IE.document.body.innerHTML
If InStr(msg, "Here is my code") > 0 Then
But nothing works. I've looked on Stack Overflow, but can't find this exact question.
Thanks in advance!
Use the following, which assumes the page contains an element whose id attribute is "body":
If InStr(IE.document.getElementById("body").innerHTML, "Here is my code") > 0 Then
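If the page has no element with that id, a rough alternative is to wait for the page to finish loading and then search the rendered text of the body rather than its markup, since innerHTML can contain tags in the middle of a phrase. The sketch below is late bound and assumes a VBA host where DoEvents is available; PageHasText and the 30-second timeout are hypothetical names and values chosen for illustration.

```vba
Option Explicit

' Hypothetical helper: True if the loaded page's visible text contains phrase.
Function PageHasText(ByVal ie As Object, ByVal phrase As String) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", 30, Now)   ' assumed timeout; adjust as needed

    ' Wait until the browser is idle and reports READYSTATE_COMPLETE (4)
    Do While (ie.Busy Or ie.readyState <> 4) And Now < deadline
        DoEvents
    Loop

    ' Timed out before the page finished loading: return False
    If ie.readyState <> 4 Then Exit Function

    ' innerText holds the rendered text without tags; compare case-insensitively
    PageHasText = InStr(1, ie.document.body.innerText, phrase, vbTextCompare) > 0
End Function
```

It could then be called as If PageHasText(IE, "Here is my code") Then ... in place of the InStr test on innerHTML.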
I inherited this VBA script from my predecessor. It worked fine for me in Excel 2013 until recently, when I was told I might need to work from home. It turns out that the Office 2016 environment of my newly accessed VPN desktop does not like this script. I keep getting "The remote server machine is unknown or unavailable" when it reaches .ReadyState <> READYSTATE_COMPLETE.
The navigation did not fail as I can see the window where it successfully navigated to the URL and I can interact with it correctly. The strange thing is if I change the URL to "www.google.com" I get a valid ready state result.
I also need to figure out how to late bind the Shell Windows so it will work with both the v15 and v16 libraries simultaneously.
The intent of this script is to automate a process that
1. Opens an internal database at DBurl via web interface
2. Manipulates and runs a JavaScript function located on the web page
3. Closes the browser window without closing any other browser windows
This could be modified for someone else's use by looking for a page element, such as a search box or specific button on a page, and interacting with it.
Edit:
Additional testing has revealed that pausing at the Do While loop, skipping it, and resuming at IETab1 = SWs.Count makes this script work in Office 2016. The only issue, then, is that without the loop the page isn't yet ready for the next step when the script tries to run the interaction. A 5-second wait in place of the loop band-aids this issue. Finding out why .ReadyState can't be read would fix it properly.
Declare PtrSafe Function apiShowWindow Lib "user32" Alias "ShowWindow" _
(ByVal hwnd As Long, ByVal nCmdShow As Long) As Long
Sub OpenWebDB()
Dim ieApp As Object
Dim SWs As ShellWindows
Dim IETab1 As Integer
Dim JScript As String
Dim CurrentWindow As Object
Dim DBurl As String
Dim tNow As Date, tOut As Date
DBurl = "My.Database.url"
Set SWs = New ShellWindows
tNow = Now
tOut = tNow + TimeValue("00:00:15")
If ieApp Is Nothing Then
Set ieApp = CreateObject("InternetExplorer.Application")
With ieApp
.Navigate DBurl
Do While tNow < tOut And .ReadyState <> READYSTATE_COMPLETE
DoEvents
tNow = Now
Loop
IETab1 = SWs.Count
End With
End If
If Not tNow < tOut Then GoTo DBFail
On Error GoTo DBFail
Set CurrentWindow = SWs.Item(IETab1 - 1).Document.parentWindow
JScript = "javascript: DoSomething"
Call CurrentWindow.execScript(JScript)
On Error GoTo 0
SWs.Item(IETab1 - 1).Quit
Set ieApp = Nothing
Set SWs = Nothing
Exit Sub
DBFail:
MsgBox (DBurl & vbCrLf & "took too long to connect or failed to load correctly." & vbCrLf & _
"Please notify the Database manager if this issue continues."), vbCritical, "DB Error"
SWs.Item(IETab1 - 1).Quit
Set ieApp = Nothing
Set SWs = Nothing
End Sub
Try removing the tNow < tOut check from the Do While condition. Alternatively, use a While loop to wait until the page has finished loading:
While IE.ReadyState <> 4
DoEvents
Wend
Besides, given the intent of the script, you could refer to the following code, which loops through the tabs and closes a specific tab according to its title:
Sub TestClose()
Dim IE As Object, Data As Object
Dim ticket As String
Dim my_url As String, my_title As String
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.Navigate "https://www.microsoft.com/en-sg/" '1st tab
.Navigate "https://www.bing.com", CLng(2048) '2nd
.Navigate "https://www.google.com", CLng(2048) '3rd
While IE.ReadyState <> 4
DoEvents
Wend
'wait some time to let page load
Application.Wait (Now + TimeValue("0:00:05"))
'get the opened windows
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
'loop through the window and find the tab
For x = 0 To (IE_count - 1)
On Error Resume Next
'get the location and title
my_url = objShell.Windows(x).Document.Location
my_title = objShell.Windows(x).Document.Title
'debug to check the value
Debug.Print x
Debug.Print my_title
'find the special tab based on the title.
If my_title Like "Bing" & "*" Then
Set IE = objShell.Windows(x)
IE.Quit 'call the Quit method to close the tab.
Exit For 'exit the for loop
Else
End If
Next
End With
Set IE = Nothing
End Sub
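For the late-binding part of the question, a minimal sketch follows. It assumes nothing beyond the Shell.Application and InternetExplorer objects already used in this thread. WaitForPage and ListShellWindowsLateBound are hypothetical names, and the literal 4 stands in for READYSTATE_COMPLETE, which is not defined when the Microsoft Internet Controls reference is missing.

```vba
Option Explicit

' Hypothetical helper: waits until the browser reports READYSTATE_COMPLETE (4)
' or the timeout elapses. Returns True if the page finished loading in time.
Function WaitForPage(ByVal browser As Object, ByVal timeoutSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", timeoutSeconds, Now)

    Do While Now < deadline
        If Not browser.Busy Then
            If browser.readyState = 4 Then
                WaitForPage = True
                Exit Function
            End If
        End If
        DoEvents
    Loop
End Function

Sub ListShellWindowsLateBound()
    ' Late bound: no reference to Microsoft Internet Controls is required,
    ' so the same code compiles against either Office version
    Dim wins As Object
    Dim win As Object

    Set wins = CreateObject("Shell.Application").Windows
    For Each win In wins
        ' The collection can contain empty slots, so guard against Nothing
        If Not win Is Nothing Then
            Debug.Print win.LocationName, win.LocationURL
        End If
    Next win
End Sub
```

Keeping the timeout inside the helper means the caller only has to test the Boolean result instead of comparing timestamps after the loop.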
I'm using this code to grab the instance of Internet Explorer from Word VBA and scrape some values from a webpage. I'm looping through 4 items to grab Internet Explorer (just in case; sometimes I've accidentally grabbed something called "Windows Explorer", and I have no idea what that is). But before I begin scraping values, I want to make sure my tab name is "Overview - City". How can I test against the tab names?
Dim shellWins As ShellWindows, IE As InternetExplorer
Dim i As Long
Set shellWins = New ShellWindows
'Find Internet Explorer - if it can't find it, close the program
If shellWins.Count > 0 Then
For i = 0 To 3
On Error Resume Next
If shellWins.Item(i).Name = "Internet Explorer" Then
Set IE = shellWins.Item(i)
Exit For
End If
On Error GoTo 0
If i = 3 Then
MsgBox "Could not find Internet Explorer.", vbExclamation, "Error"
Exit Sub
End If
Next i
Else
MsgBox "Could not find Internet Explorer.", vbExclamation, "Error"
Exit Sub
End If
I tried following the guide here and used this bit to try and Debug.Print all the active tab names in IE once I had found it:
Dim IE_Tab As SHDocVw.InternetExplorer
Dim SH_Win As SHDocVw.ShellWindows
For each IE_Tab in SH_Win
Debug.Print IE_Tab.Name 'This returns nothing?
Next IE_Tab
But the immediate window returns blank with no error. What am I doing wrong?
Here is some code that should find a reference to the open Internet Explorer tab. It does this by looping through the Shell.Application.Windows collection. The function supports looking for just the WindowName, the URL and WindowName, and supports specifying the compare method or if you want to do a like match. I kept this code late bound, to avoid needing references.
The code is commented, somewhat, let me know if there are questions.
Code
Option Explicit
Private Function GetIEWindow(WindowName As String, _
ExactMatch As Boolean, _
Optional CompareMethod As VbCompareMethod = vbTextCompare, _
Optional URL As String) As Object
Dim Window As Object
Dim Windows As Object: Set Windows = CreateObject("Shell.Application").Windows
For Each Window In Windows
'Make sure the app is Internet Explorer. Shell Windows can include other apps
If InStr(1, Window.FullName, "IEXPLORE.EXE", vbTextCompare) > 0 Then
'Perform exact matches, where the title or url and title match exactly
If ExactMatch Then
If Len(URL) = 0 Then
If Window.LocationName = WindowName Then
Set GetIEWindow = Window
Exit Function
End If
Else
If Window.LocationName = WindowName And Window.LocationUrl = URL Then
Set GetIEWindow = Window
Exit Function
End If
End If
Else
'Otherwise do a In String match
If Len(URL) = 0 Then
If InStr(1, Window.LocationName, WindowName, CompareMethod) > 0 Then
Set GetIEWindow = Window
Exit Function
End If
Else
If InStr(1, Window.LocationName, WindowName, CompareMethod) > 0 And InStr(1, Window.LocationUrl, URL, CompareMethod) > 0 Then
Set GetIEWindow = Window
Exit Function
End If
End If
End If
End If
Next
End Function
Sub ExampleUsage()
Dim IE As Object: Set IE = GetIEWindow("exe", True)
If Not IE Is Nothing Then
Debug.Print "I found the IE window"
Else
Debug.Print "I didn't find the IE window"
End If
End Sub
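Applied to the question, the call might look like the sketch below. It assumes GetIEWindow sits in the same module (it is declared Private) and that the tab caption matches the page title reported by LocationName. CheckOverviewTab is a hypothetical name.

```vba
Sub CheckOverviewTab()
    ' Assumes GetIEWindow from the answer above lives in this module
    Dim IE As Object
    Set IE = GetIEWindow("Overview - City", True)

    If IE Is Nothing Then
        MsgBox "Could not find a tab named ""Overview - City"".", vbExclamation, "Error"
        Exit Sub
    End If

    ' Safe to start scraping: print what was matched
    Debug.Print IE.LocationName, IE.LocationURL
End Sub
```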
The tab is just another window. You can use the GetWebPage function below to loop through the windows and get the URL you are looking for.
References are
Microsoft Internet Controls
Microsoft Shell Controls and Automation
Sub Example()
Dim ieWin As InternetExplorer
Set ieWin = CreateObject("InternetExplorer.Application")
With ieWin
.Navigate "https://www.google.com/"
.Visible = True
.Silent = True
End With
Set ieWin = GetWebPage("https://www.google.com/")
End Sub
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
'Desc: The function gets the Internet Explorer window whose current
' URL matches the sUrl parameter. The function times out after 300 seconds
'Input parameters:
'String sUrl - The URL to look for
'Output parameters:
'InternetExplorer ie - the Internet Explorer window holding the webpage
'Result: returns the Internet Explorer window holding the webpage
''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''''
Public Function GetWebPage(sUrl As String) As InternetExplorer
Dim winShell As Shell
Dim dt As Date
dt = DateAdd("s", 300, DateTime.Now)
Dim ie As InternetExplorer
Do While dt > DateTime.Now
Set winShell = New Shell
'loop through the windows and check the internet explorer windows
For Each ie In winShell.Windows
If ie.LocationURL = sUrl Then
Set GetWebPage = ie
Do While ie.Busy
DoEvents
Loop
'Found it: release the shell reference, then stop searching
Set winShell = Nothing
Exit Do
End If
Next ie
Set winShell = Nothing
DoEvents
Loop
End Function
Hope you are doing well. The site I tried to scrape category names from looks very simple when you inspect its elements, but when I create a parser I can't pull the data. I want to scrape only the 7 category names from that page. I have tried every angle I could think of, but failed. If anybody can help point out what I'm doing wrong, I would be very grateful. Thanks in advance. For your consideration, I'm pasting the code I tried below.
Sub ItemName()
Dim http As New MSXML2.XMLHTTP60, html As New HTMLDocument
Dim topics As Object, topic As Object, posts As Object, post As Object, ele As Object
Dim x As Long
x = 2
http.Open "GET", "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193", False
http.send
html.body.innerHTML = http.responseText
Set topics = html.getElementsByClassName("categories")
For Each topic In topics
For Each posts In topic.getElementsByTagName("li")
For Each post In posts.getElementsByTagName("a")
Set ele = post.getElementsByTagName("h4")(0)
Cells(x, 1) = ele.innerText
x = x + 1
Next post
Next posts
Next topic
End Sub
Here's one possible solution: I'm using the Internet Explorer object instead of MSXML. I'm able to retrieve the data from the page, and it's pretty quick.
Here's the full code:
Option Explicit
#If VBA7 Then
Public Declare PtrSafe Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#Else
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
#End If
Sub ItemName()
On Error GoTo errhand:
Dim ie As Object: Set ie = CreateObject("InternetExplorer.Application")
Dim topics As Object, topic As Object
Dim i As Byte
With ie
.Visible = False
.Navigate "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
Sleep 500 ' Wait for the page to start loading
Do Until (.readyState = 4 And Not .Busy) Or i >= 100
Sleep 100
DoEvents
i = i + 1
Loop
End With
Set topics = ie.document.getElementsByClassName("name ng-binding")
For Each topic In topics
'Print out the element's innertext
Debug.Print topic.innertext
Next
ie.Quit
Set ie = Nothing
Exit Sub
errhand:
Debug.Print Err.Number, Err.Description
ie.Quit
Set ie = Nothing
End Sub
Because the content of that site is generated dynamically, an XMLHTTP request cannot see it in the page source. To get around that, Selenium is a good option, as it works well with JavaScript-heavy websites. I only used Selenium in the script below to get the page source. As soon as it has that, I revert to the usual VBA method to finish the process.
Sub Grabbing_item()
Dim driver As New ChromeDriver, html As New HTMLDocument
Dim post As Object
With driver
.get "http://www.bjs.com/tv--electronics.category.3000000000000144985.2002193"
html.body.innerHTML = .ExecuteScript("return document.body.innerHTML;")
.Quit
End With
For Each post In html.getElementsByClassName("name")
x = x + 1: Cells(x, 1) = post.innerText
Next post
End Sub
I am new to VBA.
I am trying to use the code below, by David Zemens, to fetch data from Yelp:
Option Explicit
Private Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim Listings As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 200
Loop
Set html = .Document
End With
Set Listings = html.getElementsByTagName("LI") ' ## returns the list
For Each l In Listings
'## make sure this list item looks like the listings Div Class:
' then, build the string to put in your cell
If InStr(1, l.innerHTML, "media-block clearfix media-block-large main-attributes") > 0 Then
Range("A1").Offset(r, 0).Value = l.innerText
r = r + 1
End If
Next
Set html = Nothing
Set ie = Nothing
End Sub
Problem is that it's not getting any data from the source.
Regards
There's a lot of work to be done.
Here's something that you can start with. Hopefully, you will be able to find the other pieces of information using the same logic. This will print business names in the immediate window. I've found the business names in meta tag description.
I've changed the sleep amount to 5 seconds so that IE can fully load and the rest of the code runs reliably. The initial 200 milliseconds gave results only once every couple of runs. This probably depends on how fast your computer is, so 5 seconds should be a fairly safe value.
Option Explicit
Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Sub find()
'Uses late binding, or add reference to Microsoft HTML Object Library
' and change variable Types to use intellisense
Dim returnstring As String 'this is going to hold boutiques names
Dim ie As Object 'InternetExplorer.Application
Dim html As Object 'HTMLDocument
Dim meta As Object 'IHTMLElementCollection
Dim l As Object 'IHTMLElement
Dim r As Long
Set ie = CreateObject("InternetExplorer.Application")
With ie
.Visible = False
.Navigate "http://www.yelp.com/search?find_desc=boutique&find_loc=New+York%2C+NY&ns=1&ls=3387133dfc25cc99#start=10"
' Don't show window
'Wait until IE is done loading page
Do While .readyState <> 4
Application.StatusBar = "Downloading information, Please wait..."
DoEvents
Sleep 5000
Loop
Set html = .Document
End With
Set meta = html.GetElementsByTagName("META") ' ## returns the meta elements
Dim m As Object
For Each m In meta
If InStr(m.Content, "Reviews on Boutique in New York -") > 0 Then
returnstring = Replace(m.Content, "Reviews on Boutique in New York -", "")
End If
Next
Dim i As Integer
For i = 0 To UBound(Split(returnstring, ","))
Debug.Print (Split(returnstring, ",")(i))
Next
Set html = Nothing
Set ie = Nothing
End Sub
I have a problem with fetching data from an internal web-based data service (Cognos).
Basically, I put together a GET request like "blah.com/cognosapi.dll?product=xxx&date=yyy...", send it to the server, and receive a web page that I can store as HTML and parse into my Excel form later.
I built a VBA program which worked quite well in the past, but the web service changed and now displays a "your report is running" page in between that lasts from 1 to 30 seconds. So when I call my function, I always download this "your report is running" page instead of the data. How can I catch the page that automatically loads after the "report is running" page?
This is the DownloadFile Function with the GETstring and the target path as parameters.
Public Function DownloadFile(sSourceUrl As String, _
sLocalFile As String) As Boolean
Dim HttpReq As Object
Set HttpReq = CreateObject("MSXML2.XMLHTTP")
Dim HtmlDoc As New MSHTML.HTMLDocument
HttpReq.Open "GET", sSourceUrl, False
HttpReq.send
If HttpReq.Status = 200 Then
HttpReq.getAllResponseHeaders
HtmlDoc.body.innerHTML = HttpReq.responseText
Debug.Print HtmlDoc.body.innerHTML
End If
'Download the file. BINDF_GETNEWESTVERSION forces
'the API to download from the specified source.
'Passing 0& as dwReserved causes the locally-cached
'copy to be downloaded, if available. If the API
'returns ERROR_SUCCESS (0), DownloadFile returns True.
DownloadFile = URLDownloadToFile(0&, _
sSourceUrl, _
sLocalFile, _
BINDF_GETNEWESTVERSION, _
0&) = ERROR_SUCCESS
End Function
Thanks
David
You finally gave me the last piece I needed to solve my problem. I baked the code into my DownloadFile function so that it stays with the IE object until the end and then closes it.
One error I found was that the readyState should be polled before anything is done with the HTML object.
Public Function DownloadFile(sSourceUrl As String, _
sLocalFile As String) As Boolean
Dim IE As InternetExplorer
Set IE = New InternetExplorer
Dim HtmlDoc As New MSHTML.HTMLDocument
Dim collTables As MSHTML.IHTMLElementCollection
Dim collSpans As MSHTML.IHTMLElementCollection
Dim objSpanElem As MSHTML.IHTMLSpanElement
Dim fnum As Integer
With IE
'May be changed to False if you don't want to see the browser window
.Visible = True
.Navigate (sSourceUrl)
'this waits for the page to be loaded
Do Until .readyState = 4: DoEvents: Loop
End With
'Set HtmlDoc = wait_for_html(sSourceUrl, "text/css")
Do
Set HtmlDoc = IE.Document
'searching for the "Span" tag
Set collSpans = HtmlDoc.getElementsByTagName("span")
'first Span element contains...
Set objSpanElem = collSpans(0)
'... this text while the loading screen is displayed
Loop Until Not objSpanElem.innerHTML = "Your report is running."
'just grab the tables and leave the rest
Set collTables = HtmlDoc.getElementsByTagName("table")
fnum = FreeFile()
Open sLocalFile For Output As fnum ' save the file and add html and body tags
Print #fnum, "<html>"
Print #fnum, "<body>"
Print #fnum, collTables(15).outerHTML 'title
Print #fnum, collTables(17).outerHTML 'Date
Print #fnum, collTables(18).outerHTML 'Part, Operation etc.
Print #fnum, collTables(19).outerHTML 'Measurements
Print #fnum, "</body>"
Print #fnum, "</html>"
Close #fnum
IE.Quit 'close Explorer
DownloadFile = True
End Function
Since you're using a GET request, I'm assuming any required parameters can be provided in the URL string. In that case, you might be able to use InternetExplorer.Application, which should automatically update its Document property whenever the page refreshes. You could then set up a loop which periodically checks for some value (tag text, URL, etc...) that's unique to the desired page.
Here's a sample which loads a URL, then waits until the page's <title> tag is the desired value.
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Function wait_for_html(strURL As String, strDesiredText As String) As String
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Navigate (strURL)
While IE.ReadyState <> 4
Sleep 10
Wend
Dim objHtml As MSHTML.HTMLDocument
Dim collTitle As MSHTML.IHTMLElementCollection
Dim objTitleElem As MSHTML.IHTMLTitleElement
Do
Sleep 1000
Set objHtml = IE.Document
Set collTitle = objHtml.getElementsByTagName("title")
Set objTitleElem = collTitle(0)
Loop Until objTitleElem.Text = strDesiredText
wait_for_html = objHtml.body.innerHTML
End Function
The above needs references to Microsoft Internet Controls and Microsoft HTML Object Library.
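One caveat with a polling loop like this is that it never ends if the desired page never arrives. A minimal late-bound sketch of the same idea with a deadline is shown below. WaitForTitle and its maxSeconds parameter are hypothetical, and the caller is assumed to have already created the browser and called Navigate.

```vba
Option Explicit

' Hypothetical variant: polls until the page title equals desiredTitle
' or maxSeconds have elapsed. Returns True on a match, False on timeout.
Function WaitForTitle(ByVal ie As Object, ByVal desiredTitle As String, _
                      ByVal maxSeconds As Long) As Boolean
    Dim deadline As Date
    deadline = DateAdd("s", maxSeconds, Now)

    Do While Now < deadline
        ' 4 = READYSTATE_COMPLETE
        If ie.readyState = 4 Then
            ' Read the document fresh on each pass, because the
            ' document object is replaced when the page refreshes
            If ie.document.Title = desiredTitle Then
                WaitForTitle = True
                Exit Function
            End If
        End If
        DoEvents
    Loop
End Function
```

A caller could then test If WaitForTitle(IE, "My Report", 60) Then ... and report a timeout otherwise; the title "My Report" is only a placeholder.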