How to check for Ajax loading while scraping using VBA - vba

Private Sub CommandButton1_Click()
Dim webpage As String
webpage = GetWebpage("http://www.oddsportal.com/soccer/germany/bundesliga-2011-2012/b-moenchengladbach-bayer-leverkusen-806581/")
Debug.Print webpage
Sheet1.Cells(12, 1) = webpage
End Sub
Function GetWebpage(url As String, Optional fileName As String) As String
Dim xml As Object ' MSXML2.XMLHTTP
Dim result As String
Set xml = GetMSXML
' grab webpage
With xml
.Open "GET", url, True
.send
End With
GetWebpage = xml.responseText
' write to file?
If Len(fileName) > 0 Then
If Not FileExists(fileName) Then
Call CreateFile(fileName, GetWebpage)
Else ' file exists
If MsgBox("File already exists, overwrite?", vbYesNo) = vbYes Then
Call CreateFile(fileName, GetWebpage)
End If
End If
End If
End Function
Function GetMSXML() As Object
On Error Resume Next
Set GetMSXML = CreateObject("MSXML2.XMLHTTP.6.0")
End Function
Sub CreateFile(fileName As String, contents As String)
' create file from string contents
Dim tempFile As String
Dim nextFileNum As Long
nextFileNum = FreeFile
tempFile = fileName
Open tempFile For Output As #nextFileNum
Print #nextFileNum, contents
Close #nextFileNum
End Sub
Function FileExists(fileName As String) As Boolean
FileExists = (Len(Dir(fileName)) > 0)
End Function
This the code I am using, it works well for static or non ajax sites, but in case of ajax the content is missing.

If the question is how to check what ajax requests a web page is making, you can look in the network tab of your browser's developer tools area.

Related

VBA Error 52 on a function that tests if a file exists

I'm trying to pull text from a bunch of XML files into Word. I'm working from a list of files and have found that some of them don't actually exist in the folder. So, I'm using this function to check whether the files actually exist before opening them. But I'm still getting error 52 (Bad file name or number).
This is the function:
Function FileThere(FileName As String) As Boolean
FileThere = (Dir(FileName) > "")
End Function
And this is the code I'm calling it from:
Sub PullContent()
Dim docList As Document
Dim docCombinedFile As Document
Dim objFileListTable As Table
Dim objRow As Row
Dim strContent As String
Dim strFileCode As String
'Code # for the current file. (Pulled in temporarily, output to the Word doc.)
Dim strFilename As String
'Name of XML file. Created based on strFileCode
Set docCombinedFile = Documents.Add
'The new doc which will list all warnings
Dim strXml As String
'String variable that holds the entire content of the data module
Dim strInvalidCodes
'String listing any invalid file codes. Displayed at the end.
Dim FSO As Object: Set FSO = CreateObject("Scripting.FileSystemObject")
Documents.Open FileName:="C:\Users\kelly.keck\Documents\Triton MTS\IETMs - Test\IETMList.docx"
Set docList = Documents("IETMList.docx")
Set objFileListTable = docList.Tables(1)
For Each objRow In objFileListTable.Rows
strFileCode = objRow.Cells(4).Range.Text
strFileCode = Left(strFileCode, Len(strFileCode) - 2)
strFilename = strFileCode & ".xml"
strPath = "C:\Applications\xml\"
If FileThere(strPath & strFileCode) = True Then
'MsgBox (strPath & strFilename)
strXml = FSO.OpenTextFile(strPath & strFilename).ReadAll
Else
strInvalidCodes = strInvalidCodes & vbCr & strFileCode
End If
Next
MsgBox ("The following filenames were invalid: " & vbCr & strInvalidCodes)
End Sub
Getting this error seems to defeat the purpose of having a function to check if a file exists, but I'm not sure what's wrong with the function.
A bit late to the party, but this hasn't had an accepted answer yet.
I generally use this method to test if a file exists, and your code uses FileSystemObject already so could use that reference.
Public Function FileExists(ByVal FileName As String) As Boolean
Dim oFSO As Object
Set oFSO = CreateObject("Scripting.FileSystemObject")
FileExists = oFSO.FileExists(FileName)
End Function
I believe that you need to be sure that FileThere is actually returning the Boolean value you intend. It would be more reliable if you checked the Len property (the number of characters) or checked whether it actually returns the empty string.
The following is more verbose than absolutely necessary in order to make the logic clear. If you were to use Len, instead, then you'd check Len(Dir(FileName)) > 0
Function FileThere(FileName as String) as Boolean
Dim bFileExists as Boolean
If Dir(FileName) = "" Then
bFileExists = False
Else
bFileExists = True
End If
FileThere = bFileExists
End Function

html file save restore from vba

I want to re-use a snapshot of a web response for testing an app that needs to do some web-scraping. What I tried to do is just save the response (from Chrome) and reload the the string from the file:
doc.body.innerHtml = StringFromFile
This doesn't work though, although it looks like good html. By not work, I mean data that is in tag "Table"(6) when going through the web is not found. Is there a better way to load the html doc?
The code below is an attempt to both save an existing doc to file and then reuse it. Its worthless but maybe it will help someone set me straight on this.
Cheers
Private Function GetEWhipersTestHtmlDoc() As HTMLDocument
Dim doc As HTMLDocument
Set doc = New HTMLDocument
Dim sText As String
sText = GetStringFromFile(GetFileName("UnconfirmedRelease"))
doc.body.innerHTML = sText
Set GetEWhipersTestHtmlDoc = doc
End Function
Private Function GetFileName(testName As String) As String
GetFileName = ThisWorkbook.path & Application.PathSeparator & _
"Earnings Whispers Test Scenarios" & Application.PathSeparator & testName & ".txt"
Debug.Assert Dir(GetFileName) <> ""
End Function
Private Function SaveHtmlStringToFile(testName As String, sInnerHtml As String) As String
Dim fso As Object
Dim oFile As Object
Dim sPath As String
Set fso = CreateObject("Scripting.FileSystemObject")
sPath = GetFileName(testName)
Set oFile = fso.CreateTextFile(sPath)
oFile.WriteLine sInnerHtml
oFile.Close
End Function
** UPDATE **
Saving doc.body.outerHtml seems to be an upgrade from what I had. The text can turned into the web page using 'code snippet'. I am getting an error when trying to put the saved text back into a new document though:
Err 600, Application-defined or object-defined error
Private Function GetEWhipersTestHtmlDoc() As HTMLDocument
Dim doc As New HTMLDocument
Dim sText As String
' Error Handling
On Error GoTo ErrHandler
sText = GetStringFromFile(GetFileName("UnconfirmedRelease"))
doc.body.outerHTML = sText <---- ** ERROR is Here
Set GetEWhipersTestHtmlDoc = doc
Exit Function
ErrHandler:
Select Case DspErrMsg("blah")
Case Is = vbAbort: Stop: Resume 'Debug mode - Trace
Case Is = vbRetry: Resume 'Try again
Case Is = vbIgnore: 'End routine
End Select
End Function
final update
thanks to Tim and David I got something usable. The only hair of Tim's final solution is that HtmlDocument.Write is restricted as far as VBA is concerned. So to 'fool' the compiler, I needed to declare it as an Object:
Dim doc As Object <--- don't let vba know we want to write to HTMLDoc
Set doc = New HTMLDocument
Dim sText As String
sText = GetStringFromFile(GetFileName("UnconfirmedRelease"))
doc.Open
doc.Write sText <-- no intellisense, but compiles...and works!
doc.Close

VBA: Sub to Write to a Log File

I have a set of macros defined in my workbook, and I'd like to offer the user the option to log events related to those macros in a log file.
I initiate the log by creating the following in ThisWorkbook:
Public writeLog as Boolean
Public logWrite as Object
Public log as Object
Private Sub Worksheet_Open()
Dim prompt as Integer
prompt = MsgBox("Would you like to log events for this session?", vbYesNo, "Log Events?")
If prompt Then
writeLog = True
Set logWrite = CreateObject("Scripting.FileSystemObject")
Set log = logWrite.CreateTextFile("C:/TEST.txt", False)
Else
writeLog = False
End If
End Sub
I then created a procedure that I can use to write an argument to this object, which I've stored in its own module:
Public Sub PrintLog(obj as Object, argument as String)
If writeLog = True Then
obj.WriteLine argument
End If
End Sub
Unfortunately, this doesn't work, and I'm not sure why: even if I don't include obj as an argument to the function (since log and logWrite were created as global variables), I'm not able to Call WriteLog("String here.") or Call WriteLog(log, "String here.") without an error (Compile Error: Argument Not Optional.)
Is it possible to get such a Sub() to work, so that I can call it from anywhere in the workbook (after a button is pressed in a userform, for example) without having to define a new Scripting.FileSystemObject in every module?
I think that you can solve your problem by making some minor changes to your code. I tried the following setup:
logger module:
Option Explicit
Private log As Object
Public Sub initLog()
Dim prompt As VbMsgBoxResult
Dim fso As Object
prompt = MsgBox("Would you like to log events for this session?", vbYesNo, "Log Events?")
If prompt = vbYes Then
Set fso = CreateObject("Scripting.FileSystemObject")
Set log = fso.CreateTextFile("C:/TEST.txt", False)
End If
End Sub
Public Sub PrintLog(argument As String)
If Not log Is Nothing Then
log.WriteLine argument
End If
End Sub
Public Sub yadda()
'test
PrintLog "yadda"
End Sub
ThisWorkbook:
Private Sub Workbook_Open()
initLog
End Sub
This is my no-frills drop in replacement for Debug.Print(), that logs to "Log.txt" at your Workbook path.
To install : Just search and replace "Debug.Print" with "Log", and optionally call LogClear() at the start of your program.
Public Function Log(ByRef a_stringLogThis As String)
' send to TTY
Debug.Print (a_stringLogThis)
' append (not write) to disk
Open ThisWorkbook.path & "\Log.txt" For Append As #1
Print #1, a_stringLogThis
Close #1
End Function
OPTIONAL : And here's a helper you COULD call at the beginning of your to clear out the previous logs.
Public Function LogClear()
Debug.Print ("Erasing the previous logs.")
Open ThisWorkbook.path & "\Log.txt" For Output As #1
Print #1, ""
Close #1
End Function
OPTIONAL : Finally, if can't live without date and time in your logging, use this Log statement instead:
Public Function Log(ByRef a_stringLogThis As String)
' prepare date
l_stringDateTimeNow = Now
l_stringToday = Format(l_stringDateTimeNow, "YYYY-MM-DD hh:mm:ss")
' concatenate date and what the user wants logged
l_stringLogStatement = l_stringToday & " " & a_stringLogThis
' send to TTY
Debug.Print (l_stringLogStatement)
' append (not write) to disk
Open ThisWorkbook.path & "\Log.txt" For Append As #1
Print #1, l_stringLogStatement
Close #1
End Function
I believe you're having issues as writeLog already exists as a boolean. Error should be popping up "Ambiguous name detected"
Try the following,
Public bLog as Boolean
Public logWrite as Object
Public log as Object
Private Sub Worksheet_Open()
Dim prompt as Integer
prompt = MsgBox("Would you like to log events for this session?", vbYesNo, "Log Events?")
If prompt Then
bLog = True
Set logWrite = CreateObject("Scripting.FileSystemObject")
Set log = logWrite.CreateTextFile("C:/TEST.txt", False)
Else
bLog = False
End If
End Sub
Public Sub WriteLog(Optional obj as Object, Optional argument as String)
If bLog = True Then
obj.WriteLine argument
End If
End Sub
Edit: made parameters optional in WriteLog (or PrintLog) for further testing
' Write to a log file using Separator and Array of variant Parameters
' Auto generate the file
' USE EndLog to close
'use:
' PrintLog vbtab, "one", 2, 3
' PrintLog vbtab, "Apple","Windows","Linux","Android","Commodore","Amiga","Spectrum"
' EndLog
' Generate a csv file:
' PrintLog ";", rst!ID, rst!Name
Private FileLog As Object
Private fso As Object
Const DEBUG_LOG_FILE = "C:\log.txt"
Public Sub PrintLog(ByVal Separator As String, ParamArray Arguments() As Variant)
Dim ele As Variant
Dim line As String
If FileLog Is Nothing Then
Set fso = CreateObject("Scripting.FileSystemObject")
Set FileLog = fso.CreateTextFile(DEBUG_LOG_FILE, True, True)
End If
line = CStr(Now()) ' Print Timestamp
For Each ele In Arguments
If line > "" Then line = line & Separator
line = line & CStr(ele)
Next
If line > "" Then FileLog.WriteLine line
End Sub
Public Sub EndLog()
On Error Resume Next
FileLog.Close
Set FileLog = Nothing
Set fso = Nothing
On Error GoTo 0
End Sub

ExcelVBA - HttpReq via MSXML2.XMLHTTP - fetch page after loading page

i have a problem with fetching data from an internal web based Dataservice (cognos).
Basically i put together a GET request like "blah.com/cognosapi.dll?product=xxx&date=yyy...", send it to the server and receive a webpage that i can store as HTML and parse into my excel form later.
I build a VBA program which worked quite well in the past, but the webservice changed an now they are displaying a "your report is running" page in between that lasts from 1sec to 30sec. So when i call my function i always download this "your report is running" page insteat of the data. How can i catch the page that automatically loads up after the "report is running" page?
This is the DownloadFile Function with the GETstring and the target path as parameters.
Public Function DownloadFile(sSourceUrl As String, _
sLocalFile As String) As Boolean
Dim HttpReq As Object
Set HttpReq = CreateObject("MSXML2.XMLHTTP")
Dim HtmlDoc As New MSHTML.HTMLDocument
HttpReq.Open "GET", sSourceUrl, False
HttpReq.send
If HttpReq.Status = 200 Then
HttpReq.getAllResponseHeaders
HtmlDoc.body.innerHTML = HttpReq.responseText
Debug.Print HtmlDoc.body.innerHTML
End If
'Download the file. BINDF_GETNEWESTVERSION forces
'the API to download from the specified source.
'Passing 0& as dwReserved causes the locally-cached
'copy to be downloaded, if available. If the API
'returns ERROR_SUCCESS (0), DownloadFile returns True.
DownloadFile = URLDownloadToFile(0&, _
sSourceUrl, _
sLocalFile, _
BINDF_GETNEWESTVERSION, _
0&) = ERROR_SUCCESS
End Function
Thanks
David
finally you gave me the final link to solve my problem. I baked the code into my DownloadFile Function to stay with the IE Object until the end and then close it.
One Error i found is was that the readystate should be polled before anything is done with the HTMLObject.
Public Function DownloadFile(sSourceUrl As String, _
sLocalFile As String) As Boolean
Dim IE As InternetExplorer
Set IE = New InternetExplorer
Dim HtmlDoc As New MSHTML.HTMLDocument
Dim collTables As MSHTML.IHTMLElementCollection
Dim collSpans As MSHTML.IHTMLElementCollection
Dim objSpanElem As MSHTML.IHTMLSpanElement
Dim fnum As Integer
With IE
'May changed to "false if you don't want to see browser window"
.Visible = True
.Navigate (sSourceUrl)
'this waits for the page to be loaded
Do Until .readyState = 4: DoEvents: Loop
End With
'Set HtmlDoc = wait_for_html(sSourceUrl, "text/css")
Do
Set HtmlDoc = IE.Document
'searching for the "Span" tag
Set collSpans = HtmlDoc.getElementsByTagName("span")
'first Span element cotains...
Set objSpanElem = collSpans(0)
'... this if loading screen is display
Loop Until Not objSpanElem.innerHTML = "Your report is running."
'just grab the tables and leave the rest
Set collTables = HtmlDoc.getElementsByTagName("table")
fnum = FreeFile()
Open sLocalFile For Output As fnum ' save the file and add html and body tags
Print #fnum, "<html>"
Print #fnum, "<body>"
Print #fnum, collTables(15).outerHTML 'title
Print #fnum, collTables(17).outerHTML 'Date
Print #fnum, collTables(18).outerHTML 'Part, Operation etc.
Print #fnum, collTables(19).outerHTML 'Measuerements
Print #fnum, "</body>"
Print #fnum, "</html>"
Close #fnum
IE.Quit 'close Explorer
DownloadFile = True
End Function
Since you're using a GET request, I'm assuming any required parameters can be provided in the URL string. In that case, you might be able to use InternetExplorer.Application, which should automatically update its Document property whenever the page refreshes. You could then set up a loop which periodically checks for some value (tag text, URL, etc...) that's unique to the desired page.
Here's a sample which loads a URL, then waits until the page's <title> tag is the desired value.
Public Declare Sub Sleep Lib "kernel32" (ByVal dwMilliseconds As Long)
Function wait_for_html(strURL as String, strDesiredText as String) as String
Dim IE As InternetExplorer
Set IE = New InternetExplorer
IE.Navigate (strURL)
While IE.ReadyState <> 4
Sleep 10
Wend
Dim objHtml As MSHTML.HTMLDocument
Dim collTitle As MSHTML.IHTMLElementCollection
Dim objTitleElem As MSHTML.IHTMLTitleElement
Do
Sleep 1000
Set objHtml = IE.Document
Set collTitle = objHtml.getElementsByTagName("title")
Set objTitleElem = collTitle(0)
Loop Until objTitleElem.Text = strDesiredText
wait_for_html = objHtml.body.innerHTML
End Function
The above needs references to Microsoft Internet Controls and Microsoft HTML Object Library.

SQL "%" equivalent in VBA

Is there any SQL equivalent of "%" sign in VBA?
I need to return a few files just with some characters in the middle.
Help really appreciated!
For instance here is my code: I need to download all file that has in the name 2013 from that webpage and save and call them differently. Is this mission possible?
Sub Sample()
Dim strURL As String
Dim strPath As String
Dim i As Integer
strURL = "http://cetatenie.just.ro/wp-content/uploads/Ordin-********.2013.pdf"
strPath = "C:\Documents and Settings\ee28118\Desktop\178.pdf"
Ret = URLDownloadToFile(0, strURL, strPath, 0, 0)
If Ret = 0 Then
MsgBox "File successfully downloaded"
Else
MsgBox "Unable to download the file"
End If
End Sub
You can use the Like Operator.
Characters in pattern Matches in string
? Any single character.
* Zero or more characters.
# Any single digit (0–9).
[charlist] Any single character in charlist.
[!charlist] Any single character not in charlist
Example :
Dim MyCheck
MyCheck = "aBBBa" Like "a*a" ' Returns True.
MyCheck = "F" Like "[A-Z]" ' Returns True.
MyCheck = "F" Like "[!A-Z]" ' Returns False.
MyCheck = "a2a" Like "a#a" ' Returns True.
MyCheck = "aM5b" Like "a[L-P]#[!c-e]" ' Returns True.
MyCheck = "BAT123khg" Like "B?T*" ' Returns True.
MyCheck = "CAT123khg" Like "B?T*" ' Returns False.
When you navigate to the uploads folder, you get a directory listing of all the files in it. You can loop through the hyperlinks on that listing and test each to see if it meets your criterion and, if so, download it. You need a reference to MSXML and MSHTML. Here's an example.
Sub Sample()
Dim sUrl As String
Dim xHttp As MSXML2.XMLHTTP
Dim hDoc As MSHTML.HTMLDocument
Dim hAnchor As MSHTML.HTMLAnchorElement
Dim Ret As Long
Dim sPath As String
Dim i As Long
sPath = "C:\Documents and Settings\ee28118\Desktop\"
sUrl = "http://cetatenie.just.ro/wp-content/uploads/"
'Get the directory listing
Set xHttp = New MSXML2.XMLHTTP
xHttp.Open "GET", sUrl
xHttp.send
'Wait for the page to load
Do Until xHttp.readyState = 4
DoEvents
Loop
'Put the page in an HTML document
Set hDoc = New MSHTML.HTMLDocument
hDoc.body.innerHTML = xHttp.responseText
'Loop through the hyperlinks on the directory listing
For i = 0 To hDoc.getElementsByTagName("a").Length - 1
Set hAnchor = hDoc.getElementsByTagName("a").Item(i)
'test the pathname to see if it matches your pattern
If hAnchor.pathname Like "Ordin-*.2013.pdf" Then
Ret = UrlDownloadToFile(0, sUrl & hAnchor.pathname, sPath, 0, 0)
If Ret = 0 Then
Debug.Print sUrl & hAnchor.pathname & " downloaded to " & sPath
Else
Debug.Print sUrl & hAnchor.pathname & " not downloaded"
End If
End If
Next i
End Sub
Edit
I assumed that URLDownloadToFile was already written. I didn't write one, I just used the below function to test the code that iterates through the files. You can use it to make sure the above code works for you, but you'll need to write the actual code to download the file eventually. With all the arguments to URLDownloadToFile, I'm surprised it doesn't exist already.
Function UrlDownloadToFile(lNum As Long, sUrl As String, sPath As String, lNum1 As Long, lNum2 As Long) As Long
UrlDownloadToFile = 0
End Function
Try below code : The boolean function would return true if the string has the string 2013 in it.
Sub Sample()
Dim result As Boolean
result = has2013("http://cetatenie.just.ro/wp-content/uploads/Ordin-********.2013.pdf")
Debug.Print result
result = has2013("http://cetatenie.just.ro/wp-content/uploads/Ordin-********.2014.pdf")
Debug.Print result
End Sub
Function has2013(lnk As String) As Boolean
has2013 = lnk Like "*2013*"
End Function
in VBA use the LIKE function with wildcard characters:
here is an example (copied from Ozgrid Forums)
Dim sht As Worksheet
For Each sht In ActiveWorkbook.Worksheets
If sht.Name Like "FRI*" Then
'Add code for Friday sheets
Else
If sht.Name Like "MON*" Then
'Add code for Monday sheets
End If
End If
Next
The multiplication character * takes the place of zero or more characters, whereas ? takes the place of exactly 1 character, and # takes the place of 1 number. There are other more specific char. matching strategies if you only want to match certain characters.
so there you go!
Also, you could take a look at Ozgrid Forums: Using Regular Expressions in VBA
To get a list of the files on the server, read up on FTP (using DIR) at Mr Excel - List files using FTP