Print/Import all web page source data using vba - vba

I have the code below, which imports only part of the page source into the sheet. I want all of the source code, exactly as it is.
Sub GetSourceCode()
Dim ie As Object
Dim str As String
Dim arr
str = Sheets("sheet2").Range("I1").Value
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.Navigate "https://tiweb.industrysoftware.automation.com/prdata/cgi-bin/n_prdata_index.cgi?"
ie.Visible = False
Do Until ie.ReadyState = 4
DoEvents
Loop
ie.Document.getelementsbyname("pr_numbers")(0).Value = str
Application.SendKeys ("~")
Do Until ie.ReadyState = 4
DoEvents
Loop
Worksheets("Download_PRdata2").Activate
arr = Split(ie.Document.body.outertext)
Worksheets("Download_PRdata2").Activate
ActiveSheet.Range("A1").Resize(UBound(arr) + 1, 1).Value = Application.Transpose(arr)
End Sub

You can refer to the code below:
' Fetch Entire Source Code
Private Sub HTML_VBA_Excel()
Dim oXMLHTTP As Object
Dim sPageHTML As String
Dim sURL As String
'Change the URL before executing the code
sURL = "http://www.google.com"
'Extract data from website to Excel using VBA
Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
oXMLHTTP.Open "GET", sURL, False
oXMLHTTP.send
sPageHTML = oXMLHTTP.responseText
'Get webpage data into Excel
' If the source code is long, save it to an external text file or elsewhere instead,
' since an Excel cell can hold at most 32,767 characters
ThisWorkbook.Sheets(1).Cells(1, 1) = sPageHTML
MsgBox "XMLHTML Fetch Completed"
End Sub
Source : http://www.vbausefulcodes.in/usefulcodes/get-data-or-source-code-from-webpage-using-excel-vba.php
Hope this will be useful to you!
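If the source is too long for a single cell, another option is to write it to the sheet one line per row. The following is only a minimal sketch, assuming the page source contains ordinary line breaks; SplitSourceLines and WriteSourceToSheet are hypothetical names that do not appear in the answer above.

```vba
Option Explicit

' Hypothetical helper: normalise line endings and split the source into lines.
Function SplitSourceLines(ByVal sPageHTML As String) As String()
    sPageHTML = Replace(sPageHTML, vbCrLf, vbLf)
    sPageHTML = Replace(sPageHTML, vbCr, vbLf)
    SplitSourceLines = Split(sPageHTML, vbLf)
End Function

' Hypothetical helper: write one source line per row into column A of ws.
Sub WriteSourceToSheet(ByVal sPageHTML As String, ByVal ws As Worksheet)
    Dim arrLines() As String
    Dim i As Long

    arrLines = SplitSourceLines(sPageHTML)
    ' Format the column as text so lines starting with "=" are not read as formulas.
    ws.Columns(1).NumberFormat = "@"
    For i = LBound(arrLines) To UBound(arrLines)
        ' A cell holds at most 32,767 characters; longer lines are truncated here.
        ws.Cells(i + 1, 1).Value = Left$(arrLines(i), 32767)
    Next i
End Sub
```

It could be called in place of the single-cell assignment, for example as WriteSourceToSheet sPageHTML, ThisWorkbook.Sheets(1).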

You can save the source code to a text file like this. Replace the line ThisWorkbook.Sheets(1).Cells(1, 1) = sPageHTML with the call below:
Createtextfile (sPageHTML)
Then add the following Sub after End Sub:
Sub Createtextfile(sPageHTML)
Dim fso As Object
Set fso = CreateObject("Scripting.FileSystemObject")
Dim oFile As Object
Dim strPath As String
strPath = "E:\test.txt"
Set oFile = fso.Createtextfile(strPath)
oFile.WriteLine sPageHTML
oFile.Close
Set fso = Nothing
Set oFile = Nothing
End Sub
Change the path to wherever you want to save the file.

Related

Internet Explorer VBA Automation Error: The object Invoked has disconnected from its clients

I'm trying to write code that will read a value from Excel, look it up in an internal web-based system and store the results back in Excel. It reads the Excel file with no problem, opens Internet Explorer with no problem, but when I then try to reference what's been opened, I get the above error. The line "ie.Navigate url" works, but the next line "Set DOC = ie.Document" generates the error. Any ideas on what's causing this? Here's my code:
Public Sub getClient()
Dim xOpen As Boolean
xOpen = False
Dim row As Long
Dim xL As Excel.Application
Set xL = New Excel.Application
xL.Visible = False
Dim wb As Excel.Workbook
Dim sh As Excel.Worksheet
'Change the name as needed, out put in some facility to input it or
'process multiples...
Dim filename As String
filename = "auditLookup.xlsx"
Set wb = xL.Workbooks.Open(getPath("Audit") + filename)
xOpen = True
Set sh = wb.Sheets(1)
Dim ie As Variant
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
Dim DOC As HTMLDocument
Dim idx As Integer
Dim data As String
Dim links As Variant
Dim lnk As Variant
Dim iRow As Long
iRow = 2 'Assume headers
Dim clientName As String
Dim clientID As String
Dim nameFound As Boolean
Dim idFound As Boolean
Dim url As String
While sh.Cells(iRow, 1) <> ""
'Just in case these IDs are ever prefixed with zeroes, I'm inserting
'some random character in front, but removing it of course when
'processing.
url = "https://.../" + mid(sh.Cells(iRow, 1), 2)
ie.navigate url
Set DOC = ie.Document
'Search td until we find "Name:" then the next td will be the name.
'Then search for "P1 ID (ACES):" and the next td with be that.
Set links = DOC.getElementsByTagName("td")
clientName = ""
clientID = ""
nameFound = False
idFound = False
For Each lnk In links
data = lnk.innerText
If nameFound Then
clientName = data
ElseIf idFound Then
clientID = data
End If
If nameFound And idFound Then
Exit For
End If
If data = "Name:" Then
nameFound = True
ElseIf data = "P1 ID (ACES):" Then
idFound = True
End If
Next
sh.Cells(iRow, 2) = clientName
sh.Cells(iRow, 3) = clientID
iRow = iRow + 1
Wend
Set ie = Nothing
If xOpen Then
wb.Save
Set wb = Nothing
xL.Quit
Set xL = Nothing
Set sh = Nothing
xOpen = False
End If
Exit Sub
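Changing the declaration to:
Dim ie As InternetExplorer
Set ie = New InternetExplorerMedium
...
solved the problem (both types need a reference to Microsoft Internet Controls). I also needed to add back the Do loop mentioned in the comments, placed after ie.Navigate: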
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
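A wait loop like this can hang forever if the page never finishes loading. As a hedged sketch only, the same wait could be wrapped in a helper with a timeout; WaitForPage and its timeoutSeconds parameter are hypothetical and not part of the original answer.

```vba
Option Explicit

' Hypothetical helper: wait until IE has finished loading or the timeout expires.
' Returns True if the page loaded, False if the timeout was reached first.
Function WaitForPage(ByVal ie As Object, Optional ByVal timeoutSeconds As Long = 30) As Boolean
    Const READYSTATE_COMPLETE As Long = 4
    Dim deadline As Date

    deadline = DateAdd("s", timeoutSeconds, Now)
    Do While ie.Busy Or ie.ReadyState <> READYSTATE_COMPLETE
        If Now > deadline Then Exit Function   ' WaitForPage keeps its default, False
        DoEvents
    Loop
    WaitForPage = True
End Function
```

In the question's loop it would sit between the two failing lines: call ie.navigate url, then only run Set DOC = ie.Document if WaitForPage(ie) returns True.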

VBA - How to download .xls from website and put data into excel file

I managed to use VBA to get to the point where I'm ready to download an excel file from the web but I'm having trouble figuring out how to actually download that file and put its contents into an excel file I'm working in. Any suggestions? Thanks
Here is the code so far:
Sub GetData()
Dim IE As InternetExplorer
Dim HTMLDoc As HTMLDocument
Dim objElement As HTMLObjectElement
Set IE = New InternetExplorer
With IE
.Visible = True
.Navigate "http://www.housepriceindex.ca/default.aspx"
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: Wend
.Document.getElementById("lnkTelecharger2").Click
While .Busy Or .ReadyState <> READYSTATE_COMPLETE: Wend
Set HTMLDoc = .Document
Set objElement = HTMLDoc.getElementById("txtEmailDisclaimerEN")
objElement.Value = "abc@abc.com"
Set objElement = HTMLDoc.getElementById("lnkAcceptDisclaimerEN")
objElement.Click
' ... Get CSV somehow ...
'.Quit
End With
Set IE = Nothing
End Sub
Try the below code:
Option Explicit
Sub ImportHistoricalDataSheet()
Const SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS = 13056
Const adSaveCreateOverWrite = 2
Dim aBody, sPath
' Download Historical Data xls file via XHR
With CreateObject("MSXML2.XMLHTTP")
'With CreateObject("MSXML2.ServerXMLHTTP")
'.SetOption 2, SXH_SERVER_CERT_IGNORE_ALL_SERVER_ERRORS
.Open "GET", "http://www.housepriceindex.ca/Excel2.aspx?langue=EN&mail=abc%40abc.com", False
.Send
' Get binary response content
aBody = .responseBody
' Retrieve filename from headers and concatenate full path
sPath = ThisWorkbook.Path & "\" & Replace(Split(Split(.GetAllResponseHeaders, "filename=", 2)(1), vbCrLf, 2)(0), "/", "-")
End With
' Save binary content to the xls file
With CreateObject("ADODB.Stream")
.Type = 1
.Open
.Write aBody
.SaveToFile sPath, adSaveCreateOverWrite
.Close
End With
' Open saved workbook
With Workbooks.Open(sPath, , True)
' Get 1st worksheet values to array
aBody = .Worksheets(1).UsedRange.Value
.Saved = True
.Close
End With
' Delete saved workbook file
CreateObject("Scripting.FileSystemObject").DeleteFile sPath, True
' Insert array to target worksheet
ThisWorkbook.Sheets("Sheet1").Cells(1, 1).Resize(UBound(aBody, 1), UBound(aBody, 2)).Value = aBody
End Sub

VBA HTML Scraping - '.innertext' from complex table

All,
I've created the following Module to scrape a single value (1m % change in London house prices) from the below address:
https://www.hometrack.com/uk/insight/uk-cities-house-price-index/
The specific value is nested within the following code:
The below VBA code is my attempt at scraping. I, perhaps wrongly, feel that I am very close to capturing the value - but the code will not work.
Does anyone know where I am going wrong here? It doesn't show an error message but also doesn't output any values.
Sub HousePriceData()
Dim wb As Workbook
Dim ws As Worksheet
Dim TxtRng As Range
Dim ie As Object
Dim V As Variant
Dim myValue As Variant
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.NAVIGATE "https://www.hometrack.com/uk/insight/uk-cities-house-price-index/"
ie.Visible = False
While ie.ReadyState <> 4
DoEvents
Wend
Set wb = ActiveWorkbook
Set ws = wb.Sheets("Input")
Set TxtRng = ws.Range("C15")
Set myValue = ie.document.getElementById("cities-index-table").getElementsByTagName("tr")(7).getElementsByTagName("td")(5)
TxtRng = myValue.innerText
End Sub
Try to use XHR and primitive parsing instead of awkward IE:
Sub Test()
Dim strUrl As String
Dim strTmp As String
Dim arrTmp As Variant
strUrl = "https://www.hometrack.com/uk/insight/uk-cities-house-price-index/"
With CreateObject("MSXML2.XMLHttp")
.Open "GET", strUrl, False
.Send ""
strTmp = .ResponseText
End With
arrTmp = Split(strTmp, ">London</a></td>", 2)
strTmp = arrTmp(1)
arrTmp = Split(strTmp, "<td>", 7)
strTmp = arrTmp(6)
arrTmp = Split(strTmp, "</td>", 2)
strTmp = arrTmp(0)
ThisWorkbook.Sheets("Input").Range("C15").Value = strTmp
End Sub
Try using this:
Dim Engmt As String
Engmt = "ERRORHERE"
On Error Resume Next
Engmt = Trim(ie.document.getElementById("cities-index-table").getElementsByTagName("tr")(12).getElementsByTagName("td")(4).innerText)
On Error GoTo 0
If Engmt = "ERRORHERE" Then
TxtRng.Value = "ERROR"
Else
TxtRng.Value = Engmt
End If

Copy and paste the web page data into a notepad

I need to open an XML file in IE, select all the content on the page (Ctrl+A), copy it (Ctrl+C) and paste it into Notepad. Below is my code, but it is not working.
Dim ie As Object
Dim ieDoc As Object
Dim Data As String
Set ie = CreateObject("InternetExplorer.Application")
ie.navigate "C:\Data\test_10.xml"
ie.Visible = True
Do Until (ie.readyState = 4 And Not ie.Busy)
DoEvents
Loop
SendKeys "^a", True
Application.Wait (5)
SendKeys "^c"
Dim FileNo As Integer
FileNo = FreeFile
Open "C:\Data\Sample.txt" For Output As FileNo
SendKeys "^v", True
Close FileNo
The Open statement doesn't open the Notepad application; it just creates a file handle for input/output to a file from within VBA. If you want the text shown in Notepad, you need to start Notepad separately, much as you create the IE application object (for example with Shell, since Notepad has no automation object of its own).
Also consider avoiding SendKeys altogether. Instead:
- read the data from the IE object into a string variable using the InnerHTML property
- write the string to a flat file using Open / Write
- optionally re-open the text file in the Notepad application
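The last two steps can be sketched roughly as follows. This is an illustration under assumptions, not the answerer's code: WriteStringToFile and SaveAndShowInNotepad are hypothetical names, the output path is taken from the question, and Print # is used rather than Write # because Write # wraps strings in quotation marks.

```vba
Option Explicit

' Hypothetical helper: write a string to a text file with the Open statement.
Sub WriteStringToFile(ByVal contents As String, ByVal filePath As String)
    Dim fileNo As Integer

    fileNo = FreeFile
    Open filePath For Output As #fileNo
    ' The trailing semicolon stops Print from appending an extra line break.
    Print #fileNo, contents;
    Close #fileNo
End Sub

' Hypothetical wrapper: save the text, then open the file in Notepad.
Sub SaveAndShowInNotepad(ByVal contents As String)
    Const OUTPUT_PATH As String = "C:\Data\Sample.txt"   ' path from the question

    WriteStringToFile contents, OUTPUT_PATH
    ' Quote the path in case it contains spaces.
    Shell "notepad.exe """ & OUTPUT_PATH & """", vbNormalFocus
End Sub
```

The string passed in would be whatever was read from the IE document in the first step.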
Try this:
Sub pExtractXMLData()
Dim strURLtoNavigate As String
Dim strHTML As String
strURLtoNavigate = "C:\Data\test_10.xml"
strHTML = UsingXmlParser(strURLtoNavigate)
Call WriteVarToDisk(strHTML, "C:\Data\Sample.txt")
End Sub
Public Function UsingXmlParser(strUrl As String)
Dim objxmlhttp As Object
Set objxmlhttp = CreateObject("MSXML2.XMLHTTP")
objxmlhttp.Open "GET", strUrl, False
objxmlhttp.send
'objxmlhttp.WaitForResponse
UsingXmlParser = objxmlhttp.ResponseText
Set objxmlhttp = Nothing
End Function
Public Sub WriteVarToDisk(vartowrite, FiletoWrite)
On Error Resume Next
Dim fso, MyFile
Set fso = CreateObject("Scripting.FileSystemObject")
Set MyFile = fso.CreateTextFile(FiletoWrite, True)
MyFile.WriteLine (vartowrite)
MyFile.Close
End Sub
Try this. You can open the text file in Excel, do all the work there, and save it back as a text file.
The code below should help.
Sub ImportXMLtoList()
Dim strTargetFile As String
Dim wb As Workbook
Dim dwb As Workbook
Application.ScreenUpdating = False
Application.DisplayAlerts = False
strTargetFile = "C:\Data\test_10.xml"
Set wb = Workbooks.OpenXML(Filename:=strTargetFile, LoadOption:=xlXmlLoadImportToList)
Application.DisplayAlerts = True
wb.Sheets(1).UsedRange.Copy
Set dwb = Workbooks.Open("C:\Data\Sample.txt")
dwb.ActiveSheet.Range("A1").PasteSpecial xlPasteValues
dwb.Close True
wb.Close False
Application.Screenupdating = True
End Sub

Search column for urls, save webpages as individual text files

I have code here that works for a url that is hard coded, and it only works for one url and one text file.
Sub saveUrl_Test()
Dim FileName As String
Dim FSO As Object
Dim ieApp As Object
Dim Txt As String
Dim TxtFile As Object
Dim URL As String
URL = "www.bing.com"
FileName = "C:\mallet\bing.com.txt"
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)
Set ieApp = CreateObject("InternetExplorer.Application")
ieApp.Visible = True
ieApp.Navigate URL
While ieApp.Busy Or ieApp.ReadyState <> 4
DoEvents
Wend
Txt = ieApp.Document.body.innerText
TxtFile.Write Txt
TxtFile.Close
ieApp.Quit
Set ieApp = Nothing
Set FSO = Nothing
End Sub
What I want it to do is search in column B for urls (possibly using InStr(variable, "http://") as a boolean), and then save each webpage as an individual text file. Would there be a way to name the text files using part of the URL strings? Also, is there a way for the webpage not to open, but still save as a text file? Opening the webpages wastes a lot of time.
I created this additional sub based on @MikeD's suggestion, but I get a "Wend without While" error.
Sub url_Test(URL As String, FileName As String)
Dim FSO As Object
Dim ieApp As Object
Dim Txt As String
Dim TxtFile As Object
Set FSO = CreateObject("Scripting.FileSystemObject")
Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)
Set ieApp = CreateObject("InternetExplorer.Application")
ieApp.Visible = True
ieApp.Navigate URL
While ieApp.Busy Or ieApp.ReadyState <> 4
DoEvents
Wend
Txt = ieApp.Document.body.innerText
TxtFile.Write Txt
TxtFile.Close
ieApp.Quit
Set ieApp = Nothing
Set FSO = Nothing
End Sub
Sub LoopOverB()
Dim myRow As Long
myRow = 10
While Cells(myRow, 2).Value <> ""
If InStr(1, Cells(myRow, 2).Value, "http:\\", vbTextCompare) Then Call url_Test(Cells(myRow, 2).Value, "C:\mallet\test\" & Cells(myRow, 1).Value & ".txt")
myRow = myRow + 1
Wend
End Sub
First you could parameterize the sub
Sub saveUrl_param(URL as String, FileName as String)
....
End Sub
and remove the Dim and assignment statements for URL and FileName
Secondly you write another Sub which loops through non-empty cells in column B, retrieving values and conditionally calling the saveUrl_param() routine.
example:
Sub LoopOverB()
Dim C As Range
For Each C In Intersect(ActiveSheet.Columns("B"), ActiveSheet.UsedRange).SpecialCells(xlCellTypeConstants)
' If C = .... Then ' note: URL in [B], filename in [C]
' Call saveUrl_param(C.Value, C(1, 2).Value)
' End If
Next C
End Sub
And no, you can't do it without loading the web page; you somehow have to get the page from the server (or the proxy). This is done by
ieApp.Navigate URL
and the following While ... Wend construct waits until the page is fully loaded into the browser object.
To speed up things you could skip
ieApp.Visible = True
once you have confidence that your Sub is working correctly, and you could move
Dim ieApp As Object ' I would prefer As SHDocVw.InternetExplorer .... don't like late binding
Set ieApp = CreateObject("InternetExplorer.Application")
to the calling sub and hand over the ieApp object to the subroutine as a parameter in order to not open/close the browser again & again.
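Put together, the two suggestions might look like the sketch below. It is only one possible arrangement: FileNameFromUrl, SavePageText and LoopOverB_SharedBrowser are hypothetical names, the output folder is the one from the question, and the check for "http" at the start of the cell is an assumption about what counts as a URL.

```vba
Option Explicit

' Hypothetical helper: derive a file name from a URL,
' e.g. "http://www.bing.com/search?q=1" -> "www.bing.com_search_q=1"
Function FileNameFromUrl(ByVal URL As String) As String
    Dim badChars As Variant, ch As Variant
    Dim pos As Long

    ' Drop the scheme ("http://", "https://") if present.
    pos = InStr(1, URL, "://")
    If pos > 0 Then URL = Mid$(URL, pos + 3)

    ' Replace characters that Windows does not allow in file names.
    badChars = Array("\", "/", ":", "*", "?", """", "<", ">", "|")
    For Each ch In badChars
        URL = Replace(URL, ch, "_")
    Next ch
    FileNameFromUrl = URL
End Function

' Variant of the question's url_Test that reuses an already-open browser.
Sub SavePageText(ByVal ieApp As Object, ByVal URL As String, ByVal FileName As String)
    Dim FSO As Object, TxtFile As Object

    ieApp.Navigate URL
    While ieApp.Busy Or ieApp.ReadyState <> 4
        DoEvents
    Wend
    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)   ' 2 = ForWriting, -1 = Unicode
    TxtFile.Write ieApp.Document.body.innerText
    TxtFile.Close
End Sub

Sub LoopOverB_SharedBrowser()
    Dim ieApp As Object
    Dim rng As Range, C As Range

    Set rng = Intersect(ActiveSheet.Columns("B"), ActiveSheet.UsedRange)
    If rng Is Nothing Then Exit Sub

    ' The browser is created once and left invisible.
    Set ieApp = CreateObject("InternetExplorer.Application")
    For Each C In rng.Cells
        If VarType(C.Value) = vbString Then
            If InStr(1, C.Value, "http", vbTextCompare) = 1 Then
                SavePageText ieApp, C.Value, "C:\mallet\test\" & FileNameFromUrl(C.Value) & ".txt"
            End If
        End If
    Next C
    ieApp.Quit
End Sub
```

Naming the files from the URL also covers the question's request to use part of the URL string as the file name, at the cost of long names for long URLs.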