Search column for urls, save webpages as individual text files - vba

I have code that works for a hard-coded URL, but it only handles one URL and one text file.
Sub saveUrl_Test()
    Dim FileName As String
    Dim FSO As Object
    Dim ieApp As Object
    Dim Txt As String
    Dim TxtFile As Object
    Dim URL As String
    URL = "www.bing.com"
    FileName = "C:\mallet\bing.com.txt"
    Set FSO = CreateObject("Scripting.FileSystemObject")
    ' 2 = ForWriting, True = create the file if missing, -1 = Unicode
    Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)
    Set ieApp = CreateObject("InternetExplorer.Application")
    ieApp.Visible = True
    ieApp.Navigate URL
    ' wait until the page is fully loaded (4 = READYSTATE_COMPLETE)
    While ieApp.Busy Or ieApp.ReadyState <> 4
        DoEvents
    Wend
    Txt = ieApp.Document.body.innerText
    TxtFile.Write Txt
    TxtFile.Close
    ieApp.Quit
    Set ieApp = Nothing
    Set FSO = Nothing
End Sub
What I want it to do is search column B for URLs (possibly using InStr(variable, "http://") as a boolean) and then save each web page as an individual text file. Is there a way to name the text files using part of the URL strings? Also, is there a way to save a page as a text file without the web page opening? Opening the web pages wastes a lot of time.
I created this additional sub based on @MikeD's suggestion, but I get a "Wend without While" error.
Sub url_Test(URL As String, FileName As String)
    Dim FSO As Object
    Dim ieApp As Object
    Dim Txt As String
    Dim TxtFile As Object
    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)
    Set ieApp = CreateObject("InternetExplorer.Application")
    ieApp.Visible = True
    ieApp.Navigate URL
    While ieApp.Busy Or ieApp.ReadyState <> 4
        DoEvents
    Wend
    Txt = ieApp.Document.body.innerText
    TxtFile.Write Txt
    TxtFile.Close
    ieApp.Quit
    Set ieApp = Nothing
    Set FSO = Nothing
End Sub
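Sub LoopOverB()
    Dim myRow As Long
    myRow = 10
    While Cells(myRow, 2).Value <> ""
        ' URLs use forward slashes: "http://", not "http:\\".
        ' Written as a block If, this needs its own End If; a block If without
        ' End If inside While ... Wend is reported as "Wend without While".
        If InStr(1, Cells(myRow, 2).Value, "http://", vbTextCompare) > 0 Then
            Call url_Test(Cells(myRow, 2).Value, "C:\mallet\test\" & Cells(myRow, 1).Value & ".txt")
        End If
        myRow = myRow + 1
    Wend
End Sub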
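On naming each text file after part of its URL: a minimal sketch, assuming you want the host and path with the characters Windows forbids in file names replaced by underscores. UrlToFileName is a hypothetical helper, not part of the code above; you could prefix its result with your folder (for example "C:\mallet\test\") and pass that as the FileName argument instead of the column A value.

```vba
' Hypothetical helper: turn a URL into a file name that Windows accepts.
Public Function UrlToFileName(ByVal url As String) As String
    Dim badChars As Variant
    Dim ch As Variant
    Dim pos As Long

    ' Drop the scheme ("http://", "https://") if there is one
    pos = InStr(1, url, "://")
    If pos > 0 Then url = Mid$(url, pos + 3)

    ' Drop a trailing slash so "bing.com/" and "bing.com" give the same name
    If Right$(url, 1) = "/" Then url = Left$(url, Len(url) - 1)

    ' Replace every character that is not allowed in a Windows file name
    badChars = Array("\", "/", ":", "*", "?", """", "<", ">", "|")
    For Each ch In badChars
        url = Replace(url, ch, "_")
    Next ch

    UrlToFileName = url & ".txt"
End Function
```

For example, UrlToFileName("https://example.com/a/b?x=1") gives example.com_a_b_x=1.txt. Long URLs can still exceed the Windows path length limit, so you may want to truncate the result.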

First, you could parameterize the sub:
Sub saveUrl_param(URL As String, FileName As String)
....
End Sub
and remove the Dim and assignment statements for URL and FileName from its body.
Second, write another Sub that loops through the non-empty cells in column B, reads each value, and conditionally calls the saveUrl_param() routine.
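Example:
Sub LoopOverB()
    Dim C As Range
    ' note: URL in [B], filename in [C]
    For Each C In Intersect(ActiveSheet.Columns("B"), ActiveSheet.UsedRange).SpecialCells(xlCellTypeConstants)
        If LCase$(Left$(C.Value, 4)) = "http" Then
            ' no parentheses around the arguments when calling without the Call keyword;
            ' C(1, 2) is the cell one column to the right of C
            saveUrl_param C.Value, C(1, 2).Value
        End If
    Next C
End Sub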
And no, you can't save the page without fetching it: you have to get the page from the server (or the proxy) somehow. With Internet Explorer automation, this is done by
ieApp.Navigate URL
and the following While ... Wend construct waits until the page is fully loaded into the browser object.
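That said, the page does not have to be shown in a browser window. A minimal sketch, assuming the text you need is present in the HTML without running the page's JavaScript: MSXML2.XMLHTTP downloads the HTML and an "htmlfile" object parses it. GetPageText is a hypothetical name. XMLHTTP needs a full URL including the scheme (http://www.bing.com rather than www.bing.com), and the extracted text may differ somewhat from what IE's innerText returns for the same page.

```vba
' Hypothetical helper: download a page and return its text without opening
' a browser window (late binding, standard Windows COM objects).
Function GetPageText(ByVal URL As String) As String
    Dim http As Object
    Dim doc As Object

    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", URL, False      ' False = synchronous: send returns when the response is in
    http.send

    If http.Status <> 200 Then
        Err.Raise vbObjectError + 1, "GetPageText", "HTTP " & http.Status & " for " & URL
    End If

    ' Parse the HTML without displaying it; content that the page builds
    ' with JavaScript will generally be missing
    Set doc = CreateObject("htmlfile")
    doc.body.innerHTML = http.responseText
    GetPageText = doc.body.innerText
End Function
```

The result can be written with the same FileSystemObject calls as in saveUrl_Test, e.g. TxtFile.Write GetPageText("http://www.bing.com").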
To speed things up, you could drop
ieApp.Visible = True
once you are confident that your Sub works correctly. You could also move
Dim ieApp As Object ' I would prefer As SHDocVw.InternetExplorer (early binding); I don't like late binding
Set ieApp = CreateObject("InternetExplorer.Application")
to the calling sub and pass the ieApp object to the subroutine as a parameter, so that the browser is not opened and closed again for every URL.
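A sketch of that last suggestion, assuming the URLs start in row 10 of column B and the file names come from column A, as in the question's LoopOverB. SaveUrlWithIe and LoopOverB_SharedIe are hypothetical names; the browser is created once, kept invisible, and handed to each call.

```vba
' Hypothetical helper: save the visible text of URL to FileName,
' reusing a browser instance that the caller created.
Sub SaveUrlWithIe(ByVal ieApp As Object, ByVal URL As String, ByVal FileName As String)
    Dim FSO As Object
    Dim TxtFile As Object

    ieApp.Navigate URL
    Do While ieApp.Busy Or ieApp.ReadyState <> 4   ' 4 = page fully loaded
        DoEvents
    Loop

    Set FSO = CreateObject("Scripting.FileSystemObject")
    Set TxtFile = FSO.OpenTextFile(FileName, 2, True, -1)   ' write, create, Unicode
    TxtFile.Write "" & ieApp.Document.body.innerText        ' "" & turns a Null into ""
    TxtFile.Close
End Sub

' Hypothetical caller: one invisible browser for the whole column.
Sub LoopOverB_SharedIe()
    Dim ieApp As Object
    Dim myRow As Long

    Set ieApp = CreateObject("InternetExplorer.Application")
    ieApp.Visible = False

    myRow = 10
    Do While Cells(myRow, 2).Value <> ""
        ' only cells that start with "http" (covers http:// and https://)
        If InStr(1, Cells(myRow, 2).Value, "http", vbTextCompare) = 1 Then
            SaveUrlWithIe ieApp, Cells(myRow, 2).Value, _
                          "C:\mallet\test\" & Cells(myRow, 1).Value & ".txt"
        End If
        myRow = myRow + 1
    Loop

    ieApp.Quit
    Set ieApp = Nothing
End Sub
```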

Related

VBA Excel: for each results into cells? counter not working?

I'm creating a macro that crawls through subfolders and retrieves the names of some files. I used code from an answer to another question, and it works fine for printing the results to the Immediate window, but I want to get them into cells, as a list. What I get is just the result of the first iteration.
What I'm trying to do might be obvious, but I swear I tried and couldn't find the answer myself. For the record, I'm just starting to code.
Here is my code. The important part is at the end, in Sub ListFiles(fld As Object, Mask As String).
Option Explicit
Sub Retrieve_Info()
    Dim strPath As Variant
    Dim pasta_destino As Range
    Dim fle As String
    Dim fso As Object 'FileSystemObject
    Dim fldStart As Object 'Folder
    Dim fld As Object 'Folder
    Dim fl As Object 'File
    Dim Mask As String
    Set pasta_destino = ThisWorkbook.Worksheets("VINCULATOR").Range("pasta_destino")
    ' dialog title means "Select the .xlsx file"
    strPath = Application.GetOpenFilename _
        (Title:="Selecione o arquivo.xlsx", _
        FileFilter:="Excel Files *.xlsx* (*.xlsx*),")
    If Not strPath = False Then
        pasta_destino = strPath
        fle = Dir(strPath)
        Set fso = CreateObject("scripting.FileSystemObject") ' late binding
        'Set fso = New FileSystemObject 'or use early binding (also replace Object types)
        Set fldStart = fso.GetFolder(Replace(strPath, fle, ""))
        Mask = "*.xlsx"
        For Each fld In fldStart.SubFolders
            ListFiles fld, Mask
        Next
    End If
End Sub
Sub ListFiles(fld As Object, Mask As String)
    Dim fl As Object 'File
    Dim vrow As Integer
    Dim vinculadas As Range
    Dim n_vinc As Range
    Set vinculadas = ThisWorkbook.Worksheets("VINCULATOR").Range("vinculadas")
    Set n_vinc = ThisWorkbook.Worksheets("VINCULATOR").Range("n_vinc")
    vrow = 0
    For Each fl In fld.Files
        If fl.Name Like Mask And InStr(fl.Name, "completo") = 0 Then
            vrow = vrow + 1
            vinculadas.Cells(vrow, 1) = fld.Path & "\" & fl.Name
        End If
    Next
    n_vinc = vrow
End Sub
Please, help!
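One likely reason for the symptom: vrow is a local variable that is reset to 0 on every call to ListFiles, so each subfolder's list is written starting at the same row as the previous one, and n_vinc only ever shows the count for a single folder. A minimal sketch of a fix, keeping the question's sheet and range names ("VINCULATOR", "vinculadas", "n_vinc"), moves the counter to module level so it survives between calls:

```vba
' Module level (top of the module, after Option Explicit):
' unlike a local variable, this counter keeps its value between calls.
Private vrow As Long

Sub ListFiles(fld As Object, Mask As String)
    Dim fl As Object 'File
    With ThisWorkbook.Worksheets("VINCULATOR")
        For Each fl In fld.Files
            If fl.Name Like Mask And InStr(fl.Name, "completo") = 0 Then
                vrow = vrow + 1   ' continues from the previous folder's last row
                .Range("vinculadas").Cells(vrow, 1) = fld.Path & "\" & fl.Name
            End If
        Next fl
        .Range("n_vinc") = vrow   ' running total across all folders
    End With
End Sub
```

Reset the counter (vrow = 0) in Retrieve_Info just before the For Each fld loop, so that a second run starts at the top again.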
I have taken a slightly different approach, which may be easier for you to follow and should also run faster. Please try this.
Sub SpecifyFolder()
    ' 10 Dec 2017
    Dim Fd As FileDialog
    Dim PathName As String
    Dim Fso As Object
    Dim Fold As Object, SubFold As Object
    Dim i As Long
    Set Fd = Application.FileDialog(msoFileDialogFolderPicker)
    With Fd
        .ButtonName = "Select"
        .InitialView = msoFileDialogViewList
        .InitialFileName = "C:\My Documents\" ' set as required
        .Show
        If .SelectedItems.Count Then
            PathName = .SelectedItems(1)
        Else
            Exit Sub ' user cancelled
        End If
    End With
    Set Fd = Nothing
    Set Fso = CreateObject("Scripting.FileSystemObject")
    Set Fold = Fso.GetFolder(PathName)
    ListFiles Fold, "*.xlsx"
    For Each SubFold In Fold.SubFolders
        ListFiles SubFold, "*.xlsx"
    Next SubFold
    Set Fso = Nothing
End Sub
Sub ListFiles(Fold As Object, _
              Mask As String)
    ' 10 Dec 2017
    Dim Fun() As String             ' file list
    Dim Rng As Range
    Dim Fn As String                ' file name
    Dim i As Long                   ' array index
    ReDim Fun(1 To 1000)            ' initial size; grown below if a folder has more files
    Fn = Dir(Fold.Path & "\")
    Do While Len(Fn)
        If Fn Like Mask And InStr(Fn, "completo") = 0 Then
            i = i + 1
            If i > UBound(Fun) Then ReDim Preserve Fun(1 To 2 * UBound(Fun))
            Fun(i) = Fold.Path & "\" & Fn
        End If
        Fn = Dir
    Loop
    If i Then
        ReDim Preserve Fun(1 To i)
        With ThisWorkbook.Worksheets("VINCULATOR")
            ' specify the column in which to write (here "C")
            i = .Cells(.Rows.Count, "C").End(xlUp).Row
            Set Rng = .Cells(i + 1, "C").Resize(UBound(Fun), 1)
            Application.ScreenUpdating = False
            Rng.Value = Application.Transpose(Fun)
            Application.ScreenUpdating = True
        End With
    End If
End Sub
As you can see, I have dispensed with a target range and specify only the sheet and the column (I chose column C; change it as required in the ListFiles sub). Note that the code appends each new list below the existing content of that column.
There are two things the code doesn't do to my entire satisfaction. First, it doesn't write to the first row of an empty column C; it leaves that row blank instead, which you might actually like. Second, it doesn't handle sub-subfolders: file names are extracted only from the selected folder and its immediate subfolders. Either feature would need additional programming.
Finally, I admit that I didn't test whether the lists are transferred to the worksheet correctly. I think they are, but you should check that the first and last file names of each folder appear in your worksheet column: they are extracted from the folder, and losing them while writing to the sheet is a typical off-by-one error with this method.
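For the two limitations mentioned above, here is a sketch that descends into sub-subfolders by recursion and starts at row 1 when column C is empty (otherwise it still appends). CollectFiles and ListAllFiles are hypothetical names; the sheet name, column C, the *.xlsx mask and the "completo" filter are taken from the code above. It lists files through FileSystemObject rather than Dir, because Dir keeps a single search state: calling Dir with a new path for a subfolder would end the listing of the current folder.

```vba
' Hypothetical helper: collect matching files from fld and every folder below it.
Sub CollectFiles(ByVal fld As Object, ByVal Mask As String, ByVal results As Collection)
    Dim fl As Object, subFld As Object
    For Each fl In fld.Files
        If fl.Name Like Mask And InStr(fl.Name, "completo") = 0 Then results.Add fl.Path
    Next fl
    For Each subFld In fld.SubFolders
        CollectFiles subFld, Mask, results   ' recursion reaches sub-subfolders
    Next subFld
End Sub

' Hypothetical caller: write every matching path under PathName to column C,
' starting at the first empty row (row 1 if the column is empty).
Sub ListAllFiles(ByVal PathName As String)
    Dim fso As Object, results As Collection
    Dim outArr() As Variant, i As Long, nextRow As Long

    Set fso = CreateObject("Scripting.FileSystemObject")
    Set results = New Collection
    CollectFiles fso.GetFolder(PathName), "*.xlsx", results
    If results.Count = 0 Then Exit Sub

    ' A 2-D (rows x 1) array can be written to a column without Transpose
    ReDim outArr(1 To results.Count, 1 To 1)
    For i = 1 To results.Count
        outArr(i, 1) = results(i)
    Next i

    With ThisWorkbook.Worksheets("VINCULATOR")
        nextRow = .Cells(.Rows.Count, "C").End(xlUp).Row
        If Not IsEmpty(.Cells(nextRow, "C").Value) Then nextRow = nextRow + 1
        .Cells(nextRow, "C").Resize(results.Count, 1).Value = outArr
    End With
End Sub
```

In SpecifyFolder, the ListFiles calls and the For Each SubFold loop could then be replaced by a single ListAllFiles PathName.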

Excel VBA to search a website's search box and click a link/button

I'm using Excel VBA to log into a website, then type a .txt file name into a search box and click a button/link to search for it on the resulting second page. The resulting third page has an icon you have to click to download the file. Lastly, I want to import the contents of the .txt file into a new Excel spreadsheet.
My getLogin function works, and I log into the website successfully. Stepping through with F8, I can see each line of code entering my username and password in the respective fields. I'm now on the second page, where I want to enter a file name in a search box, but at
If Not FileN Is Nothing And FileN.Length > 0 Then
FileN(0).Value = fileName
in the searchFile function, when I step through it with F8, I see that it keeps skipping FileN(0).Value = fileName and doesn't enter the file name. The same happens with the link I click to search:
Set ElementCol = ie.document.getElementsByTagName("a")
For Each l In ElementCol
If l.href = "javascript:myFunction('/mailbox/jsp/MBIList.jsp')" Then
l.Click
It doesn't enter the If statement and therefore never clicks the link.
My code so far:
Sub getComponents()
Dim WebAddressIn As String
Dim ie As Object
WebAddressIn = "https://..."
'get ie instance to work with that is logged in
Set ie = getLogin(WebAddressIn, "usern", "pw")
Do Until ie.readyState = 4
Loop
ie.Visible = True
Dim fileName As String
fileName = Format(Now(), "yyyyMMdd") & ".TXT"
Set ie = searchFile(fileName, ie)
End Sub
Function searchFile(fileName As String, ie As Object)
Dim Doc As Object, lastRow As Long, tblTR As Object
Dim UserString As String
Dim FileN As Object ' MSHTML.IHTMLElement
Dim ElementCol As Object ' MSHTML.IHTMLElementCollection
Dim AllText As String
Do While ie.Busy
Loop
Set FileN = ie.document.getElementsByName("MsgNamePattern")
If Not FileN Is Nothing And FileN.Length > 0 Then
FileN(0).Value = fileName
End If
Do While ie.Busy
Loop
Set ElementCol = ie.document.getElementsByTagName("a")
For Each l In ElementCol
If l.href = "javascript:myFunction('/mailbox/jsp/MBIList.jsp')" Then
l.Click
Exit For
End If
Next l
Do While ie.Busy
Loop
Set searchFile = ie
Set ie = Nothing
End Function
Function getLogin(WebAddressIn As String, UserNameIn As String, PasswordIn As String)
Dim Doc As Object, lastRow As Long, tblTR As Object
Set ie = CreateObject("internetexplorer.application")
ie.Visible = False
Dim UserString As String
Dim PWString As String
ie.Navigate2 WebAddressIn
ie.Visible = True
Dim UserN As Object ' MSHTML.IHTMLElement
Dim PW As Object ' MSHTML.IHTMLElement
Dim ElementCol As Object ' MSHTML.IHTMLElementCollection
Dim AllText As String
Do While ie.Busy
Loop
' enter username and password in textboxes
Set UserN = ie.document.getElementsByName("userid")
If Not UserN Is Nothing And UserN.Length > 0 Then
' fill in first element named "username", assumed to be the login name field
UserN(0).Value = UserNameIn
End If
Set PW = ie.document.getElementsByName("password")
' password
If Not PW Is Nothing And PW.Length > 0 Then
' fill in first element named "password", assumed to be the password field
PW(0).Value = PasswordIn
End If
Do While ie.Busy
Loop
'Clicks the Sign in button
Set ElementCol = ie.document.getElementsByName("submit")
For Each btnInput In ElementCol
If btnInput.Value = "*Sign In" Then
btnInput.Click
Exit For
End If
Next btnInput
Do While ie.Busy
Loop
Set getLogin = ie
Set ie = Nothing
End Function
And here's the relevant HTML on the second page:
The search box:
<input type="text" name="MsgNamePattern" size="20" onblur="validateMessageName(this)">
The link to search:
<td align="center" valign="center"> <img border="0" src="/mailbox/images/go_off.gif" vspace="7" name="Go" align="top">
</td>
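Two details in this code are worth checking. First, the wait loops spin on ie.Busy without DoEvents and never check readyState, so the code may look for the search box before the second page has finished loading. Second, comparing l.href with the exact javascript: string is fragile, because the href that IE reports may not match the markup character for character. A sketch under those assumptions: WaitForIe is a hypothetical helper, and matching on "MBIList.jsp" assumes that text appears in the link's href. If getElementsByName still returns nothing, the search box may be inside a frame, in which case it has to be looked up in that frame's document instead.

```vba
' Hypothetical helper: wait until IE has finished loading the current page.
Sub WaitForIe(ByVal ie As Object)
    Do While ie.Busy Or ie.readyState <> 4   ' 4 = READYSTATE_COMPLETE
        DoEvents                             ' let IE and Excel process events
    Loop
End Sub

Function searchFile(ByVal fileName As String, ByVal ie As Object) As Object
    Dim FileN As Object   ' MSHTML.IHTMLElementCollection
    Dim l As Object       ' MSHTML.IHTMLElement

    WaitForIe ie

    Set FileN = ie.document.getElementsByName("MsgNamePattern")
    If FileN.Length > 0 Then
        FileN(0).Value = fileName
    Else
        Debug.Print "MsgNamePattern not found; the search box may be inside a frame"
    End If

    ' Match on part of the href instead of the whole javascript: string
    For Each l In ie.document.getElementsByTagName("a")
        If InStr(1, l.href, "MBIList.jsp", vbTextCompare) > 0 Then
            l.Click
            Exit For
        End If
    Next l

    WaitForIe ie
    Set searchFile = ie
End Function
```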

Internet Explorer VBA Automation Error: The object Invoked has disconnected from its clients

I'm trying to write code that reads a value from Excel, looks it up in an internal web-based system, and stores the results back in the workbook. It reads the workbook without problems and opens Internet Explorer without problems, but when I then try to reference what has been opened, I get the above error. The line "ie.Navigate url" works, but the next line "Set DOC = ie.Document" generates the error. Any ideas on what's causing this? Here's my code:
Public Sub getClient()
Dim xOpen As Boolean
xOpen = False
Dim row As Long
Dim xL As Excel.Application
Set xL = New Excel.Application
xL.Visible = False
Dim wb As Excel.Workbook
Dim sh As Excel.Worksheet
'Change the name as needed, out put in some facility to input it or
'process multiples...
Dim filename As String
filename = "auditLookup.xlsx"
Set wb = xL.Workbooks.Open(getPath("Audit") + filename)
xOpen = True
Set sh = wb.Sheets(1)
Dim ie As Variant
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
Dim DOC As HTMLDocument
Dim idx As Integer
Dim data As String
Dim links As Variant
Dim lnk As Variant
Dim iRow As Long
iRow = 2 'Assume headers
Dim clientName As String
Dim clientID As String
Dim nameFound As Boolean
Dim idFound As Boolean
Dim url As String
While sh.Cells(iRow, 1) <> ""
'Just in case these IDs are ever prefixed with zeroes, I'm inserting
'some random character in front, but removing it of course when
'processing.
url = "https://.../" + mid(sh.Cells(iRow, 1), 2)
ie.navigate url
Set DOC = ie.Document
'Search td until we find "Name:" then the next td will be the name.
'Then search for "P1 ID (ACES):" and the next td with be that.
Set links = DOC.getElementsByTagName("td")
clientName = ""
clientID = ""
nameFound = False
idFound = False
For Each lnk In links
data = lnk.innerText
If nameFound Then
clientName = data
ElseIf idFound Then
clientID = data
End If
If nameFound And idFound Then
Exit For
End If
If data = "Name:" Then
nameFound = True
ElseIf data = "P1 ID (ACES):" Then
idFound = True
End If
Next
sh.Cells(iRow, 2) = clientName
sh.Cells(iRow, 3) = clientID ' column 3: column 2 already holds the client name
iRow = iRow + 1
Wend
Set ie = Nothing
If xOpen Then
wb.Save
Set wb = Nothing
xL.Quit
Set xL = Nothing
Set sh = Nothing
xOpen = False
End If
End Sub
Changing the declaration to the following (early binding, which needs a reference to Microsoft Internet Controls):
Dim ie As InternetExplorer
Set ie = New InternetExplorerMedium
...
solved the problem. I also had to add back the Do loop mentioned in the comments:
Do
DoEvents
Loop Until ie.ReadyState = READYSTATE_COMPLETE
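Why this helps, most likely: with Protected Mode on, Internet Explorer can hand an intranet page over to a different process when it navigates, which disconnects the object VBA was holding; InternetExplorerMedium starts the browser at medium integrity, so that switch does not happen. If you prefer late binding, the sketch below creates the same class through GetObject with its CLSID. The CLSID is the one commonly quoted for InternetExplorerMedium; treat it as an assumption and check it against your registry. NewIeMedium and NavigateAndWait are hypothetical names.

```vba
' Late-bound InternetExplorerMedium (no reference to Microsoft Internet Controls).
' Assumption: {D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E} is the InternetExplorerMedium
' CLSID on your machine; verify it before relying on it.
Function NewIeMedium() As Object
    Set NewIeMedium = GetObject("new:{D5E8041D-920F-45e9-B8FB-B1DEB82C6E5E}")
End Function

' Navigate, then wait until the page is fully loaded before touching ie.Document.
Sub NavigateAndWait(ByVal ie As Object, ByVal url As String)
    ie.Navigate url
    Do
        DoEvents
    Loop Until ie.ReadyState = 4 And Not ie.Busy   ' 4 = READYSTATE_COMPLETE
End Sub
```

In getClient, Set ie = NewIeMedium() would replace the CreateObject call, and NavigateAndWait ie, url would replace the bare ie.navigate url line.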

Print/Import all web page source data using vba

I have the code below, which imports only part of the source code into the sheet. I want all of the source code as it is.
Sub GetSourceCode()
Dim ie As Object
Dim str As String
Dim arr
str = Sheets("sheet2").Range("I1").Value
Set ie = CreateObject("INTERNETEXPLORER.APPLICATION")
ie.Navigate "https://tiweb.industrysoftware.automation.com/prdata/cgi-bin/n_prdata_index.cgi?"
ie.Visible = False
Do Until ie.ReadyState = 4
DoEvents
Loop
ie.Document.getelementsbyname("pr_numbers")(0).Value = str
Application.SendKeys ("~")
Do Until ie.ReadyState = 4
DoEvents
Loop
Worksheets("Download_PRdata2").Activate
arr = Split(ie.Document.body.outertext)
Worksheets("Download_PRdata2").Activate
ActiveSheet.Range("A1").Resize(UBound(arr) + 1, 1).Value = Application.Transpose(arr)
End Sub
Hi, you can use the code below:
' Fetch the entire source code
Private Sub HTML_VBA_Excel()
    Dim oXMLHTTP As Object
    Dim sPageHTML As String
    Dim sURL As String
    ' Change the URL before executing the code
    sURL = "http://www.google.com"
    ' Download the page with a synchronous GET request (no browser window)
    Set oXMLHTTP = CreateObject("MSXML2.ServerXMLHTTP")
    oXMLHTTP.Open "GET", sURL, False
    oXMLHTTP.send
    sPageHTML = oXMLHTTP.responseText
    ' Put the page source into the worksheet.
    ' An Excel cell holds at most 32,767 characters, so for a longer
    ' source save it to an external text file instead.
    ThisWorkbook.Sheets(1).Cells(1, 1) = sPageHTML
    MsgBox "XMLHTTP fetch completed"
End Sub
Source : http://www.vbausefulcodes.in/usefulcodes/get-data-or-source-code-from-webpage-using-excel-vba.php
Hope this will be useful to you!
You can also save the source code to a text file. Replace the line ThisWorkbook.Sheets(1).Cells(1, 1) = sPageHTML with this call:
Createtextfile sPageHTML
and add the following sub after End Sub:
Sub Createtextfile(sPageHTML)
    Dim fso As Object
    Dim oFile As Object
    Dim strPath As String
    Set fso = CreateObject("Scripting.FileSystemObject")
    strPath = "E:\test.txt"
    ' True, True = overwrite an existing file, write as Unicode
    Set oFile = fso.Createtextfile(strPath, True, True)
    oFile.WriteLine sPageHTML
    oFile.Close
    Set fso = Nothing
    Set oFile = Nothing
End Sub
Change the path to where you want to save the file.
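If you would rather keep the whole source in the worksheet, one option is to write it one line per row, since a single cell holds at most 32,767 characters. A minimal sketch; SourceToColumn and WriteSourceToSheet are hypothetical names, and splitting on line breaks assumes no single line of the page is longer than the cell limit.

```vba
' Hypothetical helper: split page source into a (lines x 1) array for a column.
Function SourceToColumn(ByVal source As String) As Variant
    Dim lines() As String
    Dim out() As Variant
    Dim i As Long

    If Len(source) = 0 Then Exit Function    ' returns Empty for an empty page
    source = Replace(source, vbCrLf, vbLf)    ' treat CRLF and LF line endings alike
    lines = Split(source, vbLf)

    ReDim out(1 To UBound(lines) + 1, 1 To 1)
    For i = 0 To UBound(lines)
        out(i + 1, 1) = lines(i)
    Next i
    SourceToColumn = out
End Function

Sub WriteSourceToSheet(ByVal sPageHTML As String)
    Dim arr As Variant
    arr = SourceToColumn(sPageHTML)
    If IsEmpty(arr) Then Exit Sub
    With ThisWorkbook.Sheets(1).Range("A1").Resize(UBound(arr, 1), 1)
        .NumberFormat = "@"   ' text format: a line such as "=x" is stored as text, not a formula
        .Value = arr
    End With
End Sub
```

In HTML_VBA_Excel, WriteSourceToSheet sPageHTML would take the place of the ThisWorkbook.Sheets(1).Cells(1, 1) = sPageHTML line.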

Copy and paste the web page data into a notepad

I need to open an XML file in IE, select the content of the page (Ctrl+A), copy it (Ctrl+C), and paste it into Notepad. Below is my code, but it is not working.
Dim ie As Object
Dim ieDoc As Object
Dim Data As String
Set ie = CreateObject("InternetExplorer.Application")
ie.navigate "C:\Data\test_10.xml"
ie.Visible = True
Do Until (ie.readyState = 4 And Not ie.Busy)
DoEvents
Loop
SendKeys "^a", True
Application.Wait (5)
SendKeys "^c"
Dim FileNo As Integer
FileNo = FreeFile
Open "C:\Data\Sample.txt" For Output As FileNo
SendKeys "^v", True
Close FileNo
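The Open statement doesn't start Notepad; it just creates a file handle for input/output to a file from within VBA, and SendKeys "^v" sends the paste keystroke to whichever window has focus, not to that file. If you really need Notepad itself, you have to start it separately (for example with Shell "notepad.exe"), because, unlike IE, Notepad doesn't expose an application object you can create and automate.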
Also consider avoiding SendKeys altogether. Instead:
read out the data from the IE object into a string variable using the InnerHTML property
write out the string into a flat file using Open / Write
optionally re-open the text file in the notepad application
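The three steps above can be sketched with the IE object from the question. SaveIeContentToText is a hypothetical name; it uses Print # rather than Write #, because Write # wraps strings in quotation marks. One caveat: when IE displays an XML file it renders its own tree view, so body.innerHTML may contain that generated HTML rather than the raw XML; reading the file directly avoids the issue.

```vba
' Hypothetical helper: write the content of the page shown in ie to outPath
' without SendKeys or the clipboard, then optionally open it in Notepad.
Sub SaveIeContentToText(ByVal ie As Object, ByVal outPath As String, _
                        Optional ByVal openInNotepad As Boolean = False)
    Dim content As String
    Dim fileNo As Integer

    ' 1. read the data from the IE object into a string variable
    content = "" & ie.document.body.innerHTML

    ' 2. write the string to a flat file (Print # writes it unquoted)
    fileNo = FreeFile
    Open outPath For Output As #fileNo
    Print #fileNo, content
    Close #fileNo

    ' 3. optionally show the file in Notepad
    If openInNotepad Then Shell "notepad.exe """ & outPath & """", vbNormalFocus
End Sub
```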
Try this:
Sub pExtractXMLData()
    Dim strURLtoNavigate As String
    Dim strHTML As String
    strURLtoNavigate = "C:\Data\test_10.xml"
    strHTML = UsingXmlParser(strURLtoNavigate)
    Call WriteVarToDisk(strHTML, "C:\Data\Sample.txt")
End Sub
Public Function UsingXmlParser(strUrl As String)
    Dim objxmlhttp As Object
    Set objxmlhttp = CreateObject("MSXML2.XMLHTTP")
    objxmlhttp.Open "GET", strUrl, False   ' False = synchronous request
    objxmlhttp.send
    'objxmlhttp.WaitForResponse   ' ServerXMLHTTP only; not needed for a synchronous request
    UsingXmlParser = objxmlhttp.ResponseText
    Set objxmlhttp = Nothing
End Function
Public Sub WriteVarToDisk(vartowrite, FiletoWrite)
    On Error Resume Next   ' note: this silently hides any write error
    Dim fso, MyFile
    Set fso = CreateObject("Scripting.FileSystemObject")
    Set MyFile = fso.CreateTextFile(FiletoWrite, True)
    MyFile.WriteLine (vartowrite)
    MyFile.Close
End Sub
Alternatively, you can open the XML in Excel, do all the work there, and save the result back as a text file.
The code below should help.
Sub ImportXMLtoList()
    Dim strTargetFile As String
    Dim wb As Workbook
    Dim dwb As Workbook
    Application.ScreenUpdating = False
    Application.DisplayAlerts = False
    strTargetFile = "C:\Data\test_10.xml"
    Set wb = Workbooks.OpenXML(Filename:=strTargetFile, LoadOption:=xlXmlLoadImportToList)
    Set dwb = Workbooks.Open("C:\Data\Sample.txt")   ' the text file must already exist
    ' copy the values directly instead of going through the clipboard
    With wb.Sheets(1).UsedRange
        dwb.ActiveSheet.Range("A1").Resize(.Rows.Count, .Columns.Count).Value = .Value
    End With
    dwb.Close SaveChanges:=True    ' alerts are still off, so no prompt about the text format
    wb.Close SaveChanges:=False
    Application.DisplayAlerts = True
    Application.ScreenUpdating = True
End Sub