VBA to find text from webpages

I have created a macro that lists all the URLs present on a webpage.
You just provide the URL, and it collects every link on that page and pastes them into one column:
Private Sub CommandButton4_Click()
'Needs references to "Microsoft Internet Controls" and "Microsoft HTML Object Library"
'We refer to an active copy of Internet Explorer
Dim ie As InternetExplorer
'code to refer to the HTML document returned
Dim html As HTMLDocument
Dim ElementCol As Object
Dim Link As Object
Dim erow As Long
Dim ws As Worksheet
Set ws = Worksheets("Sheet4")
Application.ScreenUpdating = False
'open Internet Explorer and go to the website whose URL is in A1
Set ie = New InternetExplorer
ie.Visible = True
ie.navigate ws.Cells(1, 1).Value
'Wait until IE is done loading page
Application.StatusBar = "Trying to go to website…"
Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set html = ie.document
'Display text of HTML document returned in a cell
'Range("A1") = html.DocumentElement.innerHTML
Set ElementCol = html.getElementsByTagName("a")
For Each Link In ElementCol
'next empty row in column A of Sheet4
erow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Offset(1, 0).Row
ws.Cells(erow, 1).Value = Link.href
Next Link
ws.Columns(1).AutoFit
'close down IE, reset status bar & turn on screen updating
ie.Quit
Set ie = Nothing
Application.StatusBar = ""
Application.ScreenUpdating = True
ws.Range("$A$1:$A$2752").RemoveDuplicates Columns:=1, Header:=xlNo
End Sub
Now, can anyone help me create a macro that finds a particular text in all of these URLs in the column and, if the text is present, prints "Text found" in the next column?
For example, if we search for the text "New", it should print "Text found" in the column next to the URL.
Thank you.

The key is the InStr function: if it finds the string "New", it returns the position where the match begins; otherwise it returns 0.
Dim i As Long
i = 1
Do Until Trim(Cells(i, 1).Value) = vbNullString
If InStr(Cells(i, 1).Value, "New") > 0 Then
Cells(i, 2).Value = "Text found"
End If
i = i + 1
Loop
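The InStr behaviour described above can be wrapped in a small helper. This is a sketch only: the name ContainsText and the optional ignoreCase flag are illustrative, not part of the answer. InStr compares binary (case-sensitive) by default, so "new" does not match "New" unless vbTextCompare is passed, and the Start argument becomes mandatory as soon as a Compare argument is given.

```vba
Option Explicit

' Illustrative helper (hypothetical name): True when needle occurs anywhere in haystack.
' InStr returns the 1-based position of the first match, or 0 when there is no match.
Public Function ContainsText(ByVal haystack As String, ByVal needle As String, _
                             Optional ByVal ignoreCase As Boolean = False) As Boolean
    Dim compareMode As VbCompareMethod
    If ignoreCase Then
        compareMode = vbTextCompare    ' "new" matches "New"
    Else
        compareMode = vbBinaryCompare  ' exact, case-sensitive match (InStr's default)
    End If
    ' The Start argument is required whenever a Compare argument is passed
    ContainsText = InStr(1, haystack, needle, compareMode) > 0
End Function
```

In the loop above, the condition could then read If ContainsText(Cells(i, 1).Value, "New", True) Then to make the match case-insensitive.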

Similar to the above (vbTextCompare makes the match case-insensitive, so "new" also matches "New"):
Dim a As Long
a = 2
While Cells(a, 1) <> ""
If InStr(1, Cells(a, 1), "new", vbTextCompare) > 0 Then
Cells(a, 2) = "Text Found"
End If
a = a + 1
Wend
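Both answers search the URL string itself. If the goal is instead to look for the text inside each page's HTML, as the title suggests, one option is to download each page with MSXML2.XMLHTTP (a swapped-in technique, not the Internet Explorer automation used in the question) and run InStr on the response. This sketch assumes the URLs sit in column A from row 2 down and that the pages can be fetched without logging in; the names FetchHtml and MarkPagesContaining are hypothetical. Text that a page only renders with JavaScript after loading will not appear in the raw HTML.

```vba
Option Explicit

' Hypothetical helper: returns the raw HTML of url, or "" if the request fails.
Public Function FetchHtml(ByVal url As String) As String
    Dim http As Object
    On Error GoTo Failed
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False   ' synchronous request
    http.send
    If http.Status = 200 Then FetchHtml = http.responseText
    Exit Function
Failed:
    FetchHtml = ""
End Function

' Writes "Text found" in column B when the page at the URL in column A contains searchText.
Public Sub MarkPagesContaining(ByVal ws As Worksheet, ByVal searchText As String)
    Dim r As Long, lastRow As Long, html As String
    lastRow = ws.Cells(ws.Rows.Count, 1).End(xlUp).Row
    For r = 2 To lastRow
        html = FetchHtml(CStr(ws.Cells(r, 1).Value))
        If InStr(1, html, searchText, vbTextCompare) > 0 Then
            ws.Cells(r, 2).Value = "Text found"
        Else
            ws.Cells(r, 2).ClearContents
        End If
        DoEvents   ' keep Excel responsive during long runs
    Next r
End Sub
```

For the list produced by the first macro, the call would be MarkPagesContaining Worksheets("Sheet4"), "New".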


IE Automation - VBA Error - Runtime Error 70: Permission Denied

Friends, I'm very new to VBA, and I'm trying to scrape data from an internal website. The process goes like this: a few serial numbers are entered in column A, from A3 to the end. The macro is supposed to navigate to the URL --> pick the serial number from Excel --> enter it in the search field and click Search. Once the results are populated on the page, it scrapes specific values and writes them into the Excel sheet.
The page opens fine and the data is picked up from Excel, but when the macro reads the table cells it raises runtime error 70. My code is below for reference. Any help fixing it is much appreciated.
Sub Type1_Data()
Dim ie As InternetExplorer
Dim html As MSHTML.HTMLDocument
Dim RowNumber, ColumnNumber As Long
RowNumber = 3
ColumnNumber = 0
Dim i As Long
Dim Filt As MSHTML.IHTMLElement
Dim mtbl As MSHTML.IHTMLElement
Dim strempid As MSHTML.HTMLElementCollection
Dim strempid1 As MSHTML.HTMLElementCollection
Dim strempid2 As MSHTML.HTMLElementCollection
Dim strempid3 As MSHTML.HTMLElementCollection
Dim strempid4 As MSHTML.HTMLElementCollection
Dim strempid5 As MSHTML.HTMLElementCollection
Dim strempid6 As MSHTML.HTMLElementCollection
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate ("URL")
Do While ie.READYSTATE = 4: DoEvents: Loop
Do Until ie.READYSTATE = 4: DoEvents: Loop
xy:
Set HTMLDoc = ie.document
If HTMLDoc.Title <> "Marketplace | Find a professional" Then
ie.Visible = True
GoTo xy
End If
ie.Visible = True
Dim Ed As Integer
Ed = 3
While ThisWorkbook.Sheets("ProM Search").Cells(Ed, 1).Value <> 0
Ed = Ed + 1
Wend
Ed = Ed - 1
For i = 3 To Ed
Application.ScreenUpdating = True
Set UID = HTMLDoc.getElementById("navSelect")
Set Filt = HTMLDoc.getElementById("searchText")
Set mtbl = HTMLDoc.getElementsByTagName("Table")(23)
Application.Wait DateAdd("s", 1, Now)
HTMLDoc.getElementById("NLQTextArea").Value = ThisWorkbook.Sheets("ProM Search").Cells(i, 1).Value
Set strempid = mtbl.getElementsByClassName("dojoxGridCell")(1)
Set strempid1 = mtbl.getElementsByClassName("dojoxGridCell")(2)
Set strempid2 = mtbl.getElementsByClassName("dojoxGridCell")(3)
Set strempid3 = mtbl.getElementsByClassName("dojoxGridCell")(7)
Set strempid4 = mtbl.getElementsByClassName("dojoxGridCell")(9)
Set strempid5 = mtbl.getElementsByClassName("dojoxGridCell")(11)
Set strempid6 = mtbl.getElementsByClassName("dojoxGridCell")(12)
ActiveCell.Offset(0, 1).Activate
ActiveCell.Value = strempid.innerText
ActiveCell.Offset(0, 1).Activate
ActiveCell.Value = strempid1.innerText
ActiveCell.Offset(0, 1).Activate
ActiveCell.Value = strempid2.innerText
ActiveCell.Offset(0, 1).Activate
ActiveCell.Value = strempid3.innerText
ActiveCell.Offset(0, 1).Activate
ActiveCell.Value = strempid4.innerText
ActiveCell.Offset(0, 1).Activate
ActiveCell.Value = strempid5.innerText
ActiveCell.Offset(0, 1).Activate
ActiveCell.Value = strempid6.innerText
ActiveCell.Offset(1, -7).Activate
If ActiveCell.Value = "" Then
MsgBox "Fetching Completed Successfully", vbExclamation, "ProM - Open Seat Search T2"
GoTo qt
End If
Next i
qt:
Set ie = Nothing
Set HTMLDoc = Nothing
End Sub
HTMLDoc.getElementById("submitAction").Click may cause a page refresh, which makes any previously referenced elements stale. When performing actions likely or known to cause a page refresh or update, always work off ie.document rather than a document stored in a variable. Stale references are a common cause of a permission denied error.
Your If ... End If block should probably be a loop with a timeout. You only need one ie.Visible = True. In my opinion you rarely gain anything from setting Visible to False; if you intend to hide IE from the user, do so from the start, unless it interferes with functionality.
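A loop with a timeout, as suggested above, might look like the following sketch. The function name WaitForTitle is hypothetical, and the title string comes from the question. It re-reads ie.Document on every pass, so it never holds on to a document from a previous page, and any error raised while the page is still loading simply counts as "not ready yet".

```vba
Option Explicit

' Hypothetical helper: waits until the page in ie has finished loading and its title equals
' expectedTitle. Returns False if that does not happen within timeoutSeconds.
Public Function WaitForTitle(ByVal ie As Object, ByVal expectedTitle As String, _
                             ByVal timeoutSeconds As Double) As Boolean
    Dim deadline As Double, ready As Boolean, currentTitle As String
    deadline = Timer + timeoutSeconds   ' Timer resets at midnight; fine for short waits
    Do
        DoEvents
        ready = False
        currentTitle = vbNullString
        On Error Resume Next            ' the document can be unavailable mid-navigation
        ready = (Not ie.Busy) And (ie.ReadyState = 4)
        If ready Then currentTitle = ie.Document.Title
        On Error GoTo 0
        If ready And currentTitle = expectedTitle Then
            WaitForTitle = True
            Exit Function
        End If
    Loop While Timer < deadline
End Function
```

In the question's code, the If/GoTo block could then become If Not WaitForTitle(ie, "Marketplace | Find a professional", 30) Then Exit Sub, where the 30-second limit is an arbitrary example.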
There is no point in Application.ScreenUpdating = True, as screen updating is never switched off within this sub; if it is switched off by a calling sub, you only need the line once, outside the loop. Your repeated ActiveCell.Offset(0, 1).Activate lines can be replaced with a loop (using Select Case if needed) that sets each value directly, without activating anything. The line Do While ie.READYSTATE = 4: DoEvents: Loop serves no purpose.
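The seven Activate/Value pairs can be collapsed into one loop. This sketch uses an array of element positions in place of the suggested Select Case; the names ReadCellTexts and WriteResultRow are hypothetical. It assumes, as the question does, that the 24th table (index 23) holds the results, and it starts from ie.Document each time, in line with the stale-reference advice above.

```vba
Option Explicit

' Hypothetical helper: returns the innerText of the elements at the given positions
' of an element collection, as a 1-based array ready to be written to a row.
Public Function ReadCellTexts(ByVal cellsCol As Object, ByVal indices As Variant) As Variant
    Dim result() As String
    Dim k As Long
    ReDim result(1 To UBound(indices) - LBound(indices) + 1)
    For k = LBound(indices) To UBound(indices)
        result(k - LBound(indices) + 1) = cellsCol.Item(indices(k)).innerText
    Next k
    ReadCellTexts = result
End Function

' Writes the scraped values for one serial number into columns B onwards of row r.
Public Sub WriteResultRow(ByVal ie As Object, ByVal ws As Worksheet, ByVal r As Long)
    Dim gridCells As Object, values As Variant
    ' Always start from ie.Document, never from a document saved before Search was clicked
    Set gridCells = ie.Document.getElementsByTagName("table")(23) _
                      .getElementsByClassName("dojoxGridCell")
    values = ReadCellTexts(gridCells, Array(1, 2, 3, 7, 9, 11, 12))
    ws.Cells(r, 2).Resize(1, UBound(values)).Value = values
End Sub
```

Inside the For i loop, WriteResultRow ie, ThisWorkbook.Sheets("ProM Search"), i would replace the whole ActiveCell block.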
You could use the Range.Find method to determine the row in column 1 where 0 occurs, rather than walking down the column, and then test that it was found and that the row is >= 3.
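A sketch of that Range.Find approach, assuming the serial numbers start in A3 and the list is terminated by a literal 0 (the function name FindZeroRow is hypothetical). Range.Find does not match empty cells, so if the list simply ends at a blank cell, ws.Cells(ws.Rows.Count, 1).End(xlUp).Row is the simpler way to get the last row. Because the search area itself starts at A3, any hit is automatically at row 3 or below.

```vba
Option Explicit

' Hypothetical helper: row of the first cell at or below A3 whose value is 0, or 0 if none.
Public Function FindZeroRow(ByVal ws As Worksheet) As Long
    Dim searchArea As Range, hit As Range
    Set searchArea = ws.Range(ws.Cells(3, 1), ws.Cells(ws.Rows.Count, 1))
    ' After:= the last cell of the area makes the search wrap round and start at A3
    Set hit = searchArea.Find(What:=0, After:=searchArea.Cells(searchArea.Cells.Count), _
                              LookIn:=xlValues, LookAt:=xlWhole, _
                              SearchOrder:=xlByRows, SearchDirection:=xlNext)
    If hit Is Nothing Then Exit Function   ' not found: return 0
    FindZeroRow = hit.Row
End Function
```

The question's While loop could then become Ed = FindZeroRow(ThisWorkbook.Sheets("ProM Search")) - 1, with a check that FindZeroRow did not return 0.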
This error means an attempt was made to write to a write-protected disk or to access a locked file. You could check if special permission is needed to access the worksheet. For detailed causes and solutions, you can refer to this doc.
You can also refer to my working sample, which reads a cell value in VBA and enters it into a web page:
Sub ReadCellIntoBing() 'sub name is illustrative
Dim ieA As Object
Dim doc As Object
Dim tempStr As String
Set ieA = CreateObject("InternetExplorer.Application")
ieA.Visible = True
ieA.navigate "https://www.bing.com"
'wait until the page has finished loading
Do Until ieA.readyState = 4
DoEvents
Loop
Set doc = ieA.Document
tempStr = "sb_form_q"
doc.getElementById(tempStr).Value = ThisWorkbook.Sheets("SheetName").Range("E2").Value
End Sub

Combobox option value has spaces: how do I select it through VBA from Excel?

Sub Sprint()
Dim IE As Object
Dim objelement As Object
Dim c As Integer
Dim LastRow As Long, i As Long, j As Long
Dim sht As Worksheet
Set IE = CreateObject("InternetExplorer.Application")
With IE
.Visible = True
.navigate "http://xxx"
'wait until first page loads
Do Until .readyState = 4
DoEvents
Loop
On Error Resume Next
Set sht = ThisWorkbook.Worksheets("TestData1")
LastRow = sht.Cells(sht.Rows.Count, "B").End(xlUp).Row
For j = 2 To LastRow
i = 1
'compares against the Russian text "ФНС (гос. пошлина)" ("FNS (state duty)")
If .Document.all.Item(i).innerText = "ФНС (гос. пошлина)" Then
End If
.Visible = True
While .Busy
DoEvents 'wait until IE is done loading page.
Wend
With .Document
'text boxes
.getElementById("txtSubscriberId").Value = sht.Cells(j, 9)
'This piece is selected in the web page drop-down, since its option value has no spaces
.getElementById("ddlProdCd").Value = sht.Cells(j, 20)
.getElementById("ddlProdCd").FireEvent "onfocus"
.getElementById("ddlProdCd").FireEvent "onchange"
.getElementById("ddlProdCd").FireEvent "onmousewheel"
'This piece is not selected in the web page drop-down, since its option value has spaces
.getElementById("ddlPlanDesc").Value = sht.Cells(j, 21)
.getElementById("ddlPlanDesc").FireEvent "onfocus"
.getElementById("ddlPlanDesc").FireEvent "onmousewheel"
.getElementById("ddlPlanDesc").FireEvent "onchange"
End With
Next j
End With
'release IE only after the loop; releasing it inside the loop breaks the next iteration
Set IE = Nothing
End Sub
In short, the value from column 21 is not selected on the web page, but the value from column 20 is. The option value for the column 20 entry has no spaces, whereas the option value for the column 21 entry contains spaces in the web page itself.
Refer to the attached image.
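One common reason a value containing spaces is not selected is that the option's value on the page differs in whitespace from the cell value, for example a non-breaking space or doubled spaces. That diagnosis is an assumption, since the page source isn't shown. The sketch below compares normalised strings against both the option value and the visible text, then selects the match by index; the helper names are hypothetical, and FireEvent is the same legacy IE method the question already uses.

```vba
Option Explicit

' Hypothetical helper: replaces non-breaking spaces, trims, and collapses runs of spaces.
Public Function NormalizeSpaces(ByVal s As String) As String
    s = Replace(s, ChrW(160), " ")   ' &nbsp; -> ordinary space
    s = Trim$(s)
    Do While InStr(s, "  ") > 0
        s = Replace(s, "  ", " ")
    Loop
    NormalizeSpaces = s
End Function

' Selects the first <option> whose value or visible text matches wanted (after normalising),
' then fires onchange so that dependent fields can update. Returns False if nothing matched.
Public Function SelectOptionByValue(ByVal sel As Object, ByVal wanted As String) As Boolean
    Dim k As Long, opt As Object
    wanted = NormalizeSpaces(wanted)
    For k = 0 To sel.Options.Length - 1
        Set opt = sel.Options.Item(k)
        If NormalizeSpaces(opt.Value) = wanted Or NormalizeSpaces(opt.Text) = wanted Then
            sel.selectedIndex = k
            sel.FireEvent "onchange"
            SelectOptionByValue = True
            Exit Function
        End If
    Next k
End Function
```

For column 21 the call would be SelectOptionByValue(IE.Document.getElementById("ddlPlanDesc"), CStr(sht.Cells(j, 21).Value)); a False result shows that no option matched even after normalising, which points to a different cause.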

Crawler & Scraper using excel vba

I am trying to crawl an intranet URL so that Excel automatically selects one of the options from a dropdown menu, enters a value in a text box, and then clicks Find to get redirected to another page, from which I want to copy a value to another worksheet in the same workbook. I have written the code below, but it is not working; it says "Object required". :(
Sub Test()
Dim rng As Range
Dim cell As Range
Dim ie As Object
Set rng = Sheets("sheet1").Range("A1", Sheets("sheet1").Range("A1").End(xlDown))
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = True
ie.Navigate "https://gcd.ad.plc.cwintra.com/GCD_live/login/login.asp"
'wait until the login page has loaded
Do
DoEvents
If ie.ReadyState = 4 Then
ie.Visible = False
Exit Do
End If
Loop
ie.Document.forms(0).all("txtUsername").Value = ""
ie.Document.forms(0).all("txtPassword").Value = ""
ie.Visible = True
Application.Wait Now + TimeValue("00:00:02")
For Each cell In rng
ie.Navigate "https://gcd.ad.plc.cwintra.com/GCD_live/search.asp"
'wait until the search page has loaded before using its elements
Do While ie.Busy Or ie.ReadyState <> 4
DoEvents
Loop
ie.Document.getElementById("cboFieldName").selectedIndex = 6
SendKeys cell.Value
Next cell
End Sub

Refresh or reload Internet Explorer object in Excel VBA

I'm working on a way to fill in online forms using data from an Excel Spreadsheet. I'm using the VBA Modules in Excel to do so.
The steps are as follows:
1) Navigate to page
2) Fill in details
3) Click "Continue" to continue to the next page
4) Fill in more details
5) Click "Save" to save the page
Here is the code:
Sub FillInternetForm()
'Header Start
Dim IE As Object
Dim ROW As Integer
Dim MAXROW As Integer
Set IE = CreateObject("InternetExplorer.Application")
ROW = 2
'Header End
'Step 1 Navigate to the page
IE.Navigate *PAGE_1_URL*
IE.Visible = True
While IE.Busy Or IE.ReadyState <> 4
DoEvents
Wend
'Step 1 Ends
'Step 2&3 Fill in the details and click "continue"
IE.Document.ALL("ctl00_PageContent_PAYDPaymentMode").Value = "18"
IE.Document.ALL("ctl00_PageContent_PAYDPaymentDate").Value = Cells(ROW, 1).Value
IE.Document.ALL("ctl00_PageContent_PAYDBusinessUnit").Value = "104"
IE.Document.ALL("ctl00_PageContent_PAYDAmountPaid").Value = Cells(ROW, 2).Value
While IE.Busy Or IE.ReadyState <> 4
DoEvents
Wend
'Step 2&3 Ends (IE moved to page 2)
'Step 4&5 Fill in more details and click "Save"
IE.Document.ALL("ctl00_PageContent_BankChequeOrOtherRef").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 3).Value
IE.Document.ALL("ctl00_PageContent_PAYDRemarks").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 4).Value
'Step 4&5 Ends
End Sub
Steps 1 to 3 work fine. However, when it comes to step 4, I get runtime error 424: Object required. I am aware that this means the module is unable to locate the elements in step 4.
The error occurs exactly at:
IE.Document.ALL("ctl00_PageContent_BankChequeOrOtherRef").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 3).Value
The process then has to be terminated.
To check whether the step 4 code itself had a problem, I took it out and ran it separately: I took the header and the entire Step 4&5 section and made the macro navigate directly to the 2nd page instead.
Like so:
Sub FillInternetForm()
Dim IE As Object
Dim ROW As Integer
Dim MAXROW As Integer
Set IE = CreateObject("InternetExplorer.Application")
ROW = 2
IE.Navigate *PAGE_2_URL*
IE.Visible = True
While IE.Busy Or IE.ReadyState <> 4
DoEvents 'wait until IE is done loading page.
Wend
IE.Document.ALL("ctl00_PageContent_BankChequeOrOtherRef").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 3).Value
IE.Document.ALL("ctl00_PageContent_PAYDRemarks").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 4).Value
End Sub
And it worked.
I have no idea what the problem is. I'm guessing that all the element IDs were loaded at the start, or something like that, and that the page 2 IDs were never loaded?
Does anyone know the source or solution to this problem? Thank you.
P.S. All element IDs were copied and pasted from the source code of the web form and should be correct.
I have updated the issue in a comment below.
Edit 2: I tried the exact same code on a Windows 10 machine with Excel 2016, and it worked perfectly. Thanks for all the help provided.
Please give this a try... (not tested)
Sub FillInternetForm()
'Header Start
Dim IE As Object
Dim Doc As Object
Dim ROW As Integer
Dim MAXROW As Integer
Dim ContinueButton As Object
Dim SaveButton As Object
Set IE = CreateObject("InternetExplorer.Application")
ROW = 2
'Header End
'Step 1 Navigate to the page
IE.Navigate PAGE_1_URL
IE.Visible = True
While IE.Busy Or IE.ReadyState <> 4 Or ContinueButton Is Nothing
Set Doc = IE.document
Set ContinueButton = Doc.ALL("ctl00_PageContent_ContinueButton__Button")
DoEvents
Wend
'Step 1 Ends
'Step 2&3 Fill in the details and click "continue"
Doc.ALL("ctl00_PageContent_PAYDPaymentMode").Value = "18"
Doc.ALL("ctl00_PageContent_PAYDPaymentDate").Value = Cells(ROW, 1).Value
Doc.ALL("ctl00_PageContent_PAYDBusinessUnit").Value = "104"
Doc.ALL("ctl00_PageContent_PAYDAmountPaid").Value = Cells(ROW, 2).Value
While IE.Busy Or IE.ReadyState <> 4 Or SaveButton Is Nothing
Set Doc = IE.document
Set SaveButton = Doc.ALL("ctl00_PageContent_SaveButton__Button")
DoEvents
Wend
'Step 2&3 Ends (IE moved to page 2)
'Step 4&5 Fill in more details and click "Save"
Doc.ALL("ctl00_PageContent_BankChequeOrOtherRef").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 3).Value
Doc.ALL("ctl00_PageContent_PAYDRemarks").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 4).Value
'Step 4&5 Ends
End Sub
Updated the code to:
Sub FillInternetForm()
'Header Start
Dim IE As Object
Dim DOC As Object
Dim ROW As Integer
Dim MAXROW As Integer
Dim COUNTER As Integer
Set IE = CreateObject("InternetExplorer.Application")
Set SAVEBUTTON = Nothing
ROW = 2
'Header End
'Step 1 Navigate to the page
IE.Navigate "*WEBSITE*"
IE.Visible = True
While IE.Busy Or IE.ReadyState <> 4
DoEvents
Wend
'Step 1 Ends
Set DOC = IE.Document
'Step 2&3 Fill in the details and click "continue"
DOC.ALL("ctl00_PageContent_PAYDPaymentMode").Value = "18"
DOC.ALL("ctl00_PageContent_PAYDPaymentDate").Value = Cells(ROW, 1).Value
DOC.ALL("ctl00_PageContent_PAYDBusinessUnit").Value = "104"
DOC.ALL("ctl00_PageContent_PAYDAmountPaid").Value = Cells(ROW, 2).Value
'Step 2&3 Ends (IE moved to page 2)
While IE.Busy Or IE.ReadyState <> 4 Or SAVEBUTTON Is Nothing
If Not IE.Busy Or IE.ReadyState = 4 Then
End If
Set DOC = IE.Document
On Error Resume Next
Set SAVEBUTTON = DOC.ALL("ctl00_PageContent_SaveButton__Button")
DoEvents
Wend
'Step 4&5 Fill in more details and click "Save"
DOC.ALL("ctl00_PageContent_BankChequeOrOtherRef").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 3).Value
DOC.ALL("ctl00_PageContent_PAYDRemarks").Value = ThisWorkbook.Sheets("sheet1").Cells(ROW, 4).Value
'Step 4&5 Ends
End Sub
I found out that this snippet does not refresh the page, even though it is supposed to:
If Not IE.Busy Or IE.ReadyState = 4 Then
End If
Is it possible that the object is dropped halfway?
Occasionally the code works when IE's visibility is turned off, but only about once an hour before the exact same error pops up again.
The page I am accessing seems to use ASPX and generates its content from a database; I don't know whether that helps.
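A common way to avoid error 424 after a postback is to keep re-reading IE.Document until the element from the next page actually exists, with a timeout so the macro cannot hang. This is a sketch, not a tested fix for this particular page; the helper name WaitForElement is hypothetical, while the element ID in the usage line comes from the code above. Errors while the page is mid-navigation are treated as "not there yet".

```vba
Option Explicit

' Hypothetical helper: polls the current IE.Document until getElementById(elementId) returns
' an element, and gives up (returning Nothing) after timeoutSeconds.
Public Function WaitForElement(ByVal ie As Object, ByVal elementId As String, _
                               ByVal timeoutSeconds As Double) As Object
    Dim deadline As Double, ready As Boolean, el As Object
    deadline = Timer + timeoutSeconds   ' Timer resets at midnight; fine for short waits
    Do
        DoEvents
        Set el = Nothing
        ready = False
        On Error Resume Next            ' the document may be unavailable mid-navigation
        ready = (Not ie.Busy) And (ie.ReadyState = 4)
        ' Re-read IE.Document on every pass instead of reusing a stored document object
        If ready Then Set el = ie.Document.getElementById(elementId)
        On Error GoTo 0
        If Not el Is Nothing Then
            Set WaitForElement = el
            Exit Function
        End If
    Loop While Timer < deadline
End Function
```

After clicking Continue, Set el = WaitForElement(IE, "ctl00_PageContent_BankChequeOrOtherRef", 30) followed by a check that el Is Not Nothing would replace the bare IE.Document.ALL(...) access; the 30-second limit is an arbitrary example.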

Excel VBA Macro: Scraping data from site table that spans multiple pages

Thanks in advance for the help. I'm running Windows 8.1, I have the latest IE and Chrome browsers, and the latest Excel. I'm trying to write an Excel macro that pulls data from StackOverflow (https://stackoverflow.com/tags). Specifically, I'm trying to pull the date the macro is run, the tag names, the # of tags, and the brief description of each tag. I have it working for the first page of the table, but not for the rest (there are 1132 pages at the moment). Right now it overwrites the data every time I run the macro, and I'm not sure how to make it look for the next empty cell before running. Lastly, I'm trying to make it run automatically once per week.
I'd much appreciate any help here. The problems are:
Pulling data from the web table beyond the first page
Making it scrape data to the next empty row rather than overwriting
Making the Macro run automatically once per week
Code (so far) is below. Thanks!
Sub ImportStackOverflowData()
'to refer to the running copy of Internet Explorer
Dim ie As InternetExplorer
'to refer to the HTML document returned
Dim html As HTMLDocument
'open Internet Explorer in memory, and go to website
Set ie = New InternetExplorer
ie.Visible = False
ie.navigate "http://stackoverflow.com/tags"
'Wait until IE is done loading page
Application.StatusBar = "Trying to go to StackOverflow ..."
Do While ie.Busy Or ie.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
'show text of HTML document returned
Set html = ie.document
'close down IE and reset status bar
Set ie = Nothing
Application.StatusBar = ""
'clear old data out and put titles in
'put heading across the top of row 3
Range("A3").Value = "Date Pulled"
Range("B3").Value = "Keyword"
Range("C3").Value = "# Of Tags"
'Range("C3").Value = "Asked This Week"
Range("D3").Value = "Description"
Dim TagList As IHTMLElement
Dim Tags As IHTMLElementCollection
Dim Tag As IHTMLElement
Dim RowNumber As Long
Dim TagFields As IHTMLElementCollection
Dim TagField As IHTMLElement
Dim Keyword As String
Dim NumberOfTags As String
'Dim AskedThisWeek As String
Dim TagDescription As String
'Dim QuestionFieldLinks As IHTMLElementCollection
Dim TodaysDate As Date
Set TagList = html.getElementById("tags-browser")
Set Tags = html.getElementsByClassName("tag-cell")
RowNumber = 4
For Each Tag In Tags
'if this is the tag containing the details, process it
If Tag.className = "tag-cell" Then
'get a list of all of the parts of this question,
'and loop over them
Set TagFields = Tag.all
For Each TagField In TagFields
'if this is the keyword, store it
If TagField.className = "post-tag" Then
'store the text value
Keyword = TagField.innerText
Cells(RowNumber, 2).Value = TagField.innerText
End If
If TagField.className = "item-multiplier-count" Then
'store the integer for number of tags
NumberOfTags = TagField.innerText
'NumberOfTags = Replace(NumberOfTags, "x", "")
Cells(RowNumber, 3).Value = Trim(NumberOfTags)
End If
If TagField.className = "excerpt" Then
TagDescription = TagField.innerText
Cells(RowNumber, 4).Value = TagField.innerText
End If
TodaysDate = Date 'today's date without the time
Cells(RowNumber, 1).Value = TodaysDate
Cells(RowNumber, 1).NumberFormat = "mm/dd/yy"
Next TagField
'go on to next row of worksheet
RowNumber = RowNumber + 1
End If
Next Tag
Set html = Nothing
'do some final formatting
Range("A3").CurrentRegion.WrapText = False
Range("A1:C1").EntireColumn.HorizontalAlignment = xlCenter
Range("A1").Value = "StackOverflow Tag Trends"
Range("A1").Font.Bold = True
Application.StatusBar = ""
MsgBox "Done!"
End Sub
There's no need to scrape Stack Overflow when they make the underlying data available to you through things like the Data Explorer. Using this query in the Data Explorer should get you the results you need:
select t.TagName, t.Count, p.Body
from Tags t inner join Posts p
on t.ExcerptPostId = p.Id
order by t.count desc;
The permalink to that query is here, and the "Download CSV" option that appears after the query runs is probably the easiest way to get the data into Excel. If you want to automate that part, the direct link to the CSV download of the results is here.
You can improve this to parse out the exact elements, but it loops over all the pages and grabs all the tag info (everything next to a tag):
Option Explicit
Public Sub ImportStackOverflowData()
Dim ie As New InternetExplorer, html As HTMLDocument
Application.ScreenUpdating = False
With ie
.Visible = True
.navigate "https://stackoverflow.com/tags"
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
Dim numPages As Long, i As Long, info As Object, item As Object, counter As Long
numPages = html.querySelector(".page-numbers.dots ~ a").innerText
For i = 1 To 2 ' numPages ''<==1 to 2 for testing; use to numPages
Set info = html.getElementById("tags_list")
For Each item In info.getElementsByClassName("grid-layout--cell tag-cell")
counter = counter + 1
Cells(counter, 1) = item.innerText
Next item
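If i < numPages Then
'load the next page of tags before the next pass of the loop
.navigate "https://stackoverflow.com/tags?page=" & (i + 1) & "&tab=popular"
While .Busy Or .READYSTATE < 4: DoEvents: Wend
Set html = .document
End If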
Next i
Application.ScreenUpdating = True
.Quit '<== Remember to quit application
End With
End Sub
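The other two parts of the question (appending instead of overwriting, and running weekly) are not covered above. In this sketch, NextEmptyRow returns the first free row of a column, and Application.OnTime re-schedules the import seven days ahead. OnTime only fires while this workbook is open in a running Excel session; for runs with Excel closed, Windows Task Scheduler opening the workbook is the usual alternative. The names NextEmptyRow and ScheduleWeeklyImport are hypothetical; ImportStackOverflowData is the macro from the question, called by name through Application.Run.

```vba
Option Explicit

' Hypothetical helper: first empty row below the last used cell in column col.
' Returns 1 when the column is completely empty.
Public Function NextEmptyRow(ByVal ws As Worksheet, Optional ByVal col As Long = 1) As Long
    Dim lastRow As Long
    lastRow = ws.Cells(ws.Rows.Count, col).End(xlUp).Row
    If lastRow = 1 And IsEmpty(ws.Cells(1, col).Value) Then
        NextEmptyRow = 1
    Else
        NextEmptyRow = lastRow + 1
    End If
End Function

' Runs the import now and schedules the next run one week later.
Public Sub ScheduleWeeklyImport()
    Application.Run "ImportStackOverflowData"
    ' Now + 7 adds seven days, because Excel date-times are counted in days
    Application.OnTime EarliestTime:=Now + 7, Procedure:="ScheduleWeeklyImport"
End Sub
```

In the question's macro, RowNumber = 4 could become RowNumber = Application.Max(4, NextEmptyRow(ActiveSheet)) so that new rows land below the existing ones.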
I'm not making use of the DOM; I find it very easy to get around it by just searching between known tags. If the expressions you are looking for are too common, just tweak the code a bit so that it looks for a string after another string.
An example:
Public Sub ZipLookUp()
Dim URL As String, xmlHTTP As Object, htmlResponse As String
Dim SStr As String, EStr As String, EndS As Long, StartS As Long
Dim Zip4Digit As String
URL = "https://tools.usps.com/go/ZipLookupResultsAction!input.action?resultMode=1&companyName=&address1=1642+Harmon+Street&address2=&city=Berkeley&state=CA&urbanCode=&postalCode=&zip=94703"
Set xmlHTTP = CreateObject("MSXML2.XMLHTTP")
On Error GoTo NoConnect
xmlHTTP.Open "GET", URL, False
xmlHTTP.send
On Error GoTo 0
htmlResponse = xmlHTTP.ResponseText
If Len(htmlResponse) = 0 Then
MsgBox "Aborted Run - HTML response was empty"
GoTo End_Prog
End If
'Searching for a string within 2 strings
SStr = "<span class=""address1 range"">" ' first string
EStr = "</span><br />" ' second string
StartS = InStr(1, htmlResponse, SStr, vbTextCompare)
If StartS = 0 Then MsgBox "Start string not found": GoTo End_Prog
StartS = StartS + Len(SStr)
EndS = InStr(StartS, htmlResponse, EStr, vbTextCompare)
If EndS = 0 Then MsgBox "End string not found": GoTo End_Prog
Zip4Digit = Left(Mid(htmlResponse, StartS, EndS - StartS), 4)
MsgBox Zip4Digit
GoTo End_Prog
NoConnect:
If Err.Number = -2147467259 Or Err.Number = -2146697211 Then
MsgBox "Error - No Connection"
Else
MsgBox Err.Number & ": " & Err.Description
End If
End_Prog:
End Sub
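The search-between-two-strings step can be pulled out into a reusable function. This sketch (the name TextBetween is hypothetical) returns an empty string when either marker is missing, instead of letting Mid receive a negative length, which would raise run-time error 5.

```vba
Option Explicit

' Hypothetical helper: text between the first startMarker and the next endMarker after it.
' Returns "" if either marker is not found. Comparison is case-insensitive, as in the answer.
Public Function TextBetween(ByVal source As String, ByVal startMarker As String, _
                            ByVal endMarker As String) As String
    Dim startPos As Long, endPos As Long
    startPos = InStr(1, source, startMarker, vbTextCompare)
    If startPos = 0 Then Exit Function
    startPos = startPos + Len(startMarker)
    endPos = InStr(startPos, source, endMarker, vbTextCompare)
    If endPos = 0 Then Exit Function
    TextBetween = Mid$(source, startPos, endPos - startPos)
End Function
```

With it, the extraction in ZipLookUp reduces to Zip4Digit = Left$(TextBetween(htmlResponse, SStr, EStr), 4).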