for my master thesis I need to get the Wikipedia-URLs for a list of actors (approximately 20,000)
sktneer helped me with my first attempt to get the code running. Thank you again!
(see: Get Wikipedia page urls from an Excel list
)
One issue thats left, is that google blocks my queries after a couple of actors. (150-200)
A thought was to build in the Application.Wait command into the code, so that there would be a pause of 2-3 seconds before every new query.
Would this work and if, can you help me to impement this into the code?
Or is this the wrong way and is there even an easier solution?
sample
Code:
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
Dim i As Long
Dim str_text As String
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.de/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
If XMLHTTP.Status = 200 Then
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
If Not objResultDiv Is Nothing Then
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
Set link = objH3.getelementsbytagname("a")(0)
str_text = Replace(link.innerHTML, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
DoEvents
Else
Cells(i, 2) = "Not Found"
Cells(i, 3) = "Not Found"
End If
Else
Cells(i, 2) = "Not Found"
Cells(i, 3) = "Not Found"
End If
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Related
I need a small help. Below I have mentioned the VBA code that I am using for auto search on google which provides the first link on my excel. The code actually runs pretty well but throws this error "runtime error 91: object variable or with block variable not set" when the google is unable to provide data. This comes often which makes my work a lot harder. To be clear I don't know anything about coding. So kindly help me by creating a loop that can actually skip the non-searchable cell and move onto the other.
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
Set link = objH3.getelementsbytagname("a")(0)
str_text = Replace(link.innerHTML, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
I have about 12,000 rows of data that I need websites for. This VBA code was working fine for about 800 and then stopped. Now I can't get it to run again due to this error. I can't figure out how to get it running again.
Run-time Error '91' Object-variable or With block variable not set
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
Set link = objH3.getelementsbytagname("a")(0)
str_text = Replace(link.innerHTML, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Run-time Error '91' Object-variable or With block variable not set
You are getting that error because one of the below SET statement is failing.
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
Set link = objH3.getelementsbytagname("a")(0)
You need to properly handle the objects. For example
Set objResultDiv = HTML.getelementbyid("rso")
If objResultDiv Is Nothing Then
MsgBox "Id `rso` not found for " & Cells(i, 1)
Else
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
If objH3 Is Nothing Then
MsgBox "Tag `H3` not found for " & Cells(i, 1)
Else
Set link = objH3.getelementsbytagname("a")(0)
If link Is Nothing Then
MsgBox "Tag `a` not found for " & Cells(i, 1)
Else
'
'~~> Rest of your code
'
End If
End If
End If
It is possible to modify this code to search for large images and to return the link of the first image. I tryed to modify the google link with the google image link but it doesent work
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.com/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getelementsbytagname("H3")(0)
Set link = objH3.getelementsbytagname("a")(0)
str_text = Replace(link.innerHTML, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Here is my current code, found from multiple people on the Forum. Usually it will work fine for about 100 search results and then give me "Run Time Error '91'"
I'm beginning to think it has something to do with google and not the actual code too because its very randomly occuring
Thanks
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerhtml = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
numb_H3 = objResultDiv.getElementsByTagName("H3").Length
If numb_H3 > 0 Then
Set objH3 = objResultDiv.getElementsByTagName("H3")(0)
Set link = objH3.getElementsByTagName("a")(0)
str_text = Replace(link.innerhtml, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
Else
End If
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
You should to use if statement:
If XMLHTTP.waitForResponse(WAIT_TIMEOUT) Then 'response ready
Set html = CreateObject("htmlfile")
html.body.innerhtml = XMLHTTP.responseText
Set objResultDiv = html.getelementbyid("rso")
numb_H3 = objResultDiv.getElementsByTagName("H3").Length
If numb_H3 > 0 Then
Set objH3 = objResultDiv.getElementsByTagName("H3")(0)
Set link = objH3.getElementsByTagName("a")(0)
str_text = Replace(link.innerhtml, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
Else
End If
DoEvents
End If
I had the same problem, seems to be Google indeed. turning on VPN solves the issue, so there is nothing wrong with the code itself.
I am new to VBA, and I figured that trying to code is the best way to code. Anyway, I am trying to code a macro that will get first URL of a Google search result, but I'm getting error Object variable or With block variable not set when search result is 0, and the remaining operations are skipped. Here's the error image:
Here is the code I used:
Sub XMLHTTP()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object, objResultDiv As Object, objH3 As Object, link As Object
Dim start_time As Date
Dim end_time As Date
lastRow = Range("A" & Rows.Count).End(xlUp).Row
Dim cookie As String
Dim result_cookie As String
start_time = Time
Debug.Print "start_time:" & start_time
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1) & "&rnd=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.serverXMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1; rv:25.0) Gecko/20100101 Firefox/25.0"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
Set objH3 = objResultDiv.getElementsByTagName("H3")(0)
Set link = objH3.getElementsByTagName("a")(0)
str_text = Replace(link.innerHTML, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
DoEvents
Next
end_time = Time
Debug.Print "end_time:" & end_time
Debug.Print "done" & "Time taken : " & DateDiff("n", start_time, end_time)
MsgBox "done" & "Time taken : " & DateDiff("n", start_time, end_time)
End Sub
Can anyone help me please?
In the zero result case, H3 is empty so modify your code like this to handle this case
Set html = CreateObject("htmlfile")
html.body.innerhtml = XMLHTTP.ResponseText
Set objResultDiv = html.getelementbyid("rso")
**numb_H3 = objResultDiv.getElementsByTagName("H3").Length**
**If numb_H3 > 0 Then**
Set objH3 = objResultDiv.getElementsByTagName("H3")(0)
Set link = objH3.getElementsByTagName("a")(0)
str_text = Replace(link.innerhtml, "<EM>", "")
str_text = Replace(str_text, "</EM>", "")
Cells(i, 2) = str_text
Cells(i, 3) = link.href
**Else**
**End If**
DoEvents
Next
Here is simplified code for the same method.
Sub xmlHttp()
Dim url As String,
lastRow As Long,
XMLHTTP As Object,
html As Object,
objResultDiv As Object,
objH3 As Object,
link As Object
lastRow = Range("A" & Rows.Count).End(xlUp).Row
For i = 2 To lastRow
url = "https://www.google.co.in/search?q=" & Cells(i, 1)
Set xmlHttp = CreateObject("MSXML2.XMLHTTP")
xmlHttp.Open "GET", URL, False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = xmlHttp.ResponseText
Set objResultDiv = html.getelementbyid("rso")
numb_H3 = objResultDiv.getElementsByTagName("H3").Length
If numb_H3 > 0 Then
Set objH3 = objResultDiv.getElementsByTagName("H3")(0)
Set link = objH3.getElementsByTagName("a")(0)
Range(i, 2) = link
Else
End If
DoEvents
Next
End Sub
One simple workaround--though not the best--is to skip the error.
Try the following modification:
start_time = Time
Debug.Print "start_time:" & start_time
On Error Resume Next '--Add this part.
For i = 2 To lastRow
Other options include a true error handling part, something that returns a value when your search returns nothing.
Let us know if this helps.