I am using the code below for data extraction. It works fine when JavaScript is disabled in Internet Explorer, but when I enable scripts, memory utilization is very high and increases after each URL is extracted. I need suggestions so that the system uses as little memory as possible. I have tried Set ie = Nothing and ie.Refresh2, but the issue remains.
Ie.navigate url1234
Do While Ie.ReadyState <> 4 Or _
Ie.Busy = True
DoEvents
Loop
Set html = Ie.Document
html1 = html.body.innerHTML
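One thing worth checking, offered as an assumption rather than a confirmed fix: Set ie = Nothing only releases the VBA reference, while ie.Quit is what asks the iexplore.exe process to exit. The sketch below reuses one late-bound IE instance and quits it every few pages, so whatever memory the page scripts have accumulated is handed back when that process ends. The URL list, the names ExtractAll, ShouldRecycle and RECYCLE_EVERY, and the interval of 20 pages are all hypothetical.

```vba
' Hypothetical example: ExtractAll, ShouldRecycle, RECYCLE_EVERY and the URLs
' are placeholders. Late binding, so no library references are required.
Option Explicit

Private Const RECYCLE_EVERY As Long = 20  ' hypothetical recycle interval

' True after every "interval" pages (pageIndex is zero-based).
Public Function ShouldRecycle(ByVal pageIndex As Long, ByVal interval As Long) As Boolean
    ShouldRecycle = ((pageIndex + 1) Mod interval = 0)
End Function

Public Sub ExtractAll()
    Dim urls As Variant
    Dim ie As Object
    Dim i As Long
    Dim html1 As String

    urls = Array("http://example.com/page1", "http://example.com/page2")

    For i = LBound(urls) To UBound(urls)
        ' Create a fresh IE instance when there is none (first page or after a recycle).
        If ie Is Nothing Then
            Set ie = CreateObject("InternetExplorer.Application")
            ie.Visible = False
        End If

        ie.navigate urls(i)
        Do While ie.Busy Or ie.readyState <> 4
            DoEvents
        Loop
        html1 = ie.document.body.innerHTML
        ' ... extract the data from html1 here ...

        If ShouldRecycle(i, RECYCLE_EVERY) Then
            ie.Quit           ' asks the iexplore.exe process to exit
            Set ie = Nothing  ' releases the VBA reference
        End If
    Next i

    ' Always quit the last instance too.
    If Not ie Is Nothing Then
        ie.Quit
        Set ie = Nothing
    End If
End Sub
```

Whether recycling every 20 pages is enough depends on how much the site's scripts leak; the interval is only a starting point to tune.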
I am using Excel macro functions to load web pages, but many websites no longer support IE, the main web browser that VBA works with.
Set ie = CreateObject("InternetExplorer.Application")
ie.Visible = 0
URL = "http://website/"
ie.Navigate URL
State = 0
Do Until State = 4
DoEvents
State = ie.readyState
Loop
Would the problem be solved if I installed IE11 or Microsoft Edge?
I have large Word documents that take approximately 10 seconds each to load. I don't want to use the Wait method, because it's not optimal and it's risky. For Internet Explorer I use:
Do Until ie.ReadyState = 4 And ie.Busy = False
DoEvents
Loop
I would like to do the same for a Word document or the Word application. I have declared them as objects, but they don't support these properties.
I need to navigate to a page using VBA. I wrote the code below and it worked fine until I reinstalled Windows. I don't know if that is the cause, but it no longer works.
There are two cases:
If I'm not logged in before running the macro
The first part, logging in to the page, works, but when I try to navigate to "http://cltd.ro/catalogsearch/result/?q=60041", I get the error "Method 'Navigate' of object 'IWebBrowser2' failed".
If I'm already logged on to the site
On the other hand, if I'm already logged in before running the macro, the code works and I get to the desired page, but it gets stuck in the loop "Do/DoEvents/Loop Until ie.readystate = 4" forever and cannot continue.
Please help!
Thanks!
Sub xx()
Dim ie As Object
Set ie = CreateObject("internetexplorer.application")
ie.Visible = True
AppActivate ie
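apiShowWindow ie.hwnd, 3 'apiShowWindow: a Declare'd alias of the Win32 ShowWindow API (declaration not shown); 3 = SW_MAXIMIZE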
ie.Visible = True
ie.navigate "http://cltd.ro/customer/account/login/"
Do
DoEvents
Loop Until ie.readystate = 4
On Error Resume Next 'in case I'm already logged in
Call ie.Document.getElementById("email").setattribute("value", "dfgsdfg") 'not real value ;-)
Call ie.Document.getElementById("pass").setattribute("value", "dfgsdfg") 'not real value ;-)
Call ie.Document.getElementById("send2").Click
Do
DoEvents
Loop Until ie.readystate = 4
On Error GoTo 0 'cancels the resume next
Application.Wait (Now() + TimeValue("0:00:02"))
link = "http://cltd.ro/catalogsearch/result/?q=60041"
ie.navigate link
Do
DoEvents
Loop Until ie.readystate = 4
'Other lines of code ...............
End Sub
This doesn't answer your actual question, but:
(1) You should always have a timeout in your "wait until document ready" loops. Otherwise, those loops can wait forever if the page never becomes ready. In fact, when I navigated manually (via the IE address bar) to "http://cltd.ro/customer/account/login/", it sat there spinning for quite some time until I gave up and hit Stop. Your code should detect cases where the document takes too long to become ready and respond with something like "Website hasn't responded for 60 seconds, do you want to keep waiting? [Yes] [No]".
(2) You don't allow for the case where the page does go ready, but it's not the page you wanted. For example, if the website returns an error, or IE decides to time it out, you'll get a page with a URL of the form "res://..." - not the page you actually asked for. You can't just assume that you actually got the right page.
So your existing code is OK for a one-off script, but not robust enough for real production use.
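Both points can be sketched in a few helpers. This is a minimal sketch assuming late binding; WaitForIE, ElapsedSeconds, IsErrorPageUrl and Example are hypothetical names, and the 60-second limit and prompt text simply follow the suggestion in point (1). The res:// check covers the error pages described in point (2), not every way a site can return the wrong page.

```vba
' Sketch only: WaitForIE, ElapsedSeconds, IsErrorPageUrl and Example are
' hypothetical names; late binding is assumed (ie is a plain Object).
Option Explicit

' Waits until IE reports readyState 4 (complete). Returns False if the user
' chooses to stop waiting after timeoutSeconds without a response.
Public Function WaitForIE(ie As Object, ByVal timeoutSeconds As Double) As Boolean
    Dim startTime As Double
    startTime = Timer
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
        If ElapsedSeconds(startTime) > timeoutSeconds Then
            If MsgBox("Website hasn't responded for " & timeoutSeconds & _
                      " seconds, do you want to keep waiting?", vbYesNo) = vbNo Then
                WaitForIE = False
                Exit Function
            End If
            startTime = Timer  ' user chose to keep waiting: restart the clock
        End If
    Loop
    WaitForIE = True
End Function

' Seconds elapsed since startTime, allowing for Timer wrapping at midnight.
Public Function ElapsedSeconds(ByVal startTime As Double) As Double
    Dim t As Double
    t = Timer
    If t < startTime Then t = t + 86400
    ElapsedSeconds = t - startTime
End Function

' IE serves its own error pages (DNS failure, timeout, ...) from res:// URLs.
Public Function IsErrorPageUrl(ByVal url As String) As Boolean
    IsErrorPageUrl = (LCase$(Left$(url, 6)) = "res://")
End Function

' Usage: navigate, wait with a timeout, then confirm the page is not an IE error page.
Public Sub Example(ie As Object)
    ie.navigate "http://cltd.ro/customer/account/login/"
    If Not WaitForIE(ie, 60) Then Exit Sub  ' user gave up waiting
    If IsErrorPageUrl(ie.LocationURL) Then
        MsgBox "IE showed an error page: " & ie.LocationURL
        Exit Sub
    End If
    ' ... safe to use ie.document here ...
End Sub
```

A stricter check could compare ie.LocationURL against the URL you actually requested, at the cost of failing on legitimate redirects.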
I am writing a macro that will scrape my company's internal SAP site for vendor information. For several reasons I have to use VBA to do so. However, I cannot figure out why I keep getting these three errors when I attempt to scrape the page. Is it possible that this has something to do with the UAC integrity model? Or is there something wrong with my code? Is it possible that a page served over http is handled differently by Internet Explorer? I am able to go to any web page, even other internal ones, and scrape each of them just fine. But when I attempt to scrape the SAP page, I get these errors. The error codes, their descriptions, and when they occur are:
800706B5 - The interface is unknown (occurs when I place breakpoints before running the offending code)
80004005 - Unspecified error (occurs when I don't place any breakpoints and just let the macro run)
80010108 - The object invoked has disconnected from its clients. (I can't reproduce this one consistently; it seems to happen around the time Excel gets so corrupted that no page will load and I have to reinstall Excel.)
I have absolutely no idea what is going on. The integrity-level page didn't make much sense to me, and all the research I found on these errors talked about connecting to databases using ADO and COM references. However, I am doing everything through Internet Explorer. Here is the relevant code:
Private Sub runTest_Click()
ie.visible = True
doTest
End Sub
'The code to run the module
Private Sub doTest()
Dim result As String
result = PageScraper.scrapeSAPPage("<some num>")
End Sub
PageScraper Module
Public Function scrapeSAPPage(num As Long) As String
'Predefined URL that appends num onto end to navigate to specific record in SAP
Dim url As String: url = "<url here>"
Dim ie As InternetExplorer
Set ie = CreateObject("internetexplorer.application")
Dim doc As HTMLDocument
ie.navigate url 'Will always successfully open page, regardless of SAP or other
'pauses the execution of the code until the webpage has loaded
Do
'Will always fail on next line when attempting SAP site with error
If Not ie.Busy And ie.ReadyState = 4 Then
Application.Wait (Now + TimeValue("00:00:01"))
If Not ie.Busy And ie.ReadyState = 4 Then
Exit Do
End If
End If
DoEvents
Loop
Set doc = ie.document 'After implementation of Tim Williams changes, breaks here
'Scraping code here, not relevant
End Function
I am using IE9 and Excel 2010 on a Windows 7 machine. Any help or insight you can provide would be greatly appreciated. Thank you.
I do this type of scraping frequently and have found it very difficult to make IE automation work 100% reliably, with errors like the ones you are seeing. Because they are often timing issues, they can be very frustrating to debug: they don't appear when you step through the code, only during live runs. To minimize the errors I do the following:
Introduce more delays: ie.Busy and ie.ReadyState don't necessarily give valid answers IMMEDIATELY after an ie.navigate, so add a short delay after ie.navigate. For the pages I load I normally use 1 to 2 seconds, but anything over 500 ms seems to work.
Make sure IE is in a clean state by calling ie.navigate "about:blank" before going to the target URL.
After that you should have a valid IE object and you'll have to look at it to see what you've got inside. Generally I avoid trying to access the entire ie.document and instead use IE.document.all.tags("x") where 'x' is a suitable thing I'm looking for such as td or a.
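The three steps above (a short settle delay after each navigate, a detour through about:blank, and reading only the tags you need) might look like this. It is a sketch, not the answerer's production code: ScrapeCells, NavigateAndWait and SETTLE_DELAY are hypothetical names, "td" is just an example tag, and Application.Wait is an Excel method, so this assumes Excel VBA as in the question.

```vba
' Sketch only: ScrapeCells, NavigateAndWait and SETTLE_DELAY are hypothetical
' names. Application.Wait is an Excel method, so this assumes Excel VBA.
Option Explicit

' Pause after each navigate before trusting Busy/readyState.
Public Const SETTLE_DELAY As String = "00:00:01"

Public Sub ScrapeCells(ie As Object, ByVal targetUrl As String)
    Dim tdList As Object
    Dim i As Long

    NavigateAndWait ie, "about:blank"   ' start from a clean state
    NavigateAndWait ie, targetUrl       ' then load the real page

    ' Pull out only the elements needed instead of the whole document.
    Set tdList = ie.document.all.tags("td")
    For i = 0 To tdList.Length - 1
        Debug.Print tdList(i).innerText
    Next i
End Sub

Private Sub NavigateAndWait(ie As Object, ByVal url As String)
    ie.navigate url
    ' Busy/readyState may still describe the previous page right after
    ' navigate, so wait briefly before polling them.
    Application.Wait Now + TimeValue(SETTLE_DELAY)
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
End Sub
```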
However, although all these improvements have increased my success rate, I still get errors at random.
My real solution has been to abandon IE and instead do my work using xmlhttp.
If you are parsing out your data using text operations on the document, then it will be a no-brainer to swap over. The xmlhttp object is MUCH more reliable, and you just read its responseText to get the entire HTML of the document.
Here is a simplified version of what I'm using in production now for scraping; it is reliable enough to run overnight, generating millions of rows without error.
Public Sub Main()
Dim obj As MSXML2.ServerXMLHTTP
Dim strData As String
Dim errCount As Integer
' create an xmlhttp object - you will need a reference to the MS XML HTTP library, any version will do
' but I'm using Microsoft XML, v6.0 (c:\windows\system32\msxml6.dll)
Set obj = New MSXML2.ServerXMLHTTP
' Get the url - I set the last param to Async=true so that it returns right away then lets me wait in
' code rather than trust it, but on an internal network "false" might be better for you.
obj.Open "GET", "http://www.google.com", True
obj.send ' this line actually does the HTTP GET
' Wait for a completion up to 10 seconds
errCount = 0
While obj.readyState < 4 And errCount < 10
DoEvents
obj.waitForResponse 1 ' this is an up-to-one-second delay
errCount = errCount + 1
Wend
If obj.readyState = 4 Then ' I do these on two
If obj.Status = 200 Then ' different lines to avoid certain error cases
strData = obj.responseText
End If
End If
obj.abort ' in real code I use some on error resume next, so at this point it is possible I have a failed
' get and so best to abort it before I try again
Debug.Print strData
End Sub
Hope that helps.
I have a VBScript that uses the InternetExplorer object to navigate to a few pages and pass data to them. Recently, since patching IE8, I've noticed that something is creating zombie iexplore.exe processes. While running my script and watching the process list in Task Manager, I noticed that when my script creates the InternetExplorer.Application object, two processes appear in the list. Is this normal behavior? Why does it happen? I ask because, although in my testing both processes appear to be killed when I call the InternetExplorer object's Quit method, I still suspect that these multiple processes are the root cause of the zombies.
Here is some sample code:
Set ie = CreateObject("InternetExplorer.Application")
ie.Navigate2 "Address"
ie.AddressBar = 1
ie.Toolbar = 1
ie.StatusBar = 1
ie.Width = 600
ie.Height = 400
ie.Left = 300
ie.Top = 150
ie.Visible = 1
Do While ie.Busy
WScript.Sleep 1
Loop
ie.Navigate2 "Address?variable=value"
Do While ie.Busy
WScript.Sleep 1
Loop
...rest of code...
ie.Quit
Set ie = Nothing
I would guess that IE is putting the tab and the browser window in separate processes.
I have a similar issue when using CreateObject to start MS Access. Two processes appear in the task list, and both go away with the Quit command.
However if some problem occurs during the session that causes my program to crash only one of the processes gets shutdown.
This must be some Microsoft system feature.
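Since a crash can leave one of the processes behind, one mitigation is to make sure Quit runs on every exit path, including after a trappable error. The sketch below is written in VBA rather than VBScript because VBScript only supports On Error Resume Next and On Error GoTo 0, not On Error GoTo a label. RunWithCleanup and QuitIE are hypothetical names and the URL is a placeholder. If the host process itself dies instead of raising an error, no handler runs, so this cannot prevent every orphaned process.

```vba
' Sketch only: RunWithCleanup and QuitIE are hypothetical names and the URL
' is a placeholder. Written in VBA because VBScript has no On Error GoTo.
Option Explicit

Public Sub RunWithCleanup()
    Dim ie As Object
    Dim errNum As Long
    Dim errDesc As String

    On Error GoTo Failed
    Set ie = CreateObject("InternetExplorer.Application")
    ie.Visible = True
    ie.navigate "http://example.com/"
    Do While ie.Busy Or ie.readyState <> 4
        DoEvents
    Loop
    ' ... rest of code ...

Cleanup:
    QuitIE ie  ' reached on success and (via Resume) after any error
    If errNum <> 0 Then MsgBox "Error " & errNum & ": " & errDesc
    Exit Sub

Failed:
    ' Save the details now: Resume clears the Err object.
    errNum = Err.Number
    errDesc = Err.Description
    Resume Cleanup
End Sub

' Asks IE to exit and releases the reference. Errors are ignored because
' Quit itself can fail if the IE process has already gone away.
Public Sub QuitIE(ByRef ie As Object)
    If ie Is Nothing Then Exit Sub
    On Error Resume Next
    ie.Quit
    Set ie = Nothing
End Sub
```

Putting the Quit call in its own procedure keeps its On Error Resume Next separate from the caller's active error handler.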