I want to replace the htmldoc line, which uses the HTML Object Library, with something suitable for Selenium. I want to pass htmldoc as an argument to another subroutine. Here is the code:
Dim htmldoc As MSHTML.HTMLDocument
Dim htmldiv As Selenium.WebElement
Dim htmlul As Selenium.WebElement
Dim htmlAs As Selenium.WebElements
Dim htmlA As Selenium.WebElement
Dim TableName As String
URL = "https://www.whoscored.com/Statistics"
sel.Start "Chrome"
sel.Get URL
'set htmldoc= sel.document..... something....
Set htmldiv = sel.FindElementById("top-player-stats")
Set htmlul = sel.FindElementById("top-player-stats-options")
Set htmlAs = htmlul.FindElementsByTag("a")
For Each htmlA In htmlAs
TableName = htmlA.attribute("href")
htmlA.Click
GoToTable htmldoc, TableName
Next htmlA
End Sub
If you're trying to capture the entire HTML source, one option is to use
sel.PageSource
But that might not behave as you expect, because of a limitation in how the source is generated (source: https://www.selenium.dev/selenium/docs/api/java/org/openqa/selenium/WebDriver.html#getPageSource()).
You could also try these after the page is fully loaded:
sel.ExecuteScript("return document.documentElement.innerHTML")
sel.ExecuteScript("return document.body.innerHTML")
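To get something that can be passed to GoToTable as an MSHTML document, the returned markup can be copied into a new HTMLDocument. The following is only a sketch: SnapshotDocument is a hypothetical helper name, and it assumes references to the Selenium Type Library and the Microsoft HTML Object Library are set.

```vba
' Hypothetical helper (not part of SeleniumBasic): copies the rendered
' page body from a Selenium driver into a detached MSHTML document.
Public Function SnapshotDocument(ByVal driver As Object) As MSHTML.HTMLDocument
    Dim doc As New MSHTML.HTMLDocument
    ' Ask the browser for the current DOM and load it into MSHTML
    doc.body.innerHTML = driver.ExecuteScript("return document.body.innerHTML")
    Set SnapshotDocument = doc
End Function
```

Inside the loop it could be used as `Set htmldoc = SnapshotDocument(sel)` right after `htmlA.Click`. The result is a static copy, so it has to be taken again after every click that changes the page.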
Related
I am not a very experienced coder, and I am trying to create a web-scraping tool that does not need to be very powerful or elegant. My issue is that the only way I can scrape data off a specific website is by using each HTML element's id. I then want to put all of those elements into one element collection, but I can't figure out how to do this. Here is my code:
Dim IE As New SHDocVw.InternetExplorerMedium
Dim HTMLDoc As MSHTML.HTMLDocument
Dim HTMLgrades As MSHTML.IHTMLElementCollection ' The collection I want to make
Dim HTMLgrade1 As MSHTML.IHTMLElement
Dim HTMLgrade2 As MSHTML.IHTMLElement
Dim HTMLgrade3 As MSHTML.IHTMLElement 'there are 100's of grades, but I will just use three here
IE.Visible = True
IE.navigate "website I am navigating to"
' waiting mechanism for website to load
Set HTMLDoc = IE.Document
Set HTMLgrade1 = HTMLDoc.getElementById("longidname_1")
Set HTMLgrade2 = HTMLDoc.getElementById("longidname_2")
Set HTMLgrade3 = HTMLDoc.getElementById("longidname_3")
I have tried all types of code to add each element to the elementcollection, but I keep getting errors. I know there is most likely a super simple solution, so I appreciate any help I can get!
For example:
Dim IE As New SHDocVw.InternetExplorerMedium
Dim HTMLDoc As MSHTML.HTMLDocument
Dim col As New Collection, arr, id
IE.Visible = True
IE.navigate "website I am navigating to"
' waiting mechanism for website to load
Set HTMLDoc = IE.Document
arr = Array("longidname_1", "longidname_2", "longidname_3")
For Each id In arr
col.Add HTMLDoc.getElementById(id)
Next
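Once the elements are in the Collection, they can be looped over like any other collection. A minimal sketch, assuming the `col` built above is passed in; PrintElements is a hypothetical helper name.

```vba
' Hypothetical helper: prints the id and text of each collected element.
Public Sub PrintElements(ByVal col As Collection)
    Dim el As Object
    For Each el In col
        ' getElementById returns Nothing when the id is not on the page
        If Not el Is Nothing Then
            Debug.Print el.ID, el.innerText
        End If
    Next el
End Sub
```

Note that this is a plain VBA Collection, not an MSHTML.IHTMLElementCollection, so it supports For Each and indexing but none of the DOM collection methods.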
I have been looking for the answer to this question for quite some time. Below I have two pieces of code that load the HTML of a website into memory, with the same result. But the getElements methods, for example getElementsByClassName, do not work when I use the 'Get' method. I wanted to use the quicker 'Get' method, but because of this different outcome I couldn't. In the first piece of code getElementsByClassName works, but in the second piece its result remains Nothing.
I really couldn't figure out why, and I have been stuck for some time now. I hope you can help me out here. Thank you in advance.
' Part 1: Internet Explorer automation (getElementsByClassName works)
Dim IE As New SHDocVw.InternetExplorer
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim URL As String
Dim Element1 As MSHTML.IHTMLElement, Element2 As MSHTML.IHTMLElement, _
    Element3 As MSHTML.IHTMLElement
Dim Elementen As MSHTML.IHTMLElementCollection
URL = "https://www.google.nl/?gfe_rd=cr&dcr=0&ei=KXNcWsHNJ9OB4gTcjqvwCA"
IE.Visible = True
IE.navigate URL
Do While IE.readyState <> READYSTATE_COMPLETE
DoEvents
Loop
Set HTMLDoc = IE.document
Set Element1 = HTMLDoc.getElementsByClassName("gsfi")(0)
Set Element2 = HTMLDoc.getElementById("lst-ib")
Debug.Print Element1.className, Element2.className
' Part 2: XMLHTTP "Get" request (getElementsByClassName returns Nothing)
Dim XMLPage As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim URL As String
Dim Element1 As MSHTML.IHTMLElement
Dim Element2 As MSHTML.IHTMLElement
URL = "https://www.google.nl/?gfe_rd=cr&dcr=0&ei=KXNcWsHNJ9OB4gTcjqvwCA"
XMLPage.Open "Get", URL, False
XMLPage.send
HTMLDoc.body.innerHTML = XMLPage.responseText
Set Element1 = HTMLDoc.getElementsByClassName("gsfi")(0)
Set Element2 = HTMLDoc.getElementById("lst-ib")
Debug.Print Element2.className
How can I use "createDocumentFromUrl()" to fetch an "HTMLDocument" from a webpage directly in VBA? I searched SO for documentation on it but could not find any. I hope somebody can help me accomplish this. Thanks in advance.
Here is what I've tried so far which is definitely not right:
Sub HtmlScraper()
Dim odoc As Object
Set odoc = New HTMLDocument
odoc.Open createDocumentFromUrl("http://www.stackoverflow.com", "null")
MsgBox odoc.body.innerHTML
End Sub
I tried like this as well but no luck:
Sub htmlparser()
Dim odoc As HTMLDocument, hdoc As HTMLDocument
Set odoc = New HTMLDocument
Set hdoc = New HTMLDocument
Set hdoc = odoc.createDocumentFromUrl("http://www.stackoverflow.com", Null, False)
MsgBox hdoc.body.outerHTML
End Sub
This worked for me, it may be the site.
Sub test()
Dim d As MSHTML.HTMLDocument
Set d = New MSHTML.HTMLDocument
Dim d2 As MSHTML.HTMLDocument
Set d2 = d.createDocumentFromUrl("www.bbc.co.uk", "null")
While d2.readyState <> "complete" ' wait on the document that is actually loading
DoEvents
Wend
End Sub
I have made a macro that reads text from a saved HTML page/file. I now need to make it more advanced by reading an already open webpage. I would really appreciate the help.
I need to replace the line
URL = "file:///C:/test.html"
with something that will read an open webpage. I can make sure that there is only one tab open. I'm using the latest IE.
Dim URL As String
Dim Data As String
URL = "file:///C:/test.html"
Dim ie As Object
Dim ieDoc As Object
Set ie = CreateObject("InternetExplorer.Application")
ie.navigate URL
Do Until (ie.readyState = 4 And Not ie.Busy)
DoEvents
Loop
Set ieDoc = ie.Document
Data = ieDoc.body.innerText
If you know the title or URL of the already open webpage you're looking for, then this code will let you control it:
' Determine if a specific instance of IE is already open.
Set objShell = CreateObject("Shell.Application")
IE_count = objShell.Windows.Count
For x = 0 To (IE_count - 1)
On Error Resume Next ' sometimes more web pages are counted than are open
my_url = objShell.Windows(x).Document.Location
my_title = objShell.Windows(x).Document.Title
'You can use my_title or my_url, whichever you want
If my_title Like "Put your webpage title here" & "*" Then 'identify the existing web page
Set ie = objShell.Windows(x)
Exit For
Else
End If
Next
Use this code to get the currently running Internet Explorer (works at least with IE9):
Dim ie As Object
Dim objShell As Object
Dim objWindow As Object
Dim objItem As Object
Set objShell = CreateObject("Shell.Application")
Set objWindow = objShell.Windows()
For Each objItem In objWindow
If LCase(objItem.FullName) Like "*iexplore*" Then
Set ie = objItem
End If
Next objItem
MsgBox ie.Document.body.innertext
' Add reference to
' - Microsoft Internet Controls (SHDocVw)
' - Microsoft Shell Controls and Automation (Shell32)
' Find all running instances of IE and get web page Url
' Source: http://msdn.microsoft.com/en-us/library/windows/desktop/bb773974(v=vs.85).aspx
' Useful link: http://msdn.microsoft.com/en-us/library/windows/desktop/bb776890(v=vs.85).aspx
Sub main()
Dim browsers
Set browsers = GetBrowsers
Dim browser
Dim url
For Each browser In browsers
url = browser.document.Location.href
Debug.Print CStr(url)
Next browser
End Sub
Public Function GetBrowsers() As Collection
Dim browsers As New Collection
Dim shellApp As Shell32.Shell
Dim wnds As SHDocVw.ShellWindows
Set shellApp = New Shell
Set wnds = shellApp.Windows
Dim i As Integer
Dim ie As SHDocVw.WebBrowser
Dim name
For i = 0 To wnds.Count - 1 ' ShellWindows items are indexed from zero
Set ie = wnds(i)
If ie Is Nothing Then GoTo continue
If UCase(ie.FullName) Like "*IEXPLORE.EXE" Then
browsers.Add ie
End If
continue:
Next i
Set GetBrowsers = browsers
Set shellApp = Nothing
End Function
I'd like to use the MSHTML library to parse some HTML that I have in a string variable. However, I can't figure out how to do this. I can easily parse the contents of a webpage given a known URL, but not the source HTML directly. Is this possible? If so, how?
Public Sub ParseHTML(sHTML As String)
Dim oHTML As New HTMLDocument, oDoc As HTMLDocument
'This works:'
Set oDoc = oHTML.createDocumentFromUrl("http://www.google.com", "")
'I would like to do the following but no such method actually exists:'
Set oDoc = oHTML.createDocumentFromString(sHTML)
....
'Parse the HTML using the oDoc variable'
....
You can:
Dim odoc As Object
Set odoc = CreateObject("htmlfile") '// late binding
'// or:
'// Set odoc = New HTMLDocument
'// for early binding
odoc.open
odoc.write "<p> In his house at R'lyeh, dead <b>Cthulhu</b> waits dreaming</p>"
odoc.Close
MsgBox odoc.body.outerHTML
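The same open/write/close steps can be wrapped in a function that plays the role of the createDocumentFromString method the question wished for. This is a sketch only: DocumentFromString is a hypothetical name, and it uses late binding so no library reference is needed.

```vba
' Hypothetical helper: builds an HTML document from a string of markup.
Public Function DocumentFromString(ByVal sHTML As String) As Object
    Dim oDoc As Object
    Set oDoc = CreateObject("htmlfile") ' late-bound MSHTML document
    oDoc.Open
    oDoc.Write sHTML
    oDoc.Close
    Set DocumentFromString = oDoc
End Function
```

It could then be called as `Set oDoc = DocumentFromString(sHTML)` inside ParseHTML, after which the usual methods such as getElementById and getElementsByTagName are available on oDoc.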
For straight HTML code such as Access-Rich-Text this does it:
Dim HTMLDoc As New HTMLDocument
HTMLDoc.Body.innerHTML = strHTMLText
This is a much better example. You will not get a null exception, nor late binding.
(And if you use WPF, just add System.Windows.Forms in your reference.)
Dim a As Object
a = New mshtml.HTMLDocument
a.open()
a.writeln(code)
a.close()
Do Until a.readyState = "complete"
System.Windows.Forms.Application.DoEvents()
Loop
Dim doc As mshtml.HTMLDocument = a
Dim b As mshtml.HTMLSelectElement = doc.getElementsByTagName("Select").item("lang", 0)