Failed to find any tables on the website - VBA MSXML2

I'm trying to fetch some data from a website using the MSXML2 library. There are no errors, but the list of elements returned for the table tag is empty (count 0).
When I use IE it works, but it's much slower and sometimes the website doesn't even load.
Edit: I've noticed that the website shows a "Loading" page before the content appears, and I think that might be the issue.
Here is the code:
Sub Test()
Dim Data As Variant
Dim Tables As Object
Dim Website As Object
Dim x As Long, y As Long
Dim oRow As Object, oCell As Object
Dim FirstCol As Integer, LastCol As Integer, FirstRow As Integer, LastRow As Integer
Set Website = CreateObject("htmlFile")
y = 0: x = 0 'X - row, Y - column
With CreateObject("MSXML2.serverXMLHTTP")
.Open "GET", "www.example.com", False
.send
Website.body.innerHTML = .responseText
End With
Set Tables = Website.getElementsByTagName("table")
'And when I go to debug mode and check, there are 0 tables. However, on the website, there are many tables within <table> tag
End Sub
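One way to test the "Loading" page suspicion is to check whether the raw response contains any table markup at all. The sketch below is only a diagnostic aid, not a fix; CountOccurrences is a hypothetical helper name, not part of MSXML2 or MSHTML.

```vba
' Hypothetical helper: counts case-insensitive occurrences of needle in source.
Function CountOccurrences(ByVal source As String, ByVal needle As String) As Long
    Dim pos As Long
    ' An empty search string would never advance, so return 0 for it
    If Len(needle) = 0 Then Exit Function
    pos = InStr(1, source, needle, vbTextCompare)
    Do While pos > 0
        CountOccurrences = CountOccurrences + 1
        ' Continue searching just past the match that was found
        pos = InStr(pos + Len(needle), source, needle, vbTextCompare)
    Loop
End Function
```

Calling Debug.Print CountOccurrences(.responseText, "<table") inside the With block shows how many table tags the server actually sent. If it prints 0, the tables are probably built by script after the "Loading" page, which a plain HTTP request does not run.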

Programmatically sort pages in a Visio Document using VBA

Does anyone know a method to sort Visio pages alphabetically using VBA?
I looked to see if a method such as vzdVisioDocument.Pages.Sort exists, but found nothing in documentation or through internet searches.
Do I need to write my own sorting function using the Application.ActiveDocument.Pages.ItemU("Page Name").Index property? That seems to be the method suggested by recording a macro of the action.
So that wasn't as painful as expected. With vzdVisioDocument as an already defined Visio.Document:
' Make a collection of titles to iterate through
Dim colPageTitles As Collection
Set colPageTitles = New Collection
Dim intPageCounter As Integer
For intPageCounter = 1 To vzdVisioDocument.Pages.Count
colPageTitles.Add vzdVisioDocument.Pages.Item(intPageCounter).Name
Next intPageCounter
' For each title in the collection, iterate through pages and find the appropriate new index
Dim intPageIndex As Integer
Dim varPageTitle As Variant
For Each varPageTitle In colPageTitles
For intPageIndex = 1 To vzdVisioDocument.Pages.Count
' Check to see if the title comes before the index's current page title
If StrComp(varPageTitle, vzdVisioDocument.Pages.Item(intPageIndex).Name) < 0 Then
' If so, set the new page index
vzdVisioDocument.Pages.ItemU(varPageTitle).Index = intPageIndex
Exit For
End If
Next intPageIndex
Next varPageTitle
' Clean up
Set colPageTitles = Nothing
I mentioned this in another comment, but when I made some test pages, the code above always shuffled the pages around when I ran it. Given the way this is implemented, I don't believe that Exit For should be in there.
I also changed the StrComp call to use vbTextCompare and swapped the order of the For loops, both out of personal preference.
Sub PageSort()
Dim titlesColl As Collection
Set titlesColl = New Collection
Dim i As Long
For i = 1 To ActiveDocument.Pages.Count
titlesColl.Add ActiveDocument.Pages.Item(i).Name
Next i
Dim title As Variant
For i = 1 To ActiveDocument.Pages.Count
For Each title In titlesColl
If StrComp(ActiveDocument.Pages.Item(i).Name, title, vbTextCompare) < 0 Then
ActiveDocument.Pages.Item(title).index = i
End If
Next title
Next i
Set titlesColl = Nothing
End Sub
Private Sub reorderPages()
Dim PageNameU() As String
Dim isBackgroundPage As Boolean
Dim vsoPage As Visio.Page
Dim vsoCellObj As Visio.Cell
'// Get All Pages
Dim i As Integer
For Each vsoPage In ActiveDocument.Pages
i = i + 1
ReDim Preserve PageNameU(i)
PageNameU(i) = vsoPage.NameU
Next vsoPage
For i = 1 To UBound(PageNameU)
'// vsoPages was never declared; use the document's Pages collection directly
Set vsoPage = ActiveDocument.Pages.ItemU(PageNameU(i))
Set vsoCellObj = vsoPage.PageSheet.Cells("UIVisibility")
isBackgroundPage = vsoPage.Background
'// Make foreground page to set page index
If isBackgroundPage = True Then
vsoCellObj.FormulaU = visUIVNormal
vsoPage.Background = False
End If
'// NumNonAppSysPages is not defined in this snippet; it must be declared and set elsewhere
vsoPage.Index = NumNonAppSysPages + i
'// Set to background page
If isBackgroundPage = True Then
vsoCellObj.FormulaU = visUIVHidden
vsoPage.Background = True
End If
Next i
End Sub
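The snippets above reorder pages while comparing names in place. A different approach, sketched here as an assumption rather than as any poster's method, is to sort the page names first and then assign indices in sorted order. SortedNames and SortPagesByName are hypothetical names, and the sketch assumes the document has no background pages.

```vba
' Hypothetical helper: returns a sorted copy of pageNames
' (case-insensitive insertion sort).
Function SortedNames(pageNames() As String) As String()
    Dim result() As String
    Dim i As Long, j As Long
    Dim current As String
    result = pageNames   ' array assignment copies the contents
    For i = LBound(result) + 1 To UBound(result)
        current = result(i)
        j = i - 1
        Do While j >= LBound(result)
            ' Stop shifting once the element to the left is not greater
            If StrComp(result(j), current, vbTextCompare) <= 0 Then Exit Do
            result(j + 1) = result(j)
            j = j - 1
        Loop
        result(j + 1) = current
    Next i
    SortedNames = result
End Function

' Assumes only foreground pages; background pages are not handled here.
Sub SortPagesByName()
    Dim pageNames() As String
    Dim i As Long
    With ActiveDocument.Pages
        ReDim pageNames(1 To .Count)
        For i = 1 To .Count
            pageNames(i) = .Item(i).NameU
        Next i
        pageNames = SortedNames(pageNames)
        ' Moving the i-th sorted name to position i yields sorted order
        For i = 1 To UBound(pageNames)
            .ItemU(pageNames(i)).Index = i
        Next i
    End With
End Sub
```

Sorting the names up front means each page is moved exactly once, so the result does not depend on the order in which pages shift during the loop.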

Unable to shake off hardcoded delay from my script

I've written a script in VBA in combination with Selenium to parse all the company names available on a webpage. The webpage uses lazy loading, so only 20 more links become visible with each scroll: after scrolling twice there are 40 visible links, and so on. There are 1000 links on that webpage. My script below can reach the bottom of the page, handling all the scrolling, and fetch all the names.
However, it is necessary to wait a certain time after each scroll for the webpage to update its content. This is where I've used a hardcoded delay, but hardcoding is very inconsistent and sometimes makes the browser quit before the whole operation completes.
How can I modify the .Wait 6000 portion to make it an explicit wait instead of a hardcoded wait?
This is what I've written so far:
Sub Getlinks()
Dim driver As New ChromeDriver, prevlen&, curlen&
Dim posts As Object, post As Object
With driver
.get "http://fortune.com/fortune500/list/"
prevlen = .FindElementsByClass("company-title").Count
Do
prevlen = curlen
.ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
.Wait 6000 ''I like to kick out this hardcoded delay and use explicit wait in place
Set posts = .FindElementsByClass("company-title")
curlen = posts.Count
If prevlen = curlen Then Exit Do
Loop
For Each post In posts
R = R + 1: Cells(R, 1) = post.Text
Next post
End With
End Sub
Here is a completely different approach that doesn't require using a browser, instead it submits a series of web requests. With this approach, waiting for a page to load isn't a concern.
Typically, with lazy loading pages, it will submit a new request to load up the data for the page as you scroll. If you monitor the web traffic you can spot the requests made and emulate those, I have done that below.
The result should be a list of company names, in ascending order in whatever the first sheet of Excel is.
Things you'll need:
Add References to:
Microsoft Scripting Runtime
Microsoft XML v6.0
Add the VBA-JSON code to your project. You can find it at https://github.com/VBA-tools/VBA-JSON
Edit
Changed the code to keep pulling data from the site until there are no more items in the list. Thanks @Qharr for pointing this out.
Code
Public Sub SubmitRequest()
Const baseURL As String = "http://fortune.com/api/v2/list/2358051/expand/item/ranking/asc/"
Dim Url As String
Dim startingNumber As Long
Dim j As Long
Dim getRequest As MSXML2.XMLHTTP60
Dim Json As Object
Dim Companies As Object
Dim Company As Variant
Dim CompanyArray As Variant
'Create an array to hold each company
ReDim CompanyArray(0 To 50000)
'Create a new XMLHTTP object so we can place a get request
Set getRequest = New MSXML2.XMLHTTP60
'The api seems to only support returning 100 records at a time
'So do in batches of 100
Do
'Build the url, the format is something like
'0/100, where 0 is the starting position, and 100 is the ending position
Url = baseURL & startingNumber & "/" & startingNumber + 100
With getRequest
'Pass False so the request is synchronous and responseText is ready after .send
.Open "GET", Url, False
.send
'The response is a JSON object, for this code to work -
'You'll need this code https://github.com/VBA-tools/VBA-JSON
'What is returned is a dictionary
Set Json = JsonConverter.ParseJson(.responseText)
Set Companies = Json("list-items")
'Keep checking in batches of 100 until there are no more
If Companies.Count = 0 Then Exit Do
'Iterate the dictionary and return the title (which is the name)
For Each Company In Companies
CompanyArray(j) = Company("title")
j = j + 1
Next
End With
startingNumber = startingNumber + 100
Loop
ReDim Preserve CompanyArray(j - 1)
'Dump the data to the first sheet
ThisWorkbook.Sheets(1).Range("A1:A" & j) = WorksheetFunction.Transpose(CompanyArray)
End Sub
There you go:
Sub Getlinks()
Dim driver As New ChromeDriver
Dim pcount As Long, R As Long
Dim posts As Object, post As Object
With driver
.get "http://fortune.com/fortune500/list/"
Do
.ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
Set posts = .FindElementsByClass("company-title")
pcount = posts.Count
Loop Until pcount = 1000
For Each post In posts
R = R + 1: Cells(R, 1) = post.Text
Next post
End With
End Sub
Or even better, print as you go:
Sub Getlinksasyougo()
Dim driver As New ChromeDriver
Dim pcount As Long, R As Long, i As Long
Dim posts As Object, post As Object
With driver
.get "http://fortune.com/fortune500/list/"
i = 1
Do
.ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
Set posts = .FindElementsByClass("company-title")
pcount = posts.Count
If i <> pcount Then
For R = i To pcount - 1
Cells(R, 1) = posts(R + 1).Text
Next R
i = pcount
End If
Loop Until pcount = 1000
End With
End Sub
Here's a way to approach it using the "look for the spinner element" method discussed in one of the comments, which helps you avoid having to specify the number of elements you're expecting the page to load. The class name of the spinner actually changes depending on whether or not it's visible, which makes it pretty easy to just wait for the spinner to become visible + disappear again before getting the page elements.
This method still involves some waiting; by default, it waits 1/10th of a second after each attempt to find the spinner, either until the spinner is found or for some maximum number of attempts. But that's much faster than waiting 5 seconds every time.
Also, unrelated, but don't write stuff to cells one at a time, it's really slow. It's much faster to write it to an array first + write the entire array at once.
Sub getLinks()
Dim bot As New ChromeDriver
bot.Get "http://fortune.com/fortune500/list/"
Dim posts As WebElements
Dim numPosts As Long
Dim finishedScrolling As Boolean
finishedScrolling = False
Do Until finishedScrolling
'Set beginning post count and scroll down
Dim startPosts As Long
startPosts = numPosts
bot.ExecuteScript "window.scrollTo(0, document.body.scrollHeight);"
'Wait for spinner to become visible, then wait for up to 5 seconds for rehide
Call waitForElements(bot, "div[class^='F500-spinner ']", 50)
Call waitForElements(bot, "div[class^='F500-spinner hide']", 50)
'See if any new posts have loaded
Set posts = bot.FindElementsByClass("company-title")
numPosts = posts.Count
If numPosts = startPosts Then
finishedScrolling = True
End If
Loop
'Write text to results array
Dim post As WebElement
ReDim resultsArr(1 To posts.Count, 1 To 1) As String
Dim i As Long
i = 1
For Each post In posts
resultsArr(i, 1) = post.Text
i = i + 1
Next
'Write array to sheet
With ActiveSheet
.Range(.Cells(1, 1), .Cells(UBound(resultsArr, 1), 1)).Value = resultsArr
End With
End Sub
Sub waitForElements(bot As WebDriver, css As String, maxAttempts As Long, Optional waitTimeMS As Long = 100)
'Use a CSS selector string to wait for element(s) to appear on a page or to reach max number of attempts
'By default, bot waits 0.1 second after each attempt
Dim i As Long
Dim foundElem As Boolean
foundElem = False
Do Until foundElem
i = i + 1
If bot.FindElementsByCss(css).Count > 0 Then
foundElem = True
ElseIf i = maxAttempts Then
foundElem = True
Else
bot.Wait waitTimeMS
End If
Loop
End Sub
Define a timeout (specified period of time that will be allowed to elapse) to get rid of the hardcoded delay. The timeout needs to be hardcoded.
The differences between this and your original code are:
The loop itself is running over and over (doesn't wait 6 s on each iteration) and checks for new content until new content is found or the timeout is reached.
If the lazy loading takes more time than expected for instance when loading number 21 to 50 the loop "waits" and tries to get new content for the maximum time defined in timeout.
Downside: On the last step when all content is loaded the loop will take as many seconds as the timeout is set to.
Code:
Sub Getlinks()
Dim driver As New ChromeDriver, prevlen&, curlen&
Dim posts As Object, post As Object
Dim timeout As Integer, startTime As Double
timeout = 10 ' set the timeout to 10 seconds
With driver
.get "http://fortune.com/fortune500/list/"
prevlen = .FindElementsByClass("company-title").Count
startTime = Timer ' set the initial starting time
Do
.ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
Set posts = .FindElementsByClass("company-title")
curlen = posts.Count
If curlen > prevlen Then
startTime = Timer ' reset start time if new elements found
prevlen = curlen ' set new prevlen
End If
Loop While Round(Timer - startTime, 2) <= timeout ' check if timeout is reached
For Each post In posts
R = R + 1: Cells(R, 1) = post.Text
Next post
End With
End Sub
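One caveat with Timer-based timeouts: Timer returns the number of seconds since midnight, so the difference Timer - startTime goes negative if the loop runs across midnight. A minimal sketch of a rollover-safe helper follows; ElapsedSeconds is a hypothetical name introduced for illustration.

```vba
' Hypothetical helper: seconds elapsed between two Timer readings,
' allowing for Timer resetting to 0 at midnight.
Function ElapsedSeconds(ByVal startTime As Double, ByVal currentTime As Double) As Double
    Const SECONDS_PER_DAY As Double = 86400
    If currentTime >= startTime Then
        ElapsedSeconds = currentTime - startTime
    Else
        ' Timer wrapped past midnight since startTime was taken
        ElapsedSeconds = currentTime + SECONDS_PER_DAY - startTime
    End If
End Function
```

With this in place, the loop condition could read Loop While ElapsedSeconds(startTime, Timer) <= timeout. The helper only handles a single midnight crossing, which is enough for timeouts measured in seconds.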
I don't know if this will help as it's still a 'hard-coded' solution but you could try a delay function rather than the wait function and see if that helps with the program exiting issue.
Function Delay(Seconds As Single)
Dim StopTime As Single: StopTime = Timer + Seconds
Do While Timer < StopTime
DoEvents
Loop
End Function
I think you are almost there.
Although I don't think you can avoid waiting, the workaround is to check for new posts several times as you scroll down, with a shorter wait each time.
The example below checks for new posts 5 times, waiting 2 seconds each time, for a total of 10 seconds before declaring the end of the page. Adjust these 2 parameters to suit.
Sub Getlinks()
Dim driver As New ChromeDriver, prevlen&, curlen&
Dim posts As Object, post As Object
' Counter for number of times when there are NO NEW POSTS
Dim NoIncreaseCount As Integer
Const MaxNoIncreaseCount As Integer = 5
Const WaitTime As Integer = 2000 ' 2 seconds wait time each scroll down
With driver
.get "http://fortune.com/fortune500/list/"
prevlen = .FindElementsByClass("company-title").Count
NoIncreaseCount = 0
Do Until NoIncreaseCount = MaxNoIncreaseCount
.ExecuteScript ("window.scrollTo(0, document.body.scrollHeight);")
.Wait WaitTime
Set posts = .FindElementsByClass("company-title")
curlen = posts.Count
If prevlen < curlen Then
' There are new Posts
prevlen = curlen
NoIncreaseCount = 0
Else
' No new Posts
NoIncreaseCount = NoIncreaseCount + 1
End If
Loop
For Each post In posts
R = R + 1: Cells(R, 1) = post.Text
Next post
End With
End Sub

VBA Macro, get URL from given range loop

The way my code is currently set up, it gets the data from the URL that I've indicated in the code. However, I actually want to provide a list of URLs in Sheet2 that it would loop through until it has extracted all the data. I don't want to have to update the code individually for each URL. There are thousands... How would I be able to do that?
Here is the code:
Public Sub exceljson()
Dim https As Object, Json As Object, i As Integer
Dim Item As Variant
Set https = CreateObject("MSXML2.XMLHTTP")
https.Open "GET", "https://min-api.cryptocompare.com/data/price?fsym=USD&tsyms=BTC", False
https.Send
Set Json = JsonConverter.ParseJson(https.responseText)
i = 2
For Each Item In Json.Items
Sheets(1).Cells(i, 2).Value = Item
i = i + 1
Next
MsgBox ("complete")
End Sub
I'll just pretend that all of the URLS are in Column A here:
Public Sub exceljson()
Dim https As Object, Json As Object, i As Long, j As Long 'Long avoids overflow with thousands of rows
Dim Item As Variant
Set https = CreateObject("MSXML2.XMLHTTP")
i = 2 'start row; set once so results from each URL are appended rather than overwritten
For j = 1 To Sheets(2).UsedRange.Rows.Count
If Len(Trim$(Sheets(2).Cells(j, 1).Value2)) > 0 Then
https.Open "GET", Trim$(Sheets(2).Cells(j, 1).Value2), False
https.Send
Set Json = JsonConverter.ParseJson(https.responseText)
For Each Item In Json.Items
Sheets(1).Cells(i, 2).Value = Item
i = i + 1
Next Item
End If
Next j
MsgBox ("complete")
End Sub
I like to use Trim$() to make sure I'm not picking up any stray whitespace around the URL.
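With thousands of URLs, a single failed request would otherwise stop the whole run when the response is parsed. As a sketch only, the request could be wrapped in a helper that returns an empty string on failure, so the loop can skip that URL. FetchText is a hypothetical name, and treating only status 200 as success is an assumption.

```vba
' Hypothetical helper: returns the response body for url,
' or "" when the request fails or the status is not 200.
Function FetchText(ByVal url As String) As String
    Dim http As Object
    On Error GoTo Failed
    Set http = CreateObject("MSXML2.XMLHTTP")
    http.Open "GET", url, False   ' False = synchronous request
    http.send
    If http.Status = 200 Then FetchText = http.responseText
    Exit Function
Failed:
    ' Malformed URL, no connection, and similar errors end up here
    FetchText = ""
End Function
```

Inside the loop, the body could then be checked with Len(body) > 0 before it is handed to JsonConverter.ParseJson.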

Debugging in AutoCAD VBA IDE is not displaying where the error is

Whenever I try to debug or run the program and it encounters an error, the VBE (AutoCAD) doesn't display the line where the error is. In other IDEs, it would jump to that line and highlight it in yellow. Also, scrolling doesn't work. I know I should install plugins, but I haven't been able to sort it out myself.
Option Explicit
Sub Test()
'Declarations
'Opened Document
Dim acDocu As AcadDocument
Set acDocu = ThisDrawing.Application.ActiveDocument
'Select on screen
Dim acSelectionSet As AcadSelectionSet
Set acSelectionSet = ThisDrawing.SelectionSets.Add("SjjEffffT")
acSelectionSet.SelectOnScreen
'Manipulating in loops for finding group names having objects selected
Dim entity As AcadEntity
Dim entityhandle() As String
Dim Grp As AcadGroup
Dim groupname() As String
Dim i As Integer
i = 0
Dim j As Integer
j = 0
Dim temp As Integer
temp = 0
Dim GrpEnt As AcadEntity
Dim grpenthandle As String
Dim entity_count As Integer
'Dim entity_array As Variant
entity_count = acSelectionSet.Count
ReDim entityhandle(entity_count)
ReDim groupname(entity_count)
For Each entity In acSelectionSet
'entity_array = entity
entityhandle(i) = entity.Handle
For Each Grp In ThisDrawing.groups
For Each GrpEnt In Grp
grpenthandle = GrpEnt.Handle
If entityhandle(i) = grpenthandle Then
If temp = 0 Then
groupname(j) = Grp.Name
Debug.Print "Group in selection:" & groupname(j)
j = j + 1
End If
End If
temp = temp + 1
Next
temp = 0
Next
i = i + 1
Next
'Copying the objects and pasting into new drawing
Dim acDocto As AcadDocument
Dim file_name As String
'file_name = InputBox("Enter the file name along with full path and extension")
file_name = "D:\PI_Tool_files_3223\D00440023new.DWG"
Set acDocto = Documents.Open(file_name)
Dim acObject As AcadObject
Dim retvalue As Variant
retvalue = acDocu.CopyObjects(entityhandle, acDocto.ModelSpace)
acSelectionSet.Delete
End Sub
The code is written above. But I think the problem is with the add-in, as I can't debug.
The VBA IDE is pretty old (1998) and it has limited debugging abilities. You should stop using this, it's an obsolete technology, not actively supported by Microsoft/Autodesk anymore.
For some errors, it is not able to locate the line where the error occurred, and you're left with obscure error codes and useless messages.
Have you tried setting a breakpoint at the first possible line? (Set acDocu = ThisDrawing.Application.ActiveDocument)
Then step through to see the offending object/property/method.
It doesn't always work.
Can you load the code into a module, instead of "ThisDrawing", then debug?

Passing Connection String to VBA Macro

I have always found you a great help when I have questions. This time it's something related to Excel VBA.
I have a macro that brings back data from a website. You simply have to hard-code the connection string into it (xmlHttp.Open "GET", "http://www.example.com", False).
Sub GET_HTML_DATA()
Dim xmlHttp As Object
Dim TR_col As Object, TR As Object
Dim TD_col As Object, TD As Object
Dim row As Long, col As Long
Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
xmlHttp.Open "GET", "http://www.example.com", False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.send
Dim html As Object
Set html = CreateObject("htmlfile")
html.body.innerHTML = xmlHttp.responseText
Dim tbl As Object
Set tbl = html.getElementById("curr_table")
row = 1
col = 1
Set TR_col = html.getElementsByTagName("TR")
For Each TR In TR_col
Set TD_col = TR.getElementsByTagName("TD")
For Each TD In TD_col
Cells(row, col) = TD.innerText
col = col + 1
Next
col = 1
row = row + 1
Next
End Sub
I was wondering whether, and how, this code can be changed to accept the connection string as a parameter, so that I can call it with Run "GET_HTML_DATA(parameter)".
I have tried declaring a parameter in the parentheses and using it in place of www.example.com, but when I run the macro it tells me "The macro may not be available in this workbook...".
Am I doing it right or is there another way I do not know?
Declare the parameter in the parentheses of your Sub, for example (url As String), and then use url in your code in place of the hard-coded address. Don't name the parameter input, because Input is a reserved word in VBA. To call it, pass the argument separately from the macro name: Run "GET_HTML_DATA", parameter. Alternatively, if you need to return some output, you can put your code in a function, like
Function myFunction(url As String) As Double
'code goes here
End Function
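As a minimal sketch of the parameterised version, assuming the rest of the table-parsing code stays as in the question: the parameter name url and the caller CallWithUrl are hypothetical, and the Debug.Print line is only a placeholder for the parsing logic.

```vba
' Sketch: the URL is passed in instead of being hard-coded.
Sub GET_HTML_DATA(ByVal url As String)
    Dim xmlHttp As Object
    Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
    xmlHttp.Open "GET", url, False
    xmlHttp.send
    ' Placeholder for the table-parsing code from the question
    Debug.Print Len(xmlHttp.responseText)
End Sub

' Hypothetical caller showing two ways to pass the argument
Sub CallWithUrl()
    ' Direct call from other VBA code
    GET_HTML_DATA "http://www.example.com"
    ' Call by name: the argument follows the macro name, outside the string
    Application.Run "GET_HTML_DATA", "http://www.example.com"
End Sub
```

Note that a Sub with a required argument no longer appears in the Macros dialog, so it has to be started from another procedure such as the caller above.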