I'm trying to download table data of a webpage but it looks like that the webpage is too big.
If I put the ".response" data in a "String"-variable, and save it to a textfile, I'll get all the data what is on the webpage. If I do this after saving it to an "HTMLDocument"-variable I only get the first 51594 characters
This is my script:
Sub nemigaparts_articles()
Dim oHtml As HTMLDocument
Dim oElement As Object
Dim URL As String
Dim id As String
Dim z As Long
Dim y As Long
Dim row As Long
Dim totaal As Long
Dim begin As Long
'data for save as
Dim start As Long
Dim eind As Long
Application.ScreenUpdating = False
begin = 1
totaal = 2
row = 1
start = 1
eind = 0
'laat statusbalk zien
UserForm1.Show
For z = begin To totaal Step 1
'get page data
'URL = Sheets("drawings").Cells(z, 3).Value
'id = Sheets("drawings").Cells(z, 1).Value
URL = "https://nemigaparts.com/cat_spares/pet/porsche/964/13/902000/"
id = 1
Set oHtml = New HTMLDocument
With CreateObject("WINHTTP.WinHTTPRequest.5.1")
.Open "GET", LCase(URL), False
.send
oHtml.body.innerHTML = .responseText
End With
Sheets("articles").Cells(1, 11) = z
Dim imageurl As Object
Set imageurl = oHtml.getElementsByClassName("col-lg-9 col-md-9 col-sm-9 col-xs-12 xs-margin-bottom-50")(0)
URL_img = imageurl.getElementsByTagName("a")(0).href
Dim atribuut As Object
Set atribuut = oHtml.getElementsByClassName("technical")(0)
If Not atribuut Is Nothing Then
x = oHtml.getElementsByTagName("tr").Length - 1
MsgBox (x)
For y = 1 To x Step 1
Sheets("articles").Cells(row, 1) = id
Sheets("articles").Cells(row, 2) = URL_img
Sheets("articles").Cells(row, 3) = oHtml.getElementsByTagName("tr")(y).innerHTML
row = row + 1
Next
End If
DoEvents
UserForm1.Label2.Width = (z / totaal * 100) / 100 * 186
UserForm1.Label4.Caption = "Artikel " & z & " van " & totaal
Next
End Sub
Related
So I know this is long and not the prettiest, but what I am trying to accomplish is to cycle through a list of tables and look for a bookmark that I have placed in certain tables in the document. These tables have the ability to be anywhere in the doc, so I am looping through all and looking for each possible bookmark on each table.
Right now, the below is my current code. objDoc returns the correct Doc name and opens the correct Doc. The problem is after that, when the code cycles through the tables in that Doc, it does not see my Bookmarks. I have verified it is selecting the correct Doc and tables with this code. When I use the 'ActiveDoc' operator after 'objDoc.Activate', it selects the Doc I am running the code from, not objDoc where I meaning to perform this search. If I run this as a test macro in the Doc connected to objDoc outside of the below code, all variables assign correctly.
Please help, this is driving me crazy, thank you!
P.S. - also any help on slimming this down is welcome!
Dim objDoc As Document
Set objDoc = objWord.Documents.Open(strPath)
Dim fileName As String
fileName = Dir(strPath)
objDoc.Activate
Dim x As Long
Dim data0, data1, data2, data3, data4, data5, data6, data7, data8, data9, data10, data11, data12, data13, data14, data15, data16 As Long
x = 0
Dim J As Integer
Dim iTableNum As Integer
Dim oTbl As Table
objDoc.Activate
iTableNum = objDoc.Tables.Count
For J = 1 To objDoc.Tables.Count
Set oTbl = objDoc.Tables(J)
tryagain:
oTbl.Select
objDoc.Tables(J).Select ''''''ERROR LINE
If Selection.Bookmarks.Exists("data" & x) And x < 17 Then
iTableNum = objDoc.Tables.Count
'Exit For
If x = 0 Then
data0 = J
ElseIf x = 1 Then
data1 = J
ElseIf x = 2 Then
data2 = J
ElseIf x = 3 Then
data3 = J
ElseIf x = 4 Then
data4 = J
ElseIf x = 5 Then
data5 = J
ElseIf x = 6 Then
data6 = J
ElseIf x = 7 Then
data7 = J
ElseIf x = 8 Then
data8 = J
ElseIf x = 9 Then
data9 = J
ElseIf x = 10 Then
data10 = J
ElseIf x = 11 Then
data11 = J
ElseIf x = 12 Then
data12 = J
ElseIf x = 13 Then
data13 = J
ElseIf x = 14 Then
data14 = J
ElseIf x = 15 Then
data15 = J
Else
data16 = J
Exit For
End If
ElseIf x < 17 Then
x = x + 1
GoTo tryagain
End If
x = 0
Next J
x = 0
Something like this might be a little easier to manage:
Sub Tester()
Dim objDoc As Document, strPath As String
Dim x As Long, J As Long
Dim data(0 To 16) As Long
strPath = "some path here"
Set objDoc = Documents.Open(strPath)
For J = 1 To objDoc.Tables.Count 'loop over tables
With objDoc.Tables(J)
For x = LBound(data) To UBound(data) 'loop bookmarks
If .Range.Bookmarks.Exists("data" & x) Then data(x) = J
Next x
End With
Next J
'show the results
For x = LBound(data) To UBound(data)
Debug.Print x, data(x)
Next x
End Sub
There is no need to loop through the tables to find the bookmark. There can only be one bookmark of a given name in a document, so either it exists or it doesn't. Hence, there is no need to loop through all the tables and again through all the bookmarks for each table:
With objDoc
For x = LBound(Data) To UBound(Data) 'loop bookmarks
If .Bookmarks.Exists("data" & x) Then
If .Bookmarks("data" & x).Range.Information(wdWithInTable) = True Then
Data(x) = .Range(0, .Bookmarks("data" & x).Range.End).Tables.Count
End If
End If
Next x
End With
There is potential for further simplification (eliminating If tests) if you know that all the bookmarks exist and/or that any that do exist are in tables.
I am trying to pull data from a website (https://www.baseball-reference.com/teams/ARI/2017-schedule-scores.shtml) by using the following code:
Sub GetBaseballReferenceData()
'created loop so we can loop through all different team url's
Dim x As Integer
Dim i As Integer
For i = 1 To 30
x = Cells(Rows.Count, 2).End(xlUp).Row
x = x + 2
'gets the team abbreviation that we use in our url
Team = Cells(i, "A")
'these two strings are used for url, they don't change
Const bbref_site As String = "https://www.baseball-reference.com/teams/"
Const year_schedule_scores As String = "/2017-schedule-scores"
Dim qt As QueryTable
Dim ws As Worksheet
Set ws = ActiveSheet
'uses Url to return data
Set qt = ws.QueryTables.Add(Connection:="URL;" & bbref_site & Team & year_schedule_scores & ".shtml", Destination:=Cells(x, 2))
qt.Refresh BackgroundQuery:=False
Next i
End Sub
When I run the code it works and gets me the information I want. However, the W/L column should be formatted like this (1-2, 2-3, 3-0) and instead will be formatted as a date. When I try to reformat it as a text it returns an error code. How do I pull the data I want from the website as a text initially?
Thanks for any and all help!
I changed the code slightly
Edit: Added qt.WebDisableDateRecognition
Option Explicit
Sub GetBaseballReferenceData()
'created loop so we can loop through all different team url's
Dim x As Integer
Dim i As Integer
Dim Team As String
Dim qt As QueryTable
Dim ws As Worksheet
Dim WLRange As Range
'these two strings are used for url, they don't change
Const bbref_site As String = "https://www.baseball-reference.com/teams/"
Const year_schedule_scores As String = "/2017-schedule-scores"
Set ws = ActiveSheet
For i = 1 To 1
x = Cells(Rows.Count, 2).End(xlUp).Row
x = x + 2
'gets the team abbreviation that we use in our url
Team = Cells(i, "A")
'uses Url to return data
Set qt = ws.QueryTables.Add(Connection:="URL;" & bbref_site & Team & year_schedule_scores & ".shtml", Destination:=Cells(x, 2))
qt.WebDisableDateRecognition = True
qt.Refresh False
'qt.Refresh BackgroundQuery:=False
Next i
End Sub
You can also use XHR
Option Explicit
Public Sub GetSchedules()
Dim x As Long, i As Long, URL As String, Team As String
Const bbref_site As String = "https://www.baseball-reference.com/teams/"
Const year_schedule_scores As String = "/2017-schedule-scores"
Dim sResponse As String, HTML As New HTMLDocument, wsSchedule As Worksheet, wsTeam As Worksheet
Dim wb As Workbook: Set wb = ThisWorkbook
Set wsSchedule = wb.Worksheets("Schedules"): Set wsTeam = wb.Worksheets("TeamNames")
wsSchedule.Cells.ClearContents
Application.ScreenUpdating = False
Dim http As Object: Set http = CreateObject("MSXML2.XMLHTTP")
With wsTeam
For i = 1 To 30
Team = .Cells(i, "A")
URL = bbref_site & Team & year_schedule_scores & ".shtml"
http.Open "GET", URL, False
http.send
sResponse = StrConv(http.responseBody, vbUnicode)
sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
With HTML
.body.innerHTML = sResponse
End With
WriteTable HTML, GetLastRow(wsSchedule, 1) + 2, wsSchedule
Next i
Application.ScreenUpdating = True
End With
End Sub
Public Sub WriteTable(ByVal HTML As HTMLDocument, Optional ByVal startRow As Long = 1, Optional ByVal ws As Worksheet)
Dim headers As Object, i As Long, columnCounter As Long
Dim columnInfo As Object, rowCounter As Long
With ws
Set headers = HTML.querySelectorAll("#team_schedule thead th")
For i = 0 To headers.Length - 1
columnCounter = columnCounter + 1
.Cells(startRow, columnCounter) = headers.item(i).innerText
Next i
Set columnInfo = HTML.querySelectorAll("#team_schedule tbody tr td")
columnCounter = 2
For i = 0 To columnInfo.Length - 1
If i Mod 20 = 0 Then
rowCounter = rowCounter + 1
columnCounter = 2
.Cells(startRow + rowCounter, 1) = rowCounter
Else
columnCounter = columnCounter + 1
End If
If columnCounter = 11 Then
.Cells(startRow + rowCounter, columnCounter) = Chr$(39) & columnInfo.item(i).innerText
Else
.Cells(startRow + rowCounter, columnCounter) = columnInfo.item(i).innerText
End If
Next i
End With
End Sub
I have an event handler Sub that listens for a response message from the Bloomberg API, and stores it in a Dd array.
According to MSDN, event handlers must be subs, but I would like to do further analysis on the data in my main method.
How can I reference the array created in the event handler, such that I can continue to process the data?
Event handler:
Private Sub session_ProcessEvent(ByVal obj As Object)
On Error GoTo errHandler
Dim eventObj As blpapicomLib2.Event
Set eventObj = obj
If Application.Ready Then
If eventObj.EventType = PARTIAL_RESPONSE Or eventObj.EventType = RESPONSE Then
Dim it As blpapicomLib2.MessageIterator
Set it = eventObj.CreateMessageIterator()
Dim numResponse As Integer
numResponse = 0
Dim data() As Variant
Do While it.Next()
numResponse = numResponse + 1
Dim msg As Message
Set msg = it.Message
Dim securityData As Element
Dim securityName As Element
Dim fieldData As Element
Set securityData = msg.GetElement("securityData")
Set securityName = securityData.GetElement("security")
Set fieldData = securityData.GetElement("fieldData")
Sheet1.Cells(currentRow, 4).Value = securityName.Value
Dim numDates As Integer
Dim numFields As Integer
numDates = fieldData.NumValues
numFields = fieldData.GetValue(0).NumElements
ReDim data(numDates, numFields, numResponse)
Dim b As Integer
For b = 0 To numDates - 1
Dim fields As blpapicomLib2.Element
Set fields = fieldData.GetValue(b)
Dim a As Integer
For a = 0 To numFields - 1
Dim field As Element
Set field = fields.GetElement(a)
data(b, a, numResponse) = field.Value
Sheet1.Cells(currentRow, a + 5).Value = data(b, a, numResponse)
Next
currentRow = currentRow + 1
Next b
Loop
' skip a row for next security
currentRow = currentRow + 1
End If
End If
Exit Sub
errHandler:
MsgBox Err.Description
End Sub
...and the sub I would like to process the array in here...
Public Sub RefDataExample()
' Calculate the number of securities and fields
Dim numSecurity As Integer
Dim numFields As Integer
numSecurity = 0
numFields = 0
' clear data area
Range("D4", "H60000").Clear
Do While Cells(numSecurity + 4, 1).Value <> ""
numSecurity = numSecurity + 1
Loop
Do While Cells(numFields + 4, 2).Value <> ""
numFields = numFields + 1
Loop
Dim sSecurity() As String
Dim sFields() As Variant
ReDim sSecurity(0 To numSecurity - 1) As String
ReDim sFields(0 To numFields - 1) As Variant
Dim i As Integer
For i = 0 To numSecurity - 1
sSecurity(i) = Cells(i + 4, 1).Value
Next i
For i = 0 To numFields - 1
sFields(i) = Cells(i + 4, 2).Value
Next i
bbControl.MakeRequest sSecurity, sFields
'Process response array here
End Sub
Remove
Dim data() As Variant
from within all the subs it currently appears in (like from within your session_ProcessEvent())
and place
Public data As Variant
at the top of the module containing your sub (actually you can place it at the top of any module you like)
and you'll have ”data” visible from all subs/functions of your project
Alright,
I've been cobbling together code to automate a task. I have a word document that has something like 300 lines that each have an identifier number, a title, and a website. I would like to search the document by identifier pull the title and website separately and enter them into an excel sheet separately. The identifiers are already listed in excel and I would like them to match up with the appropriate information.
I know its really, really messy -
Public Sub ParseDoc()
Dim list As Workbook
Dim doc As Document
Set doc = "C:\network\path\importantlist.doc"
Dim paras As Paragraphs
Set paras = doc.Paragraphs
Dim para As Paragraph
Dim sents As Sentences
Dim sent As Range
Set list = ActiveSheet
Dim i As Integer
Dim mystring As String
Dim length As Integer
Dim space As String
Dim dot As String
Dim space1 As String
Dim space2 As String
Dim XYZ As Range
dot = "."
space = " "
i = 1
While i < 300 'This loops for the duration of the identifier list in excel
mystring = Cells(i, 1) ' this pulls the unique identifier from the cell
For Each para In paras
Set sents = para.Range.Sentences ' this searches the document by paragraphs to sentences
For Each sent In sents
If InStr(1, sent, mystring) <> 0 Then 'If a the identifier is found
space1 = InStr(1, sent, space, vbTextCompare) 'measure the length to the first blank space (this indicates the title is about to begin)
space2 = InStr(1, sent, dot, vbTextCompare) ' This dot is the ".doc" and indicates the title has concluded, I want the text between these two characters
Set XYZ =
Start:= space1.range.start
End:= space2.range.start
'Here is where I am stuck, I have never used range or selection before and after looking around, I still feel very much at a loss on how to proceed forward...
Next
Next
End Sub
Updated:
Updates values for matching IDs
Appends records that have no matching ID
General Instructions
Insert this into a Excel code module
Set the correct values for the constants in ParseWordDocument()
Cross you finger
Run ParseWordDocument()
Let me know how it went
Option Explicit
Sub ParseWordDocument()
Const WordPath As String = "C:\Users\best buy\Downloads\stackoverflow\Sample Files\A203 Paralegal.docx"
Const iID = 1
Const iTitle = 2
Const iHyperLink = 3
Const TargetSheetName As String = "Sheet1"
Dim k As String, id As String, title As String, hAddress As String, hScreenTip As String, hTextToDisplay As String
Dim lastRow As Long, x As Long, y As Long
Dim arData, h
arData = getWordDocArray(WordPath, False)
With Worksheets(TargetSheetName)
lastRow = .Cells(Rows.Count, iID).End(xlUp).Row + 1
For x = 2 To lastRow
For y = 0 To UBound(arData, 2)
id = Trim(.Cells(x, iID))
If Len(id) And (id = arData(0, y)) Then
id = Trim(.Cells(x, iID))
title = arData(1, y)
hAddress = arData(2, y)
hScreenTip = arData(3, y)
hTextToDisplay = arData(4, y)
.Cells(x, iTitle) = title
.Hyperlinks.Add .Cells(x, iHyperLink), Address:=hAddress, ScreenTip:=hScreenTip, TextToDisplay:=hTextToDisplay
arData(0, y) = ""
Exit For
End If
Next
Next
For y = 0 To UBound(arData, 2)
id = arData(0, y)
If Len(id) Then
title = arData(1, y)
hAddress = arData(2, y)
hScreenTip = arData(3, y)
hTextToDisplay = arData(4, y)
.Cells(lastRow, iID) = id
.Cells(lastRow, iTitle) = title
.Hyperlinks.Add .Cells(lastRow, iHyperLink), Address:=hAddress, ScreenTip:=hScreenTip, TextToDisplay:=hTextToDisplay
arData(0, y) = ""
lastRow = lastRow + 1
End If
Next
End With
End Sub
Function getWordDocArray(WordPath As String, Optional ShowWord As Boolean = False) As Variant
Dim i As Integer, iStart As Integer, iEnd As Integer
Dim id As String, title As String
Dim arData, s
Dim wdApp, wdDoc, h
Set wdApp = CreateObject("Word.Application")
Set wdDoc = wdApp.Documents.Open(Filename:=WordPath, ReadOnly:=True)
wdApp.Visible = ShowWord
ReDim arData(4, 0)
For Each s In wdDoc.Sentences
On Error GoTo SkipSentence
iStart = InStr(s.Text, s.Words(2))
iEnd = InStr(s.Text, "(") - iStart
id = Trim(s.Words(1))
title = Mid(s.Text, iStart, iEnd)
Set h = s.Hyperlinks(1)
ReDim Preserve arData(4, i)
arData(0, i) = id
arData(1, i) = title
arData(2, i) = h.Address
arData(3, i) = h.ScreenTip
arData(4, i) = h.TextToDisplay
i = i + 1
SkipSentence:
On Error GoTo 0
Next
getWordDocArray = arData
If Not ShowWord Then
wdDoc.Close False
wdApp.QUIT
End If
Set wdDoc = Nothing
Set wdApp = Nothing
End Function
I need to constantly update excel file with information, obtained from the following link (warning, ukrainian language):
link to the Ministry of Finance web-site of Ukraine
Useful data is wrapped by the HTML tags <tbody></tbody>.
I need the similar code that retrieves the information from the table
Set htm = CreateObject("htmlFile")' #it doesn't work on mac os machine, but perfectly performs on windows
With CreateObject("msxml2.xmlhttp")
.Open "GET", <site_url_goes_here>, False
.send
htm.body.innerhtml = .responsetext
End With
With htm.getelementbyid("item")' <<<<<---what should I write here in order to parse data from the web-site table?
Sheet2.Cells(Row, 4).Value = p
For x = 1 To .Rows.Length - 1
For y = 0 To .Rows(x).Cells.Length - 1
Sheet2.Cells(Row, y + 1).Value = .Rows(x).Cells(y).innertext
Next y
Row = Row + 1
Next x
End With`
Below code will get the updated data from http://www.minfin.gov.ua in every 60 seconds.
Sub getData()
Application.OnTime Now + TimeSerial(0, 0, 60), "finance_data"
End Sub
Private Sub finance_data()
Dim url As String, lastRow As Long
Dim XMLHTTP As Object, html As Object
Dim tbl As Object, obj_tbl As Object
Dim TR As Object, TD As Object
Dim row As Long, col As Long
lastRow = Range("A" & Rows.Count).End(xlUp).row
url = "http://www.minfin.gov.ua/control/uk/publish/article?art_id=384069&cat_id=234036" & "&r=" & WorksheetFunction.RandBetween(1, 10000)
Set XMLHTTP = CreateObject("MSXML2.XMLHTTP")
XMLHTTP.Open "GET", url, False
XMLHTTP.setRequestHeader "Content-Type", "text/xml"
XMLHTTP.send
Set html = CreateObject("htmlfile")
html.body.innerHTML = XMLHTTP.ResponseText
Set obj_tbl = html.getelementsbytagname("table")
row = 1
col = 1
For Each tbl In obj_tbl
If tbl.classname = "MsoNormalTable" Then
Set TR = tbl.getelementsbytagname("TR")
For Each obj_row In TR
For Each TD In obj_row.getelementsbytagname("TD")
Cells(row, col) = TD.innerText
col = col + 1
Next
col = 1 ' reseting the value
row = row + 1
Next
End If
Next
getData
End Sub