Automatically collect data from multiple webpages for excel - vba

I've been trying to automate a task but recently ran into an issue. Part of the task is to copy and paste two pieces of information from a website into an Excel document.
Here are some examples of what the webpages look like:
https://nvd.nist.gov/vuln/detail/CVE-2018-0253
https://nvd.nist.gov/vuln/detail/CVE-2018-0300
https://nvd.nist.gov/vuln/detail/CVE-2018-0256
The data that I wish to collect is the "Current Description" and the value for "CVSS v3.0 Base Score".
There are always multiple links which I have to take the data from, but they are all very similar; the only difference is which CVE-****-**** it is.
Currently, Excel puts the actual links to the webpages into a list.
Is there a way that I can create a macro which automatically goes through the list of links, takes the "Current Description" and "CVSS v3.0 Base Score" data from the websites, and puts them into cells in Excel?
Thank you for any help / suggestions / tips.

This is how I would approach it. Obviously you can tweak how you handle missing information. I am just giving an example using the CVEs provided. You really should provide your efforts as I have no idea what approach you are using.
Option Explicit
Public Sub GetInfo()
    Const BASE_URL As String = "https://nvd.nist.gov/vuln/detail/"
    Dim i As Long, CVES(), tempArr()
    CVES = Array("CVE-2018-0253", "CVE-2018-0300", "CVE-2018-0256")
    For i = LBound(CVES) To UBound(CVES)
        ' tempArr(0) = description, tempArr(1) = CVSS v3.0 base score
        tempArr = GetDescriptionAndScore(BASE_URL & CVES(i))
        Debug.Print "Desc: " & tempArr(0)
        Debug.Print "Score: " & tempArr(1)
        Erase tempArr
    Next i
End Sub
Public Function GetDescriptionAndScore(ByVal url As String) As Variant
    Dim html As New HTMLDocument, arr(1), sResponse As String
    ' Fetch the page synchronously
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    ' Drop anything that precedes the doctype declaration
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    With html
        .body.innerHTML = sResponse
        arr(0) = .querySelector("[data-testid=vuln-description]").innerText
        ' Pages still awaiting analysis have no score element
        If InStr(1, sResponse, "This vulnerability is currently awaiting analysis") > 0 Then
            arr(1) = "Not available"
        Else
            arr(1) = .querySelector("[data-testid=vuln-cvssv3-base-score]").innerText
        End If
    End With
    GetDescriptionAndScore = arr
End Function
References (via VBE > Tools > References):
Microsoft HTML Object Library
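To go from the hard-coded array to the list of links described in the question, a loop over the worksheet could look like the sketch below. It is only an illustration: it assumes the full URLs sit as plain text in column A of the active sheet starting at row 1, that the results should go into columns B and C, and that GetDescriptionAndScore from the code above is in the same project. The name FillFromLinks is made up for this example.

```vba
' Sketch only. Assumptions (not from the question): links are plain-text URLs
' in column A of the active sheet; results are written to columns B and C.
Public Sub FillFromLinks()
    Dim ws As Worksheet, lastRow As Long, r As Long
    Dim result As Variant
    Set ws = ActiveSheet
    ' Last used row in column A
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row
    For r = 1 To lastRow
        ' Skip blank cells
        If Len(Trim$(CStr(ws.Cells(r, "A").Value))) > 0 Then
            result = GetDescriptionAndScore(CStr(ws.Cells(r, "A").Value))
            ws.Cells(r, "B").Value = result(0)   ' Current Description
            ws.Cells(r, "C").Value = result(1)   ' CVSS v3.0 base score
        End If
    Next r
End Sub
```

Each row triggers one synchronous request, so a long list will take a while to finish.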

Related

Scraping a table for data

I need some help to download the stock table located in this URL:
I've tried the code below to at least grab the first line, but what the inspector shows as:
<a target="_blank" href="/equities/apple-computer-inc" title="Apple Inc">Apple</a>
I can only see:
A title={fullName} href="about:{pairLink}" target=_blank>{pairName}
This is the code I've put together:
Sub table()
Dim XMLReq As New MSXML2.XMLHTTP60
Dim HTMLDoc As New MSHTML.HTMLDocument
Dim Tables As MSHTML.IHTMLElementCollection
Dim table As MSHTML.IHTMLElement
Dim TableRow As MSHTML.IHTMLElement
XMLReq.Open "GET", "https://es.investing.com/stock-screener/?sp=country::5|sector::a|industry::a|equityType::a|exchange::a|eq_market_cap::110630000,1990000000000%3Ceq_market_cap;2", False
XMLReq.send
If XMLReq.Status <> 200 Then
MsgBox "problem" & vbNewLine & XMLReq.Status & "- " & XMLReq.statusText
Exit Sub
End If
HTMLDoc.body.innerHTML = XMLReq.responseText
Set Tables = HTMLDoc.getElementsByTagName("Table")
For Each table In Tables
If table.className = "displayNone genTbl openTbl resultsStockScreenerTbl elpTbl " Then
For Each TableRow In table.getElementsByTagName("td")
Debug.Print TableRow.innerHTML
Next
End If
Next table
End Sub
Any help will be appreciated.
It looks like the actual data that fills the table is pulled from JSON from another request that some javascript or something runs on the page.
This might make it easier to parse the response with a JSON parser, but it might be difficult to compose the correct request to get the data you want. The owners of the website might not want you to do this, so they might not make it easy.
It looks like a POST request with a bunch of parameters and also a cookie sent along. So basically you would need to re-create this POST request by adding all of the correct parameters and the correct cookie in the header. I would get a web debugging program like Fiddler to look and see what is going on.
I was going to also suggest you check and see if that website provides an API but it looks like it doesn't?
EDIT:
I was actually able to get the JSON with the data you want by pretty much just copying the request used on the site:
Sub getdata()
Dim XMLReq As New MSXML2.XMLHTTP60
XMLReq.Open "POST", "https://es.investing.com/stock-screener/Service/SearchStocks", False
XMLReq.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
XMLReq.setRequestHeader "Accept", "application/json"
XMLReq.setRequestHeader "X-Requested-With", "XMLHttpRequest"
XMLReq.send "country%5B%5D=5&exchange%5B%5D=95&exchange%5B%5D=2&exchange%5B%5D=1&sector=5%2C12%2C3%2C8%2C9%2C1%2C7%2C6%2C2%2C11%2C4%2C10&industry=74%2C56%2C73%2C29%2C25%2C4%2C47%2C12%2C8%2C44%2C52%2C45%2C71%2C99%2C65%2C70%2C98%2C40%2C39%2C42%2C92%2C101%2C6%2C30%2C59%2C77%2C100%2C9%2C50%2C46%2C88%2C94%2C62%2C75%2C14%2C51%2C93%2C96%2C34%2C55%2C57%2C76%2C66%2C5%2C3%2C41%2C87%2C67%2C85%2C16%2C90%2C53%2C32%2C27%2C48%2C24%2C20%2C54%2C33%2C19%2C95%2C18%2C22%2C60%2C17%2C11%2C35%2C31%2C43%2C97%2C81%2C69%2C102%2C72%2C36%2C78%2C10%2C86%2C7%2C21%2C2%2C13%2C84%2C1%2C23%2C79%2C58%2C49%2C38%2C89%2C63%2C64%2C80%2C37%2C28%2C82%2C91%2C61%2C26%2C15%2C83%2C68&equityType=ORD%2CDRC%2CPreferred%2CUnit%2CClosedEnd%2CREIT%2CELKS%2COpenEnd%2CRight%2CParticipationShare%2CCapitalSecurity%2CPerpetualCapitalSecurity%2CGuaranteeCertificate%2CIGC%2CWarrant%2CSeniorNote%2CDebenture%2CETF%2CADR%2CETC%2CETN&eq_market_cap%5Bmin%5D=110630000&eq_market_cap%5Bmax%5D=1990000000000&pn=1&order%5Bcol%5D=eq_market_cap&order%5Bdir%5D=d"
If XMLReq.Status <> 200 Then
MsgBox "problem" & vbNewLine & XMLReq.Status & "- " & XMLReq.statusText
Exit Sub
End If
Debug.Print XMLReq.responseText
End Sub
So now you will just need to figure out how to parse the JSON response.
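As a rough starting point for that last step, the sketch below pulls every string value stored under a given key out of a JSON text with a regular expression. This is a shortcut, not a real JSON parser: it ignores nesting and does not handle escaped quotes inside values, so a dedicated parser is the more robust route. The function name JsonStringValues and the key "name" used below are made up for illustration; the actual key names in the site's response are not shown here and have to be read from the response itself.

```vba
' Sketch only: returns every string value stored under the given key.
' Simplification: escaped quotes inside values are not handled.
Public Function JsonStringValues(ByVal json As String, ByVal key As String) As Collection
    Dim re As Object, m As Object
    Dim results As New Collection
    Set re = CreateObject("VBScript.RegExp")
    re.Global = True
    ' Matches  "key" : "value"  and captures value
    re.Pattern = """" & key & """\s*:\s*""([^""]*)"""
    For Each m In re.Execute(json)
        results.Add m.SubMatches(0)
    Next m
    Set JsonStringValues = results
End Function
```

With a hypothetical key called "name", JsonStringValues(XMLReq.responseText, "name") would return a Collection that can be walked with For Each.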

Extract list of all input boxes on webpage vba

I want to create a list on Excel of all the labels of input boxes on a webpage- so I imagine the code would be something like:
Sub IEInteract()
Dim i As Long
Dim URL As String
Dim IE As Object
Dim objCollection As Object
Set IE = CreateObject("InternetExplorer.Application")
IE.Visible = True
URL = "mywebsite.com"
IE.Navigate URL
Do While IE.ReadyState = 4: DoEvents: Loop
Do Until IE.ReadyState = 4: DoEvents: Loop
objCollection = IE.Document.getElementsByTagName("input")
For Each el In objCollection
label = el.label 'or something like that????'
Debug.Print label
Next el
End Sub
Where am I going wrong? Thanks
BTW My VBA is OK, but my HTML is non-existent.
For learning purposes maybe choose a website that has more obvious inputboxes, rather than dropdowns.
Many inputboxes won't be pre-populated so maybe consider reading other properties of the retrieved elements. Or even writing to them and then retrieving those values.
Selecting by tag name can bring back a host of items that you might not have expected.
Bearing all of the above in mind, try running the following, which generates a collection of <input> tag elements.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As MSHTML.IHTMLElementCollection, iBox As MSHTML.IHTMLElement, i As Long
Set inputBoxes = html.getElementsByTagName("input") '<== the collection of input tags on the page
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For Each iBox In inputBoxes
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & iBox.ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & iBox.className,
Debug.Print "Title: " & iBox.Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
i = i + 1
Next iBox
End Sub
From the above, it looks like class name might be in some ways more informative if you are looking to target boxes to input information into.
An initial inspection of the page source, selecting an inputbox and right-click > inspect... will help you refine your choices.
I noticed that a lot of the boxes of interest had the input tag with type="text".
This means you can target elements matching this pattern using CSS selectors, in this case the selector input[type="text"] (written as input[type=""text""] inside a VBA string literal).
Adjusting the former code to factor this in gives a smaller set of more targeted results. Note that .querySelectorAll, which applies the CSS selector, returns a NodeList object, and this requires a different method of iteration: an indexed loop over .Length and .Item. A For Each loop will cause Excel to crash, as described here.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As Object, i As Long
Set inputBoxes = html.querySelectorAll("input[type=""text""]") '<== the collection of text input boxes on page. Returned as a NodeList
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For i = 0 To inputBoxes.Length - 1
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & inputBoxes.Item(i).ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & inputBoxes.Item(i).className,
Debug.Print "Title: " & inputBoxes.Item(i).Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
Next i
End Sub
References added via VBE > Tools > References: Microsoft HTML Object Library and Microsoft XML, v6.0.
The XML reference is version specific: XMLHTTP60 is for XML 6.0, and you will need to rewrite it to target your version if you are not using Excel 2016.
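Since the original question asked for the labels of the input boxes, the same indexed loop can be pointed at <label> elements instead. The sketch below is an illustration under assumptions: it assumes the page marks up its captions with <label for="..."> tags (many pages do not), and it reads the htmlFor property that MSHTML exposes on label elements to show which input each label belongs to. The name PrintLabels is made up for this example.

```vba
Public Sub PrintLabels()
    ' Same references as above: Microsoft XML, v6.0 and Microsoft HTML Object Library
    Dim http As New XMLHTTP60
    Dim html As New HTMLDocument
    Dim labels As Object, i As Long
    With http
        .Open "GET", "https://www.mrexcel.com/forum/register.php", False
        .send
        html.body.innerHTML = .responseText
    End With
    ' Assumption: captions are <label> elements; returned as a NodeList
    Set labels = html.querySelectorAll("label")
    For i = 0 To labels.Length - 1
        ' htmlFor holds the ID of the input the label is attached to
        Debug.Print "For: " & labels.Item(i).htmlFor, "Text: " & Trim$(labels.Item(i).innerText)
    Next i
End Sub
```

If nothing is printed, the page probably labels its inputs some other way, for example with placeholder attributes or plain table cells.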

Web-scraping across multipages without even knowing the last page number

I wrote some code to crawl the titles of tutorials spread across several pages of a site, and it works flawlessly. Rather than depending on the last page number in the URL, the code keeps requesting pages until http.Status <> 200. However, when I tried the same approach on another URL to see whether it stops automatically, the code fetched all the results but never stopped. What is the workaround so that the macro stops once it is done? Here is the working version:
Sub WiseOwl()
Const mlink = "http://www.wiseowl.co.uk/videos/default"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object
Do While True
y = y + 1
With http
.Open "GET", mlink & "-" & y & ".htm", False
.send
If .Status <> 200 Then
MsgBox "It's done"
Exit Sub
End If
html.body.innerHTML = .responseText
End With
For Each post In html.getElementsByClassName("woVideoListDefaultSeriesTitle")
With post.getElementsByTagName("a")
x = x + 1
If .Length Then Cells(x, 1) = .item(0).innerText
End With
Next post
Loop
End Sub
I found a way to handle this for Yellowpage. My updated script is able to parse Yellowpage but stops before scraping the last page because there is no "Next Page" button there. This is what I tried:
"https://www.dropbox.com/s/iptqm79b0byw3dz/Yellowpage.txt?dl=0"
However, the same logic does not work when I apply it to the torrent site:
"https://www.yify-torrent.org/genres/western/p-1/"
You can always rely on whether the elements exist or not. Here, for example, if you try to use the object variable you set your element to when nothing was found, you will get:
Run-time error '91': Object variable or With block variable not set
This is the signal you should look for to end your loop. Please see the example below:
Sub yify()
Const mlink = "https://www.yify-torrent.org/genres/western/p-"
Dim http As New XMLHTTP60, html As New HTMLDocument
Dim post As Object
Dim posts As Object
y = 1
Do
With http
.Open "GET", mlink & y & "/", False
.send
html.body.innerHTML = .responseText
End With
Set posts = html.getElementsByClassName("mv")
On Error GoTo Endofpage
Debug.Print Len(posts) 'to force Error 91
For Each post In posts
With post.getElementsByTagName("div")
x = x + 1
If .Length Then Cells(x, 1) = .Item(0).innerText
End With
Next post
y = y + 1
Endofpage:
Loop Until Err.Number = 91
Debug.Print "It's over"
End Sub
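An alternative that avoids forcing a run-time error is to test the collection itself and leave the loop when it comes back empty. The sketch below is only an illustration: it assumes that pages past the last one either return a non-200 status or contain no elements with class "mv", which has not been checked against the site. If the site repeats its last page for out-of-range page numbers, a different stop condition (or a page cap) would be needed. The name ScrapeUntilEmpty is made up for this example.

```vba
Sub ScrapeUntilEmpty()
    ' Sketch only: assumes out-of-range pages have no elements of class "mv"
    Const mlink As String = "https://www.yify-torrent.org/genres/western/p-"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim posts As Object, post As Object
    Dim pageNo As Long, rowNo As Long
    pageNo = 1
    Do
        With http
            .Open "GET", mlink & pageNo & "/", False
            .send
            If .Status <> 200 Then Exit Do      ' server says there is no such page
            html.body.innerHTML = .responseText
        End With
        Set posts = html.getElementsByClassName("mv")
        If posts.Length = 0 Then Exit Do        ' nothing left to scrape
        For Each post In posts
            With post.getElementsByTagName("div")
                If .Length Then
                    rowNo = rowNo + 1
                    Cells(rowNo, 1) = .Item(0).innerText
                End If
            End With
        Next post
        pageNo = pageNo + 1
    Loop
    Debug.Print "It's over"
End Sub
```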

url checker VBA, when redirected, show redirected url

I'm quite new to Excel VBA and I'm stuck trying to create a macro that shows whether a URL is still active (200 OK)
or has been redirected and, if so, to what URL. When the URL isn't working at all, it should return the status code with the reason the URL isn't working.
At the moment I have a script that works, but it doesn't return the URL that a URL is redirected to.
It only returns (200 OK) when a URL is still active, or when the URL it has been redirected to is still active. So I know which URLs are dead or are redirected to a dead URL.
But I want to take it a step further.
The URLs that I want to check are in column A and the results are returned in column B. I want to see the redirect target in column C whenever a URL has been redirected.
I did find some functions online that should do the job, but for some reason I can't fit them into my Sub. As I mentioned, this is all quite new to me.
This is what I have at the moment:
Sub CheckHyperlinks()
Dim oColumn As Range
Set oColumn = GetColumn() '' replace this with code to get the relevant column
Dim oCell As Range
For Each oCell In oColumn.Cells
If oCell.Hyperlinks.Count > 0 Then
Dim oHyperlink As Hyperlink
Set oHyperlink = oCell.Hyperlinks(1) '' I assume only 1 hyperlink per cell
Dim strResult As String
strResult = GetResult(oHyperlink.Address)
oCell.Offset(0, 1).Value = strResult
End If
Next oCell
End Sub
Private Function GetResult(ByVal strUrl As String) As String
On Error GoTo ErrorHandler
Dim oHttp As New MSXML2.ServerXMLHTTP30
oHttp.Open "HEAD", strUrl, False
oHttp.send
GetResult = oHttp.Status & " " & oHttp.statusText
Exit Function
ErrorHandler:
GetResult = "Error: " & Err.Description
End Function
I hope one of you could help me out.
It's better to use the WinHttp COM object, which lets you disable redirect handling. Read this forum post.
The component you need to reference is Microsoft WinHTTP Services, version 5.1.
Public Function GetResult(ByVal strUrl As String, Optional ByRef isRedirect As Boolean, Optional ByRef target As String) As String
    Dim oHttp As New WinHttp.WinHttpRequest
    ' Do not follow redirects, so the 3xx status and Location header stay visible
    oHttp.Option(WinHttpRequestOption_EnableRedirects) = False
    oHttp.Open "HEAD", strUrl, False
    oHttp.send
    GetResult = oHttp.Status & " " & oHttp.statusText
    If oHttp.Status = 301 Or oHttp.Status = 302 Then
        isRedirect = True
        target = oHttp.getResponseHeader("Location")
    Else
        isRedirect = False
        target = vbNullString ' a String cannot be set to Nothing
    End If
End Function
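To get the redirect target into column C as the question asks, the calling Sub can pass two extra variables to GetResult and write them out. The sketch below is an illustration only: the range A1:A100 stands in for the question's GetColumn() placeholder, and the name CheckHyperlinksWithRedirects is made up. Note that GetResult above has no error handler, so an unreachable host will raise a run-time error unless one is added.

```vba
Sub CheckHyperlinksWithRedirects()
    Dim oColumn As Range, oCell As Range
    Dim isRedirect As Boolean, target As String
    ' Assumption: the hyperlinks to check are in A1:A100 of the active sheet
    Set oColumn = ActiveSheet.Range("A1:A100")
    For Each oCell In oColumn.Cells
        If oCell.Hyperlinks.Count > 0 Then
            ' Column B: status line, e.g. "200 OK"
            oCell.Offset(0, 1).Value = GetResult(oCell.Hyperlinks(1).Address, isRedirect, target)
            ' Column C: redirect target, only when the URL was redirected
            If isRedirect Then
                oCell.Offset(0, 2).Value = target
            Else
                oCell.Offset(0, 2).ClearContents
            End If
        End If
    Next oCell
End Sub
```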

VBA extract and parse data from website to Word

I'm trying to extract some data from here: http://www.hnb.hr/tecajn/f140215.dat
This is the exchange rate list from the Croatian National Bank. The file name "f140215.dat" is basically a date, formatted in the following order:
"f" "DDMMYY" ".dat"
I intend to have the data organized in a Word table, which contains the following cells:
Cell#1 where a user would manually input a date in the following format: "MMM DD, YYYY"
Cell#2 where a user would manually input the requested currency code name (USD, GBP, etc)
Cell#3 where the extracted exchange rate should appear for the specified date and currency.
Underneath the table there is an "UPDATE" button that updates the Cell#3 information. The script I'm asking for should be connected to that button.
After clicking the button, I'd like the script to do the following:
Figure out which page to go to based on the date inputted in Cell#1. For example, if Cell#1 contains "February 14, 2015", the script should point to "http://www.hnb.hr/tecajn/f140215.dat"
On that page, grab the middle value for the currency specified in Cell#2. For example, if Cell#2 contains "USD", the script should extract "6,766508" which is the middle value for "840USD001". Only the middle value is relevant.
Write this value to Cell#3.
So to sum it up, based on the criteria specified in the two table cells, the script needs to identify which page to go to and what data to extract from it, and with that data populate the third cell.
Hope I explained it well enough. This is only a part of the whole invoice generator I'm building. So far I've gotten everything to work, but this I really don't even know how to start. I can send the whole thing if needed, but figured it's not exactly relevant.
EDIT:
I watched some tutorials and played around, and this is what I got so far.
Enum READYSTATE
READYSTATE_UNINITIALIZED = 0
READYSTATE_LOADING = 1
READYSTATE_LOADED = 2
READYSTATE_INTERACTIVE = 3
READYSTATE_COMPLETE = 4
End Enum
Sub Test()
Dim ie As New InternetExplorer
ie.Visible = False
ie.navigate "http://www.hnb.hr/tecajn/f140215.dat"
Do While ie.READYSTATE <> READYSTATE_COMPLETE
DoEvents
Loop
Dim html As HTMLDocument
Set html = ie.document
MsgBox html.DocumentElement.innerText
End Sub
I know it's not much, but like I said, I'm new at this. I was able to get the data into the message box, but I have no idea how to parse it, and without that I can't really do anything mentioned above. What now?
EDIT 2:
Alright!! Made some progress! I've managed to parse it by using the split function:
Sub Test()
Dim ie As New InternetExplorer
ie.Visible = False
ie.navigate "http://www.hnb.hr/tecajn/f140215.dat"
Do While ie.READYSTATE <> READYSTATE_COMPLETE
DoEvents
Loop
Dim html As HTMLDocument
Set html = ie.document
Dim getData As String
getData = html.DocumentElement.innerText
'replaced all the space fields with line breaks
Dim repData As String
repData = Replace(getData, " ", vbCrLf)
'used line breaks as separators
Dim splData As Variant
splData = Split(repData, vbCrLf)
MsgBox splData(1)
MsgBox splData(2)
MsgBox splData(3)
End Sub
Right now it displays the parsed data in message boxes. The rest should be easy!
Addendum from OP's comment:
This is a part of the continued code:
Dim cur As String
cur = ActiveDocument.SelectContentControlsByTitle("valCombo").Item(1).Range.Text
If cur = "USD" Then
ActiveDocument.Tables(1).Cell(7, 3).Range.Text = splData(40) & " HRK"
End If
If cur = "EUR" Then
ActiveDocument.Tables(1).Cell(7, 3).Range.Text = splData(20) & " HRK"
End If
This way it works, but I'd like to set ActiveDocument.Tables(1).Cell(7, 3).Range.Text as a string. However, once I do that, it doesn't do anything. Why is that?
This should help you with the first half of your project; that being the retrieval of the data. As I mentioned in my earlier comment, data retrieval such as this is better suited to an MSXML2.ServerXMLHTTP type of object.
You will have to go into the VBE's Tools ► References and add Microsoft XML v6.0.
Sub scrape_CNB()
Dim u As String, dtDATE As Date, xmlHTTP As MSXML2.ServerXMLHTTP60
Dim sTMP As String, sCURR As String
Dim i As Long, j As Long, vLINE As Variant, vRATE As Variant
On Error GoTo CleanUp
Set xmlHTTP = New MSXML2.ServerXMLHTTP60
sCURR = "USD"
dtDATE = CDate("February 14, 2015")
With xmlHTTP
u = "http://www.hnb.hr/tecajn/f" & Format(dtDATE, "ddmmyy") & ".dat"
.Open "GET", u, False
.setRequestHeader "Content-Type", "text/xml"
.send
If .Status <> 200 Then GoTo CleanUp
sTMP = .responseText
vLINE = Split(sTMP, Chr(13) & Chr(10))
For i = LBound(vLINE) To UBound(vLINE)
If CBool(InStr(1, vLINE(i), sCURR, vbTextCompare)) Then
Do While CBool(InStr(1, vLINE(i), Chr(32) & Chr(32))): vLINE(i) = Replace(vLINE(i), Chr(32) & Chr(32), Chr(32)): Loop
vRATE = Split(vLINE(i), Chr(32))
For j = LBound(vRATE) To UBound(vRATE)
MsgBox j & ": " & vRATE(j)
Next j
Exit For
End If
Next i
End With
CleanUp:
Set xmlHTTP = Nothing
End Sub
Since you are not initiating a full InternetExplorer object, this should be much quicker, and the .responseText that is returned is raw text, not HTML.
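The loop above shows every field of the matching line in a message box. To return just the middle value, the same collapse-and-split step can be wrapped in a small function. This is a sketch under an assumption that is not confirmed here: that each line holds the currency code followed by the buying, middle and selling rates, separated by one or more spaces. The name MiddleRateFromLine is made up for this example.

```vba
' Sketch only. Assumed line layout: code, buying, middle, selling,
' separated by one or more spaces.
Public Function MiddleRateFromLine(ByVal rateLine As String) As String
    Dim parts() As String
    rateLine = Trim$(rateLine)
    ' Collapse runs of spaces into single spaces
    Do While InStr(1, rateLine, "  ") > 0
        rateLine = Replace(rateLine, "  ", " ")
    Loop
    parts = Split(rateLine, " ")
    ' Third field is the middle rate; return an empty string if it is missing
    If UBound(parts) >= 2 Then MiddleRateFromLine = parts(2)
End Function
```

For example, MiddleRateFromLine("840USD001   6,700000   6,766508   6,800000") returns "6,766508"; the first and last rates in that sample line are placeholders, not real data. Inside the loop it would be called as MiddleRateFromLine(vLINE(i)).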
TBH, I find the cursor position based VBA programming within Word to be hard to deal with; preferring the one-to-one explicitly defined relationship(s) with an Excel worksheet. You may want to consider using Excel as a data repository and merging with Word to provide your invoice output.
Addendum:
Dim cur As String, t As Long, r As Long, c As Long
cur = ActiveDocument.SelectContentControlsByTitle("valCombo").Item(1).Range.Text
t = 1: r = 7: c = 3
Select Case cur
Case "USD"
ActiveDocument.Tables(t).Cell(r, c).Range.Text = splData(40) & " HRK"
Case "EUR"
ActiveDocument.Tables(t).Cell(r, c).Range.Text = splData(20) & " HRK"
End Select
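On the question of why storing ActiveDocument.Tables(1).Cell(7, 3).Range.Text in a String "doesn't do anything": a String variable only receives a copy of the cell's text, so assigning to that variable later never touches the document. To shorten the code and still write to the cell, hold the cell's Range in an object variable instead. The sketch below illustrates this with a placeholder rate value; the name WriteRateToCell is made up for this example.

```vba
Sub WriteRateToCell()
    ' Sketch only: keep the target cell as a Range object, not as a String
    Dim target As Range
    Dim rate As String
    rate = "6,766508"   ' placeholder value for illustration
    Set target = ActiveDocument.Tables(1).Cell(7, 3).Range
    ' Writing through the Range object updates the document
    target.Text = rate & " HRK"
End Sub
```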