Scraping a table for data - vba

I need some help to download the stock table located in this URL:
I've tried the code below to grab at least the first row, but what the inspector shows as:
<a target="_blank" href="/equities/apple-computer-inc" title="Apple Inc">Apple</a>
I can only see:
A title={fullName} href="about:{pairLink}" target=_blank>{pairName}
This is the code I've put together:
Sub table()
    Dim XMLReq As New MSXML2.XMLHTTP60
    Dim HTMLDoc As New MSHTML.HTMLDocument
    Dim Tables As MSHTML.IHTMLElementCollection
    Dim table As MSHTML.IHTMLElement
    Dim TableRow As MSHTML.IHTMLElement
    XMLReq.Open "GET", "https://es.investing.com/stock-screener/?sp=country::5|sector::a|industry::a|equityType::a|exchange::a|eq_market_cap::110630000,1990000000000%3Ceq_market_cap;2", False
    XMLReq.send
    If XMLReq.Status <> 200 Then
        MsgBox "problem" & vbNewLine & XMLReq.Status & "- " & XMLReq.statusText
        Exit Sub
    End If
    HTMLDoc.body.innerHTML = XMLReq.responseText
    Set Tables = HTMLDoc.getElementsByTagName("Table")
    For Each table In Tables
        If table.className = "displayNone genTbl openTbl resultsStockScreenerTbl elpTbl " Then
            For Each TableRow In table.getElementsByTagName("td")
                Debug.Print TableRow.innerHTML
            Next
        End If
    Next table
End Sub
Any help will be appreciated.

It looks like the data that fills the table is not in the page's HTML. It is pulled as JSON from a separate request that JavaScript on the page makes after loading.
That could make things easier, because you can parse the response with a JSON parser, but it might be difficult to compose the correct request to get the data you want. The owners of the website might not want you to do this, so they might not make it easy.
It looks like a POST request with a bunch of parameters and a cookie sent along. So you would need to re-create this POST request by adding all of the correct parameters and the correct cookie in the header. I would use a web debugging program like Fiddler to see what is going on.
I was also going to suggest you check whether the website provides an API, but it looks like it doesn't.
EDIT:
I was actually able to get the JSON with the data you want by pretty much just copying the request used on the site:
Sub getdata()
    Dim XMLReq As New MSXML2.XMLHTTP60
    XMLReq.Open "POST", "https://es.investing.com/stock-screener/Service/SearchStocks", False
    XMLReq.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
    XMLReq.setRequestHeader "Accept", "application/json"
    XMLReq.setRequestHeader "X-Requested-With", "XMLHttpRequest"
XMLReq.send "country%5B%5D=5&exchange%5B%5D=95&exchange%5B%5D=2&exchange%5B%5D=1&sector=5%2C12%2C3%2C8%2C9%2C1%2C7%2C6%2C2%2C11%2C4%2C10&industry=74%2C56%2C73%2C29%2C25%2C4%2C47%2C12%2C8%2C44%2C52%2C45%2C71%2C99%2C65%2C70%2C98%2C40%2C39%2C42%2C92%2C101%2C6%2C30%2C59%2C77%2C100%2C9%2C50%2C46%2C88%2C94%2C62%2C75%2C14%2C51%2C93%2C96%2C34%2C55%2C57%2C76%2C66%2C5%2C3%2C41%2C87%2C67%2C85%2C16%2C90%2C53%2C32%2C27%2C48%2C24%2C20%2C54%2C33%2C19%2C95%2C18%2C22%2C60%2C17%2C11%2C35%2C31%2C43%2C97%2C81%2C69%2C102%2C72%2C36%2C78%2C10%2C86%2C7%2C21%2C2%2C13%2C84%2C1%2C23%2C79%2C58%2C49%2C38%2C89%2C63%2C64%2C80%2C37%2C28%2C82%2C91%2C61%2C26%2C15%2C83%2C68&equityType=ORD%2CDRC%2CPreferred%2CUnit%2CClosedEnd%2CREIT%2CELKS%2COpenEnd%2CRight%2CParticipationShare%2CCapitalSecurity%2CPerpetualCapitalSecurity%2CGuaranteeCertificate%2CIGC%2CWarrant%2CSeniorNote%2CDebenture%2CETF%2CADR%2CETC%2CETN&eq_market_cap%5Bmin%5D=110630000&eq_market_cap%5Bmax%5D=1990000000000&pn=1&order%5Bcol%5D=eq_market_cap&order%5Bdir%5D=d"
    If XMLReq.Status <> 200 Then
        MsgBox "problem" & vbNewLine & XMLReq.Status & "- " & XMLReq.statusText
        Exit Sub
    End If
    Debug.Print XMLReq.responseText
End Sub
So now you will just need to figure out how to parse the JSON response.
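One lightweight way to pull values out of the response without adding a parser is a regular expression over the JSON text. The sketch below is only an illustration: the function name JsonStringValues, the sample payload, and the key "name" are hypothetical, not the real field names of the site's response. It assumes the key contains no regex metacharacters and does not decode escapes such as \u00e9. For nested or irregular data a full parser (for example the third-party VBA-JSON module) is likely the safer choice.

```vba
Option Explicit

' Collects every string value stored under the given key in a JSON text.
' Pattern-based, so it only suits flat, predictable payloads.
Public Function JsonStringValues(ByVal json As String, ByVal key As String) As Collection
    Dim re As Object, m As Object
    Dim result As New Collection

    Set re = CreateObject("VBScript.RegExp")   ' late bound, no reference needed
    re.Global = True
    ' Matches "key" : "value", allowing backslash-escaped characters in the value
    re.Pattern = """" & key & """\s*:\s*""((?:[^""\\]|\\.)*)"""

    For Each m In re.Execute(json)
        result.Add m.SubMatches(0)
    Next m
    Set JsonStringValues = result
End Function

Public Sub DemoJsonStringValues()
    ' Hypothetical sample, NOT the real response of the site
    Const SAMPLE As String = "{""hits"":[{""name"":""Apple""},{""name"":""Microsoft""}]}"
    Dim v As Variant
    For Each v In JsonStringValues(SAMPLE, "name")
        Debug.Print v
    Next v
End Sub
```

In getdata you would pass XMLReq.responseText instead of the sample string, with whichever key you find in the printed response.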

Related

Automatically collect data from multiple webpages for excel

I've been trying to automate a task but recently ran into an issue. Part of the task is to copy and paste 2 pieces of information from a website into an Excel document.
Here are some examples of what the webpages look like:
https://nvd.nist.gov/vuln/detail/CVE-2018-0253
https://nvd.nist.gov/vuln/detail/CVE-2018-0300
https://nvd.nist.gov/vuln/detail/CVE-2018-0256
The data that I wish to collect is the "Current Description" and the value for "CVSS v3.0 Base Score".
There are always multiple links which I have to take the data from, but they are all very similar, the only difference being which CVE-****-**** it is.
Currently I have it so that Excel puts the actual links to the webpages into a list.
Is there a way to create a macro which automatically goes through the list of links, takes the "Current Description" and "CVSS v3.0 Base Score" data from the websites, and puts them into cells in Excel?
Thank you for any help / suggestions / tips.
This is how I would approach it. Obviously you can tweak how you handle missing information. I am just giving an example using the CVEs provided. You really should provide your efforts as I have no idea what approach you are using.
Option Explicit
Public Sub GetInfo()
    Const BASE_URL As String = "https://nvd.nist.gov/vuln/detail/"
    Dim i As Long, CVES(), tempArr()
    CVES = Array("CVE-2018-0253", "CVE-2018-0300", "CVE-2018-0256")
    For i = LBound(CVES) To UBound(CVES)
        tempArr = GetDescriptionAndScore(BASE_URL & CVES(i))
        Debug.Print "Desc: " & tempArr(0)
        Debug.Print "Score: " & tempArr(1)
        Erase tempArr
    Next i
End Sub
Public Function GetDescriptionAndScore(ByVal url As String) As Variant
    Dim html As New HTMLDocument, arr(1), sResponse As String
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", url, False
        .send
        sResponse = StrConv(.responseBody, vbUnicode)
    End With
    ' Drop anything that precedes the doctype declaration
    sResponse = Mid$(sResponse, InStr(1, sResponse, "<!DOCTYPE "))
    With html
        .body.innerHTML = sResponse
        arr(0) = .querySelector("[data-testid=vuln-description]").innerText
        If InStr(1, sResponse, "This vulnerability is currently awaiting analysis") > 0 Then
            arr(1) = "Not available"
        Else
            arr(1) = .querySelector("[data-testid=vuln-cvssv3-base-score]").innerText
        End If
    End With
    GetDescriptionAndScore = arr
End Function
References (via VBE > Tools > References):
Microsoft HTML Object Library
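Since the question keeps the links in a worksheet list rather than a hard-coded array, the same function can be driven from the sheet. This is a minimal sketch under assumptions that are not in the original: the Sub name FillFromLinks is hypothetical, the full URLs are assumed to sit in column A of the active sheet starting at row 2, and the results are written to columns B and C.

```vba
Option Explicit

' Walks the URLs in column A and writes description and score next to each.
' Assumes (hypothetically) a header in row 1 and full URLs from A2 downward.
Public Sub FillFromLinks()
    Dim ws As Worksheet
    Dim lastRow As Long, r As Long
    Dim info As Variant

    Set ws = ActiveSheet
    lastRow = ws.Cells(ws.Rows.Count, "A").End(xlUp).Row

    For r = 2 To lastRow
        If Len(ws.Cells(r, "A").Value) > 0 Then
            ' GetDescriptionAndScore is the function from the answer above
            info = GetDescriptionAndScore(CStr(ws.Cells(r, "A").Value))
            ws.Cells(r, "B").Value = info(0)   ' Current Description
            ws.Cells(r, "C").Value = info(1)   ' CVSS v3.0 Base Score
        End If
    Next r
End Sub
```

Adjust the columns and the starting row to match where your list actually lives.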

VBA MSXML2.XMLHTTP60 with Yahoo finance

I am pretty new to VBA. I am trying to scrape https://finance.yahoo.com/screener/predefined/Aluminum and get the sector of each ticker in the list. The complex part (at least for me) is removing the "price" and "exchange" filters and sending the request. I am using the following code, but I can't connect properly. Any suggestions? Many thanks.
Sub connect_to_web()
    Dim XMLPage As New MSXML2.XMLHTTP60
    Dim url As String
    Dim query As String
    url = "https://query1.finance.yahoo.com/v1/finance/screener?lang=en-US&region=US&formatted=true&corsDomain=finance.yahoo.com"
    XMLPage.Open "Post", url, False
    XMLPage.setRequestHeader "Content-Type", "application/x-www-form-urlencoded; charset=UTF-8"
    XMLPage.send ""
    Debug.Print XMLPage.Status
    XMLPage.send ""
    Debug.Print XMLPage.Status
End Sub

401 while making API call from Excel VBA

We have an internal website that also supports an API. When I access an API URI in the browser I get a JSON response, but if I make the same call from Excel I get 401, invalid credentials. Please find my VBA code below.
Dim Serv2 As New WinHttpRequest
Sub Conn()
    Url = "https://uri/api/nag/"
    With Serv2
        .Open "GET", Url, False
        .Send
        MsgBox .Status & " " & .ResponseText
    End With
    Set Serv2 = Nothing
End Sub
Note: The url provided is sample data.
Many Thanks in advance !!!

Web-scraping across multipages without even knowing the last page number

My code crawls the titles of different tutorials spread across several pages of a site, and it works flawlessly. I wrote it so that it does not depend on the last page number in the URL but on the status code: it keeps going until http.Status <> 200. The code pasted below works in this case. However, trouble comes up when I use another URL to see whether it stops automatically: the code fetches all the results but does not stop. What is the workaround so that the code stops the macro when it is done? Here is the working one:
Sub WiseOwl()
    Const mlink = "http://www.wiseowl.co.uk/videos/default"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object
    Do While True
        y = y + 1
        With http
            .Open "GET", mlink & "-" & y & ".htm", False
            .send
            If .Status <> 200 Then
                MsgBox "It's done"
                Exit Sub
            End If
            html.body.innerHTML = .responseText
        End With
        For Each post In html.getElementsByClassName("woVideoListDefaultSeriesTitle")
            With post.getElementsByTagName("a")
                x = x + 1
                If .Length Then Cells(x, 1) = .item(0).innerText
            End With
        Next post
    Loop
End Sub
I found a way to get around this with Yellowpage. My updated script is able to parse Yellowpage but breaks before scraping the last page because there is no "Next Page" button. I tried with this:
"https://www.dropbox.com/s/iptqm79b0byw3dz/Yellowpage.txt?dl=0"
However, when I tried to apply the same logic to a torrent site, it didn't work:
"https://www.yify-torrent.org/genres/western/p-1/"
You can always rely on whether the elements exist. Here, for example, if you try to use the object you assigned your element to when nothing was found, you will get:
Run-time error '91': Object variable or With block variable not set
This is the signal you should be looking for to put an end to your code. Please see the example below:
Sub yify()
    Const mlink = "https://www.yify-torrent.org/genres/western/p-"
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim post As Object
    Dim posts As Object
    y = 1
    Do
        With http
            .Open "GET", mlink & y & "/", False
            .send
            html.body.innerHTML = .responseText
        End With
        Set posts = html.getElementsByClassName("mv")
        On Error GoTo Endofpage
        Debug.Print Len(posts) 'to force Error 91
        For Each post In posts
            With post.getElementsByTagName("div")
                x = x + 1
                If .Length Then Cells(x, 1) = .Item(0).innerText
            End With
        Next post
        y = y + 1
Endofpage:
    Loop Until Err.Number = 91
    Debug.Print "It's over"
End Sub
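A variation on the same idea, swapping the forced run-time error for an explicit test, is to stop as soon as a page yields no matching elements. This is only a sketch: it assumes the site answers requests past the last page with a page that has no "mv" elements (or with a non-200 status), which is not confirmed above. The Sub name and the MAX_PAGES safety cap are hypothetical additions. Like the code above, it needs references to Microsoft XML, v6.0 and Microsoft HTML Object Library.

```vba
Option Explicit

' Pages through the listing until a page has no "mv" elements.
Sub ScrapeUntilEmpty()
    Const mlink As String = "https://www.yify-torrent.org/genres/western/p-"
    Const MAX_PAGES As Long = 500   ' hypothetical cap so the loop cannot run forever
    Dim http As New XMLHTTP60, html As New HTMLDocument
    Dim posts As Object, post As Object
    Dim pageNo As Long, rowNo As Long

    For pageNo = 1 To MAX_PAGES
        http.Open "GET", mlink & pageNo & "/", False
        http.send
        If http.Status <> 200 Then Exit For       ' server says there is no such page

        html.body.innerHTML = http.responseText
        Set posts = html.getElementsByClassName("mv")
        If posts.Length = 0 Then Exit For         ' page loaded but holds no results

        For Each post In posts
            With post.getElementsByTagName("div")
                If .Length Then
                    rowNo = rowNo + 1
                    Cells(rowNo, 1) = .Item(0).innerText
                End If
            End With
        Next post
    Next pageNo
    Debug.Print "It's over"
End Sub
```

Checking Length keeps error handling free for genuine failures such as a dropped connection.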

url checker VBA, when redirected, show redirected url

I'm quite new to Excel VBA and I'm stuck finding a way to create a macro that shows whether a URL is still active (200 OK) or is redirected, and if so, to which URL. When the URL isn't working at all, it should return the right code with the reason.
At the moment I have a script that works, but it doesn't return the URL that a URL is redirected to.
It only returns 200 OK when a URL is still active, or when the URL it has been redirected to is still active. So I know which URLs are dead or are redirected to a dead URL.
But I want to take it a step further.
The URLs that I want to check are in column A and the results are returned in column B. I want to see the URL I've been redirected to in column C every time a URL has been redirected.
I did find some functions online that should do the job, but for some reason I can't fit them into my Sub. As I mentioned before, it's all quite new to me.
This is what I have at the moment:
Sub CheckHyperlinks()
    Dim oColumn As Range
    Set oColumn = GetColumn() '' replace this with code to get the relevant column
    Dim oCell As Range
    For Each oCell In oColumn.Cells
        If oCell.Hyperlinks.Count > 0 Then
            Dim oHyperlink As Hyperlink
            Set oHyperlink = oCell.Hyperlinks(1) '' I assume only 1 hyperlink per cell
            Dim strResult As String
            strResult = GetResult(oHyperlink.Address)
            oCell.Offset(0, 1).Value = strResult
        End If
    Next oCell
End Sub
Private Function GetResult(ByVal strUrl As String) As String
    On Error GoTo ErrorHandler
    Dim oHttp As New MSXML2.ServerXMLHTTP30
    oHttp.Open "HEAD", strUrl, False
    oHttp.send
    GetResult = oHttp.Status & " " & oHttp.statusText
    Exit Function
ErrorHandler:
    GetResult = "Error: " & Err.Description
End Function
I hope one of you could help me out.
It's better to use the WinHttp COM object. That will let you disable redirect handling. Read this forum post.
The component you need to reference is Microsoft WinHTTP Services.
Public Function GetResult(ByVal strUrl As String, Optional ByRef isRedirect As Boolean, Optional ByRef target As String) As String
    Dim oHttp As New WinHttp.WinHttpRequest
    ' Turn off automatic redirect handling so the 3xx response itself is returned
    oHttp.Option(WinHttpRequestOption_EnableRedirects) = False
    oHttp.Open "HEAD", strUrl, False
    oHttp.send
    GetResult = oHttp.Status & " " & oHttp.statusText
    If oHttp.Status = 301 Or oHttp.Status = 302 Then
        isRedirect = True
        ' The Location header holds the URL the server redirects to
        target = oHttp.getResponseHeader("Location")
    Else
        isRedirect = False
        target = vbNullString ' target is a String, so it cannot be assigned Nothing
    End If
End Function
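To get the redirect target into column C, the calling Sub can pass the two optional arguments and write them out. The following is a minimal sketch, assuming the hyperlinks sit in column A of the active sheet. The Sub name is hypothetical, and unlike the original GetResult in the question, the version above has no error handler, so an unreachable host would raise a run-time error here.

```vba
Option Explicit

' Writes the status to column B and, for redirects, the target URL to column C.
Sub CheckHyperlinksWithRedirects()
    Dim ws As Worksheet
    Dim oCell As Range
    Dim isRedirect As Boolean
    Dim target As String

    Set ws = ActiveSheet
    ' Assumes the URLs to check are hyperlinks in column A
    For Each oCell In ws.Range("A1", ws.Cells(ws.Rows.Count, "A").End(xlUp)).Cells
        If oCell.Hyperlinks.Count > 0 Then
            oCell.Offset(0, 1).Value = GetResult(oCell.Hyperlinks(1).Address, isRedirect, target)
            If isRedirect Then
                oCell.Offset(0, 2).Value = target
            Else
                oCell.Offset(0, 2).ClearContents   ' no redirect, leave column C empty
            End If
        End If
    Next oCell
End Sub
```

Other redirect status codes exist (303, 307 and 308); if the sites you check use them, the condition in GetResult would need to include them as well.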