Why is it not possible to access a public website? - vba

This url can not be opened with URL-copy&paste it shows only an error. I have tried in vain to get access through MSXML2.XMLHTTP, same result!
This is my code:
Sub GetDataWebsite()
Const URL = "http://Zvg-port.de/index.php"
Dim HTML As New HTMLDocument
Dim elmt As Object
Dim x As Long
With CreateObject("MSXML2.XMLHTTP")
.Open "POST", URL, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send
HTML.body.innerHTML = .responseText
End With
Set elmt = HTML.querySelectorAll("tr") 'or any class or tag or name
For x = 0 To elmt.Length - 1
ActiveSheet.Cells(x + 2, 2) = elmt.Item(x).innerText
Next x
End Sub
Nothing happens! What could be the problem? Thanks!

For your first question, you need to add additional parameters to the initial request body.
Weirdly, one has to keep working off html, rather than setting anything into derived variables, if one wants to use querySelectorAll().
For your second question, the result urls are expected to be navigated to after coming from the search page. A little testing indicates a referer header is needed. I know an request, with/without html session, with referer header will work because I tested with Python, but I haven't worked out what the missing bit is for VBA; my current attempts are returning odd encoding that also looks truncated.
Currently, the easiest way I see, if sticking with VBA, to ensure following links, would be to automate a browser, gather the results and the result links, then navigate to each link.
Current code (which answers your first question):
Option Explicit
Public Sub GetDataZvgPort()
Const URL = "https://www.zvg-portal.de/index.php?button=Suchen"
Dim html As MSHTML.HTMLDocument, xhr As Object
Set html = New MSHTML.HTMLDocument
Set xhr = CreateObject("MSXML2.ServerXMLHTTP.6.0")
Dim headers As Variant
With xhr
.Open "POST", URL, False
.setRequestHeader "Content-Type", "application/x-www-form-urlencoded"
.send "land_abk=sh&ger_name=Norderstedt&order_by=2&ger_id=X1526"
headers = .getAllResponseHeaders
html.body.innerHTML = .responseText
End With
Dim x As Long, link As String, gatheredLinks()
ReDim gatheredLinks(html.querySelectorAll("td a").Length - 1)
For x = 0 To html.querySelectorAll("table a nobr").Length - 1
ActiveSheet.Cells(x + 2, 2) = html.querySelectorAll("table a nobr").Item(x).innerText
link = Replace$(html.querySelectorAll("td a").Item(x).href, "about:", "https://www.zvg-portal.de/")
ActiveSheet.Cells(x + 2, 3) = link
Dim j As Long
For j = 0 To html.querySelectorAll("tr").Length - 1
If InStr(html.querySelectorAll("tr").Item(j).innerHTML, "Amtsgericht") > 0 Then
ActiveSheet.Cells(x + 2, 4) = html.querySelectorAll("tr").Item(j).getElementsByTagName("b")(0).innerText
Exit For
End If
Next
gatheredLinks(x) = link
Next x
' With xhr
' For x = LBound(gatheredLinks) To UBound(gatheredLinks)
' .Open "GET", gatheredLinks(x), False
' .setRequestHeader "Referer", "https://www.zvg-portal.de/index.php?button=Suchen"
' .setRequestHeader "Content-Type", "text/html; charset=ISO-8859-1"
' .setRequestHeader "User-Agent", "python-requests/2.24.0"
' .setRequestHeader "Accept-Encoding", "gzip, deflate"
' .setRequestHeader "Connection", "keep-alive"
' .setRequestHeader "Accept", "text/html,application/xhtml+xml,application/xml;"
' .send
' ActiveSheet.Cells(x + 2, 5) = .Status
' html.body.innerHTML = .responseText 'test response
' Dim s As String
' s = .responseText
' ActiveSheet.Cells(x + 2, 6) = s
' Stop
'
' 'do something else
' Next
' End With
Stop
End Sub
Python (using session) which can successfully retrieve content from result links:
import requests
from bs4 import BeautifulSoup as bs
data = {'ger_name': 'Norderstedt','order_by': '2','land_abk': 'sh','ger_id': 'X1526'}
headers = {'Referer': 'https://www.zvg-portal.de/index.php?button=Suchen'}
with requests.Session() as s:
r = s.post('https://www.zvg-portal.de/index.php?button=Suchen', data=data)
print(r.status_code)
soup = bs(r.content, 'lxml')
links = ['https://www.zvg-portal.de/' + i['href'] for i in soup.select('td a')]
s.headers = headers
for link in links:
r = s.get(link)
# print(r.status_code)
soup = bs(r.content, 'lxml')
print(soup.select_one('td p').text)
Session is NOT required. It is used just for efficiency.
Without a session, which still works, the headers sent are:
{'User-Agent': 'python-requests/2.24.0', 'Accept-Encoding': 'gzip, deflate', 'Accept': '*/*', 'Connection': 'keep-alive', 'Referer': 'https://www.zvg-portal.de/index.php?button=Suchen'}

Related

Can't get innertext from object

I am using below code, to retrive the total number of products on a given page, but get a runt-time error 428 on gg = ohtml.innertext
Sub testp()
numpage = GetNumberofPages("https://sheetmaterialswholesale.co.uk/sheet-materials/")
End Sub
Function GetNumberofPages(numpage As String)
Set xhr = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", numpage, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Dim ohtml As Object
Set ohtml = html.querySelectorAll(".noOfProducts")
Dim gg As String
gg = ohtml.innerText
gg = Replace(gg, ",", "")
nmpage = Round(Val(gg) / 36, 0) + 1
'nmpage = Val(Mid(Text, aa + 66, 5))
GetNumberofPages = nmpage
End Function
Any tips how to solve this?
The class you are using doesn't exist on the page and querySelectorAll returns a nodeList which doesn't have a .innerText property.
You can retrieve the page specific product count with the following css selector list which identifies the li elements associated with each product. Take the .Length property of the returned nodeList to get the number of products on the page
Debug.Print oHtml.querySelectorAll(".products li.product").Length
If you actually want the number of pages, calculated using the total product results count, then you can extract this total result count, as shown below, and use the results per page count as the denominator in the page count calculation.
Option Explicit
Public Sub testp()
MsgBox GetNumberOfPages("https://sheetmaterialswholesale.co.uk/sheet-materials/")
End Sub
Public Function GetNumberOfPages(ByVal numpage As String) As Long
Dim xhr As MSXML2.XMLHTTP60
Dim html As MSHTML.HTMLDocument
Set xhr = New MSXML2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With xhr
.Open "GET", numpage, False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
'Dim ohtml As Object
Dim ohtml As MSHTML.IHTMLDOMChildrenCollection
Set ohtml = html.querySelectorAll(".products li.product")
Dim numProductsPerPage As Long, numProducts As Long, numPages As Long
numProductsPerPage = ohtml.Length 'assuming page you pass into function is representative
numProducts = CInt(html.querySelector(".cat-parent .count").innerText)
numPages = WorksheetFunction.Ceiling_Math(numProducts / numProductsPerPage)
GetNumberOfPages = numPages
End Function

How to extract yahoo-finance analyst price targets by VBA?

I am trying to extract yahoo-finance analyst price targets by VBA(eg: no. of analyst, high, low, average, current)
but I can't extract any of these by .getelementsbyclassname/.getelementbyID.
Here is my code:
Sub Analysis_import()
Dim website As String
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim price As Variant
website = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
Set request = CreateObject("MSXML2.XMLHTTP")
request.Open "get", website, False
request.setRequestHeader "If-Modified-since", "Sat, 1 Jan 2000 00:00:00 GMT"
request.send
response = StrConv(request.responseBody, vbUnicode)
html.body.innerHTML = response
price = html.getElementsByClassName("Fz(m) D(ib) Td(inh)").innerText
Debug.Print price
End Sub
What is the problem? Many thanks!
The fields you wish to grab from that site generate dynamically, hence you can't fetch them using HTMLDocument parser. If you want to locate the fields using tag, id ,class e.t.c, your options would be IE or Selenium.
However, the good news is the required fields are available in some script tag within raw json content. So, you can process them using vba json converter or regex even when you stick with xmlhttp requests. The following script is based on regex.
Sub GrabAnalysisInfo()
Const Url As String = "https://finance.yahoo.com/quote/AAPL/analysis?p=AAPL"
Dim sResp$, sHigh$, currentPrice$
Dim analystNum$, sLow$, tMeanprice$
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", Url, False
.setRequestHeader "User-Agent", "Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/88.0.4324.150 Safari/537.36"
.send
sResp = .responseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = "numberOfAnalystOpinions[\s\S]+?raw"":(.*?),"
If .Execute(sResp).count > 0 Then
analystNum = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetMeanPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).count > 0 Then
tMeanprice = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetHighPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).count > 0 Then
sHigh = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "targetLowPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).count > 0 Then
sLow = .Execute(sResp)(0).SubMatches(0)
End If
.Pattern = "currentPrice[\s\S]+?raw"":(.*?),"
If .Execute(sResp).count > 0 Then
currentPrice = .Execute(sResp)(0).SubMatches(0)
End If
End With
Debug.Print analystNum, tMeanprice, sHigh, sLow, currentPrice
End Sub

JSON and getElementByID return nothing

am trying to get data from this web page to fill a word form using the following vba code
Private Sub Update_Click()
Dim CCv As ContentControl
Dim request As Object
Dim response As String
Dim html As New HTMLDocument
Dim website As String
Dim LGP As Variant
website = "https://www.ebi.ac.uk/chembl/compound_report_card/CHEMBL1112/"
Set request = CreateObject("MSXML2.XMLHTTP")
request.Open "GET", website, False
request.setRequestHeader "If-Modified-Since", "Sat, 1 Jan 2000 00:00:00 GMT"
request.send
response = StrConv(request.responseBody, vbUnicode)
html.body.innerHTML = response
LGP = html.getElementById("Bck-CalcCompProp-ALogP").innerText
For Each CCv In ActiveDocument.ContentControls
Select Case CCv.Title
Case "LogP"
CCv.Range.Text = LGP
Case "Hydrogen bonding"
CCv.Range.Text = ""
Case "Dipole interactions"
CCv.Range.Text = ""
End Select
Next CCv
End Sub
the element data always return nothing, only if i downloaded the page that i can return the data, and i have no luck trying to parse json
could any on pls help to write a working code

Passing Conexion String to VBA Macro

I have always found you a great help when I have questions. This time it's something related to Excel VBA.
I have a macro that brings back data from a website. You simply have to hard code the connection string into it.( xmlHttp.Open "GET", "http://www.example.com", False )
Sub GET_HTML_DATA()
Dim xmlHttp As Object
Dim TR_col As Object, TR As Object
Dim TD_col As Object, TD As Object
Dim row As Long, col As Long
Set xmlHttp = CreateObject("MSXML2.XMLHTTP.6.0")
xmlHttp.Open "GET", "http://www.example.com", False
xmlHttp.setRequestHeader "Content-Type", "text/xml"
xmlHttp.send
Dim html As Object
Set html = CreateObject("htmlfile")
html.body.innerHTML = xmlHttp.responseText
Dim tbl As Object
Set tbl = html.getElementById("curr_table")
row = 1
col = 1
Set TR_col = html.getElementsByTagName("TR")
For Each TR In TR_col
Set TD_col = TR.getElementsByTagName("TD")
For Each TD In TD_col
Cells(row, col) = TD.innerText
col = col + 1
Next
col = 1
row = row + 1
Next
End Sub
I was wondering if and how can this code be changed to accept a parameter as the connection string so I can call on it Run "GET_HTML_DATA(parameter)"
I have tried to declare a parameter in the parenthesis and include that in place of www.example.com but when I run the macro it tells me The macro may not be available in this workbook..."
Am I doing it right or is there another way I do not know?
In your sub in the parenthesis you need to declare the parameter as (input As String) and then use "input" in your code. Then you can use that Run "GET_HTML_DATA(parameter)". Alternatively, I guess you can simply put all your code in a simple function if you would have to return some output like
Function myFunction(input As String) As Double
//code goes here
End Function

Call web service in excel

In a VBA module in excel 2007, is it possible to call a web service? If so, any code snippets? How would I add the web reference?
Yes You Can!
I worked on a project that did that (see comment). Unfortunately no code samples from that one, but googling revealed these:
How you can integrate data from several Web services using Excel and VBA
STEP BY STEP: Consuming Web Services through VBA (Excel or Word)
VBA: Consume Soap Web Services
Here's an overview from MS:
Consuming Web Services in Excel 2007
For an updated answer see this SO question:
calling web service using VBA code in excel 2010
Both threads should be merged though.
In Microsoft Excel Office 2007 try installing "Web Service Reference Tool" plugin. And use the WSDL and add the web-services. And use following code in module to fetch the necessary data from the web-service.
Sub Demo()
Dim XDoc As MSXML2.DOMDocument
Dim xEmpDetails As MSXML2.IXMLDOMNode
Dim xParent As MSXML2.IXMLDOMNode
Dim xChild As MSXML2.IXMLDOMNode
Dim query As String
Dim Col, Row As Integer
Dim objWS As New clsws_GlobalWeather
Set XDoc = New MSXML2.DOMDocument
XDoc.async = False
XDoc.validateOnParse = False
query = objWS.wsm_GetCitiesByCountry("india")
If Not XDoc.LoadXML(query) Then 'strXML is the string with XML'
Err.Raise XDoc.parseError.ErrorCode, , XDoc.parseError.reason
End If
XDoc.LoadXML (query)
Set xEmpDetails = XDoc.DocumentElement
Set xParent = xEmpDetails.FirstChild
Worksheets("Sheet3").Cells(1, 1).Value = "Country"
Worksheets("Sheet3").Cells(1, 1).Interior.Color = RGB(65, 105, 225)
Worksheets("Sheet3").Cells(1, 2).Value = "City"
Worksheets("Sheet3").Cells(1, 2).Interior.Color = RGB(65, 105, 225)
Row = 2
Col = 1
For Each xParent In xEmpDetails.ChildNodes
For Each xChild In xParent.ChildNodes
Worksheets("Sheet3").Cells(Row, Col).Value = xChild.Text
Col = Col + 1
Next xChild
Row = Row + 1
Col = 1
Next xParent
End Sub
Excel 2013 Read Data from a web service and bash the JSON till you can get what you want out of it (given the JSON will always be in the same format).
This code should just work without the need for any plugins.
You will need your own free API key from the currency converter website though.
I used it to load the USD to GBP value into a cell on my sheet.
Option Explicit
Sub Test_LateBinding()
Dim objRequest As Object
Dim strUrl As String
Dim blnAsync As Boolean
Dim strResponse As String
Set objRequest = CreateObject("MSXML2.XMLHTTP")
strUrl = "https://free.currconv.com/api/v7/convert?q=USD_GBP&compact=ultra&apiKey=[MY_API_KEY]"
blnAsync = True
With objRequest
.Open "GET", strUrl, blnAsync
.SetRequestHeader "Content-Type", "application/json"
.Send
While objRequest.readyState <> 4
DoEvents
Wend
strResponse = .responsetext
End With
Dim responseArray() As String
responseArray = Split(strResponse, ":", -1)
Dim value As String
value = responseArray(1)
Dim valueArray() As String
valueArray = Split(value, "}", -1)
Dim finalValue As String
finalValue = valueArray(0)
Sheet2.Cells(22, "C") = finalValue
End Sub