I need the following: I want to scrape posts from my WordPress blog and show them in a VB application. The news must be fresh, so when the user clicks "Refresh" the application gets new content from the website. Specifically, I need the posts from the first page and a link to each.
Is there any way to do this? Is that even possible?
This is very possible and I will give you three possible solutions.
The first solution is to use Telerik's (free!) Testing Framework http://www.telerik.com/teststudio/testing-framework. It is meant to be used for testing your website for flaws and whatnot but it makes for an excellent scraper. Once you get to know how to use the syntax this is most likely the fastest and easiest solution but it also has the most overhead. It uses browser addons to allow you to "control" Mozilla Firefox, Google Chrome, Apple Safari and Microsoft Internet Explorer. The main disadvantage with this one is that you must install the setup package on all computers that you wish to use this on. If you are only building a simple scraper to get data from your own blog and you will only run this on one machine this is probably the best way to go.
The next scraper I will mention is called WatiN and you can get it from http://watin.org/. I personally prefer this one to Telerik's because it only requires a few DLLs to be included in your project, and once deployed it works great on another machine without having to install any special software. Unfortunately there are a few caveats as well. The big issue is that it hasn't been updated since 2011, so I'm assuming the project is dead, although the website still exists. Because of the lack of updates it officially only supports Internet Explorer 6, 7, 8, 9 and Firefox 2 and 3 (but I can vouch for it working fine in IE 10 and 11 on Windows 7 and 8). The syntax is a bit more wonky than Telerik's, but it should be easy enough to use for what you need it for.
The last option that I can recommend is to use the HttpWebRequest and HttpWebResponse classes built into .NET and do your scraping manually. I will admit that I still use this approach on occasion. It basically just brings back the source code for a given URL, and you have to use string manipulation (or regex if you are good at that) to pull out the information you need. The upside to this one is that it has the least overhead, as it requires no extra DLLs or installs to work, and it's very fast. Here is a sample function that I have used and reused in a number of projects to pull data from the web:
' Requires "Imports System.IO" and "Imports System.Net" at the top of the file.
Private Function GetMethod(ByVal sPage As String) As String
    Dim sReturnString As String = ""
    Dim sUserAgent As String = "Mozilla/5.0 (compatible; MSIE 10.0; Windows NT 6.1; WOW64; Trident/6.0)"
    Try
        Dim req As HttpWebRequest = CType(WebRequest.Create(sPage), HttpWebRequest)
        req.Method = "GET"
        req.AllowAutoRedirect = False
        req.UserAgent = sUserAgent
        req.Accept = "text/xml,application/xml,application/xhtml+xml,text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5"
        req.Headers.Add("Accept-Language", "en-us,en;q=0.5")
        req.Headers.Add("Accept-Charset", "ISO-8859-1,utf-8;q=0.7,*;q=0.7")
        req.Headers.Add("Keep-Alive", "300")
        req.KeepAlive = True
        ' Get the response from the server once. GetResponse either returns a response
        ' or throws a WebException, so no separate "no response" branch is needed.
        Using resp As HttpWebResponse = CType(req.GetResponse(), HttpWebResponse)
            Using stw As New StreamReader(resp.GetResponseStream())
                sReturnString = stw.ReadToEnd() ' Save the source code of the url into a string
            End Using
        End Using
    Catch exc As WebException
        MessageBox.Show("Network Error: " & exc.Message & " Status Code: " & exc.Status.ToString() & " from " & sPage, "Error", MessageBoxButtons.OK, MessageBoxIcon.Error)
    End Try
    Return sReturnString
End Function
Just a note about the sUserAgent variable: you can change this to any user agent you want to emulate. The string used here identifies the client as Internet Explorer 10 on Windows 7 (the "Mozilla/5.0" prefix is only a compatibility token), which is obviously quite old, but I have been using this function for years and haven't needed to update it.
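To illustrate the "string manipulation or regex" step, here is a minimal sketch of pulling post titles and links out of the HTML string returned by a function like GetMethod. It assumes the blog's theme wraps each post title in an h2 element with the class "entry-title", which is common in WordPress themes but not guaranteed; check your own page source and adjust the pattern. The names PostParser and ExtractPosts are made up for this example.

```vbnet
Imports System.Collections.Generic
Imports System.Net
Imports System.Text.RegularExpressions

Module PostParser
    ' Returns (title, link) pairs found in the page source.
    ' ASSUMPTION: each post title looks like
    '   <h2 class="entry-title"><a href="LINK">TITLE</a></h2>
    ' Adjust the pattern to match the markup your theme actually produces.
    Function ExtractPosts(ByVal html As String) As List(Of KeyValuePair(Of String, String))
        Dim posts As New List(Of KeyValuePair(Of String, String))
        Dim pattern As String = "<h2[^>]*class=""[^""]*entry-title[^""]*""[^>]*>\s*<a[^>]*href=""([^""]+)""[^>]*>(.*?)</a>"
        Dim options As RegexOptions = RegexOptions.IgnoreCase Or RegexOptions.Singleline
        For Each m As Match In Regex.Matches(html, pattern, options)
            Dim link As String = m.Groups(1).Value
            ' Decode entities such as &amp; so the title displays as plain text.
            Dim title As String = WebUtility.HtmlDecode(m.Groups(2).Value).Trim()
            posts.Add(New KeyValuePair(Of String, String)(title, link))
        Next
        Return posts
    End Function

    Sub Main()
        ' Hard-coded sample markup; in the real application this would be the
        ' string returned by GetMethod for the blog's front page.
        Dim sample As String = "<h2 class=""entry-title""><a href=""http://example.com/hello/"">Hello &amp; welcome</a></h2>"
        For Each post As KeyValuePair(Of String, String) In ExtractPosts(sample)
            Console.WriteLine(post.Key & " -> " & post.Value)
        Next
    End Sub
End Module
```

In the application, the "Refresh" button handler would call GetMethod again and pass the result to ExtractPosts, so the list always reflects the current first page. Regex-based parsing is fragile if the theme changes, so treat this as a starting point only.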
Related
This is my first post here, though I have been using the great advice and solutions here for years, so I am very grateful. But I can't find a solution for this one.
I have an MS Access front end/back end in use for various office admin tasks and records. Much of the data my office works with on a daily basis is cloud based, and API access is provided with an app key and a secret key. I have no issues using the API explorer with these keys, but I can't get anything through code (VBA). I keep reading that it is perfectly possible to do this with VBA, which is why I kept trying different solutions, but now I need help. (I have replaced the URL, keys, etc.)
Dim myObj As New MSXML2.XMLHTTP60
Dim url As String, endPoint As String, params As String, tickers As String, appKey As String, secretKey As String
url = "theURL.com"
endPoint = "theEndPoint"
params = "id="
tickers = "1"
appKey = "12345678"
secretKey = "12345678"
myObj.Open "GET", url, False
myObj.setRequestHeader "Content-Type", "application/json"
myObj.setRequestHeader "app-key", appKey
myObj.setRequestHeader "secret-key", secretKey
myObj.send
This returns "App Key is required." I have tried various solutions, including converting the keys to Base64, putting the keys within the Open request, sending the keys as part of the send request, etc. I always get "App Key is required" when reading the response. I know that in this code I haven't actually requested anything, but it returns the same message when I do. I just wanted to keep what I posted simple.
Any help at all is greatly appreciated, as this would allow a great deal of automation for our office.
Thanks
James
I am working on a project which I did not write, have inherited, and have an issue that I'm not sure quite how to solve. My background is not in .NET, so please excuse anything that doesn't sound right, as I may not know what the correct terminology should be.
We are using Visual Studio 2008 to compile a project that is running on Windows CE 6.0. We are using the Compact Framework v2.0. The software is running on an Embedded processor in a network (WIFI) connected industrial environment. The main UI is written in VB, and all of the supporting DLLs are written using C#.
Up until now we've only been required to connect to http (non-secure) web addresses for GET requests. We now have a requirement to switch these addresses over to https (secure) for security's sake.
The HttpWebRequest is built/submitted from VB. When I provide the code with the https address, I get the "Could not establish secure channel for SSL/TLS" error that is in the subject.
Here is the code for that request:
Dim myuri As System.Uri = New System.Uri(sUrl)
Dim myHttpwebresponse As HttpWebResponse = Nothing
Dim myhttpwebrequest As HttpWebRequest = CType(WebRequest.Create(myuri), HttpWebRequest)
myhttpwebrequest.KeepAlive = False
myhttpwebrequest.Proxy.Credentials = CredentialCache.DefaultCredentials
myhttpwebrequest.ContentType = "text/xml"
myhttpwebrequest.Accept = "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
myhttpwebrequest.AllowAutoRedirect = False
myhttpwebrequest.Timeout = 150000
Dim mycred As NetworkCredential = New NetworkCredential(username, password)
Dim myCredentialCache As CredentialCache = New CredentialCache()
myCredentialCache.Add(myuri, "Basic", mycred)
myhttpwebrequest.Credentials = myCredentialCache
myhttpwebrequest.Method = "GET"
myhttpwebrequest.ProtocolVersion = HttpVersion.Version10
ServicePointManager.CertificatePolicy = New AcceptServerNameMismatch
myHttpwebresponse = CType(myhttpwebrequest.GetResponse(), HttpWebResponse)
I have done quite a bit of reading over the last day or so that indicate that the CertificatePolicy is where I can override the ICertificatePolicy classes to essentially validate all SSL requests. Definitely not safe, and not ideal, but I'm not sure of another way to handle these requests.
My class to do this is:
Public Class MyCertificatePolicy
    Implements ICertificatePolicy
    Public Shared DefaultValidate As Boolean = True
    Public Sub trustedCertificatePolicy()
    End Sub
    Public Function CheckValidationResult(ByVal srvPoint As ServicePoint, _
            ByVal cert As X509Certificate, ByVal request As WebRequest, ByVal problem As Integer) _
            As Boolean Implements ICertificatePolicy.CheckValidationResult
        Return True
    End Function
End Class
Unfortunately when the response comes back, it never calls CheckValidationResult(). Thus, no validation and the error.
So my questions...
The "Right" way to do this according to everything that I've read is to use the ServerCertificateValidationCallback. Unfortunately with the version of Compact Framework that we are using (maybe all?) it is not included. Is there something that I'm missing that would cause that function not to get called?
Again, from what I've read, I believe that the Framework that we're running on doesn't support TLS v1.1 or v1.2, which most current servers are running. Is there a way in VB to get around this?
Is there another Request method that can be used?
Any help or guidance as to where to go from here is greatly appreciated!
You need to install on the device(s) the trusted root certificate that matches the SSL certificate on your server.
Alternatively, change the certificate on the server to one that chains to a trusted root already on the device(s). By default, the devices ship with a very small number of trusted CAs, unlike desktop browsers, which contain nearly every CA in the world.
We have a system that can create records simply by a URL being run with appropriate parameters. I would like to build this into Excel using VBA by executing my constructed URL in the background (no browser).
I found references on Stack Overflow to using a POST with WinHttp.WinHttpRequest.5.1, but in every case I found, people were looking to get a response from the website.
I have already tried this, but it didn't work (variables have already been declared).
Set httpSend = CreateObject("WinHttp.WinHttpRequest.5.1")
websiteURL = "https://www.somewebsite.com/?&staticVariable=xxxx" & URLEncode(variable1)
websiteArguments = "&anotherVariable=" & URLEncode(variable2)
httpSend.Open "POST", websiteURL, False
httpSend.Send (websiteArguments)
websiteResponse = httpSend.ResponseText
Set httpSend = Nothing
For the record, when I run the above, I get a run-time error about "The HTTP redirect request failed" at the httpSend.Send (websiteArguments) stage.
Is posting with WinHttp.WinHttpRequest.5.1 the best way to achieve the results I'm looking for? Is there another, more efficient way to run a URL without opening a browser? To reiterate, I'm not looking for a response from the website; I just want to execute the URL.
Many thanks.
We have some code that runs to connect to PayPal's PayFlowPro to update a credit card used within a recurring billing subscription. This code used to work fine under a .Net 2 app pool, but when we migrated it to 4.0 it's very touchy - sometimes it works and other times it doesn't. The code seems pretty straightforward so I'm not sure what the issue is.
The error is: System.Web.HttpUnhandledException (0x80004005): Exception of type 'System.Web.HttpUnhandledException' was thrown. ---> System.Runtime.InteropServices.COMException (0x8000000A): The data necessary to complete this operation is not yet available.
The block of code that is intermittently failing (but used to work on an old server) is:
Try
    objWinHttp = CreateObject("WinHttp.WinHttpRequest.5.1")
    objWinHttp.Open("POST", GatewayHost, False)
    objWinHttp.setRequestHeader("Content-Type", "text/namevalue") ' for XML, use text/xml
    objWinHttp.SetRequestHeader("X-VPS-Timeout", "90")
    objWinHttp.SetRequestHeader("X-VPS-Request-ID", requestID)
    objWinHttp.Send(parmList)
Catch exc As Exception
End Try
' Get the text of the response. (DIES ON LINE BELOW)
transaction_response = objWinHttp.ResponseText
The confusing part is it works intermittently which is always hardest to debug. This is something that has existed for years and the only difference is our app pool is now running under .Net 4 vs. .Net 2.0, but I wouldn't think that would be an issue. I flipped it back to 2.0 and now it's working flawlessly though.
Any guesses on where to start looking? Does WinHttp.WinHttpRequest.5.1 have issues in .Net 4? The old server was 2008 R2 and the new one is 2012 R1 so perhaps that's part of it as well?
Update - changing to 2.0 still didn't fix it. It was working and then stopped again. This doesn't make any sense.
Since this was within inline .Net code (not compiled), I just migrated it to System.Net.HttpWebRequest instead which seems to be working much better. Here is sample code for anyone else hitting this:
Dim data As Byte() = New System.Text.ASCIIEncoding().GetBytes(parmList)
Dim request As System.Net.HttpWebRequest = CType(System.Net.HttpWebRequest.Create(GatewayHost), System.Net.HttpWebRequest)
request.Method = "POST"
request.ContentType = "text/namevalue"
request.ContentLength = data.Length
Dim requestStream As System.IO.Stream = request.GetRequestStream()
requestStream.Write(data, 0, data.Length)
requestStream.Close()
Dim responseStream = New System.IO.StreamReader(request.GetResponse().GetResponseStream())
transaction_response = responseStream.ReadToEnd()
responseStream.Close()
I have recently developed and finished a piece of software that works like AIM.
But here's the problem: the server worked just fine for local friends because they lived only 25 miles from the server, so it was lag-free.
But when uploaded to a web host, it lags every time it pings the server.
The server is written in PHP, so there's no need to buy a dedicated machine for $400/month more.
Here's the function that the client calls constantly:
Public Function GetPage(ByVal url As String) As String
    Dim WReq As HttpWebRequest
    Dim WResp As WebResponse
    Dim sr As IO.StreamReader
    Try
        WReq = CType(WebRequest.Create(url), HttpWebRequest)
        WReq.CookieContainer = cookies
        WReq.Timeout = 120000 ' milliseconds (2 minutes); Timeout is an Integer, not a String
        WResp = WReq.GetResponse()
        sr = New IO.StreamReader(WResp.GetResponseStream())
        GetPage = sr.ReadToEnd()
        WResp.Close()
        Return GetPage
    Catch err As SystemException
        MsgBox("err message: " & err.ToString & vbCrLf)
    Catch ex As Exception
        MsgBox("err message: " & ex.ToString & vbCrLf)
    End Try
End Function
A demo url would be something like
http://localhost/chat/newpm.php?to=User&msg=Hello
So how does OSCAR (the protocol behind AOL's AIM) do it?
And how do MSN, Gtalk, or other big IM clients do it?
I was thinking about recoding the GetPage function so that it connects to a TCP server and constantly waits for new messages, but I am still not sure whether this would also lag if, for example, the host is in the US and the client is not.
Could you please suggest a remedy for this problem?
The timeout is really high. Check whether the app closes the connection after receiving the response stream. Also check how many connections the computer accepts; otherwise the open connections will block new requests until one becomes free.
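As a rough sketch of what that could look like, the version below wraps the response and the reader in Using blocks, so both are closed and the connection is released even if reading fails. The module name PageFetcher, the 10-second timeout, and the connection limit of 4 are assumptions chosen for illustration, and the cookies field only stands in for the variable in the original code.

```vbnet
Imports System.IO
Imports System.Net

Module PageFetcher
    ' Stand-in for the "cookies" variable used by the original GetPage.
    Private cookies As New CookieContainer()

    Function GetPage(ByVal url As String) As String
        Dim req As HttpWebRequest = CType(WebRequest.Create(url), HttpWebRequest)
        req.CookieContainer = cookies
        req.Timeout = 10000 ' ASSUMED value in milliseconds; tune for your server

        ' Using guarantees Dispose is called when the block exits, even on an
        ' exception, which closes the response and frees the connection.
        Using resp As WebResponse = req.GetResponse()
            Using sr As New StreamReader(resp.GetResponseStream())
                Return sr.ReadToEnd()
            End Using
        End Using
    End Function

    Sub Main()
        ' Client applications allow only a few simultaneous connections per host
        ' by default; 4 is an ASSUMED value for illustration.
        ServicePointManager.DefaultConnectionLimit = 4
        Try
            Console.WriteLine(GetPage("http://localhost/chat/newpm.php?to=User&msg=Hello"))
        Catch ex As WebException
            ' Report the failure to the caller instead of showing a message box.
            Console.WriteLine("Request failed: " & ex.Status.ToString())
        End Try
    End Sub
End Module
```

If a response is never closed, its connection stays tied up, and once the per-host limit is reached further requests wait until a connection is released or the timeout expires, which can look like lag.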
OK, to answer John Feminella:
Exactly, I agree. How do AIM, MSN, and others solve this hang issue?
Dbasnett: Return (GetPage) will return the result of the requested page, and this is just the main snippet of the code, which is called at least two times every second.
John Saunders:
I don't understand the purpose or function of the "using" blocks. I added the catch so that the user is not notified when the server times out or can't be resolved.
Sein Kraft:
The timeout is only 2 minutes; I don't think it does.
GetPage(Login.server & "pms.php?to=" & to)
That's all, then it would parse the results from the response.