VB.NET Crawler can't handle Single Page Applications such as AngularJS

I have an existing web crawler (WinForms) developed in VB.NET for our SEOs, which uses web requests (HttpWebRequest). The application works fine on regular websites, but when I try to crawl SPA (Single Page Application) sites, it can't get a proper response.
Dim siteBody As String = String.Empty
Dim cleanedURL As String
Dim wRequest As System.Net.HttpWebRequest
Dim wResponse As System.Net.HttpWebResponse
Dim rStream As System.IO.Stream
Dim reader As System.IO.StreamReader
Try
    wRequest = Nothing
    wResponse = Nothing
    rStream = Nothing
    wRequest = DirectCast(System.Net.WebRequest.Create(urlList(i)), System.Net.HttpWebRequest) 'URL is being passed
    wRequest.Credentials = System.Net.CredentialCache.DefaultCredentials
    wRequest.UserAgent = "DummyValue"
    wRequest.AllowAutoRedirect = False
    wResponse = DirectCast(wRequest.GetResponse(), System.Net.HttpWebResponse)
    rStream = wResponse.GetResponseStream()
    reader = New System.IO.StreamReader(rStream)
    siteBody = reader.ReadToEnd()
    reader.Close()
    wResponse.Close()
Catch ex As Exception
End Try

Dim passedBody = siteBody 'EMPTY RESULT
Based on the result, we will extract data and check for links and their status codes.
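A plain HttpWebRequest only returns the initial HTML the server sends. For an AngularJS-style SPA that is usually a nearly empty shell, because the actual content is built by JavaScript in the browser, which is why siteBody ends up without useful markup. To crawl such sites you need something that executes the page's JavaScript before you read the DOM. Below is a minimal sketch using the WinForms WebBrowser control; the method name and the fixed extra wait are illustrative assumptions, not part of the original crawler.
'Minimal sketch for a WinForms project. GetRenderedHtml and the 5-second wait
'are illustrative assumptions, not part of the original crawler.
Private Function GetRenderedHtml(url As String) As String
    Dim html As String = String.Empty
    Using browser As New WebBrowser()
        browser.ScriptErrorsSuppressed = True
        browser.Navigate(url)
        'Pump messages until the document itself has loaded
        While browser.ReadyState <> WebBrowserReadyState.Complete
            Application.DoEvents()
        End While
        'Give the client-side framework (e.g. AngularJS) time to render the content
        Dim waitUntil As DateTime = DateTime.Now.AddSeconds(5)
        While DateTime.Now < waitUntil
            Application.DoEvents()
        End While
        If browser.Document IsNot Nothing AndAlso browser.Document.Body IsNot Nothing Then
            html = browser.Document.Body.OuterHtml
        End If
    End Using
    Return html
End Function
Note that the WebBrowser control uses the Internet Explorer engine and defaults to an old document mode unless the FEATURE_BROWSER_EMULATION registry key is set for the executable, so a headless browser driven through something like Selenium WebDriver is generally the more reliable route for modern SPAs.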

Related

VB.Net: Cannot download stream file generated with ZipOutputStream

I'm trying to zip a set of PDF files and send them to the client as a download.
No matter what combination of Response settings I try, the code doesn't throw any exception and the zip file stream is apparently created fine, but the file is never sent to the client as a download: when you hit the download button, nothing happens.
Private Sub lkbDownloadPdfs_Click(sender As Object, e As System.EventArgs) Handles aDownloadPdfs.ServerClick
    Try
        Dim WSStockToolAuthTokenUrl As String = ConfigurationManager.AppSettings("WSStockToolAuthTokenUrl")
        Dim auth As AuthenticationHeader = Utility.GetAuthenticationForStockToolToken()
        Dim token As String = Utility.GetInitializationToken(WSStockToolAuthTokenUrl, auth.UserName, auth.password)
        Response.Clear()
        Response.ContentType = "application/zip"
        Response.AppendHeader("Content-Disposition", "attachment; filename=files.zip")
        If (token IsNot Nothing) Then
            Dim result As String = PDFApiCallResult(token)
            Dim pdfPathList As List(Of String) = Utility.GeneratePDFList(result)
            If (pdfPathList.Count = 1) Then
                Dim pdfPath As String = pdfPathList.ElementAt(0)
                Dim strFile As String
                Dim strmZipOutputStream = New ZipOutputStream(Response.OutputStream)
                strmZipOutputStream.SetLevel(9)
                Dim objCrc32 As New Crc32()
                For Each strFile In pdfPathList
                    Dim Client As WebClient = New WebClient()
                    Dim strmFile As Stream = Client.OpenRead(strFile)
                    Dim reader As StreamReader = New StreamReader(strmFile)
                    Dim Content As String = reader.ReadToEnd()
                    Dim abyBuffer(Convert.ToInt32(Content.Length - 1)) As Byte
                    Dim sFile As String = Path.GetFileName(strFile)
                    Dim theEntry As ZipEntry = New ZipEntry(sFile)
                    theEntry.DateTime = DateTime.Now
                    theEntry.Size = Content.Length
                    strmFile.Close()
                    objCrc32.Reset()
                    objCrc32.Update(abyBuffer)
                    theEntry.Crc = objCrc32.Value
                    strmZipOutputStream.PutNextEntry(theEntry)
                    strmZipOutputStream.Write(abyBuffer, 0, abyBuffer.Length)
                Next
                strmZipOutputStream.Flush()
                strmZipOutputStream.Finish()
                strmZipOutputStream.Close()
                Response.Flush()
                Response.Close()
                Response.SuppressContent = True
                HttpContext.Current.ApplicationInstance.CompleteRequest()
            Else
            End If
        End If
    Catch ex As Exception
        ExceptionManager.Publish(ex)
    End Try
End Sub
Any help? (If you have working C# code I could try to convert it to vb.net too)
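As an aside, and separately from the download problem itself, abyBuffer is sized from Content.Length but never filled with the file's data, so each zip entry would contain only zero bytes even once the response reaches the browser. A minimal sketch of the loop body that reads the raw bytes instead (assuming the same SharpZipLib ZipOutputStream/ZipEntry/Crc32 objects as in the code above):
'Sketch of the For Each body: download each PDF's raw bytes so the buffer
'written to the ZipOutputStream actually contains the file data.
For Each strFile In pdfPathList
    Dim Client As New WebClient()
    Dim abyBuffer As Byte() = Client.DownloadData(strFile)
    Dim theEntry As New ZipEntry(Path.GetFileName(strFile))
    theEntry.DateTime = DateTime.Now
    theEntry.Size = abyBuffer.Length
    objCrc32.Reset()
    objCrc32.Update(abyBuffer)
    theEntry.Crc = objCrc32.Value
    strmZipOutputStream.PutNextEntry(theEntry)
    strmZipOutputStream.Write(abyBuffer, 0, abyBuffer.Length)
Next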
Update 1: This is the aspx where the link which does the callback resides:
<asp:UpdatePanel runat="server" ID="updImages" UpdateMode="Conditional">
...
DOWNLOAD PDFS
...
<asp:AsyncPostBackTrigger ControlID="lkbAddToWishlist" />
<asp:AsyncPostBackTrigger ControlID="ddlCustomizations" />
</asp:UpdatePanel>
I've read that the response may not be working at all because of the AJAX postback that the UpdatePanel performs, but I'm not sure whether that is the cause or how to deal with it.
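For what it's worth, a file cannot be streamed back through the partial (AJAX) postback an UpdatePanel performs, so a common workaround is to force the control that starts the download to do a full postback. A minimal sketch, assuming aDownloadPdfs is the server control whose ServerClick is handled above; the markup equivalent would be an <asp:PostBackTrigger ControlID="aDownloadPdfs" /> entry in the UpdatePanel's Triggers section.
'Minimal sketch: register the download control for a full postback so the
'Response written in lkbDownloadPdfs_Click can actually reach the browser.
Protected Sub Page_Load(sender As Object, e As EventArgs) Handles Me.Load
    Dim sm As ScriptManager = ScriptManager.GetCurrent(Me.Page)
    If sm IsNot Nothing Then
        sm.RegisterPostBackControl(aDownloadPdfs)
    End If
End Sub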

RDLC reports (Microsoft reportViewer) are not rendering when accessed from client machine after deployed to Server

My RDLC reports render correctly on the development machine (after deployment) when accessed from the machine itself, and they also work on the production server when accessed from the server itself. However, when I try to access the reports from client machines they do not render, even though the other components of the web application work properly.
The error I am getting is:
Google Chrome: failed to load PDF document
Internet Explorer: file is damaged and could not be repaired
How do I fix this error?
My controller code is:
Function GenerateReportS(value As String, lcvalue As String) As ActionResult
    Dim assetlist As List(Of usp_standardreportquery_Result)
    If value IsNot Nothing Or lcvalue IsNot Nothing Then
        assetlist = db.usp_standardreportquery.Where(Function(r) If(value IsNot Nothing, r.AssignLocation = value, True) And
                                                                 If(lcvalue IsNot Nothing, r.LocationCategory = lcvalue, True)).ToList
    Else
        assetlist = db.usp_standardreportquery.ToList
    End If
    Dim warnings As Warning()
    Dim mimeType As String
    Dim streamids As String()
    Dim encoding As String
    Dim filenameExtension As String
    Dim viewer = New ReportViewer()
    viewer.LocalReport.ReportPath = "Views\Reports\StandardReport.rdlc"
    Dim dataset As Microsoft.Reporting.WebForms.ReportDataSource = New Microsoft.Reporting.WebForms.ReportDataSource("standardreportds", assetlist)
    viewer.LocalReport.DataSources.Add(dataset)
    Dim params(2) As ReportParameter
    params(0) = New ReportParameter("SearchBy", "Location", False)
    params(1) = New ReportParameter("value", value, False)
    params(2) = New ReportParameter("Category", lcvalue, False)
    viewer.LocalReport.SetParameters(params)
    dataset.Value = assetlist
    viewer.LocalReport.Refresh()
    Dim bytes = viewer.LocalReport.Render("PDF", Nothing, mimeType, encoding, filenameExtension, streamids, warnings)
    Return New FileContentResult(bytes, mimeType)
    'Return File(bytes, mimeType, "_PackingSlip.pdf")
End Function
Because the "failed to load PDF document" error is hard to troubleshoot, and after wasting a lot of time searching the web, I solved the issue as follows:
First, by correcting the initialization of the variables passed to Render:
Dim warnings As Warning()
Dim streamids As String()
Dim mimeType As String = "application/pdf"
Dim encoding As String = String.Empty
Dim filenameExtension As String = String.Empty
and finally by inserting the following code after the Dim bytes = ... line:
Response.Buffer = True
Response.Clear()
Response.ContentType = mimeType
Response.BinaryWrite(bytes)
Response.Flush()

how to get source code from link with user name and password in vb or c#

I try to get the page source from my Facebook account, but I only get the source of the login page...
I assume this happens because of a problem with cookies.
My code...
'download data from url and return string of the source code
Public Shared Function getSourceCode(address As String) As String
    Dim reader As StreamReader = Nothing
    'Address of URL
    Dim URL As String = address
    'Get HTML data
    Dim client As WebClient = New WebClient()
    Try
        client.Proxy = Nothing
        Dim data As Stream = client.OpenRead(URL)
        reader = New StreamReader(data)
    Catch
        'error
    End Try
    If reader IsNot Nothing Then Return reader.ReadToEnd
    Return ""
End Function
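For pages behind a login, a bare WebClient does not keep session cookies between requests, which matches the suspicion above. Below is a minimal sketch of a cookie-aware request using HttpWebRequest and a shared CookieContainer; how the login itself is performed is not shown, and Facebook in particular guards its login flow, so its official API is normally the practical route. The method name is illustrative and the code assumes Imports System.Net and System.IO.
'Minimal sketch: one shared cookie jar so cookies set by a login request are
'sent back on later requests to the same site.
Private Shared ReadOnly cookieJar As New CookieContainer()

Public Shared Function GetSourceCodeWithCookies(address As String) As String
    Dim request As HttpWebRequest = DirectCast(WebRequest.Create(address), HttpWebRequest)
    request.CookieContainer = cookieJar
    request.UserAgent = "Mozilla/5.0"
    Try
        Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
            Using reader As New StreamReader(response.GetResponseStream())
                Return reader.ReadToEnd()
            End Using
        End Using
    Catch ex As WebException
        Return ""
    End Try
End Function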

Background Program Not Looping

I have a program that runs in the background, looping to check whether a page on the site has been changed. It works once and shows the message box, but if I change the page again it won't do anything.
Imports System.Net
Imports System.String
Imports System.IO
Module Main
    Sub Main()
        While 1 = 1
            Dim client As WebClient = New WebClient()
            Dim reply As String = client.DownloadString("http://noahcristinotesting.dx.am/file.txt")
            If reply.Contains("MsgBox") Then
                Dim Array() As String = reply.Split(":")
                MessageBox.Show(Array(2), Array(1))
                Dim request As System.Net.FtpWebRequest = DirectCast(System.Net.WebRequest.Create("ftp://noahcristinotesting.dx.am/noahcristinotesting.dx.am/file.txt"), System.Net.FtpWebRequest)
                request.Credentials = New System.Net.NetworkCredential("username", "password")
                request.Method = System.Net.WebRequestMethods.Ftp.UploadFile
                Dim path As String = "C:\test.txt"
                Dim createText As String = "completed"
                File.WriteAllText(path, createText)
                Dim fileftp() As Byte = System.IO.File.ReadAllBytes("C:\test.txt")
                Dim strz As System.IO.Stream = request.GetRequestStream()
                strz.Write(fileftp, 0, fileftp.Length)
                strz.Close()
                strz.Dispose()
            End If
        End While
    End Sub
End Module
Not sure at this moment what is causing it to stop working after the first pass, but try trapping the exceptions being thrown in the loop. I imagine an exception is happening and cratering your app. The catch block below is by no means production ready (normally you want to catch specific exceptions in order to handle them effectively), but it is a cheap way to see whether an exception is being thrown, and what it is, at runtime.
Sub Main()
    Try
        While 1 = 1
            Dim client As WebClient = New WebClient()
            Dim reply As String = client.DownloadString("http://noahcristinotesting.dx.am/file.txt")
            If reply.Contains("MsgBox") Then
                Dim Array() As String = reply.Split(":")
                MessageBox.Show(Array(2), Array(1))
                Dim request As System.Net.FtpWebRequest = DirectCast(System.Net.WebRequest.Create("ftp://noahcristinotesting.dx.am/noahcristinotesting.dx.am/file.txt"), System.Net.FtpWebRequest)
                request.Credentials = New System.Net.NetworkCredential("username", "password")
                request.Method = System.Net.WebRequestMethods.Ftp.UploadFile
                Dim path As String = "C:\test.txt"
                Dim createText As String = "completed"
                File.WriteAllText(path, createText)
                Dim fileftp() As Byte = System.IO.File.ReadAllBytes("C:\test.txt")
                Dim strz As System.IO.Stream = request.GetRequestStream()
                strz.Write(fileftp, 0, fileftp.Length)
                strz.Close()
                strz.Dispose()
            End If
        End While
    Catch ex As Exception
        MsgBox(ex.ToString)
    End Try
End Sub
Alternatively, you could check the Windows Event Viewer to see if a .NET application exception is being logged: Event Viewer > Windows Logs > Application.

vb.net proper way to thread this application

My application is a web scraper (for the most part) that stores information in a database. I have 2 classes so far:
clsSpyder - This essentially rolls-up the scraper processes
clsDB - This does any database processes
My test program loops over all the URLs, scrapes each one, and pushes the result into the database. It is pretty simple sequentially, but I would like to have, say, N threads running those processes (scrape and store). My sequential code is this:
Private Sub Button4_Click(sender As Object, e As EventArgs) Handles Button4.Click
    'Grab List
    Dim tDS As New DataSet
    Dim tDB As New clsTermsDB
    Dim tSpyder As New clsAGDSpyder
    Dim sResult As New TermsRuns
    'Grab a list of all URLS
    tDS = tDB.GetTermsList(1)
    Try
        For Each Row As DataRow In tDS.Tables(0).Rows
            rtbList.AppendText(Row("url_toBeCollected") & vbCrLf)
            sResult = tSpyder.SpiderPage(Row("url_toBeCollected"))
            'If nothing is found, do not store
            If sResult.html <> "" And sResult.text <> "" Then
                tDB.InsertScrape(Now(), sResult.html, sResult.text, Row("url_uid"), 1)
            End If
        Next
        Exit Sub
    Catch ex As Exception
        MessageBox.Show(ex.Message)
    End Try
End Sub
With that in mind, and noting that I am passing variables to the SpiderPage and InsertScrape methods, how could I implement threading? It has to be simple, but I feel like I have been googling and trying things for days without success :(
*** ADDED: SpiderPage method:
Public Function SpiderPage(PageURL As String) As TermsRuns
    Dim webget As New HtmlWeb
    Dim node As HtmlNode
    Dim doc As New HtmlDocument
    Dim docNOHTML As HtmlDocument
    Dim uri As New Uri(PageURL)
    Dim wc As HttpWebRequest = DirectCast(WebRequest.Create(uri.AbsoluteUri), HttpWebRequest)
    Dim wcStream As Stream
    wc.AllowAutoRedirect = True
    wc.MaximumAutomaticRedirections = 3
    'Set Headers
    wc.UserAgent = "Mozilla/5.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4"
    wc.Headers.Add("REMOTE_ADDR", "66.83.101.5")
    wc.Headers.Add("HTTP_REFERER", "66.83.101.5")
    'Set HTMLAgility Kit Useragent Spoofing (not needed, I don't think)
    webget.UserAgent = "Mozilla/5.0 (Macintosh; I; Intel Mac OS X 11_7_9; de-LI; rv:1.9b4) Gecko/2012010317 Firefox/10.0a4"
    'Certification Stuff
    wc.UseDefaultCredentials = True
    wc.Proxy.Credentials = System.Net.CredentialCache.DefaultCredentials
    ServicePointManager.ServerCertificateValidationCallback = AddressOf AcceptAllCertifications
    'Create Cookie Jar
    Dim CookieJar As New CookieContainer
    wc.CookieContainer = CookieJar
    'Keep Alive Settings
    wc.KeepAlive = True
    wc.Timeout = &H7530
    'Read the web page
    Dim wr As HttpWebResponse = Nothing
    Try
        wcStream = wc.GetResponse.GetResponseStream
        doc.Load(wcStream)
        'Remove HTML from the document
        docNOHTML = RemoveUnWantedTags(doc)
        'Grab only the content inside the <body> tag
        node = docNOHTML.DocumentNode.SelectSingleNode("//body")
        'Output
        SpiderPage = New TermsRuns
        SpiderPage.html = node.InnerHtml
        SpiderPage.text = node.InnerText
        Return SpiderPage
    Catch ex As Exception
        'Something goes here when scraping returns an error
        SpiderPage = New TermsRuns
        SpiderPage.html = ""
        SpiderPage.text = ""
    End Try
End Function
*** Added InsertScrape:
Public Function InsertScrape(scrape_ts As DateTime, scrape_html As String, scrape_text As String, url_id As Integer, tas_id As Integer) As Boolean
    Dim myCommand As MySqlClient.MySqlCommand
    Dim dt As New DataTable
    'Create ds/dt for fill
    Dim ds As New DataSet
    Dim dtbl As New DataTable
    Try
        'Set Connection String
        myConn.ConnectionString = myConnectionString
        'Push Command to Client Object
        myCommand = New MySqlClient.MySqlCommand
        myCommand.Connection = myConn
        myCommand.CommandText = "spInsertScrape"
        myCommand.CommandType = CommandType.StoredProcedure
        myCommand.Parameters.AddWithValue("#scrape_ts", scrape_ts)
        myCommand.Parameters("#scrape_ts").Direction = ParameterDirection.Input
        myCommand.Parameters.AddWithValue("#scrape_html", scrape_html)
        myCommand.Parameters("#scrape_html").Direction = ParameterDirection.Input
        myCommand.Parameters.AddWithValue("#scrape_text", scrape_text)
        myCommand.Parameters("#scrape_text").Direction = ParameterDirection.Input
        myCommand.Parameters.AddWithValue("#url_id", url_id)
        myCommand.Parameters("#url_id").Direction = ParameterDirection.Input
        myCommand.Parameters.AddWithValue("#tas_id", tas_id)
        myCommand.Parameters("#tas_id").Direction = ParameterDirection.Input
        'Open Connection
        myConn.Open()
        myCommand.ExecuteNonQuery()
        'Close Connection
        myConn.Close()
        InsertScrape = True
    Catch ex As Exception
        'Put Message Here
        InsertScrape = False
        MessageBox.Show(ex.Message)
    End Try
End Function
thanks in advance.
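One way to parallelize the loop above (a sketch, not taken from the original project) is Parallel.ForEach with a bounded degree of parallelism, so at most N pages are scraped at once. The sketch assumes each worker can create its own clsAGDSpyder and clsTermsDB instances and that InsertScrape opens its own connection rather than sharing the module-level myConn shown above; UI updates such as rtbList.AppendText have to be marshalled back to the UI thread. It needs Imports System.Linq and System.Threading.Tasks.
'Minimal sketch: scrape and store with at most 4 parallel workers.
'Assumes per-worker clsAGDSpyder/clsTermsDB instances.
Private Sub CrawlInParallel(tDS As DataSet)
    Dim options As New ParallelOptions With {.MaxDegreeOfParallelism = 4}
    Dim rows = tDS.Tables(0).Rows.Cast(Of DataRow)().ToList()
    Parallel.ForEach(rows, options,
        Sub(Row)
            Dim spyder As New clsAGDSpyder
            Dim db As New clsTermsDB
            Dim sResult As TermsRuns = spyder.SpiderPage(CStr(Row("url_toBeCollected")))
            'If nothing is found, do not store
            If sResult.html <> "" AndAlso sResult.text <> "" Then
                db.InsertScrape(Now(), sResult.html, sResult.text, CInt(Row("url_uid")), 1)
            End If
            'Marshal the UI update back to the UI thread
            rtbList.Invoke(New Action(Sub() rtbList.AppendText(CStr(Row("url_toBeCollected")) & vbCrLf)))
        End Sub)
End Sub
Calling this directly from a button handler still blocks the UI until all workers finish; wrapping the call in Task.Run (and keeping the Invoke calls for any control access) keeps the form responsive.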