Download a webpage to a text file - vb.net

I have the following code which works.
Imports System.IO
Imports System.Net
Module Module1
    Sub Main()
        Dim webClient1 As New WebClient()
        webClient1.Encoding = System.Text.Encoding.ASCII
        webClient1.DownloadFile("http://www.bmreports.com/servlet/com.logica.neta.bwp_MarketIndexServlet?displayCsv=true", "C:\temp\stream.txt")
    End Sub
End Module
This creates the text file, but it contains all of the HTML as well. How can I omit the markup and save just the text that is displayed on the page?

You can remove all the HTML tags from the document using Regex:
' Requires: Imports System.IO and Imports System.Text.RegularExpressions
Dim source As String = File.ReadAllText("C:\temp\stream.txt")
' Clean HTML tags
source = StripTagsRegex(source)
' Strip function
Private Function StripTagsRegex(source As String) As String
    Return Regex.Replace(source, "<.*?>", String.Empty)
End Function
Here is an example of this regex; it extracts only the text:
http://regexr.com?36ori
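The download and the tag stripping can also be combined so that the markup never reaches the disk. The following is only a sketch under a few assumptions: the module name and the HtmlToText helper are made up for illustration, UTF-8 is a guess at the page encoding, and a regex remains a rough tool for HTML (the contents of script or style blocks would still come through as text).

```vbnet
Imports System.IO
Imports System.Net
Imports System.Text
Imports System.Text.RegularExpressions

Module DownloadAsText
    ' Hypothetical helper: removes tags, then decodes entities such as &amp;.
    Function HtmlToText(html As String) As String
        ' Singleline lets "." match newlines, so tags that span lines are removed too.
        Dim withoutTags As String = Regex.Replace(html, "<.*?>", String.Empty, RegexOptions.Singleline)
        Return WebUtility.HtmlDecode(withoutTags).Trim()
    End Function

    Sub Main()
        Using client As New WebClient()
            client.Encoding = Encoding.UTF8 ' assumption: the page is served as UTF-8
            Dim html As String = client.DownloadString("http://www.bmreports.com/servlet/com.logica.neta.bwp_MarketIndexServlet?displayCsv=true")
            File.WriteAllText("C:\temp\stream.txt", HtmlToText(html))
        End Using
    End Sub
End Module
```

DownloadString is used instead of DownloadFile so that the text can be cleaned in memory before it is written.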

Related

How to read simple pseudo XML file?

I want to be able to read info from a simple, pseudo XML file I created to get the content. Here is what my XML file would look like :
<title>Form Title</title>
<Message1>A message or something</Message1>
<FormWidth>500</FormWidth>
<FormHeight>500</FormHeight>
The XML classes I find online and inside Visual Studio are too advanced. This is just a simple config file I'd like to use. Any tips?
Add a function like this to your code-behind (I just converted this from C#, so you may need to change some things):
Private Shared Function ReadValueFromXML(ValueToRead As String) As String
    Try
        Dim doc As New XPathDocument(System.Web.HttpContext.Current.Server.MapPath("filenameOfYourXML.xml"))
        Dim nav As XPathNavigator = doc.CreateNavigator()
        Dim expr As XPathExpression
        expr = nav.Compile("/" & ValueToRead)
        Dim iterator As XPathNodeIterator = nav.Select(expr)
        While iterator.MoveNext()
            Return iterator.Current.Value
        End While
        Return String.Empty
    Catch
        Return String.Empty
    End Try
End Function
Don't forget to add these import statements in your code behind as well:
Imports System.Xml
Imports System.Xml.XPath
And when you use it, let's say you want to get the value of FormWidth:
Dim FormWidth As String = ReadValueFromXML("FormWidth")
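One caveat: XPathDocument expects a well-formed XML document, which means a single root element. The file shown in the question has several top-level elements, so it may fail to load, in which case the function above would return an empty string from its Catch block. A possible workaround, sketched here with LINQ to XML, is to wrap the file content in an artificial root element before parsing. The module name, the ReadValue helper, the "root" wrapper name and the "config.xml" file name are all assumptions for illustration, and the sketch assumes the file has no XML declaration at the top.

```vbnet
Imports System
Imports System.IO
Imports System.Xml.Linq

Module SimpleConfig
    ' Hypothetical helper: wraps the fragment in a root element and looks up one value.
    ' Returns an empty string when the element is not present.
    Function ReadValue(fragment As String, name As String) As String
        Dim doc As XDocument = XDocument.Parse("<root>" & fragment & "</root>")
        Dim element As XElement = doc.Root.Element(name)
        Return If(element Is Nothing, String.Empty, element.Value)
    End Function

    Sub Main()
        ' "config.xml" is a placeholder file name.
        Dim fragment As String = File.ReadAllText("config.xml")
        Console.WriteLine(ReadValue(fragment, "FormWidth"))
    End Sub
End Module
```

Element names are case-sensitive, so "FormWidth" and "formwidth" are treated as different entries.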

How to use Stream to get String

I have a method in a third-party tool that has the following overloads:
ExportToXML(fileName As String) 'Saves the content to file in a form of XML document
or
ExportToXML(stream As System.IO.Stream) 'Saves the content to stream in a form of XML document
How do I use the call with the stream as the parameter to get the XML as a string?
I have researched and tried several things and still can't get it to work.
You can use a MemoryStream as the stream to export the XML to, and then read the MemoryStream back with a StreamReader:
Option Infer On
Imports System.IO
Module Module1
    Sub Main()
        Dim xmlData As String = ""
        Using ms As New MemoryStream
            ExportToXML(ms)
            ms.Position = 0
            Using sr As New StreamReader(ms)
                xmlData = sr.ReadToEnd()
            End Using
        End Using
        Console.WriteLine(xmlData)
        Console.ReadLine()
    End Sub
    ' a dummy method for testing
    Private Sub ExportToXML(ms As MemoryStream)
        Dim bytes = Text.Encoding.UTF8.GetBytes("Hello World!")
        ms.Write(bytes, 0, bytes.Length)
    End Sub
End Module
Added: Alternatively, as suggested by Coderer:
Using ms As New MemoryStream
    ExportToXML(ms)
    xmlData = Text.Encoding.UTF8.GetString(ms.ToArray())
End Using
A small effort at testing did not show any discernible efficiency difference.

PDFSharp Export JPG - ASP.NET

I am using an ajaxfileupload control to upload a pdf file to the server. On the server side, I'd like to convert the pdf to jpg. Using the PDFsharp Sample: Export Images as a guide, I've got the following:
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging
Imports PdfSharp.Pdf
Imports System.IO
Imports PdfSharp.Pdf.IO
Imports PdfSharp.Pdf.Advanced
Namespace Tools
    Public Module ConvertImage
        Public Sub pdf2JPG(pdfFile As String, jpgFile As String)
            pdfFile = System.Web.HttpContext.Current.Request.PhysicalApplicationPath & "upload\" & pdfFile
            Dim document As PdfSharp.Pdf.PdfDocument = PdfReader.Open(pdfFile)
            Dim imageCount As Integer = 0
            ' Iterate pages
            For Each page As PdfPage In document.Pages
                ' Get resources dictionary
                Dim resources As PdfDictionary = page.Elements.GetDictionary("/Resources")
                If resources IsNot Nothing Then
                    ' Get external objects dictionary
                    Dim xObjects As PdfDictionary = resources.Elements.GetDictionary("/XObject")
                    If xObjects IsNot Nothing Then
                        Dim items As ICollection(Of PdfItem) = xObjects.Elements.Values
                        ' Iterate references to external objects
                        For Each item As PdfItem In items
                            Dim reference As PdfReference = TryCast(item, PdfReference)
                            If reference IsNot Nothing Then
                                Dim xObject As PdfDictionary = TryCast(reference.Value, PdfDictionary)
                                ' Is external object an image?
                                If xObject IsNot Nothing AndAlso xObject.Elements.GetString("/Subtype") = "/Image" Then
                                    ExportImage(xObject, imageCount)
                                End If
                            End If
                        Next
                    End If
                End If
            Next
        End Sub
        Private Sub ExportImage(image As PdfDictionary, ByRef count As Integer)
            Dim filter As String = image.Elements.GetName("/Filter")
            Select Case filter
                Case "/DCTDecode"
                    ExportJpegImage(image, count)
                    Exit Select
                Case "/FlateDecode"
                    ExportAsPngImage(image, count)
                    Exit Select
            End Select
        End Sub
        Private Sub ExportJpegImage(image As PdfDictionary, ByRef count As Integer)
            ' Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
            Dim stream As Byte() = image.Stream.Value
            Dim fs As New FileStream([String].Format("Image{0}.jpeg", System.Math.Max(System.Threading.Interlocked.Increment(count), count - 1)), FileMode.Create, FileAccess.Write)
            Dim bw As New BinaryWriter(fs)
            bw.Write(stream)
            bw.Close()
        End Sub
        Private Sub ExportAsPngImage(image As PdfDictionary, ByRef count As Integer)
            Dim width As Integer = image.Elements.GetInteger(PdfImage.Keys.Width)
            Dim height As Integer = image.Elements.GetInteger(PdfImage.Keys.Height)
            Dim bitsPerComponent As Integer = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent)
            ' TODO: You can put the code here that converts from PDF internal image format to a Windows bitmap
            ' and use GDI+ to save it in PNG format.
            ' It is the work of a day or two for the most important formats. Take a look at the file
            ' PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.
            ' We don't need that feature at the moment and therefore will not implement it.
            ' If you write the code for exporting images I would be pleased to publish it in a future release
            ' of PDFsharp.
        End Sub
    End Module
End Namespace
As I debug, it blows up on Dim filter As String = image.Elements.GetName("/Filter") in ExportImage. The message is:
Unhandled exception at line 336, column 21 in ~:46138/ScriptResource.axd?d=LGq0ri4wlMGBKd-1vxLjtxNH_pd26HaruaEG_1eWx-epwPmhNKVpO8IpfHoIHzVj2Arxn5804quRprX3HtHb0OmkZFRocFIG-7a-SJYT_EwYUd--x9AHktpraSBgoZk4VJ1RMtFNwl1mULDLid5o5U9iBcuDi4EQpbpswgBn_oI1&t=ffffffffda74082d
0x800a139e - JavaScript runtime error: error raising upload complete event and start new upload
Any thoughts on what the issue might be? It seems an issue with the ajaxfileupload control, but I don't understand why it would be barking here. It's neither here nor there, but I know I'm not using jpgFile yet.
PDFsharp cannot create JPEG Images from PDF pages:
http://pdfsharp.net/wiki/PDFsharpFAQ.ashx#Can_PDFsharp_show_PDF_files_Print_PDF_files_Create_images_from_PDF_files_3
The sample you refer to can extract JPEG Images that are included in PDF files. That's all. The sample does not cover all possible cases.
Long story short: the code you are showing seems unrelated to the error message. And it seems unrelated to your intended goal.

Missing Carriage Return in Downloaded Email Attachment ImapX

Here is my code:
#Region "Imports"
Imports System.Text.RegularExpressions
Imports System.Text
Imports Microsoft.VisualBasic.CallType
Imports ImapX
Imports System.Runtime.CompilerServices
Imports System.Security.Authentication
Imports System.IO
Imports ml = System.Net.Mail
Imports System.Net
Imports ImapX.Enums
Imports ImapX.Constants
Imports System.Security.Authentication.SslProtocols
#End Region
Module Module1
    Sub Main()
        Dim _messages As List(Of ImapX.Message)
        Using MyImapClient = New ImapX.ImapClient
            With MyImapClient
                .Host = ImapServer
                .Port = Port
                .SslProtocol = Ssl3 Or Tls
                .ValidateServerCertificate = True
                .Credentials = New ImapX.Authentication.PlainCredentials(UserName, Password)
                Dim IsConnected As Boolean = .Connect
                .Login()
                .Behavior.AutoDownloadBodyOnAccess = False
                .Behavior.AutoPopulateFolderMessages = False
                .Behavior.MessageFetchMode = MessageFetchMode.Full
                .Behavior.ExamineFolders = False
                .Behavior.RequestedHeaders = {MessageHeader.From, MessageHeader.[Date], MessageHeader.Subject, MessageHeader.ContentType, MessageHeader.Importance}
                'Dim IsInboxSelected As Boolean = .SelectFolder(.Folders.Inbox.Name)
                'Dim IsInboxSelected As Boolean = .Folders(.Folders.Inbox.Name).[Select]()
            End With
            Dim MyFolder As Folder = MyImapClient.Folders.Inbox
            _messages = MyFolder.Search().OrderBy(Function(n) n.[Date]).ToList()
            _messages.ForEach(Sub(n) n.Download(MessageFetchMode.Full))
            _messages.ForEach(Sub(n) n.Download(MessageFetchMode.Full))
        End Using
        Dim MyAttachment As ImapX.Attachment = _messages.First.Attachments.First
        MyAttachment.Download()
        Dim FolderPath As String = "C:\Users\AAA\Downloads\"
        Dim LocalFileName As String = "1212.txt"
        MyAttachment.Save(FolderPath, LocalFileName)
    End Sub
End Module
The code works without issue--it connects to the imap server, downloads the first attachment of the first email, which happens to be a .txt file, so I'm saving it as such.
The problem is that the contents of the file is prepended with " * 2 FETCH (" and is followed by " UID 45", and all carriage returns are removed from the file.
Can you please assist? Thanks,
My guess is that the line endings (CRLF, carriage return plus line feed) are not in the format that whatever you're viewing the download in expects. I would open the attachment in something like Notepad++ and make sure it is set to show line-ending characters. If all you see are bare CRs and you're viewing the file in something that looks for CRLF pairs, the line breaks will be ignored.
Another thing to check is which default encoding your .Download call uses and which encoding the original attachment is in.
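If no suitable editor is at hand, the same check can be done in code by counting the line-ending bytes in the saved file. This is a minimal sketch: the module name and the CountLineEndings helper are hypothetical, and the path is simply the one from the question.

```vbnet
Imports System
Imports System.IO

Module LineEndingCheck
    ' Hypothetical helper: counts CRLF pairs, bare CRs and bare LFs in raw bytes.
    Sub CountLineEndings(data As Byte(), ByRef crlf As Integer, ByRef cr As Integer, ByRef lf As Integer)
        crlf = 0 : cr = 0 : lf = 0
        Dim i As Integer = 0
        While i < data.Length
            If data(i) = 13 Then
                If i + 1 < data.Length AndAlso data(i + 1) = 10 Then
                    crlf += 1
                    i += 1 ' skip the LF that belongs to this pair
                Else
                    cr += 1
                End If
            ElseIf data(i) = 10 Then
                lf += 1
            End If
            i += 1
        End While
    End Sub

    Sub Main()
        ' Path taken from the question; adjust as needed.
        Dim data As Byte() = File.ReadAllBytes("C:\Users\AAA\Downloads\1212.txt")
        Dim crlf, cr, lf As Integer
        CountLineEndings(data, crlf, cr, lf)
        Console.WriteLine("CRLF: {0}, bare CR: {1}, bare LF: {2}", crlf, cr, lf)
    End Sub
End Module
```

If all three counts are zero, the line breaks really were lost before the file was saved, which would point at the download step rather than the viewer.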

Loading a file into memory stream buffer and creating new file with same content and with different filename

I don't know whether this is simple or not, because I am new to programming.
My requirement is this: in my VB.NET WinForms application, the names of the files present in "D:\Project" will be displayed in the DataGridView1 control. Now I want to load these files one after another into a memory stream buffer and add the headers ("ID", "Name", "Class") to the content of each file. Then I want to save these files in "C:\" with "_de" as a suffix to the filename, e.g. sample_de.csv.
Can anyone please help me? If you need more clarity, I can post it in a clearer way.
Many thanks in advance for your help.
Try adapting this example to your situation:
Imports System.Text
Imports System.IO
Module Module1
    Sub Main()
        ' Read input
        Dim inputBuffer As Byte() = File.ReadAllBytes(".\input.txt")
        ' Manipulate the input
        Dim outputBuffer As Byte() = DoSomethingWithMyBuffer(inputBuffer)
        ' Add headers
        ' There are several encodings to choose from; make sure you are using
        ' the appropriate encoding for your file.
        Dim outputTextFromBuffer As String = Encoding.UTF8.GetString(outputBuffer)
        Dim finalOutputBuilder As StringBuilder = New StringBuilder()
        finalOutputBuilder.AppendLine("""ID"",""Name"",""Class""")
        finalOutputBuilder.Append(outputTextFromBuffer)
        ' Write output
        File.WriteAllText(".\output.txt", finalOutputBuilder.ToString(), Encoding.UTF8)
    End Sub
    Private Function DoSomethingWithMyBuffer(inputBuffer As Byte()) As Byte()
        ' Do nothing because this is just an example
        Return inputBuffer
    End Function
End Module
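Adapted to the folder layout described in the question, the same idea might look like the sketch below. The module name and the BuildOutputPath helper are hypothetical, and the "*.csv" filter and UTF-8 encoding are assumptions about the input files. Writing to the root of C:\ may also require elevated permissions on some systems.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module AddHeaders
    ' Hypothetical helper: builds "<name>_de<ext>" inside the target folder.
    Function BuildOutputPath(inputPath As String, outputFolder As String) As String
        Dim name As String = Path.GetFileNameWithoutExtension(inputPath)
        Dim ext As String = Path.GetExtension(inputPath)
        Return Path.Combine(outputFolder, name & "_de" & ext)
    End Function

    Sub Main()
        Dim header As String = """ID"",""Name"",""Class"""
        For Each inputPath As String In Directory.GetFiles("D:\Project", "*.csv")
            ' Assumption: the files are UTF-8 text.
            Dim content As String = File.ReadAllText(inputPath, Encoding.UTF8)
            Dim output As String = header & Environment.NewLine & content
            File.WriteAllText(BuildOutputPath(inputPath, "C:\"), output, Encoding.UTF8)
        Next
    End Sub
End Module
```

With this naming scheme, D:\Project\sample.csv would be written to C:\sample_de.csv.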