Extract text from a PDF email attachment without saving the attachment to a pdf file first - vb.net

I'm using PDF Extractor (from here) to get the text from PDF attachments in emails.
It seems to me that the only way I can extract the text is to save the PDF to a file and then use this code:
Private Function ReadPdfToStringList(tempfilename As String) As List(Of String)
Dim extractedText As String
Using pdfFile As FileStream = File.OpenRead(tempfilename)
Using extractor As Extractor = New Extractor()
extractedText = extractor.ExtractToString(pdfFile)
End Using
End Using
DeleteTempFile()
Return New List(Of String)(extractedText.Split(Chr(13)))
End Function
to extract a list of Strings from the PDF file.
However, I can't seem to extract text from the attachment directly. The 'extractor' doesn't seem to be able to handle any source other than a file on disk.
Is there any way of tricking the 'extractor' into reading from memory, perhaps by creating an in-memory file stream?
I've tried using a MemoryStream like this:
Private Function ReadPdfMemStrmToStringList(memstream As MemoryStream) As List(Of String)
Dim extractedText As String
Using extractor As Extractor = New Extractor()
extractedText = extractor.ExtractToString(memstream)
End Using
Return New List(Of String)(extractedText.Split(Chr(13)))
End Function
but because the extractor is assuming the source is a disk file, it returns an error saying that it can't find a temporary file.
To be honest I've spent quite a bit of time trying to understand memory streams and they don't seem to fit the bill.
UPDATE
Here also is the code that I'm using to save the attachment to the MemoryStream.
Private Sub SaveAttachmentToMemStrm(msg As MimeMessage)
Dim memstrm As New MemoryStream
For Each attachment As MimePart In msg.Attachments
If attachment.FileName.Contains("booking") Then
attachment.WriteTo(memstrm)
End If
Next
'this line only adds the memory stream to a List (of MemoryStream)
attachments.Add(memstrm)
End Sub
Many apologies if I've missed something obvious.
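One thing that may be worth checking before blaming the extractor: in MimeKit, MimePart.WriteTo writes the whole MIME part (headers plus the transfer-encoded body), so the MemoryStream above does not hold a raw PDF, and its Position is left at the end after writing. The sketch below is only an assumption about what might help, not a confirmed fix for the temporary-file error. It decodes the content with Content.DecodeTo (named ContentObject in older MimeKit versions) and rewinds the stream. DecodeAttachment is a hypothetical helper name.

```vbnet
Imports System.IO
Imports MimeKit

Module AttachmentDecoding
    ' Hypothetical helper: returns the decoded bytes of one attachment
    ' as a MemoryStream positioned at the start.
    Function DecodeAttachment(attachment As MimePart) As MemoryStream
        Dim decoded As New MemoryStream()
        ' DecodeTo writes only the decoded content (no MIME headers, no base64 text).
        attachment.Content.DecodeTo(decoded)
        ' Rewind so the next reader starts at the first byte.
        decoded.Position = 0
        Return decoded
    End Function
End Module
```

The result could then be passed to ReadPdfMemStrmToStringList. Iterating with msg.Attachments.OfType(Of MimePart)() also avoids a cast failure when an attachment is not a MimePart.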

Related

VB.NET: Modifying non-text file as text without ruining it

I need my application to find and modify a text string in a .swp file (generated by VBA for SOLIDWORKS). If I open said file as text in Notepad++, most of the text looks like this (this is an excerpt):
Meaning there is readable text, and symbols that appear as NUL, BEL, ETX and so on, depending on selected encoding. If I make my changes via Notepad++ (finding and changing "1.38" to "1.39"), there are no issues, the file can be opened via SOLIDWORKS and is still recognized as valid. After all, I don't need to modify these non-readable bits. However, if I do the same modification in my VB.NET application,
Dim filePath As String = "D:\OneDrive\Desktop\launcher macro.swp"
Dim fileContents As String = My.Computer.FileSystem.ReadAllText(filePath, Encoding.UTF8).Replace("1.38", "1.39")
My.Computer.FileSystem.WriteAllText(filePath, fileContents, Encoding.UTF8)
then the file gets corrupted, and is no longer recognized by SOLIDWORKS. I suspect this is because ReadAllText and WriteAllText cannot handle whatever data is in these non-readable bits.
I tried many different encodings, but it seems to make no difference. I am not sure how Notepad++ does it, but I can't seem to get the same result in my VB.NET application.
Can someone advise?
Thanks to @jmcilhinney, this is a solution that worked for me: reading the file as bytes, converting them to a string, and then saving, using the ANSI encoding:
Dim file_name As String = "D:\OneDrive\Desktop\launcher macro.swp"
Dim fs As New FileStream(file_name, FileMode.Open)
Dim binary_reader As New BinaryReader(fs)
fs.Position = 0
Dim bytes() As Byte = binary_reader.ReadBytes(binary_reader.BaseStream.Length)
Dim fileContents As String = System.Text.Encoding.Default.GetString(bytes)
fileContents = fileContents.Replace("1.38", "1.39")
binary_reader.Close()
fs.Dispose()
System.IO.File.WriteAllText(file_name, fileContents, Encoding.Default)
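A caveat on the solution above: Encoding.Default is the system ANSI code page on .NET Framework but UTF-8 on .NET Core and later, so the result depends on the runtime and the machine. A minimal sketch of a more portable variant, assuming the text to replace is plain ASCII, uses ISO-8859-1 (Latin-1), which maps every byte value 0 to 255 to exactly one character and back. ReplaceInBinaryFile is a hypothetical helper name.

```vbnet
Imports System.IO
Imports System.Text

Module BinarySafeReplace
    ' Hypothetical helper: replaces oldText with newText in a binary file
    ' while leaving every other byte untouched.
    Sub ReplaceInBinaryFile(filePath As String, oldText As String, newText As String)
        ' Latin-1 round-trips all 256 byte values without loss.
        Dim latin1 As Encoding = Encoding.GetEncoding("ISO-8859-1")
        Dim bytes As Byte() = File.ReadAllBytes(filePath)
        Dim contents As String = latin1.GetString(bytes)
        contents = contents.Replace(oldText, newText)
        ' Write raw bytes back, so no byte order mark or re-encoding is added.
        File.WriteAllBytes(filePath, latin1.GetBytes(contents))
    End Sub
End Module
```

Keeping the replacement the same length as the original text (as with "1.38" and "1.39") is the cautious choice, since a binary format may store offsets or lengths elsewhere in the file.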

VB.NET 2010 - Extracting an application resource to the Desktop

I am trying to extract an application resource from My.Resources.FILE
I have discovered how to do this with DLL & EXE files, but I still need help with the code for extracting PNG & ICO files.
Other file types also. (If possible)
Here is my current code that works with DLL & EXE files.
Dim File01 As System.IO.FileStream = New System.IO.FileStream("C:\Users\" + Environment.UserName + "\Desktop\" + "SAMPLE.EXE", IO.FileMode.Create)
File01.Write(My.Resources.SAMPLE, 0, My.Resources.SAMPLE.Length)
File01.Close()
First things first, the code you have is bad. When using My.Resources, every time you use a property, you extract a new copy of the data. That means that your second line is getting the data to write twice, with the second time being only to get its length. At the very least, you should be getting the data only once and assigning it to a variable, then using that variable twice. You should also be using a Using statement to create and destroy the FileStream. Even better though, just call File.WriteAllBytes, which means that you don't have to create your own FileStream or know the length of the data to write. You should also not be constructing the file path that way.
Dim filePath = Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "SAMPLE.EXE")
File.WriteAllBytes(filePath, My.Resources.SAMPLE)
As for your question, the important thing to understand here is that it really has nothing to do with resources. The question is really how to save data of any particular type and that is something that you can look up for yourself. When you get the value of a property from My.Resources, the type of the data you get will depend on the type of the file you embedded in the first place. In the case of a binary file, e.g. DLL or EXE, you will get back a Byte array and so you save that data to a file in the same way as you would any other Byte array. In the case of an image file, e.g. PNG, you will get back an Image object, so you save that like you would any other Image object, e.g.
Dim filePath = Path.Combine(My.Computer.FileSystem.SpecialDirectories.Desktop, "PICTURE.PNG")
Using picture = My.Resources.PICTURE
picture.Save(filePath, picture.RawFormat)
End Using
For an ICO file you will get back an Icon object. I'll leave it to you to research how to save an Icon object to a file.
EDIT:
It's important to identify what the actual problem is that you're trying to solve. You can obviously get an object from My.Resources so that is not the problem. You need to determine what type that object is and determine how to save an object of that type. How to do that will be the same no matter where that object comes from, so the resources part is irrelevant. Think about what it is that you have to do and write a method to do it, then call that method.
In your original case, you could start like this:
Dim data = My.Resources.SAMPLE
Once you have written that - even as you write it - Intellisense will tell you that the data is a Byte array. Your actual problem is now how to save a Byte array to a file, so write a method that does that:
Private Sub SaveToFile(data As Byte(), filePath As String)
'...
End Sub
You can now choose which you want to do first: write code to call that method as appropriate for your current scenario or write the implementation of the method. There are various specific ways to save binary data, i.e. a Byte array, to a file but, as I said, the simplest is File.WriteAllBytes:
Private Sub SaveToFile(data As Byte(), filePath As String)
File.WriteAllBytes(filePath, data)
End Sub
As for calling the method, you need the data, which you already have, and the file path:
Dim data = My.Resources.SAMPLE
Dim folderPath = My.Computer.FileSystem.SpecialDirectories.Desktop
Dim fileName = "SAMPLE.EXE"
Dim filePath = Path.Combine(folderPath, fileName)
SaveToFile(data, filePath)
Simple enough. You need to follow the same steps for any other resource. If you embedded a PNG file then you would find that the data is an Image object or, more specifically, a Bitmap. Your task is then to learn how to save such an object to a file. It shouldn't take you long to find out that the Image class has its own Save method, so you would use that in your method:
Private Sub SaveToFile(data As Image, filePath As String)
data.Save(filePath, data.RawFormat)
End Sub
The code to call the method is basically as before, with the exception that an image object needs to be disposed when you're done with it:
Dim data = My.Resources.PICTURE
Dim folderPath = My.Computer.FileSystem.SpecialDirectories.Desktop
Dim fileName = "PICTURE.PNG"
Dim filePath = Path.Combine(folderPath, fileName)
SaveToFile(data, filePath)
data.Dispose()
The proper way to create and dispose an object in a narrow scope like this is with a Using block:
Dim folderPath = My.Computer.FileSystem.SpecialDirectories.Desktop
Dim fileName = "PICTURE.PNG"
Dim filePath = Path.Combine(folderPath, fileName)
Using data = My.Resources.PICTURE
SaveToFile(data, filePath)
End Using
Now it is up to you to carry out the same steps for an ICO file. If you are a hands on learner then get your hands on.
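For readers who want to compare their result, here is a minimal sketch of the ICO case. It follows the same pattern as the overloads above and relies on Icon.Save, which writes to a Stream rather than to a path. The resource name MYICON in the usage note is a placeholder, not something from the question.

```vbnet
Imports System.Drawing
Imports System.IO

Module IconSaver
    ' Saves an Icon to a file; Icon.Save only accepts a Stream,
    ' so a FileStream is created for the target path.
    Sub SaveToFile(data As Icon, filePath As String)
        Using output As FileStream = File.Create(filePath)
            data.Save(output)
        End Using
    End Sub
End Module
```

It would be called like the image version, for example inside Using data = My.Resources.MYICON ... End Using, since an Icon also needs to be disposed.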

how to read a CSV file as resource using TextFieldParser

I have a CSV file in my project resources which I want to read using FileIO.TextFieldParser
I tried Dim parser = new TextFieldParser(My.Resources.ArticlesCSV), but since TextFieldParser expects either a path (as string) or a stream, this is not working.
I guess one possibility is to convert the resource to a stream, but I cannot find how to do that...
What is the best way to get this working?
You can create a new instance of IO.StringReader, which is a TextReader and therefore something that TextFieldParser will accept. Just pass it your CSV resource string (thanks to AndrewMorton):
Using strReader As New IO.StringReader(My.Resources.ArticlesCSV)
Using textparser As New TextFieldParser(strReader)
textparser.Delimiters = {","}
While Not textparser.EndOfData
Dim curRow = textparser.ReadFields()
' Do stuff
End While
End Using
End Using
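The StringReader approach assumes the resource was embedded as text, so that My.Resources returns a String. If the file was embedded as binary and the property returns a Byte array instead, a sketch along these lines may work, wrapping the bytes in a MemoryStream, which TextFieldParser also accepts. ParseCsv is a hypothetical helper name.

```vbnet
Imports System.Collections.Generic
Imports System.IO
Imports Microsoft.VisualBasic.FileIO

Module CsvFromBytes
    ' Hypothetical helper: parses comma-delimited data held in a Byte array.
    Function ParseCsv(data As Byte()) As List(Of String())
        Dim rows As New List(Of String())
        Using stream As New MemoryStream(data)
            Using parser As New TextFieldParser(stream)
                parser.TextFieldType = FieldType.Delimited
                parser.SetDelimiters(",")
                While Not parser.EndOfData
                    ' Each call returns the fields of one row.
                    rows.Add(parser.ReadFields())
                End While
            End Using
        End Using
        Return rows
    End Function
End Module
```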

Writing to FileStream works, MemoryStream copied to FileStream doesn't

I have some code that used a FileStream, StreamWriter and XmlDocument to produce Excel-compatible output files. Very useful!
However I now have a need to make copies of the file, and I'd like to do that in-memory. So I took my original FileStream code and changed the FileStream to a MemoryStream, and then wrapped that in this function:
'----------------------------------------------------------------------------------
Friend Sub Save(Optional ByVal SaveCalculatedResults As Boolean = True)
Dim MStream As MemoryStream
Dim FStream As FileStream
Dim Bytes As Byte()
'make the stream containing the XML
MStream = ToXLSL(SaveCalculatedResults)
If MStream.Length = 0 Then Return
'then read that data into a byte buffer
ReDim Bytes(CInt(MStream.Length))
MStream.Read(Bytes, 0, CInt(MStream.Length))
'and then write it to "us"
FStream = New FileStream("C:\OUTFILE.XLSX", FileMode.Create)
FStream.Write(Bytes, 0, CInt(MStream.Length))
FStream.Flush()
End Sub
This creates a file in the correct location, it has the exact same length as it did before, but opening it in Excel causes an error about the file format being invalid.
Can anyone see any obvious problems in that code? Perhaps I am writing the bytes backwards? Is this possibly a text encoding problem? 32/64 problem?
p.s. I tried using CopyTo, but that doesn't seem to work in VB?
It requires guessing what ToXLSL() does but the behavior gives a strong hint: the MemoryStream's Position is located at the end of the stream. So the Read() call doesn't actually read anything. Verify by checking its return value.
Just get rid of Bytes() entirely, it is very wasteful to duplicate the data like this. You don't need it, the MemoryStream already gives you access to the data:
Using FStream = New FileStream("C:\OUTFILE.XLSX", FileMode.Create)
FStream.Write(MStream.GetBuffer(), 0, CInt(MStream.Length))
End Using
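Do note that the Using statement is not optional: without it the FileStream is never closed, so the file stays locked until the object happens to be finalized. Also note that you normally cannot write to the root of C:\ without elevated permissions.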
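The asker mentioned that CopyTo did not seem to work. Stream.CopyTo exists from .NET Framework 4 onward and copies from the current Position, so it needs the same rewind as Read. A minimal sketch, with SaveStream as a hypothetical helper name:

```vbnet
Imports System.IO

Module StreamCopy
    ' Hypothetical helper: writes the full contents of a MemoryStream to a file.
    Sub SaveStream(source As MemoryStream, filePath As String)
        Using target As New FileStream(filePath, FileMode.Create)
            ' After writing, Position sits at the end; rewind before copying.
            source.Position = 0
            source.CopyTo(target)
        End Using
    End Sub
End Module
```

MemoryStream.WriteTo(target) is an alternative that writes the whole stream regardless of Position. GetBuffer, as used above, returns the underlying buffer, which can be longer than the data, which is why the count passed to Write has to be Length.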

Stream Reader and Writer Conflict

I am making a class that is to help with saving some strings to a local text file (I want to append them to that file and not overwrite so that it is a log file). When I write with the streamwriter to find the end of the previous text, I get an error "the file is not available as it is being used by another process". I looked into this problem on MSDN and I got very little help. I tried to eliminate some variables so I removed the streamreader to check was that the problem and it was. When I tried to write to the file then it worked and I got no error so this made me come to the conclusion that the problem arose in the streamreader. But I could not figure out why?
Here is the code:
Public Sub SaveFile(ByVal Task As String, ByVal Difficulty As Integer, ByVal Time_Taken As String)
Dim SW As String = "C:/Program Files/Business Elements/Dashboard System Files/UserWorkEthic.txt"
Dim i As Integer
Dim aryText(3) As String
aryText(0) = Task
aryText(1) = Difficulty
aryText(2) = Time_Taken
Dim objWriter As System.IO.StreamWriter = New System.IO.StreamWriter(SW, True)
Dim reader As System.IO.StreamReader = New System.IO.StreamReader(SW, True)
reader.ReadToEnd()
reader.EndOfStream.ToString()
For i = 0 To 3
objWriter.WriteLine(aryText(reader.EndOfStream + i))
Next
reader.Close()
objWriter.Close()
End Sub
As Joel has commented on the previous answer, it is possible to change the type of locking.
Otherwise, building on what Neil has suggested: if you reopen the file for writing, it is difficult not to lose the information already in it.
I would suggest you rename the original file to a temporary name, "UserWorkEthicTEMP.txt" for example, and create a new text file with the original name. Then read a line from the temporary file and write it to the new file, repeating until the end, before adding your new data onto the end. Finally, delete the temporary file and you will have the new file with the new details. If you hit an error, the temporary file will serve as a backup of the original. Some sample code below:
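' Rename the original file first, then open reader on the temporary file
' and objWriter on a new file with the original name.
Dim line As String = reader.ReadLine()
Do Until line Is Nothing
objWriter.WriteLine(line)
line = reader.ReadLine()
Loop
' Now add the new values on the end and delete the temporary file.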
You are trying to read and write to the same file at the same time, and this is causing lock contention. Store the contents of the file in a variable, close the reader, and then write the contents back out to the file together with your new data.
Pseudocode:
Reader.Open file
String content = Reader.ReadToEnd()
Reader.Close
Writer.Open file
Loop
Writer.Write newContent
Writer.Close
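For a pure append-only log, no reader is needed at all: a StreamWriter opened with append set to True (as the question already does) starts writing at the end of the existing file. A minimal sketch, with AppendEntry and its parameter names chosen for illustration:

```vbnet
Imports System.IO

Module WorkLog
    ' Appends one log entry (three lines) to the end of the file.
    ' The file is created if it does not exist yet.
    Sub AppendEntry(logPath As String, task As String, difficulty As Integer, timeTaken As String)
        ' The second argument (True) opens the file in append mode.
        Using writer As New StreamWriter(logPath, True)
            writer.WriteLine(task)
            writer.WriteLine(difficulty)
            writer.WriteLine(timeTaken)
        End Using
    End Sub
End Module
```

Because only one stream is open on the file at a time, the "being used by another process" error from the question should not arise from this code itself.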