How to recombine document pages stored as separate Base64 strings in FileNet using VB.Net? - vb.net

I have a document that is stored on FileNet. Each page of the document is stored as a separate Base64-encoded string. I need to get all of these pages into a single document again.
What I have tried so far: for each page of the document, I decode the Base64 string into a byte array, concatenate the arrays, and then use the File.WriteAllBytes method to create a single file. The result is a valid TIFF file and I can open it, but only the last page appears. I have checked that the application I am using to open the document can show more than one page: I am using the Windows Photos application, which shows all pages of a multi-page TIFF document.
How can I merge the pages of this document so that each page will appear correctly?
For example, the code below reads the Base64 string for each file and then combines them into a single output file. When I open bytefileout3.tiff however, I can only see the last page that was added into the document.
Dim inputPath As String = "C:\temp\file1.txt"
Dim fileStr As String = File.ReadAllText(inputPath)
Dim bytes As Byte() = Convert.FromBase64String(fileStr)
Dim inputPath2 As String = "C:\temp\file2.txt"
Dim fileStr2 As String = File.ReadAllText(inputPath2)
Dim bytes2 As Byte() = Convert.FromBase64String(fileStr2)
Dim bytes3 As Byte() = New Byte() {}
File.WriteAllBytes("c:\temp\bytefileout.tiff", bytes)
File.WriteAllBytes("c:\temp\bytefileout2.tiff", bytes2)
bytes3 = bytes2.Concat(bytes).ToArray()
File.WriteAllBytes("c:\temp\bytefileout3.tiff", bytes3)
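A likely explanation for the behaviour above: a TIFF reader starts from the header at the beginning of the file and follows a chain of page directories (IFDs) from there, so the bytes of a second, complete TIFF file appended after the first are never reached. One possible way around this, sketched below, is to decode each page as an image and let the GDI+ TIFF encoder write a real multi-page file. This is a minimal sketch, not a tested FileNet solution. It assumes that each Base64 string decodes to a single-page image that System.Drawing can load, and that the code runs on Windows; MergeBase64PagesToTiff and its parameter names are made up for illustration.

```vbnet
Imports System
Imports System.Collections.Generic
Imports System.Drawing
Imports System.Drawing.Imaging
Imports System.IO

Module TiffMerge
    ' Hypothetical helper: each Base64 string is assumed to hold one single-page image.
    Sub MergeBase64PagesToTiff(base64Pages As IEnumerable(Of String), outputPath As String)
        ' Locate the built-in TIFF encoder.
        Dim tiffCodec As ImageCodecInfo = Nothing
        For Each codec As ImageCodecInfo In ImageCodecInfo.GetImageEncoders()
            If codec.FormatID = ImageFormat.Tiff.Guid Then tiffCodec = codec
        Next
        If tiffCodec Is Nothing Then Throw New InvalidOperationException("No TIFF encoder found.")

        Dim saveFlag As System.Drawing.Imaging.Encoder = System.Drawing.Imaging.Encoder.SaveFlag
        Dim firstPage As Image = Nothing
        ' GDI+ needs each source stream to stay open while its Image is in use.
        Dim streams As New List(Of MemoryStream)
        Try
            Using parameters As New EncoderParameters(1)
                For Each page As String In base64Pages
                    Dim pageStream As New MemoryStream(Convert.FromBase64String(page))
                    streams.Add(pageStream)
                    Dim img As Image = Image.FromStream(pageStream)
                    If firstPage Is Nothing Then
                        ' The first page creates the file and marks it as multi-frame.
                        firstPage = img
                        parameters.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.MultiFrame))
                        firstPage.Save(outputPath, tiffCodec, parameters)
                    Else
                        ' Every later page is added as a new frame of the same file.
                        parameters.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.FrameDimensionPage))
                        firstPage.SaveAdd(img, parameters)
                        img.Dispose()
                    End If
                Next
                If firstPage IsNot Nothing Then
                    ' Flush finalises the multi-page file.
                    parameters.Param(0) = New EncoderParameter(saveFlag, CLng(EncoderValue.Flush))
                    firstPage.SaveAdd(parameters)
                End If
            End Using
        Finally
            If firstPage IsNot Nothing Then firstPage.Dispose()
            For Each s As MemoryStream In streams
                s.Dispose()
            Next
        End Try
    End Sub
End Module
```

With the files from the question, a call could look like MergeBase64PagesToTiff({File.ReadAllText("C:\temp\file1.txt"), File.ReadAllText("C:\temp\file2.txt")}, "c:\temp\bytefileout3.tiff"). Note that the pages are re-encoded with the encoder's default compression rather than copied byte for byte.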

Related

VB.NET: Modifying non-text file as text without ruining it

I need my application to find and modify a text string in a .swp file (generated by VBA for SOLIDWORKS). If I open said file as text in Notepad++, most of the text looks like this (this is an excerpt):
Meaning there is readable text, plus symbols that appear as NUL, BEL, ETX and so on, depending on the selected encoding. If I make my changes in Notepad++ (finding "1.38" and changing it to "1.39"), there are no issues: the file can be opened in SOLIDWORKS and is still recognized as valid. After all, I don't need to modify these non-readable bits. However, if I make the same modification in my VB.NET application,
Dim filePath As String = "D:\OneDrive\Desktop\launcher macro.swp"
Dim fileContents As String = My.Computer.FileSystem.ReadAllText(filePath, Encoding.UTF8).Replace("1.38", "1.39")
My.Computer.FileSystem.WriteAllText(filePath, fileContents, Encoding.UTF8)
then the file gets corrupted, and is no longer recognized by SOLIDWORKS. I suspect this is because ReadAllText and WriteAllText cannot handle whatever data is in these non-readable bits.
I tried many different encodings, but it seems to make no difference. I am not sure how Notepad++ does it, but I can't seem to get the same result in my VB.NET application.
Can someone advise?
Thanks to @jmcilhinney, this is a solution that worked for me - reading the file as bytes, converting the bytes to a string, doing the replacement, and then saving, using the ANSI encoding throughout:
Dim file_name As String = "D:\OneDrive\Desktop\launcher macro.swp"
Dim fs As New FileStream(file_name, FileMode.Open)
Dim binary_reader As New BinaryReader(fs)
fs.Position = 0
Dim bytes() As Byte = binary_reader.ReadBytes(binary_reader.BaseStream.Length)
Dim fileContents As String = System.Text.Encoding.Default.GetString(bytes)
fileContents = fileContents.Replace("1.38", "1.39")
binary_reader.Close()
fs.Dispose()
System.IO.File.WriteAllText(file_name, fileContents, Encoding.Default)
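One caveat with the approach above: Encoding.Default is the system ANSI code page on .NET Framework, but on .NET Core and .NET 5+ it is UTF-8, which would bring the original corruption back. A variation that does not depend on the runtime is to use ISO-8859-1 (code page 28591), which maps every byte value 0-255 to exactly one character and back, so the non-text bytes survive the round trip. This is only a sketch; ReplaceInBinaryFile is a made-up name, and it assumes the text to be replaced is stored in the file as single-byte ASCII.

```vbnet
Imports System.IO
Imports System.Text

Module BinarySafeReplace
    ' Hypothetical helper: replaces ASCII text inside a binary file
    ' without altering any of the other bytes.
    Sub ReplaceInBinaryFile(filePath As String, oldText As String, newText As String)
        ' ISO-8859-1 maps each byte to one character and back again.
        Dim latin1 As Encoding = Encoding.GetEncoding(28591)

        Dim bytes As Byte() = File.ReadAllBytes(filePath)
        Dim contents As String = latin1.GetString(bytes)

        contents = contents.Replace(oldText, newText)

        ' Write raw bytes back; no byte order mark is added.
        File.WriteAllBytes(filePath, latin1.GetBytes(contents))
    End Sub
End Module
```

Called as ReplaceInBinaryFile("D:\OneDrive\Desktop\launcher macro.swp", "1.38", "1.39"). Keeping the replacement the same length as the original, as here, is probably the safer choice, because binary formats often store offsets or lengths that a longer or shorter string would invalidate.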

Extract text from a PDF email attachment without saving the attachment to a pdf file first

I'm using PDF Extractor (from here) to get the text from PDF attachments in emails.
It seems to me that the only way I can extract the text is to save the PDF to a file first, and then use the code below
Private Function ReadPdfToStringList(tempfilename As String) As List(Of String)
Dim extractedText As String
Using pdfFile As FileStream = File.OpenRead(tempfilename)
Using extractor As Extractor = New Extractor()
extractedText = extractor.ExtractToString(pdfFile)
End Using
End Using
DeleteTempFile()
Return New List(Of String)(extractedText.Split(Chr(13)))
End Function
to extract a list of Strings from the PDF file.
However, I can't seem to extract text from the attachment directly. The 'extractor' doesn't seem to be able to handle any source other than a file on disk.
Is there any way of tricking the 'extractor' into reading the file from memory, perhaps by creating an in-memory file stream?
I've tried using a MemoryStream like this:
Private Function ReadPdfMemStrmToStringList(memstream As MemoryStream) As List(Of String)
Dim extractedText As String
Using extractor As Extractor = New Extractor()
extractedText = extractor.ExtractToString(memstream)
End Using
Return New List(Of String)(extractedText.Split(Chr(13)))
End Function
but because the extractor assumes the source is a file on disk, it returns an error saying that it can't find a temporary file.
To be honest I've spent quite a bit of time trying to understand memory streams and they don't seem to fit the bill.
UPDATE
Here also is the code that I'm using to save the attachment to the MemoryStream.
Private Sub SaveAttachmentToMemStrm(msg As MimeMessage)
Dim memstrm As New MemoryStream
For Each attachment As MimePart In msg.Attachments
If attachment.FileName.Contains("booking") Then
attachment.WriteTo(memstrm)
End If
Next
'this line only adds the memory stream to a List (of MemoryStream)
attachments.Add(memstrm)
End Sub
Many apologies if I've missed something obvious.
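Two things in the code above may be worth checking before blaming the extractor. Assuming the MimeMessage and MimePart types come from MimeKit, attachment.WriteTo(memstrm) writes the whole MIME entity (headers plus the still-encoded body), not the decoded PDF bytes, and the stream is left positioned at its end. The sketch below decodes just the content and rewinds the stream. DecodeAttachment is a made-up helper name, and whether the Extractor then accepts a MemoryStream is not confirmed here.

```vbnet
Imports System.IO
Imports MimeKit

Module AttachmentStreams
    ' Hypothetical helper: returns the decoded bytes of one attachment
    ' as a stream positioned at the start.
    Function DecodeAttachment(part As MimePart) As MemoryStream
        Dim ms As New MemoryStream()
        ' Content.DecodeTo undoes the transfer encoding (for example Base64)
        ' and writes only the file content, without MIME headers.
        part.Content.DecodeTo(ms)
        ' Rewind so that the next reader starts at the first byte.
        ms.Position = 0
        Return ms
    End Function
End Module
```

Creating one MemoryStream per attachment, rather than one shared stream for the whole message, also keeps each PDF separate when a message has more than one matching attachment.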

Saving a base 64 encoded image in MongoDB GridFS

I have a web service that takes the content of a canvas tag and saves it into a MongoDB GridFS store.
The code below works, however it requires saving the image to disk before sending it to MongoDB.
Using postBody As Stream = Request.InputStream
' Get the body of the HTTP POST (the data:image/png)
postBody.Seek(0, SeekOrigin.Begin)
Dim imageData As String = New StreamReader(postBody).ReadToEnd
Dim base64Data = Regex.Match(imageData, "data:image/(?<type>.+?),(?<data>.+)").Groups("data").Value
Dim data As Byte() = Convert.FromBase64String(base64Data)
Using stream = New MemoryStream(data, 0, data.Length)
Dim img As System.Drawing.Image = System.Drawing.Image.FromStream(stream)
Dim directory = Server.MapPath("~/App_Data/temp/")
Dim file = String.Concat(directory, id, ".png")
img.Save(file, System.Drawing.Imaging.ImageFormat.Png)
Using fs = New FileStream(file, FileMode.Open)
db.GridFS.Upload(fs, id & ".png")
End Using
End Using
End Using
Is there a better way, perhaps without the need to persist it to disk before uploading to MongoDB?
As suggested in the comments, just use the Stream as an argument to Upload instead of writing out to file.
And also note that you do not have to convert to base64 in order to send the file via GridFS (or a plain mongo field for that matter). The input can be binary, unless of course you always want your data base64 encoded for your convenience.
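The suggestion above could look roughly like the following: decode the data URI straight into a MemoryStream and hand that stream to the existing Upload call, without touching the disk. This is a sketch only. DataUriToStream is a made-up name, the regular expression is the one from the question, and it assumes the canvas data is already PNG so that re-saving through System.Drawing.Image is not needed.

```vbnet
Imports System
Imports System.IO
Imports System.Text.RegularExpressions

Module CanvasUpload
    ' Hypothetical helper: turns "data:image/png;base64,...." into a readable stream.
    Function DataUriToStream(imageData As String) As MemoryStream
        Dim m As Match = Regex.Match(imageData, "data:image/(?<type>.+?),(?<data>.+)")
        If Not m.Success Then Throw New FormatException("Not a data:image URI.")

        ' Decode the Base64 payload into raw image bytes.
        Dim bytes As Byte() = Convert.FromBase64String(m.Groups("data").Value)

        ' A MemoryStream built over an array starts at position 0.
        Return New MemoryStream(bytes)
    End Function
End Module
```

In the question's code this would replace everything inside the inner Using block: wrap the result in a Using statement and pass it to db.GridFS.Upload(stream, id & ".png") in place of the FileStream.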

how to load a file from folder to memory stream buffer

I am working on a VB.NET WinForms application. My task is to display the file names from a folder in a GridView control. When the user clicks the Process button in my UI, each file listed in the GridView has to be loaded into a memory stream buffer, one after another, have the titles appended to its content, and be saved to the hard drive with _ed as a suffix to the file name.
I am a very basic programmer. I made the following attempt and succeeded in displaying the file names in the GridView, but I have no idea how to do the later part. Any suggestions, please?
'Displaying files from a folder onto a gridview
Dim inqueuePath As String = "C:\Users\Desktop\INQUEUE"
Dim fileInfo() As String
Dim rowint As Integer = 0
Dim name As String
Dim directoryInfo As New System.IO.DirectoryInfo(inqueuePath)
fileInfo = System.IO.Directory.GetFiles(inqueuePath)
With Gridview1
.Columns.Add("Column 0", "FileName")
.AutoResizeColumns()
End With
For Each name In fileInfo
Gridview1.Rows.Add()
Dim filename As String = System.IO.Path.GetFileName(name)
Gridview1.Item(0, rowint).Value = filename
rowint = rowint + 1
Next
Thank you very much for spending your valuable time to read this post.
Reading a file into a MemoryStream is quite easy. Have a look at the following example; you should be able to adapt it to suit your needs:
' Read the whole file into a byte array.
Dim bData As Byte() = System.IO.File.ReadAllBytes(Path)
' Wrap the bytes in a MemoryStream. It starts at position 0, ready to be read,
' so there is no need to write bData into it a second time.
Dim ms As MemoryStream = New MemoryStream(bData)
Then just use the MemoryStream ms as you please. Just to clarify: Path holds the full path and file name of the file you want to read into your MemoryStream.
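For the rest of the task in the question (add a title, then save with an _ed suffix), one possible shape is sketched below. The question does not say what "append the titles" means exactly, so this assumes plain text files and a single title line written in front of the original content. ProcessFolder and its parameters are made-up names.

```vbnet
Imports System
Imports System.IO
Imports System.Text

Module QueueProcessor
    ' Hypothetical helper: writes "<name>_ed<ext>" next to every file in the folder.
    Sub ProcessFolder(folderPath As String, title As String)
        ' GetFiles returns an array up front, so files created below are not revisited.
        For Each sourcePath As String In Directory.GetFiles(folderPath)
            ' A MemoryStream created without a buffer can grow as data is written.
            Using ms As New MemoryStream()
                Dim titleBytes As Byte() = Encoding.UTF8.GetBytes(title & Environment.NewLine)
                ms.Write(titleBytes, 0, titleBytes.Length)

                Dim content As Byte() = File.ReadAllBytes(sourcePath)
                ms.Write(content, 0, content.Length)

                Dim outputName As String = Path.GetFileNameWithoutExtension(sourcePath) &
                                           "_ed" & Path.GetExtension(sourcePath)
                File.WriteAllBytes(Path.Combine(folderPath, outputName), ms.ToArray())
            End Using
        Next
    End Sub
End Module
```

Running it twice on the same folder would also process the _ed files from the first run, so in practice the output may be better off in a separate folder.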

How to replace CRLF with a space?

How can I strip undesirable characters from a collection of data?
I am working with existing VB.NET code for a Windows Application that uses StreamWriter and Serializer to output an XML document of transaction data. Code below.
Private TransactionFile As ProjectSchema.TransactionFile
Dim Serializer As New Xml.Serialization.XmlSerializer(GetType (ProjectSchema.TransactionFile))
Dim Writer As TextWriter
Dim FilePath As String
Writer = New StreamWriter(FilePath)
Serializer.Serialize(Writer, TransactionFile)
Writer.Close()
The XML document is being uploaded to another application that does not accept "crlf".
The "TransactionFile" is a collection of data in a Class named ProjectSchema.TransactionFile. It contains various data types.
There are 5 functions that create nodes, which together build the master transaction file named TransactionFile.
I need to find CRLF characters in the collection of data and replace the CRLF characters with a space.
I am able to replace illegal characters at the field level with:
.Name = Regex.Replace((Mid(CustomerName.Name, 1, 30)), "[^A-Za-z0-9\-/]", " ")
But I need to scrub the entire collection of data.
If I try:
TransactionFile = Regex.Replace(TransactionFile, "[^A-Za-z0-9\-/]", " ")
I get a "Conversion from type 'Transaction' to type 'String' is not valid" message, because TransactionFile cannot be converted to a String.
Bottom Line = How can I replace CRLF with a space when it shows up in TransactionFile data?
Don't do it this way. Create the writer with XmlWriter.Create(), which has an overload that accepts an XmlWriterSettings object. That object has many options for formatting the generated XML, such as NewLineChars, which lets you set the characters used for a line end.
As Hans says, mess around with the XmlWriterSettings.
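A sketch of what the XmlWriterSettings route might look like follows. SampleTransaction is a made-up stand-in for ProjectSchema.TransactionFile, and WriteWithoutCrLf is a hypothetical name. With NewLineHandling.Replace, line breaks in element text should be written as the NewLineChars value, here a single space; line breaks inside attribute values are handled differently by the writer (they are entitized), so check the output if the schema uses attributes.

```vbnet
Imports System.Xml
Imports System.Xml.Serialization

' Hypothetical stand-in for ProjectSchema.TransactionFile.
Public Class SampleTransaction
    Public Property Name As String
End Class

Module XmlNewLineDemo
    Sub WriteWithoutCrLf(data As SampleTransaction, filePath As String)
        Dim settings As New XmlWriterSettings()
        ' No indentation, so no line breaks are inserted between elements.
        settings.Indent = False
        ' Replace line breaks found in text content with a single space.
        settings.NewLineChars = " "
        settings.NewLineHandling = NewLineHandling.Replace

        Dim serializer As New XmlSerializer(GetType(SampleTransaction))
        Using writer As XmlWriter = XmlWriter.Create(filePath, settings)
            serializer.Serialize(writer, data)
        End Using
    End Sub
End Module
```

In the original code this would mean replacing the StreamWriter with the XmlWriter and leaving the Serialize call otherwise unchanged.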
The next best choice is to write the file, then read the file into an xml object and process it element by element. This would let you remove crlf from within individual elements, but leave the ones between elements alone, for example.
Another possibility - rather than write directly to the file, you can make an intermediate string, and do a replace in that:
Dim ms As New MemoryStream
Serializer.Serialize(ms, TransactionFile)
ms.Flush()
ms.Position = 0
Dim sr As New StreamReader(ms)
Dim xmlString As String = sr.ReadToEnd
sr.Close() ' also closes underlying memorystream
Then you could do your regex replace on the xmlString before writing it to a file. This should get all the crlf pairs, both within elements and between.