PDFSharp Export JPG - ASP.NET - vb.net

I am using an ajaxfileupload control to upload a pdf file to the server. On the server side, I'd like to convert the pdf to jpg. Using the PDFsharp Sample: Export Images as a guide, I've got the following:
Imports System
Imports System.Drawing
Imports System.Drawing.Imaging
Imports PdfSharp.Pdf
Imports System.IO
Imports PdfSharp.Pdf.IO
Imports PdfSharp.Pdf.Advanced
Namespace Tools
Public Module ConvertImage
Public Sub pdf2JPG(pdfFile As String, jpgFile As String)
pdfFile = System.Web.HttpContext.Current.Request.PhysicalApplicationPath & "upload\" & pdfFile
Dim document As PdfSharp.Pdf.PdfDocument = PdfReader.Open(pdfFile)
Dim imageCount As Integer = 0
' Iterate pages
For Each page As PdfPage In document.Pages
' Get resources dictionary
Dim resources As PdfDictionary = page.Elements.GetDictionary("/Resources")
If resources IsNot Nothing Then
' Get external objects dictionary
Dim xObjects As PdfDictionary = resources.Elements.GetDictionary("/XObject")
If xObjects IsNot Nothing Then
Dim items As ICollection(Of PdfItem) = xObjects.Elements.Values
' Iterate references to external objects
For Each item As PdfItem In items
Dim reference As PdfReference = TryCast(item, PdfReference)
If reference IsNot Nothing Then
Dim xObject As PdfDictionary = TryCast(reference.Value, PdfDictionary)
' Is external object an image?
If xObject IsNot Nothing AndAlso xObject.Elements.GetString("/Subtype") = "/Image" Then
ExportImage(xObject, imageCount)
End If
End If
Next
End If
End If
Next
End Sub
Private Sub ExportImage(image As PdfDictionary, ByRef count As Integer)
Dim filter As String = image.Elements.GetName("/Filter")
Select Case filter
Case "/DCTDecode"
ExportJpegImage(image, count)
Exit Select
Case "/FlateDecode"
ExportAsPngImage(image, count)
Exit Select
End Select
End Sub
Private Sub ExportJpegImage(image As PdfDictionary, ByRef count As Integer)
' Fortunately JPEG has native support in PDF and exporting an image is just writing the stream to a file.
Dim stream As Byte() = image.Stream.Value
Dim fs As New FileStream([String].Format("Image{0}.jpeg", System.Math.Max(System.Threading.Interlocked.Increment(count), count - 1)), FileMode.Create, FileAccess.Write)
Dim bw As New BinaryWriter(fs)
bw.Write(stream)
bw.Close()
End Sub
Private Sub ExportAsPngImage(image As PdfDictionary, ByRef count As Integer)
Dim width As Integer = image.Elements.GetInteger(PdfImage.Keys.Width)
Dim height As Integer = image.Elements.GetInteger(PdfImage.Keys.Height)
Dim bitsPerComponent As Integer = image.Elements.GetInteger(PdfImage.Keys.BitsPerComponent)
' TODO: You can put the code here that converts vom PDF internal image format to a Windows bitmap
' and use GDI+ to save it in PNG format.
' It is the work of a day or two for the most important formats. Take a look at the file
' PdfSharp.Pdf.Advanced/PdfImage.cs to see how we create the PDF image formats.
' We don't need that feature at the moment and therefore will not implement it.
' If you write the code for exporting images I would be pleased to publish it in a future release
' of PDFsharp.
End Sub
End Module
End Namespace
As I debug, it blows up on Dim filter As String = image.Elements.GetName("/Filter") in ExportImage. The message is:
Unhandled exception at line 336, column 21 in ~:46138/ScriptResource.axd?d=LGq0ri4wlMGBKd-1vxLjtxNH_pd26HaruaEG_1eWx-epwPmhNKVpO8IpfHoIHzVj2Arxn5804quRprX3HtHb0OmkZFRocFIG-7a-SJYT_EwYUd--x9AHktpraSBgoZk4VJ1RMtFNwl1mULDLid5o5U9iBcuDi4EQpbpswgBn_oI1&t=ffffffffda74082d
0x800a139e - JavaScript runtime error: error raising upload complete event and start new upload
Any thoughts on what the issue might be? It seems an issue with the ajaxfileupload control, but I don't understand why it would be barking here. It's neither here nor there, but I know I'm not using jpgFile yet.

PDFsharp cannot create JPEG Images from PDF pages:
http://pdfsharp.net/wiki/PDFsharpFAQ.ashx#Can_PDFsharp_show_PDF_files_Print_PDF_files_Create_images_from_PDF_files_3
The sample you refer to can extract JPEG Images that are included in PDF files. That's all. The sample does not cover all possible cases.
Long story short: the code you are showing seems unrelated to the error message. And it seems unrelated to your intended goal.

Related

Winnovative Hides Text of TextElements on PDFs

There is a group somewhere in our organization that scans documents and converts them to PDFs. They then associate those PDFs with an "event" record and store them in a database. On demand, my application -- which uses Winnovative HTML to PDF v9.0.0.0 -- has to retrieve the PDFs associated with an event, place a header on the first page of each, and store them on the file system. This header is a TextElement.
On some PDFs, the header displays beautifully. On others, the header does not appear. However, when viewing the PDF, the header can be "highlighted" with the cursor and its text successfully copied, so the header is indeed present and properly positioned. (See the green arrow in the inserted image.)
I have identified two PDFs that were scanned by the same person thirty minutes apart and associated with the same event in the database. On one, the header is displayed; on the other, it is not. To investigate, I have set the BackColor of the TextElement to Crimson. The Text appears and doesn't appear as before, but the TextElement always appears bright red.
The properties of the two Document and PDFPage objects are identical, including the TransparencyEnabled property. This phenomenon is present in PDFs of all sorts of documents scanned by various people over time. And it's not just this header TextElement, but TextElements everywhere on the PDF (e.g. Page X of Y, watermarks). On a given PDF, if the Text of one is visible, the Text of all is visible, and vice versa.
I can find no pattern or explanation. What could be causing some PDFs to "hide" the Text (and only the Text) of all TextElements that I put on them while others don't?
Private Sub AddTitleToFirstPage(ByRef pdf As Document)
Dim headerSystemFont As New Font("Arial", 10)
Dim headerFont As PdfFont = pdf.Fonts.Add(headerSystemFont)
Dim headerTextElement As New TextElement(65, 20, "My Page Title", headerFont)
headerTextElement.TextAlign = HorizontalTextAlign.Center
headerTextElement.ForeColor = Color.DarkBlue
headerTextElement.BackColor = Color.Crimson
pdf.Pages(0).AddElement(headerTextElement)
End Sub
Friend Function UpdatePdfDoc(pdfBytes As Byte()) As Byte()
Dim bytes As Byte()
Using docStream As New MemoryStream(pdfBytes, 0, pdfBytes.Length)
Dim returnDoc As Document = New Document(docStream)
returnDoc.LicenseKey = WinnovativeLicenceKey
AddTitleToFirstPage(returnDoc)
bytes = returnDoc.Save()
docStream.Close()
End Using
Return bytes
End Function
Friend Function GetEventObjectPdfSource(scannedDocIds As List(Of String)) As Object
Dim scannedDocObjectPdfSourceList As New List(Of Byte())()
For Each scannedDocId As String In scannedDocIds
Dim scannedDocObjectPdfSource As Byte() = GetScannedDocBlobById(scannedDocId)
scannedDocObjectPdfSource = UpdatePdfDoc(scannedDocObjectPdfSource)
scannedDocObjectPdfSourceList.Add(scannedDocObjectPdfSource)
Next
Return scannedDocObjectPdfSourceList
End Function
Friend Function GetEventObjectPdf(eventId As String) As String
Dim pdfFileName As String = GetPDFFileName(eventId)
Dim scannedDocIds As List(Of String) = GetScannedDocumentsForEvent(eventId)
Dim objectPdfSourceList As List(Of Byte()) = CType(GetEventObjectPdfSource1(scannedDocIds), List(Of Byte()))
For Each objectPdfSource As Byte() In objectPdfSourceList
Using docStream As New MemoryStream(objectPdfSource, 0, objectPdfSource.Length)
Dim masterDoc As New Document(docStream)
masterDoc.LicenseKey = WinnovativeLicenceKey
Do While masterDoc.Bookmarks.Count > 0
masterDoc.Bookmarks.Remove(0)
Loop
Try
masterDoc.AutoCloseAppendedDocs = True
masterDoc.Save(pdfFileName)
Catch ex As Threading.ThreadAbortException
Threading.Thread.ResetAbort()
Finally
masterDoc.DetachStream()
masterDoc.Close()
End Try
docStream.Close()
End Using
Next
Return pdfFileName
End Function
Please forgive the clunky code. I didn't write it. I just inherited it.

Stream editing OpenXml powerpoint presentation slides

I am trying to edit the XML stream of Powerpoint slides using OpenXml and Streamreader/Streamwriter.
For a word document, it's easy:
Imports System.IO
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Presentation
Imports DocumentFormat.OpenXml.Wordprocessing
'
'
'
' Open a word document
CurrentOpenDocument = WordprocessingDocument.Open(TheWordFileName, True)
' for a word document, this works
Using (CurrentOpenDocument)
' read the xml stream
Dim sr As StreamReader = New StreamReader(CurrentOpenDocument.MainDocumentPart.GetStream)
docText = sr.ReadToEnd
' do the substitutions here
docText = DoSubstitutions(docText)
' write the modified xml stream
Dim sw As StreamWriter = New StreamWriter(CurrentOpenDocument.MainDocumentPart.GetStream(FileMode.Create))
Using (sw)
sw.Write(docText)
End Using
End Using
But for Powerpoint (presentations), I find that the inserted modified XML stream for the slideparts do not get saved:
Imports System.IO
Imports DocumentFormat.OpenXml
Imports DocumentFormat.OpenXml.Packaging
Imports DocumentFormat.OpenXml.Presentation
Imports DocumentFormat.OpenXml.Wordprocessing
'
'
' Open a powerpoint presentation
CurrentOpenPresentation = PresentationDocument.Open(ThePowerpointFileName, True)
' for a powerpoint presentation, this doesn't work
Using (CurrentOpenPresentation)
' Get the presentation part of the presentation document.
Dim pPart As PresentationPart = CurrentOpenPresentation.PresentationPart
' Verify that the presentation part and presentation exist.
If pPart IsNot Nothing AndAlso pPart.Presentation IsNot Nothing Then
' Get the Presentation object from the presentation part.
Dim pres As Presentation = pPart.Presentation
' Verify that the slide ID list exists.
If pres.SlideIdList IsNot Nothing Then
' Get the collection of slide IDs from the slide ID list.
Dim slideIds = pres.SlideIdList.ChildElements
' loop through each slide
For Each sID In slideIds
Dim slidePartRelationshipId As String = (TryCast(sID, SlideId)).RelationshipId
Dim TheslidePart As SlidePart = CType(pPart.GetPartById(slidePartRelationshipId), SlidePart)
' If the slide exists...
If TheslidePart.Slide IsNot Nothing Then
Dim sr As StreamReader = New StreamReader(TheslidePart.GetStream)
Using (sr)
docText = sr.ReadToEnd
End Using
docText = DoSubstitutions(docText)
Dim sw As StreamWriter = New StreamWriter(TheslidePart.GetStream(FileMode.Create))
Using (sw)
sw.Write(docText)
End Using
End If
Next
End If
End Using
I've also tried iterating through the in-memory slideparts to check the XML stream, and they ARE changed.
It's just that this never gets saved back to the file in the dispose (end using) and no error exceptions are raised.
Has anyone else experienced this?
After about a week of messing around with this, I came upon the answer. It is to reference the slideparts from the collection instead of referencing via the relationship Id's, although I don't know why this works and the initial approach doesn't:
' This DOES work
Using (CurrentOpenPresentation)
' Get the presentation part of the presentation document.
Dim pPart As PresentationPart = CurrentOpenPresentation.PresentationPart
' Verify that the presentation part and presentation exist.
If pPart IsNot Nothing AndAlso pPart.Presentation IsNot Nothing Then
' reference each slide in turn and do the Substitution
For Each s In pPart.SlideParts
Dim sr As StreamReader = New StreamReader(s.GetStream)
Using (sr)
docText = sr.ReadToEnd
End Using
docText = DoSubstitutions(docText)
Dim sw As StreamWriter = New StreamWriter(s.GetStream(FileMode.Create))
Using (sw)
sw.Write(docText)
End Using
Next
End If
End Using

Detect if any file in use by other process of a directory in VB

I am trying to get my vb.net application to look inside a folder and then let me know whether any file older is in use by any application. If in use it will show a message box. I am coding in VB.NET 2008, Express Edition. ...Would anybody know how I do that? Thanks
You can extend the suggested solution with enumerating files in a directory.
Imports System.IO
Imports System.Runtime.InteropServices
Module Module1
Sub Main()
' Here you specify the given directory
Dim rootFolder As DirectoryInfo = New DirectoryInfo("C:\SomeDir")
' Then you enumerate all the files within this directory and its subdirectory
' See System.IO.SearchOption enum for more info
For Each file As FileInfo In rootFolder.EnumerateFiles("*.*", SearchOption.AllDirectories)
' Here you can call the method from the solution linked in Sachin's comment
IsFileOpen(file)
Next
End Sub
' Jeremy Thompson's code from here
Private Sub IsFileOpen(ByVal file As FileInfo)
Dim stream As FileStream = Nothing
Try
stream = file.Open(FileMode.Open, FileAccess.ReadWrite, FileShare.None)
stream.Close()
Catch ex As Exception
If TypeOf ex Is IOException AndAlso IsFileLocked(ex) Then
' do something here, either close the file if you have a handle, show a msgbox, retry or as a last resort terminate the process - which could cause corruption and lose data
End If
End Try
End Sub
Private Function IsFileLocked(exception As Exception) As Boolean
Dim errorCode As Integer = Marshal.GetHRForException(exception) And ((1 << 16) - 1)
Return errorCode = 32 OrElse errorCode = 33
End Function
End Module

Missing deselected display items at the beginning of the list box (iTextSharp)

I’m getting unwanted results in Adobe Reader DC when generating or regenerating a multi-selection list box with iTextSharp in an Acroform PDF document.
Problem: The PDF form is missing deselected display items at the beginning of the list box when viewing the modified PDF in Adobe Reader DC. For example: “One“,“Two“,“Three“,“Four“,“Five“ are list items; and “Two“ and “Four“ are selected; then the previous items such as “One” are missing the top of the list box. And the first item displayed in the list box starts with the first selection, in this case “Two”. (See Adobe Reader DC Screenshot)
FYI: Using Adobe Reader DC, when I select different field selections in the list box, and then click outside the list box, the list box field reverts back to normal appearance with all the items shown. I can’t reproduce this behavior when opening the modified PDF in Adobe Acrobat Professional 8 and all the field items are visible and correctly selected. This missing list items behavior can also be reproduced in GhostScript when converting PDF to BMP or PNG.
Please answer my question: Can you please provide me with a resolution to this issue if this is an iTextSharp problem or if my syntax is incorrect. Would you also please let me know if this behavior can reproduced using your Adobe Reader DC?
Thank you for your support!
Modified Acroform PDF Document with issue:
http://www.nk-inc.com/listbox-error.pdf
Adobe Reader DC Screenshot:
(source: nk-inc.com)
ADDITIONAL INFORMATION:
iTextSharp.dll Version: 5.5.6.0
Adobe Reader DC Version: 2015.008.20082
Adobe Acrobat Pro Version: 8.x
Form Type: Acroform PDF
VB.NET CODE (v3.5 – Windows Application):
Imports iTextSharp.text.pdf
Imports iTextSharp.text
Imports System.IO
Public Class listboxTest
Private Sub RunTest()
Dim cList As New listboxTest()
Dim fn As String = Application.StartupPath.ToString.TrimEnd("\") & "\listbox-error.pdf"
Dim b() As Byte = cList.addListBox(System.IO.File.ReadAllBytes(fn), New iTextSharp.text.Rectangle(231.67, 108.0, 395.67, 197.0), "ListBox1", "ListBox1", 1)
File.WriteAllBytes(fn, b)
Process.Start(fn)
End Sub
Public Function addListBox(ByVal pdfBytes() As Byte, ByVal newRect As Rectangle, ByVal newFldName As String, ByVal oldfldname As String, ByVal pg As Integer) As Byte()
Dim pdfReaderDoc As New PdfReader(pdfBytes)
Dim m As New System.IO.MemoryStream
Dim b() As Byte = Nothing
Try
With New PdfStamper(pdfReaderDoc, m)
Dim txtField As iTextSharp.text.pdf.TextField
txtField = New iTextSharp.text.pdf.TextField(.Writer, newRect, newFldName)
txtField.TextColor = BaseColor.BLACK
txtField.BackgroundColor = BaseColor.WHITE
txtField.BorderColor = BaseColor.BLACK
txtField.FieldName = newFldName 'ListBox1
txtField.Alignment = 0 'LEFT
txtField.BorderStyle = 0 'SOLID
txtField.BorderWidth = 1.0F 'THIN
txtField.Visibility = TextField.VISIBLE
txtField.Rotation = 0 'None
txtField.Box = newRect '231.67, 108.0, 395.67, 197.0
Dim opt As New PdfArray
Dim ListBox_ItemDisplay As New List(Of String)
ListBox_ItemDisplay.Add("One")
ListBox_ItemDisplay.Add("Two")
ListBox_ItemDisplay.Add("Three")
ListBox_ItemDisplay.Add("Four")
ListBox_ItemDisplay.Add("Five")
Dim ListBox_ItemValue As New List(Of String)
ListBox_ItemValue.Add("1X")
ListBox_ItemValue.Add("2X")
ListBox_ItemValue.Add("3X")
ListBox_ItemValue.Add("4X")
ListBox_ItemValue.Add("5X")
txtField.Options += iTextSharp.text.pdf.TextField.MULTISELECT
Dim selIndex As New List(Of Integer)
Dim selValues As New List(Of String)
selIndex.Add(CInt(1)) ' SELECT #1 (index)
selIndex.Add(CInt(3)) ' SELECT #3 (index)
txtField.Choices = ListBox_ItemDisplay.ToArray
txtField.ChoiceExports = ListBox_ItemValue.ToArray
txtField.ChoiceSelections = selIndex
Dim listField As PdfFormField = txtField.GetListField
If Not String.IsNullOrEmpty(oldfldname & "") Then
.AcroFields.RemoveField(oldfldname, pg)
End If
.AddAnnotation(listField, pg)
.Writer.CloseStream = False
.Close()
If m.CanSeek Then
m.Position = 0
End If
b = m.ToArray
m.Close()
m.Dispose()
pdfReaderDoc.Close()
End With
Return b.ToArray
Catch ex As Exception
Err.Clear()
Finally
b = Nothing
End Try
Return Nothing
End Function
End Class
The reason why the visible list starts with the second entry is that iTextSharp starts drawing the list at the first selected entry.
This is an optimization for lists which have more (probably many more) entries than can be displayed in the fixed text box area, so that the displayed entries contain at least one interesting, i.e. selected, one.
Unfortunately this optimization does not consider whether this means leaving some lines empty at the bottom, and in case of lists which fit completely into the text box, there even aren't scroll bars or anything.
But iTextSharp also offers a way to disable this optimization: You can explicitly set the first visible item manually:
txtField.ChoiceSelections = selIndex
txtField.VisibleTopChoice = 0 ' Top visible choice is start of list!
Dim listField As PdfFormField = txtField.GetListField
Adding this middle line makes the generated appearance start at the first list value.

Loading a file into memory stream buffer and creating new file with same content and with different filename

I don't know whether it is simple or not because i am new to programming.
my requirement is : In my vb.net winform application, the filenames of the files present in "D:\Project" willbe displayed in DataGridView1 control. Now I want to load these files one after another into memory stream buffer and add the headers("ID","Name","Class") to the content in the file. Then I want to save these files in "C:\" with "_de" as suufix to the filename i.e.,sample_de.csv.
Can anyone please help me? If you need more clarity i can post it in more clear way
Many Thanks for your help in advance.
Try adapting this example to your situation:
Imports System.Text
Imports System.IO
Module Module1
Sub Main()
' Read input
Dim inputBuffer As Byte() = File.ReadAllBytes(".\input.txt")
' Manipulate the input
Dim outputBuffer As Byte() = DoSomethingWithMyBuffer(inputBuffer)
' Add headers
' There are several ecodings to choose from, make sure you are using
' the appropriate encoder for your file.
Dim outputTextFromBuffer As String = Encoding.UTF8.GetString(outputBuffer)
Dim finalOutputBuilder As StringBuilder = New StringBuilder()
finalOutputBuilder.AppendLine("""ID"",""Name"",""Class""")
finalOutputBuilder.Append(outputTextFromBuffer)
' Write output
File.WriteAllText(".\output.txt", finalOutputBuilder.ToString(), Encoding.UTF8)
End Sub
Private Function DoSomethingWithMyBuffer(inputBuffer As Byte()) As Byte()
'' Do nothing because this is just an example
Return inputBuffer
End Function
End Module