How can I test if a PDF document is PDF/A compliant using iTextSharp?

I have an existing PDF file, and I want to use iTextSharp to test whether it is PDF/A compliant.
I don't want to convert or create a file, just read the existing one and check whether it is PDF/A.
I have not tried anything yet, because I could not find any method or property on iTextSharp's PdfReader class that indicates whether a PDF is PDF/A. For now it would be enough to know how to verify that the document claims to be PDF/A compliant.
Thanks
Antonio

After a long search I tried the following approach, and it seems to work:
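Dim bPDFA As Boolean = False
Dim reader As New iTextSharp.text.pdf.PdfReader(sFilePdf)
Try
    ' XMP metadata stream, or Nothing if the document has none
    Dim yMetadata As Byte() = reader.Metadata
    If yMetadata IsNot Nothing Then
        Dim xmlDoc As New Xml.XmlDocument()
        ' Load from a stream so the XML parser handles the encoding (XMP is usually UTF-8, not ASCII)
        Using ms As New IO.MemoryStream(yMetadata)
            xmlDoc.Load(ms)
        End Using
        Dim nodes As Xml.XmlNodeList = xmlDoc.GetElementsByTagName("pdfaid:conformance")
        ' Guard against metadata without a pdfaid:conformance element.
        ' Note: this only matches conformance level A; level B or U documents return False
        If nodes.Count > 0 AndAlso nodes.Item(0).InnerText.Trim().ToUpper() = "A" Then
            bPDFA = True
        End If
    End If
Finally
    reader.Close()
End Try
Return bPDFA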
I also found some references to the XmpReader class, but not enough information to do what I wanted with it.
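The check above looks only at the element form of pdfaid:conformance and only accepts level A. In XMP the same properties may also be written as attributes of rdf:Description, and a document declares its PDF/A part in pdfaid:part (with the level, for example A or B, in pdfaid:conformance). A minimal sketch that reads both forms using only System.Xml follows. The name ReadPdfAClaim is hypothetical, and the result is only what the document claims, not a validation of actual compliance.

```vb
Imports System
Imports System.IO
Imports System.Xml

Module PdfAClaim

    ' Namespace URI of the PDF/A identification schema used in XMP packets.
    Private Const PdfaIdNs As String = "http://www.aiim.org/pdfa/ns/id/"

    ' Returns the claimed PDF/A flavour such as "1B" or "3A", or Nothing when the
    ' XMP packet is missing or carries no pdfaid:part. (Hypothetical helper name.)
    Public Function ReadPdfAClaim(xmp As Byte()) As String
        If xmp Is Nothing Then Return Nothing
        Dim doc As New XmlDocument()
        ' Loading from a stream lets the XML parser detect the encoding itself.
        Using ms As New MemoryStream(xmp)
            doc.Load(ms)
        End Using
        Dim part As String = ReadProperty(doc, "part")
        If String.IsNullOrEmpty(part) Then Return Nothing
        Dim conformance As String = ReadProperty(doc, "conformance")
        Return part & If(conformance, "").ToUpperInvariant()
    End Function

    ' XMP allows a simple property either as a child element or as an attribute
    ' of rdf:Description, so look for both forms.
    Private Function ReadProperty(doc As XmlDocument, localName As String) As String
        Dim elements As XmlNodeList = doc.GetElementsByTagName(localName, PdfaIdNs)
        If elements.Count > 0 Then Return elements.Item(0).InnerText.Trim()
        For Each node As XmlNode In doc.GetElementsByTagName("*")
            Dim attr As XmlNode = node.Attributes.GetNamedItem(localName, PdfaIdNs)
            If attr IsNot Nothing Then Return attr.Value.Trim()
        Next
        Return Nothing
    End Function

End Module
```

With iTextSharp 5 the input would be the byte array returned by reader.Metadata, for example Dim claim As String = ReadPdfAClaim(reader.Metadata). A result of Nothing means the document makes no PDF/A claim in its XMP metadata.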

Related

Stamp rotated text using itext7 in vb.net

I'm trying to convert some iTextSharp code to iText 7. The code stamps text, rotated by 90 degrees, on each page of a PDF. Unfortunately, all the examples I can find are in C#, and while I can use an online translator, I'm having difficulties with this one.
The code below stamps my text at the specified coordinates on each page of a given PDF:
' Needs: Imports iText.Kernel.Pdf, iText.Kernel.Pdf.Canvas, iText.Kernel.Font, iText.IO.Font.Constants
Shared Sub itext7_stamp_text_on_pdf(mypdfname As String, myfoldername As String)
    Dim src As String = IO.Path.Combine(myfoldername, mypdfname)
    Dim dest As String = IO.Path.Combine(myfoldername, "Stamped " & mypdfname)
    Dim pdfDoc As PdfDocument = New PdfDocument(New PdfReader(src), New PdfWriter(dest))
    ' Create the font once and reuse it on every page
    Dim font As PdfFont = PdfFontFactory.CreateFont(StandardFonts.HELVETICA)
    Dim n As Integer = pdfDoc.GetNumberOfPages()
    For i As Integer = 1 To n
        Dim page As PdfPage = pdfDoc.GetPage(i)
        Dim canvas As New PdfCanvas(page)
        With canvas
            .SetFontAndSize(font, 12)
            .BeginText()
            .MoveText(100, 100)
            .ShowText("SAMPLE TEXT 100,100")
            .EndText()
        End With
    Next
    pdfDoc.Close()
End Sub
... but I can't see a way of rotating it to 90 degrees.
There's an example here if you use a paragraph:
https://kb.itextpdf.com/home/it7kb/examples/itext-7-building-blocks-chapter-2-rootelement-examples#iText7BuildingBlocksChapter2:RootElementexamples-c02e14_showtextaligned
... but I can't seem to translate this to VB.NET. I can point out where I get the errors, but I thought it would be better to ask this general question first, in case there's a way to do this without using a paragraph.
Can anyone help please?
Thanks!
Well, after some more digging this code seems to work OK on the rotation part:
Dim pdf As New PdfDocument(New PdfReader(inpdf), New PdfWriter(outpdf))
Dim document As New Document(pdf)
' The last argument is the rotation angle in radians: 0.5 * PI = 90 degrees
document.ShowTextAligned("This is some test text", 400, 750, TextAlignment.CENTER, VerticalAlignment.MIDDLE, 0.5F * CSng(Math.PI))
document.Close()
End Sub
... but the text gets hidden behind existing content, so I need a way to make sure it is drawn over the content.
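One way to rotate without a paragraph is to stay with PdfCanvas, as in the question's first snippet, and set a text matrix instead of calling MoveText. The sketch below is untested and assumes iText 7 for .NET; StampRotated and its parameters are hypothetical names. To my understanding, New PdfCanvas(page) writes into a content stream appended after the page's existing ones, so the stamp should be painted over the existing page content, although annotations such as form fields would still sit on top.

```vb
Imports iText.IO.Font.Constants
Imports iText.Kernel.Font
Imports iText.Kernel.Pdf
Imports iText.Kernel.Pdf.Canvas

Module RotatedStamp

    ' Stamps text rotated 90 degrees counter-clockwise on every page.
    ' (x, y) is the start of the text baseline in default page coordinates.
    ' StampRotated is a hypothetical helper, not part of iText.
    Public Sub StampRotated(src As String, dest As String, text As String, x As Single, y As Single)
        Dim pdfDoc As New PdfDocument(New PdfReader(src), New PdfWriter(dest))
        Try
            Dim font As PdfFont = PdfFontFactory.CreateFont(StandardFonts.HELVETICA)
            For i As Integer = 1 To pdfDoc.GetNumberOfPages()
                Dim overlay As New PdfCanvas(pdfDoc.GetPage(i))
                overlay.BeginText()
                overlay.SetFontAndSize(font, 12)
                ' Text matrix (cos t, sin t, -sin t, cos t, x, y) with t = 90 degrees.
                overlay.SetTextMatrix(0, 1, -1, 0, x, y)
                overlay.ShowText(text)
                overlay.EndText()
            Next
        Finally
            pdfDoc.Close()
        End Try
    End Sub

End Module
```

A call such as StampRotated(src, dest, "SAMPLE TEXT", 100, 100) would start the text at (100, 100) and run it upwards. Pages that carry their own /Rotate entry may need adjusted coordinates.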

How do I know if the PdfTextExtractor produced reliable results?

I am using the code below to extract text from PDFs for bookkeeping purposes.
How can I tell whether a PDF was "well readable" and produced accurate results, or whether it produced "garbage" output that would require an OCR solution?
Currently I have to inspect each result manually to see whether it came out as
"Iin voicE #Ajk 932 2"
or
"Invoice #8793201".
Dim sb As New System.Text.StringBuilder()
Using nReader As New iTextSharp.text.pdf.PdfReader(fileName)
    For page As Integer = 1 To nReader.NumberOfPages
        ' A fresh strategy per page keeps text from accumulating across pages
        Dim strategy As New iTextSharp.text.pdf.parser.SimpleTextExtractionStrategy
        Dim currentText As String = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(nReader, page, strategy)
        ' GetTextFromPage already returns a .NET string; re-encoding it through
        ' Encoding.Default can only lose characters, so append it as-is
        sb.Append(currentText)
    Next
End Using ' End Using disposes the reader; no explicit Close() is needed
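The extractor itself does not report how trustworthy its output is, so any automatic check has to be a heuristic on the extracted string. The sketch below is one such heuristic under stated assumptions, not an iTextSharp feature. It flags tokens containing replacement, control, or private-use characters (typical when a font has no usable Unicode mapping) and tokens where a lowercase letter is directly followed by an uppercase one, as in "voicE". The name SuspiciousTokenRatio and the rules themselves are assumptions made for illustration.

```vb
Imports System
Imports System.Globalization

Module ExtractionQuality

    ' Fraction (0.0 to 1.0) of whitespace-separated tokens that look suspicious.
    ' Returns 1.0 for empty text, since a page without extractable text
    ' (for example a scanned image) needs OCR anyway.
    Public Function SuspiciousTokenRatio(text As String) As Double
        If String.IsNullOrWhiteSpace(text) Then Return 1.0
        ' A Nothing separator array makes Split break on any whitespace.
        Dim tokens As String() = text.Split(DirectCast(Nothing, Char()), StringSplitOptions.RemoveEmptyEntries)
        Dim suspicious As Integer = 0
        For Each token As String In tokens
            If LooksSuspicious(token) Then suspicious += 1
        Next
        Return suspicious / tokens.Length
    End Function

    Private Function LooksSuspicious(token As String) As Boolean
        For i As Integer = 0 To token.Length - 1
            Dim c As Char = token(i)
            ' Replacement, control and private-use characters usually mean the
            ' font had no usable mapping from glyphs to Unicode.
            If c = ChrW(&HFFFD) OrElse Char.IsControl(c) OrElse
               Char.GetUnicodeCategory(c) = UnicodeCategory.PrivateUse Then Return True
            ' A lowercase letter directly followed by an uppercase one ("voicE").
            If i > 0 AndAlso Char.IsLower(token(i - 1)) AndAlso Char.IsUpper(c) Then Return True
        Next
        Return False
    End Function

End Module
```

For the two samples in the question this gives 0.2 for "Iin voicE #Ajk 932 2" (one of five tokens is flagged) and 0 for "Invoice #8793201". Legitimate mixed-case words such as "iTextSharp" are flagged too, so any threshold for routing a document to OCR would have to be tuned on real data, ideally together with domain checks such as a pattern for the expected invoice number.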

Attach an XML file to a PDF using Access 2016 VBA

I need to embed an XML file in a PDF (as an attachment) using Access VBA, and I would like a solution that does not require buying Acrobat.
Then I need to sign it digitally (PAdES-BES) and save the file as PDF/A-3.
Are there any ideas or, better, some sample code?
Thanks.
You can use our DynamicPDF Core Suite for COM/ActiveX v11 to achieve this. It supports embedding files in a PDF and digitally signing it with PAdES. Here is a VBA sample:
Dim MyDocument As DynamicPDF.Document
Dim MyPage As DynamicPDF.Page
Set MyDocument = New DynamicPDF.Document
MyDocument.LoadPdf ("C:\Input.pdf")
Set MyPage = MyDocument.GetPage(1)
Dim MyXmp As DynamicPDF.XmpMetaData
Set MyXmp = MyDocument.SetXmpMetaData
MyXmp.AddPdfASchema DPDF_PdfAStandard_PDF_A_3a
Dim MyEmbeddedFile
Set MyEmbeddedFile = MyDocument.AddEmbeddedFile("C:\XMLFile.xml", "")
MyEmbeddedFile.Relation = DPDF_EmbeddedFileRelation_Source
MyEmbeddedFile.MimeType = "application/xml"
Dim MySignature As DynamicPDF.Signature
Set MySignature = MyPage.AddSignature("SigField", 10, 10, 250, 100)
MySignature.Visible = True
Dim MyCertificate As DynamicPDF.Certificate
Set MyCertificate = MyDocument.DigitalSignatures.GetCertificate("C:\JohnDoe.pfx", "password")
MyCertificate.SignatureType = DPDF_SignatureType_PAdESBasic
MyDocument.Sign "SigField", MyCertificate
MyDocument.DrawToFile ("C:\Output.pdf")
You can find more information about DynamicPDF Core Suite for COM/ActiveX at the link below:
https://www.dynamicpdf.com/PDF-Suite-COM.aspx
Please note that you will need to use version 11 (currently available as a BETA version) to achieve this:
https://www.dynamicpdf.com/beta.aspx

PDF fill in not merging correctly

We are using an ASP.NET website with iTextSharp.dll version 5.5.13.
We can merge multiple PDF files into one using a stream and it works perfectly. However, when we use a PDF that was created in a "fill-in" function the new PDF file does not correctly merge into the other documents. It merges without the filled in values. However, if I open the filled in PDF that it creates the filled in data displays and prints fine.
I have tried merging the new "filled in" PDF at a later time but it still displays the template file as though the filled in data was missing.
The code below fills in the data:
Dim strFileName As String = Path.GetFileNameWithoutExtension(strSourceFile)
Dim strOutPath As String = HttpContext.Current.Server.MapPath("~/Apps/Lifetime/OfficeDocs/Export/")
newFile = strOutPath & strFileName & " " & strRONumber & ".pdf"
If Not File.Exists(newFile) Then
    Dim pdfReader As PdfReader = New PdfReader(strSourceFile)
    Try
        Using pdfStamper As PdfStamper = New PdfStamper(pdfReader, New FileStream(newFile, FileMode.Create))
            Dim pdfFormFields As AcroFields = pdfStamper.AcroFields
            pdfFormFields.SetField("CUSTOMER NAME", strCustomer)
            pdfFormFields.SetField("YR MK MODEL", strVehicle)
            pdfFormFields.SetField("RO#", strRONumber)
            ' False keeps the form fields interactive: the values stay in the
            ' fields' widget annotations instead of being drawn into the page content
            pdfStamper.FormFlattening = False
        End Using ' End Using disposes the stamper; no explicit Dispose() call is needed
    Finally
        pdfReader.Close()
    End Try
End If
The code below then merges the PDF files whose paths are passed to it:
Public Shared Sub MergePDFs(ByVal files As List(Of String), ByVal filename As String)
    'Gets a list of full path files and merges into one memory stream
    'and outputs it to a browser response.
    Dim MemStream As New System.IO.MemoryStream
    Dim doc As New iTextSharp.text.Document
    Dim reader As iTextSharp.text.pdf.PdfReader
    Dim numberOfPages As Integer
    Dim currentPageNumber As Integer
    Dim writer As iTextSharp.text.pdf.PdfWriter = iTextSharp.text.pdf.PdfWriter.GetInstance(doc, MemStream)
    doc.Open()
    Dim cb As iTextSharp.text.pdf.PdfContentByte = writer.DirectContent
    Dim page As iTextSharp.text.pdf.PdfImportedPage
    Dim strError As String = ""
    For Each strfile As String In files
        reader = New iTextSharp.text.pdf.PdfReader(strfile)
        numberOfPages = reader.NumberOfPages
        currentPageNumber = 0
        Do While (currentPageNumber < numberOfPages)
            currentPageNumber += 1
            doc.SetPageSize(reader.GetPageSizeWithRotation(currentPageNumber))
            doc.NewPage()
            page = writer.GetImportedPage(reader, currentPageNumber)
            cb.AddTemplate(page, 0, 0)
        Loop
    Next
    doc.Close()
    doc.Dispose()
    If MemStream Is Nothing Then
        HttpContext.Current.Response.Write("No Data is available for output")
    Else
        HttpContext.Current.Response.Clear()
        HttpContext.Current.Response.ContentType = "application/pdf"
        HttpContext.Current.Response.AppendHeader("Content-Disposition", "inline; filename=" + filename)
        HttpContext.Current.Response.BinaryWrite(MemStream.ToArray)
        HttpContext.Current.Response.OutputStream.Flush()
        HttpContext.Current.Response.OutputStream.Close()
        HttpContext.Current.Response.OutputStream.Dispose()
        MemStream.Close()
        MemStream.Dispose()
    End If
End Sub
I expect the "filled in" PDF in the list of files to retain the filled in data but it does not. Even if I try to merge the filled in file later it still comes up missing the filled in data. If I print the filled in file it looks perfect.
PdfWriter.GetImportedPage only returns you a copy of the page contents. This does not include any annotations, in particular not the widget annotations of form fields on the page at hand.
To also copy the annotations of the source pages, use the iText PdfCopy class instead. This class is designed to copy pages including all annotations. Furthermore, it includes methods to copy all pages of a source document in one step.
You have to tell the PdfCopy object to merge fields, otherwise the document-wide form structure won't be built.
As an aside, your code creates many PdfReader objects but does not close them. That may increase your memory requirements substantially.
Thus:
Public Shared Sub MergePDFsImproved(ByVal files As List(Of String), ByVal filename As String)
    Using mem As New MemoryStream()
        Dim readers As New List(Of PdfReader)
        Using doc As New Document
            Dim copy As New PdfCopy(doc, mem)
            copy.SetMergeFields()
            doc.Open()
            For Each strfile As String In files
                Dim reader As New PdfReader(strfile)
                copy.AddDocument(reader)
                readers.Add(reader)
            Next
        End Using
        For Each reader As PdfReader In readers
            reader.Close()
        Next
        HttpContext.Current.Response.Clear()
        HttpContext.Current.Response.ContentType = "application/pdf"
        HttpContext.Current.Response.AppendHeader("Content-Disposition", "inline; filename=" + filename)
        HttpContext.Current.Response.BinaryWrite(mem.ToArray)
        HttpContext.Current.Response.OutputStream.Flush()
        HttpContext.Current.Response.OutputStream.Close()
        HttpContext.Current.Response.OutputStream.Dispose()
    End Using
End Sub
Actually I'm not sure whether it is a good idea to Close and Dispose the response output stream here, that shouldn't be the responsibility of a PDF merging method.
This is a related answer for the Java version of iText; you may want to read it for additional information. Unfortunately, many links in that answer have since gone dead.

Render Html using RDL Local Report

I need to render HTML from RDL reports using the LocalReport class; I don't want to use ReportViewer for this. Is there any way I can enable HTML generation?
As far as I know, a LocalReport cannot be exported to HTML (only Excel, Word, and PDF are available). If you still want to export to one of those formats, you can use the following:
Dim Report = New LocalReport
' Prepare the report the same way as for viewing (data source for RDL reports with ReportViewer)
Dim warnings As Warning() = Nothing
Dim streamids As String() = Nothing
Dim mimeType As String = Nothing
Dim encoding As String = Nothing
Dim extension As String = Nothing
' RenderFormat (for example "PDF") and RepPath (the output file path) are assumed to be defined by the caller
Dim bytes As Byte() = Report.Render(RenderFormat, Nothing, mimeType, encoding, extension, streamids, warnings)
Using fs As New IO.FileStream(RepPath, IO.FileMode.Create)
    fs.Write(bytes, 0, bytes.Length)
End Using ' End Using closes the stream
You can get the list of available rendering extensions with Report.ListRenderingExtensions.
The ServerReport solution is similar, but more export formats are available.
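To see which format names the installed ReportViewer runtime accepts, the extensions can be listed before calling Render. This is a small untested sketch; PrintRenderFormats is a hypothetical helper, and the Imports line assumes the WinForms flavour of the ReportViewer assemblies (a web project would use Microsoft.Reporting.WebForms instead).

```vb
Imports System
Imports Microsoft.Reporting.WinForms

Module RenderingFormats

    ' Prints the names that can be passed as the format argument of LocalReport.Render.
    ' (Hypothetical helper name.)
    Public Sub PrintRenderFormats(report As LocalReport)
        For Each ext As RenderingExtension In report.ListRenderingExtensions()
            Console.WriteLine("{0} ({1}), visible: {2}", ext.Name, ext.LocalizedName, ext.Visible)
        Next
    End Sub

End Module
```

If no HTML entry appears in the output, that would confirm that this runtime's LocalReport offers no HTML rendering.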