itextsharp split shared xObject streams - pdf

I am trying to split shared XObject streams (originally flattened form fields with the same content) in a PDF.
What is the correct way to do this using itextsharp? I am trying the code below but the stream is still shared in the resulting document.
Sample pdf with shared XObject streams flatten.pdf
PdfReader pdf = new PdfReader(path);
PdfStamper stamper = new PdfStamper(pdf, new FileStream("processed.pdf", FileMode.OpenOrCreate, FileAccess.ReadWrite));
EliminateSharedXObjectStreams(stamper, 1);
stamper.Close();
virtual public void EliminateSharedXObjectStreams(PdfStamper pdfStamper, int pageNum)
{
PdfReader pdfReader = pdfStamper.Reader;
PdfDictionary page = pdfReader.GetPageN(pageNum);
PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
PdfDictionary xObjResources = resources.GetAsDict(PdfName.XOBJECT);
List<PRIndirectReference> newRefs = new List<PRIndirectReference>();
List<PdfName> newNames = new List<PdfName>();
List<PRStream> newStreams = new List<PRStream>();
IntHashtable visited = new IntHashtable();
foreach (PdfName key in xObjResources.Keys)
{
PdfStream xObj = xObjResources.GetAsStream(key);
if (xObj is PRStream && xObj.GetAsName(PdfName.SUBTYPE) != null &&
xObj.GetAsName(PdfName.SUBTYPE).CompareTo(PdfName.FORM) == 0)
{
PRIndirectReference refi = (PRIndirectReference)xObjResources.Get(key);
PRStream xFormStream = (PRStream)xObj;
if (visited.ContainsKey(refi.Number))
{
// need to duplicate
newRefs.Add(refi);
PRStream newStream = new PRStream(xFormStream, null);
newStreams.Add(newStream);
newNames.Add(key);
}
else
visited[xFormStream.ObjNum] = 1;
}
}
if (newStreams.Count == 0)
return;
PdfContentByte canvas = pdfStamper.GetOverContent(pageNum);
PdfWriter writer = pdfStamper.Writer;
for (int k = 0; k < newStreams.Count; ++k)
{
canvas.SaveState();
//add copied stream
PdfIndirectReference newRef = writer.AddToBody(newStreams[k]).IndirectReference;
//change the ref
xObjResources.Put(newNames[k], newRef);
canvas.RestoreState();
}
}

First remarks without a sample document
There are numerous reasons why your code may not work as desired. As you did not supply your sample PDF, I cannot tell which are more relevant and which are not.
You only search for xobjects shared on the same page; if an xobject is used once on page one and once on page two, your code cannot identify this.
If you want to be able to find such shares, you'll at least have to use the same IntHashtable instance visited across all calls of EliminateSharedXObjectStreams for the same PdfStamper pdfStamper, e.g. by creating it once outside this method and making it a parameter of your method.
You only check for shared xobjects in the immediate page resources. But form xobjects have their own resources which can contain even more form xobject declarations.
If you want to find such shares, you'll have to recurse into the resources of your page's xobjects, those xobjects' xobjects, etc. pp.
(Strictly speaking you also have to recurse into the form xobjects of patterns and Type 3 Font glyph definitions, but these are unlikely positions to flatten form fields into.)
You only check for shared xobjects with different names. But xobjects can also be shared by referencing the same name multiple times from the same content stream.
If you want to find such shares, you have to analyse the content streams in question to find duplicate usages of the form xobject with the same name.
(By the way, doing so you may also check whether declared xobjects are used at all: if a form xobject is declared in some resources, this does not mean it is used in the context of these resources, it may be an unused resource.)
You don't mark xObjResources (if it itself is indirect) or page (otherwise) as used. If your PdfStamper pdfStamper is working in append mode, your changes may be ignored.
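The first two remarks above (one visited set for the whole document, and recursion into nested resources) can be sketched in plain Java as follows. This is only a toy model and not iText API: resource dictionaries are represented as maps from resource name to object number, and SharedXObjectFinder, scan and sharedUses are hypothetical names made up for the illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Toy model (hypothetical, not iText API): a form XObject is identified by its
// object number and may declare further XObjects in its own resources.
final class SharedXObjectFinder {
    /** object number of a form XObject -> its own XObject resources (name -> object number) */
    private final Map<Integer, Map<String, Integer>> nestedResources;
    /** object numbers already seen anywhere in the document, shared across all pages */
    private final Set<Integer> visited = new HashSet<>();
    /** locations of every second or later use of an already seen object */
    private final List<String> shared = new ArrayList<>();

    SharedXObjectFinder(Map<Integer, Map<String, Integer>> nestedResources) {
        this.nestedResources = nestedResources;
    }

    /** Call once per page with that page's XObject resources. */
    void scan(String location, Map<String, Integer> xObjects) {
        for (Map.Entry<String, Integer> e : xObjects.entrySet()) {
            String where = location + "/" + e.getKey();
            int objNum = e.getValue();
            if (!visited.add(objNum)) {
                // Seen before (possibly on another page or nested): this use is shared.
                shared.add(where);
                continue;
            }
            // First sighting: recurse into the XObject's own resources.
            scan(where, nestedResources.getOrDefault(objNum, Collections.emptyMap()));
        }
    }

    List<String> sharedUses() {
        return shared;
    }
}
```

Because the visited set lives in the finder object rather than in the method, calling scan for page one and then for page two also reports an object that each page uses only once. The sketch does not cover the third remark (the same name invoked several times in one content stream); that still requires parsing the content streams.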
Solution with a sample document
After you provided the information that
It's single page document containing shared streams (xobjects with different names) in the immediate page resources. pdfStamper is not in append mode.
it turned out that the problems mentioned above are not relevant in your case. As you meanwhile also have provided an example document, I could reproduce the issue.
Indeed, your code does not split the shared XObjects. The reason is that the PdfStamper is made for manipulating the PDF in the PdfReader in the state it was in when the stamper was constructed, using stamper methods only. Your code, on the other hand, manipulates objects directly retrieved from the PdfReader after the construction of the stamper. Thus, while your new streams are added to the PDF (actually up front), the changes in the pre-existing XObject resource dictionaries don't make it to the result.
If you want to manipulate objects you retrieve from the reader, you instead should do this before creating the stamper.
This actually should suit you, as your code is structurally copied from a PdfReader method anyway, EliminateSharedStreams, which you adapted to your use case.
The only problem is that that method uses a hidden member variable of the PdfReader class. But you can access that variable by means of reflection.
Thus, the manipulated method (working on a pure PdfReader) could look like this:
virtual public void EliminateSharedXObjectStreams(PdfReader pdfReader, int pageNum)
{
PdfDictionary page = pdfReader.GetPageN(pageNum);
PdfDictionary resources = page.GetAsDict(PdfName.RESOURCES);
PdfDictionary xObjResources = resources.GetAsDict(PdfName.XOBJECT);
List<PRIndirectReference> newRefs = new List<PRIndirectReference>();
List<PRStream> newStreams = new List<PRStream>();
IntHashtable visited = new IntHashtable();
foreach (PdfName key in xObjResources.Keys)
{
PdfStream xObj = xObjResources.GetAsStream(key);
if (xObj is PRStream && xObj.GetAsName(PdfName.SUBTYPE) != null &&
xObj.GetAsName(PdfName.SUBTYPE).CompareTo(PdfName.FORM) == 0)
{
PRIndirectReference refi = (PRIndirectReference)xObjResources.Get(key);
PRStream xFormStream = (PRStream)xObj;
if (visited.ContainsKey(refi.Number))
{
// need to duplicate
newRefs.Add(refi);
PRStream newStream = new PRStream(xFormStream, null);
newStreams.Add(newStream);
}
else
visited[xFormStream.ObjNum] = 1;
}
}
if (newStreams.Count == 0)
return;
FieldInfo xrefObjField = typeof(PdfReader).GetField("xrefObj", BindingFlags.Instance | BindingFlags.NonPublic);
List<PdfObject> xrefObj = (List<PdfObject>)xrefObjField.GetValue(pdfReader);
for (int k = 0; k < newStreams.Count; ++k)
{
xrefObj.Add(newStreams[k]);
PRIndirectReference refi = newRefs[k];
refi.SetNumber(xrefObj.Count - 1, 0);
}
}
and you can use it like this:
using (PdfReader pdfReader = new PdfReader(sourcePath))
using (Stream pdfStream = new FileStream(targetPath, FileMode.Create, FileAccess.Write))
{
EliminateSharedXObjectStreams(pdfReader, 1);
PdfStamper pdfStamper = new PdfStamper(pdfReader, pdfStream);
pdfStamper.Close();
}
in particular calling EliminateSharedXObjectStreams before constructing the PdfStamper.
If you are after a generic solution, you of course will have to extend the method to remove the restrictions observed in the first part of the answer...
Solution without reflection
The OP found out:
Manipulating PdfReader works as expected. The only thing is that instead of using the xrefObj private field, the stream can be added using AddPdfObject:
for (int k = 0; k < newStreams.Count; ++k)
{
PRIndirectReference newRef = pdfReader.AddPdfObject(newStreams[k]);
PRIndirectReference refi = newRefs[k];
refi.SetNumber(newRef.Number, 0);
}
Indeed, this improves the solution substantially.

Related

iTextSharp Read Text From Single Layer of PDF

Currently I am using a custom LocationTextExtractionStrategy to extract text from a PDF that returns a TextRenderInfo[]. I would like to be able to determine if a TextRenderInfo object (or PDFString, child of TextRenderInfo) appears in a specific layer. I am not sure if this is possible. To get the layers in a PDF, I am using:
Dictionary<string,PdfLayer> layers;
using (var pdfReader = new PdfReader(src))
{
var newSrc = Path.Combine(["new file location"]);
using (var stream = new FileStream(newSrc, FileMode.Create))
{
PdfStamper stamper = new PdfStamper(pdfReader, stream);
layers = stamper.GetPdfLayers();
stamper.Close();
}
pdfReader.Close();
src = newSrc;
}
To extract the text, I am using:
var textExtractor = new TextExtractionStrategy();
PdfTextExtractor.GetTextFromPage(pdfReader, pdfPageNum,textExtractor);
List<TextRenderInfo> results = textExtractor.Results;
Is there any way that I can check if the individual TextRenderInfo results exist within the layers obtained in the first code snippet. Any help would be much appreciated.
It is possible to get the contents from a single layer, but you'll have to jump through a few hoops to work it out. Specifically, you will have to recreate some of the logic that is provided by the PdfTextExtractor and PdfReaderContentParser.
public static String GetText(PdfReader reader, int pageNumber, int streamNumber) {
    var strategy = new LocationTextExtractionStrategy();
    var processor = new PdfContentStreamProcessor(strategy);
    PdfDictionary pageDic = reader.GetPageN(pageNumber);
    var resourcesDic = pageDic.GetAsDict(PdfName.RESOURCES);
    // assuming you still need to extract the page bytes
    byte[] contents = GetContentBytesForPageStream(reader, pageNumber, streamNumber);
    processor.ProcessContent(contents, resourcesDic);
    return strategy.GetResultantText();
}
public static byte[] GetContentBytesForPageStream(PdfReader reader, int pageNumber, int streamNumber) {
    PdfDictionary pageDictionary = reader.GetPageN(pageNumber);
    PdfObject contentObject = pageDictionary.Get(PdfName.CONTENTS);
    if (contentObject == null)
        return new byte[0];
    byte[] contentBytes = GetContentBytesFromContentObject(contentObject, streamNumber);
    return contentBytes;
}
public static byte[] GetContentBytesFromContentObject(PdfObject contentObject, int streamNumber) {
    // Sketch modeled on the logic of
    // ContentByteUtils.GetContentBytesFromContentObject(contentObject);
    // but in case PdfObject.ARRAY: only select the streamNumber you require
    PdfObject direct = PdfReader.GetPdfObject(contentObject); // resolve an indirect reference
    if (direct is PdfArray) {
        PdfArray contentArray = (PdfArray)direct;
        if (streamNumber < 0 || streamNumber >= contentArray.Size)
            return new byte[0];
        direct = contentArray.GetDirectObject(streamNumber);
    }
    PRStream stream = direct as PRStream;
    return stream == null ? new byte[0] : PdfReader.GetStreamBytes(stream);
}
If you're specifically looking to just use PdfTextExtractor or PdfReaderContentParser, and ask the returned TextRenderInfo for the layer it's on, then I'm not sure it will be easily possible. There are a number of problems with that:
TextRenderInfo doesn't store that information, so you'd have to subclass it (which is possible)
you'd have to rewrite the logic that creates the TextRenderInfo objects. This is possible by registering custom IContentOperator objects for all text operators (Tj, TJ, ' and ") with the PdfTextExtractor or PdfReaderContentParser
the hardest part is that you have already lost layer information in ContentByteUtils.GetContentBytesFromContentObject - so you'd need to retain that somehow, which creates its own set of problems.

ITextSharp crop PDF to remove white margins [duplicate]

I have a PDF which consists of some data, followed by some whitespace. I don't know how large the data is, but I'd like to trim off the whitespace following the data.
PdfReader reader = new PdfReader(PDFLOCATION);
Rectangle rect = new Rectangle(700, 2000);
Document document = new Document(rect);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(SAVELCATION));
document.open();
int n = reader.getNumberOfPages();
PdfImportedPage page;
for (int i = 1; i <= n; i++) {
document.newPage();
page = writer.getImportedPage(reader, i);
Image instance = Image.getInstance(page);
document.add(instance);
}
document.close();
Is there a way to clip/trim the whitespace for each page in the new document?
This PDF contains vector graphics.
I'm using iTextPDF, but can switch to any Java library (mavenized, Apache license preferred).
As no actual solution has been posted, here are some pointers from the accompanying itext-questions mailing list thread:
As you want to merely trim pages, this is not a case of PdfWriter + getImportedPage usage but instead of PdfStamper usage. Your main code using a PdfStamper might look like this:
PdfReader reader = new PdfReader(resourceStream);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("target/test-outputs/test-trimmed-stamper.pdf"));
// Go through all pages
int n = reader.getNumberOfPages();
for (int i = 1; i <= n; i++)
{
Rectangle pageSize = reader.getPageSize(i);
Rectangle rect = getOutputPageSize(pageSize, reader, i);
PdfDictionary page = reader.getPageN(i);
page.put(PdfName.CROPBOX, new PdfArray(new float[]{rect.getLeft(), rect.getBottom(), rect.getRight(), rect.getTop()}));
stamper.markUsed(page);
}
stamper.close();
As you see I also added another argument to your getOutputPageSize method to-be. It is the page number. The amount of white space to trim might differ on different pages after all.
If the source document did not contain vector graphics, you could simply use the iText parser package classes. There even already is a TextMarginFinder based on them. In this case the getOutputPageSize method (with the additional page parameter) could look like this:
private Rectangle getOutputPageSize(Rectangle pageSize, PdfReader reader, int page) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
TextMarginFinder finder = parser.processContent(page, new TextMarginFinder());
Rectangle result = new Rectangle(finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
System.out.printf("Text/bitmap boundary: %f,%f to %f, %f\n", finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
return result;
}
Using this method with your file test.pdf, the code trims according to the text (and bitmap image) content on the page.
To find the bounding box respecting vector graphics, too, you essentially have to do the same but you have to extend the parser framework used here to inform its listeners (the TextMarginFinder essentially is a listener to drawing events sent from the parser framework) about vector graphics operations, too. This is non-trivial, especially if you don't know PDF syntax by heart yet.
If your PDFs to trim are not too generic but can be forced to include some text or bitmap graphics in relevant positions, though, you could use the sample code above (probably with minor changes) anyways.
E.g. if your PDFs always start with text on top and end with text at the bottom, you could change getOutputPageSize to create the result rectangle like this:
Rectangle result = new Rectangle(pageSize.getLeft(), finder.getLly(), pageSize.getRight(), finder.getUry());
This only trims the empty space at the top and bottom.
Depending on your input data pool and requirements this might suffice.
Or you can use some other heuristics depending on your knowledge on the input data. If you know something about the positioning of text (e.g. the heading to always be centered and some other text to always start at the left), you can easily extend the TextMarginFinder to take advantage of this knowledge.
Recent (April 2015, iText 5.5.6-SNAPSHOT) improvements
The current development version, 5.5.6-SNAPSHOT, extends the parser package to also include vector graphics parsing. This allows for an extension of iText's original TextMarginFinder class implementing the new ExtRenderListener methods like this:
@Override
public void modifyPath(PathConstructionRenderInfo renderInfo)
{
List<Vector> points = new ArrayList<Vector>();
if (renderInfo.getOperation() == PathConstructionRenderInfo.RECT)
{
float x = renderInfo.getSegmentData().get(0);
float y = renderInfo.getSegmentData().get(1);
float w = renderInfo.getSegmentData().get(2);
float h = renderInfo.getSegmentData().get(3);
points.add(new Vector(x, y, 1));
points.add(new Vector(x+w, y, 1));
points.add(new Vector(x, y+h, 1));
points.add(new Vector(x+w, y+h, 1));
}
else if (renderInfo.getSegmentData() != null)
{
for (int i = 0; i < renderInfo.getSegmentData().size()-1; i+=2)
{
points.add(new Vector(renderInfo.getSegmentData().get(i), renderInfo.getSegmentData().get(i+1), 1));
}
}
for (Vector point: points)
{
point = point.cross(renderInfo.getCtm());
Rectangle2D.Float pointRectangle = new Rectangle2D.Float(point.get(Vector.I1), point.get(Vector.I2), 0, 0);
if (currentPathRectangle == null)
currentPathRectangle = pointRectangle;
else
currentPathRectangle.add(pointRectangle);
}
}
@Override
public Path renderPath(PathPaintingRenderInfo renderInfo)
{
if (renderInfo.getOperation() != PathPaintingRenderInfo.NO_OP)
{
if (textRectangle == null)
textRectangle = currentPathRectangle;
else
textRectangle.add(currentPathRectangle);
}
currentPathRectangle = null;
return null;
}
@Override
public void clipPath(int rule)
{
}
(Full source: MarginFinder.java)
Using this class to trim the white space gives a result which is pretty much what one would hope for.
Beware: The implementation above is far from optimal. It is not even correct as it includes all curve control points which is too much. Furthermore it ignores stuff like line width or wedge types. It actually merely is a proof-of-concept.
All test code is in TestTrimPdfPage.java.
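The caveat about curve control points can be illustrated with a small stand-alone sketch. It is plain Java without iText and is not part of MarginFinder.java; BezierBounds and range are hypothetical names. For one coordinate of a cubic Bézier segment it computes the extent the curve actually reaches, by evaluating the curve at the roots of its derivative, instead of taking the minimum and maximum of all four control values.

```java
// Hypothetical helper, not iText API: tight extent of one coordinate of a cubic Bezier.
final class BezierBounds {
    /** Returns {min, max} of B(t), t in [0, 1], for control values p0..p3. */
    static double[] range(double p0, double p1, double p2, double p3) {
        // The end points are always on the curve.
        double min = Math.min(p0, p3);
        double max = Math.max(p0, p3);

        // B'(t) / 3 = a t^2 + b t + c with the differences of neighbouring control values.
        double d0 = p1 - p0, d1 = p2 - p1, d2 = p3 - p2;
        double a = d0 - 2 * d1 + d2;
        double b = 2 * (d1 - d0);
        double c = d0;

        double[] roots;
        if (Math.abs(a) < 1e-12) {
            // Derivative is (at most) linear.
            roots = Math.abs(b) < 1e-12 ? new double[0] : new double[] { -c / b };
        } else {
            double disc = b * b - 4 * a * c;
            if (disc < 0) {
                roots = new double[0];
            } else {
                double s = Math.sqrt(disc);
                roots = new double[] { (-b + s) / (2 * a), (-b - s) / (2 * a) };
            }
        }

        // Only extrema strictly inside the parameter interval matter.
        for (double t : roots) {
            if (t > 0 && t < 1) {
                double u = 1 - t;
                double v = u * u * u * p0 + 3 * u * u * t * p1 + 3 * u * t * t * p2 + t * t * t * p3;
                min = Math.min(min, v);
                max = Math.max(max, v);
            }
        }
        return new double[] { min, max };
    }
}
```

For the control values 0, 1, 1, 0 the curve only reaches 0.75, whereas a box built from the control points would extend to 1. Applying the same function to the x and the y coordinates of a segment gives its tight bounding box; line width and similar effects are still ignored.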

Reading PDF Bookmarks in VB.NET using iTextSharp

I am making a tool that scans PDF files and searches for text in PDF bookmarks and body text. I am using Visual Studio 2008 with VB.NET with iTextSharp.
How do I load bookmarks' list from an existing PDF file?
It depends on what you understand when you say "bookmarks".
You want the outlines (the entries that are visible in the bookmarks panel):
The CreateOutlineTree example shows you how to use the SimpleBookmark class to create an XML file containing the complete outline tree (in PDF jargon, bookmarks are called outlines).
Java:
PdfReader reader = new PdfReader(src);
List<HashMap<String, Object>> list = SimpleBookmark.getBookmark(reader);
SimpleBookmark.exportToXML(list,
new FileOutputStream(dest), "ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(pdfIn);
var list = SimpleBookmark.GetBookmark(reader);
using (MemoryStream ms = new MemoryStream()) {
SimpleBookmark.ExportToXML(list, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The list object can also be used to examine the different bookmark elements one by one programmatically (this is all explained in the official documentation).
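As a rough sketch of examining the entries one by one, the following walks such a list depth-first and collects the titles. It is written against plain java.util types so it runs without iText. It assumes that each map stores the title under the key "Title" and its children as a nested list under "Kids", which is how I understand SimpleBookmark to structure its result; OutlineTitles and collect are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;

// Hypothetical helper: flattens a bookmark list (as returned by
// SimpleBookmark.getBookmark) into indented titles.
final class OutlineTitles {
    /** Returns all titles depth-first, children indented by two spaces per level. */
    static List<String> collect(List<HashMap<String, Object>> bookmarks) {
        List<String> out = new ArrayList<>();
        collect(bookmarks, 0, out);
        return out;
    }

    @SuppressWarnings("unchecked")
    private static void collect(List<HashMap<String, Object>> bookmarks, int depth, List<String> out) {
        if (bookmarks == null) {
            return; // no outlines at all, or an entry without children
        }
        for (HashMap<String, Object> bookmark : bookmarks) {
            StringBuilder line = new StringBuilder();
            for (int i = 0; i < depth; i++) {
                line.append("  ");
            }
            line.append(String.valueOf(bookmark.get("Title")));
            out.add(line.toString());
            // Assumption: children are stored as a nested list under "Kids".
            collect((List<HashMap<String, Object>>) bookmark.get("Kids"), depth + 1, out);
        }
    }
}
```

With iText on the classpath you would pass in the result of SimpleBookmark.getBookmark(reader); the null check covers a document without any outlines.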
You want the named destinations (specific places in the document you can link to by name):
Now suppose that you meant to say named destinations, then you need the SimpleNamedDestination class as shown in the LinkActions example:
Java:
PdfReader reader = new PdfReader(src);
HashMap<String,String> map = SimpleNamedDestination.getNamedDestination(reader, false);
SimpleNamedDestination.exportToXML(map, new FileOutputStream(dest),
"ISO8859-1", true);
reader.close();
C#:
PdfReader reader = new PdfReader(src);
Dictionary<string,string> map = SimpleNamedDestination
.GetNamedDestination(reader, false);
using (MemoryStream ms = new MemoryStream()) {
SimpleNamedDestination.ExportToXML(map, ms, "ISO8859-1", true);
ms.Position = 0;
using (StreamReader sr = new StreamReader(ms)) {
return sr.ReadToEnd();
}
}
The map object can also be used to examine the different named destinations one by one programmatically. Note the Boolean parameter that is used when retrieving the named destinations. Named destinations can be stored using a PDF name object as name, or using a PDF string object. The Boolean parameter indicates whether you want the former (true = stored as PDF name objects) or the latter (false = stored as PDF string objects) type of named destinations.
Named destinations are predefined targets in a PDF file that can be found through their name. Although the official name is named destinations, some people refer to them as bookmarks too (but when we say bookmarks in the context of PDF, we usually want to refer to outlines).
If someone is still looking for a VB.NET solution, here is a simplified one. I have a large number of PDFs created with Report Builder, and with the document map I automatically add a bookmark "Title". So with iTextSharp I read the PDF and extract just the first bookmark value:
Dim oReader As New iTextSharp.text.pdf.PdfReader(PdfFileName)
Dim list As Object
list = SimpleBookmark.GetBookmark(oReader)
Dim string_book As String
string_book = list(0).item("Title")
This is a very simple starting point for anyone who wants to understand how it works.

Issues with iTextsharp and pdf manipulation

I am getting a PDF document (no password) which is generated by third-party software, with JavaScript and a few editable fields in it. If I load this PDF document with the PdfReader class, the NumberOfPages property is always 1 although the document has 17 pages. Oddly enough, the document has 17 pages if I save the stream afterwards. When I now try to open the document, Acrobat Reader shows an extended feature warning and the fields are not fillable anymore (I haven't flattened the document). Does anyone know about such a problem?
Background Info:
My job is to remove the javascript code, fill out some fields and save the document afterwards.
I am using the iTextsharp version 5.5.3.0.
Unfortunately I can't upload a sample file because there is some confidential data in it.
private byte[] GetDocumentData(string documentName)
{
var document = String.Format("{0}{1}\\{2}.pdf", _component.OutputDirectory, _component.OutputFileName.Replace(".xml", ".pdf"), documentName);
if (File.Exists(document))
{
PdfReader.unethicalreading = true;
using (var originalData = new MemoryStream(File.ReadAllBytes(document)))
{
using (var updatedData = new MemoryStream())
{
var pdfTool = new PdfInserter(originalData, updatedData) {FormFlattening = false};
pdfTool.RemoveJavascript();
pdfTool.Save();
return updatedData.ToArray();
}
}
}
return null;
}
//Old version that wasn't working
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
}
//Solution
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream, char pdfVersion = '\0', bool append = true)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
}
public void RemoveJavascript()
{
for (int i = 0; i < _pdfReader.XrefSize; i++)
{
PdfDictionary dictionary = _pdfReader.GetPdfObject(i) as PdfDictionary;
if (dictionary != null)
{
dictionary.Remove(PdfName.AA);
dictionary.Remove(PdfName.JS);
dictionary.Remove(PdfName.JAVASCRIPT);
}
}
}
The extended feature warning is a hint that the original PDF had been signed using a usage rights signature to "Reader-enable" it, i.e. to tell the Adobe Reader to activate some additional features when opening it, and the OP's operation on it has invalidated the signature.
Indeed, he operated using
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
which creates a PdfStamper which completely re-generates the document. To not invalidate the signature, though, one has to use append mode as in the OP's fixed code (for char pdfVersion = '\0', bool append = true):
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
If I load this pdf-document with the pdfReader class the NumberOfPagesProperty is always 1 although the pdf-document has 17 pages. Oddly enough the document has 17 pages
Quite likely it is a PDF with an XFA form, i.e. the PDF is only a carrier of some XFA data from which Adobe Reader builds those 17 pages. The actual PDF in that case usually only contains one page saying something like "if you see this, your viewer does not support XFA."
For a final verdict, though, one has to inspect the PDF.

Existing PDF to PDF/A "conversion"

I am trying to make an existing PDF into PDF/A-1b. I understand that iText cannot convert a PDF to PDF/A in the sense of making it PDF/A compliant. But it definitely can flag the document as PDF/A. However, I looked at various examples and I cannot seem to figure out how to do it. The major problem is that
writer.PDFXConformance = PdfWriter.PDFA1B;
does not work anymore. First PDFA1B is not recognized, second, pdfwriter seems to have been rewritten and there is not much information about that.
It seems the only (in itext java version) way is:
PdfAWriter writer = PdfAWriter.getInstance(document, new FileOutputStream(filename), PdfAConformanceLevel.PDF_A_1B);
But that requires a document type, i.e. it can be used when creating a PDF from scratch.
Can someone give an example of pdf to pdf/a conversion with the current version of itextsharp?
Thank you.
I can't imagine a valid reason for doing this but apparently you have one.
The conformance settings in iText are intended to be used with a PdfWriter and that object is (generally) only intended to be used with new documents. Since iText was never intended to convert documents to conformance that's just the way it was built.
To do what you want to do, you could either just open the original document and update the appropriate tags in the document's dictionary, or you could create a new document with the appropriate entries set and then import your old document. The code below shows the latter route: it first creates a regular non-conforming PDF and then creates a second document that says it is conforming even though it may or may not be. See the code comments for more details. This targets iTextSharp 5.4.2.0.
//Folder that we're working from
var workingFolder = Environment.GetFolderPath(Environment.SpecialFolder.Desktop);
//Create a regular non-conformant PDF, nothing special below
var RegularPdf = Path.Combine(workingFolder, "File1.pdf");
using (var fs = new FileStream(RegularPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
using (var writer = PdfWriter.GetInstance(doc, fs)) {
doc.Open();
doc.Add(new Paragraph("Hello world!"));
doc.Close();
}
}
}
//Create our conformant document from the above file
var ConformantPdf = Path.Combine(workingFolder, "File2.pdf");
using (var fs = new FileStream(ConformantPdf, FileMode.Create, FileAccess.Write, FileShare.None)) {
using (var doc = new Document()) {
//Use PdfSmartCopy to get every page
using (var copy = new PdfSmartCopy(doc, fs)) {
//Set our conformance levels
copy.SetPdfVersion(PdfWriter.PDF_VERSION_1_3);
copy.PDFXConformance = PdfWriter.PDFX1A2001;
//Open our new document for writing
doc.Open();
//Bring in every page from the old PDF
using (var r = new PdfReader(RegularPdf)) {
for (var i = 1; i <= r.NumberOfPages; i++) {
copy.AddPage(copy.GetImportedPage(r, i));
}
}
//Close up
doc.Close();
}
}
}
Just to be 100% clear, this WILL NOT MAKE A CONFORMANT PDF, just a document that says it conforms.