Attachments not showing up in pdf document - created using pdfbox - pdf

I m trying to attach an swf file to a pdf document. Below is my code (excerpted from the pdfbox-examples). while i can see that the file is attached based on the size of the file - with & without the attachment, I can't see / locate it in the pdf document. I do see textual content correctly displayed. Can someone tell me what I m doing wrong & help me fix the issue?
doc = new PDDocument();
PDPage page = new PDPage();
doc.addPage( page );
PDFont font = PDType1Font.HELVETICA_BOLD;
String inputFileName = "sample.swf";
InputStream fileInputStream = new FileInputStream(new File(inputFileName));
PDEmbeddedFile ef = new PDEmbeddedFile(doc, fileInputStream );
PDPageContentStream contentStream = new PDPageContentStream(doc, page,true,true);
//embedded files are stored in a named tree
PDEmbeddedFilesNameTreeNode efTree = new PDEmbeddedFilesNameTreeNode();
//first create the file specification, which holds the embedded file
PDComplexFileSpecification fs = new PDComplexFileSpecification();
fs.setEmbeddedFile(ef);
//now lets some of the optional parameters
ef.setSubtype( "swf" );
ef.setCreationDate( new GregorianCalendar() );
//now add the entry to the embedded file tree and set in the document.
Map<String, COSObjectable> efMap = new HashMap<String, COSObjectable>();
efMap.put("My first attachment", fs );
efTree.setNames( efMap );
//attachments are stored as part of the "names" dictionary in the document catalog
PDDocumentNameDictionary names = new PDDocumentNameDictionary( doc.getDocumentCatalog() );
names.setEmbeddedFiles( efTree );
doc.getDocumentCatalog().setNames( names );

After struggling with the same thing, I've discovered this is a known issue. Attachments haven't worked for a while I guess.
Here's a link to the issue on the apache forum.
There is a hack suggested here that you can use.
I tried it and it worked!
the other work around i found is after you call setNames on your PDEmbeddedFilesNameTreeNode remove the limits: ((COSDictionary
)efTree.getCOSObject()).removeItem(COSName.LIMITS); ugly hack, but it
works, without having to recompile pdfbox

Attachment works fine with new version of PDFBox 2.0,
public static boolean addAtachement(final String fileName, final String... attachements) {
if (Objects.isNull(fileName)) {
throw new NullPointerException("fileName shouldn't be null");
}
if (Objects.isNull(attachements)) {
throw new NullPointerException("attachements shouldn't be null");
}
Map<String, PDComplexFileSpecification> efMap = new HashMap<String, PDComplexFileSpecification>();
/*
* Load PDF Document.
*/
try (PDDocument doc = PDDocument.load(new File(fileName))) {
/*
* Attachments are stored as part of the "names" dictionary in the
* document catalog
*/
PDDocumentNameDictionary names = new PDDocumentNameDictionary(doc.getDocumentCatalog());
/*
* First we need to get all the existed attachments, after that we
* can add new attachments
*/
PDEmbeddedFilesNameTreeNode efTree = names.getEmbeddedFiles();
if (Objects.isNull(efTree)) {
efTree = new PDEmbeddedFilesNameTreeNode();
}
Map<String, PDComplexFileSpecification> existedNames = efTree.getNames();
if (existedNames == null || existedNames.isEmpty()) {
existedNames = new HashMap<String, PDComplexFileSpecification>();
}
for (String attachement : attachements) {
/*
* Create the file specification, which holds the embedded file
*/
PDComplexFileSpecification fs = new PDComplexFileSpecification();
fs.setFile(attachement);
try (InputStream is = new FileInputStream(attachement)) {
/*
* This represents an embedded file in a file specification
*/
PDEmbeddedFile ef = new PDEmbeddedFile(doc, is);
/* Set some relevant properties of embedded file */
ef.setCreationDate(new GregorianCalendar());
fs.setEmbeddedFile(ef);
/*
* now add the entry to the embedded file tree and set in
* the document.
*/
efMap.put(attachement, fs);
}
}
efTree.setNames(efMap);
names.setEmbeddedFiles(efTree);
doc.getDocumentCatalog().setNames(names);
doc.save(fileName);
return true;
} catch (IOException e) {
System.out.println(e.getMessage());
return false;
}
}

To 'locate' or see an attached file in the PDF, you can't flip through its pages to find any trace of it there (like, an annotation).
In Acrobat Reader 9.x for example, you have to click on the "View Attachments" icon (looking like a paper-clip) on the left sidebar.

Related

allow arabic text in pdf table using itext7 (xamarin android)

I have to put my list data in a table in a pdf file. My data has some Arabic words. When my pdf is generated, the Arabic words don't appear. I searched and found that I need itext7.pdfcalligraph so I installed it in my app. I found this code too https://itextpdf.com/en/blog/technical-notes/displaying-text-different-languages-single-pdf-document and tried to do something similar to allow Arabic words in my table but I couldn't figure it out.
This is a trial code before I apply it to my real list:
var path2 = global::Android.OS.Environment.ExternalStorageDirectory.AbsolutePath;
filePath = System.IO.Path.Combine(path2.ToString(), "myfile2.pdf");
stream = new FileStream(filePath, FileMode.Create);
PdfWriter writer = new PdfWriter(stream);
PdfDocument pdf2 = new iText.Kernel.Pdf.PdfDocument(writer);
Document document = new Document(pdf2, PageSize.A4);
FontSet set = new FontSet();
set.AddFont("ARIAL.TTF");
document.SetFontProvider(new FontProvider(set));
document.SetProperty(Property.FONT, "Arial");
string[] sources = new string[] { "يوم","شهر 2020" };
iText.Layout.Element.Table table = new iText.Layout.Element.Table(2, false);
foreach (string source in sources)
{
Paragraph paragraph = new Paragraph();
Bidi bidi = new Bidi(source, Bidi.DirectionDefaultLeftToRight);
if (bidi.BaseLevel != 0)
{
paragraph.SetTextAlignment(iText.Layout.Properties.TextAlignment.RIGHT);
}
paragraph.Add(source);
table.AddCell(new Cell(1, 1).SetTextAlignment(iText.Layout.Properties.TextAlignment.CENTER).Add(paragraph));
}
document.Add(table);
document.Close();
I updated my code and added the arial.ttf to my assets folder . i'm getting the following exception:
System.InvalidOperationException: 'FontProvider and FontSet are empty. Cannot resolve font family name (see ElementPropertyContainer#setFontFamily) without initialized FontProvider (see RootElement#setFontProvider).'
and I still can't figure it out. any ideas?
thanks in advance
- C #
I have a similar situation for Turkish characters, and I've followed these
steps :
Create a folder under projects root folder which is : /wwwroot/Fonts
Add OpenSans-Regular.ttf under the Fonts folder
Path for font is => ../wwwroot/Fonts/OpenSans-Regular.ttf
Create font like below :
public static PdfFont CreateOpenSansRegularFont()
{
var path = "{Your absolute path for FONT}";
return PdfFontFactory.CreateFont(path, PdfEncodings.IDENTITY_H, true);
}
and use it like :
paragraph.Add(source)
.SetFont(FontFactory.CreateOpenSansRegularFont()); //set font in here
table.AddCell(new Cell(1, 1)
.SetTextAlignment(iText.Layout.Properties.TextAlignment.CENTER)
.Add(paragraph));
This is how I used font factory for Turkish characters ex: "ü,i,ç,ş,ö"
For Xamarin-Android, you could try
string documentsPath = Environment.GetFolderPath(Environment.SpecialFolder.MyDocuments);
var path = Path.Combine(documentsPath, "Fonts/Arial.ttf");
Look i fixed it in java,this may help you:
String font = "your Arabic font";
//the magic is in the next 4 lines:
PdfFontFactory.register(font);
FontProgram fontProgram = FontProgramFactory.createFont(font, true);
PdfFont f = PdfFontFactory.createFont(fontProgram, PdfEncodings.IDENTITY_H);
LanguageProcessor languageProcessor = new ArabicLigaturizer();
//and look here how i used setBaseDirection and don't use TextAlignment ,it will work without it
com.itextpdf.kernel.pdf.PdfDocument tempPdfDoc = new com.itextpdf.kernel.pdf.PdfDocument(new PdfReader(pdfFile.getPath()), TempWriter);
com.itextpdf.layout.Document TempDoc = new com.itextpdf.layout.Document(tempPdfDoc);
com.itextpdf.layout.element.Paragraph paragraph0 = new com.itextpdf.layout.element.Paragraph(languageProcessor.process("الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية--الاستماره الالكترونية"))
.setFont(f).setBaseDirection(BaseDirection.RIGHT_TO_LEFT)
.setFontSize(15);

Multi line form filling with iTextSharp

I'm tring to fill a PDF file that has some different form fields, one of those field is a multi line field. I load the values from a txt file (I have to repeat this operation in batch so I've a single txt file for each pdf I've to fill)
My problem is with the \r\n present in the txt file... if I pass on each formfield it writes something as "My text\r\n on new line").
I've tried to use a stringbuilder and .AppendLine method and if I save it's content it correctly shows
My text
on new line
my input file is defined as
<map>
<campoModulo>lo_some_data</campoModulo>
<value>My text\r\non new line</value>
</map>
and here's the code I use to fill the pdf
internal void Process(string p)
{
string output = p.Replace(".pdf", "_new.pdf");
PdfReader reader = new PdfReader(p);
XmlDocument document = new XmlDocument();
document.Load("XML\\mapping.xml");
var items = document.SelectNodes("//map");
PdfStamper pdfStamper = new PdfStamper(reader, new FileStream(
output, FileMode.Create));
var acro = pdfStamper.AcroFields;
if (items.Count > 0)
{
foreach (XmlNode item in items)
{
var campo = item.SelectSingleNode("campoModulo").InnerText;
var valore = item.SelectSingleNode("value").InnerText;
acro.SetField(campo, valore);
}
}
pdfStamper.FormFlattening = true;
pdfStamper.Close();
}
What am I doing wrong?
You have to replace the \r\n for the actual control codes. Try:
text = text.Replace(#"\r\n", "\r\n");

Split a "tagged" PDF document into multiple documents, keeping the tagging

In a project I have to split a PDF document into two documents, one containing all blank pages, and one containing all pages with content.
For this job, I use a PdfReader to read the source file, and two pdfCopy objects (one for the blank pages document, one for the pages with content document) to write the files to.
I use GetImportedPage to read a PdfImportedPage, which is then added to one of the PdfCopy writers.
Now, the problem is the following: the source file is using the "tagged PDF format". To preserve this (which is absolutely required), I use the SetTagged() method on both PdfCopy writers, and use the extra third parameter in GetImportedPage(...) to keep the tagged format. However, when calling the AddPage(...) on the PdfCopy writer, I get an invalid cast exception:
"Unable to cast object of type 'iTextSharp.text.pdf.PdfDictionary' to type 'iTextSharp.text.pdf.PRIndirectReference'."
Anyone has any ideas on how to solve this ? Any hints ?
Also: the project currently refers version 5.1.0.0 of the itext libraries. In 5.4.4.0 the third parameter to GetImportedPage does not seem to be there anymore.
Below, you can find a code extract:
iTextSharp.text.Document targetPdf = new iTextSharp.text.Document();
iTextSharp.text.Document blankPdf = new iTextSharp.text.Document();
iTextSharp.text.pdf.PdfReader sourcePdfReader = new iTextSharp.text.pdf.PdfReader(inputFile);
iTextSharp.text.pdf.PdfCopy targetPdfWriter = new iTextSharp.text.pdf.PdfSmartCopy(targetPdf, new FileStream(outputFile, FileMode.Create));
iTextSharp.text.pdf.PdfCopy blankPdfWriter = new iTextSharp.text.pdf.PdfSmartCopy(blankPdf, new FileStream(blanksFile, FileMode.Append));
targetPdfWriter.SetTagged();
blankPdfWriter.SetTagged();
try
{
iTextSharp.text.pdf.PdfImportedPage page = null;
int n = sourcePdfReader.NumberOfPages;
targetPdf.Open();
blankPdf.Open();
blankPdf.Add(new iTextSharp.text.Phrase("This document contains the blank pages removed from " + inputFile));
blankPdf.NewPage();
for (int i = 1; i <= n; i++)
{
byte[] pageBytes = sourcePdfReader.GetPageContent(i);
string pageText = "";
iTextSharp.text.pdf.PRTokeniser token = new iTextSharp.text.pdf.PRTokeniser(new iTextSharp.text.pdf.RandomAccessFileOrArray(pageBytes));
while (token.NextToken())
{
if (token.TokenType == iTextSharp.text.pdf.PRTokeniser.TokType.STRING)
{
pageText += token.StringValue;
}
}
if (pageText.Length >= 15)
{
page = targetPdfWriter.GetImportedPage(sourcePdfReader, i, true);
targetPdfWriter.AddPage(page);
}
else
{
page = blankPdfWriter.GetImportedPage(sourcePdfReader, i, true);
blankPdfWriter.AddPage(page);
blankPageCount++;
}
}
}
catch (Exception ex)
{
Console.WriteLine("Exception at LOC1: " + ex.Message);
}
The error occurs in the call to targetPdfWriter.AddPage(page); near the end of the code sample.
Thank you very much for your help.
Koen.

Split PDF into separate files based on text

I have a large single pdf document which consists of multiple records. Each record usually takes one page however some use 2 pages. A record starts with a defined text, always the same.
My goal is to split this pdf into separate pdfs and the split should happen always before the "header text" is found.
Note: I am looking for a tool or library using java or python. Must be free and available on Win 7.
Any ideas? AFAIK imagemagick won't work for this. May itext do this? I never used and it's
pretty complex so would need some hints.
EDIT:
Marked Answer led me to solution. For completeness here my exact implementation:
public void splitByRegex(String filePath, String regex,
String destinationDirectory, boolean removeBlankPages) throws IOException,
DocumentException {
logger.entry(filePath, regex, destinationDirectory);
destinationDirectory = destinationDirectory == null ? "" : destinationDirectory;
PdfReader reader = null;
Document document = null;
PdfCopy copy = null;
Pattern pattern = Pattern.compile(regex);
try {
reader = new PdfReader(filePath);
final String RESULT = destinationDirectory + "/record%d.pdf";
// loop over all the pages in the original PDF
int n = reader.getNumberOfPages();
for (int i = 1; i < n; i++) {
final String text = PdfTextExtractor.getTextFromPage(reader, i);
if (pattern.matcher(text).find()) {
if (document != null && document.isOpen()) {
logger.debug("Match found. Closing previous Document..");
document.close();
}
String fileName = String.format(RESULT, i);
logger.debug("Match found. Creating new Document " + fileName + "...");
document = new Document();
copy = new PdfCopy(document,
new FileOutputStream(fileName));
document.open();
logger.debug("Adding page to Document...");
copy.addPage(copy.getImportedPage(reader, i));
} else if (document != null && document.isOpen()) {
logger.debug("Found Open Document. Adding additonal page to Document...");
if (removeBlankPages && !isBlankPage(reader, i)){
copy.addPage(copy.getImportedPage(reader, i));
}
}
}
logger.exit();
} finally {
if (document != null && document.isOpen()) {
document.close();
}
if (reader != null) {
reader.close();
}
}
}
private boolean isBlankPage(PdfReader reader, int pageNumber)
throws IOException {
// see http://itext-general.2136553.n4.nabble.com/Detecting-blank-pages-td2144877.html
PdfDictionary pageDict = reader.getPageN(pageNumber);
// We need to examine the resource dictionary for /Font or
// /XObject keys. If either are present, they're almost
// certainly actually used on the page -> not blank.
PdfDictionary resDict = (PdfDictionary) pageDict.get(PdfName.RESOURCES);
if (resDict != null) {
return resDict.get(PdfName.FONT) == null
&& resDict.get(PdfName.XOBJECT) == null;
} else {
return true;
}
}
You can create a tool for your requirements using iText.
Whenever you are looking for code samples concerning (current versions of) the iText library, you should consult iText in Action — 2nd Edition the code samples from which are online and searchable by keyword from here.
In your case the relevant samples are Burst.java and ExtractPageContentSorted2.java.
Burst.java shows how to split one PDF in multiple smaller PDFs. The central code:
PdfReader reader = new PdfReader("allrecords.pdf");
final String RESULT = "record%d.pdf";
// We'll create as many new PDFs as there are pages
Document document;
PdfCopy copy;
// loop over all the pages in the original PDF
int n = reader.getNumberOfPages();
for (int i = 0; i < n; ) {
// step 1
document = new Document();
// step 2
copy = new PdfCopy(document,
new FileOutputStream(String.format(RESULT, ++i)));
// step 3
document.open();
// step 4
copy.addPage(copy.getImportedPage(reader, i));
// step 5
document.close();
}
reader.close();
This sample splits a PDF in single-page PDFs. In your case you need to split by different criteria. But that only means that in the loop you sometimes have to add more than one imported page (and thus decouple loop index and page numbers to import).
To recognize on which pages a new dataset starts, be inspired by ExtractPageContentSorted2.java. This sample shows how to parse the text content of a page to a string. The central code:
PdfReader reader = new PdfReader("allrecords.pdf");
for (int i = 1; i <= reader.getNumberOfPages(); i++) {
System.out.println("\nPage " + i);
System.out.println(PdfTextExtractor.getTextFromPage(reader, i));
}
reader.close();
Simply search for the record start text: If the text from page contains it, a new record starts there.
Apache PDFBox has a PDFSplit utility that you can run from the command-line.
If you like Python, there's a nice library: PyPDF2. The library is pure python2, BSD-like license.
Sample code:
from PyPDF2 import PdfFileWriter, PdfFileReader
input1 = PdfFileReader(open("C:\\Users\\Jarek\\Documents\\x.pdf", "rb"))
# analyze pdf data
print input1.getDocumentInfo()
print input1.getNumPages()
text = input1.getPage(0).extractText()
print text.encode("windows-1250", errors='backslashreplacee')
# create output document
output = PdfFileWriter()
output.addPage(input1.getPage(0))
fout = open("c:\\temp\\1\\y.pdf", "wb")
output.write(fout)
fout.close()
For non coders PDF Content Split is probably the easiest way without reinventing the wheel and has an easy to use interface: http://www.traction-software.co.uk/pdfcontentsplitsa/index.html
hope that helps.

edit any file which is wrapped in the jar file

I want to implement Following stuff with my java code in eclipse.
i need to edit the .dict file which is in directory of jar file.
my directory structure is like
C:\Users\bhavik.kama\Desktop\Sphinx\sphinx4-1.0beta6-bin\sphinx4-1.0beta6\modified_jar_dict\*WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar*\dict\**cmudict04.dict**
Text with bold character is my text file name which i want to edit
and text with italic foramt is my .jar file
now how can i edit this cmudict04.dict file which is reside in WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar\dict\ directory on runtime with java application.
and i want the jar file with the updated file i have edited.
please can u provide me any help?
thnank you in advance.
I would recommend to use java.util.zip.Using these classes you can read and write the files inside the archive .But modifying the contents is not guaranteed because it may be cached.
Sample tutorial
http://www.javaworld.com/community/node/8362
You can't edit files that are contained in a Jar file and have it saved in the Jar file ... Without, extracting the file first, updating it and creating a new Jar by copying the contents of the old one over to the new one, deleting the old one and renaming the new one in its place...
My suggestion is find a better solution
I had succeded to edit jar file and wrap it back as it is...with the following code
public void run() throws IOException
{
Manifest manifest = new Manifest();
manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
// JarOutputStream target = new JarOutputStream(new FileOutputStream("E:\\hiren1\\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar"), manifest);
// add(new File("E:\\hiren1\\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/"), target);
JarOutputStream target = new JarOutputStream(new FileOutputStream("C:\\Users\\bhavik.kama\\Desktop\\Sphinx\\sphinx4-1.0beta6-bin\\sphinx4-1.0beta6\\modified_jar_dict\\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz.jar"), manifest);
add(new File("C:\\Users\\bhavik.kama\\Desktop\\Sphinx\\sphinx4-1.0beta6-bin\\sphinx4-1.0beta6\\modified_jar_dict\\WSJ_8gau_13dCep_16k_40mel_130Hz_6800Hz/"), target);
target.close();
}
private void add(File source, JarOutputStream target) throws IOException
{
BufferedInputStream in = null;
try
{
if (source.isDirectory())
{
//String name = source.getPath().replace("\\", "/");
if(isFirst)
{
firstDir = source.getParent() + "\\";
isFirst = false;
}
String name = source.getPath();
name = name.replace(firstDir,"");
if (!name.isEmpty())
{
if (!name.endsWith("/"))
name += "/";
JarEntry entry = new JarEntry(name);
entry.setTime(source.lastModified());
target.putNextEntry(entry);
target.closeEntry();
}
for (File nestedFile: source.listFiles())
add(nestedFile, target);
return;
}
String name = source.getPath();
name = name.replace(firstDir,"").replace("\\", "/");
//JarEntry entry = new JarEntry(source.getPath().replace("\\", "/"));
JarEntry entry = new JarEntry(name);
//JarEntry entry = new JarEntry(source.getName());
entry.setTime(source.lastModified());
target.putNextEntry(entry);
in = new BufferedInputStream(new FileInputStream(source));
byte[] buffer = new byte[1024];
while (true)
{
int count = in.read(buffer);
if (count == -1)
break;
target.write(buffer, 0, count);
}
target.closeEntry();
}
finally
{
if (in != null)
in.close();
}
}