How to create PDF/UA in iText 7 with a text hyperlink

I am trying to create a PDF/UA-compliant file that contains a text hyperlink with iText 7. Both the Acrobat Preflight test for PDF/UA and the PDF Accessibility Checker (PAC 3) report that the PDF is not compliant.
PAC 3 says '"Link" annotation is not nested inside a "Link" structure element', and the Acrobat Preflight test says the Link annotation does not have an alternate description in the Contents key.
The following is my attempt to create PDF/UA compliant output that contains a text hyperlink.
Any advice would be appreciated.
public void testHyperLink() throws IOException {
    // Create PDF/UA with text hyperlink
    String filename = "./results/HyperLink.pdf";
    WriterProperties properties = new WriterProperties();
    properties.addUAXmpMetadata().setPdfVersion(PdfVersion.PDF_1_7);
    PdfWriter writer = new PdfWriter(filename, properties);
    pdfDoc = new PdfDocument(writer);
    // Make document tagged
    pdfDoc.setTagged();
    pdfDoc.getCatalog().setLang(new PdfString("en-US"));
    pdfDoc.getCatalog().setViewerPreferences(new PdfViewerPreferences().setDisplayDocTitle(true));
    PdfDocumentInfo info = pdfDoc.getDocumentInfo();
    info.setTitle("Hello Hyperlinks!");
    document = new Document(pdfDoc);
    // Must embed font for PDF/UA
    byte[] inputBytes = Files.readAllBytes(Paths.get("./resources/fonts/opensans-regular.ttf"));
    boolean embedded = true;
    boolean cached = false;
    PdfFont font = PdfFontFactory.createFont(inputBytes, PdfEncodings.CP1252, embedded, cached);
    Text text = new Text("This is a Text link");
    text.setFont(font);
    text.setFontSize(16F);
    // Add alternate text for hyperlink
    text.getAccessibilityProperties().setAlternateDescription("Click here to go to the iText website");
    PdfAction act = PdfAction.createURI("https://itextpdf.com/");
    text.setAction(act);
    Paragraph para = new Paragraph();
    para.add(text);
    document.add(para);
    document.close();
    System.out.println("Created " + filename);
}

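A Link object might be what you want. The snippet below uses the .NET (C#) method casing; in the Java API the same methods start with a lowercase letter (for example createURI and setFont):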
Link lnk = new Link("This is a Text link",
PdfAction.CreateURI("https://itextpdf.com/"));
lnk.SetFont(font);
lnk.GetLinkAnnotation().SetBorder(new PdfAnnotationBorder(0, 0, 0));//Remove the default border
lnk.GetAccessibilityProperties().SetAlternateDescription("Click here to go to the iText website");
Paragraph para = new Paragraph();
para.Add(lnk);
document.Add(para);

Related

How to pass font name as string in pdf file with Java iText

I am generating pdf report with few inputs like font name, font size. I tried to create a font using below code.
Font font = new Font(FontFamily.TIMES_ROMAN,50.0f,Font.UNDERLINE,BaseColor.RED);
How can I pass the font name (here TIMES_ROMAN) as a string?
Here's a quick way to achieve the desired behavior with iText 7:
final PdfDocument pdfDocument = new PdfDocument(new PdfWriter(DEST));
PdfFont font = PdfFontFactory.createFont(FontProgramFactory.createFont(StandardFonts.TIMES_ROMAN));
Style myStyle = new Style()
        .setFontSize(50)
        .setUnderline()
        .setFontColor(RED)
        .setFont(font);
try (final Document document = new Document(pdfDocument)) {
    document.add(new Paragraph("Hello World!").addStyle(myStyle));
    document.add(new Paragraph("Hello World!").setFont(font)
            .setFontSize(50)
            .setUnderline()
            .setFontColor(RED));
}
You can also define the font at the Document level (the code above shows it via a Style and directly on the Paragraph).
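On the original question of passing the name as a string: as far as I can tell, the StandardFonts constants in iText 7 are themselves plain strings holding the standard PDF font names (StandardFonts.TIMES_ROMAN is "Times-Roman"), and PdfFontFactory.createFont also accepts such a name as a String. So one option is to normalize the user's input to one of those names first. The following is a minimal, stdlib-only sketch; the class FontNameResolver and its methods are hypothetical names for illustration and are not part of iText.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper (not an iText class): maps loosely formatted user input
// such as "TIMES_ROMAN" to a standard PDF font name such as "Times-Roman".
class FontNameResolver {
    private static final Map<String, String> STANDARD_FONTS = new LinkedHashMap<>();

    static {
        // The 14 standard PDF font names.
        String[] names = {
            "Courier", "Courier-Bold", "Courier-Oblique", "Courier-BoldOblique",
            "Helvetica", "Helvetica-Bold", "Helvetica-Oblique", "Helvetica-BoldOblique",
            "Times-Roman", "Times-Bold", "Times-Italic", "Times-BoldItalic",
            "Symbol", "ZapfDingbats"
        };
        for (String name : names) {
            STANDARD_FONTS.put(normalize(name), name);
        }
    }

    // Lowercase the input and drop everything except letters and digits,
    // so "TIMES_ROMAN", "times roman" and "Times-Roman" share one key.
    static String normalize(String raw) {
        return raw.trim().toLowerCase(Locale.ROOT).replaceAll("[^a-z0-9]", "");
    }

    // Return the standard font name for the input, or fail loudly if unknown.
    static String resolve(String userInput) {
        String name = STANDARD_FONTS.get(normalize(userInput));
        if (name == null) {
            throw new IllegalArgumentException("Unknown standard font: " + userInput);
        }
        return name;
    }

    public static void main(String[] args) {
        System.out.println(resolve("TIMES_ROMAN")); // prints Times-Roman
    }
}
```

The resolved name could then be handed to PdfFontFactory.createFont(...) in place of the constant; fonts outside the standard 14 would need a path to a font file instead.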

How can you copy text from source pdf that includes the formatting information?

I am using iText 7 to experiment with copying a number of separate PDFs into a single PDF document. It's easy to copy the text like this:
var sourcePage = sourcePdf.GetPage(i + 1);
var strategy = new SimpleTextExtractionStrategy();
var text = PdfTextExtractor.GetTextFromPage(sourcePage, strategy);
var currentText = Encoding.UTF8.GetString(Encoding.Convert(Encoding.Default, Encoding.UTF8, Encoding.Default.GetBytes(text)));
PdfFont regular = PdfFontFactory.CreateFont(FontConstants.HELVETICA);
PdfFont bold = PdfFontFactory.CreateFont(FontConstants.HELVETICA_BOLD);
Text first = new Text(currentText).SetFont(regular);
Text second = new Text("TEST TEST").SetFont(bold);
Paragraph paragraph = new Paragraph().Add(first).Add(second);
outDocument.Add(paragraph);
Here I am testing with the Helvetica font, but it needs to be the same as in the source.
The variables "text" and "currentText" are just plain text. How do I get the metadata? The destination document needs to have the same formatting.
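A side note on the Encoding.Convert line above: it encodes the string with Encoding.Default, converts those bytes to UTF-8, and decodes them again, so at best it returns the string unchanged, and it can lose characters that the default code page cannot represent. A small Java sketch of the same round trip shows the effect. Java is used only for illustration, ISO-8859-1 stands in for a legacy default code page, and the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class: mirrors the encode/convert/decode chain from the question.
class EncodingRoundTrip {
    static String roundTrip(String text, Charset legacy) {
        // Step 1: encode with the "default" charset (lossy if a char is unmappable).
        byte[] legacyBytes = text.getBytes(legacy);
        // Step 2: convert legacy bytes to UTF-8 bytes (decode, then re-encode).
        byte[] utf8Bytes = new String(legacyBytes, legacy).getBytes(StandardCharsets.UTF_8);
        // Step 3: decode the UTF-8 bytes back into a string.
        return new String(utf8Bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // Plain ASCII survives unchanged, so the round trip adds nothing.
        System.out.println(roundTrip("Overview line1", StandardCharsets.ISO_8859_1)); // prints Overview line1
        // U+0175 (w with circumflex) is not in ISO-8859-1 and is replaced by '?'.
        System.out.println(roundTrip("\u0175", StandardCharsets.ISO_8859_1)); // prints ?
    }
}
```

In other words, the extracted string can be used directly; the round trip does not recover any formatting information either.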

pdfbox - unable to capture modified values from pdf

I have a requirement to open a PDF in JXBrowser, let the user modify values in the PDF, and, upon saving, read the modified values and save them to a database.
My issue is that I am unable to fetch the modified values from the PDF; acroForm.getField(field name) always returns the original values. Is there another way to solve this problem?
I am using pdfbox 2.0.1
Appreciate your help.
Thanks,
Prasad
Update1:
Adding sample code that I have used in my application
PDDocument PDFDoc = PDDocument.load(new File("complaintform.pdf"));
LoggerProvider.setLevel(Level.OFF);
Base64Encoder b64 = new Base64Encoder();
ByteArrayOutputStream baos = new ByteArrayOutputStream();
PDFDoc.save(baos);
String pdfHTML = "<HTML><BODY style=\"width:100%; height:100%\" > <embed style=\"width:100%; height:100%\" src=\"data:application/pdf;base64,"+b64.encode(baos.toByteArray())+"\" type=\"application/pdf\"></BODY></HTML>";
Browser browser = new Browser();
BrowserView browserView = new BrowserView(browser);
this.add(browserView, BorderLayout.CENTER);
browser.loadHTML(pdfHTML);
save()
{
    PDDocumentCatalog docCatalog = PDFDoc.getDocumentCatalog();
    PDAcroForm acroForm = docCatalog.getAcroForm();
    PDField field = acroForm.getField("last");
    String modifiedValue = field.getValueAsString();
}
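One thing worth checking in the code above: the embed tag receives a Base64 copy of the bytes written by PDFDoc.save(baos), so whatever the user edits in the browser lives in the viewer's copy, and the in-memory PDFDoc never sees it. That would explain why getField keeps returning the original value, although this is a reading of the snippet rather than a confirmed diagnosis. A minimal stdlib-only sketch of the copy semantics follows, with hypothetical helper names and placeholder bytes instead of a real PDF.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical demo class: shows that a data URI carries an independent copy of the bytes.
class DataUriCopyDemo {
    // Build a data URI like the one placed in the embed tag's src attribute.
    static String toPdfDataUri(byte[] pdfBytes) {
        return "data:application/pdf;base64," + Base64.getEncoder().encodeToString(pdfBytes);
    }

    // Decode the payload again, as a viewer would before displaying it.
    static byte[] fromPdfDataUri(String uri) {
        String payload = uri.substring(uri.indexOf(',') + 1);
        return Base64.getDecoder().decode(payload);
    }

    public static void main(String[] args) {
        // Placeholder bytes standing in for the saved document.
        byte[] original = "%PDF-1.4 placeholder".getBytes(StandardCharsets.US_ASCII);
        byte[] viewerCopy = fromPdfDataUri(toPdfDataUri(original));

        // Simulate an edit made inside the viewer.
        viewerCopy[0] = '#';

        System.out.println(Arrays.equals(original, viewerCopy)); // prints false
        System.out.println((char) original[0]);                  // prints %
    }
}
```

If that reading is right, the edited bytes would have to be sent back from the viewer (for example via a form submit or a save-to-file step) and loaded into a fresh PDDocument before the fields are read.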

How to add a rich Textbox (HTML) to a table cell?

I have a rich text box named "DocumentContent", and I want to add its content to a PDF using the code below:
iTextSharp.text.Font font = FontFactory.GetFont(@"C:\Windows\Fonts\arial.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED, 12f, Font.NORMAL, BaseColor.BLACK);
DocumentContent = System.Web.HttpUtility.HtmlDecode(DocumentContent);
Chunk chunkContent = new Chunk(DocumentContent);
chunkContent.Font = font;
Phrase PhraseContent = new Phrase(chunkContent);
PhraseContent.Font = font;
PdfPTable table = new PdfPTable(2);
table.WidthPercentage = 100;
PdfPCell cell;
cell = new PdfPCell(new Phrase(PhraseContent));
cell.Border = Rectangle.NO_BORDER;
table.AddCell(cell);
The problem is that when I open the PDF file, the content appears as raw HTML instead of text, as shown below:
<p>Overview  line1 </p><p>Overview  line2
</p><p>Overview  line3 </p><p>Overview 
line4</p><p>Overview  line4</p><p>Overview 
line5 </p>
But it should look like this:
Overview line1
Overview line2
Overview line3
Overview line4
Overview line4
Overview line5
What I want is to keep all the styling that the user applies to the rich text and just change the font family to Arial.
I can change the font family, but I need to decode this content from HTML to text.
Could you please advise?
Thanks
Please take a look at the HtmlContentForCell example.
In this example, we have the HTML you mention:
public static final String HTML = "<p>Overview line1</p>"
+ "<p>Overview line2</p><p>Overview line3</p>"
+ "<p>Overview line4</p><p>Overview line4</p>"
+ "<p>Overview line5 </p>";
We also create a font for the <p> tag:
public static final String CSS = "p { font-family: Cardo; }";
In your case, you may want to replace Cardo with Arial.
Note that we registered the regular version of the Cardo font:
FontFactory.register("resources/fonts/Cardo-Regular.ttf");
If you need bold, italic and bold-italic, you also need to register those fonts of the same Cardo family. (In case of arial, you'd register arial.ttf, arialbd.ttf, ariali.ttf and arialbi.ttf).
Now we can parse this HTML and CSS into a list of Element objects with the parseToElementList() method. We can use these objects inside a cell:
PdfPTable table = new PdfPTable(2);
table.addCell("Some rich text:");
PdfPCell cell = new PdfPCell();
for (Element e : XMLWorkerHelper.parseToElementList(HTML, CSS)) {
    cell.addElement(e);
}
table.addCell(cell);
document.add(table);
See html_in_cell.pdf for the resulting PDF.
I do not have the time/skills to provide this example in iTextSharp, but it should be very easy to port this to C#.
Finally, I wrote this code in C#, which works perfectly. Thanks to Bruno, who helped me understand XMLWorker.
Here is an example using XMLWorker in C#.
I used a sample HTML as below:
public static string HTML = "<p>Overview line1âââŵẅẃŷûâàêÿýỳîïíìôöóòêëéèẁẃẅŵùúúüûàáäâ</p>"
+ "<p>Overview line2</p><p>Overview line3</p>"
+ "<p>Overview line4</p><p>Overview line4</p>"
+ "<p>Overview line5 </p>";
I created a Test.css file and saved it in the SharePoint Style Library. (For this test I saved it on the D drive to keep it simple.)
Here is the content of my test css file:
p { font-family: arial; }
Then, using the C# code below, I saved the PDF file to the D drive. (In SharePoint I used a MemoryStream; I kept this example simple so it is easy to follow.)
string fileName = @"D:\Test.pdf";
var css = @"D:\Test.css";
using (var ActionStream = new MemoryStream(UTF8Encoding.UTF8.GetBytes(HTML)))
{
    using (FileStream cssFile = new FileStream(css, FileMode.Open))
    {
        var document = new Document(PageSize.A4, 30, 30, 10, 10);
        var worker = XMLWorkerHelper.GetInstance();
        var writer = PdfWriter.GetInstance(document, new FileStream(fileName, FileMode.Create));
        document.Open();
        worker.ParseXHtml(writer, document, ActionStream, cssFile);
        writer.CloseStream = false;
        document.Close();
    }
}
It creates the Test.pdf file from my HTML with the font family set to Arial, so all of the Welsh characters can be saved in the PDF file.
Note: I have added iTextSharp.dll v:5.5.3 and XMLworker.dll v: 5.5.3 to my project.
using iTextSharp.text;
using iTextSharp.text.html;
using iTextSharp.text.pdf;
using iTextSharp.tool.xml;
using iTextSharp.tool.xml.css;
using iTextSharp.tool.xml.html;
using iTextSharp.tool.xml.parser;
using iTextSharp.tool.xml.pipeline;
Hope this can be useful.
Kate

itextsharp stamper vs smartcopy, show pdf in browser without toolbar or navpanes

My application is grabbing pdf bytes from our db and sending the pdf to an iframe, using the itextsharp library. When the pdf is displayed in the iframe, the toolbar and navigation pane show, but we'd like to hide those. When I load a pdf document by simply typing in the pdf's url with #toolbar=0&navpanes=0, I see the result I'm looking for.
The application logic is using PdfStamper to add some buttons and other data to the pdf. When I write the pdf to the Response.Outputstream, the pdf shows up with the added buttons, and all is good except that I can't get rid of the toolbar and navpanes. I've tried adding "toolbar=0&navpanes=0" to the url in the response header, but to no avail.
I've written a test application which shows that using PdfSmartCopy instead of the stamper works perfectly - the pdf is shown in the browser which hides the toolbar and navpane by default.
The problem is that I still need to add some buttons to the pdf via the stamper. I've written a test app which adds the buttons via the stamper, then the smart copy grabs each page from the stamper and writes all this out to the Response.Output. The pdf shows in the browser with no toolbar or navpanes, but the buttons are not there.
Here is the code which uses both the stamper and the smart copy - your help is greatly appreciated:
private void SendStamperToCopy()
{
    try
    {
        String filePath = @"C:\debug\PerfIndicWithDefaults.pdf";
        byte[] pdfBytes = ReadFile(filePath);
        Document document = new Document();
        PdfSmartCopy copy = new PdfSmartCopy(document, Response.OutputStream);
        document.Open();
        MemoryStream memStream = new MemoryStream();
        PdfReader reader = new PdfReader(pdfBytes);
        PdfStamper pdfStamper = new PdfStamper(reader, memStream);
        // add a button with the stamper
        iTextSharp.text.Rectangle rectCancel = new iTextSharp.text.Rectangle(50, 50, 20, 20);
        PushbuttonField btnCancel = new PushbuttonField(pdfStamper.Writer, rectCancel, "Cancel");
        btnCancel.Text = "Cancel";
        iTextSharp.text.pdf.PdfAnnotation fieldCancel = btnCancel.Field;
        pdfStamper.AddAnnotation(fieldCancel, 1);
        int numOfPgs = reader.NumberOfPages;
        for (int n = 1; n <= numOfPgs; n++)
        {
            copy.AddPage(pdfStamper.GetImportedPage(reader, n));
        }
        String headerStr = "inline; filename=PerfIndicWithDefaults.pdf";
        Response.AppendHeader("content-disposition", headerStr);
        Response.ContentType = "application/pdf";
        Response.OutputStream.Flush();
        document.Close();
        Response.OutputStream.Close();
    }
    catch (Exception ex)
    {
        Console.Write(ex);
        Response.OutputStream.Flush();
        Response.OutputStream.Close();
    }
}
If I understand your question correctly, you want to use PdfStamper to add a button and you want to change the viewer preferences. This can be done like this:
PdfReader reader = new PdfReader(source);
System.IO.MemoryStream m = new System.IO.MemoryStream();
PdfStamper stamper = new PdfStamper(reader, m);
stamper.ViewerPreferences = PdfWriter.HideToolbar | PdfWriter.PageModeUseNone;
stamper.Close();
reader.Close();
The HideToolbar will hide the toolbar, whereas PageModeUseNone means that you don't show any panels (such as the bookmarks panel, etc...).
It is not clear why you would need PdfSmartCopy in this context. Maybe I'm missing something. Also, there are some strange errors in your code: you never close the stamper instance, yet you import a page from the stamper into the copy instance. I've never seen anyone try that. It's certainly not what I had in mind when I wrote iText. Your code is very confusing to me.
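Regarding the URL-based approach mentioned in the question: #toolbar=0&navpanes=0 is a URL fragment. It is interpreted by the viewer on the client and is not part of what the server sends, so it belongs on the URL that the iframe loads rather than in a response header. Whether a given viewer honors these parameters varies. A minimal sketch of building such a URL, assuming a hypothetical helper class and an example.com placeholder address (Java used for illustration):

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical helper: appends PDF open parameters as a URL fragment.
class PdfOpenParams {
    static String withOpenParams(String pdfUrl, Map<String, String> params) {
        // Drop any fragment that is already present.
        int hash = pdfUrl.indexOf('#');
        String base = hash >= 0 ? pdfUrl.substring(0, hash) : pdfUrl;
        if (params.isEmpty()) {
            return base;
        }
        // Join the parameters as key=value pairs separated by '&'.
        StringJoiner fragment = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            fragment.add(entry.getKey() + "=" + entry.getValue());
        }
        return base + "#" + fragment;
    }

    public static void main(String[] args) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("toolbar", "0");
        params.put("navpanes", "0");
        // prints https://example.com/report.pdf#toolbar=0&navpanes=0
        System.out.println(withOpenParams("https://example.com/report.pdf", params));
    }
}
```

The viewer preferences set through the stamper travel inside the PDF itself, so the two approaches can be combined.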