Wrong rendering of thousands separator with custom font in itext - pdf

I have a rendering issue since I upgraded to iText 7.2 (from 7.1.7) with a custom font.
The thousand separator is not rendered correctly.
With iText 7.1.7
With iText 7.2
tableArticles.AddCell(new Cell(1, 2)
.Add(new Paragraph($"{bonCommande.Total:C}")
.SetFontSize(14)
.SetBold()
.SetTextAlignment(TextAlignment.RIGHT)
.SetFixedLeading(LeadingSize))
.SetBorder(new SolidBorder(1))
.SetVerticalAlignment(VerticalAlignment.MIDDLE)
.SetPadding(5));
I tried updating the font by drawing a glyph for the Unicode character ARABIC THOUSANDS SEPARATOR (U+066C), but without success.
I'm from Belgium and use "fr-BE".
Thanks for any help.
EDIT :
Here is some code example with the font and pdfs examples.
WeTransfer Link To PDFs and Font
public class PdfCurrencyModel : PageModel
{
private readonly IPdfCurrencyService _pdfCurrencyService;
public PdfCurrencyModel(IPdfCurrencyService pdfCurrencyService)
{
_pdfCurrencyService = pdfCurrencyService;
}
public IActionResult OnGetAsync()
{
//var stream = _pdfCurrencyService.GetPDFMemoryStream();
//return File(stream, MediaTypeNames.Application.Pdf, "Values.pdf");
var bytesArray = _pdfCurrencyService.GetPDFBytesArray();
return new FileContentResult(bytesArray, MediaTypeNames.Application.Pdf);
}
}
public interface IPdfCurrencyService
{
public MemoryStream GetPDFMemoryStream();
public byte[] GetPDFBytesArray();
}
public class PdfCurrencyService : IPdfCurrencyService
{
public MemoryStream GetPDFMemoryStream()
{
// Fonts
var fontTexte = PdfFontFactory.CreateFont(FontConstants.BrownBold,
PdfFontFactory.EmbeddingStrategy.FORCE_EMBEDDED);
// Initialisation
var ms = new MemoryStream();
// Initialize PDF writer
var writer = new iText.Kernel.Pdf.PdfWriter(ms);
writer.SetCloseStream(false);
// Initialize PDF document
var pdfDoc = new iText.Kernel.Pdf.PdfDocument(writer);
// Initialize document
var document = new Document(pdfDoc, iText.Kernel.Geom.PageSize.A4)
.SetFont(fontTexte).SetFontSize(10);
// Values
for (var i = 1000; i <= 2000; i += 100)
document.Add(new Paragraph(i.ToString("C")));
// Close document
document.Close();
//ms.Flush();
ms.Position = 0;
return ms;
}
public byte[] GetPDFBytesArray()
{
return GetPDFMemoryStream().ToArray();
}
}
EDIT 2 : I created a simple web app with the problem
Link to project

When using the "fr-BE" culture this line:
1000.ToString("C")
evaluates to:
1.000,00 €
It seems that iText does some processing with the "." (dot), because neither file is correct.
It seems that 7.2.1 changed the way fonts are handled: 7.1.7 created a simple WinAnsi font, while 7.2.1 creates a Type0 font.
7.1.7 does not write the dot to the PDF file at all:
BT
/F1 10 Tf
36 787.93 Td
(1000,00 €)Tj
ET
7.2.1 writes <0000> for the dot GID (instead of <0073>) in the PDF file (spaces in string added by me to emphasize each GID):
BT
/F1 10 Tf
36 787.93 Td
<0093 0000 00a2 00a2 00a2 0080 00a2 00a2 0074 00e1>Tj
ET
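To make the 7.2.1 output easier to read, the hex string can be decoded into its individual glyph IDs. The following is a minimal Java sketch, assuming the font is written with a 2-byte identity encoding so that every four hex digits form one GID; the class and method names are hypothetical and not part of iText.

```java
import java.util.ArrayList;
import java.util.List;

class HexGlyphString {
    // Splits a PDF hex string such as "<0093 0000 00a2>" into 2-byte glyph IDs.
    // Assumption: every code is exactly four hex digits (identity encoding).
    static List<Integer> glyphIds(String hexString) {
        String hex = hexString.replaceAll("[^0-9A-Fa-f]", "");
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i + 4 <= hex.length(); i += 4) {
            ids.add(Integer.parseInt(hex.substring(i, i + 4), 16));
        }
        return ids;
    }

    // Returns the positions whose glyph ID is 0, i.e. the .notdef glyph.
    static List<Integer> notdefPositions(String hexString) {
        List<Integer> ids = glyphIds(hexString);
        List<Integer> positions = new ArrayList<>();
        for (int i = 0; i < ids.size(); i++) {
            if (ids.get(i) == 0) {
                positions.add(i);
            }
        }
        return positions;
    }

    public static void main(String[] args) {
        String shown = "<0093 0000 00a2 00a2 00a2 0080 00a2 00a2 0074 00e1>";
        System.out.println(notdefPositions(shown)); // prints [1]
    }
}
```

For the string above, only position 1 (the thousands separator) is written as GID 0, which a viewer draws with the font's .notdef glyph.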

I finally managed to make it work.
It appears that iText uses U+202F (NARROW NO-BREAK SPACE) as the thousands separator in 7.2.x versions, which was not the case before.
My font had no glyph defined for that code point, so it rendered as a square.
I copied the glyph from U+0020 (SPACE) into U+202F, and the problem is gone.
In fr-BE and fr-FR the thousands separator is a blank space, not a dot.
var separator = new CultureInfo("fr-BE", false)
.NumberFormat
.CurrencyGroupSeparator;
Debug.WriteLine($"Currency Group Separator for fr-BE is {(int)separator[0]}");
Debug.WriteLine($"As HEX {((int)separator[0]).ToString("X")}");
// Currency Group Separator for fr-BE is 8239
//As HEX 202F
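If editing the font is not an option, another possible workaround is to replace the no-break space variants in the formatted string before handing it to the PDF layer. The following is a minimal Java sketch of that idea, not something taken from iText; the class name, the method name, and the sample amount are hypothetical.

```java
class SeparatorNormalizer {
    // Replaces NARROW NO-BREAK SPACE (U+202F) and NO-BREAK SPACE (U+00A0)
    // with a plain SPACE (U+0020), which most fonts contain.
    static String normalizeSpaces(String formatted) {
        return formatted.replace('\u202F', ' ').replace('\u00A0', ' ');
    }

    public static void main(String[] args) {
        // Hypothetical formatted amount; escapes keep the separators visible in source.
        String formatted = "1\u202F000,00\u00A0\u20AC";
        String safe = normalizeSpaces(formatted);
        System.out.println(safe.indexOf('\u202F')); // prints -1
    }
}
```

The trade-off is that a plain space allows a line break inside the amount, whereas the no-break variants do not, so this only suits text that will not wrap.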
Thanks anyone for your time.

Related

itext html to pdf without embedding fonts

I'm following this guide in Chapter 6 of iText 7: Converting HTML to PDF with pdfHTML on adding extra fonts:
public static final String FONT = "src/main/resources/fonts/cardo/Cardo-Regular.ttf";
public void createPdf(String src, String font, String dest) throws IOException {
ConverterProperties properties = new ConverterProperties();
FontProvider fontProvider = new DefaultFontProvider(false, false, false);
FontProgram fontProgram = FontProgramFactory.createFont(font);
fontProvider.addFont(fontProgram, "Winansi");
properties.setFontProvider(fontProvider);
HtmlConverter.convertToPdf(new File(src), new File(dest), properties);
}
While it's working as expected and embedding subsets of the fonts being used, I'm wondering if there is a way for the resulting PDF document to not embed the fonts at all. This is possible when creating BaseFont instances and setting the embedded property to false and using them to build various PDF building blocks. What I'm looking for is this same behavior when using the HtmlConverter.convertToPdf().
What you should normally do is override FontProvider:
FontProvider fontProvider = new DefaultFontProvider(false, false, false) {
@Override
public boolean getDefaultEmbeddingFlag() {
return false;
}
};
However, the problem is that at the moment this font provider is overwritten by pdfHTML further down the pipeline, in ProcessorContext#reset.
Until this issue is fixed in iText, you can build a custom version of pdfHTML for your needs. The repo is located at https://github.com/itext/i7j-pdfhtml and you are interested in this line. Just replace it with the overload as above and build the jar.
UPD The fix is available starting from pdfHTML 2.1.3. From that version on you can use custom font providers freely.

Trying to replace graphics resources in a PDF - PDFBox 2.0.8

I'm trying to manipulate image resources in some PDF files; the workflow is: extract image resources -> process each -> replace old ones with the new.
Simple task really, I have working code for extracting and replacing, but when I replace, the new file size is nearly twice the original.
To replace the images, I use PDResources.put(COSName, PDXObject). Any ideas what would cause the size increase in the resulting document? It happens even if I completely omit the middle step in the workflow to process each image resource.
public static void PDFBoxReplaceImages() throws Exception {
PDDocument document = PDDocument.load(new File("C:\\Users\\Markus\\workspace\\pdf-test\\book.pdf"));
PDPageTree list = document.getPages();
int counter = 0; // running index used to build each replacement image file name
for (PDPage page : list) {
PDResources pdResources = page.getResources();
for (COSName c : pdResources.getXObjectNames()) {
PDXObject o = pdResources.getXObject(c);
if (o instanceof PDImageXObject) {
counter++;
String path = "C:\\Users\\Markus\\workspace\\pdf-test\\images\\"+counter+".png";
PDImageXObject newImg =
PDImageXObject.createFromFile(path, document);
pdResources.put(c, newImg);
}
}
}
document.save("C:\\Users\\Markus\\workspace\\pdf-test\\book.pdf");
}

Issues with iTextsharp and pdf manipulation

I am getting a pdf-document (no password) which is generated from a third party software with javascript and a few editable fields in it. If I load this pdf-document with the pdfReader class the NumberOfPagesProperty is always 1 although the pdf-document has 17 pages. Oddly enough the document has 17 pages if I save the stream afterwards. When I now try to open the document the Acrobat Reader shows an extended feature warning and the fields are not fillable anymore (I haven't flattened the document). Does anyone know about such a problem?
Background Info:
My job is to remove the javascript code, fill out some fields and save the document afterwards.
I am using the iTextsharp version 5.5.3.0.
Unfortunately I can't upload a sample file because it contains some confidential data.
private byte[] GetDocumentData(string documentName)
{
var document = String.Format("{0}{1}\\{2}.pdf", _component.OutputDirectory, _component.OutputFileName.Replace(".xml", ".pdf"), documentName);
if (File.Exists(document))
{
PdfReader.unethicalreading = true;
using (var originalData = new MemoryStream(File.ReadAllBytes(document)))
{
using (var updatedData = new MemoryStream())
{
var pdfTool = new PdfInserter(originalData, updatedData) {FormFlattening = false};
pdfTool.RemoveJavascript();
pdfTool.Save();
return updatedData.ToArray();
}
}
}
return null;
}
//Old version that wasn't working
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
}
//Solution
public PdfInserter(Stream pdfInputStream, Stream pdfOutputStream, char pdfVersion = '\0', bool append = true)
{
_pdfInputStream = pdfInputStream;
_pdfOutputStream = pdfOutputStream;
_pdfReader = new PdfReader(_pdfInputStream);
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
}
public void RemoveJavascript()
{
for (int i = 0; i < _pdfReader.XrefSize; i++) // valid object numbers are 0 .. XrefSize - 1
{
PdfDictionary dictionary = _pdfReader.GetPdfObject(i) as PdfDictionary;
if (dictionary != null)
{
dictionary.Remove(PdfName.AA);
dictionary.Remove(PdfName.JS);
dictionary.Remove(PdfName.JAVASCRIPT);
}
}
}
The extended feature warning is a hint that the original PDF had been signed using a usage rights signature to "Reader-enable" it, i.e. to tell the Adobe Reader to activate some additional features when opening it, and the OP's operation on it has invalidated the signature.
Indeed, he operated using
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream);
which creates a PdfStamper which completely re-generates the document. To not invalidate the signature, though, one has to use append mode as in the OP's fixed code (for char pdfVersion = '\0', bool append = true):
_pdfStamper = new PdfStamper(_pdfReader, _pdfOutputStream, pdfVersion, append);
If I load this pdf-document with the pdfReader class the NumberOfPagesProperty is always 1 although the pdf-document has 17 pages. Oddly enough the document has 17 pages
Quite likely it is a PDF with an XFA form, i.e. the PDF is only a carrier of some XFA data from which Adobe Reader builds those 17 pages. The actual PDF in that case usually only contains one page saying something like "if you see this, your viewer does not support XFA."
For a final verdict, though, one has to inspect the PDF.

How a font is detected to be bold/italic/plain that is used in PDF

While extracting content from a PDF using the MuPDF library, I only get the font name, not its font face.
Do I have to guess (e.g. look for "bold" in the font name, though that is not the right way), or is there another way to detect whether a specific font is bold, italic, or plain?
I have used iTextSharp to extract the font family, font color, etc.
public void Extract_inputpdf() {
text_input_File = string.Empty;
StringBuilder sb_inputpdf = new StringBuilder();
PdfReader reader_inputPdf = new PdfReader(path); //read PDF
for (int i = 1; i <= reader_inputPdf.NumberOfPages; i++) { // page numbers are 1-based
TextWithFont_inputPdf inputpdf = new TextWithFont_inputPdf();
text_input_File = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader_inputPdf, i, inputpdf);
sb_inputpdf.Append(text_input_File);
input_pdf = sb_inputpdf.ToString();
}
reader_inputPdf.Close();
clear();
}
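public class TextWithFont_inputPdf: iTextSharp.text.pdf.parser.ITextExtractionStrategy {
// Collects the extracted text so that GetResultantText() has something to return
private readonly StringBuilder result = new StringBuilder();
public void BeginTextBlock() { }
public void EndTextBlock() { }
public void RenderImage(iTextSharp.text.pdf.parser.ImageRenderInfo renderInfo) { }
public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo) {
// PostScript name of the font used for this chunk of text
string curFont = renderInfo.GetFont().PostscriptFontName;
string divide = curFont;
string[] fontnames = null;
// split the PostScript name here if you want its parts separately
result.Append(renderInfo.GetText());
}
public string GetResultantText() {
return result.ToString();
}
}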
The PDF spec contains entries which allow you to specify the style of a font. However unfortunately in the real world you will often find that these are absent.
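The entries in question live in the font descriptor: the Flags value (bit 7 marks Italic, bit 19 marks ForceBold), ItalicAngle, and the optional FontWeight. The following is a minimal Java sketch of how those values could be interpreted once they have been read from the descriptor with whichever library is in use. The class and method names are hypothetical, and treating a FontWeight of 700 or more as bold is an assumption rather than a rule from the spec.

```java
class FontDescriptorStyle {
    static final int FLAG_ITALIC = 1 << 6;      // Flags bit 7
    static final int FLAG_FORCE_BOLD = 1 << 18; // Flags bit 19

    // Italic if the flag is set or the glyphs are slanted.
    static boolean isItalic(int flags, double italicAngle) {
        return (flags & FLAG_ITALIC) != 0 || italicAngle != 0.0;
    }

    // fontWeight may be null because the FontWeight entry is optional.
    // ForceBold is only a rendering hint, so it is used as a weak fallback.
    static boolean isBold(int flags, Integer fontWeight) {
        if (fontWeight != null) {
            return fontWeight >= 700;
        }
        return (flags & FLAG_FORCE_BOLD) != 0;
    }

    public static void main(String[] args) {
        // Hypothetical descriptor values: Nonsymbolic + Italic, slanted, weight 700.
        System.out.println(isItalic(96, -12.0)); // prints true
        System.out.println(isBold(96, 700));     // prints true
    }
}
```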
If the font is referenced rather than embedded, this generally means you are stuck with the PostScript name for the font. It requires some heuristics, but normally the name provides sufficient clues as to the style. It sounds like this is pretty much where you are.
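A name-based heuristic of that kind could look like the following minimal Java sketch. The keyword list is an assumption and will not cover every foundry's naming scheme; the class and method names are hypothetical. The only part taken from the PDF spec is the subset tag: six uppercase letters followed by "+" in front of the real name.

```java
import java.util.Locale;

class FontStyleGuesser {
    enum Style { PLAIN, BOLD, ITALIC, BOLD_ITALIC }

    // Removes a subset tag such as "ABCDEF+" from the front of the name.
    static String stripSubsetPrefix(String postScriptName) {
        return postScriptName.replaceFirst("^[A-Z]{6}\\+", "");
    }

    // Guesses the style from keywords in the PostScript name (assumed list).
    static Style guess(String postScriptName) {
        String n = stripSubsetPrefix(postScriptName).toLowerCase(Locale.ROOT);
        boolean bold = n.contains("bold") || n.contains("black") || n.contains("heavy");
        boolean italic = n.contains("italic") || n.contains("oblique") || n.endsWith("-it");
        if (bold && italic) {
            return Style.BOLD_ITALIC;
        }
        if (bold) {
            return Style.BOLD;
        }
        return italic ? Style.ITALIC : Style.PLAIN;
    }

    public static void main(String[] args) {
        System.out.println(guess("ABCDEF+Arial-BoldMT"));   // prints BOLD
        System.out.println(guess("Helvetica-BoldOblique")); // prints BOLD_ITALIC
    }
}
```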
If the font is embedded you can parse it and try and find style information from the embedded font program. If it is subsetted then in theory this information might be removed but in general I don't think it will be. However parsing TrueType/OpenType fonts is boring and you may not feel that it is worth it.
I work on the ABCpdf .NET software component so my replies may feature concepts based around ABCpdf. It's just what I know. :-)

HTML To PDF Turkish Character Problem

I want to convert an ASP.NET web page to PDF using iTextSharp. I wrote some code, but I cannot make it show the Turkish characters. Can anyone help me?
Here is the code:
using System;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
using System.Web.UI;
using System.Web;
using iTextSharp.text.html.simpleparser;
using System.Text;
using System.Text.RegularExpressions;
namespace Presentation
{
public partial class TemporaryStudentFormPrinter : System.Web.UI.Page
{
protected override void Render(HtmlTextWriter writer)
{
MemoryStream mem = new MemoryStream();
StreamWriter twr = new StreamWriter(mem);
HtmlTextWriter myWriter = new HtmlTextWriter(twr);
base.Render(myWriter);
myWriter.Flush();
myWriter.Dispose();
StreamReader strmRdr = new StreamReader(mem);
strmRdr.BaseStream.Position = 0;
string pageContent = strmRdr.ReadToEnd();
strmRdr.Dispose();
mem.Dispose();
writer.Write(pageContent);
CreatePDFDocument(pageContent);
}
public void CreatePDFDocument(string strHtml)
{
string strFileName = HttpContext.Current.Server.MapPath("test.pdf");
Document document = new Document(PageSize.A4, 80, 50, 30, 65);
PdfWriter.GetInstance(document, new FileStream(strFileName, FileMode.Create));
StringReader se = new StringReader(strHtml);
HTMLWorker obj = new HTMLWorker(document);
document.Open();
obj.Parse(se);
document.Close();
ShowPdf(strFileName);
}
public void ShowPdf(string strFileName)
{
Response.ClearContent();
Response.ClearHeaders();
Response.AddHeader("Content-Disposition", "inline;filename=" + strFileName);
Response.ContentType = "application/pdf";
Response.WriteFile(strFileName);
Response.Flush();
Response.Clear();
}
protected void Page_Load(object sender, EventArgs e)
{
}
}
}
iTextSharp.text.pdf.BaseFont STF_Helvetica_Turkish = iTextSharp.text.pdf.BaseFont.CreateFont("Helvetica", "CP1254", iTextSharp.text.pdf.BaseFont.NOT_EMBEDDED);
iTextSharp.text.Font fontNormal = new iTextSharp.text.Font(STF_Helvetica_Turkish, 12, iTextSharp.text.Font.NORMAL);
You should pass the font as an argument in iTextSharp calls, like this:
pdftable.AddCell(new Phrase(nn.InnerText.Trim(), fontNormal));
You might want to consider working with reporting tools that can export to PDF instead of working with PDF directly, which can be a real headache.
You'll need to make sure you are writing the text in a font that supports the Turkish character set (or at least the characters you are trying to write out).
I don't know what HtmlTextWriter does in terms of font use; it will presumably use one of the standard built-in fonts, which are unlikely to support the characters you want to print if they fall outside the Latin1 or Latin1-extended Unicode ranges.
I use BaseFont.createFont(...) to have an external Font included in my PDF in iText (Java) - one which supports all the characters that I am writing. You might be able to create your Font object and then pass it to HtmlTextWriter?
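Before picking an encoding for the font, it can help to check which characters of the text a given single-byte code page cannot represent. The following is a minimal Java sketch using only the JDK's charset support, assuming the runtime provides the windows-1252 and windows-1254 charsets; the class and method names are hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

class EncodingCoverage {
    // Returns the distinct characters of text that the named charset cannot encode.
    static String unencodable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder missing = new StringBuilder();
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (!encoder.canEncode(c) && missing.indexOf(String.valueOf(c)) < 0) {
                missing.append(c);
            }
        }
        return missing.toString();
    }

    public static void main(String[] args) {
        // Turkish lowercase letters g-breve, u-umlaut, s-cedilla, dotless i,
        // o-umlaut, c-cedilla, written as escapes to avoid source encoding issues.
        String sample = "\u011F\u00FC\u015F\u0131\u00F6\u00E7";
        System.out.println(unencodable(sample, "windows-1252").length()); // prints 3
        System.out.println(unencodable(sample, "windows-1254").length()); // prints 0
    }
}
```

The three letters missing from windows-1252 are exactly the ones that tend to disappear from the PDF when a Western code page is used, which is why the answers in this thread switch to CP1254.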
PdfFont font = PdfFontFactory.CreateFont("Helvetica", "CP1254",PdfFontFactory.EmbeddingStrategy.FORCE_NOT_EMBEDDED);
doc.Add(new Paragraph("P").SetFont(font));