Add text to existing PDF using Python and PDFTron - pdf

I’m trying to add text dynamically to an existing PDF with a font available in the system using pdfnetpython3 9.1.0 from the PDFTron SDK. I’m using the following Python code:
texts = ["test1", "test2"]
ewriter = ElementWriter()
ebuilder = ElementBuilder()
# not shown here: create PDFDoc object from PDF on disk
page = pdf_doc.GetPage(3)
ewriter.Begin(page)
element = ebuilder.CreateTextBegin(Font.Create(pdf_doc.GetSDFDoc(), "Inter", ""), 11)
ewriter.WriteElement(element)
for i, text in enumerate(texts):
    element = ebuilder.CreateTextRun(text)
    element.SetTextMatrix(1, 0, 0, 1, 0, 20*i)
    ewriter.WriteElement(element)
element = ebuilder.CreateTextEnd()
ewriter.WriteElement(element)
ewriter.End() # save changes to the current page
However, the text shows up as Japanese characters in the saved PDF. Am I loading the font incorrectly by passing its name and an empty string as char_set? I’ve followed the sample that loads the Helvetica font at https://www.pdftron.com/documentation/samples/py/UnicodeWriteTest.
I have tried to call PDFNet.AddFontSubst() with the Inter font file path as parameter before rendering the strings, but I got the same result.
I’m only able to load a font correctly if I pass a Font.StandardType1Font value (e.g. Font.e_times_roman) instead of a font name to Font.Create(). (The reference for this is from the .NET SDK, but the .NET and Python SDKs use the same class and method names.)

Related

How to adjust font size in syncfusion html to pdf

I am able to convert HTML to PDF using syncfusion.
The issue I have is that it doesn't obey the font sizes:
var htmlText = "<html><head><style>body{font-size:50px;}</style></head><body>Hello</body></html>";
var convertedHtmlDocument = ConvertFromHtml(htmlText);
var ms = new MemoryStream();
var fpath = AppDomain.CurrentDomain.BaseDirectory + "myfile.pdf";
SaveToFile(convertedHtmlDocument, fpath);
ms.Dispose();
It doesn't matter if I make the font size in the CSS 50 or 5, the font comes out the same size.
I also tried (same issue):
var htmlText = "<html><head><style>.myclass {font-size:50px;}</style></head><body><div class=\"myapp\">Hello</div></body></html>";
Exporting the above 2 to an .html document behaves as the syntax suggests.
If I change my CSS to use table then it works, but I want to have a single font size for the document, not just the table
var htmlText = "<html><head><style>table{font-size:50px;}</style></head><body>Hello</body></html>"; //this works
What am I doing wrong?
We checked the reported issue with different font sizes, and it works properly on our end. We have attached a sample and its output for your reference.
Sample: https://www.syncfusion.com/downloads/support/directtrac/general/ze/WPF1621828031
Output: https://www.syncfusion.com/downloads/support/directtrac/general/ze/Output259867595
If you are still facing the issue, please share the modified code sample, the input HTML text, the product version, and a screenshot of the issue so that we can check it on our end. This will help us analyse the problem and assist you further.

Is it possible to save a base64 string as an image in a image file using only PhantomJS?

I'm trying to capture a particular element on a web page using PhantomJS. Using getBoundingClientRect(), I'm able to clip off the unnecessary elements (the entire page gets rendered and is then clipped). Now I'm trying to focus on and capture a particular canvas component and store it in an image file. Once the base64 string is obtained, how do I save it as an image in an image file without the aid of any utility like casperjs? The code below doesn't work for me.
img = chart1.canvas.toDataURL();
ext = img.split(';')[0].match(/jpeg|png|gif/)[0];
data = img.replace(/^data:image\/\w+;base64,/, "");
fs.write('myChart.png', data, 'w');
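One likely reason the snippet above fails is that it writes the base64 text itself to disk, so myChart.png ends up containing ASCII characters rather than image bytes. The payload has to be base64-decoded first and then written in binary mode. Below is a minimal sketch of that step, written in Python rather than PhantomJS purely for illustration; save_data_url is a hypothetical helper name, not part of any library.

```python
import base64
import re


def save_data_url(data_url: str, path: str) -> str:
    """Decode a base64 image data URL and write the raw bytes to `path`.

    Returns the image type found in the URL prefix (e.g. "png").
    Hypothetical helper for illustration only.
    """
    # Split "data:image/<type>;base64,<payload>" into type and payload.
    match = re.match(r"^data:image/(\w+);base64,(.*)$", data_url, re.DOTALL)
    if match is None:
        raise ValueError("not a base64 image data URL")
    image_type, payload = match.groups()

    # Decode the base64 text back into the original image bytes.
    raw = base64.b64decode(payload)

    # Binary mode matters: text mode may re-encode or translate the bytes.
    with open(path, "wb") as fh:
        fh.write(raw)
    return image_type
```

The same two steps (decode, then write as binary) apply whatever the runtime is; only the function names differ.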

How to embed font into pdf/a using iText7

I'm trying to see how to embed fonts into my pdf/a.
I found a lot of answers, but they use iTextSharp.
In my case I use iText7, and everything I tried gave me the error:
"All the fonts must be embedded..."
I have a ttf file for my font but I didn't find a way to embed it into my pdf to use it...
Could someone help me?
Thanks in advance
kor6k
As documented in the tutorial and as indicated by the error you mention ("All the fonts must be embedded"), you need to embed the fonts.
You are probably not defining a font, in which case the standard Type 1 font Helvetica will be used. These standard Type 1 fonts are never embedded, hence you need to pick another font.
The example from the tutorial uses the free font FreeSans:
public const String FONT = "resources/font/FreeSans.ttf";
The font object is defined like this:
PdfFont font = PdfFontFactory.CreateFont(FONT, PdfEncodings.WINANSI, true);
This font is used in a Paragraph like this:
Paragraph p = new Paragraph();
p.SetFont(font);
p.Add(new Text("Font is embedded"));
document.Add(p);
This is the C# version. If you need the Java version, take a look at the Java version of the tutorial:
public static final String FONT = "src/main/resources/font/FreeSans.ttf";
PdfFont font = PdfFontFactory.createFont(FONT, PdfEncodings.WINANSI, true);
Paragraph p = new Paragraph();
p.setFont(font);
p.add(new Text("Font is embedded"));
document.add(p);
If you already use this approach, and you still get the error, you probably have some content somewhere for which you didn't define a font that is embedded.

Docx4j v3 Docx to HTML with Images

I'm working to convert a docx to html using Docx4j version 3.
The document contains white space consisting of tabs, spaces and newlines. The resulting HTML either has unrecognized characters or does not preserve whitespace at all.
The java code I'm using is:
WordprocessingMLPackage wordMLPackage = Docx4J.load(is);
HTMLSettings htmlSettings = Docx4J.createHTMLSettings();
htmlSettings.setImageDirPath( System.getProperty("user.dir") + uploadedImagesDirectory );
htmlSettings.setWmlPackage(wordMLPackage);
Docx4J.toHTML(htmlSettings, out, Docx4J.FLAG_EXPORT_PREFER_XSL);
String result = ((ByteArrayOutputStream)out).toString();
How can I preserve the whitespace in the document? Also, is there a method to apply CSS to a particular node? Specifically, I have 3 images which should be evenly spaced horizontally on the page.
I've looked over the documentation and searched online with no success.
Thank you.
I resolved the issue and it was not related to Docx4j.
Docx4j parsed the document perfectly! The problem was related to sending the output in an email.
I set the Spring helper javamail mime encoding to resolve this issue:
MimeMessageHelper message = new MimeMessageHelper(mimeMessage, true, "utf-8");
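For comparison, the same idea (declaring UTF-8 explicitly on the MIME part that carries the HTML) can be sketched with Python's standard email package. This is only an analogy to the Spring call above, not the Spring API, and build_html_message is a hypothetical helper name.

```python
from email.message import EmailMessage


def build_html_message(html: str, subject: str) -> EmailMessage:
    """Wrap an HTML string in a MIME message that declares UTF-8.

    Hypothetical helper for illustration only.
    """
    msg = EmailMessage()
    msg["Subject"] = subject
    # Declaring the charset tells the mail client how to decode
    # non-ASCII characters such as non-breaking spaces or dashes.
    msg.set_content(html, subtype="html", charset="utf-8")
    return msg
```

Without an explicit charset, a receiving client may guess the wrong encoding, which matches the "unrecognized characters" symptom described in the question.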

How to find x,y location of a text in pdf

Is there any tool to find the X-Y location of text content in a PDF file?
Docotic.Pdf Library can do it. See C# sample below:
using (PdfDocument doc = new PdfDocument("your_pdf.pdf"))
{
foreach (PdfTextData textData in doc.Pages[0].Canvas.GetTextData())
Console.WriteLine(textData.Position + " " + textData.Text);
}
Try running "Preflight..." in Acrobat and choosing PDF Analysis -> List page objects, grouped by type of object.
If you locate the text objects within the results list, you will notice there is a position value (in points) within the Text Properties -> * Font section.
TET, the Text Extraction Toolkit from the pdflib family of products can do that. TET has a commandline interface, and it's the most powerful of all text extraction tools I'm aware of. (It can even handle ligatures...)
Geometry
TET provides precise metrics for the text, such as the position on the page, glyph widths, and text direction. Specific areas on the page can be excluded or included in the text extraction, e.g. to ignore headers and footers or margins.