Rendering a single font character using ABCPDF

Do ABCPDF 11 and later versions have the ability to get a vector path for an individual glyph?
We need to process every glyph with our neural network to determine the correct encoding of the font. We currently use PDFNet; the functionality we use is described here:
We want to switch to ABCPDF, but we couldn't find the equivalent functionality in its documentation (for example, here).
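Example of what we're looking for (GetVectorPath and GetImage are hypothetical calls showing what we need):
foreach (var charData in fontEncoding)
{
    // either get the glyph outline as a vector path...
    Path path = ABC.GetVectorPath(charData);
    // ...or render the glyph to a bitmap
    Bitmap image = ABC.GetImage(charData);
    // process the glyph using the neural network
}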

Related

Ghostscript adds whitespace no matter what bounding box I use

I'm trying to convert a page of a PDF to an image. It works with most PDFs I've tried, but this one in particular always ends up with a lot of whitespace on one side or strange scaling.
I've tried every combination of the fixed-media, fixed-resolution, fit-page and use-crop/bleed/trim/art-box parameters (and others) to fix the issue, but nothing works. The best I get is the right content size, but offset and cut off.
Here's what it should look like, according to every PDF reader I've tried:
Here's a link to the PDF (8 MB) for testing.
https://drive.google.com/file/d/1ErS3KxADb1YAdzM7FG7T5dO8QnW4l1AQ/view?usp=sharing
Edit 1:
Here's what it looks like using just -dUseCropBox without a cropbox override:
I'm using Ghostscript.NET with very simple code. I create a rasterizer, call Open(PDF file, Ghostscript DLL as bytes), then GetPage(DPI, page number). To use other flags, I add a custom switch to the rasterizer before calling Open:
using System.Drawing;
using System.Drawing.Imaging;
using Ghostscript.NET.Rasterizer;

using (var rasterizer = new GhostscriptRasterizer())
{
    // Extra Ghostscript switches have to be added before Open() is called
    //rasterizer.CustomSwitches.Add("-dFIXEDMEDIA");
    //rasterizer.CustomSwitches.Add("-dFIXEDRESOLUTION");
    //rasterizer.CustomSwitches.Add("-dPSFitPage");
    //rasterizer.CustomSwitches.Add("-dFitPage");
    //rasterizer.CustomSwitches.Add("-dPDFFitPage");
    //rasterizer.CustomSwitches.Add("-dUseCropBox");
    //rasterizer.CustomSwitches.Add("-dPrinted");
    //rasterizer.CustomSwitches.Add("-dUseBleedBox");
    //rasterizer.CustomSwitches.Add("-dUseTrimBox");
    //rasterizer.CustomSwitches.Add("-dUseArtBox");
    //rasterizer.CustomSwitches.Add("-sPAPERSIZE=letter");
    //rasterizer.CustomSwitches.Add("-dORIENT1=true");
    //etc
    rasterizer.Open(pdfFilePath, ghostscriptDLL); // ghostscriptDLL: the Ghostscript DLL as a byte array
    img = rasterizer.GetPage(dpi, pageNumber);
    img.Save(pageFilePath, imageFormat);
}
I'll try again with the latest version of plain Ghostscript (no .NET) and see if that makes a difference.
Edit 2:
Using just gswin64c version 9.55.0 and -dUseCropBox works as KenS said. Since I don't need Ghostscript.NET to do that, that's a good resolution.
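For anyone who wants to drive command-line Ghostscript from code instead of going through Ghostscript.NET, here is a minimal Java sketch (Java 11+) that builds a gs command line rendering a single page to PNG with -dUseCropBox. It assumes the executable (gswin64c on Windows, gs elsewhere) is on the PATH, and the class and method names are hypothetical. The switches used (-dBATCH, -dNOPAUSE, -sDEVICE=png16m, -r, -dFirstPage/-dLastPage, -sOutputFile) are standard Ghostscript options, but check them against the documentation for your version.

```java
import java.io.IOException;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;

final class GhostscriptPage {
    // Builds the argument list for rendering one PDF page to PNG, sized to the CropBox.
    static List<String> buildArgs(String gsExe, Path pdf, Path png, int dpi, int page) {
        List<String> args = new ArrayList<>();
        args.add(gsExe);                 // e.g. "gswin64c" on Windows, "gs" elsewhere
        args.add("-dSAFER");             // restrict file operations by the PDF interpreter
        args.add("-dBATCH");             // exit when done instead of waiting at a prompt
        args.add("-dNOPAUSE");           // don't pause between pages
        args.add("-sDEVICE=png16m");     // 24-bit colour PNG output
        args.add("-r" + dpi);            // output resolution in dpi
        args.add("-dUseCropBox");        // use the CropBox rather than the MediaBox as page size
        args.add("-dFirstPage=" + page); // render only the requested page
        args.add("-dLastPage=" + page);
        args.add("-sOutputFile=" + png);
        args.add(pdf.toString());        // input file goes last
        return args;
    }

    // Runs Ghostscript and returns its exit code (0 means success).
    static int render(String gsExe, Path pdf, Path png, int dpi, int page)
            throws IOException, InterruptedException {
        Process p = new ProcessBuilder(buildArgs(gsExe, pdf, png, dpi, page))
                .inheritIO()             // show Ghostscript's messages in this console
                .start();
        return p.waitFor();
    }
}
```

For example, GhostscriptPage.render("gswin64c", Path.of("in.pdf"), Path.of("page3.png"), 150, 3) should write page 3 at 150 dpi. Passing the arguments as a list avoids quoting problems with paths that contain spaces.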

Two different fonts in one inline object while creating PDF

Is it technically possible to use two different fonts in the same DrawHTMLTextBox call with Debenu Quick PDF Library 10?
If not, is it possible with any other library that can be used in a PHP project (not preferred)?
Currently it is not possible to use two different fonts in the string that you pass to the DrawHTMLTextBox function in Debenu Quick PDF Library. If you want to use a different font for different parts of the string you'll need to use DrawHTMLText instead and change the font using SetHTMLNormalFont prior to each section of the string being drawn.
With this method you'll need to keep track of the width and height of the text you're drawing yourself, but you can do that with the GetHTMLTextHeight, GetHTMLTextWidth and GetHTMLTextLineCount functions.
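The bookkeeping described in this answer can be sketched in Java. Debenu's functions are hidden behind a hypothetical HtmlTextApi interface (the method names are illustrative stand-ins, not the library's actual signatures): each run of text is drawn after switching to its own font, and the next run is placed below it using the measured height. The sketch assumes a top-left origin where y grows downward, and it only stacks runs vertically; putting different fonts side by side on one line would also need the measured widths.

```java
import java.util.List;

final class SegmentedHtmlLayout {
    // Hypothetical stand-in for the library's HTML text functions (not Debenu's real API).
    interface HtmlTextApi {
        void setNormalFont(String fontName);                                // cf. SetHTMLNormalFont
        double textHeight(double width, String html);                       // cf. GetHTMLTextHeight
        void drawText(double left, double top, double width, String html); // cf. DrawHTMLText
    }

    // A run of HTML text that should be drawn in a single font.
    static final class Segment {
        final String fontName;
        final String html;

        Segment(String fontName, String html) {
            this.fontName = fontName;
            this.html = html;
        }
    }

    // Draws each segment below the previous one and returns the y position after the last one.
    static double drawStacked(HtmlTextApi api, List<Segment> segments,
                              double left, double top, double width) {
        double y = top;
        for (Segment s : segments) {
            api.setNormalFont(s.fontName);        // switch fonts before drawing this run
            api.drawText(left, y, width, s.html); // draw the run at the current position
            y += api.textHeight(width, s.html);   // move down by the run's measured height
        }
        return y;
    }
}
```

Keeping the library behind an interface also makes the layout logic easy to test with a fake implementation before wiring it to the real functions.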

Set windows size of QuickLook Plugin

I'm building a QuickLook plugin. I want to change the width of the window that pops up when the user hits the space bar.
I've read that there are two keys in the project's Info.plist file for customising the height and width. But even when I change those values, I can't get the preview window to the size I want.
I don't know what else to try. Any ideas?
Thanks!
I thought I'd dig into this a little. I haven't tried any of the following suggestions, so nobody get their hopes up. I'll assume you're using the generator callback:
OSStatus (*GeneratePreviewForURL)(
    void *thisInterface,
    QLPreviewRequestRef preview,
    CFURLRef url,
    CFStringRef contentTypeUTI,
    CFDictionaryRef options
);
Before anything else, you might manually check the options dictionary argument and verify that the kQLPreviewPropertyWidthKey and kQLPreviewPropertyHeightKey keys are indeed mapped to the desired CFNumber values.
Of each of these properties, Apple's Quick Look Programming Guide says:
Note that this property is a hint; Quick Look might set the width automatically for some types of previews. The value must be encapsulated in a CFNumber object.
(Edit: If your preview representation is flexible, you might try finding a preview type for which QuickLook honors your size hints, as per the statement above. Just a thought.)
Running nm on the QuickLook framework binary revealed some undocumented kQLPreviewProperty* constants besides the aforementioned width and height keys. One that caught my attention was kQLPreviewPropertyAutoSizeKey. Given Apple's statement that Quick Look may ignore the hints and size some previews automatically, this might be significant. Following the convention in QuickLook.framework/Headers/QLBase.h, you might try declaring
extern const CFStringRef kQLPreviewPropertyAutoSizeKey;
Then you could try associating a CFNumber 0 with that property key in the options dictionary. There are other undocumented keys of note, such as kQLPreviewPropertyAttributesKey.
Back to the Info.plist you mentioned: about the QLPreviewWidth and QLPreviewHeight keys, Apple says:
This number gives Quick Look a hint for the width (in points) of previews. It uses these values if the generator takes too long to produce the preview. (emphasis added)
This is where someone makes the terrible suggestion of calling sleep() in your generator. But I'm perplexed as to why Apple would make following the size hints dependent on the generator latency. (?)
Edit: Also note the above statement says the Info.plist hints must be expressed in points (not pixels), a unit dependent on the user's screen resolution.
I recently developed a Quick Look plugin of my own that uses HTML+CSS, and I faced the same problem.
The solution for me was to test the plugin not from within Xcode with qlmanage as the executable, but with the real .qlgenerator installed in my user Library.
When invoking the generator from my user library, the Quick Look window was resized exactly the way I specified in the *-Info.plist.
I've run into the same problem and can offer some clues. In my case I'm generating an image Quick Look preview for my custom file format. I create the context to draw my preview into using
CGContextRef QLPreviewRequestCreateContext(QLPreviewRequestRef preview, CGSize size, Boolean isBitmap, CFDictionaryRef properties);
The curious thing is that if I set isBitmap to true, Quick Look adjusts the preview panel to the size specified for the context (up to a certain size, at least). But if I set isBitmap to false, it seems to disregard the context size and always shows a full-size preview panel, with the vector image scaled to fill the entire panel.
So, if you use a bitmap graphical preview context, it seems the preview panel will be set to the size of the context you specify. However, I haven't found any way to set the size of the panel when using a vector graphic preview context (which is what I want).

How to detect if a pdf is text searchable or non text searchable?

I have a set of PDFs, and I want to process (in VB.NET) only those that are not text-searchable. Can you please tell me how to go about this?
Generally speaking, the way to do this is to open each page, parse its content stream, and check whether any operators that place text on the page are executed.
Let me explain what that means: PDF content is a small RPN language that contains operations that mark the page in some way. For example, you might see something like this:
BT 72 400 Td /F0 12 Tf (Throatwarbler Mangrove) Tj ET
Which means:
Begin a text area
Set the position of the text baseline to (72, 400) in PDF units
Set the font to the resource named F0 in the current page's font resource dictionary, at size 12
Draw the text "Throatwarbler Mangrove"
End a text area
So you can try some shortcuts:
Does my page's resource dictionary contain any fonts?
This will fail in some cases, because some PDF generation tools put fonts into the resource dictionary and don't use them (false positive). It will also fail if the page content contains a Form XObject that contains text (false negative).
Does my page's content stream have BT/ET operators?
This will get you closer, but it will fail if there is no content between them (false positive), or if they're not present but there's a Form XObject that contains text (false negative).
So really, the thing to do is to execute the page's entire content stream, recursing into all Form XObjects, and look for text operators.
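The approach above (tokenize each content stream, watch for text-showing operators, recurse into Form XObjects) can be sketched in Java. This is a minimal sketch under loud assumptions: the content streams are already decompressed, the caller has already resolved which XObject names refer to Form XObjects (real PDFs scope these names per resource dictionary, which the flat forms map ignores), and inline images (BI ... ID ... EI) are not handled, because their binary data would confuse the tokenizer. In practice a PDF library would supply the decoded streams and resources. Only Tj, TJ, ' and " with a non-empty string operand count as text, so an empty BT/ET pair does not. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ContentStreamTextScan {
    private static final Set<String> SHOW_TEXT_OPS = Set.of("Tj", "TJ", "'", "\"");

    // Splits a decoded content stream into tokens: strings, names, numbers, operators, [ ] << >>.
    static List<String> tokenize(String s) {
        List<String> tokens = new ArrayList<>();
        int i = 0, n = s.length();
        while (i < n) {
            char c = s.charAt(i);
            if (Character.isWhitespace(c)) { i++; continue; }
            if (c == '%') {                            // comment: skip to end of line
                while (i < n && s.charAt(i) != '\n' && s.charAt(i) != '\r') i++;
                continue;
            }
            int start = i;
            if (c == '(') {                            // literal string; may nest and contain escapes
                int depth = 0;
                do {
                    char d = s.charAt(i);
                    if (d == '\\') i++;                // skip the escaped character
                    else if (d == '(') depth++;
                    else if (d == ')') depth--;
                    i++;
                } while (i < n && depth > 0);
            } else if ((c == '<' || c == '>') && i + 1 < n && s.charAt(i + 1) == c) {
                i += 2;                                // dictionary delimiter << or >>
            } else if (c == '<') {                     // hex string
                while (i < n && s.charAt(i) != '>') i++;
                i++;
            } else if ("[]{}".indexOf(c) >= 0) {
                i++;
            } else {                                   // name, number or operator
                i++;
                while (i < n && !Character.isWhitespace(s.charAt(i))
                        && "()<>[]{}/%".indexOf(s.charAt(i)) < 0) i++;
            }
            tokens.add(s.substring(start, Math.min(i, n)));
        }
        return tokens;
    }

    private static boolean isOperator(String t) {
        char c = t.charAt(0);
        if ("/(<>[]{}".indexOf(c) >= 0) return false;  // names, strings, arrays, dictionaries
        return !(Character.isDigit(c) || c == '-' || c == '+' || c == '.'); // numbers
    }

    private static boolean isNonEmptyString(String t) {
        return (t.startsWith("(") || (t.startsWith("<") && !t.startsWith("<<"))) && t.length() > 2;
    }

    // True if the stream, or any Form XObject it paints with Do, shows a non-empty string.
    // forms maps XObject names (without '/') to their decoded content streams.
    static boolean hasText(String content, Map<String, String> forms) {
        return hasText(content, forms, new HashSet<>());
    }

    private static boolean hasText(String content, Map<String, String> forms, Set<String> active) {
        List<String> operands = new ArrayList<>();
        for (String t : tokenize(content)) {
            if (!isOperator(t)) { operands.add(t); continue; }
            if (SHOW_TEXT_OPS.contains(t)
                    && operands.stream().anyMatch(ContentStreamTextScan::isNonEmptyString)) {
                return true;
            }
            if (t.equals("Do") && !operands.isEmpty()) {
                String name = operands.get(operands.size() - 1).substring(1); // drop the '/'
                String form = forms.get(name);         // null for images and unknown names
                if (form != null && active.add(name)) { // skip forms already being scanned
                    boolean found = hasText(form, forms, active);
                    active.remove(name);
                    if (found) return true;
                }
            }
            operands.clear();                          // operands belong to a single operator
        }
        return false;
    }
}
```

With the example stream from above, hasText("BT 72 400 Td /F0 12 Tf (Throatwarbler Mangrove) Tj ET", Map.of()) returns true, while a page that only paints an image (q ... cm /Im0 Do Q) returns false.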
Now, there's another approach you can take using Atalasoft's software (disclaimer: I work for Atalasoft and have written most of the PDF handling code; I also worked on Acrobat versions 1-4). Instead of asking "does this page contain any text?", you can ask "does this page contain only a single image?"
bool allPagesImages = true;
using (Document doc = new Document(inputStream))
{
    foreach (Page p in doc.Pages)
    {
        if (!p.SingleImageOnly)
        {
            allPagesImages = false;
            break;
        }
    }
}
This leaves allPagesImages as a pretty decent indication of whether every page is a single image, which, if you're looking for the non-searchable documents to OCR, is probably what you really want.
The downside is that this is a very high price to pay for a single predicate, but it also gets you a PDF rasterizer and the ability to extract the images directly from the file.
Now, I have no doubt that a solid engineer could work their way through the PDF spec and write some code to extend iTextPdfSharp to do this task. I think that if I sat down with it, I might be able to write that predicate in a few days, but I already know most of the PDF spec, so it might take you more like two weeks to a month. It's your choice.
One option you could consider (I haven't tested it yet) is to read the document properties of each PDF file you want to process.
You might check this link:
http://www.codeguru.com/columns/vb/manipulating-pdf-files-with-itextsharp-and-vb.net-2012.htm
For example, read the Producer property of each file as you process it; that's just one example. But my advice is to include your code here so we can try to help you. Bless you.

Special characters in iText

I need help using symbols such as ⎕, ∨, ๐ and Ʌ. When I create a PDF with iText, these symbols do not appear.
What can I do so that these symbols appear?
You have to use a font and an encoding that contain those characters. Your best bet is to use IDENTITY_H as your encoding, as this gives you access to every character in a given font, but you still have to use a font that has those characters.
There are several font-manipulation examples in the fonts chapter of "iText in Action":
http://www.itextpdf.com/book/chapter.php?id=11
The examples are down the right side. Buying the book would probably help too.
I had the same problem, and found that using IDENTITY_H as the encoding works fine.
For example:
java.awt.Font f = ...;
// iText Font looked up by the AWT font's name, using the Identity-H encoding
Font font = FontFactory.getFont(f.getName(), BaseFont.IDENTITY_H);
I don't understand why it doesn't work with BaseFont.WINANSI. WinAnsi is the standard Windows Cp1252 character set, the one used by my JVM. So if the character is displayed correctly in Java, why isn't it in the PDF?
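One way to see why WINANSI cannot work for these characters: WinAnsiEncoding is essentially Cp1252, a single-byte character set with room for at most 256 characters, and none of ⎕, ∨, ๐ or Ʌ is among them. A character displaying correctly in Java only shows that a Java font has the glyph; Java strings are UTF-16 internally regardless of the JVM's default charset. The sketch below uses the JDK's windows-1252 charset as an approximation of PDF's WinAnsiEncoding (they closely match but are not guaranteed identical) to check which characters it can encode; the class name is hypothetical.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

final class WinAnsiCheck {
    // windows-1252 (Cp1252), used here as an approximation of PDF's WinAnsiEncoding.
    private static final Charset CP1252 = Charset.forName("windows-1252");

    // Returns true if every character of text has a code in Cp1252.
    static boolean encodableInCp1252(String text) {
        CharsetEncoder encoder = CP1252.newEncoder(); // encoders aren't thread-safe, so use a fresh one
        return encoder.canEncode(text);
    }

    public static void main(String[] args) {
        // e-acute, euro sign, then the four symbols from the question (written as escapes)
        String[] samples = {"\u00E9", "\u20AC", "\u2395", "\u2228", "\u0E50", "\u0245"};
        for (String s : samples) {
            System.out.printf("U+%04X encodable in Cp1252: %b%n", (int) s.charAt(0), encodableInCp1252(s));
        }
    }
}
```

The e-acute and the euro sign are encodable; the four symbols are not, so with WINANSI iText has no code to write for them, whereas IDENTITY_H addresses glyphs in the font directly.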
You can escape them using the Unicode escape sequences defined in the Java Language Specification. See http://java.sun.com/docs/books/jls/first_edition/html/3.doc.html
If you are using IntelliJ IDEA, you can download the StringManipulation plugin, which does the escaping for you. In IDEA's settings you can also tick the "Transparent native-to-ascii conversion" checkbox under File Encodings, which should also do the trick.
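What these tools do can be approximated in a few lines: replace every non-ASCII character with a \uXXXX escape that javac understands. This is a minimal sketch with a hypothetical class name; characters outside the Basic Multilingual Plane come out as two escapes, one per UTF-16 surrogate, which is exactly how Java source represents them.

```java
final class UnicodeEscaper {
    // Replaces every non-ASCII UTF-16 code unit with a Java \\uXXXX escape.
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        for (int i = 0; i < text.length(); i++) {
            char c = text.charAt(i);
            if (c < 0x80) {
                out.append(c);                                  // plain ASCII is kept as-is
            } else {
                out.append(String.format("\\u%04X", (int) c)); // e.g. U+2228 becomes a backslash, then "u2228"
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // The symbols from the question, printed as escapes ready to paste into source code
        System.out.println(escape("\u2395 \u2228 \u0E50 \u0245"));
    }
}
```

The escaped form only makes the source file encoding-proof; the PDF still needs a font that contains the glyphs and an encoding such as IDENTITY_H.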
To draw a square (□) in a PDF file with iText:
BaseFont bf = BaseFont.createFont("c:/windows/fonts/arialbd.ttf", BaseFont.IDENTITY_H, BaseFont.EMBEDDED);
// U+25A1 is WHITE SQUARE; question is the element being built (not shown)
question.add(new Phrase("\u25A1", new Font(bf, 26)));
You can see an example PDF file here.