Use cocoa core text to filter out disabled font - objective-c

I'm trying to use Core Text to list the fonts on a Mac, and I don't want the fonts that are disabled in Font Book. The problem is that I can neither read the enabled state of each font nor filter out the disabled fonts when creating a font collection.
I use CTFontCollectionCreateFromAvailableFonts to get the font collection. Whether I pass options to CTFontCollectionCreateFromAvailableFonts or read the kCTFontEnabledAttribute attribute from the font descriptor, it just doesn't work. Is there anything wrong with my code, or is there another way to get a font's enabled state?
Here's my code:
static CFArrayRef fontDescArray;
static CFIndex fontCount = 0;
static int currentIndex = 0;
CFStringRef keys[] = {kCTFontCollectionIncludeDisabledFontsOption,kCTFontCollectionRemoveDuplicatesOption};
CFTypeRef values[] = {kCFBooleanFalse,kCFBooleanTrue};
CFIndex numValues = 2;
CFDictionaryRef options = CFDictionaryCreate((CFAllocatorRef)NULL,
(const void**)keys,
(const void**)values,
numValues,
&kCFTypeDictionaryKeyCallBacks,
&kCFTypeDictionaryValueCallBacks);
CTFontCollectionRef availableFonts = CTFontCollectionCreateFromAvailableFonts(options);
fontDescArray = CTFontCollectionCreateMatchingFontDescriptors(availableFonts);
fontCount = CFArrayGetCount(fontDescArray);
CTFontDescriptorRef fontDescRef = (CTFontDescriptorRef)CFArrayGetValueAtIndex(fontDescArray, currentIndex);
CFBooleanRef enabled =(CFBooleanRef)CTFontDescriptorCopyAttribute(fontDescRef,kCTFontEnabledAttribute);
Boolean result = CFBooleanGetValue(enabled);
CFRelease stuffs...

Related

ITextSharp crop PDF to remove white margins [duplicate]

I have a PDF that consists of some data followed by some whitespace. I don't know how large the data is, but I'd like to trim off the whitespace following the data.
PdfReader reader = new PdfReader(PDFLOCATION);
Rectangle rect = new Rectangle(700, 2000);
Document document = new Document(rect);
PdfWriter writer = PdfWriter.getInstance(document, new FileOutputStream(SAVELCATION));
document.open();
int n = reader.getNumberOfPages();
PdfImportedPage page;
for (int i = 1; i <= n; i++) {
document.newPage();
page = writer.getImportedPage(reader, i);
Image instance = Image.getInstance(page);
document.add(instance);
}
document.close();
Is there a way to clip/trim the whitespace for each page in the new document?
This PDF contains vector graphics.
I'm using iTextPDF, but can switch to any Java library (mavenized, Apache license preferred).
As no actual solution has been posted, here are some pointers from the accompanying itext-questions mailing list thread:
As you want to merely trim pages, this is not a case of PdfWriter + getImportedPage usage but instead of PdfStamper usage. Your main code using a PdfStamper might look like this:
PdfReader reader = new PdfReader(resourceStream);
PdfStamper stamper = new PdfStamper(reader, new FileOutputStream("target/test-outputs/test-trimmed-stamper.pdf"));
// Go through all pages
int n = reader.getNumberOfPages();
for (int i = 1; i <= n; i++)
{
Rectangle pageSize = reader.getPageSize(i);
Rectangle rect = getOutputPageSize(pageSize, reader, i);
PdfDictionary page = reader.getPageN(i);
page.put(PdfName.CROPBOX, new PdfArray(new float[]{rect.getLeft(), rect.getBottom(), rect.getRight(), rect.getTop()}));
stamper.markUsed(page);
}
stamper.close();
As you see I also added another argument to your getOutputPageSize method to-be. It is the page number. The amount of white space to trim might differ on different pages after all.
If the source document did not contain vector graphics, you could simply use the iText parser package classes. There even already is a TextMarginFinder based on them. In this case the getOutputPageSize method (with the additional page parameter) could look like this:
private Rectangle getOutputPageSize(Rectangle pageSize, PdfReader reader, int page) throws IOException
{
PdfReaderContentParser parser = new PdfReaderContentParser(reader);
TextMarginFinder finder = parser.processContent(page, new TextMarginFinder());
Rectangle result = new Rectangle(finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
System.out.printf("Text/bitmap boundary: %f,%f to %f, %f\n", finder.getLlx(), finder.getLly(), finder.getUrx(), finder.getUry());
return result;
}
Using this method with your file test.pdf results in:
As you see the code trims according to text (and bitmap image) content on the page.
To find the bounding box respecting vector graphics, too, you essentially have to do the same but you have to extend the parser framework used here to inform its listeners (the TextMarginFinder essentially is a listener to drawing events sent from the parser framework) about vector graphics operations, too. This is non-trivial, especially if you don't know PDF syntax by heart yet.
If your PDFs to trim are not too generic but can be forced to include some text or bitmap graphics in relevant positions, though, you could use the sample code above (probably with minor changes) anyways.
E.g. if your PDFs always start with text on top and end with text at the bottom, you could change getOutputPageSize to create the result rectangle like this:
Rectangle result = new Rectangle(pageSize.getLeft(), finder.getLly(), pageSize.getRight(), finder.getUry());
This only trims top and bottom empty space:
Depending on your input data pool and requirements this might suffice.
Or you can use some other heuristics depending on your knowledge on the input data. If you know something about the positioning of text (e.g. the heading to always be centered and some other text to always start at the left), you can easily extend the TextMarginFinder to take advantage of this knowledge.
Recent (April 2015, iText 5.5.6-SNAPSHOT) improvements
The current development version, 5.5.6-SNAPSHOT, extends the parser package to also include vector graphics parsing. This allows for an extension of iText's original TextMarginFinder class implementing the new ExtRenderListener methods like this:
@Override
public void modifyPath(PathConstructionRenderInfo renderInfo)
{
List<Vector> points = new ArrayList<Vector>();
if (renderInfo.getOperation() == PathConstructionRenderInfo.RECT)
{
float x = renderInfo.getSegmentData().get(0);
float y = renderInfo.getSegmentData().get(1);
float w = renderInfo.getSegmentData().get(2);
float h = renderInfo.getSegmentData().get(3);
points.add(new Vector(x, y, 1));
points.add(new Vector(x+w, y, 1));
points.add(new Vector(x, y+h, 1));
points.add(new Vector(x+w, y+h, 1));
}
else if (renderInfo.getSegmentData() != null)
{
for (int i = 0; i < renderInfo.getSegmentData().size()-1; i+=2)
{
points.add(new Vector(renderInfo.getSegmentData().get(i), renderInfo.getSegmentData().get(i+1), 1));
}
}
for (Vector point: points)
{
point = point.cross(renderInfo.getCtm());
Rectangle2D.Float pointRectangle = new Rectangle2D.Float(point.get(Vector.I1), point.get(Vector.I2), 0, 0);
if (currentPathRectangle == null)
currentPathRectangle = pointRectangle;
else
currentPathRectangle.add(pointRectangle);
}
}
@Override
public Path renderPath(PathPaintingRenderInfo renderInfo)
{
if (renderInfo.getOperation() != PathPaintingRenderInfo.NO_OP)
{
if (textRectangle == null)
textRectangle = currentPathRectangle;
else
textRectangle.add(currentPathRectangle);
}
currentPathRectangle = null;
return null;
}
@Override
public void clipPath(int rule)
{
}
(Full source: MarginFinder.java)
Using this class to trim the white space results in
which is pretty much what one would hope for.
Beware: The implementation above is far from optimal. It is not even correct as it includes all curve control points which is too much. Furthermore it ignores stuff like line width or wedge types. It actually merely is a proof-of-concept.
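One of the gaps named above, the ignored line width, can be illustrated without iText. The following is a minimal sketch, not the MarginFinder code: the class name PathBounds and the method bounds are hypothetical, it assumes the points are already in page coordinates, and it assumes that growing the box by half the line width on every side is an acceptable approximation of a stroked outline (joins and caps are still ignored).

```java
import java.awt.geom.Rectangle2D;

// Hypothetical helper for illustration only; not part of iText.
final class PathBounds {

    // xy holds x0, y0, x1, y1, ... in page coordinates (an assumption of this sketch).
    static Rectangle2D.Float bounds(float[] xy, float lineWidth) {
        if (xy.length < 2) {
            return null; // no complete point, so nothing was drawn
        }
        // Start with a zero-sized box at the first point.
        Rectangle2D.Float box = new Rectangle2D.Float(xy[0], xy[1], 0, 0);
        for (int i = 2; i + 1 < xy.length; i += 2) {
            box.add(xy[i], xy[i + 1]); // grow the box so it contains this point
        }
        // A stroke extends half its width to either side of the path.
        float half = lineWidth / 2f;
        box.setRect(box.x - half, box.y - half, box.width + lineWidth, box.height + lineWidth);
        return box;
    }

    public static void main(String[] args) {
        System.out.println(bounds(new float[] {10, 20, 50, 5, 30, 40}, 2f));
    }
}
```

In a listener like the one above, the line width would have to come from the graphics state at the time the path is painted; how to obtain it is not covered by this sketch.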
All test code is in TestTrimPdfPage.java.

Link with ampersands in a pdf generated by xmlworker

When I create a link containing ampersands, the generated PDF has &amp; instead of &.
Because of this, the link is broken.
I'm working on an ASP.NET project with iTextSharp and XMLWorker.
I also tested this in the demo at http://demo.itextsupport.com/xmlworker/ and I see the same problem.
SOLUTION that works for me :
// we create the reader
var reader = new PdfReader(new FileStream(path, FileMode.Open));
// we retrieve the total number of pages
var n = reader.NumberOfPages;
for (var page = 1; page <= n; page++)
{
//Get the current page
var pageDictionary = reader.GetPageN(page);
//Get all of the annotations for the current page
var annots = pageDictionary.GetAsArray(PdfName.ANNOTS);
//Loop through each annotation
if ((annots != null) && (annots.Length != 0))
foreach (var a in annots.ArrayList)
{
//Convert the itext-specific object as a generic PDF object
var annotationDictionary = (PdfDictionary)PdfReader.GetPdfObject(a);
//Make sure this annotation has a link
if (!annotationDictionary.Get(PdfName.SUBTYPE).Equals(PdfName.LINK))
continue;
//Make sure this annotation has an ACTION
if (annotationDictionary.Get(PdfName.A) == null)
continue;
//Get the ACTION for the current annotation
var annotationAction = (PdfDictionary)annotationDictionary.Get(PdfName.A);
//Test if it is a URI action (there are tons of other types of actions, some of which might mimic URI, such as JavaScript, but those need to be handled separately)
if (!annotationAction.Get(PdfName.S).Equals(PdfName.URI)) continue;
var destination = annotationAction.GetAsString(PdfName.URI).ToString();
//Replace the HTML-escaped ampersand with a literal one
destination = destination.Replace("&amp;", "&");
annotationAction.Put(PdfName.URI, new PdfString(destination));
}
}
You should use URL encoding for those special ASCII characters. For example, '&' should be replaced by '%26'. Here's where you can find a full list of these codes http://www.w3schools.com/tags/ref_urlencode.asp
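The two suggestions in this thread address different ampersands. An ampersand that separates query parameters has to be a literal & in the link target, while an ampersand that is part of a parameter value has to be percent-encoded as %26. A minimal Java sketch of both cases follows; the class and method names are hypothetical, and it uses only the standard library rather than iText. Note that URLEncoder applies form encoding, so spaces become + rather than %20.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

// Hypothetical helper for illustration only.
final class LinkAmpersands {

    // Undo HTML escaping of separator ampersands in a link target.
    static String unescapeAmpersands(String uri) {
        return uri.replace("&amp;", "&");
    }

    // Percent-encode a single query value so that '&' inside it is data, not a separator.
    static String encodeQueryValue(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(unescapeAmpersands("http://example.com/?a=1&amp;b=2"));
        System.out.println("http://example.com/?q=" + encodeQueryValue("a&b"));
    }
}
```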

Download font .ttf file from web and store on iPhone

Is it possible to download a .ttf file from the web and store it on the iPhone, then use it for labels and everything else? My client wants to control the fonts from a database and doesn't want to just add them to the Xcode project.
So in the future, if he wants to change the font, he will add the new font to the database; the app will detect the new font on the web (that's already done with images, not a problem), download it, and use it as a font.
Thanks.
Actually it is possible to dynamically add fonts to the iOS runtime like this:
NSData *fontData = /* your font-file data */;
CFErrorRef error = NULL;
// Wrap the raw font bytes in a data provider and create a CGFont from it.
CGDataProviderRef provider = CGDataProviderCreateWithCFData((CFDataRef)fontData);
CGFontRef font = CGFontCreateWithDataProvider(provider);
// Register the font so it can later be looked up by name.
if (! CTFontManagerRegisterGraphicsFont(font, &error)) {
    CFStringRef errorDescription = CFErrorCopyDescription(error);
    NSLog(@"Failed to load font: %@", errorDescription);
    CFRelease(errorDescription);
}
CFRelease(font);
CFRelease(provider);
Source: This Blog Article of Marco Arment.
It is possible. I created an example Swift project on GitHub. You just have to add the few lines below.
var uiFont : UIFont?
let fontData = data
let dataProvider = CGDataProviderCreateWithCFData(fontData)
let cgFont = CGFontCreateWithDataProvider(dataProvider)
var error: Unmanaged<CFError>?
if !CTFontManagerRegisterGraphicsFont(cgFont, &error)
{
print("Error loading Font!")
} else {
let fontName = CGFontCopyPostScriptName(cgFont)
uiFont = UIFont(name: String(fontName) , size: 30)
}
Github project link
The fonts have to be set in the plist of your app, and that file cannot be changed during runtime, so you need to compile your project with the fonts already added to it.
You'll have to think of another way to implement it.
You could use FontLabel (https://github.com/vtns/FontLabel) or something similar to load TTFs from the file system. I don't think you can use downloaded fonts with a UILabel, because you need the plist entries for each font.
Swift 4 solution by extension:
extension UIFont {
/**
A convenient function to create a custom font with downloaded data.
- Parameter data: The local data from the font file.
- Parameter size: Desired size of the custom font.
- Returns: A custom font from the data. `nil` if failure.
*/
class func font(withData data: Data, size: CGFloat) -> UIFont? {
// Convert Data to NSData for convenient conversion.
let nsData = NSData(data: data)
// Convert to CFData and prepare data provider.
guard let cfData = CFDataCreate(kCFAllocatorDefault, nsData.bytes.assumingMemoryBound(to: UInt8.self), nsData.length),
let dataProvider = CGDataProvider(data: cfData),
let cgFont = CGFont(dataProvider) else {
print("Failed to convert data to CGFont.")
return nil
}
// Register the font and create UIFont.
var error: Unmanaged<CFError>?
CTFontManagerRegisterGraphicsFont(cgFont, &error)
if let fontName = cgFont.postScriptName,
let customFont = UIFont(name: String(fontName), size: size) {
return customFont
} else {
print("Error loading Font with error: \(String(describing: error))")
return nil
}
}
}
Usage:
let customFont = UIFont.font(withData: data, size: 15.0)

How a font is detected to be bold/italic/plain that is used in PDF

While extracting content from a PDF using the MuPDF library, I get only the font name, not its font face.
Should I guess (e.g. look for "bold" in the font name, though that is not the right way), or is there another way to detect whether a specific font is bold, italic, or plain?
I have used iTextSharp to extract the font family, font color, etc.
public void Extract_inputpdf() {
text_input_File = string.Empty;
StringBuilder sb_inputpdf = new StringBuilder();
PdfReader reader_inputPdf = new PdfReader(path); //read PDF
for (int i = 1; i <= reader_inputPdf.NumberOfPages; i++) { // page numbers are 1-based
TextWithFont_inputPdf inputpdf = new TextWithFont_inputPdf();
text_input_File = iTextSharp.text.pdf.parser.PdfTextExtractor.GetTextFromPage(reader_inputPdf, i, inputpdf);
sb_inputpdf.Append(text_input_File);
input_pdf = sb_inputpdf.ToString();
}
reader_inputPdf.Close();
clear();
}
public class TextWithFont_inputPdf: iTextSharp.text.pdf.parser.ITextExtractionStrategy {
public void RenderText(iTextSharp.text.pdf.parser.TextRenderInfo renderInfo) {
string curFont = renderInfo.GetFont().PostscriptFontName;
string divide = curFont;
string[] fontnames = null;
//split the words from postscript if u want separate. it will be in this
}
}
public string GetResultantText() {
return result.ToString();
}
The PDF spec contains entries which allow you to specify the style of a font. However unfortunately in the real world you will often find that these are absent.
If the font is referenced rather than embedded, this generally means you are stuck with the PostScript name for the font. It requires some heuristics, but normally the name provides sufficient clues as to the style. It sounds like this is pretty much where you are.
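A name-based heuristic of this kind could look like the following minimal Java sketch. The class name, the method names, and the keyword list ("bold", "italic", "oblique") are assumptions chosen for illustration; real font names vary, so treat the result as a guess. The sketch also strips the subset tag (six uppercase letters and a plus sign) that PDF producers put in front of the names of subsetted fonts.

```java
import java.util.Locale;

// Hypothetical helper for illustration only; the keyword list is an assumption.
final class FontStyleGuess {

    // Remove a subset tag such as "ABCDEF+" from the front of the name.
    static String baseName(String postScriptName) {
        if (postScriptName.matches("[A-Z]{6}\\+.+")) {
            return postScriptName.substring(7);
        }
        return postScriptName;
    }

    static boolean looksBold(String postScriptName) {
        return baseName(postScriptName).toLowerCase(Locale.ROOT).contains("bold");
    }

    static boolean looksItalic(String postScriptName) {
        String n = baseName(postScriptName).toLowerCase(Locale.ROOT);
        return n.contains("italic") || n.contains("oblique");
    }

    public static void main(String[] args) {
        String name = "ABCDEF+Arial-BoldMT";
        System.out.println(baseName(name) + " bold=" + looksBold(name) + " italic=" + looksItalic(name));
    }
}
```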
If the font is embedded you can parse it and try and find style information from the embedded font program. If it is subsetted then in theory this information might be removed but in general I don't think it will be. However parsing TrueType/OpenType fonts is boring and you may not feel that it is worth it.
I work on the ABCpdf .NET software component so my replies may feature concepts based around ABCpdf. It's just what I know. :-)

iTextSharp: Convert PdfObject to PdfStream

I am attempting to pull some font streams out of a pdf file (legality is not an issue, as my company has paid for the rights to display these documents in their original manner - and this requires a conversion which requires the extraction of the fonts).
Now, I had been using MUTool - but it also extracts the images in the pdf as well with no method for bypassing them and some of these contain 10s of thousands of images. So, I took to the web for answers and have come to the following solution:
I get all of the fonts into a font dictionary and then I attempt to convert them into PdfStreams (for flatedecode and then writing to files) using the following code:
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject((PdfObject)cItem.pObj);
PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
try
{
int xrefIdx = ((PRIndirectReference)((PdfObject)cItem.pObj)).Number;
PdfObject pdfObj = (PdfObject)reader.GetPdfObject(xrefIdx);
PdfStream str = (PdfStream)(pdfObj);
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);
}
catch { }
But, when I get to PdfStream str = (PdfStream)(pdfObj); I get the error below:
Unable to cast object of type 'iTextSharp.text.pdf.PdfDictionary'
to type 'iTextSharp.text.pdf.PdfStream'.
Now, I know that PdfDictionary derives from (extends) PdfObject so I am uncertain as to what I am doing incorrectly here. Someone please help - I either need advice on patching this code, or if entirely incorrect, either code to extract the stream properly or direction to a place with said code.
Thank you.
EDIT
My revised code is here:
public static void GetStreams(PdfReader pdf)
{
int page_count = pdf.NumberOfPages;
for (int i = 1; i <= page_count; i++)
{
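PdfDictionary pg = pdf.GetPageN(i);
// Look up the page's resource dictionary, which holds the font dictionary
PdfDictionary res = (PdfDictionary)PdfReader.GetPdfObject(pg.Get(PdfName.RESOURCES));
PdfDictionary fObj = (PdfDictionary)PdfReader.GetPdfObject(res.Get(PdfName.FONT));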
if (fObj != null)
{
foreach (PdfName name in fObj.Keys)
{
PdfObject obj = fObj.Get(name);
if (obj.IsIndirect())
{
PdfDictionary tg = (PdfDictionary)PdfReader.GetPdfObject(obj);
PdfName type = (PdfName)PdfReader.GetPdfObject(tg.Get(PdfName.SUBTYPE));
int xrefIdx = ((PRIndirectReference)obj).Number;
PdfObject pdfObj = pdf.GetPdfObject(xrefIdx);
if (pdfObj != null && pdfObj.IsStream())
{
PdfStream str = (PdfStream)(pdfObj);
byte[] bytes = PdfReader.GetStreamBytesRaw((PRStream)str);
}
}
}
}
}
}
However, I am still receiving the same error - so I am assuming that this is an incorrect method of retrieving font streams. The same document has had fonts extracted using muTool successfully - so I know the problem is me and not the pdf.
There are at least two things wrong in your code:
You cast an object to a stream without performing this check: if (pdfObj != null && pdfObj.isStream()) { // cast to stream } As you get the error message that you're trying to cast a dictionary to a stream, I'm 99% sure that pdfObj.isStream() will return false whereas pdfObj.isDictionary() probably returns true.
You try extracting a stream from PdfReader and you're trying to cast that object to a PdfStream instead of to a PRStream. PdfStream is the object we use to create PDFs, PRStream is the object used when we inspect PDFs using PdfReader.
You should fix this problem first.
Now for your general question. If you read ISO-32000-1, you'll discover that a font is defined using a font dictionary. If the font is embedded (fully or partly), the font dictionary will refer to a stream. This stream can contain the full font information, but most of the time you'll only get a subset of the glyphs (because that's best practice when creating a PDF).
Take a look at the example ListFontFiles from my book "iText in Action" to get a first impression of how fonts are organized inside a PDF. You'll need to combine this example with ISO-32000-1 to find more info about the difference between FONTFILE, FONTFILE2 and FONTFILE3.
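As a rough orientation for the three keys, here is a minimal Java sketch that maps a font descriptor key (and, for FontFile3, the Subtype of the font stream) to the kind of font program it holds, as I read the embedded font section of ISO-32000-1. The class and method names are hypothetical and no iText classes are used; check the specification before relying on the mapping.

```java
// Hypothetical lookup for illustration only; verify against ISO-32000-1.
final class FontFileKeys {

    // key is the font descriptor entry name; subtype is only relevant for FontFile3.
    static String describe(String key, String subtype) {
        switch (key) {
            case "FontFile":
                return "Type 1 font program";
            case "FontFile2":
                return "TrueType font program";
            case "FontFile3":
                if ("Type1C".equals(subtype)) {
                    return "Type 1 font program in Compact Font Format (CFF)";
                }
                if ("CIDFontType0C".equals(subtype)) {
                    return "Type 0 CIDFont program in Compact Font Format (CFF)";
                }
                if ("OpenType".equals(subtype)) {
                    return "OpenType font program";
                }
                return "unknown FontFile3 subtype";
            default:
                return "not an embedded font file key";
        }
    }

    public static void main(String[] args) {
        System.out.println(describe("FontFile2", null));
        System.out.println(describe("FontFile3", "Type1C"));
    }
}
```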
I've also written an example that replaces an unembedded font with a font file: EmbedFontPostFacto. This example serves as an introduction to explain how difficult font replacement is.
Please go to http://tinyurl.com/iiacsCH16 if you need the C# version of the book samples.