AI (Adobe Illustrator) Files Detect Rasterized - adobe-illustrator

Is there a way to detect whether an Adobe Illustrator file contains any rasterized components? Under normal circumstances such a file is vector based (in which case it scales well when the size is increased), but if there's a pasted image in the file, that part of course won't scale. Any ideas? An implementation in any programming language is welcome, although in the end I would be writing it in C#...

Add a COM reference to Illustrator, then check the document's RasterItems collection:
Illustrator.Application app = new Illustrator.Application();
Illustrator.Document doc = app.Open("/FileName.AI", null, null);
// RasterItems lists the raster art embedded in the document
bool hasRaster = doc.RasterItems.Count > 0;
app.Quit();
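If launching Illustrator is not an option, a rough pre-check is possible without it. The sketch below is only a heuristic and rests on two assumptions that are not from the answer above: that the .ai file was saved with PDF compatibility, and that its image dictionaries are not hidden inside compressed object streams. It simply counts image XObject markers in the raw bytes; the function names are made up for illustration.

```python
import re

# Matches image XObject dictionaries such as "/Subtype /Image" or "/Subtype/Image".
_IMAGE_XOBJECT = re.compile(rb"/Subtype\s*/Image\b")


def count_image_markers(data: bytes) -> int:
    """Count image XObject markers in raw PDF-compatible bytes (heuristic)."""
    return len(_IMAGE_XOBJECT.findall(data))


def may_contain_raster(path: str) -> bool:
    """Return True if the file at `path` appears to embed at least one image."""
    with open(path, "rb") as fh:
        return count_image_markers(fh.read()) > 0
```

A result of False does not prove the file is pure vector, so the COM check remains the more reliable route.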

Related

PDF lossy compression

I'm looking for a library or command-line program that can compress PDFs.
Compression speed and file size are very important.
The PDFs are full of very large print-quality images.
Adobe Acrobat does high-quality, fast compression but does not allow "reduced size pdfs" to be saved through a programmatic interface.
Ghostscript does high-quality compression but takes way too long (minutes).
If a commercial library is an option, you could give Amyuni PDF Creator a try. There is a .NET version (C#/VB.NET etc.) and an ActiveX version (for C++/Delphi/VB/PHP etc.).
You can iterate through all the objects on each page, pick those that are images, and reduce their size. You have several options:
Setting a stronger (lower-quality) compression level.
Down-sampling (extracting the image, resizing it to a lower resolution, and putting it back in the file).
Combining the previous two.
Here is what the code would look like for the first option, in C#, using Amyuni PDF Creator .NET:
// open a PDF document ("document" is an existing Amyuni PDF Creator document object)
document.Open("c:\\temp\\myfile.pdf", "");
IacPage page1 = document.GetPage(1);
Amyuni.PDFCreator.IacAttribute attribute = page1.AttributeByName("Objects");
// listobj is an array list of graphic objects
System.Collections.ArrayList listobj = (System.Collections.ArrayList)attribute.Value;
// iterate as IacObject so that AttributeByName is available on each item
foreach (Amyuni.PDFCreator.IacObject pdfObj in listobj)
{
    if ((IacObjectType)pdfObj.AttributeByName("ObjectType").Value == IacObjectType.acObjectTypePicture)
    {
        // step each picture down one JPEG quality level
        if ((IacImageCompressionConstants)pdfObj.AttributeByName("Compression").Value == IacImageCompressionConstants.acCompressionJPegMedium)
            pdfObj.AttributeByName("Compression").Value = IacImageCompressionConstants.acCompressionJPegLow;
        if ((IacImageCompressionConstants)pdfObj.AttributeByName("Compression").Value == IacImageCompressionConstants.acCompressionJPegHigh)
            pdfObj.AttributeByName("Compression").Value = IacImageCompressionConstants.acCompressionJPegMedium;
        // (...)
    }
}
usual disclaimer applies
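For the down-sampling option, the main decision is how far to shrink each image. The sketch below is library-independent arithmetic, not Amyuni code: given an image's pixel width, the width it occupies on the page in PDF points (72 points per inch), and a target resolution, it returns the scale factor to apply. The function name and parameters are assumptions for illustration.

```python
def downsample_scale(pixel_width: int, placed_width_pt: float, target_dpi: float) -> float:
    """Return the scale factor (at most 1.0) that brings an image down to target_dpi.

    pixel_width     -- width of the stored image in pixels
    placed_width_pt -- width the image occupies on the page, in PDF points
    target_dpi      -- desired effective resolution after down-sampling
    """
    if pixel_width <= 0 or placed_width_pt <= 0 or target_dpi <= 0:
        raise ValueError("all arguments must be positive")
    placed_width_in = placed_width_pt / 72.0       # PDF user space: 72 points per inch
    effective_dpi = pixel_width / placed_width_in  # resolution as currently placed
    return min(1.0, target_dpi / effective_dpi)    # never upscale
```

For example, a 3000-pixel-wide image placed 5 inches (360 points) wide is effectively 600 dpi, so reaching 150 dpi means scaling it by 0.25.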
You might want to try Docotic.Pdf library for your task.
Here is code that scales down all images whose width or height is greater than or equal to 256. The scaled images are then encoded using JPEG compression with quality set to 65.
public static void RecompressToJpeg(string path, string outputPath)
{
using (PdfDocument doc = new PdfDocument(path))
{
foreach (PdfImage image in doc.Images)
{
// image that is used as mask or image with attached mask are
// not good candidates for recompression
if (!image.IsMask && image.Mask == null && (image.Width >= 256 || image.Height >= 256))
image.Scale(0.5, PdfImageCompression.Jpeg, 65);
}
doc.Save(outputPath);
}
}
You could also just recompress images without changing their sizes using one of the RecompressWithJpeg methods (or one of other RecompressXXX methods).
And images can be resized to specified width and height using one of the ResizeTo methods. Please note that you will need to take aspect ratio into account in the latter case.
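As a sketch of what taking the aspect ratio into account can look like, the helper below computes the largest width and height that fit inside a bounding box without distorting the image. It is plain arithmetic rather than part of the Docotic.Pdf API, and the function name is hypothetical; the result would then be passed to whichever resize call you use.

```python
def fit_within(width: int, height: int, max_width: int, max_height: int) -> tuple:
    """Return (new_width, new_height) that fits the box and keeps the aspect ratio."""
    if min(width, height, max_width, max_height) <= 0:
        raise ValueError("all dimensions must be positive")
    # Use the more restrictive of the two ratios, and never enlarge the image.
    scale = min(max_width / width, max_height / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

A 4000x3000 image limited to a 1024x1024 box comes out as 1024x768.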
Disclaimer: I work for the vendor of the library.

WinJS / WinRT: detect corrupt image file

I'm building a Win8/WinJS app that loads pictures from the local pictures library. Everything is generally working fine for loading valid images and displaying them in a list view.
Now I need to detect corrupt images and disable parts of the app for those images.
For example, open a text file and enter some text in it. Save the file as .jpg, which is obviously not going to be a valid jpg image. My app still loads the file because of the .jpg name, but now I need to disable certain parts of the app because the image is corrupt.
Is there a way I can check to see if a given image that I've loaded is a valid image file? To check if it's corrupt or not?
I'm using standard WinRT / WinJS objects like StorageFile, Windows.Storage.Search related objects, etc, to load up my image list based on searches for file types.
I don't need to filter out corrupt images from the search results. I just need to be able to tell if an image is corrupt after someone selects it in a ListView.
One possible solution would be to check the image's width and height properties to determine whether it is valid or not.
Yeah, the contentType property will return whatever the file extension is. The best way I can find is to look at the image properties:
file.properties.getImagePropertiesAsync()
    .done(function (imageProps) {
        if (imageProps.width === 0 && imageProps.height === 0) {
            // I'm probably? likely? invalid.
        }
    });
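In C#, you can try to load the file into a BitmapImage and treat an exception as a sign that the image is corrupt or invalid. Here SelectImagePlaceholder is an Image control: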
StorageFile file;
using (IRandomAccessStream fileStream = await file.OpenAsync(Windows.Storage.FileAccessMode.Read))
{
try
{
// Set the image source to the selected bitmap
BitmapImage bitmapImage = new BitmapImage();
await bitmapImage.SetSourceAsync(fileStream);
SelectImagePlaceholder.Source = bitmapImage;
//SelectImagePlaceholder.HorizontalAlignment = HorizontalAlignment.Center;
//SelectImagePlaceholder.Stretch = Stretch.None;
this.SelectImagePlaceholder.DataContext = file;
_curMedia = file;
}
catch (Exception ex)
{
//code Handle the corrupted or invalid image
}
}
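Another way to catch the "text file renamed to .jpg" case is to look at the first few bytes of the file instead of its name. The sketch below is language-neutral logic written in Python, not WinRT code: it compares a header buffer against the well-known signatures of JPEG, PNG, GIF and BMP. In a WinJS app the same comparison would be done on bytes read from the StorageFile; the function and table names here are made up for illustration.

```python
# Leading bytes ("magic numbers") of some common image formats.
SIGNATURES = {
    "jpeg": (b"\xff\xd8\xff",),
    "png": (b"\x89PNG\r\n\x1a\n",),
    "gif": (b"GIF87a", b"GIF89a"),
    "bmp": (b"BM",),
}


def sniff_image_type(header: bytes):
    """Return the format name whose signature starts `header`, or None."""
    for name, prefixes in SIGNATURES.items():
        if header.startswith(prefixes):  # startswith accepts a tuple of prefixes
            return name
    return None
```

This only detects files that are not images at all. A truncated or otherwise damaged JPEG still starts with the right bytes, so an actual decode attempt is still needed for those.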

Page by page conversion of PDF into TIFF with proper compression

Problem
There are PDF documents with different types of objects inside. There is simple text. There can be scanned images that are B&W, and also other images that are true color. The resolution can be quite high for both (~1789x2711).
I need to convert the PDF into a set of single-page TIFF files. There are quite good tools for that, for example IrfanView and ImageMagick. The problem is that I have to define a single compression type for all the pages.
Using JPEG for all pages would result in losing detail in the B&W images, and they would be huge compared to lossless fax compression.
Using lossless fax compression for all pages would wipe out the colors and details of the true-color images.
Idea
It would be nice to examine the PDF page by page. I could check the content of each page: what kind of images it contains, and which compression is recommended for that particular page. I think this can be done with iText, but I don't know exactly how it should be done. A second thing is that I want to do this analysis without fully reading the PDF file. Is that possible?
Maybe the fastest solution would be to use iText analysis to create a list of pages for each compression type, and then to call IrfanView to process the chosen pages with the proper compression.
Any ideas and recommendations are welcome.
UPDATE:
I now have an answer. It does not cover all the requirements, and it's not freeware. Any open-source ideas? Maybe Java-based solutions?
This can be done with DotImage DotPdf from Atalasoft (cue the obligatory "I work there and work on these products"). Here is how I would do this task in C#:
PdfImageSource source = new PdfImageSource(pdfStream);
while (source.HasMoreImages()) {
AtalaImage image = source.AcquireNext();
string fileName = GetNextTiffName();
using (FileStream outStm = new FileStream(fileName, FileMode.Create)) {
TiffEncoder encoder = new TiffEncoder();
encoder.Compression = SelectCompression(image.PixelFormat);
image.Save(outStm, encoder, null);
}
source.Release(image);
}
private TiffCompression SelectCompression(PixelFormat pf)
{
switch (pf) {
// 1 bit? use CCITT G4
case PixelFormat.Pixel1bppIndexed: return TiffCompression.Group4FaxEncoding;
// 24 bit? use JPEG
case PixelFormat.Pixel24bppBgr: return TiffCompression.JpegCompression;
// all else, Lzw
default: return TiffCompression.Lzw;
}
}
You can make SelectCompression do pretty much whatever you want. If you select an invalid compression for that pixel format, the encoder will use an appropriate lossless one in its place (for example, if you select CCITT for 24bit color, the encoder will instead use Lzw).
Our PDF decoder knows when a PDF page is just gray and returns a gray image. It does NOT do anything to get you to 1 bit (this is so antialiased text looks good); however, you could threshold the gray image and look at the overall differences between the thresholded result and the original gray image to determine whether it could go to 1 bit.
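The thresholding check described above can be sketched independently of DotImage. The Python below takes an iterable of 8-bit gray values, thresholds each one to black or white, and counts how many pixels move by more than a tolerance. If only a small fraction does, the page is probably safe to store as 1 bit. The threshold, tolerance and allowed fraction are assumptions you would tune on your own documents.

```python
def looks_bilevel(gray_pixels, threshold=128, tolerance=32, max_midtone_fraction=0.02):
    """Return True if thresholding to 1 bit would change few pixels noticeably.

    gray_pixels          -- iterable of gray values in the range 0..255
    threshold            -- values at or above this become white, others black
    tolerance            -- largest per-pixel change that still counts as harmless
    max_midtone_fraction -- share of pixels allowed to exceed the tolerance
    """
    pixels = list(gray_pixels)
    if not pixels:
        raise ValueError("no pixels given")
    midtones = 0
    for value in pixels:
        bilevel = 255 if value >= threshold else 0
        if abs(value - bilevel) > tolerance:
            midtones += 1  # this pixel would visibly change when thresholded
    return midtones / len(pixels) <= max_midtone_fraction
```

A crisp scan passes the test, while a page with smooth gradients or heavily antialiased text does not.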
Here's how you could do a set of pages:
public void ExtractNPages(Stream pdfStream, params int[] pageIndexes)
{
    PdfImageSource source = new PdfImageSource(pdfStream);
    foreach (int i in pageIndexes) {
        AtalaImage image = source[i]; // implied Acquire
        string fileName = GetNextTiffName();
        using (FileStream outStm = new FileStream(fileName, FileMode.Create)) {
            TiffEncoder encoder = new TiffEncoder();
            encoder.Compression = SelectCompression(image.PixelFormat);
            image.Save(outStm, encoder, null);
        }
        source.Release(image);
    }
}
So now you can just call ExtractNPages(stm, 0, 2, 4, 6);
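To connect this with the idea in the question of building a list of pages per compression type, here is a small sketch of the grouping step. It assumes you already know the bit depth of each rendered page (how you obtain that depends on your toolkit) and mirrors the 1-bit / 24-bit / everything-else rule of SelectCompression. The compression labels and function names are placeholders, not identifiers from any library.

```python
from collections import defaultdict


def choose_compression(bits_per_pixel: int) -> str:
    """Mirror of the rule above: 1 bit -> CCITT G4, 24 bit -> JPEG, else LZW."""
    if bits_per_pixel == 1:
        return "ccitt-g4"
    if bits_per_pixel == 24:
        return "jpeg"
    return "lzw"


def group_pages(page_depths):
    """Map each compression label to the zero-based page indexes that should use it."""
    groups = defaultdict(list)
    for index, depth in enumerate(page_depths):
        groups[choose_compression(depth)].append(index)
    return dict(groups)
```

Each resulting list can then be handed to a per-compression batch, for example one ExtractNPages-style call per group.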

Some pdf file watermark does not show using iText

Our company uses iText to stamp watermark text (not an image) on some PDF forms. I noticed that about 95% of the forms show the watermark correctly, but about 5% do not. To test, I copied two original PDF files, one that was marked correctly and one that was not, and ran them through a small program, with the same result: one got marked, the other did not. I then tried the latest version of the iText jar (version 5.0.6), and got the same result. I checked the PDF file properties, security settings, etc., but nothing gives any hint. The output file does change in size and is marked "changed by iText version...." after the program runs.
Here is the sample watermark code (using iText jar version 2.1.7). Note that the topText, mainText, and bottomText parameters passed in produce three lines of watermark text in the PDF.
Any help appreciated!
public class WatermarkGenerator {
private static int TEXT_TILT_ANGLE = 25;
private static Color MEDIUM_GRAY = new Color(160, 160, 160);
private static int SUPPORT_FONT_SIZE = 42;
private static int PRIMARY_FONT_SIZE = 54;
public static void addWaterMark(InputStream pdfInputStream,
OutputStream outputStream, String topText,
String mainText, String bottomText) throws Exception {
PdfReader reader = new PdfReader(pdfInputStream);
int numPages = reader.getNumberOfPages();
// Create a stamper that will copy the document to the output
// stream.
PdfStamper stamp = new PdfStamper(reader, outputStream);
int page=1;
BaseFont baseFont =
BaseFont.createFont(BaseFont.HELVETICA_BOLDOBLIQUE,
BaseFont.WINANSI, BaseFont.EMBEDDED);
float width;
float height;
while (page <= numPages) {
PdfContentByte cb = stamp.getOverContent(page);
height = reader.getPageSizeWithRotation(page).getHeight() / 2;
width = reader.getPageSizeWithRotation(page).getWidth() / 2;
cb = stamp.getUnderContent(page);
cb.saveState();
cb.setColorFill(MEDIUM_GRAY);
// Top Text
cb.beginText();
cb.setFontAndSize(baseFont, SUPPORT_FONT_SIZE);
cb.showTextAligned(Element.ALIGN_CENTER, topText, width,
height+PRIMARY_FONT_SIZE+16, TEXT_TILT_ANGLE);
cb.endText();
// Primary Text
cb.beginText();
cb.setFontAndSize(baseFont, PRIMARY_FONT_SIZE);
cb.showTextAligned(Element.ALIGN_CENTER, mainText, width,
height, TEXT_TILT_ANGLE);
cb.endText();
// Bottom Text
cb.beginText();
cb.setFontAndSize(baseFont, SUPPORT_FONT_SIZE);
cb.showTextAligned(Element.ALIGN_CENTER, bottomText, width,
height-PRIMARY_FONT_SIZE-6, TEXT_TILT_ANGLE);
cb.endText();
cb.restoreState();
page++;
}
stamp.close();
}
}
We solved the problem by changing the save option in Adobe LiveCycle. Under File->Save->properties->Save as, look at "Save as type": the default is "Acrobat 7.0.5 Dynamic PDF Form File"; we changed it to "7.0.5 Static PDF Form File" (actually any static one will work). Files saved as static forms do not have the disappearing-watermark problem. Thanks, Mark, for pointing us in the right direction.
You're using the underContent rather than the overContent. Don't do that. It leaves you at the mercy of big, white-filled rectangles that some folks insist on drawing first thing. It's a holdover from less-than-good PostScript interpreters and hasn't been necessary for Many Years.
Okay, having viewed your PDF, I can see the problem is that this is an XFA-based form (from LiveCycle Designer). Acrobat can (and often does) rebuild the entire file based on the XFA (a type of xml) it contains. That's how your changes are lost. When Acrobat rebuilds the PDF from the XFA, all the existing PDF information is pitched, including your watermark.
The only way to get this to work would be to define the watermark as part of the XFA file contained in the PDF.
Detecting these forms isn't all that hard:
PdfReader reader = new PdfReader(...);
AcroFields acFields = reader.getAcroFields();
XfaForm xfaForm = acFields.getXfa();
if (xfaForm != null && xfaForm.isXfaPresent()) {
// Ohs nose.
throw new ItsATrapException("We can't repel XML of that magnitude!");
}
Modifying them on the other hand could be Quite Challenging, but here's the specs.
Once you've figured out what needs to be changed, it's a simple matter of XML manipulation... but that "figure it out" part could be interesting.
Good hunting.
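If you want to pre-screen a batch of files before they reach iText, a crude check is to look for the /XFA key in the raw bytes. This Python sketch is a heuristic, not a replacement for the iText check above: it assumes the AcroForm dictionary is stored uncompressed, so it can miss forms whose dictionaries sit inside object streams, and it could in principle match the same bytes inside unrelated stream data. The function name is made up.

```python
import re

# The AcroForm dictionary of an XFA form carries an /XFA entry.
_XFA_KEY = re.compile(rb"/XFA\b")


def mentions_xfa(data: bytes) -> bool:
    """Return True if the raw PDF bytes contain an /XFA key (heuristic)."""
    return _XFA_KEY.search(data) is not None
```

Files flagged this way are the ones worth re-saving as static forms or handling separately.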

How do I figure out the font family and the font size of the words in a pdf document?

How do I figure out the font family and the font size of the words in a PDF document? We are trying to generate a PDF document programmatically using iText, but we are not sure how to find out the font family and font size used in the original document that we need to reproduce. The document properties don't seem to contain this information.
Fonts are stored in the catalog (I suppose in a sub-catalog of type font). If you open a PDF as a text file, you should be able to find the dictionary entries (they begin with "<<" and end with ">>").
In a simple PDF file, I found the following:
<</Type/Font/BaseFont/Helvetica-Bold/Subtype/Type1/Encoding/WinAnsiEncoding>>
Thus searching for the prefix should help you (in some PDF files there are spaces between the components, but '/Type /Font' should be OK).
Of course this is a manual process, while you would probably prefer an automatic one.
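The manual search can be automated with a few lines of Python. This sketch scans the raw bytes for /BaseFont entries and returns the distinct names. It only sees dictionaries that are stored uncompressed (many newer PDFs pack them into object streams), and it does not decode #xx escapes in names, so treat it as a quick look rather than a complete listing. The function name is hypothetical.

```python
import re

# A PDF name ends at whitespace or at one of the delimiter characters.
_BASE_FONT = re.compile(rb"/BaseFont\s*/([^\s/<>\[\](){}%]+)")


def base_font_names(data: bytes) -> list:
    """Return the sorted, distinct /BaseFont names found in raw PDF bytes."""
    names = {match.decode("latin-1") for match in _BASE_FONT.findall(data)}
    return sorted(names)
```

Names with a six-letter prefix and a plus sign, such as ABCDEF+Arial, indicate an embedded subset of that font.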
On another note, we sometimes use Identifont or WhatTheFont to identify uncommon fonts that give us problems (logo fonts).
regards
Guillaume
Edit: the following code will find all the fonts in the pages. In short, you search the dictionary of each page for the sub-dictionary "Resources" and then the sub-dictionary "Font". Each entry in the latter is a font dictionary describing a font.
PdfReader reader = new PdfReader(
        new FileInputStream(new File("file.pdf")));
int nbmax = reader.getNumberOfPages();
System.out.println("nb pages " + nbmax);
for (int i = 1; i <= nbmax; i++) {
    System.out.println("----------------------------------------");
    System.out.println("Page " + i);
    PdfDictionary dico = reader.getPageN(i);
    PdfDictionary ressource = dico.getAsDict(PdfName.RESOURCES);
    // a page may have no resources or no fonts of its own
    PdfDictionary font = (ressource == null) ? null : ressource.getAsDict(PdfName.FONT);
    if (font == null) {
        continue;
    }
    // we got the page fonts
    Set keys = font.getKeys();
    Iterator it = keys.iterator();
    while (it.hasNext()) {
        PdfName name = (PdfName) it.next();
        PdfDictionary fontdict = font.getAsDict(name);
        PdfObject typeFont = fontdict.getDirectObject(PdfName.SUBTYPE);
        PdfObject baseFont = fontdict.getDirectObject(PdfName.BASEFONT);
        // resource name, font subtype and base font (either may be null)
        System.out.println(name + " " + typeFont + " " + baseFont);
    }
}
reader.close();
The name (variable "name" in the code above) is what is used in the content stream to select the font. In the PDF, you'll have to find it next to the text. The number that follows is the size. Here, for example, it's size 12 (sorry, still no code for this part).
BT
/F13 12 Tf
288 720 Td
(the text to find) Tj
ET
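For the part without code, here is a minimal Python sketch of pulling the font name and size out of Tf operators. It assumes you already have the decoded content stream as text: in most PDFs the stream is compressed (typically Flate) and must be decompressed first, which this sketch does not do. Note also that the visible size can be scaled further by the text matrix, so the Tf number is only the nominal size. The function name is made up.

```python
import re

# "/F13 12 Tf": a resource name, a number, then the Tf operator.
_TF = re.compile(r"/([^\s/<>\[\](){}%]+)\s+([-+]?\d*\.?\d+)\s+Tf\b")


def font_selections(content: str) -> list:
    """Return (font resource name, size) pairs in the order they appear."""
    return [(name, float(size)) for name, size in _TF.findall(content)]
```

Called on the snippet above, it reports the resource name F13 with size 12, and F13 can then be looked up in the page's Font dictionary to get the base font.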
Depending on the PDF, if the text hasn't been outlined you may be able to open it in Adobe Illustrator, double-click the text, and select some of it to see its font family, size, etc.
If the text is outlined then use one of those online tools that PATRY suggests to find out the font.
Good luck
If you have Adobe Acrobat you can see the fonts inside and examine the objects and text streams. I wrote a blog post on this at http://pdf.jpedal.org/java-pdf-blog/bid/10479/Viewing-PDF-objects