EMF GDI hDC to vector PDF - pdf

I have a set of base EMF files that I play onto a graphics surface, then use GDI to merge text onto the surface with DrawString to create single-page report forms.
The graphics object is then either sent to the printer or saved as a PNG and wrapped in a PDF (iTextSharp).
I'm looking for a way to keep the PDF vector-based, but I can't find any open-source way of getting direct access to the DC so that I can draw a metafile image.
My current PDFs are around 800k per page, whereas if I print the same image to a PDF printer (Amyuni) it's 23k. The only product I've found is PDFTron, which creates a 200k vector-based file directly without printing, but it is way too expensive because of all its features.
Do any graphics experts have suggestions for an easy way to put a metafile directly into a PDF?
Thanks
Mike

Related

Embedding PDF graphics in PDF output file programmatically

I am looking for a rough overview of how one would go about embedding graphics (coming from a PDF file) into another PDF file when writing a C++ document processor.
Background: I work on the LilyPond music typesetter, and recently added Cairo output to the system. Now I would like to support adding externally provided graphics to the PDF files that we generate (e.g. adding a logo onto a laid-out page). This is trivial with EPS for PS output.
I can see how you could hook up Poppler to read the PDF and render the PDF contents onto a Cairo surface, but I wonder if there is a simpler shortcut (e.g. embed the PDF file as a binary stream, and then point directly to that stream).
If you can go via an external route, such as reading the PDF and writing it into an existing PDF using Cairo, that would be simpler. To do it manually:
A PDF page consists of a stream of operators for drawing it, and a dictionary of external resources (fonts, images etc.). To stamp one PDF page onto another, you would need to:
a) Find all objects for external resources in the stamp which are needed, and add them to the destination PDF.
b) Convert the page to a "Form XObject", which is a sort of reusable piece of content. Add this to the /XObject entry in the destination page's resources, making sure to pick a fresh name.
c) Add some operators to the page content in the destination page to invoke the new XObject.
To see how this might work, you could play with -stamp-as-xobject and -postpend-content "/XObjName Do" from section 8.4 of the cpdf manual.
Making this work for arbitrary PDFs is really not for the faint of heart, I'm afraid.
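As a rough illustration of steps b) and c) only, here is a minimal Python sketch that picks an unused XObject name and builds the operators that invoke it. The helper names (fresh_xobject_name, stamp_operators) and the "Fm" prefix are made up for this example; copying the resources and actually writing the objects into the destination file is not shown and would still need a PDF library.

```python
def fresh_xobject_name(existing_names, prefix="Fm"):
    """Return a name such as /Fm0 that is not already used in the
    destination page's /XObject resource dictionary.
    (Hypothetical helper; existing_names is a set of names like "/Im1".)"""
    n = 0
    while f"/{prefix}{n}" in existing_names:
        n += 1
    return f"/{prefix}{n}"


def stamp_operators(name, x=0.0, y=0.0, scale=1.0):
    """Build content-stream operators that draw the Form XObject `name`.

    q/Q save and restore the graphics state, cm sets the placement
    matrix (uniform scale plus translation), Do paints the XObject.
    """
    return f"q {scale:g} 0 0 {scale:g} {x:g} {y:g} cm {name} Do Q"


if __name__ == "__main__":
    name = fresh_xobject_name({"/Fm0", "/Im1"})
    print(stamp_operators(name, x=72, y=36, scale=0.5))
```

Wrapping the call in q ... Q keeps the placement matrix from leaking into whatever content follows on the page.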

How to measure different coordinates from a PDF file on Windows?

I am looking for a way to measure the coordinates of different rectangles in a PDF file.
Mainly, I have to perform some overprinting on an existing PDF and I need to know the x, y, w, h of where I am supposed to write the text.
It seems that Preview.app on Mac has this ability but so far I wasn't able to find anything on Windows that does the same.
Please do not confuse this feature with the Measuring Tools from Adobe Reader which are used to measure distance in printed construction stuff, not the PDF page itself.
It seems that the default unit of measure is the point, so I need something that would let me select a rectangle and tell me its coordinates.
Please do not suggest exporting as an image and using something else to measure the pixels on the image.
Update: http://legacy.activepdf.com/support/knowledgebase/view.cfm?tk=rl&kb=11866 -- PDF Units, that's what I am looking for, something to measure the PDF coordinates in PDF units.
Disclaimer: I work for Atalasoft.
I know you said not to suggest this, but honestly, it's the easiest approach:
If you mean "sweep out a rectangle in the UI and report the coordinates", that's pretty straightforward, but it's going to be a build-your-own type of thing. What you will need are:
1. A PDF rasterizer (GhostScript, Acrobat, FoxIt, Atalasoft) to get you an image at a specific resolution.
2. A tool to display that image in a window and let you sweep out a rectangle (this is straightforward WinForms-type code for .NET, but we have a control that does this out of the box, combining 1 & 2 into one step).
3. A tool that can look at the structure of a PDF page and report back the crop box (if any) and the media box for each page (iText, DotPdf).
4. A tool/understanding of matrix transformations to build the matrix that goes from display space into PDF space (and/or vice versa, probably in iText, definitely in DotPdf).
The code flow becomes something like:
For each page:
Open document, pull out crop and media box, rasterize page, build transformation matrix.
Display image, build/hook into event for selection changing.
Push the image viewer rectangle coordinates through the transformation matrix.
Profit.
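The transformation step in the flow above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the page was rasterized at a known DPI, the image covers exactly the given box (crop box or media box), the page is not rotated, and viewer coordinates have their origin at the top left. The function names are invented for the example and do not come from any of the libraries mentioned.

```python
def display_to_pdf_matrix(box, dpi):
    """Build an affine matrix (a, b, c, d, e, f) mapping image pixels
    (origin top-left, y down) to PDF units (points, origin bottom-left, y up).

    box is (llx, lly, urx, ury) in points; assumes no page rotation.
    """
    llx, lly, urx, ury = box
    s = 72.0 / dpi          # points per pixel
    return (s, 0.0, 0.0, -s, llx, ury)


def apply_matrix(m, x, y):
    """Apply the affine matrix to a single point."""
    a, b, c, d, e, f = m
    return (a * x + c * y + e, b * x + d * y + f)


def pixel_rect_to_pdf(px_rect, box, dpi):
    """Convert a swept rectangle (left, top, width, height) in pixels
    into (x, y, w, h) in PDF units, with (x, y) the lower-left corner."""
    left, top, w, h = px_rect
    m = display_to_pdf_matrix(box, dpi)
    x0, y0 = apply_matrix(m, left, top)
    x1, y1 = apply_matrix(m, left + w, top + h)
    return (min(x0, x1), min(y0, y1), abs(x1 - x0), abs(y1 - y0))


if __name__ == "__main__":
    # US Letter media box, page rasterized at 144 DPI
    print(pixel_rect_to_pdf((100, 200, 300, 50), (0, 0, 612, 792), 144))
```

A rotated page or a crop box that differs from the rasterized area would need extra terms in the matrix.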
From a coding point of view (assuming 0 prior knowledge of this, but a decent understanding of linear algebra), this would take from 3 days to 2 weeks. If I were to write it, it would probably take on the order of a few hours, but I wrote most of our PDF tools and this is pretty easy.
If your goal is to intuit where rectangles are on the page and report back those coordinates, that's also doable, but it is decidedly non-trivial in comparison. You need to write code that can rip through a PDF display list and interpret the contents correctly. That means being able to handle all the cumulative matrix transformations, the graphics state changes, the gstate object use, Form XObject placement, and so on. You need to answer the question "what is a rectangle?" because in PDF placement, it could be an re operator, a set of degenerate beziers, a set of lines, an image of a rectangle or (surprise!) a combination of all of the above. Honestly, intuiting anything about the content on a PDF page is a Herculean task.

Script to Cut Adobe Illustrator File into Tiles

I'm creating a Custom Google Map based on an image in an Adobe Illustrator file. I need to cut the file into 256px x 256px PNGs to feed into the Google Maps API.
You can write scripts to automate tasks in Illustrator using ExtendScript, a modified version of JavaScript. I found one example of a script for Photoshop that makes tiles for Google Maps (Hack #68 in this book) but I haven't figured out how to port this over to Illustrator.
The main problem is I can't figure out how to tell Illustrator to isolate 256px x 256px portions of an image. The Photoshop script does this by selecting portions of the image of that size and copying them into a new file, but as far as I know you can't do that in Illustrator.
Any ideas?
I've got no experience writing scripts for Adobe products, but since Illustrator handles vector data, the tiling algorithm is slightly different. There is a Python script for MS Virtual Earth that tiles a set of GPS points (demo); maybe you can take some ideas from it.
Another choice may be to (programmatically?) render .AI files to .PNG or something similar and then tile the result into 256x256px tiles using that PS hack you referenced.
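Whichever route you take, the tile grid itself is simple to compute. Here is a small Python sketch, assuming the artwork has already been sized in pixels; the function name tile_rects is made up for the example, and the Illustrator or Photoshop calls that would crop and export each region are not shown.

```python
import math


def tile_rects(width, height, tile=256):
    """Yield (col, row, x, y, w, h) for every tile needed to cover an
    image of width x height pixels, row by row from the top-left.

    Tiles along the right and bottom edges may extend past the image;
    the caller is assumed to pad those with a background colour.
    """
    cols = math.ceil(width / tile)
    rows = math.ceil(height / tile)
    for row in range(rows):
        for col in range(cols):
            yield (col, row, col * tile, row * tile, tile, tile)


if __name__ == "__main__":
    for rect in tile_rects(600, 300):
        print(rect)
```

Each tuple gives the column and row index for naming the output file plus the region to isolate.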

How do I embed a source PDF onto an existing page in a PDF?

I need to programmatically embed an existing PDF (a small graphic) onto a specific page of an existing PDF. Using iTextSharp I've been able to add a new page containing this embedded PDF, but what I need is to modify an existing page by adding this graphic. Is this possible using iTextSharp or any other PDF-generation library?
I tend to do this sort of thing using ConTeXt, which is a TeX-based layout tool that is integrated into the pdfTeX TeX/MetaPost engine. There's a learning curve involved, and installing ConTeXt isn't entirely trivial, but it makes very general programmatic document processing involving PDFs easy once you get the hang of it.
For this problem, you'd define two overlays. The first overlay is the main PDF, which you set as a background. Then, on the page you want to change, you define a foreground overlay with a \setlayer command containing a single \framed box, which superimposes the second PDF using an \externalfigure command.
The nice thing about ConTeXt for this kind of task is that it works with PDF as its internal representation all the way through, so there is no unexpected blow-up in file size or deterioration in image quality, which you can get with other tools that convert between formats.

How do I extract vector graphics from a pdf document?

I want to make a tool that extracts vector graphics from a PDF file with the help of a human, e.g. a person opens the PDF document using the tool and then selects the objects that they want to save as a vector drawing. Are there any tools out there already doing this, or any libraries that can be used to write my own tool? The language of the library can be (in decreasing preference) C#, VB.NET, Python or C/C++.
Perhaps this is a tedious way, but if you print it using the XPS Document Writer, the vector graphics should be there as XAML-style markup (as used by WPF) that you can use. The output document is just a zip archive with the different document elements.
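Since the output is a zip archive, you can look inside it with any zip library. A minimal Python sketch, assuming the page markup is stored in parts whose names end in .fpage (that is where the XPS Document Writer normally puts it, but check your own files); the function name list_fixed_pages is made up for the example.

```python
import zipfile


def list_fixed_pages(xps_source):
    """Return the sorted names of the page parts inside an XPS package.

    xps_source may be a path or a file-like object. Assumes page markup
    lives in parts ending with ".fpage".
    """
    with zipfile.ZipFile(xps_source) as package:
        return sorted(
            name for name in package.namelist()
            if name.lower().endswith(".fpage")
        )


def read_page_markup(xps_source, part_name):
    """Return the raw bytes of one page part, ready for an XML parser."""
    with zipfile.ZipFile(xps_source) as package:
        return package.read(part_name)
```

From there the path data for each drawing can be pulled out of the page markup with an ordinary XML parser.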