PDF - Mass cropping of non-white margins

I have about 400 PDFs with a lot of dead space between the text and the page border.
Usually I use Govert's PDF Cropper to crop all the whitespace, but this time the PDF background color is (darn!) yellow,
and no software that I know of (and I've searched for quite a while) can crop non-white margins
(well, except maybe pdfcrop.pl, a Perl script which supposedly can remove black margins).
Does anybody know of software that can perform such a task?
The ideal app, I guess, would have an option to specify the color to remove,
like rgb(192,192,192).
Thanks in advance.

The reason this is so difficult is that PDF has no concept of paper color or background color. So what you're seeing is not a different background color, but an object (typically a rectangle) painted in that yellow background color.
Most cropping tools simply calculate the bounding box of all objects on the page and then crop away everything outside that bounding box. Of course that doesn't work for your file because the bounding box will include the background rectangle object.
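A minimal Python sketch of that bounding-box calculation, assuming a hypothetical page model in which every painted object is reduced to its bounding box plus fill color (a real tool would get these boxes by parsing the page's content stream). It shows how a single page-filling background rectangle makes the union cover the whole page.

```python
from typing import List, Tuple

RGB = Tuple[int, int, int]
Rect = Tuple[float, float, float, float]             # (x0, y0, x1, y1) in points
PageObject = Tuple[float, float, float, float, RGB]  # hypothetical: bbox + fill color

def content_bbox(objects: List[PageObject]) -> Rect:
    """Union of all object bounding boxes, the way a typical cropping tool computes it."""
    return (min(o[0] for o in objects), min(o[1] for o in objects),
            max(o[2] for o in objects), max(o[3] for o in objects))

YELLOW = (255, 255, 0)
BLACK = (0, 0, 0)
page = [
    (0, 0, 612, 792, YELLOW),    # background rectangle covering the whole US Letter page
    (72, 100, 540, 700, BLACK),  # the actual text block
]
print(content_bbox(page))      # the background rectangle swallows all margins
print(content_bbox(page[1:]))  # without it, the box hugs the text
```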
There are potentially a number of directions you could take this:
1) If all pages need to be cropped by the same amount, you could simply crop every page to a fixed rectangle that you pass to the cropping tool.
2) There are tools (callas pdfToolbox, Enfocus PitStop, ...; full disclosure: I'm associated with pdfToolbox) that allow you to remove objects from a document, and this could be done by specifying your yellow color. That way you can modify the PDF file by removing the background object and then perform the cropping you want.
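Both options can be sketched in a few lines of Python. This is not how pdfToolbox or PitStop work internally; it assumes the same hypothetical page model as before (each object reduced to a bounding box and fill color), and the function names and the 10-unit color tolerance are illustrative choices.

```python
import math
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Rect = Tuple[float, float, float, float]             # (x0, y0, x1, y1) in points
PageObject = Tuple[float, float, float, float, RGB]  # hypothetical: bbox + fill color

def fixed_crop_box(media_box: Rect, left: float, bottom: float,
                   right: float, top: float) -> Rect:
    """Option 1: shrink the media box by the same margins on every page."""
    x0, y0, x1, y1 = media_box
    crop = (x0 + left, y0 + bottom, x1 - right, y1 - top)
    if crop[0] >= crop[2] or crop[1] >= crop[3]:
        raise ValueError("margins are larger than the page")
    return crop

def crop_box_ignoring(objects: List[PageObject], background: RGB,
                      tolerance: float = 10.0) -> Optional[Rect]:
    """Option 2: drop objects whose fill is within `tolerance` (Euclidean RGB distance)
    of the background color, then take the bounding box of what is left."""
    kept = [o for o in objects if math.dist(o[4], background) > tolerance]
    if not kept:
        return None  # nothing but background on this page
    return (min(o[0] for o in kept), min(o[1] for o in kept),
            max(o[2] for o in kept), max(o[3] for o in kept))

letter = (0, 0, 612, 792)
print(fixed_crop_box(letter, 72, 36, 72, 36))  # 1 inch left/right, 0.5 inch top/bottom

YELLOW = (255, 255, 0)
page = [
    (0, 0, 612, 792, YELLOW),           # full-page background rectangle
    (0, 0, 612, 40, (250, 250, 5)),     # footer band in an almost identical yellow
    (72, 100, 540, 700, (0, 0, 0)),     # text block
    (300, 50, 400, 90, (200, 30, 30)),  # red logo
]
print(crop_box_ignoring(page, YELLOW))
```

How the resulting rectangle is applied depends on the tool; pypdf, for instance, lets you assign a page's `cropbox`. The tolerance matters because producers often emit slightly different shades of the "same" background color.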

Related

Remove white background with Ghostscript when creating a PDF from PS

Is it possible to tell ghostscript to remove white backgrounds when using the pdfwrite-device?
The reason for this is that the generated PDF should be overlaid in further processing over some letterhead also given as PDF.
If the source PostScript already sets the background to white, the resulting PDF also has an explicit white background (achieved by a rectangle at the beginning of each page, set to the complete page size and filled with the non-stroking color white). Thus the generated PDF cannot be overlaid on the letterhead PDF: the white background would cover the letterhead completely, and the letterhead won't appear in the final PDF.
The application generating the PostScript output with the white background (e.g. some business software) is fixed and cannot be changed, so the changes have to be made when processing the PostScript output of this software.
No, you cannot remove that with Ghostscript and the pdfwrite device.
If the problem is always produced by the same input, then possibly you could write something in PostScript to solve it, but without seeing an example I can't say for sure.
Note that PostScript doesn't have a 'non-stroking' colour; there's only one colour in PostScript.
Another solution it seems to me would be to change the Z order; put the letterhead on top of the content, rather than putting the content on top of the letter heading.
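To illustrate why swapping the order helps: without transparency, PDF paints objects in order and later objects cover earlier ones (an opaque painter's model). The toy Python model that follows (pixels as dictionary keys, colors as strings; purely illustrative, not a PDF API) shows that the page-filling white rectangle hides the letterhead when the content is drawn last, while drawing the letterhead last keeps both the logo and the text visible. This assumes the letterhead itself does not paint an opaque full-page background.

```python
from typing import Dict, List, Optional, Tuple

Pixel = Tuple[int, int]
Layer = Dict[Pixel, str]  # pixels a layer paints -> color; unpainted pixels are absent

def paint(layers: List[Layer], width: int, height: int) -> List[List[Optional[str]]]:
    """Painter's model: each layer overwrites whatever earlier layers painted."""
    canvas: List[List[Optional[str]]] = [[None] * width for _ in range(height)]
    for layer in layers:
        for (x, y), color in layer.items():
            canvas[y][x] = color
    return canvas

W, H = 4, 2
# Generated page: an explicit white rectangle over the whole page, then some text.
content: Layer = {(x, y): "white" for x in range(W) for y in range(H)}
content[(2, 1)] = "text"
# Letterhead: only a logo in the top-left corner; everything else left unpainted.
letterhead: Layer = {(0, 0): "logo"}

content_on_top = paint([letterhead, content], W, H)
letterhead_on_top = paint([content, letterhead], W, H)
print(content_on_top[0][0], letterhead_on_top[0][0])
print(letterhead_on_top[1][2])
```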

How to cut a PNG image to its shape?

I have no experience with any image processing/editing tool, and I am doing a project which requires me to use different shapes. I could create different shapes using Visio, but I am not able to get rid of the white background behind them. I need only the shape, not the square white background. I tried online in various ways but was not successful.
Any help will be greatly appreciated.
Thanks,
Ganesh
Every image file is contained within a rectangular frame; this includes PNG and SVG.
Some image file formats support an alpha channel, which lets you see through transparent areas.
What you want to do is remove the white background to expose the transparent alpha channel in Photoshop (or a similar tool), and then save the image with transparency.
For example in Photoshop:
If you open this image directly and have no other layers, double-click the layer named Background and OK the confirmation box. This turns your flat image into a layered image.
Select the Magic Wand tool and make sure you have a high tolerance set (3).
With the wand selected, click the white area to bring up a marquee around your selection (the white background) and hit Delete to remove it.
Your image should now have a chequered background, which is the transparency showing through.
If you now go to File > Save As and select PNG, your image should be saved with an alpha background.
Please note: there are further optimisations to make if this is for the web, including file formats and file size, but that is beyond the scope of this question. I encourage you to read up on the GIF format and its restrictions, the difference between 8-bit and 24-bit PNGs, and how to use SVG.
You can do it pretty simply at the command line using ImageMagick, which is free, installed on most Linux distros, and available for OS X and Windows.
Basically, you want to make your whites transparent, so you would do
convert shape.png -transparent white result.png
If your whites are a little bit off-white, you could allow for some variation with a little fuzz as follows:
convert shape.png -fuzz 10% -transparent white result.png
By the way, you may like to trim to the smallest bounding rectangle while you are there:
convert shape.png -fuzz 10% -transparent white -trim result.png
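For reference, a rough Python sketch of the trimming idea with an explicit background color and tolerance, which also covers a non-white (e.g. yellow) background. It assumes the page or image is already decoded into a grid of RGB tuples, and the function names are hypothetical. It simplifies ImageMagick: `-trim` takes the background from the corner pixels and `-fuzz` is a percentage of the maximum color distance, whereas here the background color and an absolute Euclidean distance are passed in.

```python
import math
from typing import List, Optional, Tuple

RGB = Tuple[int, int, int]
Box = Tuple[int, int, int, int]  # (left, top, right, bottom); right/bottom exclusive

def trim_box(pixels: List[List[RGB]], background: RGB, fuzz: float = 0.0) -> Optional[Box]:
    """Smallest box containing every pixel farther than `fuzz` (Euclidean RGB distance)
    from `background`; None if the whole image is background."""
    def is_content(p: RGB) -> bool:
        return math.dist(p, background) > fuzz

    rows = [y for y, row in enumerate(pixels) if any(is_content(p) for p in row)]
    if not rows:
        return None
    cols = [x for x in range(len(pixels[0])) if any(is_content(row[x]) for row in pixels)]
    return (cols[0], rows[0], cols[-1] + 1, rows[-1] + 1)

def crop(pixels: List[List[RGB]], box: Box) -> List[List[RGB]]:
    """Cut the box out of the pixel grid."""
    left, top, right, bottom = box
    return [row[left:right] for row in pixels[top:bottom]]

Y, N, K = (255, 255, 0), (250, 250, 10), (0, 0, 0)  # yellow, near-yellow, black
image = [
    [Y, Y, Y, Y, Y],
    [Y, K, Y, Y, Y],
    [Y, Y, Y, K, Y],
    [N, Y, Y, Y, Y],
]
print(trim_box(image, Y, fuzz=20))  # near-yellow pixel treated as background
print(trim_box(image, Y))           # exact match only: near-yellow pixel counts as content
```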
By the way, you can also draw your shapes with ImageMagick:
convert -size 150x150 xc: -fill none -stroke "rgb(74,135,203)" -draw 'stroke-width 90 ellipse 0,0 80,80 30,80' arc.png
See Anthony Thyssen's excellent ImageMagick Examples pages.

Is there a way to take an image file and make its background transparent via VB .NET?

We have a system where people have a face shot taken with a DSLR camera. We need these images with a transparent background. What we currently do is take each image, edit and crop it in Photoshop, and remove the background with the Magic Eraser tool.
What I am looking for is a way to parse the image and automatically erase the semi-white background we have, along with the resizing and cropping. Is there some kind of library or code sample that does this without requiring manual intervention?
This is a really complex problem. As the answer below suggests, you'll need to do a fuzzy match on each pixel and set it to be transparent, but you also need to look at nearby pixels to make sure you are not removing parts of the person that just happen to be close in color. A white tag on the shirt, white eyelids, hair, pale skin reflecting the flash: all are candidates to be removed by any greedy fuzzy logic.
Think about the Magic Wand tool in Photoshop. How good is it at detecting the edges of the person in the picture? Yeah, and that's the top standard of image editing software with thousands of engineering hours behind it.
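One common way to take neighbouring pixels into account (a swapped-in technique, not something this answer prescribes) is to mimic the Magic Wand's contiguous mode: only remove near-white pixels that are connected to the image border through other near-white pixels. A white area enclosed by the subject, such as an eye, then survives. A minimal Python sketch, assuming the image is a grid of RGB tuples; the function name and threshold are hypothetical.

```python
import math
from collections import deque
from typing import List, Set, Tuple

RGB = Tuple[int, int, int]

def border_connected_background(pixels: List[List[RGB]], key: RGB,
                                threshold: float) -> Set[Tuple[int, int]]:
    """(x, y) of pixels within `threshold` of `key` that are reachable from the image
    border through other such pixels (4-connectivity): a flood fill seeded from
    every near-key border pixel."""
    height, width = len(pixels), len(pixels[0])

    def near(x: int, y: int) -> bool:
        return math.dist(pixels[y][x], key) <= threshold

    queue = deque((x, y) for y in range(height) for x in range(width)
                  if (x in (0, width - 1) or y in (0, height - 1)) and near(x, y))
    selected = set(queue)
    while queue:
        x, y = queue.popleft()
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height and (nx, ny) not in selected and near(nx, ny):
                selected.add((nx, ny))
                queue.append((nx, ny))
    return selected

W, S = (255, 255, 255), (40, 40, 160)  # white background, subject color
face = [
    [W, W, W, W, W],
    [W, S, S, S, W],
    [W, S, W, S, W],  # white pixel enclosed by the subject (think: an eye)
    [W, S, S, S, W],
    [W, W, W, W, W],
]
mask = border_connected_background(face, key=W, threshold=30)
print(len(mask), (2, 2) in mask)
```

The selected pixels would then get alpha 0. This still fails where the background touches similar colors on the subject's outline, which is exactly the hard part described above.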
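This is not a feasible request for a Q&A format, and this is one of those things that humans just do better than machines. BUT, that doesn't mean it's not possible, and who knows, you might be the one to do it. Just don't do it in VB.NET please :)
Some rough code (VB.NET with System.Drawing) to get an idea of what you need to do:
Imports System.Drawing
Imports System.Drawing.Imaging
' Place inside a Module or Class.
Sub MakeBackgroundTransparent(inputPath As String, outputPath As String)
    ' Copying into a new Bitmap gives a 32bpp ARGB image, so JPEG input gains an alpha channel
    Using source As Image = Image.FromFile(inputPath), faceShot As New Bitmap(source)
        For y As Integer = 0 To faceShot.Height - 1
            For x As Integer = 0 To faceShot.Width - 1
                Dim c As Color = faceShot.GetPixel(x, y)
                ' This is where the magic happens: use any fuzzy match that suits you.
                ' Here: every channel between antique white (250,235,215) and white (255,255,255)
                If c.R >= 250 AndAlso c.G >= 235 AndAlso c.B >= 215 Then faceShot.SetPixel(x, y, Color.FromArgb(0, c.R, c.G, c.B))
            Next
        Next
        faceShot.Save(outputPath, ImageFormat.Png)
    End Using
End Sub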
You could use this as a starting point for processing a single image:
http://www.java2s.com/Code/VB/2D/ProcessanImageinvertPixel.htm
Basically, if you have a constant background color (like a TV green screen), it's just a matter of selecting pixels close to the color you are erasing and setting their alpha level to 0 (transparent). Treating the RGB values like XYZ coordinates, you can compute the 3D distance from your background color and make everything within a certain threshold transparent.
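A minimal Python sketch of that distance test, assuming the image is a grid of RGB tuples; the function name, the green key color and the threshold of 60 are illustrative.

```python
import math
from typing import List, Tuple

RGB = Tuple[int, int, int]
RGBA = Tuple[int, int, int, int]

def key_out(pixels: List[List[RGB]], key: RGB, threshold: float) -> List[List[RGBA]]:
    """Alpha 0 for pixels whose Euclidean distance to `key` (RGB treated as XYZ
    coordinates) is at most `threshold`; fully opaque (255) otherwise."""
    return [[(r, g, b, 0 if math.dist((r, g, b), key) <= threshold else 255)
             for (r, g, b) in row]
            for row in pixels]

GREEN = (0, 255, 0)
out = key_out([[(10, 240, 20), (200, 170, 150)]], GREEN, threshold=60)
print(out)  # screen-green pixel becomes transparent, skin tone stays opaque
```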
As an improvement, you could also make everything within another threshold semi-transparent so the edges right around hair and stuff like that look softer and less harsh.
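The two-threshold version could look like this in Python. The linear ramp between the inner and outer threshold is one simple assumption; other falloff curves work too.

```python
import math
from typing import Tuple

RGB = Tuple[int, int, int]

def soft_alpha(color: RGB, key: RGB, inner: float, outer: float) -> int:
    """0 (transparent) within `inner` of the key color, 255 (opaque) beyond `outer`,
    and a linear ramp in between so edges such as hair are blended rather than cut."""
    d = math.dist(color, key)
    if d <= inner:
        return 0
    if d >= outer:
        return 255
    return round(255 * (d - inner) / (outer - inner))

GREEN = (0, 255, 0)
print(soft_alpha((0, 205, 0), GREEN, inner=30, outer=90))  # distance 50: partly transparent
```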
Alternatively, you could probably do the same exact thing with good results in Photoshop, as it should support batch processing.
Edit: thinking about it some more, you may want to use a green-screen type background instead of an off-white one like the one you described, as otherwise you may make people's eyes transparent. I would definitely try to batch it in Photoshop/GIMP/etc.

PDF Low-level: Invert colors within coordinates

Is it possible to invert the colors within a box (4 sets of coordinates) on a page from within the page's content object code?
My pages consist of simple B&W JBIG2 images and I wish to make the white black and the black white within a small box to highlight something.
As mkl suggests, you may extract the images and change their bits - this might prove to be a little bit of work however. There might be another useful approach here, specifically useful because it would work regardless of what the underlying objects are.
It is possible in PDF to add a transparent object (for example a rectangle) over all underlying objects. In your case you would create a rectangle that you put on top of the images you already have in the page stream.
If you paint this rectangle in white and give it "Difference" as its transparency blend mode, the net effect should be that the colors underneath your rectangle are inverted.
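A sketch in Python of the operators you would append to the end of the page's content stream (after the images are drawn), plus the resource entry they rely on. It assumes you can edit the content stream and the page's /Resources; the resource name GSDiff and the helper names are hypothetical. `gs` applies the ExtGState, `1 g` sets a white DeviceGray fill, `re` appends a rectangle (x y width height), `f` fills it, and `q`/`Q` keep the blend mode from leaking into later drawing.

```python
def _num(v: float) -> str:
    """Format a number as a PDF numeric operand (no exponent notation)."""
    return f"{v:.4f}".rstrip("0").rstrip(".")

def invert_box_ops(x: float, y: float, width: float, height: float,
                   gs_name: str = "GSDiff") -> str:
    """Content-stream operators painting a white rectangle with the named ExtGState,
    which is expected to set the Difference blend mode."""
    return (
        "q "                   # save the graphics state
        f"/{gs_name} gs "      # apply the ExtGState (blend mode /Difference)
        "1 g "                 # nonstroking color: DeviceGray white
        f"{_num(x)} {_num(y)} {_num(width)} {_num(height)} re f "  # rectangle, then fill
        "Q\n"                  # restore the graphics state
    )

def extgstate_entry(gs_name: str = "GSDiff") -> str:
    """Entry for the page's /Resources /ExtGState dictionary."""
    return f"/{gs_name} << /Type /ExtGState /BM /Difference >>"

print(invert_box_ops(100, 200, 50, 30), end="")
print(extgstate_entry())
```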
From the PDF specification: "Painting with white inverts the backdrop colour; painting with black produces no change."
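The reason is the Difference blend function itself, B(cb, cs) = |cb − cs|, applied per color component in the range 0 to 1 (shown here for an additive space such as DeviceGray). A quick Python check:

```python
def difference_blend(backdrop: float, source: float) -> float:
    """PDF 'Difference' blend mode for one color component (0.0 to 1.0): |cb - cs|."""
    return abs(backdrop - source)

WHITE, BLACK = 1.0, 0.0
for cb in (0.0, 0.25, 1.0):
    # painting with white gives 1 - cb (inverted); painting with black leaves cb unchanged
    print(cb, difference_blend(cb, WHITE), difference_blend(cb, BLACK))
```

For 1-bit JBIG2 content the backdrop is only ever 0 or 1, so black and white swap exactly.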
This may be the quickest and most painless way to accomplish what you are looking for...

Is it possible to remove the background of a text block in pdf using ghostscript

I am trying to convert a PDF into TIFF using Ghostscript. Is it possible to remove the background (grey color) of a text block (black font color) in a PDF using Ghostscript? I would like to replace the grey background with white.
Appreciate your help!!
I don't think you'll get a generic solution to your problem because there are many different ways such a background may be coded in your PDF and there is no sure way to distinguish such a background from a rectangular form of some vector image.
PDF essentially offers a set of tools for positioning glyphs and vector graphics in a rectangle (the page) for display, plus some additional tools for interactivity (e.g. forms). Thus, a colored background in a PDF is generally created by defining a path along the edge of the background area, filling it with the desired color, and positioning glyphs and graphics (text and images) on top of it. Other operators can be used too, in many variants, and the resulting shape is generally not marked as background.
In the answer Dingo refers to in his comment, a rectangle covering the whole page (actually even a bit more, given a fairly common choice of media box) is drawn with m (move to a corner), four l operators (draw the four edge lines), h (close the path) and f (fill the shape).
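To inspect a file for this pattern, a rough heuristic in Python: tokenize an uncompressed content stream (for example after uncompressing the file with `qpdf --qdf`), collect the points of paths built from m, l, h and re, and report the bounding box of every filled path. Boxes covering the page are candidates for such a background. This is a naive sketch with hypothetical names: it ignores curves (c, v, y) and the current transformation matrix (cm), and string operands containing spaces can confuse it.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x0, y0, x1, y1)
PAINT_OPS = {"f", "F", "f*", "S", "s", "B", "B*", "b", "b*", "n"}  # end the current path
FILL_OPS = {"f", "F", "f*", "B", "B*", "b", "b*"}                  # ...and fill it

def filled_path_boxes(content: str) -> List[Box]:
    """Bounding boxes of filled paths built with m, l, h and re."""
    operands: List[float] = []
    points: List[Tuple[float, float]] = []
    boxes: List[Box] = []
    for token in content.split():
        try:
            operands.append(float(token))
            continue
        except ValueError:
            pass  # not a number: treat it as an operator (or a name/string operand)
        if token in ("m", "l"):
            points.append((operands[-2], operands[-1]))
        elif token == "re":
            x, y, w, h = operands[-4:]
            points += [(x, y), (x + w, y + h)]
        elif token in PAINT_OPS:
            if token in FILL_OPS and points:
                xs = [p[0] for p in points]
                ys = [p[1] for p in points]
                boxes.append((min(xs), min(ys), max(xs), max(ys)))
            points = []
        operands = []  # operators consume their operands
    return boxes

def covers_page(box: Box, page: Box) -> bool:
    """True if `box` covers the whole `page` rectangle (e.g. the media box)."""
    return box[0] <= page[0] and box[1] <= page[1] and box[2] >= page[2] and box[3] >= page[3]

content = """
0.8 g
-10 -10 m 622 -10 l 622 802 l -10 802 l -10 -10 l h f
0 g 100 100 50 20 re f
BT /F1 12 Tf 72 720 Td (Hello) Tj ET
"""
boxes = filled_path_boxes(content)
print([covers_page(b, (0, 0, 612, 792)) for b in boxes])
```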
Thus, please make the PDF in question available for inspection, maybe there is some specific solution for your file.