Is it possible to remove the background of a text block in pdf using ghostscript - pdf

I am trying to convert a PDF into TIFF using Ghostscript. Is it possible to remove the background (grey color) of a text block (black font color) in a PDF using Ghostscript? I would like to replace the grey background with white.
Appreciate your help!!

I don't think you'll get a generic solution to your problem because there are many different ways such a background may be coded in your PDF and there is no sure way to distinguish such a background from a rectangular form of some vector image.
PDF essentially offers a set of tools for positioning glyphs and vector graphics in some rectangle (page) to display and some additional tools to add some interactivity (e.g. forms). Thus, a colored background in a PDF is generally created by drawing a path along the edge of the background area, filling this shape with the desired color, and positioning glyphs and graphics (text and images) atop it. There are other operators that can be used, though, and many variants of their use, and generally the shape created is not marked as background.
In the answer Dingo refers to in his comment, a rectangle covering the whole page (actually even a bit more, in the case of a fairly common choice of media box) is drawn (m: move to a corner; 4*l: draw the 4 edge lines; h: close the path; f: fill the form).
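To make the operator sequence described above concrete, here is a minimal Python sketch that only builds the text of such a content-stream fragment. The function name `background_rect_ops`, the grey level 0.75 and the 612 x 792 page size are assumptions chosen for illustration; they are not taken from the file in question.

```python
def background_rect_ops(x0, y0, x1, y1, gray=0.75):
    """Return PDF content-stream operators that fill a rectangle with a grey level."""
    return "\n".join([
        f"{gray:g} g",        # g: set the non-stroking (fill) grey level
        f"{x0:g} {y0:g} m",   # m: move to the first corner
        f"{x1:g} {y0:g} l",   # l: line to each following corner ...
        f"{x1:g} {y1:g} l",
        f"{x0:g} {y1:g} l",
        f"{x0:g} {y0:g} l",   # ... and back to the start (the fourth l)
        "h",                  # h: close the path
        "f",                  # f: fill the path
    ])


# Hypothetical page-sized background rectangle
print(background_rect_ops(0, 0, 612, 792))
```

Nothing in this sequence marks the rectangle as a "background", which is why a generic tool cannot reliably tell it apart from any other filled vector shape.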
Thus, please make the PDF in question available for inspection; maybe there is some specific solution for your file.

Related

Printing multiple pages on one page in landscape orientation

I'm trying to print four landscape-oriented pages of a document in a grid on one page in landscape-orientation using VBA with:
ActiveDocument.PageSetup.Orientation = wdOrientationLandscape
ActiveDocument.PrintOut PrintZoomRow:=2, PrintZoomColumn:=2
This however is printing the four small landscape-oriented pages in a grid on a portrait-oriented page, which leaves them too small and with too much free space between them vertically.
I looked at the documentation for PrintOut, but didn't find anything concerning orientation.
I tried reversing the order of the PrintZooms.
I also tried manually configuring the width and height of the printed paper with PrintZoomPaperWidth and PrintZoomPaperHeight, which led to the small pages being cut off and the printed page still being in portrait mode.
This just doesn't seem to be possible in the current version of Office (2019), either with code or through the UI.
As a workaround, one could take screenshots, change the orientation to portrait and paste them in rotated 90° or use rotated textboxes in Word.
Alternatively and probably much easier, create a PDF and use a PDF reader capable of printing this way, e.g. Adobe Reader.

Remove white background with Ghostscript when creating a PDF from PS

Is it possible to tell Ghostscript to remove white backgrounds when using the pdfwrite device?
The reason for this is that the generated PDF should be overlaid in further processing over some letterhead also given as PDF.
If the source PostScript already has the background set to white, then the resulting PDF also has an explicit white background (achieved by a rectangle at the beginning of each page, sized to the complete page and filled with the non-stroking color 'white'). Thus the generated PDF cannot be overlaid on a second letterhead PDF: the white background would cover the letterhead completely, and the letterhead would not appear in the final PDF.
The application generating the PostScript output with the white background (e.g. some business software) is fixed and cannot be changed. So the changes have to be made when processing the PostScript output of this software.
No, you cannot remove that with Ghostscript and the pdfwrite device.
If the problem is always produced by the same input, then possibly you could write something in PostScript to solve the problem, but without seeing an example I can't say for sure.
Note that PostScript doesn't have a 'non-stroking' colour, there's only one colour in PostScript.
Another solution, it seems to me, would be to change the Z order: put the letterhead on top of the content, rather than putting the content on top of the letterhead.

What is PDF stroking, non-stroking and filling?

I've just started using Apache PDFBox and I'm completely baffled as to what is meant by stroking, non-stroking and filling when applied to text and lines.
Please can someone point me to a reference / guide which explains what these terms mean (for beginners) and what the difference is between them.
It's pretty simple. Consider a rectangle located at 0,0 and 50 units wide and high. That is described as a path with vertices at 0,0 0,50 50,50 and 50,0.
Now, if you stroke the path (imagine drawing along the path using a pen) with black, what you get is the black outline of a square; the interior of the square is whatever was on the paper before you drew the border (probably nothing, so white).
If you fill the path, you get a filled in square, but no border drawn.
If you fill and stroke the path you get a filled in square with a border. Because the fill and stroke colours can be different you can have the square filled in one colour and the border drawn in another.
See the PDF Reference, section 4.4 "Path Construction and Painting"
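As a rough illustration of the three cases above, the following Python sketch builds the content-stream text for the 50-unit square and ends it with a different painting operator depending on the mode. The helper name `square_ops` and the default colours are hypothetical; the operators themselves (`RG`, `rg`, `re`, `S`, `f`, `B`) are the standard PDF ones.

```python
PAINT_OPERATORS = {
    "stroke": "S",           # draw the outline only
    "fill": "f",             # fill the interior only
    "fill_and_stroke": "B",  # fill the interior, then draw the outline
}


def square_ops(mode, stroke_rgb=(0, 0, 0), fill_rgb=(1, 0, 0)):
    """Return content-stream text that paints a 50x50 square at the origin."""
    sr, sg, sb = stroke_rgb
    fr, fg, fb = fill_rgb
    return "\n".join([
        f"{sr:g} {sg:g} {sb:g} RG",  # RG: set the stroking colour
        f"{fr:g} {fg:g} {fb:g} rg",  # rg: set the non-stroking (fill) colour
        "0 0 50 50 re",              # re: append a rectangle (x y width height)
        PAINT_OPERATORS[mode],       # the painting operator decides what is drawn
    ])


for mode in PAINT_OPERATORS:
    print(f"--- {mode} ---")
    print(square_ops(mode))
```

The path and both colours are identical in all three outputs; only the last operator changes, which is the whole difference between stroking, filling, and doing both.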
Update (by -kp-)
I've copied the following table from the official PDF-1.7 specification:
This table shows the different text rendering modes. Here too, you can stroke or fill or do both to glyph shapes. You can even do neither stroke nor fill, but still define the shapes: that is, you get invisible text -- a very useful mode for placing OCR-ed text on top of a scanned image! It makes the text searchable, copy'n'paste-able and screen-reader aware.
I am currently writing a book The ABC of PDF with iText that introduces you to all these principles.
You are talking about the "Graphics State" and syntax that is used to define objects on a page. This syntax is stored in content streams.
Ignoring "Text State" (a subset of "Graphics State") for the moment, the idea is that you create paths and shapes (shapes are closed paths). These paths and shapes can be drawn using stroke and fill operators. If you fill a path, you need to define whether you're using the non-zero winding rule or the even-odd rule (if you've studied geometry at college level, you've already encountered these rules).
Stroke and fill operators will use the colors of the current graphics state. Lines will be drawn using the stroking color. Shapes will be filled using the non-stroking color.
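The difference between the two fill rules mentioned above only shows up when subpaths overlap. The sketch below is a simplified Python model, not code from any PDF library: it assumes straight-edged polygon subpaths, casts a ray to the right of the test point, and counts edge crossings. The names `ray_crossings`, `inside_nonzero` and `inside_evenodd` are made up for this example.

```python
def ray_crossings(subpaths, point):
    """Return (winding_number, crossing_count) for a ray going right from point."""
    px, py = point
    winding = 0
    count = 0
    for path in subpaths:
        # Pair each vertex with the next one, closing the subpath at the end
        for (x1, y1), (x2, y2) in zip(path, path[1:] + path[:1]):
            if (y1 <= py) == (y2 <= py):
                continue  # edge does not straddle the ray's horizontal line
            x_at = x1 + (py - y1) * (x2 - x1) / (y2 - y1)
            if x_at > px:
                count += 1
                winding += 1 if y2 > y1 else -1  # upward edge +1, downward -1
    return winding, count


def inside_nonzero(subpaths, point):
    return ray_crossings(subpaths, point)[0] != 0


def inside_evenodd(subpaths, point):
    return ray_crossings(subpaths, point)[1] % 2 == 1


outer = [(0, 0), (10, 0), (10, 10), (0, 10)]      # counterclockwise
inner_same = [(3, 3), (7, 3), (7, 7), (3, 7)]     # counterclockwise
inner_reversed = [(3, 3), (3, 7), (7, 7), (7, 3)]  # clockwise

print(inside_nonzero([outer, inner_same], (5, 5)))      # True
print(inside_evenodd([outer, inner_same], (5, 5)))      # False
print(inside_nonzero([outer, inner_reversed], (5, 5)))  # False
```

With two nested squares drawn in the same direction, the non-zero winding rule fills the inner square while the even-odd rule leaves a hole; reversing the inner square's direction produces a hole under both rules.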
There's much more info in the free ebook you can download from Leanpub.

PDF - Mass cropping of non-whitespace application

I have about 400 pdfs with a lot of dead space between the text and the page border.
Usually I'm using govert's pdf cropper to crop all the whitespace, but this time the pdf background color is (darn!) yellow,
and no software that I know of (and I've searched for quite a while) can crop non-whitespace
(well, except maybe pdfcrop.pl, a Perl library which supposedly can remove black spaces).
Does anybody know of software that can perform such a task?
The ideal app, I guess, would have the option to accept a specific color to remove,
like rgb(192,192,192).
Thanks in advance.
The reason this is so difficult is that PDF has no concept of paper color or background color. So what you're seeing is not a different background color, but an object (typically a rectangle) painted in that yellow background color.
Most cropping tools simply calculate the bounding box of all objects on the page and then crop away everything outside that bounding box. Of course that doesn't work for your file because the bounding box will include the background rectangle object.
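A small Python sketch of the bounding-box logic described above shows why the background rectangle defeats this kind of cropping. The object coordinates are invented for illustration, and `union_bbox` is a hypothetical helper, not part of any cropping tool.

```python
def union_bbox(boxes):
    """Return the smallest (x0, y0, x1, y1) box that contains all given boxes."""
    return (
        min(b[0] for b in boxes),
        min(b[1] for b in boxes),
        max(b[2] for b in boxes),
        max(b[3] for b in boxes),
    )


# Hypothetical objects on a 612 x 792 page
text_blocks = [(72, 600, 540, 720), (72, 400, 300, 580)]
yellow_background = (0, 0, 612, 792)

print(union_bbox(text_blocks))                        # (72, 400, 540, 720)
print(union_bbox(text_blocks + [yellow_background]))  # (0, 0, 612, 792)
```

With the background object included, the computed box equals the full page, so there is nothing left to crop; leaving that one object out of the calculation (or removing it from the file) restores the expected result.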
There are potentially a number of directions you could take this:
1) If all pages need to be cropped by the same amount, you could attempt to do cropping that way (simply passing a rectangle to the cropping tool to do the actual cropping).
2) There are tools (callas pdfToolbox, Enfocus PitStop, ...; full disclosure: I'm associated with pdfToolbox) that allow you to remove objects from a document, and this could be done by specifying your yellow color. This would allow you to modify the PDF file by removing the background object and then perform the cropping you want to perform.

PDF Low-level: Invert colors within coordinates

Is it possible to invert the colors within a box (4 sets of coordinates) on a page from within the page's content object code?
My pages consist of simple B&W JBIG2 images and I wish to make the white black and the black white within a small box to highlight something.
As mkl suggests, you may extract the images and change their bits - this might prove to be a little bit of work however. There might be another useful approach here, specifically useful because it would work regardless of what the underlying objects are.
It is possible in PDF to add a transparent object (for example a rectangle) over all underlying objects. In your case you would create a rectangle that you put on top of the images you already have in the page stream.
If you paint this rectangle in white, set it to transparent and choose "Difference" as the transparency blending mode, the net effect should be that the colors underneath your rectangle are inverted.
From the PDF specification: "Painting with white inverts the backdrop colour; painting with black produces no change."
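The quoted behaviour follows from the Difference blend function, which takes the absolute difference of the backdrop and source colour components. Here is a minimal Python sketch of that arithmetic, assuming components normalised to the range 0 (black) to 1 (white); the function name `blend_difference` is made up for this example.

```python
def blend_difference(backdrop, source):
    """Difference blend mode for one colour component in the range 0..1."""
    return abs(backdrop - source)


WHITE, BLACK = 1.0, 0.0

# Painting with white inverts the backdrop
print(blend_difference(BLACK, WHITE))  # 1.0 (black becomes white)
print(blend_difference(WHITE, WHITE))  # 0.0 (white becomes black)

# Painting with black produces no change
print(blend_difference(0.25, BLACK))   # 0.25
```

In the PDF itself the blend mode is selected through the /BM entry of an extended graphics state dictionary (for example /BM /Difference), which is then applied with the gs operator before painting the white rectangle.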
This may be the quickest and most painless way to accomplish what you are looking for...