how to change all colours in a PDF to their respective complementary colours; how to make a PDF negative - pdf

How can all colours in a PDF be changed to their complements? So, I mean that a document consisting of black text on a white background would be changed to a document consisting of white text on a black background. Any red colours in a document would be changed to turquoise colours, and so on. Is there some standard utility that could be used for this purpose, or am I likely to have to contrive some awkward ImageMagick image conversions?
EDIT: Here's a very manual way of doing this using ImageMagick:
convert -density 300 -quality 100 "${fileName}" tmp.png
mogrify -flatten tmp*.png
mogrify -negate tmp*.png
convert tmp*.png "${fileName}_1.pdf"
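The per-channel arithmetic behind -negate (and behind "complementary colours" in general) is simply value -> 255 - value; a minimal sketch in Python:

```python
def complement(rgb):
    """Return the complementary (negated) colour of an 8-bit RGB triple."""
    return tuple(255 - v for v in rgb)

print(complement((0, 0, 0)))    # black -> (255, 255, 255), i.e. white
print(complement((255, 0, 0)))  # red -> (0, 255, 255), the turquoise/cyan from the question
```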
EDIT: I changed the wording for the purposes of clarity.

I can think of at least 3 ways to invert (negate, complement) the colors of a PDF page description -- I mean treating the page content as a black box, therefore not counting diving directly into the page content and messing around as per Dingo's answer. Unfortunately, ready-made free tools (Ghostscript, mainly) provide an incomplete solution and require manual intervention.
Note that all the specific terms used below require at least some knowledge of the basics of the PDF and PostScript Language References and are presented here somewhat simplified; please refer to the manuals or google for a thorough description.
The most obvious method is to use inverting Transfer function. Transfer
function (TF) expects an argument in the range 0..1 which is (additive) color
component, and returns color value, too. Negating TF is, of course, {1 sub neg} and is easy to inject:
gs -q -sDEVICE=pdfwrite -o out.pdf -c '{1 sub neg} settransfer' -f in.pdf
That's great, and Adobe Reader displays our out.pdf as (see below) negated. But here the 'greatness' ends. All other viewers ignore the TF, (probably) considering it to be device-dependent and actually present in the PDF as a compensation for output device peculiarities (non-linear printer response etc.), and therefore something to be ignored when displaying the PDF on-screen. Further, depending on the Reader's version, negation of black-on-white text leads to either white-on-black or yellowish-on-black text. And that's not great.
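In PostScript stack terms, the injected {1 sub neg} takes the colour component x, computes x - 1, then negates it, giving 1 - x; a quick sanity check of that arithmetic in Python:

```python
def transfer(x):
    """Mimic the PostScript transfer function {1 sub neg}: x -> -(x - 1) = 1 - x."""
    return -(x - 1)

assert transfer(0.0) == 1.0    # black component becomes white
assert transfer(1.0) == 0.0    # white component becomes black
assert transfer(0.25) == 0.75  # mid-tones are mirrored around 0.5
```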
Therefore, we need not only TF injection, but also a way to properly apply the TF to
the PDF content before viewing. And, despite Ghostscript's ps2pdf manual saying:
Currently, the transfer function is always applied
(of three options: Apply, Preserve, Remove)
using the current 9.10 version, I couldn't make Ghostscript actually apply the TF (i.e. modify the page description operators) when outputting to high-level devices (pdfwrite, as opposed to image output). Maybe I'm missing something here.
But, Adobe Distiller, with proper options set, does apply TF to input postscript file.
Somewhat related to TF is the use of inverting Device-Link color profiles,
which are simple identity DL profiles with inverting (input or output) curves.
That's an interesting use of interesting technology, but, again, Ghostscript currently doesn't support proper Color Management (and DL profiles) in PDF-2-PDF workflows. Moreover, Adobe Acrobat doesn't know what to do with DL profiles, their use within Acrobat requires expensive third-party plugins.
If a PDF viewer (renderer) claims to support PDF 1.4 and transparency (they all do, nowadays), that's another way to go. The PDF Reference says that if the current Blending Mode is Difference and we paint with white, it effectively inverts the backdrop. So, we explicitly paint the background with white (if there's no background, there's nothing to invert), then put our current content (treating it as a black box), then set the Blending Mode to Difference and paint on top with white. Is that clear? Again, I had no success setting the Blending Mode using Ghostscript, with:
[ /BM /Difference /SetTransparency pdfmark
It works OK with Distiller but is ignored by Ghostscript. Maybe (again) I'm missing something.
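The blend-mode arithmetic behind this trick is the Difference formula from the PDF Reference, B(cb, cs) = |cb - cs|, so painting with white (cs = 1) yields 1 - cb, i.e. the inverted backdrop; a sketch:

```python
def blend_difference(backdrop, source):
    """PDF Difference blend mode: B(cb, cs) = |cb - cs|, per colour component."""
    return abs(backdrop - source)

# Painting with white (source = 1.0) inverts any backdrop component:
for cb in (0.0, 0.25, 1.0):
    assert blend_difference(cb, 1.0) == 1.0 - cb
```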
OK, to round up (the answer's getting somewhat long), here's a Perl solution for the 3rd method using a proper API (it's a programming site, isn't it? Any programming language and appropriate API will do):
use strict;
use warnings;
use PDF::API2;
use PDF::API2::Basic::PDF::Utils;
my $pdf = PDF::API2->open('adobe_supplement_iso32000.pdf');
for my $n (1..$pdf->pages()) {
    my $p = $pdf->openpage($n);
    $p->{Group} = PDFDict();
    $p->{Group}->{CS} = PDFName('DeviceRGB');
    $p->{Group}->{S} = PDFName('Transparency');
    my $gfx = $p->gfx(1); # prepend: paint the backdrop white
    $gfx->fillcolor('white');
    $gfx->rect($p->get_mediabox());
    $gfx->fill();
    $gfx = $p->gfx(); # append: white fill in Difference mode inverts everything below
    $gfx->egstate($pdf->egstate->blendmode('Difference'));
    $gfx->fillcolor('white');
    $gfx->rect($p->get_mediabox());
    $gfx->fill();
}
$pdf->saveas('out.pdf');
Here I take one of Adobe's documents and invert it.
What's important: the page should have its transparency blending space set to RGB explicitly, because Adobe Reader defaults to CMYK, and you probably don't want colors inverted in CMYK. Pure CMYK black 0-0-0-100 inverts to 100-100-100-0, which is (nearly) black, too. RGB black gives something like 70-60-50-70 in CMYK, which inverts to a brown 30-40-50-30, and you don't want that. That's why I add a Group entry to the page dictionaries.
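The channel-by-channel inversion described above (each component c goes to 100 - c, in percent) can be checked numerically:

```python
def invert(channels):
    """Invert each colour channel, with channels given in percent (0-100)."""
    return tuple(100 - c for c in channels)

print(invert((0, 0, 0, 100)))    # pure CMYK black -> (100, 100, 100, 0), still near-black
print(invert((70, 60, 50, 70)))  # rich RGB-derived black -> (30, 40, 50, 30), a brown
```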

Your question seems to be very similar to this:
Change background color of pdf
but you also want to change the colour of the text,
so you can follow the workflow I suggested some time ago for the same task:
----------------
a vector pdf background (meaning not a raster image) in pdf files can be
easily changed in a couple of steps (see also my stackoverflow answer, which
I'll now extend and improve):
Change background color of pdf
PRELIMINARY CHECK:
open your pdf file with an editor able to show the internal pdf structure,
like
notepad++
- http://notepad-plus-plus.org/download/v6.1.8.html
and verify if you can see code snippets like
0.000 0.000 0.000 rg (it means *black*)
1.000 1.000 1.000 rg (it means *white*)
and so on...
(the code snippets can vary; for instance, in pdfs produced by openoffice's
internal pdf exporting feature, the same code snippets are in this form:
0 0 0 rg (it means *black*)
1 1 1 rg (it means *white*)
and so on...
if you are able to see these code snippets, then you can start to change
values, otherwise, you need to decompress text streams
you can perform this task with
pdftk
http://www.pdflabs.com/docs/install-pdftk/
pdftk file.pdf output uncompressed.pdf uncompress
and recompress after finished changes
pdftk uncompressed.pdf output recompressed.pdf compress
now, if you see these code snippets, you can change values
STEP 1 (for pdf editing) -
the first thing you need is to find the right equivalence between the RGB
color values of text and background and the internal pdf representation
of the same colors
Since it seems you are a windowsian inhabitant from the third planet in
the Microsoft constellation, you can use a free color picker like this
http://www.iconico.com/download.aspx?app=ColorPic&type=free
to identify the rgb values of text and background colors
once you have these values, you need to convert them into the special internal
pdf representation
to do this, keep in mind this proportion:
1:255 = x:(the color value you selected)
for instance: let say you have this RGB triplet for background:
30,144,255
to know the corresponding values in pdf, in order to insert them in the code
snippet to change the pdf background color, you do: (you can use
http://www.wolframalpha.com/ to compute with precision)
1:255=x:30 = 30/255 = 0.117 (approximated to first three decimals)
1:255=x:144 = 144/255 = 0.564 (approximated to first three decimals)
1:255=x:255 = 255/255 = 1
so, the whole triplet in pdf, corresponding to RGB 30,144,255, will be:
0.117 0.564 1.000
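The proportion above boils down to value/255, truncated to three decimals; a quick Python sketch of the conversion:

```python
def rgb_to_pdf(rgb):
    """Convert an 8-bit RGB triple to the 0..1 values used by the PDF 'rg'
    operator, truncated to three decimals as in the worked example above."""
    return tuple(int(v / 255 * 1000) / 1000 for v in rgb)

print(rgb_to_pdf((30, 144, 255)))  # -> (0.117, 0.564, 1.0)
```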
STEP 2 (for pdf editing)
we look for 0.117 0.564 1.000 in the pdf file with notepad++ ('Wrap around'
and 'Match whole word only' need to be checked) and we find the internal
pdf representation of the background, and we can change it from azure to, let's say,
white
1.000 1.000 1.000
or
1 1 1
but, since you wrote about a black background, to be more precise, I
created a sample pdf with a white background and black text
http://ge.tt/1N7Vuz91/v/0
since we know that 0.000 0.000 0.000 rg means black, we look for this
and we can change it from 0.000 0.000 0.000 rg to 1.000 1.000 1.000 rg
(white) BUT...
at the same time, if your text is black and you want to change its color to white, you also need to change the text from black to some other color first, otherwise it will be invisible, white on white
so, we cannot simply change the white background directly to black at once,
since doing this, we'd have no difference between the text and background
color values
and then we act as follows:
we change white background from 1.000 1.000 1.000 into something like
0.5 0.5 0.5 (light grey)
http://ge.tt/1N7Vuz91/v/1 (resulting pdf - intermediate step)
then we look for
0.000 0.000 0.000 (black text) and change it to **white**
1.000 1.000 1.000
resulting intermediate pdf file:
http://ge.tt/1N7Vuz91/v/2
finally, we change again the color of background from
0.5 0.5 0.5 (light grey)
to black
0.000 0.000 0.000
and we have now a vector pdf with white text and black background
http://ge.tt/1N7Vuz91/v/3
please, remember to
1 - compress this pdf you modified again, if you uncompressed it with pdftk
2 - repair
pdftk file.pdf output fixed.pdf
there is another way, starting from postscript, to perform the same task,
but since you are a windowsian, I guess the postscript way is the harder
way for you; but if someone (a linuxian from the Torvalds constellation) is
interested, I can explain how to do the same thing in postscript,
not in this post, to avoid being too verbose
give a feedback, please, and feel free to ask more

Related

Gimp image color mode

I use Gimp. I have a color palette which contains 556 colors (embroidery related), but I don't know how to use all those colors in my working image because the indexed color mode only supports a maximum of 256 colors... what solution do I have?
All the places in Gimp where the number of colors is limited seem to have an upper limit of 256 colors: indexed mode, color palettes, Posterize filter...
If you want to limit yourself to the 556 colors of your palette, create an image with 556 squares, each painted with one of your 556 colors and save it somewhere. Then when needed open it in Gimp together with your work image, and use the color picker to sample colors from it.
If you want to shoehorn an existing image into your 556 color palette, then you can use the ImageMagick toolbox for this:
Prepare an image with only your 556 colors (as a PNG file, you have to avoid JPEG because the compression will introduce extra colors). This will be your "color map". There is no need for a special format layout, the only important thing is that it contains only your 556 colors (to check in Gimp: Colors > Info > Colorcube analysis, with IM: identify -verbose ColorMap.png and check the Colors line)
Execute the command
convert Source.png -remap ColorMap.png Reduced.png
where:
Source.png is your original image, with likely thousands of colors. It can be any format (JPG, PNG, TIFF...)
ColorMap.png is the map you prepared above
Reduced.png is the color-reduced image. It has to be in a format where pixel colors are preserved exactly (so, PNG in your case, for simplicity(*))
In recent versions, convert is replaced by magick or magick convert
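Conceptually, -remap assigns each source pixel the nearest palette colour; a toy version of that nearest-colour lookup (the three-colour palette here is made up for illustration):

```python
def nearest(pixel, palette):
    """Return the palette colour closest to pixel by squared RGB distance."""
    return min(palette, key=lambda p: sum((a - b) ** 2 for a, b in zip(pixel, p)))

palette = [(0, 0, 0), (255, 255, 255), (200, 30, 30)]  # hypothetical tiny 'color map'
print(nearest((210, 40, 25), palette))  # picks the reddish palette entry
```

ImageMagick additionally dithers by default, which spreads the per-pixel error to neighbours; this sketch shows only the plain nearest-colour step.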
So for instance, starting with:
And applying this 512-color colormap
You obtain this:
Note that the color-reduced image can contain far fewer than the 556 colors (190 colors in the image above, despite the 512-color colormap) (you won't have bright reds in Mona Lisa).
The whole thing is documented here.
After trying the process a few times, I find that given a good palette it works quite well, so if your 556 colors make up a good palette, you could make your workflow a lot simpler by working in full RGB all the time, and then converting the image to your 556 colors.
(*) TIFF and WebP formats also support exact colors/lossless compression, but they still have variants that will do a JPEG-like compression that will change the colors, so they must be used with care.

PDF to EPS or PS to EPS conversion maintaining page size

I need to convert a PDF or Postscript file to EPS, I tried using Ghostscript with the following command to convert Postscript to EPS:
gswin32.exe -o output.eps -sDEVICE=eps2write -dFitPage input.ps
Or PDF to EPS:
gswin32c.exe -q -dNOCACHE -dNOPAUSE -dBATCH -dSAFER -sDEVICE=eps2write -o output.eps -dFitPage input.pdf
They both complete successfully but they are not maintaining the page size. The input PDF and PS files are the same drawings and they both have a page size of 300x300 pts. You can download these files here and here. They look like this:
But after converting them to EPS the results are these, PS to EPS and PDF to EPS. They look like this; the first one is the result from PS to EPS and the second one is the result from PDF to EPS (they are opened using EPS Viewer, which rasterizes the image; that's the reason for the low quality):
As you can see, none of them have the original 300x300 pts size, I've tried many Ghostscript options but I can't manage to get an EPS with the right Bounding Box. I just need to convert a PDF OR PS to EPS, whatever is easier or gives better results.
What you are asking for is, more or less, the exact opposite of what is normally required.
In general people want the EPS Bounding Box to be as tight as possible to the actual marks made by the EPS, because the normal use for an EPS file is to 'embed' it in another document. If you want extra white space you would normally add it around the EPS when you embed it.
Indeed, the EPS specification says that the BoundingBox comment should not include the white space. On page 8 of the EPSF specification:
"For an EPS file, the bounding box is the smallest rectangle that encloses all the marks painted on the single page of the EPS file"
Messing with Ghostscript switches isn't going to do anything helpful for you here; the device explicitly records the marks that are made by the input, and sets the BoundingBox from those.
Perhaps if you were to explain why you want to have an EPS file with incorrect BoundingBox comments it would be possible to make some suggestions, but Ghostscript is doing exactly what it should do here.
[addendum]
(see comment below, this is in reply)
I suspect you need to change your process in some way then. One solution is to have the PDF start by filling the entire page with white. Contrary to many people's expectations that counts as making a mark on the page so the entire page would then be considered as the BoundingBox.
As long as you are using the Ghostscript eps2write device you could also parse the document for %%BeginPageSetup, the eps2write device still writes the original document size out in this section, Eg:
%!PS-Adobe-3.0 EPSF-3.0
%%Invocation: path/gswin32c -dDisplayFormat=198788 -dDisplayResolution=96 -sDEVICE=eps2write -sOutputFile=? ?
%%BoundingBox: 101 132 191 256
%%HiResBoundingBox: 101.80 132.80 190.30 255.20
%%Creator: GPL Ghostscript GIT PRERELEASE 951 (eps2write)
....
....
%%EndProlog
%%Page: 1 1
%%BeginPageSetup
4 0 obj
<</Type/Page/MediaBox [0 0 300 300]
/Parent 3 0 R
/Resources<</ProcSet[/PDF]
>>
/Contents 5 0 R
>>
endobj
%%EndPageSetup
You can see here that the original media size was 300x300, even though the BoundingBox correctly reflects the marks made on the page. Note! This is characteristic of EPS files produced by the current version of eps2write, it won't work for EPS files from other sources and may not work with eps2write in the future.
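If you did want to script that extraction, a sketch could look like this (the regex, and the reliance on eps2write's clear-text page dictionary, are assumptions):

```python
import re

def media_size(eps_text):
    """Extract the /MediaBox that eps2write writes into %%BeginPageSetup, if present."""
    m = re.search(r"/MediaBox\s*\[\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*\]",
                  eps_text)
    return tuple(float(v) for v in m.groups()) if m else None

# Toy header mimicking the eps2write output shown above:
header = "%%BeginPageSetup\n4 0 obj\n<</Type/Page/MediaBox [0 0 300 300]\n"
print(media_size(header))  # -> (0.0, 0.0, 300.0, 300.0)
```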
Other than that you're stuck with finding the media size from the input and passing it separately to the program doing the insertion, presumably by putting the data in some other text file to accompany the EPS. Or, of course, manually or programmatically editing the urx,ury co-ordinates of the BoundingBox.
Ghostscript isn't going to do this for you I'm afraid.

I need detect the approximate location of QR code in scanned image (PDF converted to PNG)

I have many scanned documents in PDF.
I use ImageMagick with Ghostscript to convert PDF to PNG at a high density. I use convert -density 288 2.pdf 2.png. After that I read the pixels with PHP, find where the QR code is, and decode it. Because the image is very big (~ 2500px), it needs a lot of RAM. I want, before I read the pixels with PHP, to crop the image with ImageMagick and leave only the part with the QR code.
Can I detect the approximate location of the QR code with ImageMagick, crop it, and leave only that part?
Sample PDF
Converted PNG
Further Update
I see your discussion with Kurt about better extraction of the image from the PDF in the first place, and his recommendation was to use pdfimages. I just wanted to add that you won't find that if you do brew search pdfimages, but you actually need to use
brew install poppler
and then you get the pdfimages executable.
Updated Answer
If you change the tile size to 100x100 on the crop command and run this for the second PDF you supplied:
convert -density 288 pdf2.pdf -crop 100x100 tile%04d.png
and then use the same entropy analysis command
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
...
...
0.84432:+600+3100:tile0750.png
0.846019:+600+2800:tile0678.png
0.980938:+700+400:tile0103.png
0.984906:+700+500:tile0127.png
0.988808:+600+400:tile0102.png
0.998365:+600+500:tile0126.png
The last 4 listed tiles are
Likewise for the other PDF file you supplied, you get
0.863498:+1900+500:tile0139.png
0.954581:+2000+500:tile0140.png
0.974077:+1900+600:tile0163.png
0.97671:+2000+600:tile0164.png
which means these tiles
I would think that should help you approximately locate the QR code.
Original Answer
This is not all that scientific, but it may help you get started. The key, I think, is the entropy of the various areas of the image. The QR code has a lot of information encoded in a small area so it should have high entropy. So, I use ImageMagick to split the image into square 400x400 tiles like this:
convert image.png -crop 400x400 tile%03d.png
which gives me 54 tiles. Then I calculate the entropy of each of the tiles and sort them by increasing entropy, also outputting their offsets from the top left of the frame, and their name, like this:
convert -format "%[entropy]:%X%Y:%f\n" tile*.png info: | sort -n
0.00408949:+1200+2800:tile045.png
0.00473755:+1600+2800:tile046.png
0.00944815:+800+2800:tile044.png
0.0142171:+1200+3200:tile051.png
0.0143607:+1600+3200:tile052.png
0.0341039:+400+2800:tile043.png
0.0349564:+800+3200:tile050.png
0.0359226:+800+0:tile002.png
0.0549334:+800+400:tile008.png
0.0556793:+400+3200:tile049.png
0.0589632:+400+0:tile001.png
0.0649078:+1200+0:tile003.png
0.10811:+1200+400:tile009.png
0.116287:+2000+3200:tile053.png
0.120092:+800+800:tile014.png
0.12454:+0+2800:tile042.png
0.125963:+1600+0:tile004.png
0.128795:+800+1200:tile020.png
0.133506:+0+400:tile006.png
0.139894:+1600+400:tile010.png
0.143205:+2000+2800:tile047.png
0.144552:+400+2400:tile037.png
0.153143:+0+0:tile000.png
0.154167:+400+400:tile007.png
0.173786:+0+2400:tile036.png
0.17545:+400+1600:tile025.png
0.193964:+2000+400:tile011.png
0.209993:+0+3200:tile048.png
0.211954:+1200+800:tile015.png
0.215337:+400+2000:tile031.png
0.218159:+800+1600:tile026.png
0.230095:+2000+1200:tile023.png
0.237791:+2000+0:tile005.png
0.239336:+2000+1600:tile029.png
0.24275:+800+2400:tile038.png
0.244751:+0+2000:tile030.png
0.254958:+800+2000:tile032.png
0.271722:+2000+2000:tile035.png
0.275329:+0+1600:tile024.png
0.278992:+2000+800:tile017.png
0.282241:+400+1200:tile019.png
0.285228:+1200+1200:tile021.png
0.290524:+400+800:tile013.png
0.320734:+0+800:tile012.png
0.330168:+1600+2000:tile034.png
0.360795:+1200+2000:tile033.png
0.391519:+0+1200:tile018.png
0.421396:+1200+1600:tile027.png
0.421421:+2000+2400:tile041.png
0.421696:+1600+2400:tile040.png
0.486866:+1600+1600:tile028.png
0.489479:+1600+800:tile016.png
0.611449:+1600+1200:tile022.png
0.674079:+1200+2400:tile039.png
and, hey presto, the last one listed (i.e. the one with the highest entropy) tile039.png is this one.
I have drawn a rectangle around its location using this command
convert image.png -stroke red -fill none -strokewidth 3 -draw "rectangle 1200,2400 1600,2800" a.jpg
I concede there may be luck involved, but I only have one image to test my mad theories. You may need to tile twice, the second time with an x-offset and y-offset of half a tile width, so that you don't cut the QR code and split it across 2 tiles. You may need different size tiles for different size barcodes. You may need to consider the last 3-5 tiles located for your next algorithm. But I think it could form the basis of a method.
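The entropy measure ImageMagick reports can be sketched as the Shannon entropy of a tile's pixel-value histogram; on toy data (not real tiles), a busy QR-like black/white mix scores higher than blank paper:

```python
from collections import Counter
from math import log2

def entropy(pixels):
    """Shannon entropy (bits) of a tile's pixel-value histogram."""
    counts = Counter(pixels)
    n = len(pixels)
    # p * log2(1/p) summed over the histogram bins
    return sum(c / n * log2(n / c) for c in counts.values())

flat = [255] * 100    # blank paper: one value, zero entropy
busy = [0, 255] * 50  # QR-like half-black, half-white tile
print(entropy(flat), entropy(busy))  # -> 0.0 1.0
```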

Raw pdf color conversion (with known conversion formula) from RGB to CMYK

This question is related to
Script (or some other means) to convert RGB to CMYK in PDF?
however way more specific. Consider that I am not an expert in print production ;)
Situation: For printing I am only allowed to use two colors, Cyan and Black. The printery requests the final PDF to be in DeviceCMYK with only the Channels C and K used.
pdflatex automatically does that (with the xcolor package) for all fonts and drawn objects, however I have more than 100 sketches/figures in PDF format which are embedded in the manuscript. Due to an admittedly badly designed workflow (late realization that Inkscape cannot export CMYK PDFs), all these figures were created in Inkscape, and thus are RGB PDFs.
However, the only used colors within Inkscape were RGB complements of CMY(K), e.g. 100% Cyan is (0,255,255) RGB and 50% K is (127,127,127) etc.
Problem: I need to convert all these PDF figures from RGB to DeviceCMYK (or alternatively the whole PDF of the final manuscript) with a specific conversion formula.
I did a lot of google research and tried the often suggested ways of using e.g. Ghostscript or various print production tools in Adobe Acrobat, however all of the conversion techniques I found so far wanted to use ICC color profiles or used some other conversion strategy which filled the channels MY and spared some C and K, for example.
I know the exact conversion formula for the raw color numbers from our Inkscape-RGBs to the channels C and K, however I do not know or find any program or tool that allows me to manually specify conversion formulas.
Question: Is there any workflow to convert my PDFs from RGB to C(MY)K manually with my own specific conversion formula for the raw numbers with the converted PDF being in DeviceCMYK using a tool, script or Adobe product?
Due to the large number of figures I would prefer a batched solution which doesn't require too much coding from my side, but if it should be the only solution, I'd also be open minded for a workflow like "load/convert/save" within a program for every single figure or writing a small program with an easy-to-handle C++ PDF API for example.
Limitations and additional info: A different file format (like TikZ figures) is not possible any more since it does not work perfectly and the necessary adaptions to the figures would create too much overhead. A maybe helpful information: Since the figures are created in Inkscape, there are no raster images within the PDFs. I also do not want all figures to be converted to raster images during the color conversion.
Edit:
I have created an example of a RGB PDF-figure created with inkscape.
I also did a manual object-by-object color conversion to a CMYK PDF with Illustrator, to show how the result should look. Illustrator stores the axial shading in a DeviceN colorspace with the colors cyan and black, which is close enough^^
Here is an idea, I think it will work if your PDF files are using exclusively the colorspaces DeviceGray, DeviceRGB and DeviceCMYK:
1- Convert all your PDF files to Postscript (with pdf2ps from ghostscript for example)
2- Write a Postscript program that redefines the operators setrgbcolor, setgray and setcolor with your own implementation in the Postscript language, your implementation will internally use setcmykcolor and it will compute the values using your custom formula.
Here is an example for redefining the setgray operator:
% The operator setcmykcolor expects 4 values in the stack
% When setgray is called, we can expect to have 1 value in the stack, we will
% use it for the black component of cmyk by adding 3 zeros and rolling the
% top 4 elements of the stack 3 times
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
3- Paste your Postscript program at the beginning of each resulting ps file from step 1.
4- Convert all your files back to PDF (with ps2pdf for example)
See it in action by saving this piece of code as sample.ps:
/setgray { 0 0 0 4 3 roll setcmykcolor } bind def
0.5 setgray
0 0 moveto
600 600 lineto
stroke
showpage
Convert it to PDF with ghostscript using this command line (I used version 9.14):
gswin64c.exe -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=sample.pdf sample.ps
The resulting PDF will have the following page content:
q 0.1 0 0 0.1 0 0 cm
/R7 gs
10 w
% The K operator is the PDF equivalent of setcmykcolor in postscript
0 0 0 0.5 K
0 0 m
3000 3000 l
S
Q
As you can see, the ps->pdf conversion will preserve the cmyk colors specified in postscript with the setcmykcolor operator.
Maybe you can post your formula as a new question and someone could help you out translating it to postscript.
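Since the figures only ever used RGB complements of CMY(K), the arithmetic of such a formula might look like the following Python sketch (this particular mapping is my assumption for illustration, not the asker's exact formula):

```python
def rgb_to_ck(r, g, b):
    """Map an 8-bit RGB colour that is a complement of CMY(K) onto DeviceCMYK,
    extracting the grey component as K so that (for the colours described in the
    question) only the C and K channels end up populated. Hypothetical formula."""
    c, m, y = (1 - v / 255 for v in (r, g, b))
    k = min(c, m, y)
    scale = (1 - k) or 1  # avoid division by zero for pure black
    return tuple(round((v - k) / scale, 3) for v in (c, m, y)) + (round(k, 3),)

print(rgb_to_ck(0, 255, 255))    # 100% cyan -> (1.0, 0.0, 0.0, 0.0)
print(rgb_to_ck(127, 127, 127))  # ~50% grey -> (0.0, 0.0, 0.0, 0.502)
```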
Since you have access to Illustrator, you might want to try importing the PDF into Illustrator and using Illustrator's scripting capabilities to iterate over the elements and replace fill/stroke RGB colors with their CMYK replacement colors.
The difficulty will be with the shading patterns (Gradients) used in the PDF; if they are imported as GradientColor, then in theory it's a matter of digging into the GradientColor to find the base RGB colors and substitute their CMYK replacement.
A very similar problem was solved using the ActivePDF.dll with C++ (or C#??).

Using trimbox with ImageMagick

I have a PDF-file with the following dimensions;
mediabox: 23.08 x 31.78 cm
cropbox: 23.08 x 31.78 cm
trimbox: 21 x 29.7 cm
I'm using ImageMagick to try and get the trimbox value using Imagemagick's trimbox function.
identify -format "%[fx:(w/72)*2.54]x%[fx:(h/72)*2.54]" -define pdf:use-trimbox=true foo.pdf
This line of code gives me 23.08x31.78 cm, which is the size of the media/crop box. If I check the values of these boxes with Adobe Acrobat Reader I get the values I just posted at the top of this very post. Acrobat Reader/Photoshop/InDesign tell me that the trimbox is 21x29.7 cm, but ImageMagick just doesn't read the same value.
My guess is that ImageMagick can't interpret the trimbox correctly and then returns the cropbox values instead.
Does anyone know how to get the trimbox value from a correctly formatted PDF file, or did anyone have the same problem?
ImageMagick states that this function should work, but some forum threads beg to differ.
convert -resize 50% -define pdf:use-cropbox=true 1.pdf 1.jpg
You can always find it as clear text within the page dictionary:
(screenshot of the page dictionary: http://sourceforge.net/apps/wordpress/moonshiner/nfs/project/m/mo/moonshiner/uploads/2009/05/pdfboxdata.png)
If you need it just in one case, a hex editor will probably suffice. If it's a recurring thing you want to accomplish programatically, you might want to use a PDF parsing framework (a free example would be PoDoFo, but there's lots of others, too).
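For PDFs where the page dictionary really does sit in clear text, a crude search can be sketched as follows (the regex is an assumption, and this won't survive compressed object streams; a real parser like PoDoFo is the robust route):

```python
import re

def trimbox(pdf_bytes):
    """Find the first clear-text /TrimBox entry; returns None when there is none
    visible (e.g. because the page dictionary lives in a compressed stream)."""
    text = pdf_bytes.decode("latin-1", errors="replace")
    m = re.search(r"/TrimBox\s*\[\s*([\d.]+)\s+([\d.]+)\s+([\d.]+)\s+([\d.]+)\s*\]",
                  text)
    return tuple(float(v) for v in m.groups()) if m else None

sample = b"<</Type/Page /TrimBox [0 0 595.276 841.89] /MediaBox [0 0 654 900]>>"
print(trimbox(sample))  # -> (0.0, 0.0, 595.276, 841.89)
```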