Free tool for watching coordinates in PDF [closed]

Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Closed 1 year ago.
Is there a tool in a PDF viewer or editor such as Acrobat or Evince where I can navigate a document and see the coordinates (i.e. (x, y)) of any selected point in it?

Apache PDFBox PDFDebugger 2.0.* displays PDF coordinates in the status bar. Get it here:
https://pdfbox.apache.org/download.cgi
Download pdfbox-app-2.0.*.jar, listed under the command line tools at the link above. Then run the following command on your file:
java -jar pdfbox-app-2.0.*.jar PDFDebugger "InputFile"
You can see the coordinates by hovering the mouse over the PDF page. Select a page on the left-hand side and it will be displayed on the right. Note that PDF coordinates start from the lower-left corner of the page, so if you want to extract text using these coordinates, you first need to subtract the y value from the total page height. For example, if the status bar shows x: 47, y: 522 on a page 792 points high, you would use x: 47, y: (792 - 522) = 270.
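As a minimal sketch of that subtraction in Python (the function name is my own, not a PDFBox API), assuming an unrotated page whose MediaBox starts at (0, 0):

```python
def pdf_to_top_left(x: float, y: float, page_height: float) -> tuple[float, float]:
    """Convert a point from PDF coordinates (origin at the lower-left corner,
    y growing upward) to top-left coordinates (y growing downward)."""
    return x, page_height - y


# The example above: a 792-point-high page, status bar showing x: 47, y: 522.
print(pdf_to_top_left(47, 522, 792))  # (47, 270)
```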
The 3.0.0 version has a few extra features unrelated to this question:
https://repository.apache.org/content/groups/snapshots/org/apache/pdfbox/pdfbox-debugger/3.0.0-SNAPSHOT/

I've found that GIMP is perfect for this! It even supports different units of measure, so it's my choice.

To expand on @michael Z's answer, I found the following works.
These steps are for Paint.NET, but most image editors, like GIMP, should be similar:
Open the PDF in a reader.
Take a screenshot.
Paste it into Paint.NET.
Choose Image > Flip Vertical.
Choose View > Show Rulers.
Also tick Pixels.
Resize the image to 8.5 x 11 inches (for US Letter) at 72 DPI.
Now use the Rectangle Select tool. Because the image is upside down, the upper-left corner of the selection is the lower-left corner of the PDF, and the lower-right corner is the upper-right corner of the PDF.
FYI: in Paint.NET, the bottom toolbar always shows the x/y coordinates of the cursor.
It makes your eyes a bit squiffy to read documents upside down for a while, but at least you can now get a pretty good estimate of locations on the document!
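The flip-and-resize trick does two things by hand: it scales pixels to points (72 points per inch) and flips the y axis. Here is a small Python sketch of the same arithmetic for an upright, unflipped screenshot. The function name and the assumption that the screenshot shows exactly one full page are mine:

```python
POINTS_PER_INCH = 72


def pixel_to_pdf_points(px: float, py: float, image_height_px: float, dpi: float) -> tuple[float, float]:
    """Map a pixel of an upright page screenshot to PDF points.

    Assumes the screenshot covers exactly one full page at `dpi` pixels per
    inch, with pixel (0, 0) at the top-left corner.
    """
    scale = POINTS_PER_INCH / dpi          # points per pixel
    x_pt = px * scale
    y_pt = (image_height_px - py) * scale  # flip: PDF y grows upward from the bottom
    return x_pt, y_pt


# US Letter at 72 DPI is 612 x 792 px, so 1 px = 1 pt and only the flip remains.
print(pixel_to_pdf_points(47, 270, 792, 72))    # (47.0, 522.0)
# The same spot on a 144 DPI screenshot (1224 x 1584 px).
print(pixel_to_pdf_points(94, 540, 1584, 144))  # (47.0, 522.0)
```

Resizing to 72 DPI in the steps above is what makes the scale factor exactly 1.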
I found this description helpful too

Also good old GhostView (gv) shows coordinates.

CanOpener is a very impressive tool for working with PDF files. It operates as a plugin for Acrobat Pro - http://www.windjack.com/product/pdfcanopener/
Another option would be to use the Foxit Phantom PDF Advanced Editor which allows you to select objects and see the properties of each object.
You could use a library such as Quick PDF Library to render the page to a BMP file, and then write yourself a little tool to scroll and zoom around the BMP file, reporting back coordinates. http://www.quickpdflibrary.com - (Note: I do consulting work for Quick PDF)
I am sure you could load the PDF into Adobe Illustrator and get the current coordinates in the status bar.
Andrew

If you are trying to do this without Acrobat Pro because it's not free, here is how you can do it.
1. Download and install Acrobat Pro (yes, seriously).
2. Activate the trial version. If you already did this, that's okay.
3. Once the trial ends, you will lose a bunch of Acrobat's tools, but you will definitely not lose the Cursor Coordinates tool.
Here is how to use it:
1. On the main menu bar, click View and select Show/Hide.
2. From there, select Cursor Coordinates, and voilà.

I use Inkscape 0.91 to map out PDF rectangles for extracting text. It can load a PDF onto the canvas. With the document open, don't forget to switch the measurements to inches in Document Properties (Ctrl+Shift+D): on the Page tab, set Default Units to inches, and on the Grids tab, set Grid Units to inches.
This page, PDF coordinates, explains the PDF coordinate system and its unit of measurement.
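For reference, a PDF point (the default user-space unit) is 1/72 inch. Converting between the units GIMP or Inkscape display and PDF coordinates is therefore a single multiplication. A minimal Python helper (the names are illustrative, not from any tool mentioned here):

```python
# Points per unit; 1 inch = 72 points, 1 inch = 25.4 mm.
POINTS_PER_UNIT = {
    "pt": 1.0,
    "in": 72.0,
    "mm": 72.0 / 25.4,
    "cm": 72.0 / 2.54,
}


def to_points(value: float, unit: str) -> float:
    """Convert a length in the given unit to PDF points."""
    return value * POINTS_PER_UNIT[unit]


def from_points(value: float, unit: str) -> float:
    """Convert a length in PDF points to the given unit."""
    return value / POINTS_PER_UNIT[unit]


print(to_points(8.5, "in"), to_points(11, "in"))  # 612.0 792.0 (US Letter)
print(round(to_points(210, "mm"), 2))             # 595.28 (A4 width)
```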

I was looking for a tool to get the coordinates for the area-extraction option in the PDFBox library. The GIMP approach worked well for my purpose: I opened the PDF in GIMP and set the measuring unit to points.

In case you don't want to install any heavy software for such a trivial task, you can create annotations in an XFDF file, set their location, and then see which area is annotated in the PDF.
You can use this template:
<xfdf xmlns="http://ns.adobe.com/xfdf/" xml:space="preserve">
<annots>
<square style="solid" width="4" color="#000000" opacity="1" creationdate="D:20190624111403Z" flags="print" date="D:20190624111403Z" page="0" rect="0,0,135,390.6" subject="ROI" title="ROI" />
</annots>
<pdf-info version="2" xmlns="http://www.pdftron.com/pdfinfo" />
</xfdf>
Change the coordinates in the "rect" attribute and save the file with an .xfdf extension. When you open the XFDF file in Adobe Reader, it will ask for the location of the PDF file. Locate the PDF document, and you will see the rectangle drawn at the specified coordinates.
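If you need to try many rectangles, you can generate the template instead of editing it by hand. Below is a Python sketch using the standard library's xml.etree.ElementTree. make_xfdf is a hypothetical helper, and the attribute values are copied from the template above. The pdftron pdf-info element is left out on the untested assumption that Adobe Reader does not need it:

```python
import xml.etree.ElementTree as ET

XFDF_NS = "http://ns.adobe.com/xfdf/"
XML_NS = "http://www.w3.org/XML/1998/namespace"


def make_xfdf(page: int, rect: tuple[float, float, float, float]) -> str:
    """Return an XFDF document containing one square annotation.

    rect is (x1, y1, x2, y2) in PDF points, measured from the lower-left
    corner of the page; page is zero-based, as in the template.
    """
    ET.register_namespace("", XFDF_NS)  # write xmlns="..." instead of a prefix
    root = ET.Element(f"{{{XFDF_NS}}}xfdf", {f"{{{XML_NS}}}space": "preserve"})
    annots = ET.SubElement(root, f"{{{XFDF_NS}}}annots")
    ET.SubElement(annots, f"{{{XFDF_NS}}}square", {
        "style": "solid", "width": "4", "color": "#000000", "opacity": "1",
        "creationdate": "D:20190624111403Z", "date": "D:20190624111403Z",
        "flags": "print", "page": str(page),
        "rect": ",".join(f"{v:g}" for v in rect),
        "subject": "ROI", "title": "ROI",
    })
    return ET.tostring(root, encoding="unicode")


# Save this output as roi.xfdf, then open it in Adobe Reader.
print(make_xfdf(0, (0, 0, 135, 390.6)))
```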

There is also ImageMagick, which is lighter than GIMP and shows the coordinates of your mouse pointer.

Adobe Reader has it.
Edit->Analysis->Geospatial Location Tool

Related

Automatically remove all PDF content outside a crop area

For a deck of lecture slides, I have extracted several vector illustrations from a PDF file. I did this by highlighting the relevant area in Preview.app, copying, and opening a new file from the clipboard.
The figures look just fine, even though I noticed that the files are a little large. When I open them in Illustrator, I can see what's described in the screenshot – that all of the page content is still there, it's just hidden because it lies outside the crop area.
Now I could simply remove everything except the relevant figures in Illustrator, but I would much rather automate the process, since I have a large number of figures.
How can I automate this process such that everything outside the crop area is discarded and everything inside it is preserved as a vector image?
You can use a redaction tool to remove the content:
Go to https://doxiview.cib.de/showcase/index.html?locale=default
Choose the redact tool.
Upload your PDF.
On the right, choose Select Area and set the redaction fill color to white.
Mark all the content you want to remove.
Click Apply.
Download the PDF.
Afterwards, you can crop the PDF, and the removed content will no longer be in the file.
There's no need to rasterize. Just crop the pages, then use Acrobat DC to "Sanitize" the document. That will completely remove any non-visible parts of the file.
In Acrobat Pro, go to Preflight and select the setting below.
Then click Edit on the right.
For automation, you should be able to create Adobe droplets with this Preflight setting.

How can I easily crop a PDF page?

How can I easily crop a PDF page in a given PDF file? I prefer using as little coding as possible, and guess border geometries as little as possible...
There are several options:
Crop by point-and-click using a GUI front-end:
pdf-quench
krop
briss
PDF scissors
Crop by using the command line:
pdfcrop command (provided by texlive-extra-utils), for example: pdfcrop --margins '-30 -30 -250 -150' --clip input.pdf output.pdf (margins are given as left top right bottom; negative values cut inward).
PDFCrop
convert -crop command (provided by ImageMagick)
Ghostscript
Crop by writing your own script:
Python
LaTeX
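The margin option of pdfcrop is plain arithmetic on a bounding box in PDF points. As I understand it, pdfcrop first detects the content's bounding box with Ghostscript and then adds the margins, so negative values cut into that box. A Python sketch of just that step (the helper is hypothetical; it only computes the box and does not touch a PDF):

```python
def apply_pdfcrop_margins(bbox, margins):
    """Apply margins given as (left, top, right, bottom) to a bounding box
    (llx, lly, urx, ury) in PDF points. Positive values add space around
    the box; negative values cut into it."""
    llx, lly, urx, ury = bbox
    left, top, right, bottom = margins
    return (llx - left, lly - bottom, urx + right, ury + top)


# --margins '-30 -30 -250 -150' applied to a full US Letter bounding box:
print(apply_pdfcrop_margins((0, 0, 612, 792), (-30, -30, -250, -150)))
# (30, 150, 362, 762)
```

Working the numbers this way before running the command can save some trial and error when choosing margins.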
For quick, GUI-aided PDF cropping tasks, try pdfarranger (available in Debian repos, formerly known as PDF-Shuffler).
For precise point-and-click cropping, one option is to use LibreOffice Draw.
The instructions below assume you want to crop part of a single-page PDF:
Start with a blank document
Select the Insert > Image... menu
Navigate to the PDF you wish to crop
The contents of the PDF will show up as an image
Right-click on the PDF content in your document and select the "Crop" menu item.
Use the handles to resize the viewable area of the PDF to the section you want to remain after cropping
Click outside of the PDF to disable the crop handles
Click again on the PDF content to position it however you want by:
Dragging it around the page
Using the arrow keys to move it
Use the Draw positioning tools to align or center the PDF content.
When you're happy with the result, save, export it to PDF, or print it.
For multi-page PDFs, you'll have to work page by page: first split the PDF into single pages using another tool like PDF Arranger (or simply "Print to PDF" each page in your PDF viewer), crop them one by one with Draw, then recombine them into a single PDF (using PDF Arranger again).
You could try using the pdfCropMargins Python program (https://pypi.org/project/pdfCropMargins/) with the -pg option to select the particular page. The command-line program offers many options, and also has an optional GUI.
You can use Inkscape to losslessly crop PDFs. This uses Inkscape's built-in SVG-PDF conversion.
Open your file in Inkscape: File -> Open -> select your file -> Open
Resize PDF:
Using user-input values: File -> Document properties -> Page -> Custom size
Using auto resize to content: File -> Document properties -> Page -> Custom size -> Resize page to content... -> set desired margin -> Resize page to drawing or selection
Inkscape is a particularly good option as often PDF crop utilities (such as krop, mentioned in other answers) do not change the actual size of the object, instead adjusting how much of the object (e.g. an A4 page) is displayed.
For example, from the krop homepage:
Unfortunately, there is no simple way to eliminate
unnecessary/invisible parts of a PDF file. krop only adjusts which
parts of a PDF are displayed; the original content is still there in
the file and will, for instance, show up when editing the file in
inkscape
Editing directly in Inkscape does exactly what this says is impossible.
The list of tools provided by @sparkler was interesting, but did not help me very much.
Some of the tools did crop my pages, but usually they involved a conversion to an image, which made the PDF files blurry and hard to read.
In the end, I used podofocrop from the PoDoFo tools, which retained all the graphics at full resolution and the text as real text.
It will crop all pages to the minimal size (i.e. without a border).
The command is: podofocrop input.pdf output.pdf
To install it on macOS, use: brew install podofo

Poor image rendering with Google Docs PDF viewer

I used Word 2007 to create a PDF file with a 1526 x 900 px image filling a whole page. This is not the first time it's happened, but the Google Docs PDF viewer absolutely mangles the colour rendering, making it unusable.
I've taken screenshots at the same zoom level in Google Docs viewer and Foxit Reader.
Here's an image for comparison:
It's awful! I've tried messing about with some things, but can't find anything that can correct this issue.
In Chrome you can select "Print" and then "Save as PDF". The image quality in the saved PDF file will go up significantly, compared to the one from "Download as PDF". Google seems to be optimizing images to preserve bandwidth.
Let it be recorded here, 16 months after the present original posting by Turkeyphant and a similar posting [1] on the Docs+Drive product forum, that the problem appears to have been fixed within about the past week. Since that time, when a pdf (or Word) file is opened that resides on the Docs+Drive cloud, the file is rendered with what appears to be proper 24-bit color. The treatment whereby the color was reduced to 5 bits, which could encode 32 colors or 32 shades of gray or 16 of each, depending on the image, has been abandoned.
To the best of my knowledge the Docs+Drive staff have not announced this change, either on their Blog or on their product forum. I noticed the change a few days ago and noted it on the conversation [1].
[1] (2013-05-21) Problem in pdf-viewer with color images
https://productforums.google.com/d/msg/docs/_bdfiYgjF2s/5PDMdp9MhFQJ
It might have something to do with compression of the image in the PDF.
I mean, PDF supports JPEG2000-encoded images (JPXDecode Filter) and PDF Reference states that:
From a single JPEG2000 data stream, multiple versions of an image may
be decoded. These different versions form progressions along four
degrees of freedom: sampling resolution, color depth, band, and
location. For example, with a resolution progression, a thumbnail
version of the image may be decoded from the data, followed by a
sequence of other versions of the image, each with approximately four
times as many samples (twice the width times twice the height) as the
previous one. The last version is the full-resolution image.
The Google Docs viewer might be displaying only the first version of the image (with lower resolution or lower color depth), thus producing the "awful" output.
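To put numbers on that resolution progression: each lower level halves the width and height, so it holds roughly a quarter of the samples. Below is a Python sketch for the 1526 x 900 image from the question. It assumes the image were stored as JPEG 2000 with three reduction levels and the usual round-up rule for odd dimensions; both are assumptions, since the real levels depend on how the image was encoded:

```python
def jpeg2000_resolution_levels(width: int, height: int, levels: int) -> list[tuple[int, int]]:
    """Sizes of each decodable version, from the smallest to full resolution.

    Each step down halves width and height (rounding up), so each version
    has roughly four times as many samples as the one before it.
    """
    sizes = []
    for r in range(levels, -1, -1):
        factor = 2 ** r
        sizes.append(((width + factor - 1) // factor, (height + factor - 1) // factor))
    return sizes


for w, h in jpeg2000_resolution_levels(1526, 900, 3):
    print(f"{w} x {h} = {w * h} samples")
# 191 x 113 = 21583 samples
# 382 x 225 = 85950 samples
# 763 x 450 = 343350 samples
# 1526 x 900 = 1373400 samples
```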
Perhaps the attached pair of images will help towards clarifying what is happening with color in images that are rendered through the Google Docs pdf viewer. I inserted the Wikipedia image RGB_Color_Solid_Cube (1024*1024 pixels) into an otherwise empty Google Docs text document, converted it to pdf, and viewed the resulting pdf files two ways: once through the Google Docs+Drive pdf viewer and once through the regular pdf viewer of the Chrome or Firefox browser. Then I made screenshots. Here is the RGB Color Cube via the Docs PDF Viewer and here is the RGB Color Cube via a regular browser PDF Viewer.
The color resolution in the Docs PDF Viewer version is really awful; it looks like 64 colors at most. Maybe someone else is able to recognize this kind of rendering and identify the problem better.
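For a sense of what "64 colors at most" means: 64 = 4 x 4 x 4, i.e. only 2 bits per RGB channel. A Python sketch of that kind of posterization follows. It illustrates the effect only and makes no claim about what the Docs viewer actually does internally:

```python
def posterize(rgb: tuple[int, int, int], bits_per_channel: int) -> tuple[int, int, int]:
    """Keep only the top `bits_per_channel` bits (1..8) of each 8-bit channel,
    then stretch the result back over 0..255 for display."""
    levels = 2 ** bits_per_channel
    step = 256 // levels
    return tuple(round((c // step) * 255 / (levels - 1)) for c in rgb)


print(posterize((200, 100, 30), 2))  # (255, 85, 0)

# With 2 bits per channel, every input color collapses onto 4**3 = 64 colors.
palette = {
    posterize((r, g, b), 2)
    for r in range(0, 256, 5)
    for g in range(0, 256, 5)
    for b in range(0, 256, 5)
}
print(len(palette))  # 64
```

Smooth gradients such as the RGB color cube suffer most, because neighboring shades fall into the same bucket and turn into visible bands.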
This is related to compression and it's something that you can't change in the default view of Google Docs Viewer. The simple solution is to upload the PDF and just serve it from the site in an iFrame. Here is an example:
Problem Embedding Google Docs PDF Solution
Mike

pdf see current line ruler

I'm looking for an accessibility tool to make PDFs easier to read.
In short, it should be possible to easily see which line is being read (a bit like a ruler for text), to avoid losing track of the current line.
I was wondering if anyone knows a solution for this, for example a plugin for Adobe Acrobat Reader or similar.
Any suggestions are welcome.
I don't think there is a plug-in for Acrobat Reader. You may want to look at ZoomText or ClaroRead. Of course these only work if the PDF has text, but not images of text.
A low tech solution would be to open a Notepad doc and size it how you need. If you are on Win7 you could do this with sticky notes.
Another approach I've used is to convert the PDF to HTML and then run a server with it. This is fairly simple to accomplish using Live Server in VScode.
In the Chrome browser, we may then use accessibility extensions, such as ReadingBuddies, that have reading ruler functions.
Otherwise, consider the following:
Use a PDF reader that has a built-in reading ruler feature, such as Adobe Acrobat Reader DC or Foxit Reader.
Use a PDF reader that allows you to add a reading ruler as an annotation, such as Xodo PDF Reader.
Use an online tool that allows you to view PDFs with a reading ruler, such as Smallpdf's PDF Reader.
Use a screen ruler tool, such as the one offered by How-To Geek, to measure the PDF on your screen.
In academic work this is sometimes called RSVP (Rapid Serial Visual Presentation). There are patented hardware and software versions, but in principle it is simply a translucent mask added over the viewport. See https://softwarerecs.stackexchange.com/questions/28582/is-there-an-equivalent-to-a-reading-guide-strip-for-windows-os-x-or-linux and http://www.see-n-read.com/products/esee-n-read-2/
Ten years later, in 2023, software such as browsers should include such features. Edge offers one on sites where Immersive Reader is supported, but not on Stack Overflow! The example above uses this Edge extension: https://microsoftedge.microsoft.com/addons/detail/screen-mask/dfanfcmhbdocjfpmnoebccndgmhlincl
Others are available for other browsers: https://chrome.google.com/webstore/detail/reading-ruler/phiedfcbjfjagnjikfbobmldbpmdcpfk
To get the Reader Mode options in Chrome or Edge, look at the available flags.
However, if you save the page as a PDF and use Read Aloud, the feature is then used there!
Some PDF readers, like Skim on the Mac, include such an accessibility option.
However, the simplest approach is this:
In most PDF readers, the window can be shrunk so the viewport focuses on a single line; combined with auto-scrolling, this allows more focused "line by line" reading without audio, plus quick adjustment and enlarging for variable-height lines with illustrations.
Note that in a PDF like the one above, where much of the text is actually stored one or two lines out of order, it is not trivial for a PDF reader to work out which text baseline should come next. In practice, Read Aloud will read two lines of varying height, then jump to the top of the page, then back to the second visible line. PDF lines are not stored in visible order, nor with the constant height or spacing you might expect.

Reading text + graphic (like lines) info from an existing pdf

I want to read an existing PDF and extract the text and graphics information. For graphics, I currently need only the drawn lines. There are many vendor components for reading PDF text, but are there any that can give graphics information too? Though free/open-source is preferred, I'm OK with commercial ones too.
The requirement is:
For every page in PDF:
Reading text blocks
Getting the canvas coordinates of the text block (the rectangle containing the block). Note that for text with a larger font size, the rectangle size will change.
Lines - need collection of (x1,y1,x2,y2) for every line in a page in pdf
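For the lines requirement, the raw data lives in each page's content stream. A straight segment is built with x y m (moveto) and x y l (lineto), then drawn by a stroke operator such as S. Below is a deliberately simplified Python sketch that collects stroked segments as (x1, y1, x2, y2). It assumes an already decompressed content stream. It ignores the cm transformation matrix, curves and re rectangles (so table borders drawn as thin rectangles are missed), and it does not handle nested parentheses in strings. A real tool should use a library that tracks the full graphics state, such as PDFBox's PDFGraphicsStreamEngine:

```python
import re

# Tokens of a decompressed content stream. Strings, hex strings and names are
# matched only so that their contents are not mistaken for operators.
TOKEN = re.compile(
    rb"\((?:\\.|[^\\)])*\)"           # literal string (nested parentheses not handled)
    rb"|<[0-9A-Fa-f\s]*>"             # hex string
    rb"|/[^\s/\[\]()<>{}%]*"          # name, e.g. /F1
    rb"|[-+]?(?:\d+\.?\d*|\.\d+)"     # number
    rb"|[A-Za-z'\"][A-Za-z0-9*'\"]*"  # operator, e.g. m, l, re, B*
)
STROKE_OPS = {b"S", b"s", b"B", b"B*", b"b", b"b*"}
CLOSE_OPS = {b"h", b"s", b"b", b"b*"}
END_PATH_OPS = STROKE_OPS | {b"f", b"F", b"f*", b"n"}

Segment = tuple[float, float, float, float]


def extract_line_segments(content: bytes) -> list[Segment]:
    """Collect straight segments (x1, y1, x2, y2) of stroked paths, in the
    coordinates the stream itself uses (any cm transform is ignored)."""
    stroked: list[Segment] = []  # segments of paths that were stroked
    path: list[Segment] = []     # segments of the path being built
    nums: list[float] = []       # numeric operands since the last operator
    current = start = None       # current point and start of the subpath

    for tok in TOKEN.findall(content):
        first = tok[:1]
        if first in b"+-.0123456789":
            nums.append(float(tok.decode()))
            continue
        if first in b"(</":
            continue                                  # operand we do not need
        op = tok
        if op == b"m" and len(nums) >= 2:
            current = start = (nums[-2], nums[-1])
        elif op == b"l" and len(nums) >= 2:
            point = (nums[-2], nums[-1])
            if current is not None:
                path.append((*current, *point))
            current = point
        elif op in (b"c", b"v", b"y") and len(nums) >= 2:
            current = (nums[-2], nums[-1])            # curve: not recorded
        elif op == b"re" and len(nums) >= 4:
            current = start = (nums[-4], nums[-3])    # rectangle: not recorded
        if op in CLOSE_OPS and current is not None and start is not None:
            if current != start:
                path.append((*current, *start))       # closing segment
            current = start
        if op in STROKE_OPS:
            stroked.extend(path)
        if op in END_PATH_OPS:
            path.clear()
            current = start = None
        nums.clear()                                  # operator consumed its operands
    return stroked


stream = b"""
q 0.5 w
72 700 m 540 700 l S
72 680 m 540 680 l 540 600 l h S
BT /F1 12 Tf 72 710 Td (Total 42 l) Tj ET
Q
"""
for segment in extract_line_segments(stream):
    print(segment)
# (72.0, 700.0, 540.0, 700.0)
# (72.0, 680.0, 540.0, 680.0)
# (540.0, 680.0, 540.0, 600.0)
# (540.0, 600.0, 72.0, 680.0)
```

The text inside (Total 42 l) is skipped as a string, so its "l" is not mistaken for a lineto.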
Thanks,
- Seeker
This is my field, though the question is a bit old. Hopefully this still helps.
You leave some room for assumptions, so here are mine:
you seek a script, rather than stand-alone software
your object is archival
you are running command-line scripts:
Use this command line script, detailed at: http://stefaanlippens.net/extract-images-from-pdf-documents
you are running server-side code using imagemagick or graphicsmagick functions:
Something like "convert -background white -flatten test1.pdf test1.jpg" (imagemagick) will render the whole PDF page into a jpeg. If you want to then crop it to the image(s), then it depends upon the context of the project to determine the best script(s) to do that.
A rather complex question. If you wish to provide more details about the project, then I can provide some more guidance. Best of luck.