pdf see current line ruler - pdf

I'm looking for accessibility tool , to make it easier to read pdf's.
In short, it should be possible to easily see which line is being read ( a bit like a ruler,when it comes down to text ), to avoid losing the line that is being read.
I was wondering if anyone knows any solution for this , for example a plugin for Adobe Acrobat Reader, etc...
Any suggestions are welcome.

I don't think there is a plug-in for Acrobat Reader. You may want to look at ZoomText or ClaroRead. Of course these only work if the PDF has text, but not images of text.
A low tech solution would be to open a Notepad doc and size it how you need. If you are on Win7 you could do this with sticky notes.

Another approach I've used is to convert the PDF to HTML and then run a server with it. This is fairly simple to accomplish using Live Server in VScode.
In the Chrome browser, we may then use accessibility extensions, such as ReadingBuddies, that have reading ruler functions.
Otherwise consider,
Use a PDF reader that has a built-in reading ruler feature, such as Adobe Acrobat Reader DC or Foxit Reader.
Use a PDF reader that allows you to add a reading ruler as an annotation, such as Xodo PDF Reader.
Use an online tool that allows you to view PDFs with a reading ruler, such as Smallpdf's PDF Reader.
Use a screen ruler tool, such as the one offered by How-To Geek, to measure the PDF on your screen.

The academic term is sometimes called RSVP (Rapid Serial Visual Presentation), there are patented hardware and software versions but in principle it is simply a translucent masking added to the viewport. see https://softwarerecs.stackexchange.com/questions/28582/is-there-an-equivalent-to-a-reading-guide-strip-for-windows-os-x-or-linux and http://www.see-n-read.com/products/esee-n-read-2/
10 years later and its 2023 so software such as browsers should include such features here is Edge in some sites where Immersive Reader is supported but not StackOverflow !! The above example is using an edge extension. https://microsoftedge.microsoft.com/addons/detail/screen-mask/dfanfcmhbdocjfpmnoebccndgmhlincl others are available for other browsers https://chrome.google.com/webstore/detail/reading-ruler/phiedfcbjfjagnjikfbobmldbpmdcpfk
To get the Reader Mode options on Chrome: or Edge look at the available flags
However if you save page as PDF and read aloud it is then used there !
Some PDF readers like Mac Skim include such accessibility option.
However, simplest is :-
Most PDF readers can be reduced to focus viewport on single lines and with auto scrolling that allows for more focused "line by line" reading without the audio, plus fast and easy adjustments/enlarging for PDF variable lines with illustrations.
Note as per above PDF where much of the text is actually one or two lines out of order it is not trivial for a PDF reader to understand which text base line is independently to be used next. in reality "Read Aloud" will read two variable height lines then jump to top of page then back to the second visible line. PDF lines are not the visible order nor a constant height/spacing, you might expect.

Related

pdfbox embedding subset font for annotations - part 2

I am creating a separate question, stemming from this one. The used code is almost the same. The reason is that the original problem was about subsetting a font with pdfbox, which I kind of dealt with. I got faced though with another problem, which is : the annotations, and how the fonts used in them are interpreted by particularly Acrobat Reader DC.
I tried different combinations of fonts and embedding options and got rather desperate. The fact is that I had a feeling that in particular the way these things are handled by the programs that interpret the PDF files is non-standard. I think I read somewhere that the annotations and the way they are displayed is on purpose non-standardized by the PDF format, to give freedom to the interpreters to handle them in their own way, since the main purpose of the annotations is the interaction with the user. TL;DR I cannot understand why Acrobat Reader DC doesn't like the annotations I have created and saved with PDFBOX. I even opened a question on friendly and helpful Adobe's User Community forum. But as I expected, someone suggested me to better investigate this question with the PDFBOX team.
Everything is possible, but rather than writing a question on PDFBOX mailing list (I could never get used or understand the efficient use of the mailing lists btw), I want to open a question here because I hope that it could help others to understand the PDF format better.
I basically rephrase the above question from the Adobe's forums here: Here is an example (Google Drive link) with FreeText annotations (but it seems to make no difference if I use Stamp annotations instead), it causes problems when open by Adobe Acrobat Reader DC (file) version 21.001.20149.37945 (I think this corresponds to April 16th '21 update). Specifically the problem happens when the Comments pane is opened by the user, either manually or automatically.
Manually:
link
Automatically:
link
While experimenting, I also tried to unset the "Use local fonts" option in Preferences -> Page Display. I had the impression that maybe Acrobat Reader will be more eager to show the error message once it is not allowed to substitute the erroneously embedded fonts with the possible local fonts. I am not sure if this is true.
The error that I get is the infamous "Cannot extract the embedded font XXXXXX+SomeFontName" as seen in the below picture:
link
The same problems happen also if I use full font embed (subsetting option set to false when using PDType0Font.load). I also tried to embed OpenSans font instead of LiberationSans, also tried to manually convert LiberationSans to a TTF font with fewer glyphs using FontForge, even tried to use Windows ARIALN.TTF, thinking that maybe the font is the problem. All cause the same behavior in Acrobat Reader DC. I have also tried to run Acrobat Reader 2019 Pro Preflight on the document and in the profile that scans the document for the possible font inconsistencies, it reports no errors.
Of course, when I use e.g. PDType1Font.HELVETICA instead of custom TTF font, I do not get the above errors. But I cannot use it because it does not contain the glyphs for the Unicode characters that I use. Does anybody have a better idea?
Thank you very much!
EDIT: to make myself clear - the error does not appear ALWAYS. it appears on some machines constantly (e.g. I am using Windows 7 64-bit with latest Acrobat Reader DC installed to reproduce it fairly well), while on my Windows 10 64-bit with the same version of Acrobat Reader DC it sometimes appears, and sometimes not - I haven't figured out why or in what cases.. - which makes me think - but no - I checked that too - the font I am using opens up alright on the machine where the problem is fairly constant)
UPDATE: at my wits ends again, I created a blank page with Apache OpenOffice, exported it to PDF, opened it with Acrobat Reader DC (last version), added a FreeTextTypewriter annotation (View -> Tools -> Comment -> Open) with 4 greek letters in ArialNarrow font, saved it, reopened it with Acrobat Reader DC, and it gives me the same error (cannot extract the embedded font...).. So this could be the Reader problem? But they made this so difficult to diagnose.. Here is the file, but I do not expect it to show errors on other machines. It's one of those moments that you start to believe in magic and the power of prayer (and a good sleep)
UPDATE 30/04/2021
So, to sum things up, I haven't come with a solution yet, but I came up with three files created with PDFBOX, OpenPDF (iText5 fork) and Acrobat Reader DC itself (can append annotations and save - just adding a simple Text box with greek text through Comment pane) - and they all issue the above error message, when open by Acrobat Reader DC. I have posted details in the Acrboat Reader forum here (same link as in comment)
I have added the code that I used to create the OpenPDF example file here and the example 3 files are in the same repository here

How to troubleshoot badly rendered PDF file

I have a small PDF file, which is supposed to display just the string "Hello World!".
Unfortunately, it displays black boxes instead of the characters. I suppose there is some problem with the fonts, but I am not sure.
Is there a way to diagnose and troubleshoot this issue? All I see on the Internet is advices to do this and to do that, which helps to some and does not to others (nothing helped me). Looks like shooting in the dark to me.
Here is a concrete example. Why does this PDF display black squares instead of the string Hello World ?
EDIT
A bit of the context. I am trying to convert a trivial HTML to PDF using the wkhtmltopdf tool. It is an absolute frustration, because according to the Internet searches the tool is supposed to work and do it quite well. But the thing does not work for me and nothing I do changes this! Unfortunately, this tool seems the only free tool to convert HTML to PDF. This is a huge bummer.
If you want to find out whether a PDF is valid or what is wrong with it, there are a few general steps you can take:
1) Open it in Adobe Acrobat or Adobe Reader (on a desktop platform, not a tablet device). For a very long time the PDF format was owned by Acrobat and the way their software handles PDF is still close to the gold standard. However, there is a caveat with this; Acrobat is very, very smart in the way it handles PDF files and it will overlook or actively correct a number of mistakes other PDF engines might have a problem with...
2) Get yourself a preflight tool. These tools were invented for use in graphic arts, but have applications outside of it too. Popular examples are callas pdfToolbox (warning, I'm affiliated with this vendor!) or the "Preflight" plug-in you'll find in Adobe Acrobat Pro (which is actually also callas technology under the hood). Then preflight specifically against the PDF/A-1b or PDF/A-2b standard.
That last point deserves some more explanation. You should pick a PDF/A compliant preflight profile because the PDF/A (or PDF for Archival) standard is extremely picky. It's goal is to make sure that PDF files will still be readable in exactly the same way 50 years from now and to ensure that it tests a whole range of properties of the file itself and the different components in it. You might be able to ignore some of the errors you get (because some of them will be connected to the fact that the PDF/A identification isn't correct for example) but I wouldn't ignore any other errors unless you understand exactly what they mean and why they aren't relevant.
PS: Can you make your test file available some other way? The file you shared in your question is useless I think. When I do "Download" I get a PDF file that doesn't contain text and doesn't have fonts in it. Those rectangles you see are exactly that - rectangles. So this PDF renders fine - it's the PDF generation process (or the fact that you stored the file on Google docs - I really have no clue what that might do) that went berserk apparently.
In addition to David's hints (first using a known good viewer and then some preflight tool), there is a third level in the inspection process:
3) Inspect the PDF with your own eyes and with the PDF specification (made available by Adobe here) at hand in a text viewer (for a first impression) and (if the cause of the issue at hand is not immediately visible) then in a PDF browsing tool (for in-depth analysis).
This step is quite cumbersome at first but after some time you learn your way around in the PDFs.
A sample for such a PDF browser tool is RUPS but there are others around, too.
'Small PDF file supposed to display "Hello World!"'
Not correct. The file you linked to does not contain any code that could render pixels on screen or on paper that a human brain would read as "Hello World!". The file indeed does only contain vector drawing operations which result in 12 black boxes.
The command line tool pdffonts does not indicate any font being used in the file:
pdffonts so-file-#15858199.pdf
What could still cause the "rendering" of the words you are looking for: some vector or pixel drawing code contained in the PDF. To find out about this, you'll have to look into the low level source code of the PDF.
The original file is 1.570 Bytes. So this task looks not as being overly huge.
'Is there a way to diagnose and troubleshoot this issue?'
Using qpdf, a "command-line program that does structural, content-preserving transformations on PDF files", you can expand all contained streams (which are normally compressed):
qpdf --qdf --object-streams=disable so-file-#15858199.pdf qdf-#15858199.pdf
The resulting file, qdf-#15858199.pdf, is 3.875 Bytes. Now open it in a text editor. PDF object no. 6 (lines 66-219) contains the contents of the page. Lines 123-194 contain only the operators m (moveto), l (lineto) and h (closepath). These lines contain 12 different groups of drawing commands, where each one represents the path for one of the 12 black boxes you see rendered on screen or printed on paper:
102.400001 12.8000001 m
268.800004 12.8000001 l
268.800004 179.200002 l
102.400001 179.200002 l
102.400001 12.8000001 l
h
Line 196 contains
f
which is the fill operator to actually fill black color into so far constructed (closed) path. Nothing in the other lines (which I didn't analyze in detail) does any drawing that may resemble the shapes of any glyphs.
'Unfortunately, this tool seems the only free tool to convert HTML to PDF'
Not correct either.
1.
Assuming your "free" is meant as free as in liberty, then an alternative option is HTMLDOC.
HTMLDOC does not support specific fonts which may be assigned to your HTML input via CSS, but it does a good job in converting one or multiple HTML documents into a single PDF book containing chapters, page-numbering, page headers and footers and more. For all options available, see its full documentation.
2.
Assuming your "free" is meant as free as in beer, then an alternative option (for private usage only) could be PrinceXML.
PrinceXML does an extraordinarily good job when it comes to support almost all CSS features your HTML document may be using. See its documentation and also some of the sample PDF files produced by PrinceXML.

Screen reader in PDF

Does anyone use or know about Screen readers in PDF such as NVDA?
I wanna figure out some question about screen readers in PDF:
What screen reader can read PDFs? I mean, What should I use, If I wanted to create a PDF Reader(for example with C#)?
How can I use a special language(Such as Hebrew or Persian) in screen readers? Can I change default language to special language in a screen reader?
IF I can change default language, What shall I do?
YMS is kind of correct with his answer.
1.What does screen reader support by PDF?
The screen reader that has the most support is JAWS, this is because Freedom Scientific and Adobe have worked extremely close together. NVDA has pretty good support as well. ZoomText has hit or miss support in Adobe Reader, and next to none in Acrobat.
I mean, What should I use, If I wanted to create a PDF Reader(for example with C#)?
Honestly, don't even try. It took Freedom Scientific roughly 10 years to get JAWS and Acrobat/Reader working together decently.
2.How can I use a special language(Such as Hebrew or Persian) in screen readers? Can I change default language to special language in a screen reader?
See my answer on localising strings, but JAWS does not even support those languages. So I would tag the first page as an image and put an alt of "This PDF is in Hebrew, please return to the main page to get a ___ version." I don't know those languages, so I don't know if it is more practical to provide a straight English version or a romanization of the languages you mentioned.
AFAIK, screen readers and PDF files (in general) do not get along very well. You can however use PDF files specially created to be used with screen readers. See this post for more details: What are “PDF tags” and why should I care?
For a non-optimal solution, you will need a component that allows unicode text extraction from a PDF file. You will find many questions and answers about PDF text extraction here in SO.
you could find information on webaim website: http://webaim.org/techniques/acrobat/

Render PDF on a Blackberry?

We are using Blackberries to display PDF reports. Here are background details on the problem:
The PDF reports are created using JasperReports.
Report format can be changed.
Different report formats are available (as per the feature set of JasperReports).
The PDF reports are on a website, too, so retaining a single source is ideal.
The page setup is in Landscape.
Here are the issues we have encountered:
Users cannot see a full line of text on the Blackberry.
The size of the PDF and UI makes reading difficult, at best.
The menu option to convert the PDF to text loses too much formatting to be useful.
The text is blurry (and too small).
Here are solutions we have thought about:
Create a second report (not ideal) in text or HTML format.
Simplify the original report format (not really an option, given the amount of data).
What other options are there for making a report available on the Blackberry, given the constraints of JaserReports, such that the report:
Is legible?
Is formatted for readability?
Displays quickly?
Essentially, we'd like to make sure there are no simple solutions we have overlooked for displaying legible PDFs on Blackberries.
We convert TIFFs to PDF for one of our applications, and have had mixed results with BlackBerry PDF viewers. These were our results.
Working
The following PDF readers worked for our purposes:
RepliGo Reader v1.1.1.1 - $19.95
Works fine.
DataViz Documents To Go Premium Edition v1.003.001 - $49.99
Works and includes a word wrap option to get the current zoom level to fit the available screen width, by moving text onto subsequent lines. Might fit your needs.
Non-Working
The following PDF readers did not work for our purposes:
BeamReader v1.0.8 - $17.99
BeamSuite v3.0.2 - $49.99
These couldn't open our PDF files ("Unsupported document format"). In addition they did not register as a PDF content handler, required for our application.
MasterDoc - $19.95
eOffice - $29.95
These also did not register as a PDF content handler. We had a range of problems with these, including installation issues, and not being able to open any PDFs at all.
Try BeamReader http://www.slgmobile.com/beamreader.html
I hear it's the best at reading PDFs for BlackBerry
How about outputting the file to an RTF or an image file (JPG/GIF), and then viewing them in your web browser?
If that doesn't work well on the native browser, I would focus on viewing the file via some other web browser - for example, Opera Mini. I know for images it's easier to navigate "big" images in Opera Mini than the native browser.
If your blackberries are on a BES server, couldn't you display the reports as HTML on your corporate intranet? - Then you could email a link to the blackberry and simply browse the report.
You can convert pdf to image via xpdf and than show image. xpdf is a BEST renderer of pdf.

Is there a way to use custom fonts in a PDF file?

Well basically I'm finishing school in mid December so I'm just brushing up my resume and I'm wondering if there's a way to use custom fonts (in this case Calibri and Cambria) in a PDF file and make them render correctly on all computers.
Thanks in advance!
EDIT: I'm using MS Word 2007, but am open to suggestions
PDFs don't store text and fonts like other documents, they actually convert the font to vectors, that way no matter what font you use, the document displays exactly as expected. This is why searching for text inside the PDF is such a problem for 3rd party PDF Readers and why even Adobe themselves use to distribute 2 versions of Acrobat (one with text search, one without).
Another thing to keep in mind is, PDF isn't pixel exact, it's ratio exact. PDF readers generally do not use a 100% zoom level, instead most people read them at "fit to screen" or "fit to page". I point this out because I'm guessing the reason you are trying to use those new Vista/Office 2007 fonts is because of their LCD subpixel support (improves readability on LCD screens). This feature will not translate into the PDF, since the letter becomes a vector, subpixel information is lost, and even if it wasn't, becomes useless because the vector will be sized to something other than you intended at view time.
The PDF format is capable of embedding fonts, if the font has been marked embeddable by its creator. You'll have to check the software that's creating your PDF to see if it has the capability and how to enable it.
theoretically speaking, on technical side, embedding/not embedding ability, regarding the fonts, is settled with a special flag in font file (ttf or opentype or type1)
you can view this special embedding flag with any font editor program (I recommend
FontCreator (by High-logic)
http://www.high-logic.com/font-editor/fontcreator.html
with a free trial fully operative and without limitations
you can also change embedding/not embedding flag, but legally speaking, for the 99% of fonts commercially distributed, this breaks the license of font