We have a client who sells ebooks and wants to add the buyer's name to every page of the book (for example in the footer), to discourage the buyer from sharing it too widely. Apparently this is called adding an "ex libris". Our client wants to sell ebooks in PDF/ePub/Mobi formats.
I've searched the web for how to do this, and so far I've found that doing this to PDFs is quite easy and that there is a library to do exactly this on ePubs. But I've found close to nothing related to Mobi files.
So my questions are:
Is it possible to add text on every page of a given .mobi file, for example in the footer?
If it's not possible, how accurate would it be to convert a watermarked epub to the mobi format? What would be the best tools and practices for the job?
This discussion is not about how I could add a hidden watermark to the files through some form of steganography.
We have done something similar before, using a footer image at the bottom of each page in a Mobi file. This image could contain the name of the buyer.
<style>
body{
background-image:url('[watermarkpath].PNG');
background-repeat:no-repeat;
background-attachment:fixed;
background-position:center bottom;
background-size:contain;
}
</style>
I suggest making the image itself transparent so that it doesn't interfere with the text. Hope this helps!
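A minimal sketch of how the style block above could be injected into each XHTML file of the book before conversion, assuming a watermark image with the buyer's name has already been rendered. The function name inject_watermark and the image file name are hypothetical, and this is only one way to apply the idea, not a tested Mobi workflow.

```python
# Style block from the answer above, with the image path left as a placeholder.
WATERMARK_CSS = """<style>
body{{
background-image:url('{image}');
background-repeat:no-repeat;
background-attachment:fixed;
background-position:center bottom;
background-size:contain;
}}
</style>
"""


def inject_watermark(xhtml: str, image_path: str) -> str:
    """Insert the watermark style block just before the closing </head> tag."""
    index = xhtml.lower().find("</head>")
    if index == -1:
        raise ValueError("document has no </head> element")
    style = WATERMARK_CSS.format(image=image_path)
    return xhtml[:index] + style + xhtml[index:]


# Hypothetical chapter file content and per-buyer image name.
page = "<html><head><title>Chapter 1</title></head><body><p>Text</p></body></html>"
watermarked = inject_watermark(page, "buyer-jane-doe.png")
```

Running this over every content file, with one image per buyer, keeps the book source itself unchanged between sales.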
I'm using Apache FOP to generate a PDF through XML and XSL-FO. I have a cell in my generated PDF that I need to be able to scroll through if the content overflows it. XSL-FO has an overflow="scroll" feature, but based on my research on the topic it seems that Apache FOP does not support this option.
For example, here is a scrollable region in a PDF used by a large CAD company that I need to replicate:
Is there any way to enable this feature in Apache FOP? Is it possible to enable it in the source code (I haven't been able to find a way to do so)? Any other ways to tackle this issue?
No, it isn't possible.
From the FO perspective:
In the XSL-FO Recommendation the scroll value for the property overflow comes from the corresponding CSS2 definition, which includes this clarification:
When this value is specified and the target medium is "print", overflowing content should be printed.
As the PDF output is a print-oriented medium, I read this as a confirmation that FOP is correct in printing the overflowing content.
From the PDF perspective:
In the PDF Reference 6th edition, a search for the word "scroll" returns results referring either to the scrolling bars in the user interface or in interactive forms (text fields, list boxes, combo boxes).
There is not, or at least I could not find, a "static text object, but with scrolling bars" feature (which is probably sensible for a print-oriented format), so FOP cannot create it in the PDF output file, not even by modifying the source code.
A second look at your comment and the screenshot you included made me think it could be an example of the 3D Artwork feature of the PDF format, a feature I didn't know of before (and I still know nothing besides its name). According to the reference:
Specific views of 3D artwork can be specified, including a default view that is displayed initially and other views that can be selected. Views can have names that can be presented in a user interface.
So, I think your screenshot shows the different views associated with a 3D object; it is not a general-purpose feature that could be used to provide scrollable text.
Well, it could be possible ...
It is possible but as far as I know not with Apache FOP. Without seeing the PDF in question and guessing from the screen shot, it looks like a Flash widget inserted into the PDF. This in PDF terms is a RichMedia annotation (requires PDF version 1.7 with extensions) in which you can insert the Flash widget as well as other controlling files (like XML, other images to display, etc.) and relate them together.
AFAIK, only RenderX XEP (whom I work for) supports such RichMedia annotations inserted into PDF via XSL FO through the rx:rich-media-object extension documented here: http://www.renderx.com/reference.html#Rich Media
I believe the only viewer that supports PDF with RichMedia annotations is Adobe Reader, so it is required to view such a file. Here is a sample, generated long ago, that includes a few interactive Flash widgets and some interactive charts, all within a PDF of a few pages. NOTE: I am sure some of the links in the document do not go anywhere; it was for a trade show many years ago. Remember, you would need to download this file, view it in Adobe Reader, and have Flash Player installed to see it function.
http://www.cloudformatter.com/Resources/Samples/RichMedia.pdf
You cannot use common PDF browser-based viewers like Chrome or Firefox as they do not support this type of annotation.
A screenshot of page one here shows an interactive, scrolling widget. Page 4 contains a widget similar to what you show in your example.
Page 4 scrolling widget very similar to your request:
The widget on the last page is created using a scroller SWF that takes the images and the XML setup/configuration files as parameters. The RenderX extension object takes these as parameters and embeds all of them in the document, so that the interactive Flash widget is totally self-contained in the PDF. The XSL FO to do this is:
<rx:rich-media-object name="Sample HTML Widget" scaling="non-uniform" width="611.92pt"
height="74.99pt" content-width="scale-to-fit" src="url('rx-scroller\dockmenu.swf')"
transparency="true" activate-condition="page_visible">
<rx:flash-var name="setupXML" value="rx-dock-settings.xml"/>
<rx:flash-var name="contentXML" value="rx-dock-contents.xml"/>
<rx:rich-media-resource name="rx-dock-settings.xml"
src="url('rx-scroller\rx-dock-settings.xml')"/>
<rx:rich-media-resource name="rx-dock-contents.xml"
src="url('rx-scroller\rx-dock-contents.xml')"/>
<rx:rich-media-resource name="style.css" src="url('rx-scroller\css\style.css')"/>
<rx:rich-media-resource name="customer1.png" src="url('rx-scroller\images\customer1.png')"/>
<rx:rich-media-resource name="customer2.png" src="url('rx-scroller\images\customer2.png')"/>
<rx:rich-media-resource name="customer3.png" src="url('rx-scroller\images\customer3.png')"/>
<rx:rich-media-resource name="customer4.png" src="url('rx-scroller\images\customer4.png')"/>
<rx:rich-media-resource name="customer5.png" src="url('rx-scroller\images\customer5.png')"/>
<rx:rich-media-resource name="customer6.png" src="url('rx-scroller\images\customer6.png')"/>
</rx:rich-media-object>
Note that many things inside the Flash widget, such as links, would still work. It is just a pure, interactive Flash object inserted into the PDF as its container.
Indeed it looks like this is not possible to achieve through FOP.
Continuing to dig around for a few days, however, I did find a clever post-processing alternative that is also free, essentially embedding a PDF inside of another PDF using the LaTeX animate package.
A drawback to this method is that it is not possible to embed links inside of the scrollable region, which is a major issue for me. But the method does enable inserting a scrollable region inside of an existing PDF and got me very close to what I was trying to achieve.
I'm struggling with a thought here. Let's say a user has his own CMS where he can fill in the content for our app. One of his options is to create a view by uploading images and typing text. We'll keep it very simple and imagine he only uploads an image (320 x 20) and some text. So an image on top and some lines of text below.
What would be the best way to let my app know of this layout and download the contents? I was thinking of a downloadable XML file which defines the layout, but I don't really know how to implement this or if it's even the best way.
Oh, and the content and layout must be downloadable for offline use too.
Another option I was thinking of is showing the layout in a webview, but I can't figure out how to download the mobile website for offline use.
A push in the right direction would be appreciated!
We use a custom XML format and it is working well. All texts inside 'label tags' are in XHTML.
Remember to:
be specific when defining the XML, to save some effort
write a limiting XSD, so nothing 'surprising' creeps into the XML
not include everything in ONE XML file, as that would get rather large rather quickly. Choose a scheme for partitioning the XML
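As an illustration of the custom-XML approach, here is a minimal sketch of one view definition and how an app could read it. The element and attribute names (view, image, label, src) are hypothetical and not taken from the poster's actual format; labels hold XHTML fragments, as described above.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

# Hypothetical layout file for one view: an image on top, text lines below.
LAYOUT_XML = """<view id="home">
  <image src="images/header.png" width="320" height="20"/>
  <label><p>Welcome to <b>our</b> app.</p></label>
  <label><p>Second line of text.</p></label>
</view>"""


@dataclass
class ViewLayout:
    view_id: str
    image_src: str      # relative path; the file must be downloaded for offline use
    image_size: tuple   # (width, height) in pixels
    labels: list        # one XHTML fragment per label, in display order


def parse_layout(xml_text: str) -> ViewLayout:
    """Parse a single view definition into a plain data object."""
    root = ET.fromstring(xml_text)
    image = root.find("image")
    labels = [
        # Re-serialize the label's children so the XHTML markup is preserved.
        "".join(ET.tostring(child, encoding="unicode") for child in label).strip()
        for label in root.findall("label")
    ]
    return ViewLayout(
        view_id=root.get("id"),
        image_src=image.get("src"),
        image_size=(int(image.get("width")), int(image.get("height"))),
        labels=labels,
    )


layout = parse_layout(LAYOUT_XML)
```

Keeping one small file per view, as suggested above, means the app can download and cache views individually along with the images they reference.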
Is there any alternative way to view PDF files on the web instead of using Acrobat Reader? I need to control the viewer to programmatically trigger the printing of the document.
The source of the PDF should come from a webservice URL / AspX
The easiest I would think is to use the Google Doc Viewer:
<iframe src="http://docs.google.com/viewer?url=**PathToMyPdfFile.pdf**&embedded=true" width="600" height="780" style="border: none;"></iframe>
You need to host your PDF files somewhere online, for example on your public website (it needs to be a public site), and put the link to the PDF file in place of "PathToMyPdfFile.pdf" in the iframe above. Then set the width and height you need.
Google even generates this code for you here:
https://docs.google.com/viewer
Then simply put this iframe anywhere in the body of your page where you want to display your PDF. The viewer also supports many other file formats.
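Since the PDF comes from a web service URL, the iframe markup could also be generated on the server. The sketch below assumes the viewer endpoint and the url/embedded parameters shown in the iframe above; the function name and the example PDF address are hypothetical.

```python
from html import escape
from urllib.parse import urlencode

# Viewer endpoint as it appears in the iframe snippet above.
VIEWER_URL = "http://docs.google.com/viewer"


def viewer_iframe(pdf_url: str, width: int = 600, height: int = 780) -> str:
    """Build the iframe markup for a publicly reachable PDF URL."""
    # urlencode percent-encodes the PDF URL so that its own '?', '&' or
    # spaces do not break the viewer's query string.
    query = urlencode({"url": pdf_url, "embedded": "true"})
    # Escape '&' for use inside an HTML attribute value.
    src = escape(f"{VIEWER_URL}?{query}", quote=True)
    return (
        f'<iframe src="{src}" width="{width}" height="{height}" '
        f'style="border: none;"></iframe>'
    )


snippet = viewer_iframe("http://example.com/files/report 2014.pdf")
```

Encoding the PDF address matters most when it is itself a dynamic URL with query parameters, as with an ASPX page.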
There are quite a few options for viewing documents online, some open source, others proprietary. Personally, I've had good experiences with Flex Paper. This will allow you to include the document view on your website, and there are some developer resources which will allow you to integrate it with the functionality you're looking for.
For demos, see here: http://flexpaper.devaldi.com/demo/
You can use the Foxit PDF viewer. It's free and programmable.
I'm currently writing a QuickLook plugin, and I'm wondering how I can display an image and some information about that image at the same time, similar to http://www.code-line.com/software/sneakpeekphoto/ .
There is only one way to do so: convert your content to an already supported format. This means either PDF or HTML. You have two options:
For static information, create a simple PDF preview by rendering a view into a PDF. (Use the -dataWithPDFInsideRect: method of NSView.)
For dynamic information, create an HTML page with links and so on. QuickLook will then show it. (I think this is also the way your example does it.)
We have not found a way to create complex previews on our own and had to stick to one of these methods, too. Keynote and Pages do the same: they convert their presentations to multi-page PDF previews.
I watched the traffic when Google displays PDF attachments in Gmail in a new window. The content is served as PNG images, one for each PDF page, and the text can be selected. What does Google use on the server side to generate a PNG file for a particular page in a PDF file? How does the selection of text on a PNG file work? Any ideas?
By default attachments are viewed securely using https://docs.google.com/gview, however it turns out you are allowed to request files over plain HTTP. This makes it a little bit easier to figure out what is going on using Wireshark.
As you indicated, it was already clear that the PDF is converted on the server side to a PNG (ImageMagick is indeed a reasonable solution for this purpose). The obvious reason for this is to preserve the exact layout while still being able to view the file without requiring a PDF viewer.
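For reference, rendering a single PDF page to PNG with ImageMagick could look like the sketch below. It says nothing about what Google actually runs; it assumes the ImageMagick 7 magick binary (with Ghostscript for PDF support) is installed and on the PATH, and the file names are hypothetical.

```python
import subprocess


def render_page_command(pdf_path: str, page_index: int, png_path: str, dpi: int = 150) -> list:
    """Build the ImageMagick command that renders one PDF page to a PNG."""
    # "file.pdf[N]" selects the zero-based page N; -density sets the
    # rasterization resolution and must come before the input file.
    return ["magick", "-density", str(dpi), f"{pdf_path}[{page_index}]", png_path]


def render_page(pdf_path: str, page_index: int, png_path: str, dpi: int = 150) -> None:
    """Run the conversion; raises CalledProcessError if ImageMagick fails."""
    subprocess.run(render_page_command(pdf_path, page_index, png_path, dpi), check=True)


# Only builds the command here; call render_page() to actually convert.
command = render_page_command("attachment.pdf", 0, "page-0.png")
```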
However, from looking at the traffic I found out that the entire PDF is also converted to a custom XML format when calling /gview?a=gt&docid=&chan=&thid= (this is done as soon as you request the document). As I couldn't use Wireshark to copy the XML I resorted to the Firefox extension Live HTTP Headers. Here's an excerpt:
<pdf2xml>
<meta name="Author" content="Bruce van der Kooij"/>
<meta name="Creator" content="Writer"/>
<meta name="Producer" content="OpenOffice.org 3.0"/>
<meta name="CreationDate" content="20090218171300+01'00'"/>
<page t="0" l="0" w="595" h="842">
<text l="188" t="99" w="213" h="27" p="188,213">Programmabureau</text>
<text l="85" t="127" w="425" h="27" p="85,117,209,61,277,21,305,124,436,75">Nederland Open in Verbinding (NOiV)</text>
</page>
</pdf2xml>
I'm not quite sure yet what all the attributes on the text element stand for (with the exception of w and h), but they're obviously the coordinates of the text and possibly its length. As the JavaScript Google uses is minified (or possibly obfuscated, but this is not likely), figuring out precisely how the client-side selection function works is not that easy. But most likely it uses this XML file to figure out what text the user is looking at and then copies that to the user's clipboard.
Note that there is an open source (GPL licensed) tool called pdf2xml which has similar but not quite the same output. Here's the example from their homepage:
<?xml version="1.0" encoding="utf-8" ?>
<pdf2xml pages="3">
<title>My Title</title>
<page width="780" height="1152">
<font size="10" face="MHCJMH+FuturaT-Bold" color="#FF0000">
<text x="324" y="37" width="132" height="10">Friday, September 27, 2002</text>
<img x="324" y="232" width="277" height="340" src="text_pic0001.png"/>
<link x="324" y="232" width="277" height="340" dest_page="2" dest_x="141" dest_y="187"/>
</font>
<font size="12" face="AGaramond-Regular" italic="true" bold="true">
<text x="509" y="68" width="121" height="12">This is a test PDF file</text>
<link x="509" y="68" width="121" height="12" href="www.mobipocket.com"/>
</font>
</page>
</pdf2xml>
I hope this information is useful in some way. However, as one of the other posters mentioned, the only way to be sure what Google does is to ask them. It's a shame Google doesn't have an official IRC channel, but they do have a forum for Google Docs support questions.
Good luck.
Google uses a non-open-sourced PDF converter app developed in-house. So you're better off looking into the links posted by other answers, since you can't get your hands on the Google version. Sorry!
If you have the text, you can of course do whatever you want with it.
More specifically, you should check out this link: pdf to png using php
So ImageMagick will be needed: ImageMagick
Edit: another interesting link.
Edit: I found this at Google and it looks interesting, so you could use the Google API:
Google Document List Data API, and this is a blog post about it: Google API Now Lets You Get Documents in Many Formats
Of course, to be sure what Google uses you need an answer from them :)
Good luck!
To see what a PDF was created with, right-click on it and go to the Document Properties (in Adobe Reader). The producer will show up under "PDF Producer". I think Google uses both Prince and iText (not in combination for creating PDFs). Google has made some major modifications to the above toolkits to create that end product.
Well, this might just be the pdf2xml tool Google is using. They only abbreviated the full words (width, height, etc.) and added the p attribute, which turns out to contain the coordinates of the words inside the line. I just played with it and found out :) I'm going to use this pdf2xml from Google :P Upload, let them convert, then use the XML to transform it to... epub? :P
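To illustrate that reading of the p attribute, here is a small sketch that pairs each word with its position, using the excerpt quoted earlier. It assumes p holds one (left, width) pair per whitespace-separated word, which fits the sample (85+117 ends just before 209, and so on) but is an interpretation rather than a documented format.

```python
import xml.etree.ElementTree as ET

# Page excerpt from the captured XML (meta elements omitted).
SAMPLE = """<pdf2xml>
<page t="0" l="0" w="595" h="842">
<text l="188" t="99" w="213" h="27" p="188,213">Programmabureau</text>
<text l="85" t="127" w="425" h="27" p="85,117,209,61,277,21,305,124,436,75">Nederland Open in Verbinding (NOiV)</text>
</page>
</pdf2xml>"""


def word_boxes(xml_text: str) -> list:
    """Return (word, left, top, width, height) for every word in the document."""
    boxes = []
    for text in ET.fromstring(xml_text).iter("text"):
        numbers = [int(n) for n in text.get("p").split(",")]
        # Assumption: p is a flat list of (left, width) pairs, one per word.
        pairs = zip(numbers[0::2], numbers[1::2])
        # Top and height are shared by all words on the same text line.
        top, height = int(text.get("t")), int(text.get("h"))
        for word, (left, width) in zip(text.text.split(), pairs):
            boxes.append((word, left, top, width, height))
    return boxes


boxes = word_boxes(SAMPLE)
```

With boxes like these, a client could overlay invisible, selectable regions on top of the PNG page image, which would explain how text selection works on a picture.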
You may also want to investigate using Lucene to index those big PDF files and serve related pages to your users.
See http://www.jguru.com/faq/view.jsp?EID=1074237 for more ideas.