I have a PDF which users can download. However, I need to have the date it was downloaded in the text on the PDF.
Is there a way to insert a "date now" stamp on a PDF so it is dynamically added on opening?
Perhaps you can add javascript to a PDF?
You can sign and timestamp the PDF on the server before giving it to the user. Signing can be performed once a day (unless you need finer granularity if timestamp time). This is the only reliable way to put a timestamp to the PDF. Also, the signature can be removed, and if you want to add a watermark which can't be easily removed, this is a different story.
JS sounds like the way to go... mostly.
You could add an empty text field where you want the date to appear. When someone opens the PDF for the first time, you could then set the field's value.
ISSUES:
Reader cannot save changed PDFs unless they have been Reader Enabled via Acrobat.
The script will run when the PDF is opened not when it is downloaded.
If you really want to mark their download time, you need to do it on the server. Fields again, only this time with some byte offset monkeying.
Set the keywords (or author, whatever... one of the Doc Info fields you're not using) to a bunch of empty spaces... enough to accommodate your largest timestamp. In script, when the form is opened, you ALWAYS write that info field to your timestamp field. No worries about reader enabling.
Now, to actually write the timestamp, you need to know the exact byte offset of that info field in your PDF, and you need to know how much empty space you have to work with so you don't accidentally overflow (that would be Very Bad). When the time comes to serve up your PDF, you suck up the bytes, change the ones for your time stamp, and server those modified bytes.
That's the most efficient (and brittle) way to do it.
If you'd rather do things The Right Way, use your favorite PDF library to change the form value and serve up the resulting PDF. No script required, but significantly more work for your server.
Note: Multiple fields can share the same name. They automagically share the same value as well. So if you want the time stamp to show up on every page, you don't need "timestamp1", "timestamp2", and so on. You can give all the fields the same name, then set the value once. "Your favorite PDF library" might not handle this terribly well. iText does, I'm sure others do also.
Related
I deal with pdfs a lot but when I try to download them it usually doesn't contain the actual title of the pdf/paper so I'll have to rename it most of the times, which I find is annoying.
In many cases URL doesn't have the title of the pdf, so I guess this has to be extracted by processing the content of the pdf. And it needs to be done on the client side, i.e., for e.g., as a browser plugin?
Is there a way that I can get the title when I'm downloading pdfs over the web via scripting or someting?
That most likely won't work and here is why.
You would have to write some incredibly dynamic code to fetch some sort of title for the PDF. You would have to have code that would scan a website, somehow pick out a title, then fire off a request to code running on your computer to change the name.
It would be somewhat inconvenient because you would always have to have the script run on your computer (likely always having terminal open).
Your code would be highly prone to error. If your website script messed up, you could accidentally name the PDF incorrectly and then not be able to find it based on how inaccurate the name is.
For now, I would suggest dealing with the pain of editing the PDF name manually.
Actually I am in need of counting the visitors count for a particular document.
I can do it by adding a field, and increasing its value.
But the problem is following.,
I have 10 replication copies in different location. It is being replicated by scheduled manner. So replication conflict is happening because of document count is editing the same document in different location.
I would use an external solution for this. Just search for "visitor count" in your favorite search engine and choose a third party tool. You can then display the count on the page if that is important.
If you need to store the value in the database for some reason, perhaps you could store it as a new doc type that gets added each time (and cleaned up later) to avoid the replication issues.
Otherwise if storing it isn't required consider Google Analytics too.
Also I faced this problem. I can not say that it has a easy solution. Document locking is the only solution that i had found. But the visitor's count is not possible.
It is possible, but not by updating the document. Instead have an AJAX call to an agent or form with parameters on the URL identifying the document being read. This call writes a document into a tracking DB with one or two views and then determines from those views how many reads you have had. The number of reads is the return value of the AJAX form.
This can be written in LS, Java or #Formulas. I would try to do it 100% in #Formulas to make it as efficient as possible.
You can also add logic to exclude reads from the same user or same source IP address.
The tracking database then replicates using the same schedule as the other database.
Daily or Hourly agents can run to create summary documents and delete the detail documents so that you do not exceed the limits for #DBLookup.
If you do not need very nearly real time counts (and that is the best you can get with replicated system like this) you could use the web logs that domino generates by finding the reads in the logs and building the counts in a document per server.
/Newbs
Back in the 90s, we had a client that needed to know that each person had read a document without them clicking to sign or anything.
The initial solution was to add each name to a text field on a separate tracking document. This ran into problems when it got over 32k real fast. Then, one of my colleagues realized you could just have it create a document for each user to record that they'd read it.
Heck, you could have one database used to track all reads for all users of all documents, since one user can only open one document at a time -- each time they open a new document, either add that value to a field or create a field named after the document they've read on their own "reader tracker" document.
Or you could make that a mail-in database, so no worries about replication. Each time they open a document for which you want to track reads, it create a tiny document that has only their name and what document they read which gets mailed into the "read counter database". If you don't care who read it, you have an agent that runs on a schedule that updates the count and deletes the mailed-in documents.
There really are a lot of ways to skin this cat.
My customer actually stores his documents, which are single page automotive forfeits, in a single MS Word document... this method is of course generating a huge file which is slow to open, not to talk about searches.
After a user compiles a document, he may need to print it to manually sign it. Then the document is scanned back and stored in PDF format. The document may be printed again to be
signed a second time by a manager. The doubly signed document is scanned again and saved
overwriting the singly-signed one.
The user wants to be able to search the document using a couple of search keys (the doc number and a sort of a SSN). That is the reason they are using a single file, to be able to search in the file using Word's search feature.
I have to propose an IT solution. I was thinking about giving them a software tool that:
reads a pdf form/template; the template rarely changes
shows the template on the screen and allows the user to input his variable fields in the form
some of the fields must be defined as searchable
the user saves only the form fields, not the whole pdf.
the sw is able to rebuild a document by coupling the template with the fields. I have to find a way to tie the template with the saved fields, so that the template can change (versioning) without breaking the old documents
the tool allows to search in multiple documents, using the defined search fields
the tool allows to print the document to manually sign it; this is the hard part. When the document is signed cannot be changed anymore, but if the document is simply scanned and coupled with the form/fields pdf, then I'll loose the benefits of only storing the data decoupled from the template. Should I only scan the signature and attach it to the document as an image?
What do you suggest to use?
Adobe XML Forms?
Adobe Forms Data Format?
An already existing software?
Other?
For the existing documents, I want allow the customer to import his huge MS Word file into the new system.
Thanks.
Sounds like you want a PDF form template that submits data to a dB that can be searched.
OTOH, if you just save the PDFs, Acrobat Pro can generate an index file from a directory, that can be searched (from reader?). Yep, you can run searches on an index from reader, but can only build them with Acrobat.
I prefer AcroForms to LiveCycle forms myself. There's a lot more software out there that works with 'em. If you go with LiveCycle, you're almost completely locked into Adobe. And Adobe server software is EXPENSIVE.
SSRS newbie question here...
I have a table where one column is varbinary(max) data. I would like to make a report that makes this data available for download as a hyperlink so the user can just click on the item and get a file download dialog for the binary data. In this particular case, the binary data happens to be the content of old pdf files, but that shouldn't matter.
I tried searching around but I can't find any pointers on how to do this. It seems to me that it should be possible. There are ways to display images in a report using varbinary data, so it makes sense that one should be able to make arbitrary binary data downloadable on a report, right?
No, it is not possible as far as I can tell. Don't see anyway to do this.
The work-around that I used was to create a simple asp.net page to serve the binary content through some URL. I then hyperlinked to that page from the SSRS report giving the right variables in the URL. Works fine for me, YMMV if you have to worry about URL hacking or security issues.
I have a PDF generated by 3rd party system. Using PDF editor or els software I have modified it.
Is it possible to detect if PDF file was modified, without original file?
I will add some more details.
There is no encryption and no signature features.
Document is created by IT system. User receives document and modifies it.
Is it possible to track that change somehow?
I thought that all these applications leaves some data in PDF header or somewhere encoded inside file and it is possible to check it. However properties showed by windows explorer shows nothing... so I was interested if there is something smarter than viewing properties/header in explorer.
The problem with this is that just opening the PDF on a Mac in Preview and hitting Command-S to save the file will replace both the Creation and Modification date to match the current date/time. So even the creation date will be wrong. Even novice users can unknowingly do this, so if you're trying to track someone who may be purposefully modifying the document, it may lead to a false positive.
What you're asking is just too easy to spoof and fool unfortunately.
You could always check the md5sum of the pdf file. I'm not sure what environment you are using but that should help get you started.
It's going to be rough without the original file unless there were security features like encryption or digital signatures applied to it, which it doesn't sound like there was. Do you have access to any information at all about the original file? A file size, creation date, any of the metadata, etc.?
If the tool used to modify the PDF is working according to the PDF spec then in the Info dictionary it should update ModDate but leave CreationDate alone. You may also see some non-zero generation numbers on the objects although it is just as possible that all the objects have been regenerated and will therefore be generation 0. The trial version of CosEdit will allow you to look at these 2 items.
If however the tool has been used to intentionally modify the PDF without leaving a trace then they would be spoofing those bits of data so they won't help you.
Are the users modifying the PDF using Acrobat? If so then what Danio mentioned above should work. Strictly speaking, modifying the PDF should change its ModDate or xmp:ModifyDate without changing its CreationDate. However not all tools adhere to this; quite a few simply leave all metadata untouched, so this method of checking isn't 100% reliable unless you know what PDF editor your users employ.
If the editor your users use does change ModDate or xmp:ModifyDate, then you should be able to see it in two places. One is when you open the document in Acrobat and hit Ctrl-D to view Document Properties. The Creation field and Modified field should have different timestamps. There may also be APIs that can be used to programmatically retrieve this metadata. The other way you can visualize it is to simply open the PDF in Notepad and search for the properties. Most of the document won't be human readable but these timestamps should be. If they do get changed appropriately, you can always parse for them in your application. Good luck!
If you're using Ubuntu linux 18.04 and using Document Viewer then, you can
click on File options (3 vertical line ellipsis)
click on Properties...
look for Created / Modified fields in the Properties pop up
Beware: A sufficiently knowledgeable user can manipulate the PDF contents without changing the Created and Modified time stamps in the PDF metadata and the file system.
You can use some tools to get the pdf file property.
I use pdfinfo, you can get many property of the file, and check it.
pdfinfo 58dcc41d01293.pdf
Author: worker
Creator: Microsoft® Word 2016
Producer: Microsoft® Word 2016
CreationDate: Sat Aug 24 16:02:29 2019
ModDate: Sat Aug 24 16:02:29 2019
Tagged: yes
UserProperties: no
Suspects: no
Form: none
JavaScript: no
Pages: 55
Encrypted: no
Page size: 841.92 x 595.32 pts (A4)
Page rot: 0
File size: 3346838 bytes
Optimized: no
PDF version: 1.7