Redacting information in PDF programmatically

I would like some suggestions on how I can achieve this. While there is discussion on this topic, it is six years old, and I am hoping there are SaaS solutions available today or an easy way to do it.
I would like to run a program on tax returns in PDF format that would remove or redact sensitive information such as name, address, SSN, and other PII, and generate a public copy of the tax return in PDF that is safe to share with others.
The source of the PDF can be a scanner or tax software. Is there an easy way to accomplish this?
Thanks,
Dan

There is a SaaS-based image storage and manipulation service called Cloudinary (cloudinary.com) which has an add-on that may help to redact text; see: https://cloudinary.com/documentation/ocr_text_detection_and_extraction_addon
How are these files being presented? For example, are the PDF files viewed on the web or via an application as images?
[I am not affiliated with Cloudinary]
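Whichever OCR or extraction service is used, the detection step for something like an SSN can be sketched as below. This is only a minimal illustration, assuming the page text has already been extracted (by OCR for scans, or by a PDF library for software-generated files); the names SSN_PATTERN, find_ssn_spans and mask_ssns are made up for this example. Note that for a real PDF the matched text has to be removed from the file itself, not just covered with a black box, otherwise it can still be copied out.

```python
import re

# Assumption: SSNs appear in the common 3-2-4 digit layout,
# separated by hyphens or spaces. Other layouts are not handled.
SSN_PATTERN = re.compile(r"\b\d{3}[- ]\d{2}[- ]\d{4}\b")


def find_ssn_spans(text: str) -> list:
    """Return (start, end) character offsets of SSN-like strings."""
    return [m.span() for m in SSN_PATTERN.finditer(text)]


def mask_ssns(text: str, mask: str = "XXX-XX-XXXX") -> str:
    """Replace every SSN-like string in the extracted text with a mask."""
    return SSN_PATTERN.sub(mask, text)


if __name__ == "__main__":
    sample = "Taxpayer SSN: 123-45-6789"
    print(find_ssn_spans(sample))
    print(mask_ssns(sample))
```

Names and addresses have no fixed pattern, so they would need a different approach (for example, matching against the known taxpayer details or relying on the form field positions).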

Related

Concatenation of certified PDFs

We have a need to concatenate a number of PDFs uploaded by a user into a single PDF file. We're currently using iTextSharp for this without problem for standard PDFs. But sometimes one of the files is certified (e.g. a bank statement issued by the bank) and this is causing a problem. It's treating the operation as an edit, which is not allowed because of the certificate.
My question is: is this going to be possible to achieve, or is there a fundamental reason that it can't be done? What tools could I use and how (iTextSharp, Aspose.Pdf, etc)?
For clarity, I don't want the certificate to be maintained in the concatenated PDF. I would like a standard PDF to be the result. Also, I'm not talking about PDFs protected with a password.
Most of the discussion I can find online is either talking about password protected files, or trying to maintain the certificate.
Many thanks,
Robin
Certifying a PDF document secures it against modification. You can remove the signature from the certified PDF using Aspose.Pdf and then concatenate the resulting PDF with the other PDF documents.
I'm Tilal, developer evangelist at Aspose.

Count no. of pages in a PDF file while uploading in Javascript

I want to calculate the number of pages in a PDF file while uploading it to my application, using JavaScript on the client side. I would like some help with this. I Googled this topic but couldn't find an appropriate solution.
If you are uploading the PDF file to the server anyway, then it is better to do this on the server side; there are a lot of options depending on your target platform.
For example, you may use the open source Ghostscript to count total number of pages.
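As a rough server-side illustration (not the Ghostscript approach itself), the sketch below estimates the page count by counting page objects in the raw bytes of the file. The function name estimate_page_count is made up for this example, and the approach is a heuristic: it undercounts when page objects are stored in compressed object streams, so a real PDF parser or Ghostscript is more reliable.

```python
import re

# Matches "/Type /Page" but not "/Type /Pages" (the page tree node),
# because \b requires a word boundary right after "Page".
PAGE_OBJECT = re.compile(rb"/Type\s*/Page\b")


def estimate_page_count(pdf_bytes: bytes) -> int:
    """Heuristic page count: number of page objects found in the raw bytes."""
    return len(PAGE_OBJECT.findall(pdf_bytes))


if __name__ == "__main__":
    # Tiny hand-written fragment standing in for an uploaded file.
    fragment = (
        b"1 0 obj<</Type /Pages /Count 2>>endobj "
        b"2 0 obj<</Type /Page>>endobj "
        b"3 0 obj<</Type/Page /Parent 1 0 R>>endobj"
    )
    print(estimate_page_count(fragment))
```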

How to View a Google Spreadsheet Doc as a PDF

I would like to know if it's possible to view a Google Spreadsheet Doc as a PDF without first manually converting it as a PDF? I don't want to share a link directly to the spreadsheet, I want to share a link to a PDF version of it which ends up looking better (in Print View rather than Spreadsheet Document View)
I know I can Print > Save as PDF, then download to my local machine, then upload and save somewhere on my server. But is there a way to view the spreadsheet as a PDF directly?
I have Googled this and found nothing. The best I could come up with is the Google Document Viewer (https://docs.google.com/viewer), but that does not seem to give me the option I am looking for. Further, I do not want to install any Chrome plugins, etc., because I want to be able to share a link to the PDF with people without requiring them to install a plugin to see the doc.
Unfortunately, what you are trying to do and the way you are trying to do it is not a capability within Google Docs. Sorry.
I think the best way is to use the Google Drive API and write your own script to do this job. I mean:
You have a web server.
Write a simple method in any web technology your server can run, such as PHP, Python, Java, or C#. The method connects to your Google Drive account through its API and knows which spreadsheet to fetch and how to interpret its columns. It parses the spreadsheet into HTML and converts that to PDF with a suitable tool for your programming language or server operating system. The method should return an HTTP response with the header Content-Type: application/pdf.
You give interested people the link under which your method is available.
These references should help you use the Google API:
How to download the resources:
https://developers.google.com/drive/web/manage-downloads
How to convert (i.e. to PDF) and open the resources in your own application:
https://developers.google.com/drive/web/integrate-open#open_and_convert_google_docs_in_your_app
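As a small sketch of the server-side method described above, the snippet below builds the request URL for the Drive API v3 export endpoint, which can return a Google Sheets file directly as PDF and so may avoid the HTML conversion step. The helper names drive_pdf_export_url and pdf_response_headers are made up for this example, "abc123" is a placeholder file ID, and the actual request still needs an OAuth access token for an account that can read the spreadsheet, which is not shown here.

```python
from urllib.parse import quote, urlencode


def drive_pdf_export_url(file_id: str) -> str:
    """Build the Drive API v3 export URL that asks for a PDF rendition."""
    query = urlencode({"mimeType": "application/pdf"})
    safe_id = quote(file_id, safe="")
    return f"https://www.googleapis.com/drive/v3/files/{safe_id}/export?{query}"


def pdf_response_headers(filename: str) -> dict:
    """Headers the web method would send so browsers display the PDF inline."""
    return {
        "Content-Type": "application/pdf",
        "Content-Disposition": f'inline; filename="{filename}"',
    }


if __name__ == "__main__":
    # "abc123" is a placeholder, not a real spreadsheet ID.
    print(drive_pdf_export_url("abc123"))
    print(pdf_response_headers("sheet.pdf"))
```

The web method would fetch that URL with an Authorization: Bearer header and stream the bytes back to the visitor using the headers above.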
I hope this helps.

Lotus Notes - Automatically export email to PDF on arrival

Is it possible with Lotus Notes 8.5 to write a program (assuming an Agent) that will automatically export the email as a PDF document where the name of the document will be the subject line of the email?
I am being told by our lovely IT people that this will take months worth of effort to investigate, test and implement.
Surely there is a function that could be called to do this?
Can anyone please point me in the direction of a tutorial or help doc etc that I can read so I can have some more information to speak more authoritatively with our IT guys.
My intention is then to hand this information to the Domino Design team to ask them to build the function (without taking months to do so). :-)
Thank you in advance for any help you can provide.
There is a third party application called PD4ML which allows you to export to PDF format. They also supply samples on how to do this in the Notes client.
http://www.pd4ml.com/lotus.htm
You would need to create an agent that runs on new mail arriving.
There is also some sample code on SearchDomino.
http://searchdomino.techtarget.com/tip/Converting-Lotus-Notes-Domino-Web-pages-to-PDF-files-with-a-Java-agent
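One detail the agent has to handle regardless of the conversion tool is turning the subject line into a valid file name, since subjects often contain characters such as ":" or "/". The sketch below shows that step in Python purely for illustration (a real Notes agent would be written in LotusScript or Java); the function name subject_to_pdf_name and the 100-character limit are assumptions made for this example.

```python
import re

# Characters that are not allowed in Windows file names, plus control characters.
INVALID_CHARS = re.compile(r'[<>:"/\\|?*\x00-\x1f]')


def subject_to_pdf_name(subject: str, max_length: int = 100) -> str:
    """Turn an email subject into a safe PDF file name."""
    cleaned = INVALID_CHARS.sub("_", subject).strip(" .")
    # Truncate, then trim again in case the cut left a trailing space or dot.
    cleaned = cleaned[:max_length].rstrip(" .")
    if not cleaned:
        cleaned = "untitled"  # fallback for empty subjects
    return cleaned + ".pdf"


if __name__ == "__main__":
    print(subject_to_pdf_name("Re: Q3 report / final?"))
```

Two emails with the same subject would produce the same name, so the agent would also need some way to make names unique, such as appending the arrival time.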
An alternative option is also available: you can save Lotus Notes email to PDF format securely with a third-party Lotus Notes to PDF tool.
http://www.lotusnotestooutlookexpress.com/lotus-notes-export-email-to-pdf
You can read about the email conversion process and other useful details of the software in this PDF guide:
http://www.lotusnotestoexchange.com/guide-for-lotus-notes-to-pdf-conversion.pdf
There's an "industry standard" tool to export Lotus Notes documents to PDF.
This tool is Swing PDF Converter.
Please check it out here: http://www.swingsoftware.com/pdf-converter/overview
Swing PDF Converter supports Lotus Notes format emails (RTF) and HTML - MIME email conversion to PDF.
There is also advanced support for PDF document naming, single document conversion, batch conversion, a document archiving option, and even automatic upload to document repositories such as MS SharePoint, Alfresco, FileNet, etc.

Embedding PDF documents into websites

I need to embed some PDF documents into a website. The last time I did this, I used a jQuery lightbox to popup an iFrame with the PDF document as the URL. The client's PDF viewer would then take care of the rest.
Apparently though, that was a bit buggy on some other people's browsers. I guess it was due to the large PDF file sizes and the effort it took for their computers to fire up Adobe.
So I'm after ideas on how to go about this. How do you guys embed your PDFs into websites? Or do you just stick to adding a download link?
I often use scribd to solve this issue.
You have to upload your document (PDF, DOC, or something else) to your Scribd account, and the service makes it possible to view the document in a Flash environment (perfectly embeddable with a lightbox).
For this solution, a third party service (scribd) is required for your documents, but with their API it's possible to include all scribd functionality in your own website.
We have used Docuter.
They let you embed and track.
I've used Google Docs in Flash: http://trajctrl.tyblu.ca/?page_id=2
It's a bit buggy, but I find it works if you wiggle the image a bit, i.e. zoom, click, etc. A download link is nearby just in case, too. I'm not exactly sure how it was done, as it's a WordPress plugin (Google Doc Embedder), but I imagine Google has an API somewhere.