How to read UTF metadata returned by PDFsharp?

I am trying to read some metadata from a PDF file using PDFsharp. Unfortunately it returns the creator as what looks like undecoded UTF-16:
var reader = PdfReader.Open("data.pdf");
var creator = reader.Info.Creator; // þÿ\0M\0i\0c\0r\0o\0s\0o\0f\0t\0®\0 \0W\0o\0r\0d\0 \02\00\01\03
With iTextSharp it works fine, so the file itself is not the problem:
var reader = new PdfReader("data.pdf");
var creator = reader.Info["Creator"]; // Microsoft® Word 2013
How can I read the creator correctly with PDFsharp? Since it seems to be an encoding issue I have the impression it should be a relatively easy fix, but I can't find it.

I can reproduce this behaviour with PDFsharp 1.32 from 2013.
I cannot reproduce it with PDFsharp 1.50 from 2018, so the issue was fixed at some point between those two releases.
Use the latest version of PDFsharp and you can access the metadata as expected. That is a very easy fix, and it's never a bad idea to try the latest version first when you run into problems.
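If upgrading isn't an option right away, the garbled value can also be repaired after the fact. The þÿ prefix is the UTF-16 byte-order mark FE FF displayed as single-byte characters, and each \0 is the zero high byte of a UTF-16BE code unit, so the value is a PDF text string that was never decoded. Below is a minimal sketch in Java (in .NET the same idea would use Encoding.BigEndianUnicode). decodePdfTextString is a hypothetical helper, not part of PDFsharp, and it assumes every character of the garbled value holds exactly one raw byte (0 to 255). Bytes that were already remapped through PDFDocEncoding, which differs from Latin-1 in places such as the 0x80 to 0x9F range, would not survive the round trip.

```java
import java.nio.charset.StandardCharsets;

public class PdfTextStringFix {

    /**
     * Repairs a PDF text string that was read as single-byte characters
     * even though it is UTF-16BE with a leading byte-order mark (FE FF).
     * Assumption: each char of the input holds one raw byte (0-255).
     */
    static String decodePdfTextString(String raw) {
        // ISO-8859-1 maps chars 0-255 one-to-one back to the original bytes.
        byte[] bytes = raw.getBytes(StandardCharsets.ISO_8859_1);
        boolean hasUtf16Bom = bytes.length >= 2
                && (bytes[0] & 0xFF) == 0xFE
                && (bytes[1] & 0xFF) == 0xFF;
        if (!hasUtf16Bom) {
            return raw; // not a UTF-16 text string; leave it unchanged
        }
        // Skip the 2-byte BOM and decode the remainder as big-endian UTF-16.
        return new String(bytes, 2, bytes.length - 2, StandardCharsets.UTF_16BE);
    }

    public static void main(String[] args) {
        // Rebuild the value from the question: BOM, then each character as 00 xx.
        StringBuilder garbled = new StringBuilder("\u00FE\u00FF");
        for (char c : "Microsoft\u00AE Word 2013".toCharArray()) {
            garbled.append((char) 0).append(c);
        }
        System.out.println(decodePdfTextString(garbled.toString()));
    }
}
```

Strings without the FE FF prefix are returned untouched, so the helper can be applied to every Info field, not just Creator.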

Related

Upload file as octet-stream using RestSharp v107

I'm trying to upload a file as octet-stream with RestSharp v107 but can't get it to work.
I've tried the following approaches:
1. Using restRequest.AddFile("file", documentContent, MediaTypeNames.Application.Octet);
This doesn't generate an octet-stream request, even though I specify the content type.
2. Using restRequest.AddBody(documentContent, MediaTypeNames.Application.Octet);
This seems to generate an octet-stream, but the content doesn't seem to be the correct bytes. This worked in v106.
If anyone has a working version of uploading streaming file data with RestSharp v107, it would be greatly appreciated.
This is a bug in RestSharp v107, according to this issue:
https://github.com/restsharp/RestSharp/issues/1761
and the fix here:
https://github.com/restsharp/RestSharp/commit/a7d2662ef95269fab1e6dac2ef2ad3003c5c1e85
The solution for now is to use the following:
restRequest.AddParameter(new BodyParameter("", byteArrayContent, MediaTypeNames.Application.Octet, DataFormat.Binary));
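For context: as far as I know, AddFile builds a multipart/form-data request in which the file is one part of a form, so the content type you pass applies to that part rather than to the request as a whole. An octet-stream upload is instead a request whose entire body is the raw bytes, sent with Content-Type: application/octet-stream, which is what the BodyParameter workaround above is meant to produce. To show the shape of such a request independently of RestSharp, here is a sketch using Java 11's java.net.http client; the endpoint URL is a placeholder and buildOctetStreamUpload is a hypothetical helper.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class OctetStreamUpload {

    /** Builds (but does not send) a POST whose body is exactly the file bytes. */
    static HttpRequest buildOctetStreamUpload(URI endpoint, byte[] fileBytes) {
        return HttpRequest.newBuilder(endpoint)
                // The whole body is the file: no multipart boundaries, no form fields.
                .header("Content-Type", "application/octet-stream")
                .POST(HttpRequest.BodyPublishers.ofByteArray(fileBytes))
                .build();
    }

    public static void main(String[] args) {
        byte[] content = {0x25, 0x50, 0x44, 0x46}; // sample bytes ("%PDF")
        HttpRequest request = buildOctetStreamUpload(
                URI.create("https://example.com/upload"), content); // placeholder endpoint
        System.out.println(request.method() + " " + request.uri());
        System.out.println(request.headers().firstValue("Content-Type").orElse("(none)"));
        System.out.println(request.bodyPublisher()
                .map(HttpRequest.BodyPublisher::contentLength)
                .orElse(-1L));
    }
}
```

Sending it would then be a matter of HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString()); comparing that request with what RestSharp puts on the wire (e.g. in Fiddler) is a quick way to check the workaround.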

Handling of Thumbnails in Google Drive Android API (GDAA)

I've run into the following problem when porting an app from the REST API to GDAA.
The app needs to download some of thousands of JPEG images based on user selection. It does this by downloading a thumbnail version first, using this construct of the REST API:
private static InputStream getCont(String rsid, boolean bBig) {
    InputStream is = null;
    if (rsid != null) try {
        // Request only the field needed: the full download URL or the thumbnail link
        File gFl = bBig ?
            mGOOSvc.files().get(rsid).setFields("downloadUrl").execute() :
            mGOOSvc.files().get(rsid).setFields("thumbnailLink").execute();
        if (gFl != null) {
            GenericUrl url = new GenericUrl(bBig ? gFl.getDownloadUrl() : gFl.getThumbnailLink());
            is = mGOOSvc.getRequestFactory().buildGetRequest(url).execute().getContent();
        }
    } catch (UserRecoverableAuthIOException uraEx) {
        authorize(uraEx.getIntent());
    } catch (GoogleAuthIOException gauEx) {
        // ignored: the method returns null
    } catch (Exception e) {
        // ignored: the method returns null
    }
    return is;
}
It fetches either a 'thumbnail' or a 'full-blown' version of an image based on the bBig flag. The user can select a thumbnail from a list and the full-blown image download follows (all of this backed by a disk-based LRU cache, of course).
The problem is that GDAA does not have an option to ask for a reduced-size / thumbnail version of an object (AFAIK), so I have to resort to combining both APIs, which makes the code more convoluted than I like (bottom of the page). Needless to say, the 'Resource ID' needed by the REST API may not be immediately available.
So, the question is: Is there a way to ask GDAA for a 'thumbnail' version of a document?
Downloading thumbnails isn't currently available in the Drive Android API, and unfortunately I can't give a timeframe to when it will be available. Until that time, the Drive Java Client Library is the best way to get thumbnails on Android.
We'd appreciate if you go ahead and file a feature request against our issue tracker: https://code.google.com/a/google.com/p/apps-api-issues/
That gives requests more visibility to our teams internally, and issues will be marked resolved when we release updates.
Update: I had an error in the discussion of the request fields.
As Ofir says, you can't get thumbnails with the Drive Android API, but you can get them with the Drive Java Client Library. This page is a really good primer for getting started:
https://developers.google.com/drive/v3/web/quickstart/android
Oddly, I can't get the fields portion of the request to work the way it is shown in that quickstart. In my experience, you have to request the fields a little differently.
Since you're doing a custom field request, you have to be sure to add the other fields you want as well. Here is how I've gotten it to work:
Drive.Files.List request = mService.files()
.list()
.setFields("files/thumbnailLink, files/name, files/mimeType, files/id")
.setQ("Your file param and/or mime query");
FileList files = request.execute();
files.getFiles(); //Each File in the collection will have a valid thumbnailLink
A sample query might be:
"mimeType = 'image/jpeg' or mimeType = 'video/mp4'"
Hope this helps!
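Because a custom fields mask must list every field you want back, it can help to build the fields and q strings from lists instead of typing them by hand. A small Java sketch; filesFieldMask and mimeTypeQuery are hypothetical helpers, not part of the Drive client library, and their results would be passed to setFields(...) and setQ(...) as in the answer above. Google's partial-response syntax should also accept the grouped form files(thumbnailLink,name,mimeType,id), but treat that as an assumption for this particular call.

```java
import java.util.List;
import java.util.stream.Collectors;

public class DriveQueryParams {

    /** Builds a fields mask in the "files/a,files/b" form used above. */
    static String filesFieldMask(List<String> fileFields) {
        return fileFields.stream()
                .map(f -> "files/" + f)
                .collect(Collectors.joining(","));
    }

    /** Builds a q expression matching any of the given MIME types. */
    static String mimeTypeQuery(List<String> mimeTypes) {
        return mimeTypes.stream()
                .map(m -> "mimeType = '" + m + "'")
                .collect(Collectors.joining(" or "));
    }

    public static void main(String[] args) {
        System.out.println(filesFieldMask(List.of("thumbnailLink", "name", "mimeType", "id")));
        System.out.println(mimeTypeQuery(List.of("image/jpeg", "video/mp4")));
    }
}
```

If the MIME-type condition is combined with other conditions using and, wrap it in parentheses so the or does not swallow the rest of the query.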

How to set PDF copyright metadata using ColdFusion (and iText)

I'm attempting to set the copyright metadata of a PDF using ColdFusion and iText. Adobe ColdFusion 9-11 allow getting and setting the Title, Subject, Keywords, Creator and Author fields, but don't appear to allow access to the "Additional Metadata" properties. (This is my first SO question.)
I found a potential iTextSharp/C# solution and have tried to convert it to CFML, but I haven't been able to work out how to access System.IO.MemoryStream() from ColdFusion. I get a "Could not find the ColdFusion component or interface MemoryStream. Ensure that the name is correct and that the component or interface exists." error, and searching the internet doesn't turn up anything.
https://stackoverflow.com/a/6942531/693068
NOTE: I'm using iText because the PDF files are created using ABBYY FineReader. I've encountered numerous issues in the past where ColdFusion refuses to identify non-Acrobat-generated PDFs as valid PDFs when using isPDFFile().
Any ideas? Does the functionality exist but just isn't documented anywhere? Thanks.
(Too long for comments ...)
As Ryan mentioned, it might be possible using DDX. The cfpdf documentation lists Metadata as a supported element. So you may want to look into that option first.
"I found a potential iTextSharp/C# solution"
That said, there is no need to use an external C# library. CF already bundles an older version of iText (written in Java), so use the Java classes instead. iTextSharp is a port of the original Java library, so the class and method names will usually be the same.
source = "c:/path/to/input.pdf";
target = "c:/path/to/output.pdf";
reader = createObject("java", "com.lowagie.text.pdf.PdfReader").init( source );
output = createObject("java", "java.io.FileOutputStream").init( target );
stamper = createObject("java", "com.lowagie.text.pdf.PdfStamper").init( reader, output );
copyrightName = "YOUR NAME HERE";
copyrightUrl = "http://www.example.com/";
baos = createObject("java", "java.io.ByteArrayOutputStream").init();
xmp = createObject("java", "com.lowagie.text.xml.xmp.XmpWriter").init(baos);
xmp.addRdfDescription("xmlns:dc=""http://purl.org/dc/elements/1.1/""", "<dc:rights><rdf:Alt><rdf:li xml:lang=""x-default"">"& copyrightName &"</rdf:li></rdf:Alt></dc:rights>");
xmp.addRdfDescription("xmlns:xmpRights=""http://ns.adobe.com/xap/1.0/rights/"""
, "<xmpRights:Marked>True</xmpRights:Marked><xmpRights:WebStatement>"& copyrightUrl &"</xmpRights:WebStatement>");
xmp.close();
stamper.setXmpMetadata(baos.toByteArray());
stamper.close();
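One caveat with the code above: copyrightName and copyrightUrl are concatenated straight into the XMP markup, so a value containing &, < or a quote would produce invalid XML. In CFML, wrapping the values in the built-in XmlFormat() function takes care of this. For reference, here is a Java sketch of the same two description bodies with the values escaped (Java being what CF's bundled iText runs on); escapeXml, dcRights and xmpRights are hypothetical helpers, not iText API. The strings they return are what would be passed as the second argument of addRdfDescription.

```java
public class XmpRightsFragments {

    /** Escapes the five XML special characters (hypothetical helper, not iText API). */
    static String escapeXml(String s) {
        StringBuilder out = new StringBuilder(s.length());
        for (char c : s.toCharArray()) {
            switch (c) {
                case '&': out.append("&amp;"); break;
                case '<': out.append("&lt;"); break;
                case '>': out.append("&gt;"); break;
                case '"': out.append("&quot;"); break;
                case '\'': out.append("&apos;"); break;
                default: out.append(c);
            }
        }
        return out.toString();
    }

    /** Body of the dc:rights description, with the name escaped. */
    static String dcRights(String copyrightName) {
        return "<dc:rights><rdf:Alt><rdf:li xml:lang=\"x-default\">"
                + escapeXml(copyrightName)
                + "</rdf:li></rdf:Alt></dc:rights>";
    }

    /** Body of the xmpRights description, with the URL escaped. */
    static String xmpRights(String copyrightUrl) {
        return "<xmpRights:Marked>True</xmpRights:Marked><xmpRights:WebStatement>"
                + escapeXml(copyrightUrl)
                + "</xmpRights:WebStatement>";
    }

    public static void main(String[] args) {
        System.out.println(dcRights("Smith & Sons <Publishing>"));
        System.out.println(xmpRights("http://www.example.com/?a=1&b=2"));
    }
}
```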
You may be able to do this with the DDX capabilities. See these links:
Using DDX to unlock the potential of PDF manipulation in ColdFusion
<cfpdf> documentation
System.IO.MemoryStream is a .NET class. You won't be able to access it from CFML directly. You could try to use the .NET integration features to access it, or the iTextSharp solution, directly.

Uploading files using HTML5 FormData in Dojo (without using XMLHttpRequest)

I want to upload files using the HTML5 FormData object in Dojo, without using XMLHttpRequest.
I am using dojo.xhrPost to upload files.
Please post your ideas/thoughts and experience.
Thanks
Mathirajan S
Based on your comment I am assuming you do want to use XHR (which would make sense given that FormData is part of the XHR2 spec).
dojo/request/xhr (introduced in Dojo 1.8) supports passing a FormData object via the data property of the options object, so that may get you what you want.
require(["dojo/request/xhr"], function (request) {
    request.post(url, {
        data: formdataObjectHere
        // and potentially other options...
    }).then(...);
});
The legacy dojo/_base/xhr module does not explicitly support XHR2, but it does lean on dojo/request/xhr now, so it might end up working anyway, but no guarantees there.
More information on dojo/request/xhr can be found in the Reference Guide.

Tracking Code Into a PDF or PostScript File

Is there a way to track when a PDF is opened? Perhaps by embedding some script into the pdf itself?
I saw the question below, and I suppose the answer is "no" for JavaScript, but I am wondering if this is possible at all.
Google analytics tracking code insert in pdf file
The PDF standard includes support for JavaScript but, as @Wes Hardaker pointed out, not every PDF reader supports it. However, sometimes some is better than none.
Here's Adobe's official Acrobat JavaScript Scripting Guide. What's probably most interesting to you is the doc object which has a method called getURL(). To use it you'd just call:
app.doc.getURL('http://www.google.com/');
Bind that call to the document's open event and you've got a tracker. I'm not too familiar with creating events from within Adobe Acrobat, but from code it's pretty easy. The code below is a full working VS2010 C# WinForms app that uses the open-source library iTextSharp (5.1.1.0). It creates a PDF and adds the JavaScript to the open event.
Some notes:
First, Adobe Acrobat and Reader will both warn the user whenever a document accesses an external resource, and most other PDF readers will probably do the same. This is very annoying, and for that reason alone it shouldn't be done. Personally I don't care if someone tracks my document opens; I just don't want to get a prompt every time.
Second, to reiterate, this code works in Adobe Acrobat and Adobe Reader, probably at least as far back as version 6, but may or may not work in other PDF readers.
Third, there's no safe way to uniquely identify the user. Doing so would require creating and storing some equivalent of a "cookie", which would mean writing to the user's file system, and that would be considered unsafe. So you could only count the number of opens, not unique opens.
Fourth, this might not be legal everywhere. Some jurisdictions require that you notify users if you are tracking them and give them a way to see what information you are collecting.
That said, I can't refuse to give an answer just because I don't like it.
using System;
using System.Text;
using System.Windows.Forms;
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;
namespace WindowsFormsApplication1
{
    public partial class Form1 : Form
    {
        public Form1()
        {
            InitializeComponent();
        }
        private void Form1_Load(object sender, EventArgs e)
        {
            //File that we will create
            string OutputFile = Path.Combine(Environment.GetFolderPath(Environment.SpecialFolder.Desktop), "Events.pdf");
            //Standard PDF creation setup
            using (FileStream fs = new FileStream(OutputFile, FileMode.Create, FileAccess.Write, FileShare.None))
            {
                using (Document doc = new Document(PageSize.LETTER))
                {
                    using (PdfWriter writer = PdfWriter.GetInstance(doc, fs))
                    {
                        //Open our document for writing
                        doc.Open();
                        //Create an action that points to the built-in app.doc object and calls the getURL method on it
                        PdfAction act = PdfAction.JavaScript("app.doc.getURL('http://www.google.com/');", writer);
                        //Set that action as the document's open action
                        writer.SetOpenAction(act);
                        //A PDF needs at least some content to be valid
                        doc.Add(new Paragraph("Hello"));
                        //Close the document
                        doc.Close();
                    }
                }
            }
            this.Close();
        }
    }
}
The problem with technologies like that is that they can never be fully relied on.
First, triggering an external event is a security violation, and the software writers likely wouldn't support it (or at least I hope not).
Second, it's dependent on things like the network. What happens when someone downloads the file and then reads it while offline on a plane, for example? You won't get the notification.
Third, there are multiple ways to read PDF files. Some people read them with readers you've likely never heard of (my favorite is a Linux application that I like much better than Adobe's AcroRead).
So even if you could do it (and I'd argue you shouldn't, but that's not answering your question), the real answer is "no": even if the software supported it, it still wouldn't be reliable.
Given that PostScript is a fully capable programming language, there's no fundamental reason it couldn't track when it is viewed/run.
I'd expect the difficult part to be finding the libraries (or writing the functions yourself) for the networking portion of the logging.
One quick note, however: with functionality like this it is probably best to keep the document accessible if tracking fails, because people tend to get upset when their media suddenly becomes unavailable, which is exactly what would happen if you forced termination on failure. (Can you guarantee that your logging domain will never change? That it will always be available? What happens when the user has no internet access?)