Editing the metadata of a PDF file using PDFBox

I am able to edit the metadata of a PDF file using PDFBox 3.x. However, at the final step I am forced to save the changes to a different file.
Is there a way to make changes to the original file itself?
Sample code:
PDDocument doc = Loader.loadPDF(inFileObj);
PDDocumentInformation pdd = doc.getDocumentInformation();
pdd.setAuthor("Mr. Stack Overflow");
// ...
doc.save(outFileObj); // This works, however doc.save(inFileObj) makes the PDF document a blank document
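One possible workaround, sketched under the assumption that the blank result comes from the input file being overwritten while PDFBox is still reading from it: write to a temporary file next to the original, then move it over the original. The helper below uses only java.nio. `InPlaceSave`, `replaceVia`, `Writer` and `demo` are hypothetical names, not part of PDFBox.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

final class InPlaceSave {
    /** Hypothetical callback that writes the new file contents to the given path. */
    interface Writer {
        void writeTo(Path target) throws IOException;
    }

    /** Writes to a sibling temp file, then replaces the original with it. */
    static void replaceVia(Path original, Writer writer) {
        Path dir = original.toAbsolutePath().getParent();
        try {
            Path tmp = Files.createTempFile(dir, "pdf-edit-", ".tmp");
            try {
                writer.writeTo(tmp);
                Files.move(tmp, original, StandardCopyOption.REPLACE_EXISTING);
            } finally {
                // No-op after a successful move; cleans up if writing failed.
                Files.deleteIfExists(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Small self-check using a plain text file instead of a PDF. */
    static String demo() {
        try {
            Path original = Files.createTempFile("inplace-", ".txt");
            Files.write(original, "old".getBytes(StandardCharsets.UTF_8));
            replaceVia(original, tmp -> Files.write(tmp, "new".getBytes(StandardCharsets.UTF_8)));
            String result = new String(Files.readAllBytes(original), StandardCharsets.UTF_8);
            Files.delete(original);
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

With PDFBox, the callback would load, edit and save inside one try-with-resources block (`Loader.loadPDF(inFileObj)`, then `doc.save(tmp.toFile())`), so that the document is closed before the move happens. That matters on Windows, where an open file usually cannot be replaced.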

Related

Error converting .docx with HTML to PDF using Graph API

I am trying to convert an MS Word (.docx) file to PDF using the Graph API. The file is stored in SharePoint (Office 365). I am using the code below, which works.
var httpClient = await CreateAuthorizedHttpClient();
string path = $"{GraphEndpoint}sites/{SiteId}/drive/items/";
string requestUrl = $"{path}{fileId}/content?format={targetFormat}";
var response = await httpClient.GetAsync(requestUrl);
However, when we try to convert a .docx file that contains HTML added using the code below, the conversion fails.
string altChunkId = "myId123";
//Create an alternative format import part on the MainDocumentPart
AlternativeFormatImportPart altformatImportPart = wordDoc.MainDocumentPart
.AddAlternativeFormatImportPart(AlternativeFormatImportPartType.Html, altChunkId);
using (MemoryStream htmlMemoryStream = new MemoryStream(Encoding.UTF8.GetBytes($"<html><head></head><body>{value}</body></html>")))
{
//Add the HTML data into the alternative format import part
altformatImportPart.FeedData(htmlMemoryStream);
//create a new altChunk and link it to the id of the AlternativeFormatImportPart
AltChunk altChunk = new AltChunk();
altChunk.Id = altChunkId;
//p.InsertAfterSelf(altChunk);
documentBody.Append(altChunk);
break;
}
I get a 406 Not Acceptable error when we try to convert the file using the Graph API. I also see that the file is not editable in the browser and opens in compatibility mode. If I try to open the document in edit mode I get this error:
Sorry this document can't be opened because it contains objects that
word doesn't support
I tried removing the HTML part from the document, pasting it into another document, and converting that document to PDF, which worked. When I looked at the XML of that document, I saw that the Word app had converted the HTML to Word-compatible XML tags.
Question 1: How can I convert the HTML to Word-compatible tags so that I can convert the document to PDF?
Also, if I use Download as PDF, the file is converted to PDF without any issue.
This option uses the API call below:
https://word-view.officeapps.live.com/wv/WordViewer/request.pdf?WOPIsrc={SiteURL}%2F%5Fvti%5Fbin%2Fwopi%2Eashx%2Ffiles%2F{ID}&access_token=&access_token_ttl=&z=256&type=downloadpdf
Question 2: Is there a way I can use this API to convert a .docx file to PDF? I saw that the access token's audience value is "wopi/{TenantName}#{TenantID}". If I can get the correct access token, I think I will be able to use the above API.

iText 7 Chinese characters and merge with existing pdf template

I have to rephrase my question. My request is straightforward: I want to display Asian characters in a PDF file generated with iText 7.
So far I have downloaded the NotoSansCJKsc-Regular.otf file and assigned its path to a variable. Below is my code:
public static string FONT = @"D:\Projects\Resources\NotoSansCJKsc-Regular.otf";
PdfWriter writer = new PdfWriter(@"C:\temp\test.pdf");
PdfDocument pdfDoc = new PdfDocument(writer);
Document doc = new Document(pdfDoc, PageSize.A4);
PdfFont fontChinese = PdfFontFactory.CreateFont(FONT, PdfEncodings.IDENTITY_H);
doc.SetFont(fontChinese);
The issue I am facing now is that whenever the code reaches this line:
PdfFont fontChinese = PdfFontFactory.CreateFont(FONT, PdfEncodings.IDENTITY_H);
I always get this error: The request could not be performed because of an I/O device error. The error doesn't make sense to me and I am struggling to find a solution. Has anyone here had a similar issue? The code is in C#.
Many thanks.
I can confirm that the above code works as expected. The .otf file that I originally downloaded was corrupted, hence the error above.
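A quick sanity check for a downloaded font file is to look at its first four bytes. The sketch below is in Java and is not part of iText; `FontSignature` and `sniff` are hypothetical names. It recognises the standard sfnt signatures: `OTTO` for OpenType fonts with CFF outlines (which is what a NotoSansCJK .otf should start with), `0x00010000` or `true` for TrueType, and `ttcf` for font collections. It only catches gross corruption such as an HTML error page saved under a .otf name. A truncated download with a valid header would still pass, so comparing file size or a checksum against the source is more reliable.

```java
import java.nio.charset.StandardCharsets;

final class FontSignature {
    /** Returns a short label for the font container type, or "unknown". */
    static String sniff(byte[] header) {
        if (header == null || header.length < 4) {
            return "unknown";
        }
        // ISO-8859-1 maps each byte to one char, so the tag compares cleanly.
        String tag = new String(header, 0, 4, StandardCharsets.ISO_8859_1);
        if (tag.equals("OTTO")) {
            return "OpenType/CFF";
        }
        if (tag.equals("ttcf")) {
            return "font collection";
        }
        if (tag.equals("true")) {
            return "TrueType (Apple)";
        }
        if (header[0] == 0 && header[1] == 1 && header[2] == 0 && header[3] == 0) {
            return "TrueType";
        }
        return "unknown";
    }
}
```

To use it, read the first four bytes of the font file into a byte array and pass them to `sniff`.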

Adobe breaks stamped PDF when saving as new file / what is difference in Adobe 'save as' vs. Foxit Reader 'save as' feature

I'm reaching out to the larger developer community for help in understanding the real cause and possibly finding a fix. I have asked Aspose, and they have tracked the issue (PDFNET-42880) in their system. I don't think they are going to investigate this anytime soon, as it is a very specific case. So I am posting here to ask for more details about:
What is the difference between the Adobe 'save as', Foxit Reader 'save as', and Windows Reader 'save as' features?
Issues with Adobe product that are not so obvious to figure out. I don't even know what to ask :D
Link to their (Aspose) old forum: https://www.aspose.com/community/forums/thread/845549/removing-stamps-fails-after-saving-stamped-file-from-adobe-acrobat.aspx
Case:
Created a PDF with forms using OpenOffice (version 3.4.0), stamped it with Aspose PDF, opened it with Adobe Reader DC (or Adobe Acrobat XI), filled it in, and saved it as a new file. This new file is fine, but when I try to remove the stamps using Aspose (to replace them with a new stamp later), things get interesting.
Files that I've tested with: https://1drv.ms/f/s!Auvpijam7a73iDzOqc6wZPuY9l81
Stamp_Location.png
OoPdfFormExample_WithStamp.pdf
OoPdfFormExample_WithStamp_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromFoxit.pdf
OoPdfFormExample_WithStamp_SavedFromFoxit_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromWindowsReader.pdf
OoPdfFormExample_WithStamp_SavedFromWindowsReader_StampRemoved.pdf
OoPdfFormExample_WithStamp_SavedFromAdobeReader.pdf
OoPdfFormExample_WithStamp_SavedFromAcrobat_StampRemoved.pdf
C# code that is used to remove the stamp(s):
/// <summary>
/// Removes stamps from PDF file.
/// </summary>
/// <param name="pdfFile"></param>
private static void RemoveStamps( string pdfFile )
{
// Create PDF content editor.
Aspose.Pdf.Facades.PdfContentEditor contentEditor = new Aspose.Pdf.Facades.PdfContentEditor();
// Open the temp file.
contentEditor.BindPdf( pdfFile );
// Process all pages.
foreach ( Page page in contentEditor.Document.Pages )
{
// Get the stamp infos.
Aspose.Pdf.Facades.StampInfo[] stampInfos = contentEditor.GetStamps( page.Number );
//Process all stamp infos
foreach ( Aspose.Pdf.Facades.StampInfo stampInfo in stampInfos )
{
// Use try catch so we can output possible error w/out break point.
try
{
contentEditor.DeleteStampById( stampInfo.StampId );
}
catch ( Exception e )
{
Console.WriteLine( e );
}
}
}
// Save changes to the temp file.
contentEditor.Save( StampRemovedPdfFile );
}
Using Adobe: Removing the stamp runs without errors, but opening the resulting file reports a problem:
"An error exists on this Page. Acrobat may not display the page correctly. Please contact the person who created the PDF document to correct the problem."
EDIT: After more testing: just opening the file with Aspose and saving it without modifications did not break it. The file only broke once the stamp was removed with the Aspose method.
Using Foxit: The only difference in the process is opening the file in Foxit Reader and saving from there. The stamp is removed and the file is fine; it works with any PDF reader.
Using Windows (10) Reader: The only difference in the process is opening the file in Windows Reader and saving from there. The stamp is removed and the file is fine; it works with any PDF reader.
Ok - The thing you are referring to is not a stamp annotation. It's an XObject that gets drawn into the page content. Why Aspose refers to it as a Stamp is... well... a mystery. When you remove the "stamp" (not a stamp) Aspose seems to be removing the XObject but not the instructions to draw it from the page Contents stream... that's why you're getting the error in Acrobat. The other applications are more permissive with bad PDF and my guess is when they write out the file, they are removing references to non-existent objects. You can make Acrobat attempt to fix problems like this by selecting Save As Optimized PDF. However, you are far better off removing the drawing instruction in addition to the XObject.
Because of the way you've created the file and added the "stamp", your page content stream is an array of streams. Remove the last item in the array, which is the instruction to draw the XObject, and your file will work without errors in all the viewers. Note: It won't always be the case that the last item in the content array is your stamp. It's just that your stamp is the last thing to get drawn, so it goes at the end.
If your intention is to "replace" the "stamp", you'll want to do so by removing the XObject as you are doing now, then remove the instruction, then add the new "stamp".
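To illustrate the idea rather than any Aspose API: in a content stream, an XObject is painted with the operator sequence `/Name Do`. The Java sketch below models the page's content array as a list of already-decoded strings and drops the entries that paint a given XObject. `ContentArrayCleaner`, `withoutDrawOf` and the name `Stamp0` are hypothetical. Real streams are usually compressed and have to be decoded first, and dropping a whole entry is only appropriate when that entry contains nothing but the stamp's drawing instructions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class ContentArrayCleaner {
    /** Returns the content streams that do not paint the named XObject. */
    static List<String> withoutDrawOf(List<String> streams, String xobjectName) {
        // "/Name Do" is the content-stream syntax for painting an XObject.
        Pattern draw = Pattern.compile("/" + Pattern.quote(xobjectName) + "\\s+Do\\b");
        List<String> kept = new ArrayList<>();
        for (String stream : streams) {
            if (!draw.matcher(stream).find()) {
                kept.add(stream);
            }
        }
        return kept;
    }
}
```

For example, given the two entries `BT /F1 12 Tf (Hello) Tj ET` and `q 1 0 0 1 50 50 cm /Stamp0 Do Q`, calling `withoutDrawOf(streams, "Stamp0")` keeps only the first.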

itextsharp advance pdf generation by configuration xml

I am generating a PDF using iTextSharp. I found plenty of sample code through Google, but how can we configure the PDF settings that are required to generate the PDF, like
var doc = new Document(PageSize.A5);
var doc = new Document(new Rectangle(100f, 300f));
I want to put all of these settings in my config XML file, then read the XML and generate the PDF on the fly. Is any professional sample code available? Please help.
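A minimal sketch of the config-reading half, written in Java with the standard DOM parser. The XML layout (`<pdf><page width=".." height=".."/></pdf>`) and the `PdfConfig` class are hypothetical, made up for illustration. The same shape should carry over to C# with `XDocument`.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;

final class PdfConfig {
    final float width;
    final float height;

    PdfConfig(float width, float height) {
        this.width = width;
        this.height = height;
    }

    /** Parses the hypothetical layout <pdf><page width=".." height=".."/></pdf>. */
    static PdfConfig parse(String xml) {
        try {
            Element root = DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)))
                    .getDocumentElement();
            Element page = (Element) root.getElementsByTagName("page").item(0);
            return new PdfConfig(
                    Float.parseFloat(page.getAttribute("width")),
                    Float.parseFloat(page.getAttribute("height")));
        } catch (Exception e) {
            // Covers malformed XML, a missing <page> element, and bad numbers.
            throw new IllegalArgumentException("Invalid PDF config", e);
        }
    }
}
```

The parsed values would then feed the page-size constructor from the question, along the lines of `new Document(new Rectangle(config.width, config.height))`.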

Relative file links in pdf files

I'm creating a single pdf file that I'd like to link to other files in the same directory as the pdf.
ie.
MyFolder
|
|-main.pdf
|-myotherpdf.pdf
|-myotherotherpdf.pdf
I'd like main.pdf to have links that cause the default PDF program on the PC to open the other pdfs.
As I am generating these files on a server and then providing them as a download to the client, I cannot use absolute links, as these would not exist on the client PC.
So firstly, do pdf files actually support relative file links like this? I haven't found much that says either way.
Additionally, to generate my pdf I'm using abcpdf and providing it HTML to convert to pdf.
To try to generate the correct URLs in the HTML I have tried the following:
<a href='test.pdf'>test pdf link to local file</a>
<a href='#test.pdf'>test pdf link to local file</a>
<a href='/test.pdf'>test pdf link to local file</a>
<a href='file:///test.pdf'>test pdf link to local file</a>
<a href='file://test.pdf'>test pdf link to local file</a>
Most of them either point to where the pdf document was generated (a temporary file path), or hovering over the link shows "file:///test.pdf" in Acrobat but clicking it pops up a warning dialog asking to allow/deny. Upon clicking allow, it opens in Firefox with the URL "file:///test.pdf", which doesn't resolve to anything.
Any ideas on how to get this working or if this kind of linking is even possible in pdfs?
I can only answer your question: do PDF files actually support relative file links like this?
Yes, they do. I created a little test with a main.pdf that has two links to two other PDF documents in the same folder. I created the links manually with Acrobat and associated a launch action with the link annotation. See the internal structure here:
Here is the zip with the main plus two secondary PDFs. Note that you can copy them anywhere and the relative links remain valid.
https://www.dropbox.com/s/021tvynkuvr63lv/main.zip
I am not sure how you would accomplish this with abcpdf, especially since you are converting from HTML which probably limits the PDF features available.
So I got it working in the end thanks to @Frank Rem and some help from the abcpdf guys.
Code is as follows:
foreach (var page in Enumerable.Range(0, doc.PageCount))
{
doc.PageNumber = page;
var annotEnd = doc.GetInfoInt(doc.Page, "Annot Count");
for (var i = 0; i < annotEnd; ++i)
{
var annotId = doc.GetInfoInt(doc.Page, "Annot " + (i + 1));
if (annotId > 0)
{
var linkText = doc.GetInfo(annotId, "/A*/URI*:Text");
if (!string.IsNullOrWhiteSpace(linkText))
{
var annotationUri = new Uri(linkText);
if (annotationUri.IsFile)
{
// Note abcpdf temp path can be changed in registry so if this changes
// will need to rewrite this to look at the registry
// http://www.websupergoo.com/helppdfnet/source/3-concepts/d-registrykeys.htm
var abcPdfTempPath = Path.GetTempPath() + @"AbcPdf\";
var relativePath = annotationUri.LocalPath.ToLower().Replace(abcPdfTempPath.ToLower(), string.Empty);
// Only consider files that are not directly in the temp path to be valid files
// This is because abcpdf will render the document as html to the temp path
// with a temporary file called something like {GUID}.html
// so it would be difficult to tell which files are the document
// and which are actual file links when trying to do the processing afterwards
// if this becomes an issue this could be swapped out and do a regex on {GUID}.html
// then the only restriction would be that referenced documents cannot be {GUID}.html
if (relativePath.Contains("\\"))
{
doc.SetInfo(annotId, "/A*/S:Name", "Launch");
doc.SetInfo(annotId, "/A*/URI:Del", "");
doc.SetInfo(annotId, "/A*/F:Text", relativePath);
doc.SetInfo(annotId, "/A*/NewWindow:Bool", "true");
}
}
}
}
}
}
This will allow each link to be opened in the viewer that is associated with it on the pc.
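The path-stripping step can be looked at in isolation. Below is a small Java sketch of the same idea; `TempPathStripper` and `relativeTo` are hypothetical names. One deliberate difference from the code above: it only strips the temp root when the path actually starts with it, instead of replacing it anywhere in the string, so a path outside the temp folder is rejected rather than passed through unchanged.

```java
import java.util.Locale;

final class TempPathStripper {
    /**
     * Returns the part of localPath below tempRoot, lower-cased, or null when
     * the path is outside tempRoot or sits directly in it (like {GUID}.html).
     */
    static String relativeTo(String tempRoot, String localPath) {
        String root = tempRoot.toLowerCase(Locale.ROOT);
        String path = localPath.toLowerCase(Locale.ROOT);
        if (!path.startsWith(root)) {
            return null;
        }
        String rest = path.substring(root.length());
        // Only files in a subfolder of the temp root count as real links.
        return rest.contains("\\") ? rest : null;
    }
}
```

With a temp root of `c:\temp\abcpdf\`, the path `C:\Temp\AbcPdf\docs\other.pdf` comes back as `docs\other.pdf`, while `C:\Temp\AbcPdf\abc.html` is rejected.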