Aspose.Words save Word document as pdf loses formatting

I am using Aspose.Words for .NET to replace some merge fields in my document and then save the file as a PDF. However, the conversion to PDF messes up my formatting, even for text that is not a merge field (see the images). The code is quite simple, so I don't see what I'm missing.
The Word document, before processing:
The generated PDF:
As you can see, some of the fields are indented a bit more instead of being nicely aligned.
My code for replacing the merge fields and generating the PDF is:
public async Task<Stream> GenerateContractAsync(string requestRegistrationId)
{
    var requestRegistration = await _requestRegistrationRepository
        .FindRequestRegistration(requestRegistrationId)
        .Include(rr => rr.Request.QualityType)
        .Include(rr => rr.User)
        .SingleOrDefaultAsync();
    var file = await _fileService
        .LoadFileAsync("Concept contract.docx");
    var user = requestRegistration.User;
    var document = new Aspose.Words.Document(file);
    document.MailMerge.Execute(
        new[]
        {
            "EmployeeName", "EmployeeDateOfBirth", "EmployeePlaceOfBirth", "EmployeeSSN", "EmployeeCity",
            "EmployeeAddress", "ContractStartDate", "EmployeeFunction", "HourlyWage", "WageDeductionApplied"
        },
        new object[]
        {
            user.FullName, $"{user.Birthday:dd-MM-yyyy}", "Oss", user.Bsn, user.City,
            $"{user.PostalCode}, {user.City}", $"{requestRegistration.Request.StartDate:dd-MM-yyyy}",
            requestRegistration.Request.QualityType.Name, $"{requestRegistration.Request.HourlyRate:C}",
            user.PayrollTaxDiscountEnabled ? "Ja" : "Nee"
        }
    );
    var mergedDocumentStream = new MemoryStream();
    document.Save(mergedDocumentStream, SaveFormat.Pdf);
#if DEBUG
    mergedDocumentStream.Seek(0, SeekOrigin.Begin);
    await _fileService.SaveFileToDiskAsync($"{user.Id}-{DateTimeOffset.Now:g}.pdf", "", mergedDocumentStream);
#endif
    mergedDocumentStream.Seek(0, SeekOrigin.Begin);
    return mergedDocumentStream;
}
Any help would be greatly appreciated.

The problem occurs because of missing fonts. Please refer to the following article for details.
How Aspose.Words Uses True Type Fonts
In your case, you need to install the 'Verdana', 'Arial' and 'Cambria' fonts on the machine where you are running this Aspose.Words code. Simply copying these font files from a Windows machine to the Mac should work.
I work with Aspose as Developer Evangelist.
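If installing the fonts system-wide is not an option, Aspose.Words can also be pointed at a folder of font files before saving. The following is a minimal sketch, not a definitive fix: the class name `ContractPdfRenderer`, the method `SaveAsPdf`, and the `fontsFolder` parameter are hypothetical, and it assumes the folder holds the Verdana, Arial and Cambria files copied from a Windows machine.

```csharp
using System.IO;
using Aspose.Words;
using Aspose.Words.Fonts;

public static class ContractPdfRenderer
{
    // Hypothetical helper: renders the document to PDF using fonts from fontsFolder.
    public static Stream SaveAsPdf(Document document, string fontsFolder)
    {
        // Assumption: fontsFolder contains the .ttf files the document uses.
        // SetFontsFolder makes this folder the font source for this FontSettings
        // instance; the second argument also scans subfolders.
        var fontSettings = new FontSettings();
        fontSettings.SetFontsFolder(fontsFolder, true);
        document.FontSettings = fontSettings;

        var output = new MemoryStream();
        document.Save(output, SaveFormat.Pdf);
        output.Seek(0, SeekOrigin.Begin);
        return output;
    }
}
```

In the question's method this would replace the direct `document.Save(...)` call, for example `return ContractPdfRenderer.SaveAsPdf(document, "/path/to/fonts");` where the path is a placeholder.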

Related

Only part of a PDF is converting

I have a PDF I am trying to extract the text from.
To do this, I have tried to get the contents into a Google Doc.
The PDF has 1180 pages (3MB) but only the first 77 pages are being converted to text.
I have tried Drive.Files.insert and Drive.Files.copy, but get the same result.
I also tried to convert the PDF using MS Word and referencing that file (2.5MB) - with the same result.
I cannot see anything in either the PDF or the Word file that would indicate an "end of file" that would stop the rest of the document from converting. There are no error messages, just 6.5% of what I need. I can only assume it was originally several smaller PDFs that were merged.
Is there something else I should be looking at? Has anyone encountered this before?
I can manipulate the PDFtext string to get the data I need, but can't convert more than the first 77 pages.
This is what I am using to get the text string I require.
function txtPDF() {
  var ss = SpreadsheetApp.getActiveSpreadsheet();
  var sht = ss.getSheetByName('Sheet1');
  var mycell = sht.getRange('B1');
  var myPdfID = mycell.getValue().toString();
  var PDFblob = DriveApp.getFileById(myPdfID).getBlob();
  var resource = {
    title: PDFblob.getName(),
    // mimeType: PDFblob.getContentType()
  };
  // var tmpfile = Drive.Files.insert(resource, PDFblob, {ocr: true, ocrLanguage: "en"});
  var tmpfile = Drive.Files.copy(resource, myPdfID, {convert: true, ocr: true, ocrLanguage: "en"});
  var doc = DocumentApp.openById(tmpfile.id);
  // var doc = Drive.Files.copy({}, 'WordFileID', {'convert': true});
  // var doc = DocumentApp.openById('WordFileID');
  var PDFtext = doc.getBody().getText();
  // Drive.Files.remove(doc.getId());
};

It looks like I fell foul of the Google Drive limits.
Documents can be up to 50MB - which was not an issue.
There is however a limit of 1.02 million characters. The 1180 pages exceeded this, so I guess I was lucky to get anything returned at all.
Maximum file sizes on Google Drive

According to the documentation, PDF-to-Docs conversion is limited to files of 2 MB or less. To convert larger files properly, look for alternatives outside of Google.

How to return a string as a CSV with proper encoding for Excel

I have the following code:
public async Task<ActionResult<FileResult>> GetCSV()
{
    var stringCsv = await _statistics.GetUserCSV();
    using (var stream = new MemoryStream())
    using (var sw = new StreamWriter(stream, Encoding.UTF8))
    {
        sw.Write(stringCsv);
        sw.Flush();
        return File(stream.ToArray(), "text/csv", "thefile.csv");
    }
}
If I inspect stringCsv while debugging it looks good, but if I open the resulting CSV in Excel, some letters are missing.
The missing letters are the Swedish letters ÅÄÖ.
What am I doing wrong?

The issue is not coming from your code. It is from the file itself. Try the following:
On a Windows computer, open the CSV file using Notepad.
Click "File > Save As".
In the dialog window that appears - select "ANSI" from the "Encoding" field. Then click "Save".
That's all! Open this new CSV file using Excel - your non-English characters should be displayed properly.
Let me know if it worked.
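If re-saving by hand is not practical, the same "ANSI" encoding can probably be produced on the server instead. The sketch below assumes that "ANSI" means Windows-1252 (the usual code page on Western European Windows installations) and that Excel opens the file with that code page. `CsvEncoding` and `ToAnsiBytes` are hypothetical names, not part of the original code.

```csharp
using System.Text;

public static class CsvEncoding
{
    // Hypothetical helper: encodes CSV text as Windows-1252 ("ANSI").
    // Characters that do not exist in Windows-1252 are replaced with '?'.
    public static byte[] ToAnsiBytes(string csv)
    {
        // Code page encodings are not registered by default on .NET Core,
        // so the provider has to be registered before GetEncoding(1252) works.
        Encoding.RegisterProvider(CodePagesEncodingProvider.Instance);
        return Encoding.GetEncoding(1252).GetBytes(csv);
    }
}
```

In the controller from the question this could be used as `return File(CsvEncoding.ToAnsiBytes(stringCsv), "text/csv", "thefile.csv");`.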

Adobe PDF printer doesn't create the PDF file

I'm creating an add-in for Revit 2017 that exports drawing sheets to PDF files. Whenever I try to export a sheet, a dialog box appears asking where to save the file. I tried to turn off the prompt programmatically by adding a key to the Windows registry (as described in the Adobe documentation, pages 15-16).
The prompt is now turned off, but I'm facing a new issue: the Adobe printer gets stuck while creating the PDF file. See the image below: the PDF creation progress bar seems frozen. I waited for more than 10 minutes and it didn't create the PDF file.
Can anybody suggest a fix?
I'd appreciate any suggestions.
Edit
Here's the code that I've written for this purpose. I hope it helps to identify the problem.
public static bool ExportSheetToPDF(Document doc, string path)
{
    using (Transaction tx = new Transaction(doc))
    {
        tx.Start("Exporting to PDF");
        PrintManager pm = doc.PrintManager;
        pm.SelectNewPrintDriver("Adobe PDF");
        pm.Apply();
        pm.PrintRange = PrintRange.Current;
        pm.Apply();
        pm.CombinedFile = true;
        pm.Apply();
        pm.PrintToFile = true;
        pm.Apply();
        pm.PrintToFileName = path + @"\PDF\" + "abc.pdf";
        pm.Apply();
        SuppressAdobeDialogAndSaveFilePath(path + @"\PDF\" + "abc.pdf");
        pm.SubmitPrint();
        pm.Apply();
        tx.Commit();
    }
    return true;
}
// Add Registry Key to suppress the dialog box
// (currentUser and key are defined elsewhere and are not shown in this snippet)
public static void SuppressAdobeDialogAndSaveFilePath(string value)
{
    var valueName = @"C:\Program Files\Autodesk\Revit 2017\Revit.exe";
    var reg = currentUser.OpenSubKey(key, true);
    var tempReg = reg.OpenSubKey(valueName);
    if (tempReg == null)
    {
        reg = reg.CreateSubKey(valueName);
    }
    reg.SetValue(valueName, value);
    reg.Close();
}

I have explained how you can achieve this by overriding the registry key for Revit.exe process that Adobe uses to generate the next print.
http://archi-lab.net/printing-pdfs-from-revit-why-is-it-so-hard/
Please remember that you still have to print via Revit PrintManager, but then you can set the registry keys before every print to control where the files get saved.
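As a rough sketch of "set the registry keys before every print", the helper below writes a single string value whose name is the full path of the printing application and whose data is the output PDF path. All names here (`AdobePdfPrinter`, `SetNextOutputFile`, the parameters) are hypothetical. The registry key path is left as a parameter because the question does not show it; the Adobe documentation referenced in the question should be checked for the exact key (it is commonly given as `Software\Adobe\Acrobat Distiller\PrinterJobControl` under HKEY_CURRENT_USER, but treat that as an assumption).

```csharp
using Microsoft.Win32;

public static class AdobePdfPrinter
{
    // Hypothetical helper: tells the Adobe PDF printer where to save the next
    // job printed by the given application, so no "Save As" dialog is needed.
    public static void SetNextOutputFile(
        string jobControlKeyPath,   // assumption: key path taken from the Adobe documentation
        string applicationExePath,  // e.g. the full path to Revit.exe
        string outputPdfPath)
    {
        // CreateSubKey opens the key for writing, creating it if it is missing.
        using (RegistryKey key = Registry.CurrentUser.CreateSubKey(jobControlKeyPath))
        {
            // One value per application: name = exe path, data = output file.
            key.SetValue(applicationExePath, outputPdfPath, RegistryValueKind.String);
        }
    }
}
```

Note that this writes a value directly under the key rather than creating a subkey named after the executable. It would be called immediately before each `pm.SubmitPrint()`.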

iTextSharp: What is Lost When Copying PDF Content From Another PDF?

I am currently evaluating iTextSharp for potential use in a project. The code I have written uses PdfCopy.GetImportedPage to copy all of the pages from an existing PDF. What I want to know is: what will be lost from a PDF and/or page when duplicating PDF content like this? For example, I have already noticed that I need to manually add any bookmarks and named destinations to my new PDF.
Here's some rough sample code:
using (PdfReader reader = new PdfReader(inputFilename))
{
    using (MemoryStream ms = new MemoryStream())
    {
        using (Document document = new Document())
        {
            using (PdfCopy copy = new PdfCopy(document, ms))
            {
                document.Open();
                int n;
                n = reader.NumberOfPages;
                for (int page = 0; page < n; )
                {
                    copy.AddPage(copy.GetImportedPage(reader, ++page));
                }
                // add content and make further modifications here
            }
        }
        // write the content to disk
    }
}

Basically, anything that's document-level rather than page-level will get lost, and both bookmarks and destinations are document-level. Pull up the PDF spec and look at section 3.6.1 for other entries in the document catalog, including Threads, Open and Additional Actions, and Metadata.
You might already have seen these but here are some samples (in Java) of how to merge Named Destinations and how to merge Bookmarks.
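For a single source file, carrying the bookmarks and named destinations over might look roughly like the sketch below. It assumes iTextSharp 5.x, where `SimpleBookmark.GetBookmark` and `SimpleNamedDestination.GetNamedDestination` read these structures from a `PdfReader`. The class and method names (`PdfCopyHelper`, `CopyWithBookmarks`) are hypothetical. Merging several files would additionally require shifting page numbers, which is not shown here.

```csharp
using System.IO;
using iTextSharp.text;
using iTextSharp.text.pdf;

public static class PdfCopyHelper
{
    // Hypothetical helper: copies every page of one PDF and re-attaches its
    // document-level bookmarks and named destinations to the copy.
    public static byte[] CopyWithBookmarks(string inputFilename)
    {
        using (var reader = new PdfReader(inputFilename))
        using (var ms = new MemoryStream())
        {
            using (var document = new Document())
            using (var copy = new PdfCopy(document, ms))
            {
                document.Open();
                for (int page = 1; page <= reader.NumberOfPages; page++)
                {
                    copy.AddPage(copy.GetImportedPage(reader, page));
                }

                // Bookmarks live in the document catalog, so copy them explicitly.
                // GetBookmark returns null when the source has no outline.
                var bookmarks = SimpleBookmark.GetBookmark(reader);
                if (bookmarks != null)
                {
                    copy.Outlines = bookmarks;
                }

                // Named destinations; page offset 0 because page numbers are unchanged.
                var destinations = SimpleNamedDestination.GetNamedDestination(reader, false);
                copy.AddNamedDestinations(destinations, 0);
            }
            // The copy is complete only after the document has been closed above.
            return ms.ToArray();
        }
    }
}
```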

Create PDF file with images in WinRT

How can I create a PDF file from a list of images in WinRT? I found something similar for Windows Phone 8 here ("Converting list of images to pdf in windows phone 8"), but I am looking for a solution for Windows 8. If anyone knows how to do this, please share your thoughts with me.

Try http://jspdf.com/
This should work on WinJS, but I haven't tested it. In a XAML app you can try to host a web browser control with a jsPDF-enabled page.

ComponentOne has now released for Windows Runtime the same PDF library that they had for Windows Phone. Though it's not open source, of course.

Amyuni PDF Creator for WinRT (a commercial library) could be used for this task. You can create a new PDF file by creating a new instance of the class AmyuniPDFCreator.IacDocument, then add new pages with the method AddPage, and add pictures to each page by using the method IacPage.CreateObject.
The code in C# for adding a picture to a page will look like this:
// Declared async because the image stream is opened with await
public async Task<IacDocument> CreatePDFFromImage()
{
    IacDocument pdfDoc = new IacDocument();
    // Set the license key
    pdfDoc.SetLicenseKey("Amyuni Tech.", "07EFCD0...C4FB9CFD");
    IacPage targetPage = pdfDoc.GetPage(1); // A new document will always be created with an empty page
    // Adding a picture to the current page
    using (Stream imgStrm = await Windows.ApplicationModel.Package.Current.InstalledLocation.OpenStreamForReadAsync("pdfcreatorwinrt.png"))
    {
        IacObject oPic = targetPage.CreateObject(IacObjectType.acObjectTypePicture, "MyPngPicture");
        BinaryReader binReader = new BinaryReader(imgStrm);
        byte[] imgData = binReader.ReadBytes((int)imgStrm.Length);
        // The "ImageFileData" attribute has not been added yet to the online documentation
        oPic.AttributeByName("ImageFileData").Value = imgData;
        oPic.Coordinates = new Rect(100, 2000, 1200, 2000);
    }
    return pdfDoc;
}
Disclaimer: I currently work as a developer of the library
For an "open source" alternative it might be better for you to rely on a web service that creates the PDF file using one of the many open source libraries available.

I think this may help you if you want to convert an image (.jpg) file to a PDF file.
It's working in my lab.
string source = "image.jpg";
string destination = "image.pdf";
PdfDocument doc = new PdfDocument();
doc.Pages.Add(new PdfPage());
XGraphics xgr = XGraphics.FromPdfPage(doc.Pages[0]);
XImage img = XImage.FromFile(source);
xgr.DrawImage(img, 0, 0);
doc.Save(destination);
doc.Close();
Thanks.
