Have a puppeteer generated PDF pass accessibility reports


I'm building PDFs with Puppeteer. The resulting PDF looks fine, but it fails PDF accessibility reports.
The main issues are the title of the PDF and the language of the PDF.
I have tried setting both via EXIF values (Title, Language). The title does display in certain cases, but the file still fails Acrobat Pro's accessibility check.
I have used another accessibility checker ( http://checkers.eiii.eu/en/pdfcheck/ ); there the title is detected successfully, but not the language.
I have used --export-tagged-pdf as a launch parameter, which fixed many other issues.
Would anyone have an idea how I could pass the accessibility report, mostly the language check? I'm using Node.js to generate the PDFs. A library that edits the PDF after the fact would also be really helpful; I wasn't able to figure that out.

Facing the same problem, I managed to set all the required metadata and XMP data except the PDF/UA identifier. I used the JS library "pdf-lib" (https://pdf-lib.js.org/docs/api/classes/pdfdocument) to set the metadata and exiftool-vendored to inject the XMP schema data.
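const pdfLib = require('pdf-lib');
const exiftool = require("exiftool-vendored").exiftool;
const fs = require('fs');
const { execFile } = require('child_process'); // needed for the execFile call below
const distexiftool = require('dist-exiftool');
// fs.readFile is callback-based; use the promise API so it can be awaited
const pdfData = await fs.promises.readFile('your-pdf-document.pdf');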
const pdfDoc = await pdfLib.PDFDocument.load(pdfData);
const nowDate = new Date();
const meta_creator = "The author";
const meta_author = "The author";
const meta_producer = "The producer";
const meta_title = "Your PDF title";
const meta_subject = "Your PDF subject";
const meta_creadate = `${nowDate.getFullYear()}-${nowDate.getMonth()+1}-${nowDate.getDate()}`;
const meta_keywords = ["keyword1", "keyword2", "keyword3", "keyword4"];
// Set the PDF subject
pdfDoc.setSubject(meta_subject);
// Implement required "DisplayDocTitle" pdf var
pdfDoc.setTitle(meta_title, {
showInWindowTitleBar: true,
updateMetadata: true
});
// Set the PDF language (a BCP 47 tag such as "en-US"; "en-EN" is not a registered language-region tag)
pdfDoc.setLanguage("en-US");
// Save the file so that exiftool can load it
const pdfBytes = await pdfDoc.save();
await fs.promises.writeFile("your-pdf-document.pdf", pdfBytes);
// We use "distexiftool" to get the TAGS from PDF/UA well formed XMP file "pdfUA-ID.xmp" and assign data to "your-pdf-document.pdf"
execFile(distexiftool, ["-j","-xmp<=pdfUA-ID.xmp", "your-pdf-document.pdf"], (error, stdout, stderr) => {
if (error) {
console.error(`exec error: ${error}`);
return;
}
afterTagsOperation()
});
async function afterTagsOperation(){
// Open the file and write XMP tags with exiftool
await exiftool.write("your-pdf-document.pdf", { 'xmp:Author': meta_author });
await exiftool.write("your-pdf-document.pdf", { 'xmp:Creator': meta_creator });
await exiftool.write("your-pdf-document.pdf", { 'xmp:CreateDate': meta_creadate });
await exiftool.write("your-pdf-document.pdf", { 'xmp:Producer': meta_producer });
await exiftool.write("your-pdf-document.pdf", { 'xmp:Title': meta_title });
await exiftool.write("your-pdf-document.pdf", { 'xmp:Subject': meta_subject });
await exiftool.write("your-pdf-document.pdf", { 'xmp:Keywords': meta_keywords });
await exiftool.write("your-pdf-document.pdf", { 'xmp:Trapped': 'false' });
await exiftool.write("your-pdf-document.pdf", { 'xmp:DocumentID': `uuid:${nowDate.getTime()}` });
await exiftool.write("your-pdf-document.pdf", { 'xmp:Identifier': nowDate.getTime() });
await exiftool.write("your-pdf-document.pdf", { 'xmp:PDFVersion': `v${nowDate.getTime()}` });
await exiftool.write("your-pdf-document.pdf", { 'xmp-xmpMM:DocumentID': `uuid:${nowDate.getTime()}` });
await exiftool.write("your-pdf-document.pdf", { 'xmp-dc:format': `application/pdf` });
await exiftool.write("your-pdf-document.pdf", { 'xmp-dc:title': meta_title });
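// Do not re-save pdfDoc here: it was loaded before exiftool ran, so writing it
// back would overwrite the file and discard the XMP tags added above.
// Shut down the exiftool child process so Node.js can exit.
await exiftool.end();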
}
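
The answer above issues one exiftool.write call per tag. A minimal sketch of an alternative, assuming exiftool-vendored accepts several tags in a single tags object: collect the values first, with a zero-padded timestamp for the creation date. The helper names buildXmpTags and toXmpDate are hypothetical, and the tag names are copied from the answer rather than checked against a PDF/UA validator.

```javascript
// Hypothetical helpers, not part of pdf-lib or exiftool-vendored.

// Format a Date as an ISO 8601 timestamp without milliseconds,
// so month, day and time fields are always zero-padded.
function toXmpDate(date) {
  return date.toISOString().replace(/\.\d{3}Z$/, 'Z');
}

// Collect the metadata into a single tags object keyed by the tag
// names used in the answer above.
function buildXmpTags(meta, now = new Date()) {
  return {
    'xmp:Author': meta.author,
    'xmp:Creator': meta.creator,
    'xmp:CreateDate': toXmpDate(now),
    'xmp:Producer': meta.producer,
    'xmp:Title': meta.title,
    'xmp:Subject': meta.subject,
    'xmp:Keywords': meta.keywords,
    'xmp-dc:format': 'application/pdf',
    'xmp-dc:title': meta.title,
  };
}

const tags = buildXmpTags(
  {
    author: 'The author',
    creator: 'The author',
    producer: 'The producer',
    title: 'Your PDF title',
    subject: 'Your PDF subject',
    keywords: ['keyword1', 'keyword2'],
  },
  new Date(Date.UTC(2024, 2, 5, 9, 7, 0))
);
console.log(tags['xmp:CreateDate']); // 2024-03-05T09:07:00Z
```

With exiftool-vendored installed, the object could then be passed in one call, for example await exiftool.write("your-pdf-document.pdf", tags), instead of reopening the file for every tag.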

Related

Problem to generate pdf from a blob in an expo app using FileSystem

I get a blob and I treat it like this:
const file = response.data;
var blob = new Blob([file], {
type: 'application/pdf',
});
const fileReaderInstance = new FileReader();
fileReaderInstance.readAsDataURL(blob);
fileReaderInstance.onload = async () => {
const fileUri = `${FileSystem.documentDirectory}file.pdf`;
await FileSystem.writeAsStringAsync(
fileUri,
fileReaderInstance.result.split(',')[1],
{
encoding: FileSystem.EncodingType.Base64,
}
);
console.log(fileUri);
Sharing.shareAsync(fileUri);
};
However, when I generate and share the file, I can't access it, and if I take its URI and search for it on the web, it returns:

I solved my problem this way.
This function gathers the other data needed for the request, calls SaleService.generatePDF(), reads the returned blob with a FileReader, writes the base64 payload (from fileReaderInstance.result) to a file, and shares that file with Sharing.shareAsync():
const generatePDF = async () => {
setAnimating(true);
const companyReponse = await CompanyService.getCompany();
const peopleResponse = await PeopleService.getPerson(sale.customerId);
const company = companyReponse.response.company;
const people = peopleResponse.response;
const quote = false;
const json = await SaleService.generatePDF({
sale,
company,
people,
quote,
});
if (json && json.success) {
try {
const fileReaderInstance = new FileReader();
fileReaderInstance.readAsDataURL(json.data);
fileReaderInstance.onload = async () => {
const base64data = fileReaderInstance.result.split(',');
const pdfBuffer = base64data[1];
const path = `${FileSystem.documentDirectory}/${sale._id}.pdf`;
await FileSystem.writeAsStringAsync(`${path}`, pdfBuffer, {
encoding: FileSystem.EncodingType.Base64,
});
await Sharing.shareAsync(path, { mimeType: 'application/pdf' });
};
} catch (error) {
Alert.alert('Erro ao gerar o PDF', error.message);
}
}
setAnimating(false);
}
This is the generatePDF function in SaleService, which uses axios to send the parameters to the API and returns the PDF as a blob:
generatePDF: async ({ sale, company, people, quote }) => {
const token = await AsyncStorage.getItem('token');
const body = { sale, company, people, quote };
try {
const response = await axios(`${BASE_API}/generate-sale-pdf`, {
method: 'POST',
headers: {
'Content-Type': 'application/json',
Authorization: token,
},
responseType: 'blob',
data: body,
});
return {
success: true,
data: response.data,
};
} catch (err) {
return err.error;
}
},
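
When the shared file cannot be opened, it can help to confirm that the base64 payload really is a PDF before writing it. A minimal sketch, written for Node.js so it can be run outside the app (Buffer is not available in React Native without a polyfill); dataUrlToBase64 and looksLikePdf are hypothetical helper names.

```javascript
// Hypothetical helper: extract the base64 payload from a data URL such as
// the one FileReader.readAsDataURL() produces.
function dataUrlToBase64(dataUrl) {
  const comma = dataUrl.indexOf(',');
  if (!dataUrl.startsWith('data:') || comma === -1) {
    throw new Error('Not a data URL');
  }
  return dataUrl.slice(comma + 1);
}

// Hypothetical helper: check that a base64 string decodes to bytes that
// start with the PDF signature "%PDF-".
function looksLikePdf(base64) {
  const bytes = Buffer.from(base64, 'base64');
  return bytes.subarray(0, 5).toString('latin1') === '%PDF-';
}

// Example with a tiny fake payload (only the header, not a full PDF).
const sample =
  'data:application/pdf;base64,' + Buffer.from('%PDF-1.7\n').toString('base64');
console.log(looksLikePdf(dataUrlToBase64(sample))); // true
```

If the check fails, the API response is probably not the raw PDF (for example a JSON error body), and the problem lies before the FileSystem.writeAsStringAsync call.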

I solved this problem by passing the blob string to the writeAsStringAsync method of Expo's FileSystem library.
const blobDat = data.data[0].data; //blob data coming from an API call
const fileUri = FileSystem.documentDirectory + `testt.pdf`; //Directory Link of the file to be saved
await FileSystem.writeAsStringAsync(fileUri, blobDat, {
encoding: FileSystem.EncodingType.UTF8,
}) //This step writes the blob string to the pdf fileURI
await IntentLauncher.startActivityAsync("android.intent.action.VIEW", {
data: fileUri,
flags: 1,
type: "application/pdf",
});
//prompts user with available application to open the above created pdf.

Next.js API: Watermark PDF using query parameters in URL

This is my first time testing out the Next.js API, and I am also quite new to the whole Next.js/React world, so bear with me.
The goal for this API route is to trigger an automatic download of a PDF with a custom watermark generated from API URL query parameters like this: /api/PDFWatermark?id=123&firstname=John&lastname=Doe
In order to create the watermark I am using pdf-lib and I am using a modified version of this code to watermark my PDF. To generate a modified and downloadable version of the original PDF pdfDoc I have tried to create a blob using the pdfBytes after the watermarking. After the blob is created I thought I could add it to an anchor attached to the DOM.
When commenting out the blob and anchor code, two errors occur:
ReferenceError: Blob is not defined
ReferenceError: document is not defined (possibly because there is no DOM to attach the anchor link to)
At this point I am only able to print the pdfBytes as json, I am not able to create and download the actual watermarked PDF file.
Is there a way to auto download the pdfBytes as a PDF file when the API is called?
UPDATE
Working code below after changing modifyPDF to return a buffer:
const pdfBytes = await pdfDoc.save();
return Buffer.from(pdfBytes); // copies the bytes of the Uint8Array into a Node.js Buffer
And:
export default async function handler(req, res) {
const filename = "test.pdf";
const {id, firstname, lastname} = req.query;
const pdfBuffer = await modifyPDF(firstname, lastname, id);
res.status(200);
res.setHeader('Content-Type', 'application/pdf'); // tell the browser the response is a PDF
res.setHeader('Content-Disposition', 'attachment; filename='+filename);
res.send(pdfBuffer);
}
WORKING:
import {PDFDocument, rgb, StandardFonts } from 'pdf-lib';
export async function modifyPDF(firstname, lastname, id) {
const order_id = id;
const fullname = firstname + " " + lastname;
const existingPdfBytes = await fetch("https://pdf-lib.js.org/assets/us_constitution.pdf").then((res) => res.arrayBuffer());
const pdfDoc = await PDFDocument.load(existingPdfBytes);
const helveticaFont = await pdfDoc.embedFont(StandardFonts.Helvetica);
const watermark = fullname + " (OrderID: " + id + ")";
// Set Document Metadata
pdfDoc.setSubject(watermark);
// Get pages
const pages = pdfDoc.getPages();
// Iterate every page, skip first
//pages.slice(1).forEach(page => {
pages.forEach(page => {
// Get the width and height of the page
const {
width,
height
} = page.getSize()
// Watermark the page
page.drawText(watermark, {
x: 70,
y: 8,
size: 10,
font: helveticaFont,
color: rgb(0.95, 0.1, 0.1),
})
})
const pdfBytes = await pdfDoc.save();
return Buffer.from(pdfBytes); // copies the bytes of the Uint8Array into a Node.js Buffer
}
export default async function handler(req, res) {
const filename = "test.pdf";
const {id, firstname, lastname} = req.query;
const pdfBuffer = await modifyPDF(firstname, lastname, id);
res.status(200);
res.setHeader('Content-Type', 'application/pdf'); // tell the browser the response is a PDF
res.setHeader('Content-Disposition', 'attachment; filename='+filename);
res.send(pdfBuffer);
}

Change your modifyPDF to return a buffer
[...]
const pdfBytes = await pdfDoc.save();
return Buffer.from(pdfBytes); // copies the bytes of the Uint8Array into a Node.js Buffer
[...]
Let the API return the PDF to the browser through the handler:
export default async function handler(req, res) {
const filename = "test.pdf"; // name suggested to the browser for the download
const {id, firstname, lastname} = req.query;
const pdfBuffer = await modifyPDF(firstname, lastname, id);
res.status(200);
res.setHeader('Content-Type', 'application/pdf');
res.setHeader('Content-Disposition', 'attachment; filename='+filename);
// Edited as the linked example reports:
// res.type('pdf'); // and might not work
res.send(pdfBuffer);
}
Untested but you should get the gist.
Here's the full example from the library itself
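
The header handling in the handler above can be isolated into a small function. This is only a sketch: sendPdf and createFakeResponse are hypothetical names, and res is assumed to offer setHeader() and end() as Node's http.ServerResponse does. It also adds a Content-Length header and quotes a sanitized filename, which the original handler does not do.

```javascript
// Hypothetical helper: send PDF bytes as a download.
// `res` is assumed to offer setHeader() and end(), as Node's
// http.ServerResponse does.
function sendPdf(res, pdfBytes, filename) {
  // Buffer.from(Uint8Array) copies exactly the bytes of the view.
  const body = Buffer.from(pdfBytes);
  // Keep only characters that are safe inside a quoted filename.
  const safeName = String(filename).replace(/[^\w.-]/g, '_');
  res.statusCode = 200;
  res.setHeader('Content-Type', 'application/pdf');
  res.setHeader('Content-Length', body.length);
  res.setHeader('Content-Disposition', `attachment; filename="${safeName}"`);
  res.end(body);
}

// Minimal stand-in for a response object, to try the helper without a server.
function createFakeResponse() {
  return {
    statusCode: 0,
    headers: {},
    body: null,
    setHeader(name, value) {
      this.headers[name] = value;
    },
    end(chunk) {
      this.body = chunk;
    },
  };
}

const fake = createFakeResponse();
sendPdf(fake, new Uint8Array([0x25, 0x50, 0x44, 0x46]), 'order 123.pdf');
console.log(fake.headers['Content-Disposition']);
// attachment; filename="order_123.pdf"
```

Inside the API route, the same function could be called as sendPdf(res, await pdfDoc.save(), filename), since save() returns a Uint8Array.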

Why when I upload file with apollo-server the file is uploaded but the file is 0kb?

I tried to solve the problem, but I don't understand why the file is uploaded with a size of 0 KB.
I took this code from a tutorial. It works there, but it does not work for me:
const { ApolloServer, gql } = require('apollo-server');
const path = require('path');
const fs = require('fs');
const typeDefs = gql`
type File {
url: String!
}
type Query {
hello: String!
}
type Mutation {
fileUpload(file: Upload!): File!
}
`;
const resolvers = {
Query: {
hello: () => 'Hello world!',
},
Mutation: {
fileUpload: async (_, { file }) => {
const { createReadStream, filename, mimetype, encoding } = await file;
const stream = createReadStream();
const pathName = path.join(__dirname, `/public/images/${filename}`);
await stream.pipe(fs.createWriteStream(pathName));
return {
url: `http://localhost:4000/images/${filename}`,
};
},
},
};
const server = new ApolloServer({
typeDefs,
resolvers,
});
server.listen().then(({ url }) => {
console.log(`🚀 Server ready at ${url}`);
});
When I upload a file, it is created on the server, but its size is 0 KB.

What is happening is the resolver is returning before the file has uploaded, causing the server to respond before the client has finished uploading. You need to promisify and await the file upload stream events in the resolver.
Here is an example:
https://github.com/jaydenseric/apollo-upload-examples/blob/c456f86b58ead10ea45137628f0a98951f63e239/api/server.js#L40-L41
In your case:
const resolvers = {
Query: {
hello: () => "Hello world!",
},
Mutation: {
fileUpload: async (_, { file }) => {
const { createReadStream, filename } = await file;
const stream = createReadStream();
// Named pathName so it does not shadow the path module; fs and path are the
// modules required at the top of the question's code.
const pathName = path.join(__dirname, `/public/images/${filename}`);
// Store the file in the filesystem.
await new Promise((resolve, reject) => {
// Create a stream to which the upload will be written.
const writeStream = fs.createWriteStream(pathName);
// When the upload is fully written, resolve the promise.
writeStream.on("finish", resolve);
// If there's an error writing the file, remove the partially written
// file and reject the promise.
writeStream.on("error", (error) => {
fs.unlink(pathName, () => {
reject(error);
});
});
// In Node.js <= v13, errors are not automatically propagated between
// piped streams. If there is an error receiving the upload, destroy the
// write stream with the corresponding error.
stream.on("error", (error) => writeStream.destroy(error));
// Pipe the upload into the write stream.
stream.pipe(writeStream);
});
return {
url: `http://localhost:4000/images/${filename}`,
};
},
},
};
Note that it’s probably not a good idea to use the filename like that to store the uploaded files, as future uploads with the same filename will overwrite earlier ones. I'm not really sure what will happen if two files with the same name are uploaded at the same time by two clients.
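
The same wait-for-finish logic can be written more compactly with pipeline() from stream/promises, which is available in recent Node.js versions. The sketch below is not the answer's code: saveUpload is a hypothetical helper, and it stores each file under a random UUID to avoid the filename collisions mentioned above.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');
const { randomUUID } = require('crypto');
const { Readable } = require('stream');
const { pipeline } = require('stream/promises');

// Hypothetical helper: write a readable stream to `dir` under a unique name
// and resolve with the final path once the data is fully written.
async function saveUpload(readable, dir, originalName) {
  // Keep the original extension but replace the name with a UUID so that
  // uploads with the same filename cannot overwrite each other.
  const target = path.join(dir, `${randomUUID()}${path.extname(originalName)}`);
  try {
    // pipeline() resolves after the write stream has finished and rejects
    // if either stream fails.
    await pipeline(readable, fs.createWriteStream(target));
  } catch (error) {
    // Remove the partially written file, ignoring a missing file.
    await fs.promises.rm(target, { force: true });
    throw error;
  }
  return target;
}

// Usage with an in-memory stream standing in for createReadStream().
const demo = saveUpload(
  Readable.from([Buffer.from('hello')]),
  os.tmpdir(),
  'photo.png'
).then((savedPath) => {
  console.log(path.extname(savedPath), fs.statSync(savedPath).size); // .png 5
  return savedPath;
});
```

In the resolver, the call would look like await saveUpload(createReadStream(), imagesDir, filename), where imagesDir is whatever directory the server exposes.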

Cypress - check if the file is downloaded

I have a little problem with trying to check if a file is downloaded.
Button click generates a PDF file and starts its download.
I need to check if it works.
Can Cypress do this?

I suggest looking at the HTTP response body.
You can capture the response with cy.server().route('GET', 'url').as('download') (check the Cypress documentation if you don't know these methods),
then wait for it and verify that the body is not empty:
cy.wait('@download')
.then((xhr) => {
assert.isNotNull(xhr.response.body, 'Body not empty')
})
Or if you have a popup announcing success when the download went successfully, you can as well verify the existence of the popup:
cy.get('...').find('.my-pop-up-success').should('be.visible')
EDIT
Please note cy.server().route() may be deprecated:
cy.server() and cy.route() are deprecated in Cypress 6.0.0. In a future release, support for cy.server() and cy.route() will be removed. Consider using cy.intercept() instead.
According to the migration guide, this is the equivalent: cy.intercept('GET', 'url').as('download')

cypress/plugins/index.js
const path = require('path');
const fs = require('fs');
const downloadDirectory = path.join(__dirname, '..', 'downloads');
const findPDF = (PDFfilename) => {
const PDFFileName = `${downloadDirectory}/${PDFfilename}`;
const contents = fs.existsSync(PDFFileName);
return contents;
};
const hasPDF = (PDFfilename, ms) => {
const delay = 10;
return new Promise((resolve, reject) => {
if (ms < 0) {
return reject(
new Error(`Could not find PDF ${downloadDirectory}/${PDFfilename}`)
);
}
const found = findPDF(PDFfilename);
if (found) {
return resolve(true);
}
setTimeout(() => {
hasPDF(PDFfilename, ms - delay).then(resolve, reject);
}, delay);
});
};
module.exports = (on, config) => {
require('@cypress/code-coverage/task')(on, config);
on('before:browser:launch', (browser, options) => {
if (browser.family === 'chromium') {
options.preferences.default['download'] = {
default_directory: downloadDirectory,
};
return options;
}
if (browser.family === 'firefox') {
options.preferences['browser.download.dir'] = downloadDirectory;
options.preferences['browser.download.folderList'] = 2;
options.preferences['browser.helperApps.neverAsk.saveToDisk'] =
'application/pdf'; // MIME type to save without prompting
return options;
}
});
on('task', {
isExistPDF(PDFfilename, ms = 4000) {
console.log(
`looking for PDF file in ${downloadDirectory}`,
PDFfilename,
ms
);
return hasPDF(PDFfilename, ms);
},
});
return config;
};
integration/pdfExport.spec.js
before('Clear downloads folder', () => {
cy.exec('rm cypress/downloads/*', { log: true, failOnNonZeroExit: false });
});
it('Should download my PDF file and verify it is present', () => {
cy.get('ExportPdfButton').click();
cy.task('isExistPDF', 'MyPDF.pdf').should('equal', true);
});
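
The polling idea behind hasPDF can also be written against a deadline, and can check that the file is non-empty, since a download in progress may exist with zero bytes. A minimal sketch using only Node.js built-ins; waitForFile is a hypothetical name and the intervals are arbitrary.

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Hypothetical helper: poll until `filePath` exists and is non-empty,
// or reject once `timeoutMs` has elapsed.
function waitForFile(filePath, timeoutMs = 4000, intervalMs = 50) {
  const deadline = Date.now() + timeoutMs;
  return new Promise((resolve, reject) => {
    const check = () => {
      // statSync throws ENOENT while the file does not exist yet.
      let size = 0;
      try {
        size = fs.statSync(filePath).size;
      } catch (error) {
        if (error.code !== 'ENOENT') return reject(error);
      }
      if (size > 0) return resolve(true);
      if (Date.now() >= deadline) {
        return reject(new Error(`File not found or empty: ${filePath}`));
      }
      setTimeout(check, intervalMs);
    };
    check();
  });
}

// Demo: the file appears 100 ms after polling starts.
const demoFile = path.join(os.tmpdir(), `download-demo-${process.pid}.pdf`);
fs.rmSync(demoFile, { force: true });
setTimeout(() => fs.writeFileSync(demoFile, '%PDF-1.7\n'), 100);
const found = waitForFile(demoFile, 2000);
found.then((ok) => console.log(ok)); // true
```

Because it resolves with true, the promise could be returned from a task such as isExistPDF in place of hasPDF.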

Timeout in Pdf-html when running on Google Cloud Function

We've created a Cloud Function that generates a PDF. The library that we're using is
https://www.npmjs.com/package/html-pdf
The problem is that when we call the .create() method, it times out with the following error:
Error: html-pdf: PDF generation timeout. Phantom.js script did not exit.
at Timeout.execTimeout (/srv/node_modules/html-pdf/lib/pdf.js:91:19)
at ontimeout (timers.js:498:11)
This works fine on localhost but happens when we deploy the function on GCP.
Some solutions we've already tried:
Solution #1
Yes we've updated the timeout settings to
const options = {
format: "A3",
orientation: "portrait",
timeout: "100000"
// zoomFactor: "0.5"
// orientation: "portrait"
};
and it still doesn't work.
Here's the final snippet that triggers the PDF function:
const options = {
format: "A3",
orientation: "portrait",
timeout: "100000"
// zoomFactor: "0.5"
// orientation: "portrait"
};
try {
// let pdfRes = await new Promise(async (resolve, reject) => {
console.log("Before pdf.create()")
let pdfResponse = await pdf.create(html, options).toFile(localPDFFile, async function (err, res) {
if (err) {
console.log(err)
}
console.log('response of pdf.create(): ', res);
let uploadBucket = await bucket.upload(localPDFFile, {
metadata: { contentType: "application/octet-stream" }
});
let docRef = await db
.collection("Organizations")
.doc(context.params.orgId)
.collection("regulations")
.doc(context.params.regulationId)
.collection("reports")
.doc(context.params.reportId);
await docRef.update({
pdf: {
status: "created",
reportName: pdfName
}
});
});
} catch (error) {
console.log('error: ', error);
}

I have seen many cases like this. In my current project we use step functions: when a Cloud Function needs more computational power, we divide the work into chunks, i.e. mini Cloud Functions.
But I think step functions will not work in your case either, because you are using a single module.
In your case you should use Compute Engine to perform this operation.

Using a promise, we can fix this timeout error:
var Handlebars = require('handlebars');
var pdf = require('html-pdf');
var options = {
height: "10.5in", // allowed units: mm, cm, in, px
width: "8in", // allowed units: mm, cm, in, px
timeout: 600000 // milliseconds
};
var document = {
html: html1,
path: resolvedPath + "/" + filename,
data: {}
};
var create = function(document, options) {
return new Promise((resolve, reject) => {
// Compiles a template
var html = Handlebars.compile(document.html)(document.data);
var pdfPromise = pdf.create(html, options);
// Create PDF from html template generated by handlebars
// Output will be PDF file
pdfPromise.toFile(document.path, (err, res) => {
if (!err)
resolve(res);
else
reject(err);
});
});
}

This can also be a problem with the HTML. In my case, an image source pointed to an image that had been deleted from the server, and that caused the timeout. I solved it by putting the image back at that path on the server. I hope this is useful to someone.
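
To look for that kind of broken reference before rendering, the template's image URLs can be listed and checked. A minimal sketch, assuming the HTML is available as a string; findImageSources and findRemoteImageSources are hypothetical helpers, and the regular expression is a quick check rather than a full HTML parser.

```javascript
// Hypothetical helper: list the src values of <img> tags in an HTML string.
function findImageSources(html) {
  const sources = [];
  const pattern = /<img\b[^>]*?\bsrc\s*=\s*(["'])(.*?)\1/gi;
  let match;
  while ((match = pattern.exec(html)) !== null) {
    sources.push(match[2]);
  }
  return sources;
}

// Hypothetical helper: keep only the sources that point to a remote server,
// since those depend on a file that may no longer exist there.
function findRemoteImageSources(html) {
  return findImageSources(html).filter((src) => /^https?:\/\//i.test(src));
}

const html =
  '<p>Report</p><img src="https://example.com/logo.png">' +
  "<img alt='x' src='data:image/png;base64,AAAA'>";
console.log(findRemoteImageSources(html)); // [ 'https://example.com/logo.png' ]
```

Each URL returned could then be requested before calling pdf.create(), so a missing image is reported as a clear error instead of a generation timeout.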
