I would like to set page breaks programmatically in my Google Spreadsheet before exporting it to PDF, using Apps Script.
It should be possible, since you can set the page breaks manually when you print the spreadsheet (https://support.google.com/docs/answer/7663148?hl=en).
I found that it is possible in Google Docs (https://developers.google.com/apps-script/reference/document/page-break), but the Sheets documentation doesn't mention it.
Is there a way to do it, even if it's a "hack"?
Speaking of "hacks", you can capture the HTTP request that the spreadsheet sends to Google when you save a sheet as PDF: open the browser's developer tools and watch the Network tab.
From that request you can get the formatting parameter pc, which in my case looks like this:
[null,null,null,null,null,null,null,null,null,0,
[["1990607563"]],
10000000,null,null,null,null,null,null,null,null,null,null,null,null,null,null,
43866.56179325232,
null,null,
[0,null,1,0,0,0,1,1,1,1,2,1,null,null,2,1],
["A4",0,6,1,[0.75,0.75,0.7,0.7]],
null,0,
[["1990607563",[[45,92],[139,139]],[[0,15]]]],0]
where:
[["1990607563",[[45,92],[139,139]],[[0,15]]]],0] // page breaks parameters
Note, though, that I used custom page breaks and landscape orientation, which are reflected in the parameter above.
Putting it all together, the following code does the trick:
function exportPDFtoGDrive(ssID, filename, source) {
  // ID taken from the captured request above; used unless the caller passes its own
  source = source || "1990607563";
  var dt = new Date();
  // encodeDate() is not part of this snippet; it has to return the date in the
  // same numeric form as the captured value (43866.56179325232)
  var d = encodeDate(dt.getFullYear(), dt.getMonth(), dt.getDate(), dt.getHours(), dt.getMinutes(), dt.getSeconds());
  var pc = [null,null,null,null,null,null,null,null,null,0,
    [[source]],
    10000000,null,null,null,null,null,null,null,null,null,null,null,null,null,null,
    d,
    null,null,
    [0,null,1,0,0,0,1,1,1,1,2,1,null,null,2,1],
    ["A4",0,6,1,[0.75,0.75,0.7,0.7]],
    null,0,
    [[source,[[45,92],[139,139]],[[0,15]]]],0]; // page breaks parameters
  var folder = DriveApp.getFoldersByName("FolderNameGoesHere").next();
  var options = {
    'method': 'post',
    'payload': "a=true&pc=" + JSON.stringify(pc) + "&gf=[]",
    'headers': {Authorization: "Bearer " + ScriptApp.getOAuthToken()},
    'muteHttpExceptions': true
  };
  var esid = Math.round(Math.random() * 10000000);
  var theBlob = UrlFetchApp.fetch("https://docs.google.com/spreadsheets/d/" + ssID + "/pdf?id=" + ssID + "&esid=" + esid, options).getBlob();
  folder.createFile(theBlob).setName(filename + ".pdf");
}
function myExportPDFtoGDrive() {
  var ss = SpreadsheetApp.openById('yourSpreadSheetID');
  var sheet = ss.getSheetByName("NameGoesHere");
  var filename = ss.getName() + " [" + sheet.getName() + "]";
  exportPDFtoGDrive(ss.getId(), filename);
}
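The export function calls encodeDate without defining it. The captured value 43866.56179325232 looks like a spreadsheet serial date (days since 1899-12-30, with the time of day as the fraction), so a minimal sketch under that assumption could be the following. Both the interpretation of the value and the choice to treat the components as wall-clock time without any time zone shift are assumptions, not something confirmed by the captured request.

```javascript
// Sketch of a possible encodeDate(): converts date components into a
// spreadsheet-style serial number (days since 1899-12-30).
// Assumption: month is zero-based, as returned by Date.prototype.getMonth().
function encodeDate(year, month, day, hours, minutes, seconds) {
  var MS_PER_DAY = 24 * 60 * 60 * 1000;
  // Date.UTC is used only to get a time-zone-independent day count
  var epoch = Date.UTC(1899, 11, 30);
  var moment = Date.UTC(year, month, day, hours, minutes, seconds);
  return (moment - epoch) / MS_PER_DAY;
}
```

For example, encodeDate(2020, 0, 1, 12, 0, 0) gives 43831.5, that is, noon on 1 January 2020.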
A more detailed explanation of the hack is available in "Export Google Sheets to PDF", though in Russian only.
I use a workaround: I adjust the page size by altering the row heights to fit the paper size I want (A4).
When exporting to PDF, Google scales the sheet to fit the width. I add up the widths of the columns and then set the row heights accordingly. The numbers were chosen by trial and error.
var width = 0;
for(var z = 0; z < s4.getLastColumn(); z++){
width += s4.getColumnWidth(z+1);
}
var a4PageHeightPixels = 1050 * width / 800;
Because I wanted all the rows to be the same height, I set the row height by dividing my page height by the number of rows. Having ensured the last row was blank, I adjusted the last row to take up the rounding error.
var rowHeight = Math.floor(a4PageHeightPixels / numDataRows);
var lastRowHeight = a4PageHeightPixels - (numDataRows - 1) * rowHeight;
s4.setRowHeights(pageFirstRow, numDataRows - 1, rowHeight);
s4.setRowHeight(pageFirstRow + numDataRows - 1, lastRowHeight);
(s4 is the sheet I am using.)
However, I would expect that most people would simply want to insert a blank row at the bottom of each page and adjust its height to fit the PDF paper size.
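The arithmetic above can be isolated into a small helper that is easy to check outside a spreadsheet. This is only a sketch: the name planRowHeights is hypothetical, the 1050/800 ratio is the trial-and-error value from the answer, and rounding the page height to a whole pixel is an added assumption (row heights are set in whole pixels).

```javascript
// Sketch: split one page's height across numDataRows rows so that the
// heights add up exactly to the page height (the last row absorbs the remainder).
function planRowHeights(totalColumnWidth, numDataRows) {
  // 1050 / 800 is the empirical A4 height-to-width ratio used in the answer
  var pageHeight = Math.round(1050 * totalColumnWidth / 800);
  var rowHeight = Math.floor(pageHeight / numDataRows);
  var lastRowHeight = pageHeight - (numDataRows - 1) * rowHeight;
  return { pageHeight: pageHeight, rowHeight: rowHeight, lastRowHeight: lastRowHeight };
}
```

With a total column width of 1000 pixels and 7 rows, this yields six rows of 187 pixels and a last row of 191 pixels, 1313 pixels in total.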
For years, I have been using Google Cloud Print (GCP) to print standardized labels in our laboratories on campus, using a custom HtmlService form in Google Apps Script (GAS).
Now that GCP is being deprecated, I am searching for a replacement. I have found a few options, but I am struggling to convert the file to PDF, as these other vendors require.
Currently, when you submit a text/html blob to the GCP servers from GAS, the backend converts the blob to application/pdf (as evidenced by the job details in the GCP panel in Chrome under 'content type').
Because these other cloud print services require PDF input, I have tried for some time to have GAS convert the file to PDF before sending it to GCP, and I always get a strange result. Below, I show some of the strategies I have used and include pictures of one of our simple labels generated with the different functions.
The following is the base code for the ticket and payload that has worked for years with GCP:
//BUILD PRINT JOB FOR NARROW TAPES
var ticket = {
version: "1.0",
print: {
color: {
type: "STANDARD_COLOR",
vendor_id: "Color"
},
duplex: {
type: "NO_DUPLEX"
},
copies: {copies: parseFloat(quantity)},
media_size: {
width_microns: 27940,
height_microns:40960
},
page_orientation: {
type: "LANDSCAPE"
},
margins: {
top_microns:0,
bottom_microns:0,
left_microns:0,
right_microns:0
},
page_range: {
interval:
[{start:1,
end:1}]
},
}
};
var payload = {
"printerid" : QL710,
"title" : "Blank Template Label",
"content" : HtmlService.createHtmlOutput(html).getBlob(),
"contentType": 'text/html',
"ticket" : JSON.stringify(ticket)
};
This generates the expected printout:
When I try to convert to PDF using the following code:
var blob = HtmlService.createTemplate(html).evaluate().getContent();
var newBlob = Utilities.newBlob(html, "text/html", "text.html");
var pdf = newBlob.getAs("application/pdf").setName('tempfile');
var file = DriveApp.getFolderById("FOLDER ID").createFile(pdf);
var payload = {
"printerid" : QL710,
"title" : "Blank Template Label",
"content" : pdf,//HtmlService.createHtmlOutput(html).getBlob(),
"contentType": 'text/html',
"ticket" : JSON.stringify(ticket)
};
an unexpected result occurs:
The result is the same when the conversion is written directly in the 'content' field, with or without .getBlob():
"content" : HtmlService.createHtmlOutput(html).getAs('application/pdf'),
Note the createFile line in the code above, which I used to test the PDF. That file is created as expected, but of course with the wrong dimensions for label printing (I am not sure how to convert to PDF with the appropriate margins and page size). See below.
I have now tried to adopt Yuri's ideas; however, the conversion from HTML to a document loses formatting.
var blob = HtmlService.createHtmlOutput(html).getBlob();
var docID = Drive.Files.insert({title: 'temp-label'}, blob, {convert: true}).id
var file = DocumentApp.openById(docID);
file.getBody().setMarginBottom(0).setMarginLeft(0).setMarginRight(0).setMarginTop(0).setPageHeight(79.2).setPageWidth(172.8);
This produces a document that looks like this (the picture also shows the expected output in my hand).
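One detail that is easy to trip over here: the GCP ticket expresses the media size in microns, while DocumentApp page sizes and margins are given in points (1/72 inch). A small conversion sketch, with hypothetical helper names, shows how the two relate:

```javascript
// Sketch: convert between the microns used in the print ticket and the
// points used by DocumentApp page settings. Helper names are hypothetical.
var MICRONS_PER_INCH = 25400;
var POINTS_PER_INCH = 72;

function micronsToPoints(microns) {
  return microns / MICRONS_PER_INCH * POINTS_PER_INCH;
}

function pointsToMicrons(points) {
  // tickets use whole microns, so round to the nearest integer
  return Math.round(points / POINTS_PER_INCH * MICRONS_PER_INCH);
}
```

For instance, micronsToPoints(27940) is approximately 79.2, which matches the value passed to setPageHeight above.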
Does anyone have insights into:
How to format the converted PDF so that it has the appropriate height, width and margins.
How to convert to PDF in a way that prints correctly.
Here is minimal code to give a better sense of the context: https://script.google.com/d/1yP3Jyr_r_FIlt6_aGj_zIf7HnVGEOPBKI0MpjEGHRFAWztGzcWKCJrD0/edit?usp=sharing
I've made a template (80 x 40 mm; sorry, I don't know your size):
https://docs.google.com/document/d/1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY/edit?usp=sharing
And here is the script:
function myFunction() {
// input data
var matName = '<b>testing this to <u>see</u></b> if it <i>actually</i> works <i>e.coli</i>'
var disposeWeek = 'end of semester'
var prepper = 'John Ruppert';
var className = 'Cell and <b>Molecular</b> Biology <u>Fall 2020</u> a few exercises a few exercises a few exercises a few exercises';
var hazards = 'Lots of hazards';
// make a temporary Doc from the template
var copyFile = DriveApp.getFileById('1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY').makeCopy();
var doc = DocumentApp.openById(copyFile.getId());
var body = doc.getBody();
// replace placeholders with data
body.replaceText('{matName}', matName);
body.replaceText('{disposeWeek}', disposeWeek);
body.replaceText('{prepper}', prepper);
body.replaceText('{className}', className);
body.replaceText('{hazards}', hazards);
// make Italics, Bold and Underline
handle_tags(['<i>', '</i>'], body);
handle_tags(['<b>', '</b>'], body);
handle_tags(['<u>', '</u>'], body);
// save the temporary Doc
doc.saveAndClose();
// make a PDF
var docblob = doc.getBlob().setName('Label.pdf');
DriveApp.createFile(docblob);
// delete the temporary Doc
copyFile.setTrashed(true);
}
// this function applies formatting to text inside the tags
function handle_tags(tags, body) {
var start_tag = tags[0].toLowerCase();
var end_tag = tags[1].toLowerCase();
var found = body.findText(start_tag);
while (found) {
var elem = found.getElement();
var start = found.getEndOffsetInclusive();
var end = body.findText(end_tag, found).getStartOffset()-1;
switch (start_tag) {
case '<b>': elem.setBold(start, end, true); break;
case '<i>': elem.setItalic(start, end, true); break;
case '<u>': elem.setUnderline(start, end, true); break;
}
found = body.findText(start_tag, found);
}
body.replaceText(start_tag, ''); // remove tags
body.replaceText(end_tag, '');
}
The script just replaces the {placeholders} with the data and saves the result as a PDF file (Label.pdf). The PDF looks like this:
There is one thing I'm not sure is possible: changing the size of the texts dynamically to fit them into the cells, as is done in your 'autosize.html'. Roughly, you can take the length of the text in a cell and, if it is bigger than some number, make the font size a bit smaller. You could probably use the jQuery textfill function from 'autosize.html' to get an optimal size and apply that size in the document.
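The length-based idea can be sketched as a small helper. The name pickFontSize and the parameters maxChars and minSize are hypothetical, and the proportional scaling is just one plausible rule; the thresholds would have to be tuned by trial and error for each cell.

```javascript
// Sketch: choose a font size from the visible text length.
// Formatting tags (<b>, <i>, <u>) are ignored when measuring, since they are
// removed from the document later.
function pickFontSize(text, baseSize, maxChars, minSize) {
  var visible = text.replace(/<\/?[biu]>/g, '');
  if (visible.length <= maxChars) {
    return baseSize;
  }
  // shrink proportionally to the overflow, rounded down to half a point
  var scaled = baseSize * maxChars / visible.length;
  return Math.max(minSize, Math.floor(scaled * 2) / 2);
}
```

The returned size could then be applied with setFontSize on the text element of the corresponding cell.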
I'm not sure if I understood you correctly. Do you need to make a PDF and save it on Google Drive? You can do that in Google Docs.
For example:
Make a new document with your table and text. Something like this:
Add this script to your doc:
function myFunction() {
var copyFile = DriveApp.getFileById(ID).makeCopy();
var newFile = DriveApp.createFile(copyFile.getAs('application/pdf'));
newFile.setName('label');
copyFile.setTrashed(true);
}
Every time you run this script, it creates the file 'label.pdf' on your Google Drive.
The size of this PDF will be the same as the page size of your Doc. You can set any page size with the Page Sizer add-on: https://webapps.stackexchange.com/questions/129617/how-to-change-the-size-of-paper-in-google-docs-to-custom-size
If you need to change the text in your label before generating the PDF, and/or change the name of the generated file, you can do that via script as well.
Here is a variant of the script that changes the font size in one of the cells if the label doesn't fit on one page.
function main() {
// input texts
var text = {};
text.matName = '<b>testing this to <u>see</u></b> if it <i>actually</i> works <i>e.coli</i>';
text.disposeWeek = 'end of semester';
text.prepper = 'John Ruppert';
text.className = 'Cell and <b>Molecular</b> Biology <u>Fall 2020</u> a few exercises a few exercises a few exercises a few exercises';
text.hazards = 'Lots of hazards';
// initial max font size for the 'matName'
var size = 10;
var doc_blob = set_text(text, size);
// if we got more than 1 page, reduce the font size and repeat
while ((size > 4) && (getNumPages(doc_blob) > 1)) {
size = size-0.5;
doc_blob = set_text(text, size);
}
// save pdf
DriveApp.createFile(doc_blob);
}
// this function takes the texts and a font size and puts the texts into the fields
function set_text(text, size) {
// make a copy
var copyFile = DriveApp.getFileById('1vA93FxGXcWLIEZBuQwec0n23cWGddyLoey-h0WR9weY').makeCopy();
var doc = DocumentApp.openById(copyFile.getId());
var body = doc.getBody();
// replace placeholders with data
body.replaceText('{matName}', text.matName);
body.replaceText('{disposeWeek}', text.disposeWeek);
body.replaceText('{prepper}', text.prepper);
body.replaceText('{className}', text.className);
body.replaceText('{hazards}', text.hazards);
// set font size for 'matName'
body.findText(text.matName).getElement().asText().setFontSize(size);
// make Italics, Bold and Underline
handle_tags(['<i>', '</i>'], body);
handle_tags(['<b>', '</b>'], body);
handle_tags(['<u>', '</u>'], body);
// save the doc
doc.saveAndClose();
// delete the copy
copyFile.setTrashed(true);
// return blob
return doc.getBlob().setName('Label.pdf');
}
// this function formats the text between html tags
function handle_tags(tags, body) {
var start_tag = tags[0].toLowerCase();
var end_tag = tags[1].toLowerCase();
var found = body.findText(start_tag);
while (found) {
var elem = found.getElement();
var start = found.getEndOffsetInclusive();
var end = body.findText(end_tag, found).getStartOffset()-1;
switch (start_tag) {
case '<b>': elem.setBold(start, end, true); break;
case '<i>': elem.setItalic(start, end, true); break;
case '<u>': elem.setUnderline(start, end, true); break;
}
found = body.findText(start_tag, found);
}
body.replaceText(start_tag, '');
body.replaceText(end_tag, '');
}
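// this function takes the blob of the saved doc and returns its number of pages
function getNumPages(doc) {
  var blob = doc.getAs('application/pdf');
  var data = blob.getDataAsString();
  // relies on an "/N <count>" entry being present in the PDF; fail loudly if it is not
  var match = data.match(/ \/N (\d+) /);
  if (!match) {
    throw new Error('Could not find a page count in the PDF');
  }
  var pages = parseInt(match[1], 10);
  Logger.log("pages = " + pages);
  return pages;
}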
It looks rather awful and hopeless. It turns out that Google Docs has no page counter. You need to convert your document into a PDF and count the pages of the PDF file. Gross!
The next problem: even if you manage to count the pages somehow, you have no clue which of the cells overflowed. This script takes just one cell, changes its font size, counts pages, changes the font size again, and so on. But success is not guaranteed, because there can be another cell with long text inside. You could reduce the font size of all the texts, but that doesn't look like a great idea either.
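The page-counting step can also be approached by counting page objects in the PDF text instead of looking for a single "/N" entry. The sketch below is a heuristic only: the name countPdfPages is hypothetical, and it assumes the page objects are stored uncompressed in the file, which is not guaranteed for every PDF producer.

```javascript
// Heuristic sketch: count "/Type /Page" entries in the raw PDF text.
// The negative lookahead skips "/Type /Pages" (the page tree node).
// Returns 0 when nothing is found, e.g. if the objects are compressed.
function countPdfPages(pdfText) {
  var matches = pdfText.match(/\/Type\s*\/Page(?![A-Za-z])/g);
  return matches ? matches.length : 0;
}
```

In Apps Script the input would be the string returned by getDataAsString() on the PDF blob.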
Is there a Java or Node.js library that can move existing text in a PDF file?
I'd like to extract all the text nodes, then move some of them to a new location based on some conditions.
I tried PdfClown, galkahana/HummusJS and Hopding/pdf-lib, but it seems they don't have exactly what I need.
Can anyone help? Thanks.
After inspecting the variables, I figured out how to move text. Here is the code:
PrimitiveComposer composer = new PrimitiveComposer(page);
ContentScanner scanner = composer.getScanner();
tranverse(scanner);
composer.flush();
...
while (level.moveNext()){
ContentObject content = level.getCurrent();
if (content instanceof Text){
...
List<ContentObject> objects = text.getBaseDataObject().getObjects();
for(ContentObject co: objects){
if(co instanceof SetTextMatrix){
List<PdfDirectObject> operands = ((SetTextMatrix)co).getOperands();
PdfInteger y = (PdfInteger)operands.get(5);
operands.set(5, new PdfInteger(y.getIntValue()-100));
}
}
}
}
How do I get the font from a COSName?
The solution I'm looking for looks something like this:
COSDictionary dict = new COSDictionary();
dict.add(fontname, something); // fontname COSName from below code
PDFontFactory.createFont(dict);
If you need more background, here is the whole story:
I am trying to replace some strings in a PDF. This succeeds (as long as all the text is stored in one token). To keep the layout, I would like to re-center the text. As far as I understand, I can do this by getting the widths of the old string and the new one, doing a trivial calculation and setting the new position.
I found some inspiration on Stack Overflow: for the replacing, https://stackoverflow.com/a/36404377 (yes, it has some issues, but it works for my simple PDFs), and for the centering, "How to center a text using PDFBox". Unfortunately that example uses a font constant.
Using the first link's code, I get one handler for the operator 'TJ' and one for 'Tj'.
PDFStreamParser parser = new PDFStreamParser(page);
parser.parse();
java.util.List<Object> tokens = parser.getTokens();
for (int j = 0; j < tokens.size(); j++)
{
Object next = tokens.get(j);
if (next instanceof Operator)
{
Operator op = (Operator) next;
// Tj and TJ are the two operators that display strings in a PDF
if (op.getName().equals("Tj"))
{
// Tj takes one operand and that is the string to display so lets
// update that operand
COSString previous = (COSString) tokens.get(j - 1);
String string = previous.getString();
String replaced = prh.getReplacement(string);
if (!string.equals(replaced))
{ // if changes are there, replace the content
previous.setValue(replaced.getBytes());
float xpos = getPosX(tokens, j);
//if (true) // center the text
if (6 * xpos > page.getMediaBox().getWidth()) // check if text starts right from 1/xth page width
{
float fontsize = getFontSize(tokens, j);
COSName fontname = getFontName(tokens, j);
// TODO
PDFont font = ?getFont?(fontname);
// TODO
float widthnew = getStringWidth(replaced, font, fontsize);
setPosX(tokens, j, page.getMediaBox().getWidth() / 2F - (widthnew / 2F));
}
replaceCount++;
}
}
As for the code between the TODO comments: I get the required values from the token list. (Yes, this code is awful, but for now it lets me concentrate on the main issue.)
Having the string, the size and the font, I should be able to call the getWidth(..) method from the sample code.
Unfortunately I ran into trouble creating a font from the COSName variable.
PDFont doesn't provide a method to create a font by name.
PDFontFactory looks promising, but it requires a COSDictionary. This is the point where I gave up and am asking for your help.
The names are associated with font objects in the page resources.
Assuming you use PDFBox 2.0.x and that page is a PDPage instance, you can resolve the name fontname using:
PDFont font = page.getResources().getFont(fontname);
But the warning from the comments on the question you reference remains: this approach will work only for very simple PDFs and might even damage other ones.
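Once the font is resolved, the re-centering arithmetic described in the question is short. PDFBox's PDFont.getStringWidth returns the width in 1/1000 of a text space unit, so it has to be scaled by the font size before it is compared with the page width. The arithmetic is language-independent; a sketch in JavaScript, with hypothetical function names:

```javascript
// Sketch of the centering calculation.
// glyphWidthSum is the value a call like font.getStringWidth(text) would
// return, expressed in 1/1000 of a text space unit.
function scaledTextWidth(glyphWidthSum, fontSize) {
  return glyphWidthSum / 1000 * fontSize;
}

// x position at which text of the given width is centered on the page
function centeredX(pageWidth, textWidth) {
  return pageWidth / 2 - textWidth / 2;
}
```

For a string whose glyph widths sum to 5000 at font size 12, the width is 60 units, and on a page 595 units wide the text starts at x = 267.5.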
try {
    // Load an existing document
    File file = new File("UKRSICH_Mo6i-Spikyer_z1560-FAV.pdf");
    PDDocument document = PDDocument.load(file);
    PDPage page = document.getPage(0);
    PDResources pageResources = page.getResources();
    System.out.println(pageResources.getFontNames());
    // Resolve each font name registered in the page resources
    for (COSName key : pageResources.getFontNames())
    {
        PDFont font = pageResources.getFont(key);
        System.out.println("Font: " + font.getName());
    }
    document.close();
} catch (IOException e) {
    e.printStackTrace();
}
I am writing a VB.Net application. I need to be able to convert either a Word or a PDF file to TIF format.
Free would be nice, but I will also accept low cost.
I would like sample code if possible. VB is preferable, but I also know C#.
It's very simple with ImageMagick (you do have to download Ghostscript, too). You just need to use VB to run it as a process.
Dim imgmgk As New Process()
With imgmgk.StartInfo
.FileName = v_locationOfImageMagickConvert ' full path to ImageMagick's convert.exe
.UseShellExecute = False
.CreateNoWindow = True
.RedirectStandardOutput = True
.RedirectStandardError = True
.RedirectStandardInput = False
.Arguments = " -units PixelsPerInch " & v_pdf_filename & " -depth 16 -flatten +matte -monochrome -density 288 -compress ZIP " & v_tiff_filename
End With
imgmgk.Start()
Dim output As String = imgmgk.StandardOutput.ReadToEnd()
Dim errorMsg As String = imgmgk.StandardError.ReadToEnd()
imgmgk.WaitForExit()
imgmgk.Close()
The arguments vary; see the ImageMagick docs for what each one does. For a simple conversion, you can just pass the PDF file name and the TIFF file name.
You can do this with Atalasoft DotImage (commercial product; obligatory disclaimer: I work for Atalasoft and write a lot of the PDF code):
// this code extracts images from the PDF - there may be multiple images per page
public void PdfToTiff(Stream pdf, Stream tiff)
{
TiffEncoder encoder = new TiffEncoder(tiff);
PdfImageSource images = new PdfImageSource(pdf);
encoder.Save(tiff, images, null);
}
// this code renders each page
public void PdfToTiff(string pathToPdf, Stream tiff)
{
TiffEncoder encoder = new TiffEncoder(tiff);
FileSystemImageSource images = new FileSystemImageSource(pathToPdf, true);
encoder.Save(tiff, images, null);
}
The latter example is probably what you want. It works on a path because FileSystemImageSource takes advantage of code that operates on the file system with wildcards. It's overkill for the task, really. If you wanted to do it without FileSystemImageSource, you would have this:
public void PdfToTiff(Stream pdf, Stream tiff)
{
TiffEncoder encoder = new TiffEncoder();
encoder.Append = true;
PdfDecoder decoder = new PdfDecoder();
int pageCount = decoder.GetFrameCount(pdf);
for (int i=0; i < pageCount; i++) {
using (AtalaImage image = decoder.Read(pdf, i, null)) {
encoder.Save(tiff, image, null);
}
}
}
I found many products that can do this. Some cost money, while others are free.
I tried Black Ice (http://www.blackice.com/Printer%20Drivers/Tiff%20Printer%20Drivers.htm). It is affordable at about $40, but I didn't buy it.
I ended up using a free one called MyMorph.