I received a large number of document files, where each document has its own split archive for each page (i.e. file1.001,file1.002,file2.001,file3.001). These are meant to be TIF files that can easily be combined and converted into PDF documents.
However, some of these files will not convert through imagemagick. Some can simply be converted using a different program, which works fine. There are some files where this doesn't work. I tried converting them to .jpg, then to tif, but they won't convert to .jpg. Things got weird when I converted them to .png, as some of these files would have multiple output files associated with them.
This is hard to explain, but I'll try and give an example; file1.001 and file1.002 both have the same image present on them when converted to tif and opened. However, when either of the tif documents is converted to a .png, two .png files are created. One has the original page, but the other one has a second page of the document that I could not view previously.
What could be causing this weird behavior, and how can I convert these to pdf more reliably?
I also used BlueBeam Staple to convert the files, if that helps at all.
Edit:
I've verified I'm on the latest imagemagick release, and I've been using it through PHP to process files. I'm running Windows 10.
Also, here's some example files to play around with. The first TIF actually shows the second page, instead of the page I normally see when I open the file.
Edit 2: Sorry, I thought uploading the image would preserve the file type. Here's a link to some test samples
When I convert your tiff to png, I get two files using IM 7.1.0-10 Q16-HDRI or IM 6.9.12-25 Q16 both on Mac OSX Sierra.
magick -quiet 294944.tif x.png
Produces:
and
Is this not what you get or expect?
P.S.
What are the other two files: 327924.001 327924.002
If those are some kind of split tiff, then it does not look like libtiff, which Imagemagick uses to read TIFFs can handle them. I get errors when attempting to use identify on them.
You definitely have some issue with whatever attempted to write those tiffs.
instrument 294944 page 1 of 2 = G4 199 dpi sheet 2 of 2 294944.tif (25.17 x 17.53 inches)
instrument 294944 page 2 of 2 = G4 199 dpi sheet 1 of 2 294944.tif (24.12 x 17.63 inches)
instrument 327501 page 1 of 1 = UN 72 dpi sheet 1 of 1 327924.001 (124.78 x 93.86 inches)
instrument 327924 page 1 of 2 = G4 400 dpi sheet 1 of 2 327924.002 (23.80 x 17.53 inches)
instrument 327924 page 2 of 2 = G4 400 dpi sheet 2 of 2 327924.002 (23.84 x 17.41 inches)
Two are identified as CCITT Group 4 Fax Encoding which is common for TIFFs of this type.
Tiff is a multi image format so a multipage FAX can be viewed as one file or 4 different printing CMYK colour plates could be sent as one image file for either overlay as one check print or printed one at a time for quality inking.
The file name Tif (or tiff) is usually applied to files with one or more pages (even 400+ for a long novel)
The extension part001.tif part002.tif is usually applied to groups of multiple pages OR for single sequential pages part1.001.tif part1.002.tif
Unfortunately for you you have a mix following a convention that seems to indicate number of pages 002 = 2 pages, but in inconsistent order, so need to check which were used for each file, as there is uncertainty.
Also the internal number does NOT always reflect the filename? perhaps transfer of interest ?
IN ADDITION you have a mix of compression methods and resolution thus cannot be sure of correct scale to be applied.
The best way to resolve this issue is decide how you wish them to be regrouped/sequenced and use the correct scale for each page or group of pages then recombine as desired into PDF.
It would help for a large number to tabulate the pages by number scale size compression etc and then process in identical groups before reorder and merge.
I am using a shell script to modify many pdfs and would like to create a script that adds the page number (1 of X format) to the bottom of PDFs in a directory along with the text of the filename.
I tried using pdfjam with this format:
pdfjam --pagenumbering true
but it fails saying undefine pagenumbering
Any other recommendations how to do this? I am OK installing other tools but would like this to all be within a shell script.
Thank you
tl;dr: pdfjam --pagecommand '' input.pdf
By default, pdfjam adds the following LaTeX command to every page: \thispagestyle{empty}. By changing the command to an empty command, the default plain page style is used, which consists of a page number at the bottom. Of course you may want to play with other styles or layout options to position the page number differently.
Has anyone put together a plugin or tool for exporting a Tiddlywiki to pdf?
No, there isn't.
As a workaround, I write or find a decent printable stylesheet, then print to PDF.
Why not select the target tiddler to "Open in new window", and print it to PDF with any installed PDF printer?
To accomplish this I used a tool to convert HTML to PDF. These steps are a bit long but well worth it. Once you've got it working it is easily repeated.
In each tiddler that I want in my PDF, I mark with a specific tag; I used TableOfContents.
In each tiddler that is marked with this tag, I added an order field--to be used to define the order of tiddlers to appear in the PDF.
Ensure your HTML headers are properly defined for the document. I think tiddler titles use <h2>, so properly defining subheadings using <h3><h4> etc will ensure, if you want, a nice auto-generated Table of Contents in your PDF.
If you want each tiddler to start on a new page (in the PDF), we need to add this HTML to the end of each tiddler:
<div style = "display:block; clear:both; page-break-after:always;"></div>
With a completed TiddlyWiki document export the tiddlers to a single HTML file--this will be used to generate a PDF document. To export, go to the AdvancedSearch, select the Filter tab. In the search textbox enter your filter criteria--for me that was:
[tag[TableOfContents]sort[order]]
You'll see, immediately, on-screen a list of the tiddlers the system found based on that criteria. Then click on the Export icon and select Static HTML.
Optionally, but I think it's a great idea, manually create a cover page (in your favorite editor)--this will be a single HTML file to act as the cover page in the PDF document; call it cover.html. More on this later.
Download and install wkhtmltopdf (command-line tool to generate PDF from an HTML file).
https://wkhtmltopdf.org/downloads.html
Learn and get familiar with the wkhtmltopdf command line syntax. There are numerous features here so the command you end up with maybe lengthy. Use wkhtmltopdf /? to view general help, then wkhtmltopdf --extended-help to view details (well worth the read).
Generate a PDF document. At the command prompt navigate to the folder where your TiddlyWiki document is located. Here is a list of my favorite command-line switches. My app is installed in C:\Program Files..., so my command line starts with that...
"c:\Program Files\wkhtmltopdf\bin\wkhtmltopdf.exe"
Add this switch for a header on the left:
--header-left "My document title"
For a header on the right:
--header-right "v1.0.0.1"
Font size of header:
--header-font-size 8
Display a line below the header:
--header-line
Spacing between header and content in mm (default 0):
--header-spacing 5
A left-footer ([section] is replaced with the name of the current section:
--footer-left "[section]"
A centered footer:
--footer-center "Page [page] of [topage]"
Footer font size:
--footer-font-size 8
Footer spacing:
--footer-spacing 5
If you want titles to hyperlink (in the PDF) to go back to the TOC:
--enable-toc-back-links
Make sure no background images get printed:
--no-background
I added special styles in the TiddlyWiki document for print media--to hide tags and clean up the spacing. Then I used this switch to ensure print media is used:
--print-media-type
Being in North America I want letter-size pages; I think the default is A4:
-s Letter
IMPORTANT--give the tool access to local files, otherwise your images will be missing in the PDF:
--enable-local-file-access
Use this if you want to have a cover page (see step 6 above):
cover "cover.htm"
And use this if you want a TOC automatically generated. Without a cover page, the TOC will be your first page, so create a cover page:
toc
After the toc identify your exported tiddler HTML file as input to the tool:
tiddlers.html
And, the final argument on the command line is the output PDF file name:
MyDocument.pdf
Export the tid to html.
Then in the terminal, issue:
html2pdf $myTid.html $myTid.pdf
$myTid is only a var and can be any name
:)
The cfpdf tag has lots of options but I can't seem to find one for splitting apart a PDF package into separate files which can be saved to the file system.
Is this possible?
There's not a direct command, but you can achieve what you want to do in very few lines of code by using action="merge", with the "pages" attribute. So if you wanted to take a 20-page PDF and create 20 separate files, you could use getInfo to get the number of pages in the input document, then loop from 1 to that number, and in that loop, do a merge from your input document to a new output document for each iteration, with pages="#currentPage#" (or whatever your loop counter is)
I often get a PDF from our designer (built in Adobe InDesign) which is supposed to be sent out to thousands of people.
I've got the list with all the people, and it's easy doing a mail merge in OpenOffice.org. However, OpenOffice.org doesn't support the advanced PDF. I just want to output some text onto each page and print it out.
Here's how I do it now: print out 6.000 copies of the PDF, then put all of them into the printer again and just print out name, address and other information on top of it. But that's expensive.
Sadly, I can't make the PDF to an image and use that in OpenOffice.org because it grinds the computer to a halt. It also takes extremely long time to send this job to the printer.
So, is there an easy way to do this mail merge (preferably in Python) without paying for third party closed solutions?
Now I've made an account. I fixed it by using the ingenious pdftk.
In my quest I totally overlook the feature "background" and "overlay". My solution was this:
pdftk names.pdf background boat_background.pdf output out.pdf
Creating the names.pdf you can easily do with Python reportlab or similar PDF-creation scripts. It's best using code to do that, creating 6k pages took several hours in LibreOffice/OpenOffice, while it took just a few seconds using Python.
You could probably look at a PDF library like iText. If you have some programming knowledge and a bit of time you could write some code that adds the contact information to the PDFs
There are two much simpler and cheaper solutions.
First, you can do your mail merge directly in InDesign using DataMerge. This is a utility added to InDesign way back in CS. You export or save your names in CSV format. Import the data into an InDesign template and then drop in your name, address and such fields in the layout. Press Go. It will create a new document with all the finished letters or you can go right to the printer.
OR, you can export your data to an XML file and create a dynamic layout using XML placeholders in InDesign.
The book A Designer's Guide to Adobe InDesign and XML will teach you how to do this, or you can check out the Lynda.com videos for Dynamic workflows with InDesign and XML.
Very easy to do.
If you want to create separate PDFs files for the mail merge, you can run out one long PDF with all the names in one file then do an Extract to Separate PDF files in Acrobat Pro itself.
If you cannot get the template in another format than PDF a simple ad-hoc solution would be to
convert the PDF into an image
put the image in the backgroud of your (OpenOffice.org) document
position mail merge fields on top of the image
do the mail merge and print
Probably the best way would be to generate another PDF with the missing text, and overlay one PDF over the other. A quick Google found this link showing how to do it in Acrobat, and I'm sure there are other methods as well.
http://forums.macrumors.com/showthread.php?t=508226
For a no-mess, no-fuss solution, use iText to simply add the text to the pdf. For example, you can do the following to add text to a pdf document once loaded:
PdfContentByte cb= ...;
cb.BeginText();
cb.SetFontAndSize(font, fontSize);
float x = ...;
float y = ...;
cb.SetTextMatrix(x, y);
cb.ShowText(fieldValue);
cb.EndText();
From there on, save it as a different file, and print it.
However, I've found that form fields are the way to go with pdf document generation from templates.
If you have a template with form fields (added with Adobe Acrobat), you have one of two choices :
Create a FDF file, which is essentially a list of values for the fields on the form. A FDF is a simple text document which references the original document so that when you open up the PDF, the document loads with the field values supplied by the FDF.
Alternatively, load the template with with a library like iText / iTextSharp, fill the form fields manually, and save it as a seperate pdf.
A sample FDF file looks like this (stolen from Planet PDF) :
%FDF-1.2
%âãÏÓ
1 0 obj
<<<
/F(Example PDF Form.pdf)
/Fields[
<<
/T(myTextField)
/V(myTextField default value)
>>
]
>>
>> endobj trailer
<>
%%EOF
Because of the simple format and the small size of the FDF, this is the preferred approach, and the approach should work well in any language.
As for filling the fields programmatically, you can use iText in the following way :
PdfAcroForm acroForm = writer.AcroForm;
acroForm.Put(new PdfName(fieldInfo.Name), new PdfString(fieldInfo.Value));
What about using a variable data program such as - XMPie for Adobe Indesign. It's a plug-in that should reference to your list of people (think it might have to be a list in Excel though).
One easy way would be to create a fillable pdf form from the original document in Acrobat and do a mail merge with the form and a csv.
PDF mail merges are relatively easy to do in python and pdftk. Fdfgen (pip install fdfgen) is a python library that will create an fdf from a python array, so you can save the excel grid to a csv, make sure that the csv headers match the name of the pdf form field you want to fill with that column, and do something like
import csv
import subprocess
from fdfgen import forge_fdf
PDF_FORM = 'path/to/form.pdf'
CSV_DATA = 'path/to/data.csv'
infile = open(CSV_DATA, 'rb')
reader = csv.DictReader(infile)
rows = [row for row in reader]
infile.close()
for row in rows:
# Create fdf
filename = row['filename'] # Construct filename
fdf_data = [(k,v) for k, v in row.items()]
fdf = forge_fdf(fdf_data_strings=fdf_data)
fdf_file = open(filename+'.fdf', 'wb')
fdf_file.write(fdf)
fdf_file.close()
# Use PDFTK to create filled, flattened, pdf file
cmds = ['pdftk', PDF_FORM, 'fill_form', filename+'.fdf',
'output', filename+'.pdf', 'flatten', 'dont_ask']
process = subprocess.Popen(cmds, stdout=subprocess.PIPE)
stdout, stderr = process.communicate()
returncode = process.poll()
os.remove(filename+'.fdf')
I've encountered this problem enough to write my own free solution, PdfZero. PdfZero has a mail merge feature to merge spreadsheets with PDF forms. You will still need to create a PDF form, but you can upload the form and csv to pdfzero, select which form fields you want filled with which columns, create a naming convention for each filled pdf using the csv data if needed, and batch generate the filled PDfs.
DISCLAIMER: I wrote PdfZero
Someone asked for specifics. I didn't want to sully my top answer with it, because you can do it how you like (and just knowing pdftk is up to it should give people the idea).
But here's some scripts I used ages ago:
csv_to_pdf.py
#!/usr/bin/python
# This makes one PDF page per name in the CSV file
# csv_to_pdf.py <CSV_FILE>
import csv
import sys
from reportlab.pdfgen.canvas import Canvas
from reportlab.lib.units import cm, mm
in_db = csv.reader(open(sys.argv[1], "rb"));
outname = sys.argv[1].replace("csv", "pdf")
pdf = Canvas(outname)
in_db.next()
i = 0
for rad in in_db:
pdf.setFontSize(11)
adr = rad[1]
tekst = pdf.beginText(2*cm, 26*cm)
for a in adr.split('\n'):
if not a.strip():
continue
if a[-1] == ',':
a = a[:-1]
tekst.textLine(a)
pdf.drawText(tekst)
pdf.showPage()
i += 1
if i % 1000 == 0:
print i
pdf.save()
When you've ran this, you have a file with thousands of pages, only with a name on it. This is when you can background the fancy PDF under all of them:
pdftk <YOUR_NEW_PDF_FILE.pdf> background <DESIGNED_FILE.pdf> <MERGED.pdf>
You can use InDesign's data merge function, or you can do what you've been doing with printing a portion of the job, and then printing the mail merge atop that with Word or Open Office.
But also look into finding a company that can do variable data offset printing or dynamic publishing. Might be a little more expensive up front but can save a bundle when it comes to time, testing, even packaging and mailing.
Disclaimer: I'm the author of this tool.
I ran into this issue enough times that I built a free online tool for it: https://pdfbatchfill.com/
It assumes a PDF form as a template and uses that along with CSV form data to generate a single PDF or individual PDFs in a zip file.