Why doesn't pandoc convert a plaintext file to PDF properly?

Commands tried:
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=pdflatex 1.txt -o 1.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=lualatex 1.txt -o 2.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=xelatex 1.txt -o 3.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=latexmk 1.txt -o 4.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=tectonic 1.txt -o 5.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=wkhtmltopdf 1.txt -o 6.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=weasyprint 1.txt -o 7.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=prince 1.txt -o 8.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=context 1.txt -o 9.pdf
pandoc -V 'fontfamily:Courier' --variable mainfont="Courier" --pdf-engine=pdfroff 1.txt -o 10.pdf
Contents of 1.txt:
--------------------------------------------------------------------------------
Left Right
--------------------------------------------------------------------------------
Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum 1
whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. 2
Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum 3
whatever. Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. 4
Lorem ipsum whatever. Lorem ipsum whatever. Lorem ipsum whatever. 5
--------------------------------------------------------------------------------
Results:
Out of all those allegedly supported "engines", only the first and third produce any PDF at all (the others just dump a bunch of nonsensical errors). And those two that do produce PDFs, produce horribly butchered ones:
"pdflatex" (the first command) entirely ignores the specified font, so it's completely useless.
"xelatex" (the third command) seems to be mostly using the right font, but it seemingly deletes all the spaces between "Left" and "Right", morphs the "-"s into straight lines (that's not how that font looks...), messes up the lines so that the numbers in the last column are no longer right-aligned, and crams the entire contents into the middle of the page instead of, as expected, near the top-left corner:
screenshot of the xelatex-produced PDF
I have spent an enormous amount of time hunting for options and trying a million variations of the above commands, but it seems like this tool is fundamentally broken. I have no idea how others (apparently) use these tools, but they just don't work. It's impossible to convert a text file to PDF...

Pandoc is not broken; it is doing just what its documentation says it will do. Pandoc treats your input file as Markdown with pandoc extensions (since you didn't specify a format). What you have here is a one-column simple table (since there is no break in the line of ----s to indicate a column break).
If what you want is a rendering of this content as verbatim text in a PDF, you could use e.g. enscript 1.txt --output=- | ps2pdf - > 1.pdf. If you want to do it using pandoc, then the easiest way is to put the content inside backtick fences so that it is treated as a Markdown verbatim block. One way to do this would be to modify your file, but you could also do it by creating a file ticks.txt containing just
```
and then run
pandoc ticks.txt 1.txt ticks.txt -o 1.pdf
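As a sketch of the same idea without a helper file (assuming a POSIX shell), you can generate the fenced wrapper on the fly; the file names below are stand-ins:

```shell
# Wrap the plain-text file in backtick fences so pandoc treats it as a
# verbatim code block (same effect as the ticks.txt trick above).
printf 'Left Right\nLorem ipsum whatever. 1\n' > 1.txt   # stand-in sample
{ printf '```\n'; cat 1.txt; printf '```\n'; } > wrapped.md
# pandoc wrapped.md -o 1.pdf   # then convert as usual
```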


Is there a way discard previous pdfmark metadata?

I was trying to automate adding title, bookmarks and such to some PDFs I need. The way I came up with was to create a simple pdfmark script like this:
% pdfmark.ps
[ /Title (My document)
/Author (Me)
/DOCINFO pdfmark
[ /Title (First chapter)
/Page 1
/OUT pdfmark
Then generate a new PDF with ghostscript using:
gs -q -dNOPAUSE -dBATCH -sDEVICE=pdfwrite -sOutputFile=out.pdf in.pdf pdfmark.ps
If in.pdf doesn't have any pdfmark data it works fine, however if it does things don't work out nicely: for example title/author aren't modified and bookmarks are appended instead of replaced.
Since I don't want to mess around modifying the PDF's corresponding postscript, I was trying to find if there is some command to add to pdfmark.ps that can delete (or overwrite) previous metadata.
I'll leave PostScript to others and show how to remove a PDF outline using the qpdf package (for qpdf and fix-qdf) and GNU sed.
From the qpdf manual:
In QDF mode, qpdf creates PDF files in what we call QDF form.
A PDF file in QDF form, sometimes called a QDF file, is a completely
valid PDF file that has %QDF-1.0 as its third line (after the pdf
header and binary characters) and has certain other characteristics.
The purpose of QDF form is to make it possible to edit PDF files,
with some restrictions, in an ordinary text editor.
(For a non-GNU/Linux system adapt the commands below.)
qpdf --qdf --compress-streams=n --decode-level=generalized \
--object-streams=disable -- in.pdf - |
sed --binary \
-e '/^[ ][ ]*\/Outlines [0-9][0-9]* [0-9] R/ s/[1-9]/0/g' |
fix-qdf > tmp.qdf
qpdf --coalesce-contents --compression-level=9 \
--object-streams=generate -- tmp.qdf out.pdf
where:
1st qpdf command converts the PDF file to QDF form for editing
sed orphans outlines in the QDF file by rooting them at non-existing obj 0
fix-qdf repairs the QDF after editing
2nd qpdf converts and compresses QDF to PDF
qpdf input cannot be pipelined, it needs to seek
The sed command changes digits to zeros in the line containing
the indented text /Outlines.
Note that GNU sed is used for the non-standard --binary option
to avoid mishaps on an OS distinguishing between text and binary files.
Similarly, to strip annotations replace /Outlines with /Annots in
the -e above, or insert it in a second -e option to do both.
A patch utility other than sed will also do; often just one byte has
to be changed.
To quickly strip all non-page data (docinfo, outlines, among others, but not
annotations) qpdf's --empty option may be useful:
qpdf --coalesce-contents --compression-level=9 \
--object-streams=generate \
--empty --pages in.pdf 1-z -- out.pdf

Awk or sed to prepend missing zeros in mac addresses

I've got a file consisting of IPs and MAC address pairs and I need to pad the MAC addresses with zeros in each octet, but I don't want to change the IP. So this...
10.5.96.41 0:0:e:4c:b7:42
10.5.96.42 c4:f7:0:13:ef:32
10.5.96.43 0:e8:4c:60:2b:42
10.5.96.44 0:6a:bf:b:35:f1
Should get changed to this...
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
I tried sed 's/\b\(\w\)\b/0\1/g' but that produces:
10.05.96.41 00:00:0e:4c:b7:42
10.05.96.42 c4:f7:00:13:ef:32
10.05.96.43 00:e8:4c:60:2b:42
10.05.96.44 00:6a:bf:0b:35:f1
which is not desired because I only want to affect the MAC address portion.
Since you've tagged macos, I'm not sure if this will work for you. I tested it on GNU awk:
$ awk '{gsub(/\<[0-9a-f]\>/, "0&", $2)} 1' ip.txt
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
awk is good for field processing; here you can simply perform the substitution only on the second field.
But I see \b and \w in your sed command, so you are using GNU sed? If so,
sed -E ':a s/( .*)(\b\w\b)/\10\2/; ta' ip.txt
With perl
$ perl -lane '$F[1] =~ s/\b\w\b/0$&/g; print join " ", @F' ip.txt
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
If you want to get adventurous, specify that you want to avoid replacing the first field:
perl -pe 's/^\H+(*SKIP)(*F)|\b\w\b/0$&/g' ip.txt
With any sed that uses -E to support EREs, e.g. GNU sed or OSX/BSD (MacOS) sed:
$ sed -E 's/[ :]/&0/g; s/0([^:]{2}(:|$))/\1/g' file
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
and with any sed:
$ sed 's/[ :]/&0/g; s/0\([^:][^:]:\)/\1/g; s/0\([^:][^:]$\)/\1/' file
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
This might work for you (GNU sed):
sed 's/\b.\(:\|$\)/0&/g' file
Prepend a 0 before any single character followed by a : or the end of line.
Other seds may use:
sed 's/\<.\(:\|$\)/0&/g' file
With GNU sed:
sed -E ':a;s/([ :])(.)(:|$)/\10\2\3/g;ta' file
with any sed:
sed ':a;s/\([ :]\)\(.\):/\10\2:/g;ta' file
Explanation (of the GNU version)
:a # a label called 'a', used as a jump target
; # command separator
s # substitute command ...
/([ :])(.)(:|$)/ # search for any single char which is enclosed by
# either two colons, a whitespace and a colon or
# a colon and the end of the line ($)
# Content between () will be matched in a group
# which is used in the replacement pattern
\10\2\3 # replacement pattern: group1 \1, a zero, group2 and
# group3 (see above)
/g # replace as often as possible
; # command separator
ta # jump back to a if the previous s command replaced
# something (see below)
The loop using the label a and the ta command is needed because sed won't match a pattern again if input was already part of a replacement. This would happen in this case for example (first line):
0:0
When the above pattern is applied, sed would replace
<space>0: by <space>00:
The trailing colon of that match is consumed, so it cannot serve again as the leading : of the second zero's match. Therefore the loop runs until everything is replaced.
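To make this concrete, here is a small sketch (GNU sed, one sample line from the question) comparing a single pass with the looped version:

```shell
printf '10.5.96.41 0:0:e:4c:b7:42\n' > one.txt
# single pass: the colon consumed by one match cannot anchor the next,
# so a neighbouring single-digit octet is left unpadded
sed -E 's/([ :])(.)(:|$)/\10\2\3/g' one.txt
# -> 10.5.96.41 00:0:0e:4c:b7:42   (the second octet was missed)
# looping until no replacement happens pads every octet
sed -E ':a;s/([ :])(.)(:|$)/\10\2\3/g;ta' one.txt
# -> 10.5.96.41 00:00:0e:4c:b7:42
```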
A succinct and precise solution for GNU sed:
sed -Ee 's/\b[0-9a-f](:|$)/0&/gi' file
(On macOS, I recommend installing gsed using brew install gnu-sed.)
A rather circuitous and verbose solution to deal with mawk not having regex back-references: the approach is to pre-pad every slot with extra zeros, then trim out the excess:
nawk ' sub(".+","\5:&:\3", $NF)^_ + gsub(":", "&00") + \
gsub("[0-9A-Fa-f]{2}:","\6&") + gsub("[^:]*\6|\5:|:00\3$",_)'
mawk ' sub("^", "\5:", $NF)^_ + gsub(":", "&00") + \
gsub("[^ :][^ :](:|$)", "\6&") + gsub("[^:]*\6|\5:",_)'
10.5.96.41 00:00:0e:4c:b7:42
10.5.96.42 c4:f7:00:13:ef:32
10.5.96.43 00:e8:4c:60:2b:42
10.5.96.44 00:6a:bf:0b:35:f1
To do it the proper gawk gensub() way
-- two calls to gensub() are needed; calling it once ended up missing a few:
gawk -be 'BEGIN { ___ *= __ = "([ :\t])([[:xdigit:]]?)(:|$)"
_="\\10\\2\\3" } $___ = gensub(__, _, "g", gensub(__, _, "g"))'
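For completeness, a sketch that sidesteps the regex subtleties entirely: split the MAC field on ":" in awk and pad each octet by hand (the file name ip.txt is assumed, as in the question):

```shell
cat > ip.txt <<'EOF'
10.5.96.41 0:0:e:4c:b7:42
10.5.96.44 0:6a:bf:b:35:f1
EOF
# split the second field on ":" and left-pad single-character octets
awk '{
  n = split($2, o, ":"); mac = ""
  for (i = 1; i <= n; i++) {
    if (length(o[i]) == 1) o[i] = "0" o[i]
    mac = mac (i > 1 ? ":" : "") o[i]
  }
  $2 = mac; print
}' ip.txt > padded.txt
cat padded.txt
```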

Postgres literal carriage return error on Linux

Specs:
Linux Mint 18.x
Postgres 10.x
Pgadmin3
I've tried loading in a TSV into postgres but keep getting a literal carriage return error. In the file, "\N" is used to denote NULL.
I've tried both \copy and the import dialogue in pgadmin3. In pgadmin3 I've tried leaving out the file formatting, and also tried setting it to UTF8. The error persists.
initial command used:
\copy table FROM PROGRAM 'tail -n +2 /home/super/Downloads/folder/myfile.tsv'
ERROR: literal carriage return found in data
HINT: use "\r" to represent carriage return.
I've been using sed to create different versions of the file that replace what I believe could be causing the error:
sed 's/\n/\r/g' myfile.tsv > newfile1.tsv
sed 's/\\n/\r/g' myfile.tsv > newfile2.tsv
sed 's/\\n//g' myfile.tsv > newfile3.tsv
I also tried the following (non chronologically)
sed 's/\r\n/\r/g' new.tsv
sed 's/\\N/NULL/g' new.tsv
sed 's/\\//'
sed 's/\\N/\r/'
sed -n 's/\n/\r/'
sed -n 's/\\n/\r/'
sed 's/\N/\r/'
sed 's/\\N/NULL/'
sed 's/\N/NULL/'
sed 's/\r//'
sed 's/\N/NULL/'
sed 's/\\N/NULL/'
sed 's/\\//'
sed 's/\N/\r/g' new.tsv
sed 's/\N/NULL/g' new.tsv
sed 's/\N/NULL/g' new.tsv
And none of these have worked. When I view with LibreOffice's preview dialogue it appears to scroll through the contents and format them as a table just fine.
I've looked at this question on a literal newline error and this question about using copy.
I didn't understand what was meant about the wrong 'byte' being inserted.
Preview of data: https://imgur.com/JPhHB52
UPDATE: ran sed 's/\r/CR was here/g' myfile.tsv | grep 'CR was here' and it returned two results
This is solved by using:
sed 's/\r/\\r/g' myfile.tsv > myfile_copy.tsv
The backslash in the replacement needs to be escaped (doubled) so that sed writes a literal \r sequence.
@Abelisto pointed out how the error occurs in this comment:
Ok, there is a relatively simple way to reproduce the error: echo -e 'aaa\nbbb\rccc' > foo.tsv ; psql -c 'create table foo(x text)' -c '\copy foo from foo.tsv' -c 'drop table foo;' (be careful if you already have a valuable foo table in your DB). Let's try to find how to fix it...
Also helpful:
echo -e 'aaa\nbbb\rccc' > foo.tsv ; sed -i 's/\r/\\r/g' foo.tsv ; psql -c 'create table foo(x text)' -c "\copy foo from foo.tsv" -c 'table foo' -c 'drop table foo;' fixed the error. I have no idea why the \r sequence is not translated to CR by the copy command as mentioned in the documentation. But that is yet another question. – Abelisto

Using sed, Insert a line above or below the pattern? [duplicate]

This question already has answers here:
How to insert a line using sed before a pattern and after a line number?
(5 answers)
Closed 9 years ago.
I need to edit a good number of files, by inserting a line or multiple lines either right below a unique pattern or above it. Please advise on how to do that using sed, awk, perl (or anything else) in a shell. Thanks! Example:
some text
lorem ipsum dolor sit amet
more text
I want to insert consectetur adipiscing elit after lorem ipsum dolor sit amet, so the output file will look like:
some text
lorem ipsum dolor sit amet
consectetur adipiscing elit
more text
To append after the pattern (-i is for in-place replacement; line1 and line2 are the lines you want to append or prepend):
sed -i '/pattern/a \
line1 \
line2' inputfile
Output:
#cat inputfile
pattern
line1
line2
To prepend the lines before:
sed -i '/pattern/i \
line1 \
line2' inputfile
Output:
#cat inputfile
line1
line2
pattern
The following adds one line after SearchPattern.
sed -i '/SearchPattern/aNew Text' SomeFile.txt
It inserts New Text one line below each line that contains SearchPattern.
To add two lines, you can use a \ and enter a newline while typing New Text.
POSIX sed requires a \ and a newline after the a sed function. [1]
Specifying the text to append without the newline is a GNU sed extension (as documented in the sed info page), so its usage is not as portable.
[1] https://unix.stackexchange.com/questions/52131/sed-on-osx-insert-at-a-certain-line/
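As a sketch of the portable form (plain POSIX sed, with the literal backslash-newline after a; file and pattern names as in the answer above):

```shell
cat > SomeFile.txt <<'EOF'
SearchPattern
another line
EOF
# POSIX-portable append: backslash + newline after the "a" function
sed '/SearchPattern/a\
New Text' SomeFile.txt
```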
Insert a new verse after the given verse in your stanza:
sed -i '/^lorem ipsum dolor sit amet$/ s:$:\nconsectetur adipiscing elit:' FILE
It is more portable to use ed, since some systems' sed doesn't support \n in the replacement:
printf "/^lorem ipsum dolor sit amet/a\nconsectetur adipiscing elit\n.\nw\nq\n" |\
/bin/ed "$filename"
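If sed and ed portability are both a concern, the same append-after-pattern is a one-liner in awk (a sketch using the example lines from the question):

```shell
cat > inputfile <<'EOF'
some text
lorem ipsum dolor sit amet
more text
EOF
# print every line; after the matching line, also print the new text
awk '{ print } /lorem ipsum dolor sit amet/ { print "consectetur adipiscing elit" }' inputfile
```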

How to add page numbers to Postscript/PDF

If you've got a large document (500 pages+) in Postscript and want to add page numbers, does anyone know how to do this?
Based on rcs's proposed solution, I did the following:
Converted the document to example.pdf and ran pdflatex addpages, where addpages.tex reads:
\documentclass[8pt]{article}
\usepackage[final]{pdfpages}
\usepackage{fancyhdr}
\topmargin 70pt
\oddsidemargin 70pt
\pagestyle{fancy}
\rfoot{\Large\thepage}
\cfoot{}
\renewcommand {\headrulewidth}{0pt}
\renewcommand {\footrulewidth}{0pt}
\begin{document}
\includepdfset{pagecommand=\thispagestyle{fancy}}
\includepdf[fitpaper=true,scale=0.98,pages=-]{example.pdf}
% fitpaper & scale aren't always necessary - depends on the paper being submitted.
\end{document}
or alternatively, for two-sided pages (i.e. with the page number consistently on the outside):
\documentclass[8pt]{book}
\usepackage[final]{pdfpages}
\usepackage{fancyhdr}
\topmargin 70pt
\oddsidemargin 150pt
\evensidemargin -40pt
\pagestyle{fancy}
\fancyhead{}
\fancyfoot{}
\fancyfoot[LE,RO]{\Large\thepage}
\renewcommand{\headrulewidth}{0pt}
\renewcommand{\footrulewidth}{0pt}
\begin{document}
\includepdfset{pages=-,pagecommand=\thispagestyle{fancy}}
\includepdf{target.pdf}
\end{document}
Easy way to change header margins:
% set margins for headers, won't shrink included pdfs
% you can remove the topmargin/oddsidemargin/evensidemargin lines
\usepackage[margin=1in,includehead,includefoot]{geometry}
you can simply use
pspdftool
http://sourceforge.net/projects/pspdftool
in this way:
pspdftool 'number(x=-1pt,y=-1pt,start=1,size=10)' input.pdf output.pdf
see these two examples (unnumbered and numbered pdf with pspdftool)
unnumbered pdf
http://ge.tt/7ctUFfj2
numbered pdf
http://ge.tt/7ctUFfj2
with this as the first command-line argument:
number(start=1, size=40, x=297.5 pt, y=10 pt)
I used to add page numbers to my pdf using latex like in the accepted answer.
Now I found an easier way:
Use enscript to create empty pages with a header containing the page number, and then use pdftk with the multistamp option to put the header on your file.
This bash script expects the pdf file as its only parameter:
#!/bin/bash
input="$1"
output="${1%.pdf}-header.pdf"
pagenum=$(pdftk "$input" dump_data | grep "NumberOfPages" | cut -d":" -f2)
enscript -L1 --header='||Page $% of $=' --output - < <(for i in $(seq "$pagenum"); do echo; done) | ps2pdf - | pdftk "$input" multistamp - output $output
I was looking for a postscript-only solution, using ghostscript. I needed this to merge multiple PDFs and put a counter on every page. Only solution I found was an old gs-devel posting, which I heavily simplified:
%!PS
% add page numbers at the document bottom right (20 units spacing, hardcoded below)
% Note: Page dimensions are expressed in units of the default user space (72nds of an inch).
% inspired by https://www.ghostscript.com/pipermail/gs-devel/2005-May/006956.html
globaldict /MyPageCount 1 put % initialize page counter
% executed at the end of each page. Before calling the procedure, the interpreter
% pushes two integers on the operand stack:
% 1. a count of previous showpage executions for this device
% 2. a reason code indicating the circumstances under which this call is being made:
% 0: During showpage or (LanguageLevel 3) copypage
% 1: During copypage (LanguageLevel 2 only)
% 2: At device deactivation
% The procedure must return a boolean value specifying whether to transmit the page image to the
% physical output device.
<< /EndPage {
exch pop % remove showpage counter (unused)
0 eq dup { % only run and return true for showpage
/Helvetica 12 selectfont % select font and size for following operations
MyPageCount =string cvs % get page counter as string
dup % need it twice (width determination and actual show)
stringwidth pop % get width of page counter string ...
currentpagedevice /PageSize get 0 get % get width from PageSize on stack
exch sub 20 sub % pagewidth - stringwidth - some extra space
20 moveto % move to calculated x and y=20 (0/0 is the bottom left corner)
show % finally show the page counter
globaldict /MyPageCount MyPageCount 1 add put % increment page counter
} if
} bind >> setpagedevice
If you save this to a file called pagecount.ps you can use it on command line like this:
gs \
-dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite -dPDFSETTINGS=/prepress \
-sOutputFile=/path/to/merged.pdf \
-f pagecount.ps -f input1.pdf -f input2.pdf
Note that pagecount.ps must be given first (technically, right before the input file which the page counting should start with).
If you don't want to use an extra .ps file, you can also use a minimized form like this:
gs \
-dBATCH -dNOPAUSE \
-sDEVICE=pdfwrite -dPDFSETTINGS=/prepress \
-sOutputFile=/path/to/merged.pdf \
-c 'globaldict /MyPageCount 1 put << /EndPage {exch pop 0 eq dup {/Helvetica 12 selectfont MyPageCount =string cvs dup stringwidth pop currentpagedevice /PageSize get 0 get exch sub 20 sub 20 moveto show globaldict /MyPageCount MyPageCount 1 add put } if } bind >> setpagedevice' \
-f input1.pdf -f input2.pdf
Depending on your input, you may have to use gsave/grestore at the beginning/end of the if block.
This might be a solution:
convert postscript to pdf using ps2pdf
create a LaTeX file and insert the pages using the pdfpages package (\includepdf)
use pagecommand={\thispagestyle{plain}} or something from the fancyhdr package in the arguments of \includepdf
if postscript output is required, convert the pdflatex output back to postscript via pdf2ps
Further to captaincomic's solution, I've extended it to support the starting of page numbering at any page.
Requires enscript, pdftk 1.43 or greater and pdfjam (for pdfjoin utility)
#!/bin/bash
input="$1"
count=$2
blank=$((count - 1))
output="${1%.pdf}-header.pdf"
pagenum=$(pdftk "$input" dump_data | grep "NumberOfPages" | cut -d":" -f2)
(for i in $(seq "$blank"); do echo; done) | enscript -L1 -B --output - | ps2pdf - > /tmp/pa$$.pdf
(for i in $(seq "$pagenum"); do echo; done) | enscript -a ${count}- -L1 -F Helvetica#10 --header='||Page $% of $=' --output - | ps2pdf - > /tmp/pb$$.pdf
pdfjoin --paper letter --outfile /tmp/join$$.pdf /tmp/pa$$.pdf /tmp/pb$$.pdf &>/dev/null
cat /tmp/join$$.pdf | pdftk "$input" multistamp - output "$output"
rm /tmp/pa$$.pdf
rm /tmp/pb$$.pdf
rm /tmp/join$$.pdf
For example.. place this in /usr/local/bin/pagestamp.sh and execute like:
pagestamp.sh doc.pdf 3
This will start the page number at page 3.. useful when you have coversheets, title pages and table of contents, etc.
The unfortunate thing is that enscript's --footer option is broken, so you cannot get the page numbering at the bottom using this method.
I liked the idea of using pspdftool (man page), but what I was after was a "page x out of y" format and a font style matching the rest of the page.
To find out about the font names used in the document:
$ strings input.pdf | grep Font
To get the number of pages:
$ pdfinfo input.pdf | grep "Pages:" | tr -s ' ' | cut -d" " -f2
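The grep/tr/cut pipeline can also be collapsed into a single awk call (a sketch; it assumes pdfinfo's usual "Pages:" line format):

```shell
# extract the page count field with awk instead of grep + tr + cut
printf 'Title:          demo\nPages:          12\n' |
awk '/^Pages:/ { print $2 }'
# -> 12
# with a real file: pdfinfo input.pdf | awk '/^Pages:/ { print $2 }'
```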
Glue it together with a few pspdftool commands:
$ in=input.pdf; \
out=output.pdf; \
indent=30; \
pageNumberIndent=49; \
pageCountIndent=56; \
font=LiberationSerif-Italic; \
fontSize=9; \
bottomMargin=40; \
pageCount=`pdfinfo $in | grep "Pages:" | tr -s ' ' | cut -d" " -f2`; \
pspdftool "number(x=$pageNumberIndent pt, y=$bottomMargin pt, start=1, size=$fontSize, font=\"$font\")" $in tmp.pdf; \
pspdftool "text(x=$indent pt, y=$bottomMargin pt, size=$fontSize, font=\"$font\", text=\"page \")" tmp.pdf tmp.pdf; \
pspdftool "text(x=$pageCountIndent pt, y=$bottomMargin pt, size=$fontSize, font=\"$font\", text=\"out of $pageCount\")" tmp.pdf $out; \
rm tmp.pdf;
Here is the result:
Oh, it's a long time since I used postscript, but a quick dip into the blue book will tell you :) www-cdf.fnal.gov/offline/PostScript/BLUEBOOK.PDF
On the other hand, Adobe Acrobat and a bit of javascript would also do wonders ;)
Alternatively, I did find this: http://www.ghostscript.com/pipermail/gs-devel/2005-May/006956.html, which seems to fit the bill (I didn't try it)
You can use the free and open source pdftools to add page numbers to a PDF file with a single command line.
The command line you could use is (on GNU/Linux you have to escape the $ sign in the shell, on Windows it is not necessary):
pdftools.py --input-file ./input/wikipedia_algorithm.pdf --output ./output/addtext.pdf --text "\$page/\$pages" br 1 1 --overwrite
Regarding the --text option:
The first parameter is the text to add. Some placeholders are available. $page stands for the current page number, while $pages stands for the total number of pages in the PDF file. Thus the option so formulated would add something like "1/10" for the first page of a 10-page PDF document, and so on for the following pages
The second parameter is the anchor point of the text box. "br" will position the bottom right corner of the text box
The third parameter is the horizontal position of the anchor point of the text box as a percentage of the page width. Must be a number between 0 and 1, with the dot . separating decimals
The fourth parameter option is the vertical position of the anchor point on the text box as a percentage of the page height. Must be a number between 0 and 1, with the dot . separating decimals
Disclaimer: I'm the author of pdftools
I am assuming you are looking for a PS-based solution. There is no page-level operator in PS that will allow you to do this. You need to add a footer-sort of thingy in the PageSetup section for each page. Any scripting language should be able to help you along.
I tried pspdftool (http://sourceforge.net/projects/pspdftool).
I eventually got it to work, but at first I got this error:
pspdftool: xreftable read error
The source file was created with pdfjoin from pdfjam, and contained a bunch of scans from my Epson Workforce as well as generated tag pages. I couldn't figure out a way to fix the xref table, so I converted to ps with pdf2ps and back to pdf with ps2pdf. Then I could use this to get nice page numbers on the bottom right corner:
pspdftool 'number(start=1, size=20, x=550 pt, y=10 pt)' input.pdf output.pdf
Unfortunately, it means that any text-searchable pages are no longer searchable because the text was rasterized in the ps conversion. Fortunately, in my case it doesn't matter.
Is there any way to fix or empty the xref table of a pdf file without losing what pages are searchable?
I took captaincomic's solution and added support for filenames containing spaces, plus giving some more informations about the progress
#!/bin/bash
clear
echo
echo This script adds page numbers to a given .pdf file.
echo
echo This script needs the packages pdftk and enscript
echo if not installed the script will fail.
echo use the command sudo apt-get install pdftk enscript
echo to install.
echo
input="$1"
output="${1%.pdf}-header.pdf"
echo input file is $input
echo output file will be $output
echo
pagenum=$(pdftk "$input" dump_data | grep "NumberOfPages" | cut -d":" -f2)
enscript -L1 --header='||Page $% of $=' --output - < <(for i in $(seq "$pagenum"); do echo; done) | ps2pdf - | pdftk "$input" multistamp - output "$output"
echo done.
I wrote the following shell script to solve this for LaTeX beamer style slides produced with inkscape (I pdftk cat the slides together into the final presentation PDF & then add slide numbers using the script below):
#!/bin/sh
# create working directory
tmpdir=$(mktemp --directory)
# read un-numbered beamer slides PDF from STDIN & create temporary copy
cat > $tmpdir/input.pdf
# get total number of pages
pagenum=$(pdftk $tmpdir/input.pdf dump_data | awk '/NumberOfPages/{print $NF}')
# generate latex beamer document with the desired number of empty but numbered slides
printf '%s' '
\documentclass{beamer}
\usenavigationsymbolstemplate{}
\setbeamertemplate{footline}[frame number]
\usepackage{forloop}
\begin{document}
\newcounter{thepage}
\forloop{thepage}{0}{\value{thepage} < '$pagenum'}{
\begin{frame}
\end{frame}
}
\end{document}
' > $tmpdir/numbers.tex
# compile latex file into PDF (2nd run needed for total number of pages) & redirect output to STDERR
pdflatex -output-directory=$tmpdir numbers.tex >&2 && pdflatex -output-directory=$tmpdir numbers.tex >&2
# add empty numbered PDF slides as background to (transparent background) input slides (page by
# page) & write results to STDOUT
pdftk $tmpdir/input.pdf multibackground $tmpdir/numbers.pdf output -
# remove temporary working directory with all intermediate files
rm -r $tmpdir >&2
The script reads STDIN & writes STDOUT printing diagnostic pdflatex output to STDERR.
So just copy-paste the above code in a text file, say enumerate_slides.sh, make it executable (chmod +x enumerate_slides.sh) & call it like this:
./enumerate_slides.sh < input.pdf > output.pdf [2>/dev/null]
It should be easy to adjust this to any other kind of document by adjusting the LaTeX template to use the proper documentclass, paper size & style options.
edit:
I replaced echo by $(which echo) since Ubuntu symlinks /bin/sh to dash, which overrides the echo command with a shell built-in that interprets escape sequences by default and does not provide the -E option to disable this behaviour. Note that alternatively you could escape all \ in the LaTeX template as \\.
edit:
I replaced $(which echo) by printf '%s' since in zsh, which echo returns echo: shell built-in command instead of /bin/echo.
See this question for details why I decided to use printf in the end.
Maybe pstops (part of psutils) can be used for this?
I have used LibreOffice Calc for this. Adding a page number field is easy using Insert->Field->Page Number. And then you can copy-and-paste this field to other pages; fortunately the position is not changed and the copy-and-paste can be done quickly with down arrow key and Ctrl+V. Worked for me for a 30 page article. Maybe prone to errors for a 500+ one!