Splitting single page into two pages with ghostscript

I have a pdf with something like presentations slides and multiple slides per page. How can I use ghostscript to split the file so that there is one slide per page?

A long time ago I wrote some code for someone on comp.lang.postscript to do this; again, it was for PowerPoint slides. This PostScript code assumes that all the 'subpages' (i.e. slides) are the same size and location on the PDF page, and that all the PDF pages are the same size. Save the following as a file called pdf_slice.ps and follow the usage described in the comments.
%!PS
% Copyright (C) 2011 Artifex Software, Inc. All rights reserved.
%
% This software is provided AS-IS with no warranty, either express or
% implied.
%
% This software is distributed under license and may not be copied,
% modified or distributed except as expressly authorized under the terms
% of the license contained in the file LICENSE in this distribution.
%
% For more information about licensing, please refer to
% http://www.ghostscript.com/licensing/. For information on
% commercial licensing, go to http://www.artifex.com/licensing/ or
% contact Artifex Software, Inc., 101 Lucas Valley Road #110,
% San Rafael, CA 94903, U.S.A., +1(415)492-9861.
%
% Slice up a PDF file
%
% usage: gs -sFile=____.pdf -dSubPagesX= -dSubPagesY= [-dSubPageOrder=] [-dVerbose=] pdf_slice.ps
%
% SubPageOrder is a bit field;
% Default = 0
% Bit 0 - 0 = top to bottom
% 1 = bottom to top
% Bit 1 - 0 = left to right
% 1 = right to left
% Bit 2 - 0 = increase x then y
% 1 = increase y then x
%
% 0 - page 1 at top left, increasing left to right, top to bottom
% 1 - page 1 at bottom left increasing left to right, bottom to top
% 2 - page 1 at top right, increasing right to left, top to bottom
% 3 - page 1 at bottom right increasing right to left, bottom to top
% 4 - page 1 at top left, increasing top to bottom, left to right
% 5 - page 1 at bottom left increasing bottom to top, left to right
% 6 - page 1 at top right, increasing top to bottom, right to left
% 7 - page 1 at bottom right increasing bottom to top, right to left
%
% Check the parameters to see they are present and of the correct type
%
/Usage {
( usage: gs -dNODISPLAY -q -sFile=____.pdf \n) =
( -dSubPagesX= -dSubPagesY= [-dSubPageOrder=] pdf_slice.ps \n) =
(Please see comments in pdf_slice.ps for more details) =
flush
quit
} bind def
/Verbose where not {
/Verbose false def
}{
pop /Verbose true def
} ifelse
/File where not {
(\n *** Missing source file. \(use -sFile=____.pdf\)\n) =
Usage
} {
pop
}ifelse
/SubPagesX where not {
(\n *** SubPagesX not integer! \(use -dSubPagesX=\)\n) =
Usage
} {
Verbose { (SubPagesX ) print } if
SubPagesX type
Verbose { dup == } if
/integertype eq not {
(\n *** SubPagesX not integer! \(use -dSubPagesX=\)\n) =
Usage
} if
pop
}ifelse
/SubPagesY where not {
(\n *** SubPagesY not integer! \(use -dSubPagesY=\)\n) =
Usage
} {
Verbose { (SubPagesY ) print } if
SubPagesY type
Verbose { dup == } if
/integertype eq not {
(\n *** SubPagesY not integer! \(use -dSubPagesY=\)\n) =
Usage
} if
pop
}ifelse
/SubPageOrder where not {
/SubPageOrder 0 def
} {
Verbose { (SubPageOrder ) print } if
SubPageOrder type
Verbose { dup == } if
dup ==
/integertype eq not {
(\n *** SubPageOrder not integer! \(use -dSubPageOrder=\)\n) =
Usage
} if
pop
}ifelse
%
% Turns off most messages
%
/QUIET true def % in case they forgot
%() =
%
% Open the PDF file and tell the PDF interpreter to start dealing with it
%
File dup (r) file runpdfbegin pop
/PDFPageCount pdfpagecount def
%
% Set up our bookkeeping
%
% First get the size of the page from page 1 of the PDF file
% We assume that all PDF pages are the same size.
%
1 pdfgetpage currentpagedevice
1 index get_any_box
exch pop dup 2 get exch 3 get
/PDFHeight exch def
/PDFWidth exch def
%
% Now get the page size of the current device. We are assuming that
% this is the size of the individual sub-pages in the original PDF. NB
% This assumes no margins between sub-pages, all sub-pages the same size.
%
currentpagedevice /PageSize get
dup 0 get /SubPageWidth exch def
1 get /SubPageHeight exch def
%
% Calculate the margins. This is the margin between the page border and
% the enclosed group of sub-pages, we assume there are no borders
% between sub pages.
%
/TopMargin PDFHeight SubPageHeight SubPagesY mul sub 2 div def
/LeftMargin PDFWidth SubPageWidth SubPagesX mul sub 2 div def
Verbose {
(PDFHeight = ) print PDFHeight ==
(PDFWidth = ) print PDFWidth ==
(SubPageHeight = ) print SubPageHeight ==
(SubPageWidth = ) print SubPageWidth ==
(TopMargin = ) print TopMargin ==
(LeftMargin = ) print LeftMargin ==
} if
%
% This routine calculates and sets the PageOffset in the page device
% dictionary for each subpage, so that the PDF page is 'moved' in such
% a way that the required sub page is under the 'window' which is the current
% page being imaged.
%
/NextPage {
SubPageOrder 2 mod 0 eq {
/H SubPagesY SubPageY sub SubPageHeight mul TopMargin add def
}{
/H SubPageY 1 sub SubPageHeight mul TopMargin add def
} ifelse
SubPageOrder 2 div floor cvi 2 mod 0 eq {
/W SubPageX 1 sub SubPageWidth mul LeftMargin add def
}{
/W SubPagesX SubPageX sub SubPageWidth mul LeftMargin add def
} ifelse
<< /PageOffset [W neg H neg]>> setpagedevice
Verbose {
(SubPageX ) print SubPageX ==
(SubPageY ) print SubPageY ==
(X Offset ) print W ==
(Y Offset ) print H == flush
} if
PDFPage
} bind def
%
% The main loop
% For every page in the original PDF file
%
1 1 PDFPageCount
{
/PDFPage exch def
% Do the gross ordering here rather than in
% NextPage. We either process rows and then
% columns, or columns then rows, depending on
% Bit 2 of SubPageOrder
SubPageOrder 3 le {
1 1 SubPagesY {
/SubPageY exch def
1 1 SubPagesX {
/SubPageX exch def
NextPage
pdfgetpage
pdfshowpage
} for
} for
} {
1 1 SubPagesX {
/SubPageX exch def
1 1 SubPagesY {
/SubPageY exch def
NextPage
pdfgetpage
pdfshowpage
} for
} for
} ifelse
} for
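The offset arithmetic in NextPage can be easier to follow outside of PostScript. Here is a minimal Python sketch (not part of pdf_slice.ps; the function name `page_offsets` and its parameters are made up for illustration) that mirrors the margin and PageOffset calculations above, along with the loop order selected by SubPageOrder:

```python
def page_offsets(pdf_w, pdf_h, sub_w, sub_h, nx, ny, order=0):
    """Return the (x, y) PageOffset for each sub-page, in output order."""
    # Same margin formulas as TopMargin / LeftMargin in the PostScript code.
    top_margin = (pdf_h - sub_h * ny) / 2
    left_margin = (pdf_w - sub_w * nx) / 2

    def offset(x, y):
        # Bit 0 of the order: 0 = top to bottom, 1 = bottom to top.
        if order % 2 == 0:
            h = (ny - y) * sub_h + top_margin
        else:
            h = (y - 1) * sub_h + top_margin
        # Bit 1 of the order: 0 = left to right, 1 = right to left.
        if (order // 2) % 2 == 0:
            w = (x - 1) * sub_w + left_margin
        else:
            w = (nx - x) * sub_w + left_margin
        return (-w, -h)

    # Orders 0..3 walk along x first, orders 4..7 walk along y first.
    if order <= 3:
        cells = [(x, y) for y in range(1, ny + 1) for x in range(1, nx + 1)]
    else:
        cells = [(x, y) for x in range(1, nx + 1) for y in range(1, ny + 1)]
    return [offset(x, y) for x, y in cells]


if __name__ == "__main__":
    for off in page_offsets(200, 100, 100, 50, 2, 2, order=0):
        print(off)
```

For a 200x100 pt page holding 2x2 sub-pages of 100x50 pt and the default order 0, the first offset moves the page down by 50 pt and not sideways, which puts the top-left sub-page in the output window (PostScript's origin is at the bottom left).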

The answer of KenS is the one which should be accepted by @howardh. KenS uses a very clever PostScript language program to achieve the result. (Always keep in mind what KenS said: his solution will work well only if all the 'subpages' (i.e. slides) are the same size and location on the PDF page, and all the PDF pages are the same size.)
However, for completeness' sake, let me link to a few other previous answers (some of which are illustrated), which solved similar problems:
Convert PDF 2 sides per page to 1 side per page (SuperUser.com)
How can I split a PDF's pages down the middle? (SuperUser.com)
Cropping a PDF using Ghostscript 9.01 (StackOverflow.com)
PDF - Remove White Margins (StackOverflow.com)
Split one PDF page into two (StackOverflow.com)
Freeware to split a pdf's pages down the middle? (SuperUser.com)
These answers also use PostScript code, but only as 'snippets' which are passed to Ghostscript on the commandline. (If you are not PostScript-savvy, these may be easier to modify and adapt for cases where the 'subpages' are not of the same size and location on PDF pages, and where PDF pages are of different sizes.)

I would like to propose one solution that actually
1) splits one PS or PDF page into many separate pages and
2) then merges the resulting *.pdf files into one multipage PDF.
Note that this solution doesn't handle margins.
This script works in Bash on Linux:
INPUT="input.ps" ;
RESOLUTION=72 ;
WHOLE_WIDTH=598 ; # current size of portrait A4
WHOLE_HEIGHT=843 ;
COLOUMNS=2 ; # split vertically
ROWS=1 ; # split horizontally
PAGE_WIDTH=$((WHOLE_WIDTH/COLOUMNS)) ;
PAGE_HEIGHT=$((WHOLE_HEIGHT/ROWS)) ;
# Split:
for x in `seq 1 ${COLOUMNS}` ; do
for y in `seq 1 ${ROWS}` ; do
gs -dBATCH -dNOPAUSE -dSAFER \
-o gramps_tmp_${x},${y}.pdf \
-r${RESOLUTION} \
-sDEVICE=pdfwrite \
-g${PAGE_WIDTH}x${PAGE_HEIGHT} \
-c "<</PageOffset [$(((x - 1)*(0 - PAGE_WIDTH))) \
$(((y - 1)*(0 - PAGE_HEIGHT)))]>> setpagedevice" \
-f "$INPUT" ;
done ;
done ;
# Merge:
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=singleCombinedPdfFile.pdf -dBATCH gramps_tmp_*.pdf ;
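But we may also arrange the pages in any desired order (the file names must match the gramps_tmp_${x},${y}.pdf pattern used in the loop above):
ORDERED="gramps_tmp_1,1.pdf gramps_tmp_2,1.pdf" ;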
gs -dNOPAUSE -sDEVICE=pdfwrite -sOUTPUTFILE=singleCombinedMultipagePdfFile.pdf -dBATCH ${ORDERED};
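To work out that order for larger grids, it may help to list the tiles explicitly. The sketch below is only an illustration in Python (the function `tiles_in_reading_order` is hypothetical, not part of the script); it reproduces the file names and PageOffset values generated by the loop above and returns them in reading order, assuming that y = 1 selects the bottom row because PostScript's origin is at the bottom left:

```python
def tiles_in_reading_order(columns, rows, page_width, page_height,
                           prefix="gramps_tmp_"):
    """List (file name, PageOffset) pairs left to right, top to bottom."""
    tiles = []
    for y in range(rows, 0, -1):           # y = rows is assumed to be the top row
        for x in range(1, columns + 1):    # x = 1 is the left column
            # Same arithmetic as the $(( ... )) expressions in the shell loop.
            offset = ((x - 1) * (0 - page_width), (y - 1) * (0 - page_height))
            tiles.append((f"{prefix}{x},{y}.pdf", offset))
    return tiles


if __name__ == "__main__":
    for name, offset in tiles_in_reading_order(2, 2, 299, 421):
        print(name, offset)
```

The names printed this way can be pasted into ORDERED. A plain `gramps_tmp_*.pdf` glob would normally sort as 1,1 1,2 2,1 2,2 instead, which is column by column starting from the bottom.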

decoding base64 encoded text with POSIX awk

In a bash script that I'm writing for Linux/Solaris I need to decode more than a hundred thousand base64-encoded text strings, and, because I don't wanna massively fork a non-portable base64 binary from awk, I wrote a function that does the decoding.
Here's the code of my base64_decode function:
function base64_decode(str, out,i,n,v) {
out = ""
if ( ! ("A" in _BASE64_DECODE_c2i) )
for (i = 1; i <= 64; i++)
_BASE64_DECODE_c2i[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",i,1)] = i-1
i = 0
n = length(str)
while (i <= n) {
v = _BASE64_DECODE_c2i[substr(str,++i,1)] * 262144 + \
_BASE64_DECODE_c2i[substr(str,++i,1)] * 4096 + \
_BASE64_DECODE_c2i[substr(str,++i,1)] * 64 + \
_BASE64_DECODE_c2i[substr(str,++i,1)]
out = out sprintf("%c%c%c", int(v/65536), int(v/256), v)
}
return out
}
Which works fine:
printf '%s\n' SmFuZQ== amRvZQ== |
LANG=C command -p awk '
{ print base64_decode($0) }
function base64_decode(...) {...}
'
Jane
jdoe
SIMPLIFIED REAL-LIFE EXAMPLE THAT DOESN'T WORK AS EXPECTED
I want to get the givenName of the users that are members of GroupCode = 025496 from the output of ldapsearch -LLL -o ldif-wrap=no ... '(|(uid=*)(GroupCode=*))' uid givenName sn GroupCode memberUid:
dn: uid=jsmith,ou=users,dc=example,dc=com
givenName: John
sn: SMITH
uid: jsmith
dn: uid=jdoe,ou=users,dc=example,dc=com
uid: jdoe
givenName:: SmFuZQ==
sn:: RE9F
dn: cn=group1,ou=groups,dc=example,dc=com
GroupCode: 025496
memberUid:: amRvZQ==
memberUid: jsmith
Here would be an awk for doing so:
LANG=C command -p awk -F '\n' -v RS='' -v GroupCode=025496 '
{
delete attrs
for (i = 2; i <= NF; i++) {
match($i,/::? /)
key = substr($i,1,RSTART-1)
val = substr($i,RSTART+RLENGTH)
if (RLENGTH == 3)
val = base64_decode(val)
attrs[key] = ((key in attrs) ? attrs[key] SUBSEP val : val)
}
if ( /\nuid:/ )
givenName[ attrs["uid"] ] = attrs["givenName"]
else
memberUid[ attrs["GroupCode"] ] = attrs["memberUid"]
}
END {
n = split(memberUid[GroupCode],uid,SUBSEP)
for ( i = 1; i <= n; i++ )
print givenName[ uid[i] ]
}
function base64_decode(...) { ... }
'
On BSD and Solaris the result is:
Jane
John
While on Linux it is:
John
I don't know where the issue might be; is there something wrong with the base64_decode function and/or the code that uses it?

Your function generates NUL bytes when its argument (encoded string) ends with padding characters (=s). Below is a corrected version of your while loop:
while (i < n) {
v = _BASE64_DECODE_c2i[substr(str,1+i,1)] * 262144 + \
_BASE64_DECODE_c2i[substr(str,2+i,1)] * 4096 + \
_BASE64_DECODE_c2i[substr(str,3+i,1)] * 64 + \
_BASE64_DECODE_c2i[substr(str,4+i,1)]
i += 4
if (v%256 != 0)
out = out sprintf("%c%c%c", int(v/65536), int(v/256), v)
else if (int(v/256)%256 != 0)
out = out sprintf("%c%c", int(v/65536), int(v/256))
else
out = out sprintf("%c", int(v/65536))
}
Note that if the decoded bytes contain an embedded NUL, this approach may not work properly.
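To see where the NUL bytes come from, here is a small Python sketch (an illustration only; `naive_decode` is a made-up name) that imitates the original approach of always emitting three bytes per group of four characters, with `=` contributing 0 just like an unset awk array element:

```python
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def naive_decode(encoded):
    """Decode 4 characters into 3 bytes, ignoring '=' padding entirely.

    Assumes len(encoded) is a multiple of 4.
    """
    out = bytearray()
    for i in range(0, len(encoded), 4):
        v = 0
        for ch in encoded[i:i + 4]:
            # '=' is not in the table, so it contributes 0 to the value.
            v = v * 64 + INDEX.get(ch, 0)
        out += bytes([(v >> 16) & 0xFF, (v >> 8) & 0xFF, v & 0xFF])
    return bytes(out)


if __name__ == "__main__":
    print(naive_decode("SmFuZQ=="))
```

Decoding `SmFuZQ==` this way gives `Jane` followed by two NUL bytes, so a later array lookup keyed on the decoded string may not match the plain `Jane` key.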

The problem is within the base64_decode function, which outputs some junk characters on gnu-awk.
You can use this awk code, which uses the system-provided base64 utility, as an alternative:
{
delete attrs
for (i = 2; i <= NF; i++) {
match($i,/::? /)
key = substr($i,1,RSTART-1)
val = substr($i,RSTART+RLENGTH)
if (RLENGTH == 3) {
cmd = "echo " val " | base64 -di"
cmd | getline val # should also check exit code here
close(cmd) # close the pipe so that a repeated value runs the command again
}
attrs[key] = ((key in attrs) ? attrs[key] SUBSEP val : val)
}
if ( /\nuid:/ )
givenName[ attrs["uid"] ] = attrs["givenName"]
else
memberUid[ attrs["GroupCode"] ] = attrs["memberUid"]
}
END {
n = split(memberUid[GroupCode],uid,SUBSEP)
for ( i = 1; i <= n; i++ )
print givenName[ uid[i] ]
}
I have tested this on gnu and BSD awk versions and I am getting expected output in all the cases.
If you cannot use external base64 utility then I suggest you take a look here for awk version of base64 decode.

This answer is for reference
Here's a working base64_decode function (thanks @MNejatAydin for pointing out the issue(s) in the original one):
function base64_decode(str, out,bits,n,i,c1,c2,c3,c4) {
out = ""
# One-time initialization during the first execution
if ( ! ("A" in _BASE64) )
for (i = 1; i <= 64; i++)
# The "_BASE64" array associates a character to its base64 index
_BASE64[substr("ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/",i,1)] = i-1
# Decoding the input string
n = length(str)
i = 0
while ( i < n ) {
c1 = substr(str, ++i, 1)
c2 = substr(str, ++i, 1)
c3 = substr(str, ++i, 1)
c4 = substr(str, ++i, 1)
bits = _BASE64[c1] * 262144 + _BASE64[c2] * 4096 + _BASE64[c3] * 64 + _BASE64[c4]
if ( c4 != "=" )
out = out sprintf("%c%c%c", bits/65536, bits/256, bits)
else if ( c3 != "=" )
out = out sprintf("%c%c", bits/65536, bits/256)
else
out = out sprintf("%c", bits/65536)
}
return out
}
WARNING: the function requires LANG=C
It also doesn't check that the input is a valid base64 string; for that you can add a simple condition like:
match( str, "^([a-zA-Z/-9+]{4})*([a-zA-Z/-9+]{2}[a-zA-Z/-9+=]{2})?$" )
Interestingly, the code is 2x faster than base64decode.awk, but it's only 3x faster than forking the base64 binary from inside awk.
notes:
In a base64-encoded string, 4 characters represent 3 bytes of data; the input has to be processed in groups of 4 characters.
Multiplying and dividing an integer by a power of two is equivalent to bitwise left and right shift operations.
262144 is 2^18, so N * 262144 is equivalent to N << 18
4096 is 2^12, so N * 4096 is equivalent to N << 12
64 is 2^6, so N * 64 is equivalent to N << 6
65536 is 2^16, so N / 65536 (integer division) is equivalent to N >> 16
256 is 2^8, so N / 256 (integer division) is equivalent to N >> 8
What happens in printf "%c", N:
N is first converted to an integer (if need be) and then, WITH LANG=C, the 8 least significant bits are taken in for the %c formatting.
How the possible padding of one or two trailing = characters at the end of the encoded string is handled:
If the 4th char isn't = (i.e. there's no padding) then the result should be 3 bytes of data.
If the 4th char is = and the 3rd char isn't = then there are 2 bytes of data to decode.
If the fourth char is = and the third char is = then there's only one byte of data.
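The notes above can be checked with a short Python sketch. It is not a translation of the awk function, just the same arithmetic written with real shift operators (the name `decode_groups` is made up for illustration), and it assumes the input length is a multiple of 4:

```python
import base64

ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"
INDEX = {ch: i for i, ch in enumerate(ALPHABET)}


def decode_groups(encoded):
    """Decode base64 text group by group, honouring '=' padding."""
    out = bytearray()
    for i in range(0, len(encoded), 4):
        c1, c2, c3, c4 = encoded[i:i + 4]
        # Four 6-bit values packed into one 24-bit integer.
        bits = ((INDEX.get(c1, 0) << 18) | (INDEX.get(c2, 0) << 12)
                | (INDEX.get(c3, 0) << 6) | INDEX.get(c4, 0))
        triple = bytes([(bits >> 16) & 0xFF, (bits >> 8) & 0xFF, bits & 0xFF])
        if c4 != "=":
            out += triple          # no padding: 3 bytes
        elif c3 != "=":
            out += triple[:2]      # one '=': 2 bytes
        else:
            out += triple[:1]      # two '=': 1 byte
    return bytes(out)


if __name__ == "__main__":
    for text in ("SmFuZQ==", "amRvZQ==", "RE9F"):
        # Compare against the standard library decoder.
        print(text, decode_groups(text), base64.b64decode(text))
```

The `& 0xFF` masks play the role that `%c` plays in awk under LANG=C, where only the 8 least significant bits are kept.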

Adam optimizer in MatConvNet

I tried using Adam instead of the default SGD optimizer by changing the following code in cnn_train from:
opts.solver = [] ; % Empty array means use the default SGD solver
[opts, varargin] = vl_argparse(opts, varargin) ;
if ~isempty(opts.solver)
assert(isa(opts.solver, 'function_handle') && nargout(opts.solver) == 2,...
'Invalid solver; expected a function handle with two outputs.') ;
% Call without input arguments, to get default options
opts.solverOpts = opts.solver() ;
end
to:
opts.solver = 'adam';
[opts, varargin] = vl_argparse(opts, varargin) ;
opts.solverOpts = opts.solver() ;
However, I get an error:
Insufficient number of outputs from right hand side of equal sign to satisfy assignment.
Error in cnn_train>accumulateGradients (line 508)
params.solver(net.layers{l}.weights{j}, state.solverState{l}{j}, ...
Have any of you tried switching from the default solver? What else should I change in cnn_train?
The code for Adam function:
function [w, state] = adam(w, state, grad, opts, lr)
%ADAM
% Adam solver for use with CNN_TRAIN and CNN_TRAIN_DAG
%
% See [Kingma et. al., 2014](http://arxiv.org/abs/1412.6980)
% | ([pdf](http://arxiv.org/pdf/1412.6980.pdf)).
%
% If called without any input argument, returns the default options
% structure. Otherwise provide all input arguments.
%
% W is the vector/matrix/tensor of parameters. It can be single/double
% precision and can be a `gpuArray`.
%
% STATE is as defined below and so are supported OPTS.
%
% GRAD is the gradient of the objective w.r.t W
%
% LR is the learning rate, referred to as \alpha by Algorithm 1 in
% [Kingma et. al., 2014].
%
% Solver options: (opts.train.solverOpts)
%
% `beta1`:: 0.9
% Decay for 1st moment vector. See algorithm 1 in [Kingma et.al. 2014]
%
% `beta2`:: 0.999
% Decay for 2nd moment vector
%
% `eps`:: 1e-8
% Additive offset when dividing by state.v
%
% The state is initialized as 0 (number) to start with. The first call to
% this function will initialize it with the default state consisting of
%
% `m`:: 0
% First moment vector
%
% `v`:: 0
% Second moment vector
%
% `t`:: 0
% Global iteration number across epochs
%
% This implementation borrowed from torch optim.adam
% Copyright (C) 2016 Aravindh Mahendran.
% All rights reserved.
%
% This file is part of the VLFeat library and is made available under
% the terms of the BSD license (see the COPYING file).
if nargin == 0 % Returns the default solver options
w = struct('beta1', 0.9, 'beta2', 0.999, 'eps', 1e-8) ;
return ;
end
if isequal(state, 0) % start off with state = 0 so as to get default state
state = struct('m', 0, 'v', 0, 't', 0);
end
% update first moment vector `m`
state.m = opts.beta1 * state.m + (1 - opts.beta1) * grad ;
% update second moment vector `v`
state.v = opts.beta2 * state.v + (1 - opts.beta2) * grad.^2 ;
% update the time step
state.t = state.t + 1 ;
% This implicitly corrects for biased estimates of first and second moment
% vectors
lr_t = lr * (((1 - opts.beta2^state.t)^0.5) / (1 - opts.beta1^state.t)) ;
% Update `w`
w = w - lr_t * state.m ./ (state.v.^0.5 + opts.eps) ;

"Insufficient number of outputs from right hand side of equal sign to satisfy assignment."
It seems your number of outputs doesn't match what's required by cnn_train.
Could you show your adam function?
In the latest version of MatConvNet
[net.layers{l}.weights{j}, state.solverState{l}{j}] = ...
params.solver(net.layers{l}.weights{j}, state.solverState{l}{j}, ...
parDer, params.solverOpts, thisLR) ;
It seems to match your adam function
Why don't you try this:
opts.solver = @adam;
instead of opts.solver = 'adam';
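For reference, the update that the adam function quoted in the question performs can be written as a small Python sketch for a single scalar weight. This is only an illustration of the arithmetic (the name `adam_step` and the dictionary-based state are assumptions, not MatConvNet API); like the MATLAB function, it returns two outputs, the new weight and the new state, which is what the caller in cnn_train expects from the solver handle:

```python
import math


def adam_step(w, state, grad, lr, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar weight; returns (new_w, new_state)."""
    if state is None:  # plays the role of `isequal(state, 0)` in the MATLAB code
        state = {"m": 0.0, "v": 0.0, "t": 0}
    m = beta1 * state["m"] + (1 - beta1) * grad        # first moment
    v = beta2 * state["v"] + (1 - beta2) * grad ** 2   # second moment
    t = state["t"] + 1                                 # time step
    # Learning rate with the bias correction folded in.
    lr_t = lr * math.sqrt(1 - beta2 ** t) / (1 - beta1 ** t)
    new_w = w - lr_t * m / (math.sqrt(v) + eps)
    return new_w, {"m": m, "v": v, "t": t}


if __name__ == "__main__":
    w, state = 1.0, None
    for _ in range(3):
        w, state = adam_step(w, state, grad=2.0, lr=0.1)
        print(w, state["t"])
```

On the first step the bias correction makes the update size approximately equal to the learning rate, regardless of the gradient's magnitude.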

Split large PDF image into multiple pages using pstools

For a rather large-scale project we need to render call graphs using the tool egypt, which creates Graphviz input files. Currently, the command line:
dot -T pdf -Granksep=1.42 -Nfontsize=8 input.dot -o output.pdf
renders a single page pdf image of 37803x2078 pts which is a bit larger than what will fit on a single A3 page and still be readable.
I already know of the tool Poster Printer, but unfortunately, in this restrictive environment, all I have at hand are graphviz, ghostscript and the other common ps/pdf command line programs.
I've tried setting the 'page' directive in the dot file, but that gave me a 200 page postscript file with no obvious page order so that option is currently out.
What I'd prefer is a gs command line where I split my huge ps/pdf file into a pre-defined number of pages, say 5x2, or a way to control the scaling of the output from graphviz to fit on 5x2 pages.

You can try to use both the size and page attributes, but you first need to find out the computed size of your graph to have an idea of the ratio between width and height. You can find it at the beginning of the .dot file.
In an example of mine, I have for instance:
graph [bb="0,0,18866,1005"];
which means roughly 18:1.
Because I use A4 paper, which is about 8.27 x 11.69 inches, I have set the page size to 8 x 11.
And because of the image size, I decided to print on 20 x 1 pages, which gives me:
dot -Gpage="8,11" -Gsize="160,10" -Tps graph.dot > graph.ps
There is also a ratio attribute that you could look at, but I've never used it.
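The numbers above can be sanity-checked with a little arithmetic. The following Python sketch is only an estimate (the function `fit_to_pages` is made up for illustration); it assumes the bounding box is in points at 72 per inch and that size scales the drawing down uniformly until it fits, and it ignores any page margins Graphviz may add:

```python
import math

POINTS_PER_INCH = 72.0


def fit_to_pages(bb_width_pt, bb_height_pt, size_in, page_in):
    """Estimate the scale, scaled size (inches) and page grid for a graph."""
    width_in = bb_width_pt / POINTS_PER_INCH
    height_in = bb_height_pt / POINTS_PER_INCH
    # Uniform scale so that both dimensions fit inside `size_in`; never enlarge.
    scale = min(size_in[0] / width_in, size_in[1] / height_in, 1.0)
    scaled = (width_in * scale, height_in * scale)
    # Round before ceil so floating-point noise does not add a spurious page.
    pages = (math.ceil(round(scaled[0] / page_in[0], 6)),
             math.ceil(round(scaled[1] / page_in[1], 6)))
    return scale, scaled, pages


if __name__ == "__main__":
    scale, scaled, pages = fit_to_pages(18866, 1005, (160, 10), (8, 11))
    print(scale, scaled, pages)
```

With the bounding box from the example, the width is the limiting dimension: the graph ends up 160 inches wide and roughly 8.5 inches tall, which is 20 pages across and 1 page down at 8 x 11.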

Some time ago I wrote this for someone who wanted to slice up 2x2 PowerPoint style slides. I have no idea if it still works but you can try it. Note this only works with Ghostscript.
%!PS
% Copyright (C) 2011 Artifex Software, Inc. All rights reserved.
%
% This software is provided AS-IS with no warranty, either express or
% implied.
%
% This software is distributed under license and may not be copied,
% modified or distributed except as expressly authorized under the terms
% of the license contained in the file LICENSE in this distribution.
%
% For more information about licensing, please refer to
% http://www.ghostscript.com/licensing/. For information on
% commercial licensing, go to http://www.artifex.com/licensing/ or
% contact Artifex Software, Inc., 101 Lucas Valley Road #110,
% San Rafael, CA 94903, U.S.A., +1(415)492-9861.
%
% Slice up a PDF file
%
% usage: gs -sFile=____.pdf -dSubPagesX= -dSubPagesY= [-dSubPageOrder=] [-dVerbose=] pdf_slice.ps
%
% SubPageOrder is a bit field;
% Default = 0
% Bit 0 - 0 = top to bottom
% 1 = bottom to top
% Bit 1 - 0 = left to right
% 1 = right to left
% Bit 2 - 0 = increase x then y
% 1 = increase y then x
%
% 0 - page 1 at top left, increasing left to right, top to bottom
% 1 - page 1 at bottom left increasing left to right, bottom to top
% 2 - page 1 at top right, increasing right to left, top to bottom
% 3 - page 1 at bottom right increasing right to left, bottom to top
% 4 - page 1 at top left, increasing top to bottom, left to right
% 5 - page 1 at bottom left increasing bottom to top, left to right
% 6 - page 1 at top right, increasing top to bottom, right to left
% 7 - page 1 at bottom right increasing bottom to top, right to left
%
% Check the parameters to see they are present and of the correct type
%
/Usage {
( usage: gs -dNODISPLAY -q -sFile=____.pdf \n) =
( -dSubPagesX= -dSubPagesY= [-dSubPageOrder=] pdf_slice.ps \n) =
(Please see comments in pdf_slice.ps for more details) =
flush
quit
} bind def
/Verbose where not {
/Verbose false def
}{
pop /Verbose true def
} ifelse
/File where not {
(\n *** Missing source file. \(use -sFile=____.pdf\)\n) =
Usage
} {
pop
}ifelse
/SubPagesX where not {
(\n *** SubPagesX not integer! \(use -dSubPagesX=\)\n) =
Usage
} {
Verbose { (SubPagesX ) print } if
SubPagesX type
Verbose { dup == } if
/integertype eq not {
(\n *** SubPagesX not integer! \(use -dSubPagesX=\)\n) =
Usage
} if
pop
}ifelse
/SubPagesY where not {
(\n *** SubPagesY not integer! \(use -dSubPagesY=\)\n) =
Usage
} {
Verbose { (SubPagesY ) print } if
SubPagesY type
Verbose { dup == } if
/integertype eq not {
(\n *** SubPagesY not integer! \(use -dSubPagesY=\)\n) =
Usage
} if
pop
}ifelse
/SubPageOrder where not {
/SubPageOrder 0 def
} {
Verbose { (SubPageOrder ) print } if
SubPageOrder type
Verbose { dup == } if
dup ==
/integertype eq not {
(\n *** SubPageOrder not integer! \(use -dSubPageOrder=\)\n) =
Usage
} if
pop
}ifelse
%
% Turns off most messages
%
/QUIET true def % in case they forgot
%() =
%
% Open the PDF file and tell the PDF interpreter to start dealing with it
%
File dup (r) file runpdfbegin pop
/PDFPageCount pdfpagecount def
%
% Set up our bookkeeping
%
% First get the size of the page from page 1 of the PDF file
% We assume that all PDF pages are the same size.
%
1 pdfgetpage currentpagedevice
1 index get_any_box
exch pop dup 2 get exch 3 get
/PDFHeight exch def
/PDFWidth exch def
%
% Now get the page size of the current device. We are assuming that
% this is the size of the individual sub-pages in the original PDF. NB
% This assumes no margins between sub-pages, all sub-pages the same size.
%
currentpagedevice /PageSize get
dup 0 get /SubPageWidth exch def
1 get /SubPageHeight exch def
%
% Calculate the margins. This is the margin between the page border and
% the enclosed group of sub-pages, we assume there are no borders
% between sub pages.
%
/TopMargin PDFHeight SubPageHeight SubPagesY mul sub 2 div def
/LeftMargin PDFWidth SubPageWidth SubPagesX mul sub 2 div def
Verbose {
(PDFHeight = ) print PDFHeight ==
(PDFWidth = ) print PDFWidth ==
(SubPageHeight = ) print SubPageHeight ==
(SubPageWidth = ) print SubPageWidth ==
(TopMargin = ) print TopMargin ==
(LeftMargin = ) print LeftMargin ==
} if
%
% This routine calculates and sets the PageOffset in the page device
% dictionary for each subpage, so that the PDF page is 'moved' in such
% a way that the required sub page is under the 'window' which is the current
% page being imaged.
%
/NextPage {
SubPageOrder 2 mod 0 eq {
/H SubPagesY SubPageY sub SubPageHeight mul TopMargin add def
}{
/H SubPageY 1 sub SubPageHeight mul TopMargin add def
} ifelse
SubPageOrder 2 div floor cvi 2 mod 0 eq {
/W SubPageX 1 sub SubPageWidth mul LeftMargin add def
}{
/W SubPagesX SubPageX sub SubPageWidth mul LeftMargin add def
} ifelse
<< /PageOffset [W neg H neg]>> setpagedevice
Verbose {
(SubPageX ) print SubPageX ==
(SubPageY ) print SubPageY ==
(X Offset ) print W ==
(Y Offset ) print H == flush
} if
PDFPage
} bind def
%
% The main loop
% For every page in the original PDF file
%
1 1 PDFPageCount
{
/PDFPage exch def
% Do the gross ordering here rather than in
% NextPage. We either process rows and then
% columns, or columns then rows, depending on
% Bit 2 of SubPageOrder
SubPageOrder 3 le {
1 1 SubPagesY {
/SubPageY exch def
1 1 SubPagesX {
/SubPageX exch def
NextPage
pdfgetpage
pdfshowpage
} for
} for
} {
1 1 SubPagesX {
/SubPageX exch def
1 1 SubPagesY {
/SubPageY exch def
NextPage
pdfgetpage
pdfshowpage
} for
} for
} ifelse
} for

Do you have to use pstools? Why not use pdftk instead? You can split and compress a PDF file easily using this command when using PDFTK
pdftk.exe Paper.pdf burst output Paper_%1d.pdf compress

Why is this postscript calculator (type 4) shading so slow to render at high zoom?

In an effort to produce a smooth gradient to specifications, I have tried my hand at using type 4 (postscript calculator) shading, so that I can write the function that specifies the color at each point. Here is the function I produced, which accepts two real numbers (x and y coordinates on [0,1] x [0,1]) and returns three real numbers (the r, g, b components of the color):
2 copy 0.25 sub exch 0.25 sub exch dup mul exch dup mul add dup .0001 le {pop 10000.0} {1.0 exch div} ifelse
3 1 roll 2 copy 0.75 sub exch 0.75 sub exch dup mul exch dup mul add dup .0001 le {pop 10000.0} {1.0 exch div} ifelse
3 1 roll 2 copy 0.75 sub exch 0.25 sub exch dup mul exch dup mul add dup .0001 le {pop 10000.0} {1.0 exch div} ifelse
3 1 roll 0.25 sub exch 0.75 sub exch dup mul exch dup mul add dup .0001 le {pop 10000.0} {1.0 exch div} ifelse
4 copy 0.0 add add add add 1.0 exch div
dup 3 1 roll mul 5 1 roll
dup 3 1 roll mul 5 1 roll
dup 3 1 roll mul 5 1 roll
dup 3 1 roll mul 5 1 roll pop
4 copy 0.0 exch 0 mul add exch 1 mul add exch 0 mul add exch 1 mul add 5 1 roll
4 copy 0.0 exch 0 mul add exch 1 mul add exch 1 mul add exch 0 mul add 5 1 roll
0.0 exch 1 mul add exch 0 mul add exch 0 mul add exch 0 mul add
Here is the Asymptote code that produced the string above as well as the actual pdf file:
// input: a nonnegative real number r^2 (the square of the distance)
// output: min(1/r^2, 10000.0)
string ps_weight_rsquared = ' dup .0001 le {pop 10000.0} {1.0 exch div} ifelse';
// input: x and y coordinates of a vector
// output: x^2 + y^2
string ps_distsquared = ' dup mul exch dup mul add';
//input: x and y coordinates
//output: the weight at (x,y)
string ps_weight_displacement = ps_distsquared + ps_weight_rsquared;
//input: x, y
//output: weight at the vector ((x,y) - point)
string ps_naiveWeight_pair(pair point) {
// compute displacement:
string toreturn = ' ' + (string)point.y + ' sub exch ' + (string)point.x + ' sub exch' ;
// compute weight from displacement:
return toreturn + ps_weight_displacement;
}
/* The string will be an postscript calculator formula that accepts
* a pair and returns a list of naive weights, with the deepest weight
* on the stack corresponding to points[0].
*/
string ps_naiveWeights_pair(pair[] points) {
string toreturn = '';
for (int i = 0; i < points.length; ++i) {
if (i < points.length - 1)
toreturn += ' 2 copy';
toreturn += ps_naiveWeight_pair(points[i]);
if (i < points.length - 1)
toreturn += ' 3 1 roll';
}
return toreturn;
}
// input: x,y
// output: the weights of all the displacement vectors ((x,y) - points[i]), normalized so that their sum is one
string ps_partitionWeights_pair(pair[] points) {
string toreturn = ps_naiveWeights_pair(points);
// compute the sum of the all the naive weights:
toreturn += ' ' + (string)points.length + ' copy 0.0';
for (int i = 0; i < points.length; ++i)
toreturn += ' add';
// take the reciprocal of the sum:
toreturn += ' 1.0 exch div';
for (int i = 1; i <= points.length; ++i) {
// multiply a weight by the sum reciprocal and roll the new weight to the back:
toreturn += ' dup 3 1 roll mul ' + (string)(1+points.length) + ' 1 roll';
}
//discard the sum reciprocal, which is no longer needed:
toreturn += ' pop';
return toreturn;
}
// Assumes the weights are already on the stack, with the deepest weight
// corresponding to summands[0].
string ps_weighted_sum(real[] summands) {
// At each step, the top element of the stack should be the sum so far:
string toreturn = ' 0.0';
while(summands.length > 0) {
toreturn += ' exch ' + (string)(summands.pop()) + ' mul add';
}
return toreturn;
}
// input: real numbers x, y
// output: shading function based on a weighted sum of the colors, with the weight of the color of point p equal to 1/(dist to p)^2 (and the weights normalized to have sum one)
string ps_interpolate_shade(path g, pair[] points, pen[] pointcolors) {
pair min = min(g);
pair max = max(g);
real[] reds, greens, blues;
for (pen thecolor : pointcolors) {
real[] thecolors = colors(rgb(thecolor));
reds.push(thecolors[0]);
greens.push(thecolors[1]);
blues.push(thecolors[2]);
}
transform t = scale(1/(max.x - min.x), 1/(max.y - min.y)) * shift(-min);
points = t * points;
string toreturn = ps_partitionWeights_pair(points);
toreturn += ' ' + (string)points.length + ' copy';
toreturn += ps_weighted_sum(reds);
toreturn += ' ' + (string)(points.length + 1) + ' 1 roll';
toreturn += ' ' + (string)points.length + ' copy';
toreturn += ps_weighted_sum(greens);
toreturn += ' ' + (string)(points.length + 1) + ' 1 roll';
toreturn += ps_weighted_sum(blues);
return toreturn;
}
void applyInterpolateShade(path g, pair[] points, pen[] pointcolors) {
string shader = ps_interpolate_shade(g, points, pointcolors);
write(shader); //output the ps string to the terminal
functionshade(g, fillrule=rgb(zerowinding), shader=shader);
}
/********************************************/
settings.tex = "pdflatex";
size(5cm);
applyInterpolateShade(unitcircle, new pair[] {(-.5,-.5), (.5,.5), (-.5,.5), (.5,-.5)}, new pen[] {red, green, yellow, blue});
And here is the output, converted to a png file:
It's pretty much what I had in mind.
The problem: If I open the pdf file (using either Apple Preview or Adobe Reader) and zoom in, the rendering program slows to a crawl and (according to Activity Monitor) uses 100% of the CPU (from one core; fortunately I have other cores, so other applications keep responding). Am I doing something in the postscript function that is too computationally intensive? If so, have I introduced bugs or bad coding practices (memory leakage, too many rolls,...) or is it simply an inevitable consequence of the algorithm I am using (e.g., can the renderer not handle five divisions per pixel)?
Either way, why does this only show up when I zoom in? Is the renderer trying to render the whole zoomed-in image internally in case I scroll around?

You don't say which pdf viewer you are using, but different viewers will be optimised very differently.
Shadings are designed to be interpolated, i.e. selected coordinates within the shading should be evaluated using your PS evaluator function. The vast majority of the pixels between these should be linearly interpolated. The selection of the evaluated coordinates depends on the current smoothness. In PDF that is selected using the SM entry of an ExtGState dictionary. The shading area will be decomposed until small regions are detected as being "smooth" relative to the SM value. You could try changing SM; the default in Acrobat is 0.02, but YMMV.
If your shading is taking a long time, a few things could be happening. The function could be highly non-linear; exponential functions and functions with a sharp edge can prevent the detection of linearity until the regions become very small, possibly as small as 1 pixel. Alternatively, your pdf viewer just isn't optimised for shading. Or quite possibly both of these. FWIW, I can't say if this PS calculator function is a bad fit for decomposition because I can't tell what it's doing.
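For anyone else who finds the stack code hard to read, here is a Python sketch of what the calculator function in the question appears to compute, worked out from the Asymptote generator: an inverse-square-distance weighted average of four colours. The point coordinates and colours are read off the generated PostScript string; the function name `shade` is made up for illustration:

```python
# Control points in the unit square and their colours (red, green, yellow, blue).
POINTS = [(0.25, 0.25), (0.75, 0.75), (0.25, 0.75), (0.75, 0.25)]
COLORS = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0), (0.0, 0.0, 1.0)]


def shade(x, y):
    """Return (r, g, b) at (x, y) as a normalised 1/r^2 weighted sum."""
    weights = []
    for px, py in POINTS:
        d2 = (x - px) ** 2 + (y - py) ** 2
        # Same clamp as in the PostScript: min(1/r^2, 10000).
        weights.append(10000.0 if d2 <= 0.0001 else 1.0 / d2)
    total = sum(weights)
    return tuple(
        sum(w / total * color[k] for w, color in zip(weights, COLORS))
        for k in range(3)
    )


if __name__ == "__main__":
    print(shade(0.5, 0.5))    # centre: all four weights are equal
    print(shade(0.25, 0.25))  # on the red point: almost pure red
```

The weights rise steeply near each control point, so the function is far from linear there. That would be consistent with the answer's suggestion that the renderer may have to subdivide down to very small regions, though this sketch cannot confirm what any particular viewer actually does.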

How can I programmatically generate venn diagram images with labels on top of the image?

I'm trying to generate Venn diagrams for a pdf report, with text on top of the distinct regions.
We're using htmldoc to generate pdfs, which precludes text on top of background images.
We use the google charts api for other images, but their Venn diagrams don't support text on top of the diagram (from what I can tell).
The easiest path would be to generate an image of the venn diagram on our server using a 3rd party library and then link the image into the document, but I don't know of any software package that would support our use case.
Any links/pointers would be appreciated.

Here's some example code. This seems like a decent tutorial:
http://paulbourke.net/dataformats/postscript/
If you're on Linux, you can use the gv command to view it. There are various utilities to convert it to PDF too; ps2pdf on Linux, and I think Acrobat Distiller on Windows.
%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 144 144
% CenterText - paint text centered on x with baseline on y
% x y s CenterText
/CenterText
{
<< >> begin
/s exch def /y exch def /x exch def
newpath x s stringwidth pop 2 div sub y moveto s show
end
} bind def
2 setlinewidth
54 72 36 0 360 arc stroke
90 72 36 0 360 arc stroke
/Helvetica 10 selectfont
36 72 (A) CenterText
108 72 (B) CenterText
72 72 (A^B) CenterText
Here's the three-circle one. It works but I don't vouch for the quality of the coding, I haven't done any serious PS code in years.
%!PS-Adobe-3.0 EPSF-3.0
%%BoundingBox: 0 0 216 216
% CenterText - paint text centered on x with baseline on y
% x y s CenterText
/CenterText
{
<< >> begin
/s exch def /y exch def /x exch def
newpath x s stringwidth pop 2 div sub y moveto s show
end
} bind def
% Move the origin to the center of the bounding box and rotate 180 degrees
108 108 translate
gsave
180 rotate
% Draw 3 circles at 120-degree intervals
/ct 3 def
/offset 36 def
/radius 60 def
0 1 ct 1 sub % for
{
gsave
360 mul ct div rotate
0 offset translate
0 0 radius 0 360 arc stroke
grestore
} for
grestore
/Helvetica 10 selectfont
-54 36 (A) CenterText
54 36 (B) CenterText
0 -72 (C) CenterText
0 36 (A^B) CenterText
-36 -24 (A^C) CenterText
36 -24 (B^C) CenterText
0 -6 (A^B^C) CenterText

Here's a two-cell diagram in pic. I found ellipses easier to squeeze the text into than circles.
.PS
ellipse
"A" at 1st ellipse - (.2, 0)
ellipse with .w at 1st ellipse.e - (.4, 0)
"B" at 2nd ellipse + (.2, 0)
"A^B" at 1st ellipse.e - (.2, 0)
.PE
And a three-cell diagram:
.PS
ellipsewid = 1
ellipseht = .75
ellipse
ellipse at 1st ellipse + (.5, 0)
ellipse at 1st ellipse + (.25, .35)
"A" at 1st ellipse - (.2, .1)
"B" at 2nd ellipse + (.2, -.1)
"C" at 3rd ellipse + (0, .1)
"A^B" at 3rd ellipse - (0, .5)
"A^C" at 3rd ellipse - (.3, .1)
"B^C" at 3rd ellipse + (.3, -.1)
"A^B^C" at 3rd ellipse - (0, .25)
.PE
Convert to ps: groff -p ven.pic > ven.ps.
I haven't found a nifty way to produce the .eps, yet. Stay tuned! Edit: sudo apt-get install ps2eps!
Edit:
It's much easier to construct everything relative to the compass-points on a central invisible box.
Two-cell:
.PS
box invis "A^B"
ellipse wid 1st box.wid*1.5 at 1st box.w + (.1, 0)
ellipse wid 1st box.wid*1.5 at 1st box.e - (.1, 0)
"A " at 2nd ellipse.w rjust
" B" at 1st ellipse.e ljust
.PE
Three-cell:
.PS
box invis "A^B^C" below wid .5 ht .3
ellipse at 1st box.sw
ellipse at 1st box.se
ellipse at 1st box.n
"A " at 2nd ellipse.w rjust below
" B" at 1st ellipse.e ljust below
"C" "" "" at 3rd ellipse above
"A^B" at 3rd ellipse.s below
"A^C " at 2nd ellipse.nw rjust
" B^C" at 1st ellipse.ne ljust
.PE
Still requires tweaking, though. But there are far fewer numbers! The width and height of the box define an isosceles triangle used for placing the centers of the ellipses.
Edit:
This last idea suggests a method for making a four-cell diagram. I had to shrink the font for the wedges.
.PS
box invis "A^B^C^D" wid .65 ht .5
ellipsewid = 2
ellipseht = 1.25
ellipse at 1st box.ne
ellipse at 1st box.se
ellipse at 1st box.sw
ellipse at 1st box.nw
"A" at 1st box.ne + (.4, .4)
"B" at 1st box.se + (.4, -.4)
"C" at 1st box.sw - (.4, .4)
"D" at 1st box.nw - (.4, -.4)
"A^B" at 1st box.e + (.4, 0) ljust
"B^C" at 1st box.s - (0, .2) below
"C^D" at 1st box.w - (.4, 0) rjust
"A^D" at 1st box.n + (0, .2) above
"\s-1A^B^D\s+1" at 1st box.ne + (.15, .03)
"\s-1A^B^C\s+1" at 1st box.se + (.15, -.03)
"\s-1B^C^D\s+1" at 1st box.sw - (.15, .03)
"\s-1A^C^D\s+1" at 1st box.nw - (.15, -.03)
.PE
Here's a jpg of the output. I might have lost some resolution when cropping to the box.

Having gone as far as is practical with pic, postscript really is the natural choice for this.
Alright, I haven't solved the labelling yet, but here's the generalized diagram. Turns out you just place the centers on the vertices of the regular polygon for that n.
But some of those spaces get reeeally small. So I'm thinking about some pattern of labelled arcs, spiralling out. Perhaps the radius of the label should reflect the depth of the designated partition...
Edit: I've redesigned the code, so there's a pretty 15-diagram page in revision 1.
Edit: I just got schooled by Wikipedia. It turns out that what I've been calling a 4-cell Venn diagram is not, in fact, a Venn diagram at all.
It's an Euler diagram. The problem is that nowhere can you get the intersection of two regions alone from opposite sides of the diagram. The real 4-cell diagram gets weird no matter how you do it. So the scope of the answer is reduced from what I've pursued in the last two edits.
For the 2-circle diagram, the best placement I can find is defined by the intersection of the radii from the diagram center through the circle centers to the edges, with defining circles placed on the circle centers.
For the 3-circle diagram, the best placement I can find is defined by the intersections of the radii (and rotated radii) with rotated triangle approximations to the circles and unrotated triangles, respectively.
A version of the code can be found in the previous revision of this answer. I posted an expanded version to usenet in the thread geodesic flowers. But since it's overkill for this answer (and still doesn't actually draw any labels or return their locations), and underkill for real generalized Venn diagrams, I'll need to trim most of the baggage before subjecting this question to any more long blocks of code.
Edit: I think I've got this just about licked. This program contains only those parts of the previous program necessary to produce 2- and 3- Venn diagrams with little circles at the "ideal" label locations. For the 2-cell diagram the solution really is trivial (double the defining radius). For the 3-cell diagram the solution is cos(60) * circle-radius + defining radius, either multiplying first or adding first.
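As a cross-check of those formulas, here is a small Python sketch that reproduces the arithmetic outside PostScript. The function names (`polygon`, `circle_radius`, `label_points`) are made up for this illustration; the geometry follows the description above, with circle centers on a regular polygon of "defining radius" `vrad` and each circle sized to pass through the opposite polygon vertex.

```python
import math


def polygon(radius, n, offset_deg=0.0):
    """Vertices of a regular n-gon around the origin, first vertex at offset_deg."""
    pts = []
    for k in range(n):
        angle = math.radians(offset_deg + 360.0 * k / n)
        pts.append((radius * math.cos(angle), radius * math.sin(angle)))
    return pts


def circle_radius(vrad, n):
    """Radius that makes each circle reach the 'opposite' vertex of the polygon."""
    pts = polygon(vrad, n)
    return math.dist(pts[0], pts[n // 2])


def label_points(vrad, n):
    """Label positions relative to the diagram center (2- and 3-cell only)."""
    crad = circle_radius(vrad, n)
    if n == 2:
        # Trivial case: double the defining radius.
        return {"single": polygon(2 * vrad, n)}
    if n == 3:
        c60 = math.cos(math.radians(60))
        return {
            # Multiply first, then add: labels for A, B, C.
            "single": polygon(vrad + crad * c60, n),
            # Add first, then multiply, rotated by 180/n degrees: pairwise labels.
            "pair": polygon((vrad + crad) * c60, n, offset_deg=180.0 / n),
        }
    raise ValueError("only 2- and 3-cell diagrams are covered")


if __name__ == "__main__":
    for n in (2, 3):
        print(n, label_points(72.0, n))  # 72 points = 1 inch
```

The points come back relative to the diagram center, so a caller would still translate them to wherever the diagram sits on the page.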
Edit: At long last, labels. There was some last-minute trickiness required since I used matrix rotations to find the points. That meant that when I tried printing labels, they were all at strange orientations. So the "centershow" procedure has a little more to it than usual. It has to reset the scaling portions of the current transformation matrix while leaving the translation components alone. That means somewhere earlier in the execution we need to stash an oriented matrix at the correct scale.
(Edit: Another way to get the text upright without modifying a matrix would be to transform the location to device coordinates, install the oriented matrix (at any scale or translation!), itransform the point back to the "new" user coordinates, and then moveto.)
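Both ways of getting upright text can be modelled with plain six-element matrices `[a b c d tx ty]`, the layout PostScript uses. The Python sketch below is only a model of that idea: `transform`, `itransform` and `upright_matrix` are hand-written stand-ins, not calls into any PostScript interpreter, and the sample numbers are arbitrary.

```python
import math


def transform(m, x, y):
    """Map a user-space point to device space with matrix [a b c d tx ty]."""
    a, b, c, d, tx, ty = m
    return (a * x + c * y + tx, b * x + d * y + ty)


def itransform(m, x, y):
    """Map a device-space point back to user space (inverse of transform)."""
    a, b, c, d, tx, ty = m
    det = a * d - b * c
    u, v = x - tx, y - ty
    return ((d * u - c * v) / det, (a * v - b * u) / det)


def upright_matrix(current, reference):
    """Take the rotation/scale part from reference and keep current's translation."""
    return list(reference[:4]) + list(current[4:6])


if __name__ == "__main__":
    angle = math.radians(60)
    # A rotated matrix whose origin sits on the label point.
    rotated = [math.cos(angle), math.sin(angle),
               -math.sin(angle), math.cos(angle), 306.0, 252.0]
    normal = [1.0, 0.0, 0.0, 1.0, 0.0, 0.0]  # stashed upright matrix

    # Method 1: overwrite the first four entries, keep the translation.
    fixed = upright_matrix(rotated, normal)
    print(transform(fixed, 0.0, 0.0))

    # Method 2: go through device space and come back under the upright matrix.
    device = transform(rotated, 0.0, 0.0)
    print(itransform(normal, *device))
```

Either way the label's anchor lands on the same device position, and only the orientation used for drawing the text changes.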
%!
%cp:xy rad circ -
/circ {
currentpoint newpath
2 copy 5 -1 roll 0 360 arc stroke
moveto
} def
%rad n poly [pointlist]
/poly {
1 dict begin exch /prad exch def
[ exch
0 exch 360 exch div 359.9 {
[ exch
dup cos prad mul exch
sin prad mul
]
} for
]
end
} def
%[list] rad subcirc -
/subcirc {
1 dict begin /crad exch def gsave
currentpoint translate
{ aload pop moveto crad circ } forall
grestore end
} def
%[list] locate -
%draw little circles around each point
/locate {
gsave
currentpoint translate
0 0 moveto 5 circ
{ aload pop moveto 5 circ } forall
grestore
} def
%cp:xy (string) cshow -
/cshow {
gsave
currentpoint translate %0 0 moveto
matrix currentmatrix
dup 0 normal 0 4 getinterval %reset rotation, keep translation
putinterval setmatrix
dup true charpath flattenpath pathbbox
3 -1 roll sub 3 1 roll sub
2 div exch -2 div moveto show
grestore
} def
%[list] [labels] label -
%print label text centered on each point
/label {
gsave
currentpoint translate
0 1 3 index length 1 sub {
2 index 1 index get aload pop moveto
2 copy get cshow pop
} for
pop pop
grestore
} def
%[x0 y0] [x1 y1] pyth-dist radius
/pyth-dist {
aload pop 3 -1 roll aload pop % x1 y1 x0 y0
exch % x1 y1 y0 x0
3 1 roll sub dup mul % x1 x0 dy^2
3 1 roll sub dup mul % dy^2 dx^2
add sqrt
} def
/rotw { 180 n div rotate } def
%cp:xy rad n venn -
%make the circles intersect the opposite point of def poly
/venn {
3 dict begin /n exch def /vrad exch def
vrad n poly
dup 0 get exch
dup length 2 idiv get
pyth-dist /crad exch def
%vrad crad n ven
vrad n poly crad subcirc %the Venn circles
[[0 0]] [(All)] label
n 2 eq {
%vrad 2 mul n poly locate
vrad 2 mul n poly
[(A) (B)] label
}{
n 3 eq {
%vrad crad 60 cos mul add n poly locate
vrad crad 60 cos mul add n poly
[ (A) (B) (C) ] label
%gsave rotw vrad crad add 60 cos mul n poly locate grestore
gsave rotw vrad crad add 60 cos mul n poly
[ (A^B) (B^C) (A^C) ] label
grestore
} if
} ifelse
end
} def
/normal matrix currentmatrix def
/in{72 mul}def
/Palatino-Roman 20 selectfont
4.25 in 8.25 in moveto
1 in 2 venn
4.25 in 3.5 in moveto
1 in 3 venn
showpage
And ghostscript produces (gs -sDEVICE=jpeggray -sOutputFile=venlabel.jpg v4.ps):

Why not just use LaTeX?
Much simpler than manually writing up ps:
\tikz \fill[even odd rule] (0,0) circle (1) (1,0) circle (1);
