I want to put two landscape A5 pages (.ps or .pdf) on one portrait A4 page (.ps or .pdf)

I created a document in A5 size and managed to reshuffle the pages of the produced .pdf output with psbook, so that the pages have the perfect order for a booklet.
There are lots of hints that the next step would work with psnup, but that's not true. I also tried a2ps and pstops with various options. The last thing I found was bookletimposer (Ubuntu), but it failed as well.
It seems so easy, because no scaling and no rotation is involved: just put one page at position (0, 0) and the following one at (0, 14.85 cm), half the height of the A4 page.
input:
+----------+
| this is  |
| page one |
+----------+
+----------+
| this is  |
| page two |
+----------+
output:
+----------+
| this is  |
| page one |
|          |
| this is  |
| page two |
+----------+

Assuming you have a multipage PDF file, let's say consisting of 16 sequentially ordered pages:
1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16
and you already reordered the pages to have this new sequence:
16,1,2,15,14,3,4,13,12,5,6,11,10,7,8,9
and these pages are oriented in landscape mode
you can use, on this last file,
Multivalent.jar (last free version with pdf tools included)
https://rg.to/file/c6bd7f31bf8885bcaa69b50ffab7e355
with this syntax:
java -cp /path/to.../Multivalent.jar tool.pdf.Impose -dim 1x2 -paper A4 A5-already-shuffled_for_imposing.pdf
Since you already have the pages correctly shuffled for imposing:
16,1,2,15,14,3,4,13,12,5,6,11,10,7,8,9
Multivalent will pick
pages 16,1 ... and put them on the same A4 page (front 1)
pages 2,15 ... and put them on the same A4 page (back 1)
and so on, achieving the goal of creating a perfectly imposed booklet.
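If installing the old Multivalent jar is not an option, the same 1x2 imposition can be sketched with pypdf. This is not part of the original answer; the input/output file names and the A4 dimensions in points are assumptions, and the pages are assumed to be landscape A5 as described in the question, so no scaling or rotation is needed:

from pypdf import PdfReader, PdfWriter, Transformation, PageObject

A4_W, A4_H = 595.2756, 841.8898   # portrait A4 in points (210 x 297 mm)
HALF = A4_H / 2                   # 14.85 cm, half the A4 height

reader = PdfReader("A5-already-shuffled_for_imposing.pdf")
writer = PdfWriter()

# take the already shuffled pages two at a time and stack each pair
# on one portrait A4 page: first page of the pair on the top half,
# second page of the pair on the bottom half
for i in range(0, len(reader.pages), 2):
    sheet = PageObject.create_blank_page(width=A4_W, height=A4_H)
    sheet.merge_transformed_page(reader.pages[i], Transformation().translate(0, HALF))
    if i + 1 < len(reader.pages):
        sheet.merge_transformed_page(reader.pages[i + 1], Transformation().translate(0, 0))
    writer.add_page(sheet)

with open("A5-imposed-on-A4.pdf", "wb") as fp:
    writer.write(fp)

Depending on how your printer's duplexing flips the sheet, you may need to swap which page of each pair goes on the top half.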

pdfposter: crop / tile / posterize a long PDF into multiple pages from Safari Export as PDF

When I save a webpage with Safari's File > Export as PDF...,
Safari renders a long PDF in several (long) pages.
Here is a screenshot of Preview's Crop Inspector.
The 200 inch height appears to be a Distiller limit for PostScript, based on the Windows printer driver limitation.
Before saving, I set Safari > Develop > Show Responsive Design Mode
for my iPad mini with a resolution of 768 x 1024 (portrait).
The beauty of this feature (unlike File > Print) is that it can be used with Safari in Responsive Design Mode, so an exact snapshot of the webpage (responsive layout, images and even dark modes) gets exported to PDF, without any print margins and such.
Now I want to cut / tile / crop / posterize / de-impose (or whatever one should call it) these long [200 inch or 14400 pt] pages into more manageable page sizes.
So with Responsive Design Mode set to iPad mini (768 x 1024), I would like to cut to the same dimensions: a MediaBox / CropBox of 768 pt x 1024 pt.
I already tried various command line tools like BRISS, PDFTILECUT, PLAKATIV, MUPDF etc.
Some libraries, like the Python binding PYMUPDF, seem to convert the PDF to an image first to get it cut, thus losing all the hyperlinks = NO go.
Until now I get a decent result with PDFPOSTER using the following command line; I have set the height of the --poster-size box to something really long (100000pt):
pdfposter \
-v \
-m 768x1024pt \
-p 768x1000000pt \
Safari-Export-as-PDF-IN.pdf \
Safari-Export-as-PDF-OUT.pdf
That works for all the pages, one after the other, but I can't find a solution to set the Y coordinate of the first page to 0.
The pages always seem to start from the bottom of the poster size, leaving space at the top.
---------    =========
|       |    | xxxxx |
=========    | xxxxx |
| xxxxx |    | xxxxx |
---------    ---------
| xxxxx |    | xxxxx |
| xxxxx | -> | xxxxx |
| xxxxx |    | xxxxx |
---------    ---------
| xxxxx |    | xxxxx |
| xxxxx |    =========
| xxxxx |    |       |
=========    ---------
OK, with a lot of testing I found out something: PDFPOSTER does not like PDFs generated from HTML.
I first made a 100x200px box in Illustrator and exported that to a PDF.
then ran:
pdfposter -m 100x80pt -p 100x99999pt in-100x200.pdf out-100x200.pdf
This gives me a very nice result: the first page has a Crop Box of 100x40px and a Media Box of 100x80px, and the rest of the pages have Crop & Media Boxes of 100x80px.
Then I made a very, very basic HTML file (I even left out the doctype):
<html>
<body style="background-color:white;margin:0;padding:0">
<div style="background-color:gold;width:100%;height:1500px"></div>
</body>
</html>
and ran:
pdfposter -m 767x1024pt -p 767x99999pt cleanHTML-IN.pdf cleanHTML-OUT.pdf
And I get the first page with a white margin at the top, like in my initial problem.
So it is actually the Crop Box which does not seem to be set when using a PDF generated from HTML?
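A quick way to check that suspicion (not part of the original post) is to print the MediaBox and CropBox of each page with pypdf; if no /CropBox entry is present in the file, pypdf should simply report the MediaBox value for it:

from pypdf import PdfReader

reader = PdfReader("cleanHTML-IN.pdf")   # the HTML-generated example from above
for i, page in enumerate(reader.pages, start=1):
    print("page {}: MediaBox={}, CropBox={}".format(i, page.mediabox, page.cropbox))

Comparing the two test files should show whether the Illustrator PDF carries an explicit CropBox while the HTML-generated one does not.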
UPDATE:
Thanks to PDFPOSTER I have found my way to PYPDF.
Basically you define:
reader = PdfReader('in.pdf')
writer = PdfWriter()
I then loop over the pages (page_x = reader.pages[i]) from the input file, set the mediabox for each "new" page (like photocopying) and append it to the writer with writer.add_page(page_x).
Finally, I write everything out with writer.write().
Regarding corrupt PDF files: PIKEPDF, a Python wrapper around QPDF, features automatic repair just by opening and saving the file.
# pikepdf / pikepdf:
# https://github.com/pikepdf/pikepdf
# https://pikepdf.readthedocs.io/en/latest/
#
# py-pdf / pypdf:
# https://github.com/py-pdf/pypdf
# https://pypdf.readthedocs.io/en/latest/
import pikepdf, os, math
from pypdf import PdfWriter, PdfReader

# define, could become arguments
pagecut_h = 1024
inputfile = 'in.pdf'
outputfile = 'out.pdf'

# repair with PikePDF
print("repairing {0} .....".format(inputfile))
pdf = pikepdf.Pdf.open(inputfile)
pdf.save(inputfile + '.tmp')
pdf.close()
os.unlink(inputfile)
os.rename(inputfile + '.tmp', inputfile)

reader = PdfReader(inputfile)
writer = PdfWriter()

pages_n = len(reader.pages)
print('reading ..... {} input pages'.format(pages_n))

for i in range(pages_n):
    page = reader.pages[i]
    page_w = page.mediabox.width
    page_h = page.mediabox.height
    print('input page {}/{} [w:{}, h:{}]'.format(i + 1, pages_n, page_w, page_h))

    if page_h <= pagecut_h:
        print('> input page height is smaller than the cut height')
        print('appending original input page [w:{}, h:{}]'.format(page_w, page_h))
        writer.add_page(page)
    else:
        pagesfull_n = math.floor(page_h / pagecut_h)
        print('calculating .......... {} output pages'.format(pagesfull_n + 1))

        # first FULL page
        page.mediabox.left = 0
        page.mediabox.right = page_w
        page.mediabox.top = page_h
        page.mediabox.bottom = page_h - pagecut_h
        print('appending output page 1/{} [w:{}, h:{}]'.format(pagesfull_n + 1, page_w, pagecut_h))
        writer.add_page(page)

        # other FULL pages
        for j in range(pagesfull_n - 1):
            page.mediabox.top -= pagecut_h
            page.mediabox.bottom -= pagecut_h
            print('appending output page {}/{} [w:{}, h:{}]'.format(j + 2, pagesfull_n + 1, page_w, pagecut_h))
            writer.add_page(page)

        # LAST (not full) page
        pagelast_h = page_h - (pagecut_h * pagesfull_n)
        page.mediabox.top = pagelast_h
        page.mediabox.bottom = 0
        print('appending last output page {}/{} [w:{}, h:{}]'.format(pagesfull_n + 1, pagesfull_n + 1, page_w, pagelast_h))
        writer.add_page(page)

with open(outputfile, 'wb') as fp:
    writer.write(fp)

In Cucumber 'Feature file' -> 'Examples', how to set path for CSV file

In my sample feature file, rather than giving the data in the Examples table, I want to pass it in from a CSV file. How can I achieve that? Can anyone help me out?
Feature file:
Feature: Rocky Search Status

  Scenario Outline: Rocky Search Status with Filters
    Given Open firefox and start application for Rocky Search Status
    When User enters "<price_right>" and "<Carat_left>" and "<Color_right_param>" and "<Cut_right_param>" and "<Clarity_right_param>"
    Then Message displayed Rocky Search Status Successful
    Then Application should be closed after Rocky Search Status

    Examples:
      | price_right | Carat_left | Color_right_param | Cut_right_param | Clarity_right_param |
      | 10000       | 1.5        | 80                | 180             | 84                  |
I want the data values to be defined in CSV outside the Project.
Not directly. However, you can have a record ID (or test case number) of sorts in the Example table. You can then retrieve records from the CSV in the step code based on the ID.
Scenario Outline: Rocky Search Status with Filters
  Given Open firefox and start application for Rocky Search Status
  When User enters data specified in test case <tcn>
  Then Message displayed Rocky Search Status Successful
  Then Application should be closed after Rocky Search Status

  Examples:
    | tcn |
    | 1   |
    | 2   |
The "When" step will use the tcn to retrieve the corresponding record from the CSV.
You can't with Gherkin. What you can do is give your CSV file an appropriate name, refer to that name inside your Gherkin step, and then load and read the file inside your step definition.
abc.feature
Feature: A
  Scenario: 1
    Given data at abc.csv
    ...
step-definitions.js
const jsonfile = require('jsonfile'); // note: jsonfile parses JSON; for a real CSV you would read and parse the file accordingly

Given(/^data at (.*)$/, function (fileName) {
  const data = jsonfile.readFileSync(`${__dirname}/${fileName}`);
  // iterate over data
});

Can GraphDB load 10 million statements with OWL reasoning?

I am struggling to load most of the Drug Ontology OWL files and most of the ChEBI OWL files into a GraphDB Free v8.3 repository with Optimized OWL Horst reasoning on.
Is this possible? Should I do something other than "be patient"?
Details:
I'm using the loadrdf offline bulk loader to populate an AWS r4.16xlarge instance with 488.0 GiB of RAM and 64 vCPUs.
Over the weekend, I played around with different pool buffer sizes and found that most of these files individually load fastest with a pool buffer of 2,000 or 20,000 statements instead of the suggested 200,000. I also added -Xmx470g to the loadrdf script. Most of the OWL files would load individually in less than one hour.
Around 10 pm EDT last night, I started to load all of the files listed below simultaneously. Now it's 11 hours later, and there are still millions of statements to go. The load rate is around 70 statements/second now. It appears that only 30% of my RAM is being used, but the CPU load is consistently around 60.
Are there websites that document other people doing something of this scale?
Should I be using a different reasoning configuration? I chose this configuration as it was the fastest-loading OWL configuration, based on my experiments over the weekend. I think I will need to look for relationships that go beyond rdfs:subClassOf.
Files I'm trying to load:
+-------------+------------+---------------------+
|       bytes | statements | file                |
+-------------+------------+---------------------+
| 471,265,716 |  4,268,532 | chebi.owl           |
|      61,529 |        451 | chebi-disjoints.owl |
|      82,449 |      1,076 | chebi-proteins.owl  |
|  10,237,338 |    135,369 | dron-chebi.owl      |
|       2,374 |         16 | dron-full.owl       |
|     170,896 |      2,257 | dron-hand.owl       |
| 140,434,070 |  1,986,609 | dron-ingredient.owl |
|       2,391 |         16 | dron-lite.owl       |
| 234,853,064 |  2,495,144 | dron-ndc.owl        |
|       4,970 |         28 | dron-pro.owl        |
|  37,198,480 |    301,031 | dron-rxnorm.owl     |
|     137,507 |      1,228 | dron-upper.owl      |
+-------------+------------+---------------------+
@MarkMiller, you can take a look at the Preload tool, which is part of the GraphDB 8.4.0 release. It's specially designed to handle large amounts of data at constant speed. Note that it works without inference, so you'll need to load your data and then change the ruleset and re-infer the statements.
http://graphdb.ontotext.com/documentation/free/loading-data-using-preload.html
Just typing out @Konstantin Petrov's correct suggestion with tidier formatting. All of these queries should be run in the repository of interest... at some point in working this out, I misled myself into thinking that I should be connected to the SYSTEM repo when running these queries.
All of these queries also require the following prefix definition
prefix sys: <http://www.ontotext.com/owlim/system#>
This doesn't directly address the timing/performance of loading large datasets into an OWL reasoning repository, but it does show how to switch to a higher level of reasoning after loading lots of triples into a no-inference ("empty" ruleset) repository.
You could start by querying for the current reasoning level/ruleset, and then run this same SELECT statement after each INSERT.
SELECT ?state ?ruleset {
    ?state sys:listRulesets ?ruleset
}
Add a predefined ruleset
INSERT DATA {
    _:b sys:addRuleset "rdfsplus-optimized"
}
Make the new ruleset the default
INSERT DATA {
    _:b sys:defaultRuleset "rdfsplus-optimized"
}
Re-infer... could take a long time!
INSERT DATA {
    [] <http://www.ontotext.com/owlim/system#reinfer> []
}
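For completeness, a small sketch (not from the original answer) of running those updates programmatically over the standard SPARQL 1.1 protocol; GraphDB exposes a /repositories/<id>/statements endpoint for updates, but the host, port and repository name below are assumptions:

import requests

ENDPOINT = "http://localhost:7200/repositories/myrepo/statements"   # assumed repository name
PREFIX = 'PREFIX sys: <http://www.ontotext.com/owlim/system#>\n'

updates = [
    'INSERT DATA { _:b sys:addRuleset "rdfsplus-optimized" }',
    'INSERT DATA { _:b sys:defaultRuleset "rdfsplus-optimized" }',
    'INSERT DATA { [] sys:reinfer [] }',   # re-inference; may run for a long time
]

for update in updates:
    response = requests.post(ENDPOINT,
                             data=PREFIX + update,
                             headers={"Content-Type": "application/sparql-update"})
    response.raise_for_status()
    print("ran:", update)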

Add Custom Header in Pentaho

I have data that looks like the following:
assetnum | assetdesc
123 | sampledesc
432 | sample desc2
I want to insert another row with four fields so it looks like the following:
SYSNAME | OBJSTRUC | AddChange | En
assetnum | assetdesc
123 | sampledesc
432 | sample desc2
However I am unsure how to do this. Does anyone know how?
I have tried generating rows but I am unsure how to merge so that it looks like this. I have also thought of adding headers, but I am unsure how to specify the header (without it being created automatically). I am quite new to Pentaho.
Thanks.
Here is a hack. Assume StepA writes the actual data into a file, fileA. Before writing anything into fileA, have a Text file output step and, in its Content tab, use the "Add Ending line of file" field to enter the custom row you need to insert. Since the file is empty at that point, this "ending" line becomes the first line. Once that is done, you can write the other data as per your original source using the Append flag. To set the dependency, use the Block until steps finish step to hold back the actual write in StepA.

Font metrics for the "base 14" fonts in the PDF specification

I've been writing software to parse content from PDFs, specifically text broken into regions. For this I need font metrics such as glyph displacements, font-wide ascent, descent and glyph bounding box, etc. In short, the type of metrics that should be available in the FontDescriptor dictionary of a font definition in a PDF.
Unfortunately a FontDescriptor doesn't have to be included for fonts whose base font is one of the "base 14" set of standard fonts.
Where can I find or how can I generate font metrics for the base 14 fonts?
Meanwhile I've found these links on an Adobe Website, which contain the information asked for:
Font Metrics for Base 14 fonts (Windows)
Font Metrics for Base 14 fonts (UNIX)
Font Metrics for Base 14 fonts (Macintosh)
On Linux (and probably on Mac OS X too) you can easily use the font2afm script which creates font metrics files from PostScript or TrueType fonts (.pfa, .pfb, .ttf, .otf).
If you don't have the original Base 14 fonts available, you can use the clones provided by Ghostscript. These clones may use completely different font names, but they can only be clones by using the very same metrics for each glyph.
Here is a Ghostscript command line that lists all the Base 14 font names:
Windows:
gswin32c.exe -q -dNODISPLAY -dSAFER -c "systemdict /.standardfonts get == quit"
Linux/Unix/Mac:
gs -q -dNODISPLAY -dSAFER -c "systemdict /.standardfonts get == quit"
In recent versions of Ghostscript, the filenames for the cloned fonts usually match the clone's font name. Older GS versions may have used more cryptic naming conventions. Here is the list of font name mappings to the cloned fonts:
+===============+========================+==========================+
| Base 14 name  | Ghostscript name       | Font filename (older GS) |
+===============+========================+==========================+
| Courier       |                        |                          |
|   standard    | NimbusMonL-Regu        | n022003l.pfb             |
|   bold        | NimbusMonL-Bold        | n022004l.pfb             |
|   italic      | NimbusMonL-ReguObli    | n022023l.pfb             |
|   bolditalic  | NimbusMonL-BoldObli    | n022024l.pfb             |
+---------------+------------------------+--------------------------+
| Helvetica     |                        |                          |
|   standard    | NimbusSanL-Regu        | n019003l.pfb             |
|   bold        | NimbusSanL-Bold        | n019004l.pfb             |
|   italic      | NimbusSanL-ReguItal    | n019023l.pfb             |
|   bolditalic  | NimbusSanL-BoldItal    | n019024l.pfb             |
+---------------+------------------------+--------------------------+
| Times-Roman   |                        |                          |
|   standard    | NimbusRomNo9L-Regu     | n021003l.pfb             |
|   bold        | NimbusRomNo9L-Medi     | n021004l.pfb             |
|   italic      | NimbusRomNo9L-ReguItal | n021023l.pfb             |
|   bolditalic  | NimbusRomNo9L-MediItal | n021024l.pfb             |
+---------------+------------------------+--------------------------+
| Symbol        | StandardSymL           | s050000l.pfb             |
+---------------+------------------------+--------------------------+
| ZapfDingbats  | Dingbats               | d050000l.pfb             |
+---------------+------------------------+--------------------------+
You can download the Ghostscript fonts from many places on the net. Then run, for example, this command:
font2afm StandardSymL.ttf
and the resulting file, StandardSymL.afm, should contain the font metrics for the Symbol font in standard .afm format.
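To actually pull the metrics out of such an .afm file, a minimal plain-Python parser sketch (not part of the original answer) could look like this; it only reads the standard AFM fields the question asks about (Ascender, Descender, FontBBox, and the per-glyph WX width and B bounding box entries):

def read_afm(path):
    font = {"glyphs": {}}
    with open(path) as f:
        in_metrics = False
        for line in f:
            line = line.strip()
            if line.startswith("StartCharMetrics"):
                in_metrics = True
            elif line.startswith("EndCharMetrics"):
                in_metrics = False
            elif in_metrics:
                # typical glyph line: "C 65 ; WX 667 ; N A ; B 14 0 654 674 ;"
                entry = {}
                for part in line.split(";"):
                    fields = part.split()
                    if fields:
                        entry[fields[0]] = fields[1:]
                if "N" in entry and "WX" in entry and "B" in entry:
                    font["glyphs"][entry["N"][0]] = {
                        "width": float(entry["WX"][0]),
                        "bbox": [float(v) for v in entry["B"]],
                    }
            elif line.startswith(("Ascender ", "Descender ", "FontBBox ")):
                key, *values = line.split()
                font[key] = [float(v) for v in values]
    return font

metrics = read_afm("StandardSymL.afm")
print(metrics.get("Ascender"), metrics.get("Descender"), metrics.get("FontBBox"))
print(metrics["glyphs"].get("alpha"))   # width and bounding box of one glyph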
I'm sure those font metrics are widely available. For instance, on my Ubuntu system they're in /usr/share/fonts/type1/gsfonts/ -- maybe you don't recognize some of the font names, but they're metrically compatible with Helvetica etc.