PhantomJS: export PDF to stdout

Is there a way to trigger the PDF export feature in PhantomJS without specifying an output file with the .pdf extension? We'd like to use stdout to output the PDF.

You can output directly to stdout without needing a temporary file:
page.render('/dev/stdout', { format: 'pdf' });
See here for the history of when this was added.
If you want to read HTML from stdin and output the PDF to stdout, see here.

Sorry for the extremely long answer; I have a feeling that I'll need to refer to this method several dozen times in my life, so I'll write "one answer to rule them all". I'll first babble a little about files, file descriptors, (named) pipes, and output redirection, and then answer your question.
Consider this simple C99 program:
#include <stdio.h>
#include <stdlib.h>
int main(int argc, char* argv[])
{
    if (argc < 2) {
        printf("Usage: %s file_name\n", argv[0]);
        return 1;
    }
    FILE* file = fopen(argv[1], "w");
    if (!file) {
        printf("Cannot open file for writing: %s\n", argv[1]);
        return 2;
    }
    fprintf(file, "some text...");
    fclose(file);
    return 0;
}
Very straightforward. It takes an argument (a file name) and prints some text into it. Couldn't be any simpler.
Compile it with clang write_to_file.c -o write_to_file.o or gcc write_to_file.c -o write_to_file.o.
Now, run ./write_to_file.o some_file (which prints into some_file). Then run cat some_file. The result, as expected, is some text...
Now let's get more fancy. Type (./write_to_file.o /dev/stdout) > some_file in the terminal. We're asking the program to write to its standard output (instead of a regular file), and then we're redirecting that stdout to some_file (using > some_file). We could've used any of the following to achieve this:
(./write_to_file.o /dev/stdout) > some_file, which means "use stdout"
(./write_to_file.o /dev/stderr) 2> some_file, which means "use stderr, and redirect it using 2>"
(./write_to_file.o /dev/fd/2) 2> some_file, which is the same as above; stderr is the third file descriptor assigned to Unix processes by default (after stdin and stdout)
(./write_to_file.o /dev/fd/5) 5> some_file, which means "use your sixth file descriptor, and redirect it to some_file"
In case it's not clear, we're not giving the program a real path on disk: we're giving it a file descriptor that the shell has already connected to some_file (everything is a file in Unix, after all). We can do all sorts of fancy things with that descriptor: point it at a regular file, or at a named pipe shared between different processes.
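The same /dev/fd/N trick can also be driven from a program instead of the shell. This is a minimal Python sketch, assuming Linux or macOS (where /dev/fd exists): the parent opens some_file, lets the child inherit that descriptor via pass_fds, and passes the path /dev/fd/N as if it were an ordinary file name. The child is a hypothetical stand-in for write_to_file.o (a python3 -c one-liner), so the example runs without compiling anything.

```python
import os
import subprocess
import sys
import tempfile

# Hypothetical stand-in for ./write_to_file.o: writes "some text..." to argv[1].
CHILD = "import sys; open(sys.argv[1], 'w').write('some text...')"


def run_with_inherited_fd(out_path):
    """Roughly `(child /dev/fd/N) N> out_path`, with N chosen by the OS."""
    with open(out_path, "wb") as out:
        fd = out.fileno()
        # pass_fds keeps this descriptor open, under the same number, in the child.
        subprocess.run(
            [sys.executable, "-c", CHILD, f"/dev/fd/{fd}"],
            pass_fds=(fd,),
            check=True,
        )


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "some_file")
    run_with_inherited_fd(path)
    with open(path) as f:
        print(f.read())  # some text...
```

The shell picks N explicitly (5> some_file); here the OS picks it, which is why the path is built from fileno().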
Now, let's create a named pipe:
mkfifo my_pipe
If you type ls -l now, you'll see:
total 32
prw-r--r-- 1 pooriaazimi staff 0 Jul 15 09:12 my_pipe
-rw-r--r-- 1 pooriaazimi staff 336 Jul 15 08:29 write_to_file.c
-rwxr-xr-x 1 pooriaazimi staff 8832 Jul 15 08:34 write_to_file.o
Note the p at the beginning of the second line. It means that my_pipe is a (named) pipe.
Now, let's specify what we want to do with our pipe:
gzip -c < my_pipe > out.gz &
It means: gzip whatever I put into my_pipe and write the result to out.gz. The & at the end asks the shell to run this command in the background. You'll get something like [1] 10449 and control returns to the terminal.
Then, simply redirect the output of our C program to this pipe:
(./write_to_file.o /dev/fd/5) 5> my_pipe
Or
./write_to_file.o my_pipe
You'll get
[1]+ Done gzip -c < my_pipe > out.gz
which means the gzip command has finished.
Now, do another ls -l:
total 40
prw-r--r-- 1 pooriaazimi staff 0 Jul 15 09:14 my_pipe
-rw-r--r-- 1 pooriaazimi staff 32 Jul 15 09:14 out.gz
-rw-r--r-- 1 pooriaazimi staff 336 Jul 15 08:29 write_to_file.c
-rwxr-xr-x 1 pooriaazimi staff 8832 Jul 15 08:34 write_to_file.o
We've successfully gzipped our text!
Execute gzip -d out.gz to decompress this gzipped file. out.gz will be deleted and a new file (out) will be created. cat out gets us:
some text...
which is what we expected.
Don't forget to remove the pipe with rm my_pipe!
Now back to PhantomJS.
This is a simple PhantomJS script (render.coffee, written in CoffeeScript) that takes two arguments: a URL and a file name. It loads the URL, renders it and writes it to the given file name:
system = require 'system'
renderUrlToFile = (url, file, callback) ->
  page = require('webpage').create()
  page.viewportSize = { width: 1024, height : 800 }
  page.settings.userAgent = 'Phantom.js bot'
  page.open url, (status) ->
    if status isnt 'success'
      console.log "Unable to render '#{url}'"
    else
      page.render file
    page.close()
    callback url, file
url = system.args[1]
file_name = system.args[2]
console.log "Will render to #{file_name}"
renderUrlToFile "http://#{url}", file_name, (url, file) ->
  console.log "Rendered '#{url}' to '#{file}'"
  phantom.exit()
Now type phantomjs render.coffee news.ycombinator.com hn.png in the terminal to render the Hacker News front page into hn.png. It works as expected. So does phantomjs render.coffee news.ycombinator.com hn.pdf.
Let's repeat what we did earlier with our C program:
(phantomjs render.coffee news.ycombinator.com /dev/fd/5) 5> hn.pdf
It doesn't work... :( Why? Because, as stated on PhantomJS's manual:
render(fileName)
Renders the web page to an image buffer and save it
as the specified file.
Currently the output format is automatically set based on the file
extension. Supported formats are PNG, JPEG, and PDF.
It fails simply because paths like /dev/fd/5 or /dev/stdout don't end in .png, .jpg, or .pdf, so PhantomJS can't tell which format to use.
But no fear, named pipes can help you!
Create another named pipe, but this time use the extension .pdf:
mkfifo my_pipe.pdf
Now, start a background reader that simply copies its input to hn.pdf:
cat < my_pipe.pdf > hn.pdf &
Then run:
phantomjs render.coffee news.ycombinator.com my_pipe.pdf
And behold the beautiful hn.pdf!
Obviously you'll want to do something more sophisticated than just cat-ing the output, but I'm sure it's clear now what you should do :)
TL;DR:
Create a named pipe with a ".pdf" file extension (so it fools PhantomJS into thinking it's a PDF file):
mkfifo my_pipe.pdf
In the background, do whatever you want with the contents of the pipe, for example:
cat < my_pipe.pdf > hn.pdf &
which simply copies it to hn.pdf.
In PhantomJS, render to this file/pipe.
Later on, you should remove the pipe:
rm my_pipe.pdf
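If you are driving PhantomJS from another program, the TL;DR steps can be wrapped in one helper. This is a minimal Python sketch, assuming a POSIX system with os.mkfifo; render_via_fifo is a hypothetical name, and the command is assumed to take the output path as its last argument, like render.coffee above. A background thread plays the role of cat < my_pipe.pdf > hn.pdf &, and the FIFO is removed afterwards, like rm my_pipe.pdf.

```python
import os
import shutil
import subprocess
import sys
import tempfile
import threading


def render_via_fifo(command, suffix=".pdf"):
    """Run `command + [fifo]` and return the bytes the command wrote to the FIFO.

    The FIFO's name ends in `suffix`, so a tool that picks its output format
    from the file extension (like PhantomJS's page.render) still sees ".pdf".
    """
    tmpdir = tempfile.mkdtemp()
    fifo = os.path.join(tmpdir, "my_pipe" + suffix)
    os.mkfifo(fifo)  # like `mkfifo my_pipe.pdf`
    chunks = []

    def reader():
        # Like `cat < my_pipe.pdf`: blocks until a writer opens the FIFO,
        # then reads until the last writer closes it (EOF).
        with open(fifo, "rb") as f:
            for chunk in iter(lambda: f.read(65536), b""):
                chunks.append(chunk)

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()
    try:
        proc = subprocess.run(command + [fifo])
    finally:
        # If the command exited without ever opening the FIFO, the reader is
        # still blocked in open(); briefly opening the write end unblocks it.
        while thread.is_alive():
            try:
                os.close(os.open(fifo, os.O_WRONLY | os.O_NONBLOCK))
            except OSError:
                pass  # ENXIO: the reader has not reached open() yet
            thread.join(timeout=0.1)
        shutil.rmtree(tmpdir)  # like `rm my_pipe.pdf`
    if proc.returncode != 0:
        raise subprocess.CalledProcessError(proc.returncode, command)
    return b"".join(chunks)


if __name__ == "__main__":
    # Stand-in writer; in practice something like
    # ["phantomjs", "render.coffee", "news.ycombinator.com"].
    fake = [sys.executable, "-c",
            "import sys; open(sys.argv[1], 'wb').write(b'%PDF-1.4 fake')"]
    sys.stdout.buffer.write(render_via_fifo(fake))
```

Keep in mind that console.log in a PhantomJS script also goes to stdout, so if you pipe the returned bytes to stdout yourself, don't mix them with the script's own log lines.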

As pointed out by Niko, you can use renderBase64() to render the web page to an image buffer and return the result as a base64-encoded string. But for now this only works for PNG, JPEG and GIF.
To write something from a PhantomJS script to stdout, just use the filesystem (fs) API.
I use something like this for images:
var base64image = page.renderBase64('PNG');
var fs = require("fs");
fs.write("/dev/stdout", base64image, "w");
I don't know whether renderBase64() will support PDF in a future version of PhantomJS, but as a workaround something along these lines may work for you:
page.render(output);
var fs = require("fs");
var pdf = fs.read(output);
fs.write("/dev/stdout", pdf, "w");
fs.remove(output);
Here output is the path to a temporary file ending in .pdf (the extension is what makes page.render() produce a PDF).
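The same render, read, write, remove workaround can also be done outside PhantomJS, which makes it easy to keep the PDF binary-safe (a PDF is binary data, so it should reach stdout as raw bytes, not decoded text). A minimal Python sketch; render_to_stdout is a hypothetical helper, and the command is assumed to take the output path as its last argument, like render.coffee above.

```python
import os
import shutil
import subprocess
import sys
import tempfile


def render_to_stdout(command, out=None):
    """Render to a temporary .pdf, copy its bytes to `out`, then delete it.

    Mirrors: page.render(output); fs.read(output);
             fs.write("/dev/stdout", ...); fs.remove(output)
    """
    if out is None:
        out = sys.stdout.buffer  # binary stdout: no text encoding applied
    tmpdir = tempfile.mkdtemp()
    output = os.path.join(tmpdir, "page.pdf")  # the extension selects PDF
    try:
        subprocess.run(command + [output], check=True)
        with open(output, "rb") as f:
            shutil.copyfileobj(f, out)
        out.flush()
    finally:
        shutil.rmtree(tmpdir)  # like fs.remove(output)
```

For example, render_to_stdout(["phantomjs", "render.coffee", "news.ycombinator.com"]) > hn.pdf would be the shell-level equivalent, assuming the script itself prints nothing else to stdout.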

I don't know if it would address your problem, but you may also check the new renderBase64() method added to PhantomJS 1.6: https://github.com/ariya/phantomjs/blob/master/src/webpage.cpp#L623
Unfortunately, the feature is not documented on the wiki yet :/

Related

Problems getting two output files in Nextflow

Hello all!
I'm trying to write a small Nextflow pipeline that runs vcftools commands on 300 VCFs. The pipeline takes four inputs (vcf, pop1, pop2 and a .txt file) and should generate two outputs: a .log.weir.fst and a .log.log file. When I run the pipeline, it only produces the .log.weir.fst files, not the .log.log files.
Here's my process definition:
process fst_calculation {
    publishDir "${results_dir}/fst_results_pop1_pop2/", mode:"copy"
    input:
    file vcf
    file pop_1
    file pop_2
    file mart
    output:
    path "*.log.*"
    """
    while read linea
    do
        echo "[DEBUG] working in line: \$linea"
        inicio=\$(echo "\$linea" | cut -f3)
        final=\$(echo "\$linea" | cut -f4)
        cromosoma=\$(echo "\$linea" | cut -f1)
        segmento=\$(echo "\$linea" | cut -f5)
        vcftools --vcf ${vcf} \
            --weir-fst-pop ${pop_1} \
            --weir-fst-pop ${pop_2} \
            --out \$inicio.log --chr \$cromosoma \
            --from-bp \$inicio --to-bp \$final
    done < ${mart}
    """
}
And here's the workflow that calls my process:
/* Load files into channel */
pop_1 = Channel.fromPath("${params.fst_path}/pop_1")
pop_2 = Channel.fromPath("${params.fst_path}/pop_2")
vcf = Channel.fromPath("${params.fst_path}/*.vcf")
mart = Channel.fromPath("${params.fst_path}/*.txt")
/* Import modules */
include { fst_calculation } from './nf_modules/modules.nf'
/*
 * main pipeline logic
 */
workflow {
    p1 = fst_calculation(vcf, pop_1, pop_2, mart)
    p1.view()
}
When I check the pipeline's work directory, I can see that it only generates the .log.weir.fst files. To check whether my code was wrong, I ran "bash .command.sh" in the work directory, and this does generate both output files. So, is there a reason I don't get both output files when I run the pipeline?
I appreciate any help.
Note that bash .command.sh and bash .command.run do different things. The latter is basically a wrapper around the former that sets up the environment and stages the declared input files, among other things. If running the latter produces the unusual behavior, you'll need to dig deeper.
It's not completely clear to me what the problem is here. My guess is that vcftools might behave differently when run non-interactively, such that it sends its logging to STDERR. If that's the case, the logging will be captured in a file called .command.err. To send it to a file instead, just redirect STDERR in the usual way (untested):
while IFS=\$'\\t' read -r cromosoma null inicio final segmento ; do
    >&2 echo "[DEBUG] Working with: \${cromosoma}, \${inicio}, \${final}, \${segmento}"
    vcftools \\
        --vcf "${vcf}" \\
        --weir-fst-pop "${pop_1}" \\
        --weir-fst-pop "${pop_2}" \\
        --out "\${inicio}.log" \\
        --chr "\${cromosoma}" \\
        --from-bp "\${inicio}" \\
        --to-bp "\${final}" \\
        2> "\${cromosoma}.\${inicio}.\${final}.log.log"
done < "${mart}"
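To sanity-check which columns end up where, and that both kinds of output file are covered by the process's path "*.log.*" declaration, here is a small Python sketch of the loop body. It assumes, as the cut -f1/-f3/-f4/-f5 calls in the question imply, that mart is tab-separated as chromosome, an unused column, start, end, segment. vcftools_command and captured_by_output_glob are hypothetical helpers; they only build names and argument lists and never run vcftools, and fnmatch is only a rough stand-in for Nextflow's own glob matching.

```python
import fnmatch


def vcftools_command(line, vcf, pop_1, pop_2):
    """Build the vcftools argv and the redirected-stderr log name for one line."""
    # Same column mapping as `read -r cromosoma null inicio final segmento`
    # (columns 1, 3, 4 and 5; column 2 is discarded). Unlike bash's `read`,
    # this drops any columns after the fifth instead of folding them into
    # `segmento`.
    cromosoma, _null, inicio, final, segmento = line.rstrip("\n").split("\t")[:5]
    argv = [
        "vcftools",
        "--vcf", vcf,
        "--weir-fst-pop", pop_1,
        "--weir-fst-pop", pop_2,
        "--out", f"{inicio}.log",  # prefix, hence <inicio>.log.weir.fst
        "--chr", cromosoma,
        "--from-bp", inicio,
        "--to-bp", final,
    ]
    stderr_log = f"{cromosoma}.{inicio}.{final}.log.log"
    return argv, stderr_log


def captured_by_output_glob(name, pattern="*.log.*"):
    """Rough check that a file name matches the process's output glob."""
    return fnmatch.fnmatchcase(name, pattern)
```

Both the .log.weir.fst results and the redirected .log.log file match "*.log.*", so once the log actually lands in a file, the existing output declaration should pick it up.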

How to merge multiple markdown files with pandoc while retaining cross document links?

I am trying to merge multiple markdown documents in a single folder together into a PDF with pandoc.
The documents may contain links to each other which should be browseable in the markdown format, e.g. through IntelliJ or within GitLab.
Simple example documents:
0001-file-a.md
---
id: 0001
---
# File a
This is a simple file without an [external link](www.stackoverflow.com).
0002-file-b.md
---
id: 0002
---
# File b
This file links to [another file](0001-file-a.md).
Pandoc does not handle this case out of the box. For example, when running the following command:
pandoc -s -f markdown -t pdf *.md -V linkcolor=blue -o test.pdf
It merges the files, creates a PDF and highlights the links correctly, but when clicking the second link it wants to open the file instead of jumping to the right location in the document.
This problem has been experienced by many before me but none of the solutions I found so far have solved it. The closest I came was with the help of this answer: https://stackoverflow.com/a/61908457/6628753
It defines a filter that is first applied to each file and then the resulting JSON files are merged.
I modified this filter to fit my needs:
Add the number of the file to the label of the top-level header
Prepend the top-level header to all other header labels
Remove .md from internal links
Here is the filter:
#!/usr/bin/env python3
from pandocfilters import toJSONFilter, Header, Link
import re
import sys
"""
Pandoc filter to convert internal links for multifile documents
"""
headerL1 = []
def fix_links(key, value, format, meta):
    global headerL1
    # Store level 1 headers
    if key == "Header":
        [level, [label, t1, t2], header] = value
        if level == 1:
            id = meta.get("id")
            newlabel = f"{id['c'][0]['c']}-{label}"
            headerL1 = [newlabel]
            sys.stderr.write(f"\nGlobal header: {headerL1}\n")
            return Header(level, [newlabel, t1, t2], header)
        # Prepend level 1 header label to all other header labels
        if level > 1:
            prefix = headerL1[0]
            newlabel = prefix + "-" + label
            sys.stderr.write(f"Header label: {label} -> {newlabel}\n")
            return Header(level, [newlabel, t1, t2], header)
    if key == "Link":
        [t1, linktext, [linkref, t4]] = value
        if ".md" in linkref:
            # Escape the dot: an unescaped "." would match any character
            newlinkref = re.sub(r'\.md\b', r'', linkref)
            sys.stderr.write(f'Link: {linkref} -> {newlinkref}\n')
            return Link(t1, linktext, [newlinkref, t4])
        else:
            sys.stderr.write(f'External link: {linkref}\n')
if __name__ == "__main__":
    toJSONFilter(fix_links)
And here is a script that executes the whole thing:
#!/bin/bash
MD_INPUT=$(find . -type f -name '*.md' | sort)
# Pass the markdown through the GitLab filters into Pandoc JSON files
echo "Filtering Gitlab markdown"
for file in $MD_INPUT
do
    echo "Filtering $file"
    pandoc \
        --filter fix-links.py \
        "$file" \
        -t json \
        -o "${file%.md}.json"
done
JSON_INPUT=$(find . -type f -name '*.json' | sort)
echo "Generating LaTeX"
pandoc -s -f json -t latex $JSON_INPUT -V linkcolor=blue -o test.tex
echo "Generating PDF"
pandoc -s -f json -t pdf $JSON_INPUT -V linkcolor=blue -o test.pdf
Applying this script generates a PDF where the second link does not work at all.
Looking at the LaTeX code the problem can be solved by replacing the generated \href directive with \hyperlink.
Once this is done the linking works as expected.
The problem now is that this isn't done automatically by pandoc, which almost seems like a bug.
Is there a way to tell pandoc a link is internal from within the filter?
After running the filter it is non-trivial to fix the issue since there is no good way to differentiate internal and external links.
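One thing worth checking before post-processing the LaTeX: as far as I know, pandoc's LaTeX writer emits \hyperlink (rather than \href) for link targets that start with #, so the filter's Link branch may only need to produce #0001-file-a instead of 0001-file-a. A minimal Python sketch of that rewrite, assuming the header-label scheme of the filter above (<id>-<label> for level-1 headers, <level-1 label>-<label> for deeper ones) and file names that match their level-1 labels, as in the example documents. to_internal_target is a hypothetical helper, not part of pandocfilters.

```python
import posixpath
from urllib.parse import urlparse


def to_internal_target(linkref):
    """Map 'NNNN-name.md' or 'NNNN-name.md#section' to a '#...' pandoc target.

    Returns None for anything that is not a relative link to a .md file, so the
    caller can leave external links untouched.
    """
    path, _, fragment = linkref.partition("#")
    if urlparse(path).scheme or not path.endswith(".md"):
        return None
    # Level-1 label: the file name without ".md" (e.g. "0001-file-a").
    target = "#" + posixpath.basename(path)[: -len(".md")]
    if fragment:
        # Deeper headers were relabelled "<level-1 label>-<original label>".
        target += "-" + fragment
    return target
```

Inside fix_links, the Link branch would then return Link(t1, linktext, [new, t4]) whenever new = to_internal_target(linkref) is not None.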

perl gunzip to buffer and gunzip to file have different byte orders

I'm using Perl v5.22.1, Storable 2.53_01, and IO::Uncompress::Gunzip 2.068.
I want to use Perl to gunzip a Storable file in memory, without using an intermediate file.
I have a variable $zip_file = '/some/storable.gz' that points to this zipped file.
If I gunzip directly to a file, this works fine, and %root is correctly set to the Storable hash.
gunzip($zip_file, '/home/myusername/Programming/unzipped');
my %root = %{retrieve('/home/myusername/Programming/unzipped')};
However if I gunzip into memory like this:
my $file;
gunzip($zip_file, \$file);
my %root = %{thaw($file)};
I get the error
Storable binary image v56.115 more recent than I am (v2.10)
so the Storable's magic number has been butchered: it should never be that high.
However, the strings in the unzipped buffer are still correct; the buffer starts with pst which is the correct Storable header. It only seems to be multi-byte variables like integers which are being broken.
Does this have something to do with byte ordering, such that writing to a file works one way while writing to an in-memory buffer works another? How can I gunzip to a buffer without it ruining my integers?
That's not related to gunzip but to using retrieve vs. thaw. They expect different input: thaw expects the output of freeze, while retrieve expects the output of store.
This can be verified with a simple test:
$ perl -MStorable -e 'my $x = {}; store($x,q[file.store])'
$ perl -MStorable=freeze -e 'my $x = {}; print freeze($x)' > file.freeze
On my machine this gives 24 bytes for the file created by store and 20 bytes for freeze. If I remove the leading 4 bytes from file.store, the file is equivalent to file.freeze, i.e. store just adds a 4-byte header. So you might try uncompressing the file in memory, removing the leading 4 bytes, and running thaw on the rest.
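The error message fits this explanation. Storable's first header byte holds the major version shifted left by one (the low bit flags network byte order), so when thaw reads the pst0 file magic as if it were that header, p (0x70 = 112) becomes major 56 and s (115) becomes minor 115, which is exactly v56.115. Below is a minimal Python sketch of the suggested fix (decompress in memory, then drop the 4-byte magic); strip_store_magic and misread_version are hypothetical helpers. In Perl the equivalent step would be substr($file, 0, 4, '') before calling thaw($file).

```python
import gzip

STORE_MAGIC = b"pst0"  # file magic that store() writes and freeze() omits


def strip_store_magic(gz_bytes):
    """Gunzip in memory and drop the 4-byte file magic so thaw() can read it."""
    data = gzip.decompress(gz_bytes)
    if not data.startswith(STORE_MAGIC):
        raise ValueError("not a store() image (missing 'pst0' magic)")
    return data[len(STORE_MAGIC):]


def misread_version(header):
    """Version thaw() reports if it mistakes the first two bytes for its header."""
    return header[0] >> 1, header[1]


if __name__ == "__main__":
    print(misread_version(STORE_MAGIC))  # (56, 115)
```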

Generating .gcda coverage files via QEMU/GDB

Executive summary: I want to use GDB to extract the coverage execution counts stored in memory in my embedded target, and use them to create .gcda files (for feeding to gcov/lcov).
The setup:
I can successfully cross-compile my binary, targeting my specific embedded target - and then execute it under QEMU.
I can also use QEMU's GDB support to debug the binary (i.e. use tar extended-remote localhost:... to attach to the running QEMU GDB server, and fully control the execution of my binary).
Coverage:
Now, to perform "on-target" coverage analysis, I cross-compile with -fprofile-arcs -ftest-coverage. GCC then emits 64-bit counters to keep track of the execution counts of specific code blocks.
Under normal (i.e. host-based, not cross-compiled) execution, when the app finishes, __gcov_exit is called and gathers all these execution counts into .gcda files (which gcov then uses to report coverage details).
In my embedded target however, there's no filesystem to speak of - and libgcov basically contains empty stubs for all __gcov_... functions.
Workaround via QEMU/GDB: To address this, and do it in a GCC-version-agnostic way, I could list the coverage-related symbols in my binary via MYPLATFORM-readelf, and grep-out the relevant ones (e.g. __gcov0.Task1_EntryPoint, __gcov0.worker, etc):
$ MYPLATFORM-readelf -s binary | grep __gcov
...
46: 40021498 48 OBJECT LOCAL DEFAULT 4 __gcov0.Task1_EntryPoint
...
I could then use the offsets/sizes reported to automatically create a GDB script - a script that extracts the counters' data via simple memory dumps (from offset, dump length bytes to a local file).
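The script-generation half of this plan is straightforward; it's the .gcda conversion that is the open question. A minimal Python sketch, assuming readelf -s rows shaped like the one above and GDB's dump binary memory FILE START END command; gdb_dump_script is a hypothetical helper, and the .bin file naming is an arbitrary choice.

```python
import re

# Matches `readelf -s` rows such as
#   46: 40021498    48 OBJECT  LOCAL  DEFAULT    4 __gcov0.Task1_EntryPoint
# readelf prints large sizes in hex ("0x..."), so both forms are accepted.
SYMBOL_ROW = re.compile(
    r"^\s*\d+:\s+([0-9a-fA-F]+)\s+(0x[0-9a-fA-F]+|\d+)\s+OBJECT"
    r"\s+\S+\s+\S+\s+\S+\s+(__gcov0\.\S+)\s*$"
)


def gdb_dump_script(readelf_output):
    """Turn `readelf -s` output into GDB `dump binary memory` commands."""
    commands = []
    for row in readelf_output.splitlines():
        m = SYMBOL_ROW.match(row)
        if not m:
            continue  # not a __gcov0.* data object
        addr = int(m.group(1), 16)
        size = int(m.group(2), 0)  # base 0 accepts "48" and "0x30"
        name = m.group(3)
        # One raw counter dump per symbol; END is exclusive in GDB's dump.
        commands.append(f"dump binary memory {name}.bin 0x{addr:x} 0x{addr + size:x}")
    return "\n".join(commands)
```

For the row shown above this yields dump binary memory __gcov0.Task1_EntryPoint.bin 0x40021498 0x400214c8 (0x40021498 + 48 bytes).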
What I don't know (and failed to find any relevant info/tool), is how to convert the resulting pairs of (memory offset,memory data) into .gcda files. If such a tool/script exists, I'd have a portable (platform-agnostic) way to do coverage on any QEMU-supported platform.
Is there such a tool/script?
Any suggestions/pointers would be most appreciated.
UPDATE: I solved this myself, as you can read below - and wrote a blog post about it.
Turned out there was a (much) better way to do what I wanted.
The Linux kernel includes portable GCOV related functionality, that abstracts away the GCC version-specific details by providing this endpoint:
size_t convert_to_gcda(char *buffer, struct gcov_info *info)
So basically, I was able to do on-target coverage via the following steps:
Step 1
I added three slightly modified versions of the Linux gcov files to my project: base.c, gcc_4_7.c and gcov.h. I had to replace some Linux-isms inside them (like vmalloc, kfree, etc.) to make the code portable, and thus compilable on my embedded platform, which has nothing to do with Linux.
Step 2
I then provided my own __gcov_init...
typedef struct tagGcovInfo {
    struct gcov_info *info;
    struct tagGcovInfo *next;
} GcovInfo;
GcovInfo *headGcov = NULL;
void __gcov_init(struct gcov_info *info)
{
    printf(
        "__gcov_init called for %s!\n",
        gcov_info_filename(info));
    fflush(stdout);
    GcovInfo *newHead = malloc(sizeof(GcovInfo));
    if (!newHead) {
        puts("Out of memory!");
        exit(1);
    }
    newHead->info = info;
    newHead->next = headGcov;
    headGcov = newHead;
}
...and __gcov_exit:
void __gcov_exit()
{
    GcovInfo *tmp = headGcov;
    while(tmp) {
        char *buffer;
        int bytesNeeded = convert_to_gcda(NULL, tmp->info);
        buffer = malloc(bytesNeeded);
        if (!buffer) {
            puts("Out of memory!");
            exit(1);
        }
        convert_to_gcda(buffer, tmp->info);
        printf("Emitting %6d bytes for %s\n", bytesNeeded, gcov_info_filename(tmp->info));
        free(buffer);
        tmp = tmp->next;
    }
}
Step 3
Finally, I scripted my GDB (driving QEMU remotely) via this:
$ cat coverage.gdb
tar extended-remote :9976
file bin.debug/fputest
# This breaks on the "Emitting" printf in __gcov_exit
b base.c:88
commands 1
    silent
    set $filename = tmp->info->filename
    set $dataBegin = buffer
    set $dataEnd = buffer + bytesNeeded
    eval "dump binary memory %s 0x%lx 0x%lx", $filename, $dataBegin, $dataEnd
    c
end
c
quit
And finally, executed both QEMU and GDB - like this:
$ # In terminal 1:
qemu-system-MYPLATFORM ... -kernel bin.debug/fputest -gdb tcp::9976 -S
$ # In terminal 2:
MYPLATFORM-gdb -x coverage.gdb
...and that's it - I was able to generate the .gcda files in my local filesystem, and then see coverage results over gcov and lcov.
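Before handing the dumped files to gcov/lcov, it can help to sanity-check that each dump really starts where a .gcda file should. A minimal Python sketch, assuming (as in GCC's gcov-io.h) that a .gcda file begins with the 32-bit magic 0x67636461 ("gcda") in the target's byte order, followed by 32-bit version and stamp words; check_gcda is a hypothetical helper.

```python
import struct

GCOV_DATA_MAGIC = 0x67636461  # "gcda"


def check_gcda(data):
    """Return 'little' or 'big' if `data` starts with the .gcda magic, else raise."""
    if len(data) < 12:
        raise ValueError("too short for a .gcda header (magic, version, stamp)")
    for order, fmt in (("little", "<I"), ("big", ">I")):
        if struct.unpack_from(fmt, data)[0] == GCOV_DATA_MAGIC:
            return order
    raise ValueError("missing gcda magic: wrong dump range or not a .gcda file?")
```

On a little-endian target the file's first four bytes therefore read adcg; if you see anything else, the breakpoint or the dump range is probably off.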
UPDATE: I wrote a blog post showing the process in detail.

Reading files using Contiki on MSP430F5438A

I want to read a file from my computer and display it on the board, but we are facing a problem while reading the file: the board keeps resetting (the watchdog is off).
Can anyone help?
Here are the steps to upload a file:
If you want to read a file from your local drive, the only way to do it is to first upload the file into the Coffee file system (CFS) and then read it using CFS library functions such as cfs_open, cfs_seek, and cfs_read. For reference, have a look at this link:
https://github.com/contiki-os/contiki/wiki/Coffee-filesystem-guide
Modify the program's .c file you are working on to initialize the base64 and coffee commands in the shell by adding:
shell_base64_init();
shell_coffee_init();
Compile and upload via the command:
make TARGET=<the platform you are using> example.upload
To upload a .txt file (so it can then be read), add the following rule to your Makefile:
%.shell-upload: %.txt
	(echo; sleep 4; echo "~K"; sleep 4; \
	echo "dec64 | write $*.txt | null"; sleep 4; \
	../../tools/base64-encode < $<; sleep 4; \
	echo ""; echo "~K"; echo "read $*.txt | size"; sleep 4) | make login
Now you can upload any .txt file to the coffee filesystem of the currently connected mote node using the command:
make testfile.shell-upload
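For reference, here is the sequence of shell input that the %.shell-upload rule pipes into make login, sketched in Python with each sleep kept as a delay. It assumes ../../tools/base64-encode produces standard base64, approximated here with base64.encodebytes; upload_steps is a hypothetical helper that only builds the steps and does not talk to the mote.

```python
import base64


def upload_steps(name, contents, delay=4):
    """(text, seconds_to_wait_after) pairs mirroring the %.shell-upload recipe."""
    return [
        ("\n", delay),                                         # echo; sleep 4
        ("~K\n", delay),                                       # echo "~K"; sleep 4
        (f"dec64 | write {name} | null\n", delay),             # decode into CFS
        (base64.encodebytes(contents).decode("ascii"), delay),  # file body
        ("\n", 0),                                             # echo ""
        ("~K\n", 0),                                           # echo "~K"
        (f"read {name} | size\n", delay),                      # report stored size
    ]
```

Walking through the list makes it easier to see where a reset interrupts the transfer, for example if the board reboots during the base64 step.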
Hope that solves your problem.