Mass PDF form fill - pdf

I need to fill in a single PDF template multiple times and concat the results. When I say multiple, I mean up to a few hundred times, potentially over one thousand.
I can do this with pdftk fill_form, one by one, and then use pdftk cat. We can parallelize this fairly easily.
I'm curious if this is the only option, or if there is a piece of software (Linux + OSX, command line) that will allow me to say "take this template, and these sets of fields, fill out this form, and concat the files" so I can avoid doing every one individually. Then again, if something does exist, but it's not any faster than just doing the fork parallelization method, then it's probably not worth it.

My Perl library CAM::PDF can do this. The form filling is a bit weak (it doesn't support checkboxes, for example) but the concatenation works great.
#perl -w
use strict;
use CAM::PDF;
my $infile = 'in.pdf';
my $outfile = 'out.pdf';
my #fills = (
{ name => 'John' },
{ name => 'Fred' },
);
my $pdf = CAM::PDF->new($infile) or die $CAM::PDF::errstr;
for my $i (0 .. #fills-1) {
my $filledPDF = $i == 0 ? $pdf : CAM::PDF->new($infile);
$filledPDF->fillFormFields(%{$fills[$i]});
if ($i > 0) {
$pdf->appendPDF($filledPDF);
}
}
$pdf->cleanoutput($outfile) or die;

Related

Perl6: large gzipped files read line by line

I'm trying to read a gz file line by line in Perl6, however, I'm getting blocked:
How to read gz file line by line in Perl6 however, this method, reading everything into :out uses far too much RAM to be usable except on very small files.
I don't understand how to use Perl6's Compress::Zlib to get everything line by line, although I opened an issue on their github https://github.com/retupmoca/P6-Compress-Zlib/issues/17
I'm trying Perl5's Compress::Zlib to translate this code, which works perfectly in Perl5:
use Compress::Zlib;
my $file = "data.txt.gz";
my $gz = gzopen($file, "rb") or die "Error reading $file: $gzerrno";
while ($gz->gzreadline($_) > 0) {
# Process the line read in $_
}
die "Error reading $file: $gzerrno" if $gzerrno != Z_STREAM_END ;
$gz->gzclose() ;
to something like this using Inline::Perl5 in Perl6:
use Compress::Zlib:from<Perl5>;
my $file = 'chrMT.1.vcf.gz';
my $gz = Compress::Zlib::new(gzopen($file, 'r');
while ($gz.gzreadline($_) > 0) {
print $_;
}
$gz.gzclose();
but I can't see how to translate this :(
I'm confused by Lib::Archive example https://github.com/frithnanth/perl6-Archive-Libarchive/blob/master/examples/readfile.p6 I don't see how I can get something like item 3 here
There should be something like
for $file.IO.lines(gz) -> $line { or something like that in Perl6, if it exists, I can't find it.
How can I read a large file line by line without reading everything into RAM in Perl6?
Update Now tested, which revealed an error, now fixed.
Solution #2
use Compress::Zlib;
my $file = "data.txt.gz" ;
my $handle = try open $file or die "Error reading $file: $!" ;
my $zwrap = zwrap($handle, :gzip) ;
for $zwrap.lines {
.print
}
CATCH { default { die "Error reading $file: $_" } }
$handle.close ;
I've tested this with a small gzipped text file.
I don't know much about gzip etc. but figured this out based on:
Knowing P6;
Reading Compress::Zlib's README and choosing the zwrap routine;
Looking at the module's source code, in particular the signature of the zwrap routine our sub zwrap ($thing, :$zlib, :$deflate, :$gzip);
And trial and error, mainly to guess that I needed to pass the :gzip adverb.
Please comment on whether my code works for you. I'm guessing the main thing is whether it's fast enough for the large files you have.
A failed attempt at solution #5
With solution #2 working I would have expected to be able to write just:
use Compress::Zlib ;
.print for "data.txt.gz".&zwrap(:gzip).lines ;
But that fails with:
No such method 'eof' for invocant of type 'IO::Path'
This is presumably because this module was written before the reorganization of the IO classes.
That led me to #MattOates' IO::Handle like object with .lines ? issue. I note no response and I saw no related repo at https://github.com/MattOates?tab=repositories.
I am focusing on the Inline::Perl5 solution that you tried.
For the call to $gz.gzreadline($_): it seems like gzreadline tries to return the line read from the zip file by modifying its input argument $_ (treated as an output argument, but it is not a true Perl 5 reference variable[1]), but the modified value is not returned to the Perl 6 script.
Here is a possoble workaround:
Create a wrapper module in the curent directory, e.g. ./MyZlibWrapper.pm:
package MyZlibWrapper;
use strict;
use warnings;
use Compress::Zlib ();
use Exporter qw(import);
our #EXPORT = qw(gzopen);
our $VERSION = 0.01;
sub gzopen {
my ( $fn, $mode ) = #_;
my $gz = Compress::Zlib::gzopen( $fn, $mode );
my $self = {gz => $gz};
return bless $self, __PACKAGE__;
}
sub gzreadline {
my ( $self ) = #_;
my $line = "";
my $res = $self->{gz}->gzreadline($line);
return [$res, $line];
}
sub gzclose {
my ( $self ) = #_;
$self->{gz}->gzclose();
}
1;
Then use Inline::Perl5 on this wrapper module instead of Compress::Zlib. For example ./p.p6:
use v6;
use lib:from<Perl5> '.';
use MyZlibWrapper:from<Perl5>;
my $file = 'data.txt.gz';
my $mode = 'rb';
my $gz = gzopen($file, $mode);
loop {
my ($res, $line) = $gz.gzreadline();
last if $res == 0;
print $line;
}
$gz.gzclose();
[1]
In Perl 5 you can modify an input argument that is not a reference, and the change will be reflected in the caller. This is done by modifying entries in the special #_ array variable. For example: sub quote { $_[0] = "'$_[0]'" } $str = "Hello"; quote($str) will quote $str even if $str is not passed by reference.

How to get/set Trello custom fields using the API?

I'm already in love with the Custom Fields feature in Trello. Is there a way to get and set custom fields via the API?
I tried using the get field API call to get a field (on a board with a custom field defined called "MyCustomField"):
curl "https://api.trello.com/1/cards/57c473503a5ef0b76fddd0e5/MyCustomField?key=${TRELLO_API_KEY}&token=${TRELLO_OAUTH_TOKEN}"
to no avail.
The Custom Fields API from Trello is now officially available (announcement blog post here).
It allows users to manipulate both custom field items of boards and custom field item values on cards.
Custom Fields API documentation
Getting customFieldItems For Cards
Setting & Updating CustomFieldItems
This is just to add to bdwakefield's answer. His solution involves hard coding the names of the field ids.
If you want to also retrieve the name of the fields themselves (for example get that "ZIn76ljn-4yeYvz" is actually "priority" in Trello, without needing to hard code it, call the following end point:
boards/{trello board id}/pluginData
This will return an array with the plugins information and in one of the array items, it will include a line along the lines of:
[value] => {"fields":[{"n":"~custom field name~:","t":0,"b":1,"id":"~custom field id that is the weird stuff at the card level~","friendlyType":"Text"}]}
So you just need to figure out the plugin for custom fields in your case, and you can retrieve the key -> value pair for the custom field name and the id associated with it.
It means that if you add / remove fields, or rename them, you can handle it at run time vs changing your code.
This will also give you the options for the custom field (when it is a dropdown like in bdwakefield's example above).
So I have a "sort of" answer to this. It requires some hackery on your part to make it work and there is more than a little manual upkeep as you add properties or values -- but it works.
I am doing this in powershell (I am NOT well versed in ps just yet and this my first really 'big' script that I have pulled together for it) since my intent is to use this with TFS Builds to automate moving some cards around and creating release notes. We are using custom fields to help us classify the card and note estimate/actual hours etc. I used this guys work as a basis for my own scripting. I am not including everything but you should be able to piece everything together.
I have left out everything with connecting to Trello and all that. I have a bunch of other functions for gettings lists, moving cards, adding comments etc. The ps module I linked above has a lot of that built in as well.
function Get-TrelloCardPluginData
{
[CmdletBinding()]
param
(
[Parameter(Mandatory,ValueFromPipelineByPropertyName)]
[ValidateNotNullOrEmpty()]
[Alias('Id')]
[string]$CardId
)
begin
{
$ErrorActionPreference = 'Stop'
}
process
{
try
{
$uri = "$baseUrl/cards/$CardId/pluginData?$($trelloConfig.String)"
$result = Invoke-RestMethod -Uri $uri -Method GET
return $result
}
catch
{
Write-Error $_.Exception.Message
}
}
}
You'll get data that looks like this:
#{id=582b5ec8df1572e572411513; idPlugin=56d5e249a98895a9797bebb9;
scope=card; idModel=58263201749710ed3c706bef;
value={"fields":{"ZIn76ljn-4yeYvz":2,"ZIn76ljn-c2yhZH":1}};
access=shared}
#{id=5834536fcff0525f26f9e53b; idPlugin=56d5e249a98895a9797bebb9;
scope=card; idModel=567031ea6a01f722978b795d;
value={"fields":{"ZIn76ljn-4yeYvz":4,"ZIn76ljn-c2yhZH":3}};
access=shared}
The fields collection is basically key/pair. The random characters correspond to the property and the value after that is what was set on the custom property. In this case it is an 'index' for the value in the dropdown. These two fields have a 'priority' (low, medium, high) and a 'classification' (Bug, Change Request, etc) for us. (We are using labels for something else).
So you'll have to create another fucntion where you can parse this data out. I am sure there are better ways to do it -- but this is what I have now:
function Get-TrelloCustomPropertyData($propertyData)
{
$data = $propertyData.Replace('{"fields":{', '')
$data = $data.Replace('}}', '')
$data = $data.Replace('"', '')
$sepone = ","
$septwo = ":"
$options = [System.StringSplitOptions]::RemoveEmptyEntries
$obj = $data.Split($sepone, $options)
$cardCustomFields = Get-TrelloCustomFieldObject
foreach($pair in $obj)
{
$field = $pair.Split($septwo,$options)
if (-Not [string]::IsNullOrWhiteSpace($field[0].Trim()))
{
switch($field[0].Trim())
{
'ZIn76ljn-4yeYvz' {
switch($field[1].Trim())
{
'1'{
$cardCustomFields.Priority = "Critical"
}
'2'{
$cardCustomFields.Priority = "High"
}
'3'{
$cardCustomFields.Priority = "Medium"
}
'4'{
$cardCustomFields.Priority = "Low"
}
}
}
'ZIn76ljn-c2yhZH' {
switch($field[1].Trim())
{
'1'{
$cardCustomFields.Classification = "Bug"
}
'2'{
$cardCustomFields.Classification = "Change Request"
}
'3'{
$cardCustomFields.Classification = "New Development"
}
}
}
'ZIn76ljn-uJyxzA'{$cardCustomFields.Estimated = $field[1].Trim()}
'ZIn76ljn-AwYurD'{$cardCustomFields.Actual = $field[1].Trim()}
}
}
}
return $cardCustomFields
}
Get-TrelloCustomFieldObject is another ps function that I set up to build an object based on the properties I know that I have defined.
function Get-TrelloCustomFieldObject
{
[CmdletBinding()]
param()
begin
{
$ErrorActionPreference = 'Stop'
}
process
{
$ccf = New-Object System.Object
$ccf | Add-Member -type NoteProperty -name Priority -value "None"
$ccf | Add-Member -type NoteProperty -name Classification -value "None"
$ccf | Add-Member -type NoteProperty -name Estimated -value ""
$ccf | Add-Member -type NoteProperty -name Actual -value ""
return $ccf
}
}

Rancid/ Looking Glass perl script hitting an odd error: $router unavailable

I am attempting to set up a small test environment (homelab) using CentOS 6.6, Rancid 3.1, Looking Glass, and some Cisco Switches/Routers, with httpd acting as the handler. I have picked up a little perl by means of this endeavor, but python (more 2 than 3) is my background. Right now, everything on the rancid side of things works without issue: bin/clogin successfully logs into all of the equipment in the router.db file, and logging of the configs is working as expected. All switches/routers to be accessed are available and online, verified by ssh connection to devices as well as using bin/clogin.
Right now, I have placed the lg.cgi and lgform.cgi files into var/www/cgi-bin/ which allows the forms to be run as cgi scripts. I had to modify the files to split on ';' instead of ':' due to the change in the .db file in Rancid 3.1:#record = split('\:', $_); was replaced with: #record = split('\;', $_); etc. Once that change was made, I was able to load the lgform.cgi with the proper router.db parsing. At this point, it seemed like everything should be good to go. When I attempt to ping from one of those devices out to 8.8.8.8, the file correctly redirects to lg.cgi, and the page loads, but with
main is unavailable. Try again later.
as the error, where 'main' is the router hostname. Using this output, I was able to find the function responsible for this output. Here it is before I added anything:
sub DoRsh
{
my ($router, $mfg, $cmd, $arg) = #_;
my($ctime) = time();
my($val);
my($lckobj) = LockFile::Simple->make(-delay => $lock_int,
-max => $max_lock_wait, -hold => $max_lock_hold);
if ($pingcmd =~ /\d$/) {
`$pingcmd $router`;
} else {
`$pingcmd $router 56 1`;
}
if ($?) {
print "$router is unreachable. Try again later.\n";
return(-1);
}
if ($LG_SINGLE) {
if (! $lckobj->lock("$cache_dir/$router")) {
print "$router is busy. Try again later.\n";
return(-1);
}
}
$val = &DoCmd($router, $mfg, $cmd, $arg);
if ($LG_SINGLE) {
$lckobj->unlock("$cache_dir/$router");
}
return($val);
}
In order to dig in a little deeper, I peppered that function with several print statements. Here is the modified function, followed by the output from the loaded lg.cgi page:
sub DoRsh
{
my ($router, $mfg, $cmd, $arg) = #_;
my($ctime) = time();
my($val);
my($lckobj) = LockFile::Simple->make(-delay => $lock_int,
-max => $max_lock_wait, -hold => $max_lock_hold);
if ($pingcmd =~ /\d$/) {
`$pingcmd $router`;
} else {
`$pingcmd $router 56 1`;
}
print "About to test the ($?) branch.\n";
print "Also who is the remote_user?:' $remote_user'\n";
print "What about the ENV{REMOTE_USER} '$ENV{REMOTE_USER}'\n";
print "Here is the ENV{HOME}: '$ENV{HOME}'\n";
if ($?) {
print "$lckobj is the lock object.\n";
print "#_ something else to look at.\n";
print "$? whatever this is suppose to be....\n";
print "Some variables:\n";
print "$mfg is the mfg.\n";
print "$cmd was the command passed in with $arg as the argument.\n";
print "$pingcmd $router\n";
print "$cloginrc - Is the cloginrc pointing correctly?\n";
print "$LG_SINGLE the next value to be tested.\n";
print "$router is unreachable. Try again later.\n";
return(-1);
}
if ($LG_SINGLE) {
if (! $lckobj->lock("$cache_dir/$router")) {
print "$router is busy. Try again later.\n";
return(-1);
}
}
$val = &DoCmd($router, $mfg, $cmd, $arg);
if ($LG_SINGLE) {
$lckobj->unlock("$cache_dir/$router");
}
return($val);
}
OUTPUT:
About to test the (512) branch.
Also who is the remote_user?:' '
What about the ENV{REMOTE_USER} ''
Here is the ENV{HOME}: '.'
LockFile::Simple=HASH(0x1a13650) is the lock object.
main cisco ping 8.8.8.8 something else to look at.
512 whatever this is suppose to be....
Some variables:
cisco is the mfg.
ping was the command passed in with 8.8.8.8 as the argument.
/bin/ping -c 1 main
./.cloginrc - Is the cloginrc pointing correctly?
1 the next value to be tested.
main is unreachable. Try again later.
I can provide the code for when DoRsh is called, if necessary, but it looks mostly like this:&DoRsh($router, $mfg, $cmd, $arg);.
From what I can tell the '$?' special variable (or at least according to
this reference it is a special var) is returning the 512 value, which is causing that fork to test true. The problem is I don't know what that 512 means, nor where it is coming from. Using the ref site's description ("The status returned by the last pipe close, backtick (``) command, or system operator.") and the formation of the conditional tree above, I can see that it is some error of some kind, but I don't know how else to proceed with this inspection. I'm wondering if maybe it is in response to some permission issue, since the remote_user variable is null, when I didn't expect it to be. Any guidance anyone may be able to provide would be helpful. Furthermore, if there is any information that I may have skipped over, that I didn't think to include, or that may prove helpful, please ask, and I will provide to the best of my ability
May be you put in something like
my $pingret=$pingcmd ...;
print 'Ping result was:'.$pingret;
And check the returned strings?

How can I implement incremental (find-as-you-type) search on command line?

I'd like to write small scripts which feature incremental search (find-as-you-type) on the command line.
Use case: I have my mobile phone connected via USB, Using gammu --sendsms TEXT I can write text messages. I have the phonebook as CSV, and want to search-as-i-type on that.
What's the easiest/best way to do it? It might be in bash/zsh/Perl/Python or any other scripting language.
Edit:
Solution: Modifying Term::Complete did what I want. See below for the answer.
I get the impression GNU Readline supports this kind of thing. Though, I have not used it myself. Here is a C++ example of custom auto complete, which could easily be done in C too. There is also a Python API for readline.
This StackOverflow question gives examples in Python, one of which is ...
import readline
def completer(text, state):
options = [x in addrs where x.startswith(text)]
if state < options.length:
return options[state]
else
return None
readline.set_completer(completer)
this article on Bash autocompletion may help. This article also gives examples of programming bash's auto complete feature.
Following Aiden Bell's hint, I tried Readline in Perl.
Solution 1 using Term::Complete (also used by CPAN, I think):
use Term::Complete;
my $F;
open($F,"<","bin/phonebook.csv");
my #terms = <$F>; chomp(#terms);
close($F);
my $input;
while (!defined $input) {
$input = Complete("Enter a name or number: ",#terms);
my ($name,$number) = split(/\t/,$input);
print("Sending SMS to $name ($number).\n");
system("sudo gammu --sendsms TEXT $number");
}
Press \ to complete, press Ctrl-D to see all possibilities.
Solution 2: Ctrl-D is one keystroke to much, so using standard Term::Readline allows completion and the display off possible completions using only \.
use Term::ReadLine;
my $F;
open($F,"<","bin/phonebook.csv");
my #terms = <$F>; chomp(#terms);
close($F);
my $term = new Term::ReadLine;
$term->Attribs->{completion_function} = sub { return #terms; };
my $prompt = "Enter name or number >> ";
my $OUT = $term->OUT || \*STDOUT;
while ( defined (my $input = $term->readline($prompt)) ) {
my ($name,$number) = split(/\t/,$input);
print("Sending SMS to $name ($number).\n");
system("sudo gammu --sendsms TEXT $number");
}
This solution still needs a for completion.
Edit: Final Solution
Modifying Term::Complete (http://search.cpan.org/~jesse/perl-5.12.0/lib/Term/Complete.pm) does give me on the fly completion.
Source code: http://search.cpan.org/CPAN/authors/id/J/JE/JESSE/perl-5.12.0.tar.gz
Solution number 1 works with this modification. I will put the whole sample online somewhere else if this can be used by somebody
Modifications of Completion.pm (just reusing it's code for Control-D and \ for each character):
170c172,189
my $redo=0;
#match = grep(/^\Q$return/, #cmp_lst);
unless ($#match < 0) {
$l = length($test = shift(#match));
foreach $cmp (#match) {
until (substr($cmp, 0, $l) eq substr($test, 0, $l)) {
$l--;
}
}
print("\a");
print($test = substr($test, $r, $l - $r));
$redo = $l - $r == 0;
if ($redo) { print(join("\r\n", '', grep(/^\Q$return/, #cmp_lst)), "\r\n"); }
$r = length($return .= $test);
}
if ($redo) { redo LOOP; } else { last CASE; }

Compiling a week's worth of tweets automatically?

I'd like to be able to run a script that parsed through the twitter page and compiled a list of tweets for a given time period - one week to be more exact. Ideally it should return the results as a html list that could then be posted in a blog. Like here:
http://www.perezfox.com/2009/07/12/the-week-in-tweet-for-2009-07-12/
I'm sure there's a script out there that could do it, unless the guy does it manually (that would be a big pain!). If there is such a script forgive my ignorance.
Thanks.
Use the Twitter search API. For instance, this query returns my tweets between 2009-07-10 and 2009-07-17:
http://search.twitter.com/search.atom?q=from:tormodfj&since=2009-07-10&until=2009-07-17
For anyone that's interested, I hacked together a quick PHP parser that will take the XML output of the above feed and turn it into a nice list. It's sensible if you post a lot of tweets to use the rpp parameter, so that your feed doesn't get clipped at 15. The maximum limit is 100. So by sticking this url into NetNewsWire (or equivalent feed reader):
http://search.twitter.com/search.atom?q=from:yourTwitterAccountHere&since=2009-07-13&until=2009-07-19&rpp=100
and exporting the xml to a hard file, you can use this script:
<?php
$date = "";
$in = 'links.xml'; //tweets
file_exists($in) ? $xml = simplexml_load_file($in) : die ('Failed to open xml data.');
foreach($xml->entry as $item)
{
$newdate = date("dS F", strtotime($item->published));
if ($date == "")
{
echo "<h2>$newdate</h2>\n<ul>\n";
}
elseif ($newdate != $date)
{
echo "</ul>\n<h2>$newdate</h2>\n<ul>\n";
}
echo "<li>\n<p>" . $item->content ." *</p>\n</li>\n";
$date = $newdate;
}
echo "</ul>\n";
?>