Perl6: large gzipped files read line by line - gzip

I'm trying to read a gz file line by line in Perl6, however, I'm getting blocked:
How to read gz file line by line in Perl6 however, this method, reading everything into :out uses far too much RAM to be usable except on very small files.
I don't understand how to use Perl6's Compress::Zlib to get everything line by line, although I opened an issue on their github https://github.com/retupmoca/P6-Compress-Zlib/issues/17
I'm trying Perl5's Compress::Zlib to translate this code, which works perfectly in Perl5:
use Compress::Zlib;
my $file = "data.txt.gz";
my $gz = gzopen($file, "rb") or die "Error reading $file: $gzerrno";
while ($gz->gzreadline($_) > 0) {
# Process the line read in $_
}
die "Error reading $file: $gzerrno" if $gzerrno != Z_STREAM_END ;
$gz->gzclose() ;
to something like this using Inline::Perl5 in Perl6:
use Compress::Zlib:from<Perl5>;
my $file = 'chrMT.1.vcf.gz';
my $gz = Compress::Zlib::new(gzopen($file, 'r');
while ($gz.gzreadline($_) > 0) {
print $_;
}
$gz.gzclose();
but I can't see how to translate this :(
I'm confused by Lib::Archive example https://github.com/frithnanth/perl6-Archive-Libarchive/blob/master/examples/readfile.p6 I don't see how I can get something like item 3 here
There should be something like
for $file.IO.lines(gz) -> $line { or something like that in Perl6, if it exists, I can't find it.
How can I read a large file line by line without reading everything into RAM in Perl6?

Update Now tested, which revealed an error, now fixed.
Solution #2
use Compress::Zlib;
my $file = "data.txt.gz" ;
my $handle = try open $file or die "Error reading $file: $!" ;
my $zwrap = zwrap($handle, :gzip) ;
for $zwrap.lines {
.print
}
CATCH { default { die "Error reading $file: $_" } }
$handle.close ;
I've tested this with a small gzipped text file.
I don't know much about gzip etc. but figured this out based on:
Knowing P6;
Reading Compress::Zlib's README and choosing the zwrap routine;
Looking at the module's source code, in particular the signature of the zwrap routine our sub zwrap ($thing, :$zlib, :$deflate, :$gzip);
And trial and error, mainly to guess that I needed to pass the :gzip adverb.
Please comment on whether my code works for you. I'm guessing the main thing is whether it's fast enough for the large files you have.
A failed attempt at solution #5
With solution #2 working I would have expected to be able to write just:
use Compress::Zlib ;
.print for "data.txt.gz".&zwrap(:gzip).lines ;
But that fails with:
No such method 'eof' for invocant of type 'IO::Path'
This is presumably because this module was written before the reorganization of the IO classes.
That led me to #MattOates' IO::Handle like object with .lines ? issue. I note no response and I saw no related repo at https://github.com/MattOates?tab=repositories.

I am focusing on the Inline::Perl5 solution that you tried.
For the call to $gz.gzreadline($_): it seems like gzreadline tries to return the line read from the zip file by modifying its input argument $_ (treated as an output argument, but it is not a true Perl 5 reference variable[1]), but the modified value is not returned to the Perl 6 script.
Here is a possoble workaround:
Create a wrapper module in the curent directory, e.g. ./MyZlibWrapper.pm:
package MyZlibWrapper;
use strict;
use warnings;
use Compress::Zlib ();
use Exporter qw(import);
our #EXPORT = qw(gzopen);
our $VERSION = 0.01;
sub gzopen {
my ( $fn, $mode ) = #_;
my $gz = Compress::Zlib::gzopen( $fn, $mode );
my $self = {gz => $gz};
return bless $self, __PACKAGE__;
}
sub gzreadline {
my ( $self ) = #_;
my $line = "";
my $res = $self->{gz}->gzreadline($line);
return [$res, $line];
}
sub gzclose {
my ( $self ) = #_;
$self->{gz}->gzclose();
}
1;
Then use Inline::Perl5 on this wrapper module instead of Compress::Zlib. For example ./p.p6:
use v6;
use lib:from<Perl5> '.';
use MyZlibWrapper:from<Perl5>;
my $file = 'data.txt.gz';
my $mode = 'rb';
my $gz = gzopen($file, $mode);
loop {
my ($res, $line) = $gz.gzreadline();
last if $res == 0;
print $line;
}
$gz.gzclose();
[1]
In Perl 5 you can modify an input argument that is not a reference, and the change will be reflected in the caller. This is done by modifying entries in the special #_ array variable. For example: sub quote { $_[0] = "'$_[0]'" } $str = "Hello"; quote($str) will quote $str even if $str is not passed by reference.

Related

Perl6: check if STDIN has data

In my Perl 6 script, I want to do a (preferably non-blocking) check of standard input to see if data is available. If this is the case, then I want to process it, otherwise I want to do other stuff.
Example (consumer.p6):
#!/usr/bin/perl6
use v6.b;
use fatal;
sub MAIN() returns UInt:D {
while !$*IN.eof {
if some_fancy_check_for_STDIN() { #TODO: this needs to be done.
for $*IN.lines -> $line {
say "Process '$line'";
}
}
say "Do something Else.";
}
say "I'm done.";
return 0;
}
As a STDIN-Generator I wrote another Perl6 script (producer.p6):
#!/usr/bin/perl6
use v6.b;
use fatal;
sub MAIN() returns UInt:D {
$*OUT.say("aaaa aaa");
sleep-until now+2;
$*OUT.say("nbbasdf");
sleep-until now+2;
$*OUT.say("xxxxx");
sleep-until now+2;
return 0;
}
If consumer.p6 works as expected, it should produce the following output, if called via ./producer.p6 | ./consumer.p6:
Process 'aaaa aaa'
Do something Else.
Process 'nbbasdf'
Do something Else.
Process 'xxxxx'
Do something Else.
I'm done.
But actually, it produces the following output (if the if condition is commented out):
Process 'aaaa aaa'
Process 'nbbasdf'
Process 'xxxxx'
Do something Else.
I'm done.
You are using an old version of Perl 6, as v6.b is from before the official release of the language.
So some of what I have below may need a newer version to work.
Also why are you using sleep-until now+2 instead of sleep 2?
One way to do this is to turn the .lines into a Channel, then you can use .poll.
#!/usr/bin/env perl6
use v6.c;
sub MAIN () {
# convert it into a Channel so we can poll it
my $lines = $*IN.Supply.lines.Channel;
my $running = True;
$lines.closed.then: {$running = False}
while $running {
with $lines.poll() -> $line {
say "Process '$line'";
}
say "Do something Else.";
sleep ½;
}
say "I'm done.";
}
Note that the code above blocks at the my $lines = … line currently; so it doesn't start doing something until the first line comes in. To get around that you could do the following
my $lines = supply {
# unblock the $*IN.Supply.lines call
whenever start $*IN.Supply {
whenever .lines { .emit }
}
}.Channel;

Perl web server: How to route

As seen in my code below, I am using apache to serve my Perl web server. I need Perl to have multple routes for my client as seen in my %dispatch. If I figure one out I'm sure the rest will be very similar. If we look at my Subroutine sub resp_index, how can I modify this to link to my index.html file located in my root: /var/www/perl directory?
/var/www/perl/perlServer.pl:
#!/usr/bin/perl
{
package MyWebServer;
use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);
my %dispatch = (
'/index.html' => \&resp_index,
# ...
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi);
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html('Not found'),
$cgi->h1('Not found'),
$cgi->end_html;
}
}
sub resp_index {
my $cgi = shift; # CGI.pm object
return if !ref $cgi;
my $who = $cgi->param('name');
print $cgi->header,
$cgi->start_html("index"),
$cgi-h1("THIS IS INDEX"),
$cgi->end_html;
}
}
my $pid = MyWebServer->new()->background();
print "Use 'kill $pid' to stop server.\n";
I think what you're asking is how do you serve a file from your web server? Open it and print it, like any other file.
use autodie;
sub resp_index {
my $cgi = shift;
return if !ref $cgi;
print $cgi->header;
open my $fh, "<", "/var/www/perl/index.html";
print <$fh>;
}
Unless this is an exercise, really, really, REALLY don't write your own web framework. It's going to be slow, buggy, and insecure. Consider a small routing framework like Dancer.
For example, mixing documents like index.html and executable code like perlServer.pl in the same directory invites a security hole. Executable code should be isolated in their own directory so they can be given wholly different permissions and stronger protection.
Let's talk about this line...
return if !ref $cgi;
This line is hiding an error. If your functions are passed the wrong argument, or no argument, it will silently return and you (or the person using this) will have no idea why nothing happened. This should be an error...
use Carp;
croak "resp_index() was not given a CGI object" if !ref $cgi;
...but really you should use one of the existing function signature modules such as Method::Signatures.
use Method::Signatures;
func resp_index(CGI $cgi) {
...
}

Rancid/ Looking Glass perl script hitting an odd error: $router unavailable

I am attempting to set up a small test environment (homelab) using CentOS 6.6, Rancid 3.1, Looking Glass, and some Cisco Switches/Routers, with httpd acting as the handler. I have picked up a little perl by means of this endeavor, but python (more 2 than 3) is my background. Right now, everything on the rancid side of things works without issue: bin/clogin successfully logs into all of the equipment in the router.db file, and logging of the configs is working as expected. All switches/routers to be accessed are available and online, verified by ssh connection to devices as well as using bin/clogin.
Right now, I have placed the lg.cgi and lgform.cgi files into var/www/cgi-bin/ which allows the forms to be run as cgi scripts. I had to modify the files to split on ';' instead of ':' due to the change in the .db file in Rancid 3.1:#record = split('\:', $_); was replaced with: #record = split('\;', $_); etc. Once that change was made, I was able to load the lgform.cgi with the proper router.db parsing. At this point, it seemed like everything should be good to go. When I attempt to ping from one of those devices out to 8.8.8.8, the file correctly redirects to lg.cgi, and the page loads, but with
main is unavailable. Try again later.
as the error, where 'main' is the router hostname. Using this output, I was able to find the function responsible for this output. Here it is before I added anything:
sub DoRsh
{
my ($router, $mfg, $cmd, $arg) = #_;
my($ctime) = time();
my($val);
my($lckobj) = LockFile::Simple->make(-delay => $lock_int,
-max => $max_lock_wait, -hold => $max_lock_hold);
if ($pingcmd =~ /\d$/) {
`$pingcmd $router`;
} else {
`$pingcmd $router 56 1`;
}
if ($?) {
print "$router is unreachable. Try again later.\n";
return(-1);
}
if ($LG_SINGLE) {
if (! $lckobj->lock("$cache_dir/$router")) {
print "$router is busy. Try again later.\n";
return(-1);
}
}
$val = &DoCmd($router, $mfg, $cmd, $arg);
if ($LG_SINGLE) {
$lckobj->unlock("$cache_dir/$router");
}
return($val);
}
In order to dig in a little deeper, I peppered that function with several print statements. Here is the modified function, followed by the output from the loaded lg.cgi page:
sub DoRsh
{
my ($router, $mfg, $cmd, $arg) = #_;
my($ctime) = time();
my($val);
my($lckobj) = LockFile::Simple->make(-delay => $lock_int,
-max => $max_lock_wait, -hold => $max_lock_hold);
if ($pingcmd =~ /\d$/) {
`$pingcmd $router`;
} else {
`$pingcmd $router 56 1`;
}
print "About to test the ($?) branch.\n";
print "Also who is the remote_user?:' $remote_user'\n";
print "What about the ENV{REMOTE_USER} '$ENV{REMOTE_USER}'\n";
print "Here is the ENV{HOME}: '$ENV{HOME}'\n";
if ($?) {
print "$lckobj is the lock object.\n";
print "#_ something else to look at.\n";
print "$? whatever this is suppose to be....\n";
print "Some variables:\n";
print "$mfg is the mfg.\n";
print "$cmd was the command passed in with $arg as the argument.\n";
print "$pingcmd $router\n";
print "$cloginrc - Is the cloginrc pointing correctly?\n";
print "$LG_SINGLE the next value to be tested.\n";
print "$router is unreachable. Try again later.\n";
return(-1);
}
if ($LG_SINGLE) {
if (! $lckobj->lock("$cache_dir/$router")) {
print "$router is busy. Try again later.\n";
return(-1);
}
}
$val = &DoCmd($router, $mfg, $cmd, $arg);
if ($LG_SINGLE) {
$lckobj->unlock("$cache_dir/$router");
}
return($val);
}
OUTPUT:
About to test the (512) branch.
Also who is the remote_user?:' '
What about the ENV{REMOTE_USER} ''
Here is the ENV{HOME}: '.'
LockFile::Simple=HASH(0x1a13650) is the lock object.
main cisco ping 8.8.8.8 something else to look at.
512 whatever this is suppose to be....
Some variables:
cisco is the mfg.
ping was the command passed in with 8.8.8.8 as the argument.
/bin/ping -c 1 main
./.cloginrc - Is the cloginrc pointing correctly?
1 the next value to be tested.
main is unreachable. Try again later.
I can provide the code for when DoRsh is called, if necessary, but it looks mostly like this:&DoRsh($router, $mfg, $cmd, $arg);.
From what I can tell the '$?' special variable (or at least according to
this reference it is a special var) is returning the 512 value, which is causing that fork to test true. The problem is I don't know what that 512 means, nor where it is coming from. Using the ref site's description ("The status returned by the last pipe close, backtick (``) command, or system operator.") and the formation of the conditional tree above, I can see that it is some error of some kind, but I don't know how else to proceed with this inspection. I'm wondering if maybe it is in response to some permission issue, since the remote_user variable is null, when I didn't expect it to be. Any guidance anyone may be able to provide would be helpful. Furthermore, if there is any information that I may have skipped over, that I didn't think to include, or that may prove helpful, please ask, and I will provide to the best of my ability
May be you put in something like
my $pingret=$pingcmd ...;
print 'Ping result was:'.$pingret;
And check the returned strings?

Perl script to export sql query to csv

The code below works, but all of the data displays in one row(but different columns) when opened in Excel. The query SHOULD display the data headings, row 1, and row 2. Also, when I open the file, I get a warning that says "The file you are trying to open,'xxxx.csv', is in a different format than specified by the file extension. Verify that the file is not corrupted...etc. Do you want to open the file now?" Anyway to fix that? That may be the cause too.
tldr; export to csv with multiple rows - not just one. fix Excel error. Thanks!
#!/usr/bin/perl
use warnings;
use DBI;
use Text::CSV;
# local time variables
($sec,$min,$hr,$mday,$mon,$year) = localtime(time);
$mon++;
$year += 1900;
# set name of database to connect to
$database=MDLSDB1;
# connection to the database
my $dbh = DBI->connect("dbi:Oracle:$database", "", "")
or die "Can't make database connect: $DBI::errstr\n";
# some settings that you usually want for oracle 10
$dbh->{LongReadLen} = 65535;
$dbh->{PrintError} = 0;
# sql statement to run
$sql="select * from eg.well where rownum < 3";
my $sth = $dbh->prepare($sql);
$sth->execute();
my $csv = Text::CSV->new ( { binary => 1 } )
or die "Cannot use CSV: ".Text::CSV->error_diag ();
open my $fh, ">:raw", "results-$year-$mon-$mday-$hr.$min.$sec.csv";
$csv->print($fh, $sth->{NAME});
while(my $row = $sth->fetchrow_arrayref){
$csv->print($fh, $row);
}
close $fh or die "Failed to write CSV: $!";
while(my $row = $sth->fetchrow_arrayref){
$csv->print($fh, $row);
$csv->print($fh, "\n");
}
CSV rows are delimited by newlines. Just simply add a newline after each row.
I think another solution is to use the instantiation of the Text::CSV object and pass along the desired line termination there...
my $csv = Text::CSV->new ( { binary => 1 } )
or die "Cannot use CSV: " . Text::CSV->error_diag();
becomes:
my $csv = Text::CSV->new({ binary => 1, eol => "\r\n" })
or die "Cannot use CSV: " . Text::CSV->error_diag();

How can I implement incremental (find-as-you-type) search on command line?

I'd like to write small scripts which feature incremental search (find-as-you-type) on the command line.
Use case: I have my mobile phone connected via USB, Using gammu --sendsms TEXT I can write text messages. I have the phonebook as CSV, and want to search-as-i-type on that.
What's the easiest/best way to do it? It might be in bash/zsh/Perl/Python or any other scripting language.
Edit:
Solution: Modifying Term::Complete did what I want. See below for the answer.
I get the impression GNU Readline supports this kind of thing. Though, I have not used it myself. Here is a C++ example of custom auto complete, which could easily be done in C too. There is also a Python API for readline.
This StackOverflow question gives examples in Python, one of which is ...
import readline
def completer(text, state):
options = [x in addrs where x.startswith(text)]
if state < options.length:
return options[state]
else
return None
readline.set_completer(completer)
this article on Bash autocompletion may help. This article also gives examples of programming bash's auto complete feature.
Following Aiden Bell's hint, I tried Readline in Perl.
Solution 1 using Term::Complete (also used by CPAN, I think):
use Term::Complete;
my $F;
open($F,"<","bin/phonebook.csv");
my #terms = <$F>; chomp(#terms);
close($F);
my $input;
while (!defined $input) {
$input = Complete("Enter a name or number: ",#terms);
my ($name,$number) = split(/\t/,$input);
print("Sending SMS to $name ($number).\n");
system("sudo gammu --sendsms TEXT $number");
}
Press \ to complete, press Ctrl-D to see all possibilities.
Solution 2: Ctrl-D is one keystroke to much, so using standard Term::Readline allows completion and the display off possible completions using only \.
use Term::ReadLine;
my $F;
open($F,"<","bin/phonebook.csv");
my #terms = <$F>; chomp(#terms);
close($F);
my $term = new Term::ReadLine;
$term->Attribs->{completion_function} = sub { return #terms; };
my $prompt = "Enter name or number >> ";
my $OUT = $term->OUT || \*STDOUT;
while ( defined (my $input = $term->readline($prompt)) ) {
my ($name,$number) = split(/\t/,$input);
print("Sending SMS to $name ($number).\n");
system("sudo gammu --sendsms TEXT $number");
}
This solution still needs a for completion.
Edit: Final Solution
Modifying Term::Complete (http://search.cpan.org/~jesse/perl-5.12.0/lib/Term/Complete.pm) does give me on the fly completion.
Source code: http://search.cpan.org/CPAN/authors/id/J/JE/JESSE/perl-5.12.0.tar.gz
Solution number 1 works with this modification. I will put the whole sample online somewhere else if this can be used by somebody
Modifications of Completion.pm (just reusing it's code for Control-D and \ for each character):
170c172,189
my $redo=0;
#match = grep(/^\Q$return/, #cmp_lst);
unless ($#match < 0) {
$l = length($test = shift(#match));
foreach $cmp (#match) {
until (substr($cmp, 0, $l) eq substr($test, 0, $l)) {
$l--;
}
}
print("\a");
print($test = substr($test, $r, $l - $r));
$redo = $l - $r == 0;
if ($redo) { print(join("\r\n", '', grep(/^\Q$return/, #cmp_lst)), "\r\n"); }
$r = length($return .= $test);
}
if ($redo) { redo LOOP; } else { last CASE; }