Perl web server: How to route - apache

As seen in my code below, I am using apache to serve my Perl web server. I need Perl to have multple routes for my client as seen in my %dispatch. If I figure one out I'm sure the rest will be very similar. If we look at my Subroutine sub resp_index, how can I modify this to link to my index.html file located in my root: /var/www/perl directory?
/var/www/perl/perlServer.pl:
#!/usr/bin/perl
{
package MyWebServer;
use HTTP::Server::Simple::CGI;
use base qw(HTTP::Server::Simple::CGI);
my %dispatch = (
'/index.html' => \&resp_index,
# ...
);
sub handle_request {
my $self = shift;
my $cgi = shift;
my $path = $cgi->path_info();
my $handler = $dispatch{$path};
if (ref($handler) eq "CODE") {
print "HTTP/1.0 200 OK\r\n";
$handler->($cgi);
} else {
print "HTTP/1.0 404 Not found\r\n";
print $cgi->header,
$cgi->start_html('Not found'),
$cgi->h1('Not found'),
$cgi->end_html;
}
}
sub resp_index {
my $cgi = shift; # CGI.pm object
return if !ref $cgi;
my $who = $cgi->param('name');
print $cgi->header,
$cgi->start_html("index"),
$cgi-h1("THIS IS INDEX"),
$cgi->end_html;
}
}
my $pid = MyWebServer->new()->background();
print "Use 'kill $pid' to stop server.\n";

I think what you're asking is how do you serve a file from your web server? Open it and print it, like any other file.
use autodie;
sub resp_index {
my $cgi = shift;
return if !ref $cgi;
print $cgi->header;
open my $fh, "<", "/var/www/perl/index.html";
print <$fh>;
}
Unless this is an exercise, really, really, REALLY don't write your own web framework. It's going to be slow, buggy, and insecure. Consider a small routing framework like Dancer.
For example, mixing documents like index.html and executable code like perlServer.pl in the same directory invites a security hole. Executable code should be isolated in their own directory so they can be given wholly different permissions and stronger protection.
Let's talk about this line...
return if !ref $cgi;
This line is hiding an error. If your functions are passed the wrong argument, or no argument, it will silently return and you (or the person using this) will have no idea why nothing happened. This should be an error...
use Carp;
croak "resp_index() was not given a CGI object" if !ref $cgi;
...but really you should use one of the existing function signature modules such as Method::Signatures.
use Method::Signatures;
func resp_index(CGI $cgi) {
...
}

Related

Perl : Scrape website and how to download PDF files from the website using Perl Selenium:Chrome

So I'm studying Scraping website using Selenium:Chrome on Perl, I just wondering how can I download all pdf files from year 2017 to 2021 and store it into a folder from this website https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021 . So far this is what I've done
use strict;
use warnings;
use Time::Piece;
use POSIX qw(strftime);
use Selenium::Chrome;
use File::Slurp;
use File::Copy qw(copy);
use File::Path;
use File::Path qw(make_path remove_tree);
use LWP::Simple;
my $collection_name = "mre_zen_test3";
make_path("$collection_name");
#DECLARE SELENIUM DRIVER
my $driver = Selenium::Chrome->new;
#NAVIGATE TO SITE
print "trying to get toc_url\n";
$driver->navigate('https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021');
sleep(8);
#GET PAGE SOURCE
my $toc_content = $driver->get_page_source();
$toc_content =~ s/[^\x00-\x7f]//g;
write_file("toc.html", $toc_content);
print "writing toc.html\n";
sleep(5);
$toc_content = read_file("toc.html");
This script only download the entire content of the website. Hope someone here can help me and teach me. Thank you very much.
Here is some working code, to help you get going hopefully
use warnings;
use strict;
use feature 'say';
use Path::Tiny; # only convenience
use Selenium::Chrome;
my $base_url = q(https://www.fda.gov/drugs/)
. q(warning-letters-and-notice-violation-letters-pharmaceutical-companies/);
my $show = 1; # to see navigation. set to false for headless operation
# A little demo of how to set some browser options
my %chrome_capab = do {
my #cfg = ($show)
? ('window-position=960,10', 'window-size=950,1180')
: 'headless';
'extra_capabilities' => { 'goog:chromeOptions' => { args => [ #cfg ] } }
};
my $drv = Selenium::Chrome->new( %chrome_capab );
my #years = 2017..2021;
foreach my $year (#years) {
my $url = $base_url . "untitled-letters-$year";
$drv->get($url);
say "\nPage title: ", $drv->get_title;
sleep 1 if $show;
my $elem = $drv->find_element(
q{//li[contains(text(), 'PDF')]/a[contains(text(), 'Untitled Letter')]}
);
sleep 1 if $show;
# Downloading the file is surprisingly not simple with Selenium (see text)
# But as we found the link we can get its url and then use Selenium-provided
# user-agent (it's LWP::UserAgent)
my $href = $elem->get_attribute('href');
say "pdf's url: $href";
my $response = $drv->ua->get($href);
die $response->status_line if not $response->is_success;
say "Downloading 'Content-Type': ", $response->header('Content-Type');
my $filename = "download_$year.pdf";
say "Save as $filename";
path($filename)->spew( $response->decoded_content );
}
This takes shortcuts, switches approaches, and sidesteps some issues (which one need resolve for a fuller utility of this useful tool). It downloads one pdf from each page; to download all we need to change the XPath expression used to locate them
my #hrefs =
map { $_->get_attribute('href') }
$drv->find_elements(
# There's no ends-with(...) in XPath 1.0 (nor matches() with regex)
q{//li[contains(text(), '(PDF)')]}
. q{/a[starts-with(#href, '/media/') and contains(#href, '/download')]}
);
Now loop over the links, forming filenames more carefully, and download each like in the program above. I can fill the gaps further if there's need for that.
The code puts the pdf files on disk, in its working directory. Please review that before running this so to make sure that nothing gets overwritten!
See Selenium::Remove::Driver for starters.
Note: there is no need for Selenium for this particular task; it's all straight-up HTTP requests, no JavaScript. So LWP::UserAgent or Mojo would do it just fine. But I take it that you want to learn how to use Selenium, since it often is needed and is useful.

Perl6: Cannot invoke this object (REPR: P6opaque; Parallel::ForkManager)

I'm attempting to run a series of shell commands in parallel in Perl6, using Perl5's Parallel::ForkManager
This is an almost exact translation of working Perl5 code.
CONTROL {
when CX::Warn {
note $_;
exit 1;
}
}
use fatal;
role KeyRequired {
method AT-KEY (\key) {
die "Key {key} not found" unless self.EXISTS-KEY(key);
nextsame;
}
}
use Parallel::ForkManager:from<Perl5>;
sub run_parallel (#cmd) {
my $manager = Parallel::ForkManager(8).new();
for (#cmd) -> $command {
$manager.start and $manager.next;
my $proc = shell $command, :out, :err;
if $proc.exitcode != 0 {
put "$command failed";
put $proc.out.slurp;
put $proc.err.slurp;
die;
}
$manager.finish;
}
$manager.wait_all_children;#necessary after all lists
}
my #cmd;
my Str $dir = 'A/1';
for dir($dir, test => /\.vcf\.gz$/) -> $vcf {
#cmd.append: "aws s3 cp $vcf s3://s3dir/$dir/"
}
put #cmd.elems;
run_parallel(#cmd);
Basically, I'm trying to parallelize tedious shell commands.
However, this mysterious error comes up:
Cannot invoke this object (REPR: P6opaque; Parallel::ForkManager) in
sub run_parallel at 2.aws_cp.p6 line 18 in block at
2.aws_cp.p6 line 39
Why is Perl6 saying this? what is wrong? how can I get these commands to run?
Perhaps there is a more native/idiomatic way to run shell commands in parallel in Perl6?
You probably want to look at using Proc::Async which runs external commands asynchronously in threads without forking separate instances of the code to do it.
Perl5's Parallel::ForkManager probably won't work in Perl6 because of how Inline::Perl5 is implemented.
Inline::Perl5 embeds the Perl5 compiler/runtime inside of Perl6.
Parallel::ForkManager expects that Perl5 was run by itself.
If you ever did get it to do something other than generate an error it would probably screw up the Perl6 runtime. The main problem is the use of fork. For more information about why fork is a a problem see the article Bart Wiegmans (brrt) wrote about it: “A future for fork(2)”
Perl6 already has a similar feature that is easier to use.
sub run_parallel (#cmd) {
my #children = do for (#cmd) -> $command {
start {
my $proc = shell $command, :out, :err;
if $proc.exitcode != 0 {
put "$command failed";
put $proc.out.slurp;
put $proc.err.slurp;
die;
}
}
}
await #children;
}
start is a prefix that tells the runtime to start running the following code sometime in the near future. It returns a Promise.
await takes a list of Promises and returns a list of their results.
start basically calls Promise.start which is similar to:
sub start ( &code ) {
my $promise = Promise.new;
my $vow = $promise.vow;
$*SCHEDULER.cue(
{ $vow.keep(code(|c)) },
:catch(-> $ex { $vow.break($ex); }) );
$promise
}
So it will use the globally available thread pool in $*SCHEDULER. If you want to use a separate one you could.
sub run_parallel (#cmd) {
my $*SCHEDULER = ThreadPoolScheduler.new(max_threads => 8);
my #children = do for (#cmd) -> $command {
start {
my $proc = shell $command, :out, :err;
if $proc.exitcode != 0 {
put "$command failed";
put $proc.out.slurp;
put $proc.err.slurp;
die;
}
}
}
await #children;
}
It would make more sense to use Proc::Async for this though.

Perl6: large gzipped files read line by line

I'm trying to read a gz file line by line in Perl6, however, I'm getting blocked:
How to read gz file line by line in Perl6 however, this method, reading everything into :out uses far too much RAM to be usable except on very small files.
I don't understand how to use Perl6's Compress::Zlib to get everything line by line, although I opened an issue on their github https://github.com/retupmoca/P6-Compress-Zlib/issues/17
I'm trying Perl5's Compress::Zlib to translate this code, which works perfectly in Perl5:
use Compress::Zlib;
my $file = "data.txt.gz";
my $gz = gzopen($file, "rb") or die "Error reading $file: $gzerrno";
while ($gz->gzreadline($_) > 0) {
# Process the line read in $_
}
die "Error reading $file: $gzerrno" if $gzerrno != Z_STREAM_END ;
$gz->gzclose() ;
to something like this using Inline::Perl5 in Perl6:
use Compress::Zlib:from<Perl5>;
my $file = 'chrMT.1.vcf.gz';
my $gz = Compress::Zlib::new(gzopen($file, 'r');
while ($gz.gzreadline($_) > 0) {
print $_;
}
$gz.gzclose();
but I can't see how to translate this :(
I'm confused by Lib::Archive example https://github.com/frithnanth/perl6-Archive-Libarchive/blob/master/examples/readfile.p6 I don't see how I can get something like item 3 here
There should be something like
for $file.IO.lines(gz) -> $line { or something like that in Perl6, if it exists, I can't find it.
How can I read a large file line by line without reading everything into RAM in Perl6?
Update Now tested, which revealed an error, now fixed.
Solution #2
use Compress::Zlib;
my $file = "data.txt.gz" ;
my $handle = try open $file or die "Error reading $file: $!" ;
my $zwrap = zwrap($handle, :gzip) ;
for $zwrap.lines {
.print
}
CATCH { default { die "Error reading $file: $_" } }
$handle.close ;
I've tested this with a small gzipped text file.
I don't know much about gzip etc. but figured this out based on:
Knowing P6;
Reading Compress::Zlib's README and choosing the zwrap routine;
Looking at the module's source code, in particular the signature of the zwrap routine our sub zwrap ($thing, :$zlib, :$deflate, :$gzip);
And trial and error, mainly to guess that I needed to pass the :gzip adverb.
Please comment on whether my code works for you. I'm guessing the main thing is whether it's fast enough for the large files you have.
A failed attempt at solution #5
With solution #2 working I would have expected to be able to write just:
use Compress::Zlib ;
.print for "data.txt.gz".&zwrap(:gzip).lines ;
But that fails with:
No such method 'eof' for invocant of type 'IO::Path'
This is presumably because this module was written before the reorganization of the IO classes.
That led me to #MattOates' IO::Handle like object with .lines ? issue. I note no response and I saw no related repo at https://github.com/MattOates?tab=repositories.
I am focusing on the Inline::Perl5 solution that you tried.
For the call to $gz.gzreadline($_): it seems like gzreadline tries to return the line read from the zip file by modifying its input argument $_ (treated as an output argument, but it is not a true Perl 5 reference variable[1]), but the modified value is not returned to the Perl 6 script.
Here is a possoble workaround:
Create a wrapper module in the curent directory, e.g. ./MyZlibWrapper.pm:
package MyZlibWrapper;
use strict;
use warnings;
use Compress::Zlib ();
use Exporter qw(import);
our #EXPORT = qw(gzopen);
our $VERSION = 0.01;
sub gzopen {
my ( $fn, $mode ) = #_;
my $gz = Compress::Zlib::gzopen( $fn, $mode );
my $self = {gz => $gz};
return bless $self, __PACKAGE__;
}
sub gzreadline {
my ( $self ) = #_;
my $line = "";
my $res = $self->{gz}->gzreadline($line);
return [$res, $line];
}
sub gzclose {
my ( $self ) = #_;
$self->{gz}->gzclose();
}
1;
Then use Inline::Perl5 on this wrapper module instead of Compress::Zlib. For example ./p.p6:
use v6;
use lib:from<Perl5> '.';
use MyZlibWrapper:from<Perl5>;
my $file = 'data.txt.gz';
my $mode = 'rb';
my $gz = gzopen($file, $mode);
loop {
my ($res, $line) = $gz.gzreadline();
last if $res == 0;
print $line;
}
$gz.gzclose();
[1]
In Perl 5 you can modify an input argument that is not a reference, and the change will be reflected in the caller. This is done by modifying entries in the special #_ array variable. For example: sub quote { $_[0] = "'$_[0]'" } $str = "Hello"; quote($str) will quote $str even if $str is not passed by reference.

Perl6: check if STDIN has data

In my Perl 6 script, I want to do a (preferably non-blocking) check of standard input to see if data is available. If this is the case, then I want to process it, otherwise I want to do other stuff.
Example (consumer.p6):
#!/usr/bin/perl6
use v6.b;
use fatal;
sub MAIN() returns UInt:D {
while !$*IN.eof {
if some_fancy_check_for_STDIN() { #TODO: this needs to be done.
for $*IN.lines -> $line {
say "Process '$line'";
}
}
say "Do something Else.";
}
say "I'm done.";
return 0;
}
As a STDIN-Generator I wrote another Perl6 script (producer.p6):
#!/usr/bin/perl6
use v6.b;
use fatal;
sub MAIN() returns UInt:D {
$*OUT.say("aaaa aaa");
sleep-until now+2;
$*OUT.say("nbbasdf");
sleep-until now+2;
$*OUT.say("xxxxx");
sleep-until now+2;
return 0;
}
If consumer.p6 works as expected, it should produce the following output, if called via ./producer.p6 | ./consumer.p6:
Process 'aaaa aaa'
Do something Else.
Process 'nbbasdf'
Do something Else.
Process 'xxxxx'
Do something Else.
I'm done.
But actually, it produces the following output (if the if condition is commented out):
Process 'aaaa aaa'
Process 'nbbasdf'
Process 'xxxxx'
Do something Else.
I'm done.
You are using an old version of Perl 6, as v6.b is from before the official release of the language.
So some of what I have below may need a newer version to work.
Also why are you using sleep-until now+2 instead of sleep 2?
One way to do this is to turn the .lines into a Channel, then you can use .poll.
#!/usr/bin/env perl6
use v6.c;
sub MAIN () {
# convert it into a Channel so we can poll it
my $lines = $*IN.Supply.lines.Channel;
my $running = True;
$lines.closed.then: {$running = False}
while $running {
with $lines.poll() -> $line {
say "Process '$line'";
}
say "Do something Else.";
sleep ½;
}
say "I'm done.";
}
Note that the code above blocks at the my $lines = … line currently; so it doesn't start doing something until the first line comes in. To get around that you could do the following
my $lines = supply {
# unblock the $*IN.Supply.lines call
whenever start $*IN.Supply {
whenever .lines { .emit }
}
}.Channel;

Rancid/ Looking Glass perl script hitting an odd error: $router unavailable

I am attempting to set up a small test environment (homelab) using CentOS 6.6, Rancid 3.1, Looking Glass, and some Cisco Switches/Routers, with httpd acting as the handler. I have picked up a little perl by means of this endeavor, but python (more 2 than 3) is my background. Right now, everything on the rancid side of things works without issue: bin/clogin successfully logs into all of the equipment in the router.db file, and logging of the configs is working as expected. All switches/routers to be accessed are available and online, verified by ssh connection to devices as well as using bin/clogin.
Right now, I have placed the lg.cgi and lgform.cgi files into var/www/cgi-bin/ which allows the forms to be run as cgi scripts. I had to modify the files to split on ';' instead of ':' due to the change in the .db file in Rancid 3.1:#record = split('\:', $_); was replaced with: #record = split('\;', $_); etc. Once that change was made, I was able to load the lgform.cgi with the proper router.db parsing. At this point, it seemed like everything should be good to go. When I attempt to ping from one of those devices out to 8.8.8.8, the file correctly redirects to lg.cgi, and the page loads, but with
main is unavailable. Try again later.
as the error, where 'main' is the router hostname. Using this output, I was able to find the function responsible for this output. Here it is before I added anything:
sub DoRsh
{
my ($router, $mfg, $cmd, $arg) = #_;
my($ctime) = time();
my($val);
my($lckobj) = LockFile::Simple->make(-delay => $lock_int,
-max => $max_lock_wait, -hold => $max_lock_hold);
if ($pingcmd =~ /\d$/) {
`$pingcmd $router`;
} else {
`$pingcmd $router 56 1`;
}
if ($?) {
print "$router is unreachable. Try again later.\n";
return(-1);
}
if ($LG_SINGLE) {
if (! $lckobj->lock("$cache_dir/$router")) {
print "$router is busy. Try again later.\n";
return(-1);
}
}
$val = &DoCmd($router, $mfg, $cmd, $arg);
if ($LG_SINGLE) {
$lckobj->unlock("$cache_dir/$router");
}
return($val);
}
In order to dig in a little deeper, I peppered that function with several print statements. Here is the modified function, followed by the output from the loaded lg.cgi page:
sub DoRsh
{
my ($router, $mfg, $cmd, $arg) = #_;
my($ctime) = time();
my($val);
my($lckobj) = LockFile::Simple->make(-delay => $lock_int,
-max => $max_lock_wait, -hold => $max_lock_hold);
if ($pingcmd =~ /\d$/) {
`$pingcmd $router`;
} else {
`$pingcmd $router 56 1`;
}
print "About to test the ($?) branch.\n";
print "Also who is the remote_user?:' $remote_user'\n";
print "What about the ENV{REMOTE_USER} '$ENV{REMOTE_USER}'\n";
print "Here is the ENV{HOME}: '$ENV{HOME}'\n";
if ($?) {
print "$lckobj is the lock object.\n";
print "#_ something else to look at.\n";
print "$? whatever this is suppose to be....\n";
print "Some variables:\n";
print "$mfg is the mfg.\n";
print "$cmd was the command passed in with $arg as the argument.\n";
print "$pingcmd $router\n";
print "$cloginrc - Is the cloginrc pointing correctly?\n";
print "$LG_SINGLE the next value to be tested.\n";
print "$router is unreachable. Try again later.\n";
return(-1);
}
if ($LG_SINGLE) {
if (! $lckobj->lock("$cache_dir/$router")) {
print "$router is busy. Try again later.\n";
return(-1);
}
}
$val = &DoCmd($router, $mfg, $cmd, $arg);
if ($LG_SINGLE) {
$lckobj->unlock("$cache_dir/$router");
}
return($val);
}
OUTPUT:
About to test the (512) branch.
Also who is the remote_user?:' '
What about the ENV{REMOTE_USER} ''
Here is the ENV{HOME}: '.'
LockFile::Simple=HASH(0x1a13650) is the lock object.
main cisco ping 8.8.8.8 something else to look at.
512 whatever this is suppose to be....
Some variables:
cisco is the mfg.
ping was the command passed in with 8.8.8.8 as the argument.
/bin/ping -c 1 main
./.cloginrc - Is the cloginrc pointing correctly?
1 the next value to be tested.
main is unreachable. Try again later.
I can provide the code for when DoRsh is called, if necessary, but it looks mostly like this:&DoRsh($router, $mfg, $cmd, $arg);.
From what I can tell the '$?' special variable (or at least according to
this reference it is a special var) is returning the 512 value, which is causing that fork to test true. The problem is I don't know what that 512 means, nor where it is coming from. Using the ref site's description ("The status returned by the last pipe close, backtick (``) command, or system operator.") and the formation of the conditional tree above, I can see that it is some error of some kind, but I don't know how else to proceed with this inspection. I'm wondering if maybe it is in response to some permission issue, since the remote_user variable is null, when I didn't expect it to be. Any guidance anyone may be able to provide would be helpful. Furthermore, if there is any information that I may have skipped over, that I didn't think to include, or that may prove helpful, please ask, and I will provide to the best of my ability
May be you put in something like
my $pingret=$pingcmd ...;
print 'Ping result was:'.$pingret;
And check the returned strings?