I'd like all of my .html and .php pages to be minified: all HTML comments stripped, and every run of more than one whitespace character collapsed.
Does an Apache module exist for minification?
(Or is there another way to automatically run a script on the output just before it is sent to the user?)
(I could add a function like the following to every page, but with an Apache module or another automatic solution there would be no risk of forgetting to apply it.)
<?php
function sanitize_output($buffer)
{
    $search = array(
        '/\>[^\S ]+/s',  // strip whitespace after tags, except space
        '/[^\S ]+\</s',  // strip whitespace before tags, except space
        '/(\s)+/s'       // shorten multiple whitespace sequences to one
    );
    $replace = array(
        '>',
        '<',
        '\\1'
    );
    $buffer = preg_replace($search, $replace, $buffer);
    return $buffer;
}
?>
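(For completeness, this is how it would have to be hooked up at the top of every page with output buffering; exactly the manual step I'm afraid of forgetting:)
ob_start('sanitize_output'); // buffer all output through the minifier
// ... generate the page ...
ob_end_flush();              // send the minified result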
Try mod_pagespeed, which may be of some use to you.
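If you go that route, a minimal configuration sketch (assuming the module is installed; collapse_whitespace and remove_comments are the filters matching what's asked here):
ModPagespeed on
ModPagespeedEnableFilters collapse_whitespace,remove_comments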
Suppose I have a grammar with the following tokens
token paragraph {
    (
    || <header>
    || <regular>
    )
    \n
}
token header  { ^^ '---' '+'**1..5 ' ' \N+ }
token regular { \N+ }
The problem is that a line starting with ---++Foo will be parsed as a regular paragraph because there is no space before "Foo". I'd like to fail the parse in this case, i.e. somehow "commit" to this branch of the alternation, e.g. after seeing --- I want to either parse the header successfully or fail the match completely.
How can I do this? The only way I see is to use a negative lookahead assertion before <regular> to check that it does not start with ---, but this looks rather ugly and impractical, considering that my actual grammar has many more than just these two branches. Is there a better way? Thanks in advance!
If I understood your question correctly, you could do something like this:
token header {
    ^^ '---' [
        || '+'**1..5 ' ' \N+
        || { die "match failed near position $/.pos()" }
    ]
}
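A quick way to see the commit in action (a sketch: the grammar name Doc and the TOP token are mine, added only to make it runnable):
grammar Doc {
    token TOP       { <paragraph>+ }
    token paragraph {
        (
        || <header>
        || <regular>
        )
        \n
    }
    token header {
        ^^ '---' [
            || '+'**1..5 ' ' \N+
            || { die "match failed near position $/.pos()" }
        ]
    }
    token regular { \N+ }
}

say Doc.parse("---++ Foo\n");  # matches via <header>
Doc.parse("---++Foo\n");       # dies instead of falling back to <regular>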
So I'm studying website scraping using Selenium::Chrome in Perl, and I'm wondering how I can download all the PDF files from the years 2017 to 2021 and store them in a folder, starting from this website: https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021 . So far this is what I've done:
use strict;
use warnings;
use Time::Piece;
use POSIX qw(strftime);
use Selenium::Chrome;
use File::Slurp;
use File::Copy qw(copy);
use File::Path qw(make_path remove_tree);
use LWP::Simple;
my $collection_name = "mre_zen_test3";
make_path("$collection_name");
#DECLARE SELENIUM DRIVER
my $driver = Selenium::Chrome->new;
#NAVIGATE TO SITE
print "trying to get toc_url\n";
$driver->navigate('https://www.fda.gov/drugs/warning-letters-and-notice-violation-letters-pharmaceutical-companies/untitled-letters-2021');
sleep(8);
#GET PAGE SOURCE
my $toc_content = $driver->get_page_source();
$toc_content =~ s/[^\x00-\x7f]//g;
write_file("toc.html", $toc_content);
print "writing toc.html\n";
sleep(5);
$toc_content = read_file("toc.html");
This script only downloads the entire content of the page. I hope someone here can help me and teach me. Thank you very much.
Here is some working code, to hopefully help you get going:
use warnings;
use strict;
use feature 'say';
use Path::Tiny;  # only convenience
use Selenium::Chrome;

my $base_url = q(https://www.fda.gov/drugs/)
    . q(warning-letters-and-notice-violation-letters-pharmaceutical-companies/);

my $show = 1;  # to see navigation; set to false for headless operation

# A little demo of how to set some browser options
my %chrome_capab = do {
    my @cfg = ($show)
        ? ('window-position=960,10', 'window-size=950,1180')
        : 'headless';
    'extra_capabilities' => { 'goog:chromeOptions' => { args => [ @cfg ] } }
};

my $drv = Selenium::Chrome->new( %chrome_capab );

my @years = 2017..2021;

foreach my $year (@years) {
    my $url = $base_url . "untitled-letters-$year";
    $drv->get($url);
    say "\nPage title: ", $drv->get_title;
    sleep 1 if $show;

    my $elem = $drv->find_element(
        q{//li[contains(text(), 'PDF')]/a[contains(text(), 'Untitled Letter')]}
    );
    sleep 1 if $show;

    # Downloading the file is surprisingly not simple with Selenium (see text),
    # but as we found the link we can get its URL and then use the
    # Selenium-provided user-agent (it's LWP::UserAgent)
    my $href = $elem->get_attribute('href');
    say "pdf's url: $href";

    my $response = $drv->ua->get($href);
    die $response->status_line if not $response->is_success;

    say "Downloading 'Content-Type': ", $response->header('Content-Type');
    my $filename = "download_$year.pdf";
    say "Save as $filename";
    path($filename)->spew( $response->decoded_content );
}
This takes shortcuts, switches approaches, and sidesteps some issues (which one would need to resolve for fuller use of this useful tool). It downloads one PDF from each page; to download them all we need to change the XPath expression used to locate the links:
my @hrefs =
    map { $_->get_attribute('href') }
    $drv->find_elements(
        # There's no ends-with(...) in XPath 1.0 (nor matches() with regex)
        q{//li[contains(text(), '(PDF)')]}
        . q{/a[starts-with(@href, '/media/') and contains(@href, '/download')]}
    );
Now loop over the links, forming filenames more carefully, and download each one as in the program above; a sketch follows. I can fill the gaps further if there's need for that.
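For instance (a sketch under assumptions: the hrefs are site-relative, as the XPath above implies, and the /media/ id is usable in a filename; URI resolves the links):
use URI;

foreach my $href (@hrefs) {
    # resolve the relative link against the site root
    my $url = URI->new_abs($href, 'https://www.fda.gov');

    # derive a filename from the /media/<id>/download pattern matched above
    my ($id) = $url->path =~ m{/media/([^/]+)/download} or next;
    my $filename = "letter_$id.pdf";

    my $response = $drv->ua->get($url);
    die $response->status_line if not $response->is_success;
    path($filename)->spew( $response->decoded_content );
}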
The code puts the PDF files on disk, in its working directory. Please review that before running this, to make sure that nothing gets overwritten!
See Selenium::Remote::Driver for starters.
Note: there is no need for Selenium for this particular task; it's all straight-up HTTP requests, no JavaScript, so LWP::UserAgent or Mojo would do it just fine. But I take it that you want to learn how to use Selenium, since it is often needed and useful.
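For comparison, a sketch of that route with Mojo::UserAgent (the CSS selector mirrors the XPath used above; treat it as an assumption, not a tested query):
use Mojo::UserAgent;

my $ua  = Mojo::UserAgent->new(max_redirects => 5);
my $dom = $ua->get('https://www.fda.gov/drugs/'
    . 'warning-letters-and-notice-violation-letters-pharmaceutical-companies/'
    . 'untitled-letters-2021')->result->dom;

# same idea as the XPath above: anchors pointing at /media/...
my @hrefs = $dom->find('a[href^="/media/"]')->map(attr => 'href')->each;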
I used the code below to search for documents (which have a particular keyword in the content field) in Apache Solr:
use LWP::UserAgent;
use HTTP::Request;

my $solrgetapi = "http://$address:$port/solr/OppsBot/select?q=content:";
my $solrgeturl = $solrgetapi . '"' . $keyword . '"';

my $browser = LWP::UserAgent->new;
my $req = HTTP::Request->new( GET => $solrgeturl );
$req->authorization_basic( $username, $pass );
my $page = $browser->request( $req );
print $page->decoded_content;
The result I get is as follows:
{
  "responseHeader": {
    "status": 0,
    "QTime": 2,
    "params": {
      "q": "content:\"ABC\""
    }
  },
  "response": {
    "numFound": 0,
    "start": 0,
    "docs": []
  }
}
I want to extract the numFound value to a variable.
I came across some solutions in SolrJ, like this:
queryResponse.getResults().getNumFound();
But I couldn't find an equivalent in Perl.
I also tried the code below, but I couldn't get it to work. Please help.
$numFound = $page->decoded_content->{response}->{numFound};
print $page->{numFound}
You neglected to transform the JSON text into a data structure.
use JSON::MaybeXS qw(decode_json);
say decode_json($page->decoded_content)->{response}{numFound};
# 0
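And to put it in a variable, as asked (decoding once and keeping the structure around):
my $data     = decode_json( $page->decoded_content );
my $numFound = $data->{response}{numFound};
print "numFound: $numFound\n";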
I want to make my Yii app multilanguage. To do this I want to use gettext (because it's much simpler than Yii messages).
To do this I used this Yii extension; I configured the PO files, made the translations, etc.
The big problem: nothing happened. Nothing was translated.
I can recommend this awesome multilanguage extension!
http://www.yiiframework.com/extension/tstranslation/
To use gettext without any extension, follow these steps. In config/main.php, set your target language like this:
'language' => 'ru',
Set the messages component to use CGettextMessageSource:
'messages' => array(
    'class' => 'CGettextMessageSource',
),
Create a messages.po file in the protected/messages/ru folder (note: the folder name is the same as the language code). If Poedit is used, the messages.po file must have appropriate headers. Example:
msgid ""
msgstr ""
"Project-Id-Version: FOO BAR 1.0\n"
"Report-Msgid-Bugs-To: \n"
"POT-Creation-Date: 2011-11-11 11:11+0300\n"
"PO-Revision-Date: \n"
"Language: ru\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=UTF-8\n"
"Content-Transfer-Encoding: 8bit\n"
"Plural-Forms: nplurals=3; plural=n%10==1 && n%100!=11 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"
"X-Poedit-SourceCharset: utf-8\n"
"X-Poedit-Basepath: .\n"
"X-Poedit-KeywordsList: _ngettext:1,2;t:1c,2\n"
"X-Poedit-SearchPath-0: ../..\n"
Note t:1c,2. This means that the first parameter of the function Yii::t() will be used as the context (see msgctxt) and the second as the actual string to translate. Without this your i18n will not work!
Now just open messages.po in Poedit → Update → do the translations → Save.
The messages.mo file will then be created and used by Yii.
For your language's plural-forms string, see the gettext help.
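For reference, a translation call in your views then looks like this (the 'app' category is just an example; with the t:1c,2 keywords above, Poedit treats the first argument as the msgctxt):
echo Yii::t('app', 'Hello, world!');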
I need to fill in a single PDF template multiple times and concat the results. When I say multiple, I mean up to a few hundred times, potentially over one thousand.
I can do this with pdftk fill_form, one by one, and then use pdftk cat. We can parallelize this fairly easily.
I'm curious if this is the only option, or if there is a piece of software (Linux + OSX, command line) that will allow me to say "take this template, and these sets of fields, fill out this form, and concat the files" so I can avoid doing every one individually. Then again, if something does exist, but it's not any faster than just doing the fork parallelization method, then it's probably not worth it.
My Perl library CAM::PDF can do this. The form filling is a bit weak (it doesn't support checkboxes, for example) but the concatenation works great.
#!/usr/bin/perl -w
use strict;
use CAM::PDF;

my $infile  = 'in.pdf';
my $outfile = 'out.pdf';

my @fills = (
    { name => 'John' },
    { name => 'Fred' },
);

my $pdf = CAM::PDF->new($infile) or die $CAM::PDF::errstr;
for my $i (0 .. $#fills) {
    # reuse the first instance; open a fresh copy of the template for the rest
    my $filledPDF = $i == 0 ? $pdf : CAM::PDF->new($infile);
    $filledPDF->fillFormFields(%{ $fills[$i] });
    if ($i > 0) {
        $pdf->appendPDF($filledPDF);
    }
}
$pdf->cleanoutput($outfile) or die;