Alternate version of grammar not working as I'd prefer

This code parses $string as I'd like:
#! /usr/bin/env raku
my $string = q:to/END/;
aaa bbb # this has trailing spaces which I want to keep
kjkjsdf
kjkdsf
END
grammar Markdown {
token TOP { ^ ([ <blank> | <text> ])+ $ }
token blank { [ \h* <.newline> ] }
token text { <indent> <content> }
token indent { \h* }
token newline { \n }
token content { \N*? <trailing>* <.newline> }
token trailing { \h+ }
}
my $match = Markdown.parse($string);
$match.say;
OUTPUT
「aaa bbb
kjkjsdf
kjkdsf
」
0 => 「aaa bbb
」
text => 「aaa bbb
」
indent => 「」
content => 「aaa bbb
」
trailing => 「 」
0 => 「
」
blank => 「
」
0 => 「 kjkjsdf
」
text => 「 kjkjsdf
」
indent => 「 」
content => 「kjkjsdf
」
0 => 「kjkdsf
」
text => 「kjkdsf
」
indent => 「」
content => 「kjkdsf
」
Now, the only problem I'm having is that I'd like the <trailing> capture to be at the same level of the hierarchy as the <indent> and <content> captures.
So I tried this grammar:
grammar Markdown {
token TOP { ^ ([ <blank> | <text> ])+ $ }
token blank { [ \h* <.newline> ] }
token text { <indent> <content> <trailing>* <.newline> }
token indent { \h* }
token newline { \n }
token content { \N*? }
token trailing { \h+ }
}
However, it breaks the parsing. So I tried this:
token TOP { ^ ([ <blank> | <text> ])+ $ }
token blank { [ \h* <.newline> ] }
token text { <indent> <content>*? <trailing>* <.newline> }
token indent { \h* }
token newline { \n }
token content { \N }
token trailing { \h+ }
And got:
0 => 「aaa bbb
」
text => 「aaa bbb
」
indent => 「」
content => 「a」
content => 「a」
content => 「a」
content => 「 」
content => 「b」
content => 「b」
content => 「b」
trailing => 「 」
0 => 「
」
blank => 「
」
0 => 「 kjkjsdf
」
text => 「 kjkjsdf
」
indent => 「 」
content => 「k」
content => 「j」
content => 「k」
content => 「j」
content => 「s」
content => 「d」
content => 「f」
0 => 「kjkdsf
」
text => 「kjkdsf
」
indent => 「」
content => 「k」
content => 「j」
content => 「k」
content => 「d」
content => 「s」
content => 「f」
This is pretty close to what I want, but it has the undesirable effect of breaking <content> up into individual letters, which is not ideal. I could fix this pretty easily after the fact by massaging the $match object, but I would like to improve my skills with grammars.
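For instance, a rough post-processing sketch along these lines (purely illustrative) would glue the letters back together after parsing:
# Rough sketch: rebuild each line's content by joining the
# per-character <content> captures after parsing.
for $match[0].list -> $m {
    with $m<text> -> $t {
        my $content = $t<content>.map(*.Str).join;
        say "indent=「{$t<indent>}」 content=「$content」";
    }
}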

quick and dirty
my $string = q:to/END/;
aaa bbb
kjkjsdf
kjkdsf
END
grammar Markdown {
token TOP { ^ ([ <blank> | <text> ])+ $ }
token blank { [ \h* <.newline> ] }
token text { <indent>? $<content>=\N*? <trailing>? <.newline> }
token indent { \h+ }
token newline { \n }
token trailing { \h+ }
}
my $match = Markdown.parse($string);
$match.say;
lookahead assertions
my $string = q:to/END/;
aaa bbb
kjkjsdf
kjkdsf
END
grammar Markdown {
token TOP { ^ ([ <blank> | <text> ])+ $ }
token blank { [ \h* <.newline> ] }
token text { <indent>? <content> <trailing>? <.newline> }
token indent { \h+ }
token newline { \n }
token content { [<!before <trailing>> \N]+ }
token trailing { \h+ $$ }
}
my $match = Markdown.parse($string);
$match.say;
a little refactoring
my $string = q:to/END/;
aaa bbb
kjkjsdf
kjkdsf
END
grammar Markdown {
token TOP { ( <blank> | <text> )+ %% \n }
token blank { ^^ \h* $$ }
token text { <indent>? <content> <trailing>? }
token indent { ^^ \h+ }
token content { [<!before <trailing>> \N]+ }
token trailing { \h+ $$ }
}
my $match = Markdown.parse($string);
$match.say;

I was able to accomplish what I want with a negative lookahead assertion:
token TOP { ^ ([ <blank> | <text> ])+ $ }
token blank { [ \h* <.newline> ] }
token text { <indent>? <content> <trailing>? <.newline> }
token indent { \h+ }
token newline { \n }
token content { <.non_trailing> }
token non_trailing { ( . <!before \w \h* \n>)+ \S* }
token trailing { \h+ }
The <.non_trailing> call keeps that token's capture (and its per-character sub-captures) out of the match object. The ( . <!before \w \h* \n>)+ bit matches any character that is not followed by a word character, optional horizontal whitespace, and a newline, and the \S* bit picks up the final word character left over because of the negative lookahead.
OUTPUT
「aaa bbb
kjkjsdf
kjkdsf
」
0 => 「aaa bbb
」
text => 「aaa bbb
」
content => 「aaa bbb」
trailing => 「 」
0 => 「
」
blank => 「
」
0 => 「 kjkjsdf
」
text => 「 kjkjsdf
」
indent => 「 」
content => 「kjkjsdf」
0 => 「kjkdsf
」
text => 「kjkdsf
」
content => 「kjkdsf」

Related

Macro match arm pattern "no rules expected the token `if`"

So I have this macro that's used to match Box<dyn error::Error> against multiple error types:
#[macro_export]
macro_rules! dynmatch {
($e:expr, $(type $ty:ty {$(arm $pat:pat => $result:expr),*, _ => $any:expr}),*, _ => $end:expr) => (
$(
if let Some(e) = $e.downcast_ref::<$ty>() {
match e {
$(
$pat => {$result}
)*
_ => $any
}
} else
)*
{$end}
);
}
It was working fine until I tried adding match guards. When I try using "if" guards in the pattern, it gives me the error no rules expected the token 'if':
let _i = match example(2) {
Ok(i) => i,
Err(e) => {
dynmatch!(e,
type ExampleError1 {
arm ExampleError1::ThisError(2) => panic!("it was 2!"),
_ => panic!("{}",e)
},
type ExampleError2 {
arm ExampleError2::ThatError(8) => panic!("it was 8!"),
arm ExampleError2::ThatError(9..=11) => 10,
_ => panic!("{}",e)
},
type std::io::Error {
arm i if i.kind() == std::io::ErrorKind::NotFound => panic!("not found"), //ERROR no rules expected the token `if`
_ => panic!("{}", e)
},
_ => panic!("{}",e)
)
}
};
Is there any way to use match guards in my pattern matching without getting token errors?
And of course, even though I spent about an hour looking for a solution, right after posting this question I found an answer.
The correct macro looks like this:
#[macro_export]
macro_rules! dynmatch {
($e:expr, $(type $ty:ty {$(arm $( $pattern:pat )|+ $( if $guard: expr )? => $result:expr),*, _ => $any:expr}),*, _ => $end:expr) => (
$(
if let Some(e) = $e.downcast_ref::<$ty>() {
match e {
$(
$( $pattern )|+ $( if $guard )? => {$result}
)*
_ => $any
}
} else
)*
{$end}
);
}
Credit to the Rust matches! macro source, lines 244-251.
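With the $( $pattern:pat )|+ $( if $guard:expr )? repetition fragments in place, the guard arm from the question parses. A trimmed, self-contained sketch (the error type and values below are hypothetical stand-ins, not the originals):
// Trimmed check that a guard arm now parses with the corrected macro.
use std::error::Error;
use std::fmt;

#[derive(Debug)]
enum ExampleError2 {
    ThatError(u32),
}

impl fmt::Display for ExampleError2 {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "{:?}", self)
    }
}
impl Error for ExampleError2 {}

macro_rules! dynmatch {
    ($e:expr, $(type $ty:ty {$(arm $( $pattern:pat )|+ $( if $guard: expr )? => $result:expr),*, _ => $any:expr}),*, _ => $end:expr) => (
        $(
            if let Some(e) = $e.downcast_ref::<$ty>() {
                match e {
                    $(
                        $( $pattern )|+ $( if $guard )? => {$result}
                    )*
                    _ => $any
                }
            } else
        )*
        {$end}
    );
}

fn main() {
    let e: Box<dyn Error> = Box::new(ExampleError2::ThatError(9));
    let n = dynmatch!(e,
        type ExampleError2 {
            arm ExampleError2::ThatError(x) if (9..=11).contains(x) => *x, // guard arm compiles
            _ => 0
        },
        _ => 0
    );
    println!("{}", n); // prints 9
}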

Error when I try to create index in elastic search from logstash

Hi, I'm getting the following error when I try to create an index in Elasticsearch from Logstash:
[Converge PipelineAction::Create] agent - Failed to execute action
{:action=>LogStash::PipelineAction::Create/pipeline_id:main,
:exception=>"LogStash::ConfigurationError", :message=>"Expected one of #, input, filter, output at
line 1, column 1 (byte 1)"
Can you tell me if I got something wrong in my .conf file?
iput {
file {
path => "/opt/sis-host/process/uptime_test*"
# start_position => "beginning"
ignore_older => 0
}
}
filter {
grok {
match => { "message" => "%{DATA:hora} %{DATA:fecha} %{DATA:status} %{DATA:server} %
{INT:segundos}" }
}
date {
match => ["horayfecha", "HH:mm:ss MM/dd/YYYY" ]
target => "#timestamp"
}
}
output {
elasticsearch {
hosts => ["host:9200"]
index => "uptime_test-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
The configuration file should start with input, not "iput":
input { # not iput
file {
path => "/opt/sis-host/process/uptime_test*"
# start_position => "beginning"
ignore_older => 0
}
}
filter {
grok {
match => { "message" => "%{DATA:hora} %{DATA:fecha} %{DATA:status} %{DATA:server} %
{INT:segundos}" }
}
date {
match => ["horayfecha", "HH:mm:ss MM/dd/YYYY" ]
target => "#timestamp"
}
}
output {
elasticsearch {
hosts => ["host:9200"]
index => "uptime_test-%{+YYYY.MM.dd}"
}
stdout { codec => rubydebug }
}
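As a side note, typos like this can be caught before starting the pipeline by asking Logstash to only test the configuration and exit (the path below is just an example):
bin/logstash -f /etc/logstash/conf.d/uptime_test.conf --config.test_and_exit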

What would be the best approach to work with two arrays of hashes in this scenario?

What would be the best approach to process these two arrays of hashes? The first data set contains XML data and the second comes from a CSV file. The idea is to check whether a filename from the second data set is in the first one and, if so, calculate the delay in file delivery. I'm not sure how best to produce a workable hash (or whether to change the existing structures to use filenames as their keys, or maybe somehow merge them together). Any feedback would be greatly appreciated.
dataset 1 (xml data):
$VAR1 = [
{
'StartTimestamp' => 1478146371,
'EndTimestamp' => 1478149167,
'FileName' => 'a3_file_20161024.req',
'Stage' => 'SentUserResponse'
},
{
'StartTimestamp' => 1478146375,
'EndTimestamp' => 1478149907,
'FileName' => 'a2_file_20161024.req',
'Stage' => 'SentUserResponse'
},
{
'StartTimestamp' => 1478161030,
'EndTimestamp' => 1478161234,
'FileName' => 'file_DEX_0.req',
'Stage' => 'SentUserResponse'
},
Data Set 2 from csv file:
$VAR1 = [
{
'FileName' => 'a3_file_20161024.req',
'ExpectedTime' => '20:04:07'
},
{
'FileName' => 'a2_file_20161024.req',
'ExpectedTime' => '20:14:39'
},
{
'FileName' => 'file_DEX_0.req',
'ExpectedTime' => '20:48:40'
},
code used:
sub Demo {
my $api_ref = GetData($apicall);
my $csvdata = ReadDataFile();
print Dumper($api_ref);
print "-------------------------*********--------------************------------------\n";
print Dumper ($csvdata);
print "#####################\n";
}
sub ReadDataFile {
my $parser = Text::CSV::Simple->new;
$parser->field_map(qw/FileName ExpectedTime/);
my @csv_data = $parser->read_file($datafile);
return \@csv_data;
}
sub GetData {
my ($xml) = @_;
my @api_data;
my %request;
my $t = XML::Twig->new(
twig_handlers => {
'//UserRequest' => sub {
push @api_data, {%request} if %request;
%request = ();
$_->purge; # free memory
},
'//UserRequest/HomeFileName' => sub {
$request{FileName} = $_->trimmed_text;
},
'//UserRequest/Stage' => sub {
$request{Stage} = $_->trimmed_text;
},
'//UserRequest/StartTimestamp' => sub {
$request{StartTimestamp} = str2time(substr($_->trimmed_text, -8));
},
'//UserRequest/EndTimestamp' => sub {
$request{EndTimestamp} = str2time(substr($_->trimmed_text, -8));
},
},
);
$t->xparse($xml);
$t->purge;
return \@api_data;
}
Assuming that you can map the elements of the first array to the elements of the second array by comparing the filenames, and that the relation is a 1:1 relation, I would perform the following steps:
Sort the lists by filename or generate an index hash
Combine both sets into a single array of hashes, or use the index to work through your data sets
Do whatever you need to do with the data sets
Just a little example:
#!/usr/bin/env perl
use strict;
use warnings;
my $api_ref = [
{
'StartTimestamp' => 1478146371,
'EndTimestamp' => 1478149167,
'FileName' => 'a3_file_20161024.req',
'Stage' => 'SentUserResponse'
},
{
'StartTimestamp' => 1478146375,
'EndTimestamp' => 1478149907,
'FileName' => 'a2_file_20161024.req',
'Stage' => 'SentUserResponse'
},
{
'StartTimestamp' => 1478161030,
'EndTimestamp' => 1478161234,
'FileName' => 'file_DEX_0.req',
'Stage' => 'SentUserResponse'
}
];
my $csvdata = [
{
'FileName' => 'a3_file_20161024.req',
'ExpectedTime' => '20:04:07'
},
{
'FileName' => 'a2_file_20161024.req',
'ExpectedTime' => '20:14:39'
},
{
'FileName' => 'file_DEX_0.req',
'ExpectedTime' => '20:48:40'
}
];
# generate the index
my %index = ();
for ( my $i = 0 ; $i < @{$api_ref} ; $i++ ) {
$index{ $api_ref->[$i]{FileName} }{api_idx} = $i;
}
for ( my $i = 0 ; $i < @{$csvdata} ; $i++ ) {
$index{ $csvdata->[$i]{FileName} }{csv_idx} = $i;
}
# filter for elements not present in both data sets
my @filename_intersection =
grep { exists $index{$_}{api_idx} && exists $index{$_}{csv_idx} }
( keys %index );
foreach my $filename (@filename_intersection) {
# do something with
my $api_entry = $api_ref->[ $index{$filename}{api_idx} ];
my $csv_entry = $csvdata->[ $index{$filename}{csv_idx} ];
# example convert ExpectedTime into seconds and compare it to Start/End time difference
$csv_entry->{ExpectedTime} =~ /^(\d{2}):(\d{2}):(\d{2})$/;
my $exp_sec = ( $1 * 60 + $2 ) * 60 + $3;
my $real_sec = $api_entry->{EndTimestamp} - $api_entry->{StartTimestamp};
my $msg = "";
if ( $exp_sec >= $real_sec ) {
$msg = "in time:";
}
else {
$msg = "late:";
}
printf
"Filename %s was %s; expected time: %d seconds, real time: %d seconds\n",
$filename, $msg, $exp_sec, $real_sec;
}
Best,
Frank

Variables in logstash config not substituted

None of the variables in prefix are substituted - why?
It was working with an old version of Logstash (1.5.4) but doesn't anymore with 2.3.
Part of the output section in logstash.cfg (dumps to S3):
output {
if [bucket] == "bucket1" {
s3 {
bucket => "bucket1"
access_key_id => "****"
secret_access_key => "****"
region => "ap-southeast-2"
prefix => "%{env}/%{year}/%{month}/%{day}/"
size_file => 50000000 #50mb
time_file => 1
codec => json_lines # save log as json line (no newlines)
temporary_directory => "/var/log/temp-logstash"
tags => ["bucket1"]
}
}
..
}
Example dataset (taken from stdout):
{
"random_person" => "Kenneth Cumming 2016-04-14 00:53:59.777647",
"#timestamp" => "2016-04-14T00:53:59.917Z",
"host" => "192.168.99.1",
"year" => "2016",
"month" => "04",
"day" => "14",
"env" => "dev",
"bucket" => "bucket1"
}
Just in case, here is the filter:
filter {
mutate {
add_field => {
"request_uri" => "%{[headers][request_path]}"
}
}
grok {
break_on_match => false # default behaviour is to stop matching after first match, we don't want that
match => { "#timestamp" => "%{NOTSPACE:date}T%{NOTSPACE:time}Z"} # break timestamp field into date and time
match => { "date" => "%{INT:year}-%{INT:month}-%{INT:day}"} # break date into year month and day fields
match => { "request_uri" => "/%{WORD:env}/%{NOTSPACE:bucket}"} # break request uri into environment and bucket fields
}
mutate {
remove_field => ["request_uri", "headers", "#version", "date", "time"]
}
}
It's a known issue that field variables aren't allowed in 'prefix'.
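If the set of prefixes is small, one way around it is to branch on the fields yourself and keep prefix static. A rough sketch along the lines of the original output block (field names are the ones from the question, everything else is illustrative and untested on 2.3):
output {
  if [bucket] == "bucket1" and [env] == "dev" {
    s3 {
      bucket => "bucket1"
      region => "ap-southeast-2"
      prefix => "dev/"   # static string only, no %{...} references
      size_file => 50000000
      time_file => 1
      codec => json_lines
      # credentials and temporary_directory omitted
    }
  }
  # add one branch per env/bucket combination; the date-based part of the key
  # would have to come from a later logstash-output-s3 plugin release or from
  # post-processing
}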

Parse a json array in PHP to get the required values

I am using an API provided by this website:
http://pnrapi.alagu.net/
By using this API, we can get the PNR status of Indian Railways.
I am using cURL to make a call and get the page content, which comes back as something like this, in array format:
Array ( [url] => http://pnrapi.alagu.net/api/v1.0/pnr/4563869832 [content_type] => application/json;charset=utf-8 [http_code] => 200 [header_size] => 185 [request_size] => 130 [filetime] => -1 [ssl_verify_result] => 0 [redirect_count] => 0 [total_time] => 2.906 [namelookup_time] => 0 [connect_time] => 0.312 [pretransfer_time] => 0.312 [size_upload] => 0 [size_download] => 548 [speed_download] => 188 [speed_upload] => 0 [download_content_length] => 548 [upload_content_length] => 0 [starttransfer_time] => 2.906 [redirect_time] => 0 [certinfo] => Array ( ) [primary_ip] => 50.57.204.234 [primary_port] => 80 [local_ip] => 192.168.1.10 [local_port] => 60105 [redirect_url] => [errno] => 0 [errmsg] => [content] => {"status":"OK","data":{"train_number":"16178","chart_prepared":false,"pnr_number":"4563869832","train_name":"ROCKFORT EXPRES","travel_date":{"timestamp":1369506600,"date":"26-5-2013"},"from":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20"},"to":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},"alight":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},"board":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20","timestamp":1369587000},"class":"2A","passenger":[{"seat_number":"W/L 39,RLGN","status":"W/L 27"}]}} )
But when I go to the URL http://pnrapi.alagu.net/api/v1.0/pnr/4563869832 directly, it gives me the output shown below:
{"status":"OK","data":{"train_number":"16178","chart_prepared":false,"pnr_number":"4563869832","train_name":"ROCKFORT EXPRES","travel_date":{"timestamp":1369506600,"date":"26-5-2013"},"from":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20"},"to":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},"alight":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},"board":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20","timestamp":1369587000},"class":"2A","passenger":[{"seat_number":"W/L 39,RLGN","status":"W/L 27"}]}}
Now, it seems that the output on my web page with cURL has some extra text at the start, as you can see from both outputs above.
Well, my question is, how can I get the values from the above array?
I am talking about the array output I'm getting on my page using cURL, which looks like this:
Array (
[url] => http://pnrapi.alagu.net/api/v1.0/pnr/4563869832
[content_type] => application/json;charset=utf-8
[http_code] => 200
[header_size] => 185
[request_size] => 130
[filetime] => -1
[ssl_verify_result] => 0
[redirect_count] => 0
[total_time] => 2.906
[namelookup_time] => 0
[connect_time] => 0.312
[pretransfer_time] => 0.312
[size_upload] => 0
[size_download] => 548
[speed_download] => 188
[speed_upload] => 0
[download_content_length] => 548
[upload_content_length] => 0
[starttransfer_time] => 2.906
[redirect_time] => 0
[certinfo] => Array ( )
[primary_ip] => 50.57.204.234
[primary_port] => 80
[local_ip] => 192.168.1.10
[local_port] => 60105
[redirect_url] =>
[errno] => 0
[errmsg] => [content] => {"status":"OK","data":{"train_number":"16178","chart_prepared":false,"pnr_number":"4563869832","train_name":"ROCKFORT EXPRES","travel_date":{"timestamp":1369506600,"date":"26-5-2013"},"from":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20"},"to":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},"alight":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},"board":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20","timestamp":1369587000},"class":"2A","passenger":[{"seat_number":"W/L 39,RLGN","status":"W/L 27"}]}} )
Code in my PHP page is:
<?php
function get_web_page( $url )
{
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = curl_exec( $ch );
$err = curl_errno( $ch );
$errmsg = curl_error( $ch );
$header = curl_getinfo( $ch );
curl_close( $ch );
$header['errno'] = $err;
$header['errmsg'] = $errmsg;
$header['content'] = $content;
return $header;
}
$pnr = get_web_page('http://pnrapi.alagu.net/api/v1.0/pnr/4563869832');
echo "<code>";
print_r($pnr);
echo "</code>";
?>
I only need the values under "content", which are train number, train name, travel date, etc.
So, what would be the best way to extract this information into individual variables?
Like I want it like this:
$train_no = [some code];
$train_name = [some_code];
and so on...
Thanks in advance.
I tried this:
echo $pnr['content'];
and the output I got is:
{"status":"OK",
"data":"train_number":"16178",
"chart_prepared":false,
"pnr_number":"4563869832",
"train_name":"ROCKFORT EXPRES",
"travel_date":{"timestamp":1369506600,"date":"26-5-2013"},
"from":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20"},
"to":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},
"alight":{"code":"MS","name":"CHENNAI EGMORE","time":"05:15"},
"board":{"code":"TPJ","name":"TIRUCHIRAPPALLI JUNCTION","time":"22:20","timestamp":1369587000},
"class":"2A","passenger":[{"seat_number":"W/L 39,RLGN","status":"W/L 27"}]}}
Now, can anyone give me an idea of how I can fetch the individual values from the above output?
I'm not sure where the JSON string is. But let's say it's the $pnr variable.
$json = json_decode($pnr, true);
$train_no = $json["data"]["train_number"];
$train_name = $json["data"]["train_name"];
Updated:
If you don't need all the other information, you can do something like the following:
$pnr = file_get_contents('http://pnrapi.alagu.net/api/v1.0/pnr/4563869832');
and then run the code above.
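Putting those two pieces together, a minimal sketch (the status check is just defensive and not part of the original answer):
<?php
// Fetch the JSON directly and decode it into an associative array.
$url  = 'http://pnrapi.alagu.net/api/v1.0/pnr/4563869832';
$pnr  = file_get_contents($url);
$json = json_decode($pnr, true);
if ($json === null || $json['status'] !== 'OK') {
    die('Could not decode the PNR response');
}
$train_no    = $json['data']['train_number'];
$train_name  = $json['data']['train_name'];
$travel_date = $json['data']['travel_date']['date'];
echo "$train_no $train_name $travel_date"; // 16178 ROCKFORT EXPRES 26-5-2013
?>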
You're looking through the header, where you should be looking at the content. Return $content instead in your function and then you can parse out the response:
function get_web_page( $url ) {
$options = array(
CURLOPT_RETURNTRANSFER => true, // return web page
CURLOPT_HEADER => false, // don't return headers
CURLOPT_FOLLOWLOCATION => true, // follow redirects
CURLOPT_ENCODING => "", // handle all encodings
CURLOPT_USERAGENT => "spider", // who am i
CURLOPT_AUTOREFERER => true, // set referer on redirect
CURLOPT_CONNECTTIMEOUT => 120, // timeout on connect
CURLOPT_TIMEOUT => 120, // timeout on response
CURLOPT_MAXREDIRS => 10, // stop after 10 redirects
);
$ch = curl_init( $url );
curl_setopt_array( $ch, $options );
$content = json_decode( curl_exec( $ch ) );
curl_close( $ch );
return array(
'train_no' => $content->data->train_number,
'train_name' => $content->data->train_name,
);
}
$pnr = get_web_page('http://pnrapi.alagu.net/api/v1.0/pnr/4563869832');
echo "<pre>" . print_r($pnr, true) . "</pre>";