How can I pass command-line parameters with whitespace to an apache pig script? - apache-pig

I want to write a pig script that takes a filter condition as a command line parameter. From the command line I want to type something like:
pig -p "MY_FILTER=field1 == 0 and field2 == 5" myscript.pig
In my script I have a line:
my_filtered_data = filter my_data by $MY_FILTER;
This works as expected when MY_FILTER has no spaces and I pass quotes around my value; so if I type MY_FILTER=\"field1==0\" at the command line, the shell passes the quotes with the value and Pig does the expansion I want. However, the parameter fails to expand if I supply it as MY_FILTER=\"field1 == 0\"
I've tried a bunch of different quoting techniques and even tried running the command directly from python's subprocess module to ensure my shell wasn't doing something weird.

Which version of Pig do you use? I use 0.9.2 and the following command works for me:
pig -p "F='field1 == 3 AND field2 == 5'" test.pig
But it doesn't work with 0.8.1.
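For reference, here is how the working form lines up with the script from the question (a sketch; on 0.9.2 the single quotes around the value appear to be consumed during Pig's parameter substitution, so only the bare condition reaches the script):
pig -p "MY_FILTER='field1 == 0 and field2 == 5'" myscript.pig
Inside myscript.pig, the line
my_filtered_data = filter my_data by $MY_FILTER;
then expands to
my_filtered_data = filter my_data by field1 == 0 and field2 == 5;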

Related

How to use "%" character in sql query on linux shell?

I am trying to pull all the jdk packages installed on a set of hosts by sending a SQL select statement to osquery on the Linux shell via pssh.
Here is the query:
pssh -h myhosts -i 'echo "SELECT name FROM rpm_packages where name like '%jdk%';"| osqueryi --json'
but usage of "%" is giving me below error.
Error: near line 1: near "%": syntax error
I tried to escape the %, but the error remains the same. Any ideas how to overcome this error?
You aren't getting this error from your shell but from the query parser, and it's not actually caused by the % character, but by the ' that immediately precedes it. Look at where you have quotes:
'echo "SELECT name FROM rpm_packages where name like '%jdk%';"| osqueryi --json'
^----------------------------------------------------^ ^-------------------^
These quotes are consumed by the shell when it parses the argument. Single quotes tell the shell to ignore any otherwise-special characters inside and treat what is within the quotes as part of the argument -- but not the quotes themselves.
After shell parsing finishes, the actual, verbatim argument that gets sent to pssh looks like this:
echo "SELECT name FROM rpm_packages where name like %jdk%;"| osqueryi --json
Note that all of the single quotes have been erased. The result is that your query tool sees the % (presumably modulus) operator in a place that it doesn't expect -- right after another operator (like) which makes about as much sense to the parser as name like * jdk. The parser doesn't understand what it means to have two consecutive binary operators, so it complains about the second one: %.
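You can see this for yourself by substituting printf for pssh, which prints the argument exactly as the shell delivers it (a debugging sketch; printf is just a stand-in here):
$ printf '%s\n' 'echo "SELECT name FROM rpm_packages where name like '%jdk%';"| osqueryi --json'
echo "SELECT name FROM rpm_packages where name like %jdk%;"| osqueryi --json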
In order to get a literal ' there, you need to jump through this hoop:
'\''
^^^^- start quoting again
|||
|\+-- literal '
|
\---- stop quoting
So, to fix this, replace all ' instances inside the string with '\'':
pssh -h myhosts -i 'echo "SELECT name FROM rpm_packages where name like '\''%jdk%'\'';"| osqueryi --json'
osqueryi accepts a single statement on the command line. Eliminating the echo can make quoting a bit simpler:
osqueryi --json "SELECT * FROM users where username like '%jdk%'"
You will, however, need the quotes to pass through your pssh command line.
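One way to do that (an untested sketch) is to make the outer quotes double quotes and escape the inner pair, so the single quotes around the pattern survive the local shell and reach the remote side intact:
pssh -h myhosts -i "osqueryi --json \"SELECT name FROM rpm_packages where name like '%jdk%'\""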
While osqueryi is great for short simple things, if you're building a frequent polling service, osqueryd with scheduled queries is generally simpler.

Running command in perl6, commands that work in shell produce failure when run inside perl6

I'm trying to run a series of shell commands from Perl 6, each assigned to the variable $cmd, which look like:
databricks jobs run-now --job-id 35 --notebook-params '{"directory": "s3://bucket", "output": "s3://bucket/extension", "sampleID_to_canonical_id_map": "s3://somefile.csv"}'
Splitting the command at everything after --notebook-params:
my $cmd0 = 'databricks jobs run-now --job-id 35 --notebook-params ';
my $args = "'{\"directory\": \"$in-dir\", \"output\": \"$out-dir\", \"sampleID_to_canonical_id_map\": \"$map\"}'";
my $run = run $cmd0, $args, :err, :out;
Fails. No answer given either by Databricks or the shell. Stdout and stderr are empty.
Splitting the entire command by white space
my @cmd = $cmd.split(/\s+/);
my $run = run $cmd, :err, :out
Error: Got unexpected extra arguments ("s3://bucket", "output": "s3://bucket/extension", "sampleID_to_canonical_id_map": "s3://somefile.csv"}'
Submitting the command as a string
my $cmd = "$cmd0\"$in-dir\", \"output\": \"$out-dir\", \"sampleID_to_canonical_id_map\": \"$map\"}'";
again, stdout and stderr are empty. Exit code 1.
this is something about how run can only accept arrays, and not strings (I'm curious why)
If I copy and paste the command that was given to Perl6's run, it works when given from the shell. It doesn't work when given through perl6. This isn't good, because I have to execute this command hundreds of times.
Perhaps Perl 6's shell (https://docs.perl6.org/routine/shell) would be better? I didn't use that, because the manual suggests that run is safer. I want to capture both stdout and stderr inside a Proc object.
EDIT: I've gotten this running with shell but have encountered other problems not related to what I originally posted. I'm not sure if this qualifies as being answered then. I just decided to use backticks with perl5. Yes, backticks are deprecated, but they get the job done.
I'm trying to run a series of shell commands
To run shell commands, call the shell routine. It passes the positional argument you provide it, coerced to a single string, to the shell of the system you're running the P6 program on.
For running commands without involving a shell, call the run routine. The first positional argument is coerced to a string and passed to the operating system as the filename of the program you want run. The remaining arguments are coerced to strings and passed to that program as its individual arguments, with no shell re-parsing them along the way.
my $cmd0 = 'databricks jobs run-now --job-id 35 --notebook-params ';
That's wrong for both shell and run:
shell only accepts one argument and $cmd0 is incomplete.
The first argument for run is a string interpreted by the OS as the filename of a program to be run and $cmd0 isn't a filename.
So in both cases you'll get either no result or nonsense results.
Your other two experiments are also invalid in their own ways as you discovered.
this is something about how run can only accept arrays, and not strings (I'm curious why)
run can accept a single argument. It would be passed to the OS as the name of the program to be run.
It can accept two arguments. The first would be the program name, the second a single argument passed to that program.
It can accept three or more arguments. The first would be the program name, the rest would be passed to the program as separate arguments. (There are cases where this is more convenient coding-wise than the array form.)
run can also accept a single array. The first element would be the program name and the rest the arguments passed to it. (There are cases where this is more convenient.)
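A rough sketch of those call shapes (ls and /tmp are stand-ins, not from the question):
run 'ls';                          # single argument: the program name
run 'ls', '/tmp';                  # program name plus one argument
run 'ls', '-l', '/tmp';            # program name plus two arguments
my @cmd = 'ls', '-l', '/tmp';
run @cmd;                          # single array: program name first, then its arguments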
I just decided to use backticks with perl5. Yes, backticks are deprecated, but they get the job done.
Backticks are subject to code injection and shell interpolation attacks and errors. But yes, if they work, they work.
P6 has direct equivalents of most P5 features. This includes backticks. P6 has two variants:
The safer P6 alternative to backticks is qx. The qx quoting construct calls the shell but does not interpolate P6 variables, so it carries about the same level of danger as using shell with a single-quoted string.
The qqx variant is the direct equivalent of P5 backticks or of using shell with a double-quoted string, so it suffers from the same security dangers.
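A minimal sketch of the difference (ls and /tmp are stand-ins):
my $listing  = qx{ls};             # the shell runs ls; nothing in the braces interpolates
my $dir      = '/tmp';
my $listing2 = qqx{ls $dir};       # $dir interpolates first, then the shell runs "ls /tmp"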
Two mistakes:
the simplistic split cuts up the last, single parameter into multiple arguments
you are passing $cmd to run, not @cmd
use strict;
my @cmd = ('/tmp/dummy.sh', '--param1', 'param2 with spaces');
my $run = run @cmd, :err, :out;
print(@cmd ~ "\n");
print("EXIT_CODE:\t" ~ $run.exitcode ~ "\n");
print("STDOUT:\t" ~ $run.out.slurp ~ "\n");
print("STDERR:\t" ~ $run.err.slurp ~ "\n");
output:
$ cat /tmp/dummy.sh
#!/bin/bash
echo "prog: '$0'"
echo "arg1: '$1'"
echo "arg2: '$2'"
exit 0
$ perl6 dummy.pl
/tmp/dummy.sh --param1 param2 with spaces
EXIT_CODE: 0
STDOUT: prog: '/tmp/dummy.sh'
arg1: '--param1'
arg2: 'param2 with spaces'
STDERR:
If you can avoid generating $cmd as a single string, I would generate it into @cmd directly. Otherwise you'll have to implement a complex split operation that handles quoting.
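For the databricks command from the question, that might look like the following untested sketch. The JSON travels as one plain argument, so the outer single quotes from the shell version are simply dropped; the braces are escaped because { } would otherwise interpolate inside a Raku double-quoted string:
my $in-dir  = 's3://bucket';
my $out-dir = 's3://bucket/extension';
my $map     = 's3://somefile.csv';
my $json = "\{\"directory\": \"$in-dir\", \"output\": \"$out-dir\", \"sampleID_to_canonical_id_map\": \"$map\"\}";
my @cmd = 'databricks', 'jobs', 'run-now', '--job-id', '35', '--notebook-params', $json;
my $run = run @cmd, :err, :out;
say $run.out.slurp;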

How to store a result to a variable in HP OpenVMS DCL?

I want to save the output of a program to a variable.
I use the following approach, but it fails.
$ PIPE RUN TEST | DEFINE/JOB VALUE @SYS$PIPE
$ x = f$logical("VALUE")
I got an error:
%DCL-W-MAXPARM, too many parameters - reenter command with fewer parameters
\WORLD\
Reference:
How to assign the output of a program to a variable in a DCL com script on VMS?
The usual way to do this is to write the output to a file, then read it back from the file and put it into a DCL symbol (or logical). Although not obvious, you can do this with the PIPE command as well:
$ pipe r 2words
hello world
$ pipe r 2words |(read sys$pipe line ; line=""""+line+"""" ; def/job value &line )
$ sh log value
"VALUE" = "hello world" (LNM$JOB_85AB4440)
$
(The quadruple quotes add literal quote marks around the value so that DCL treats the multi-word string as a single equivalence name, and &line substitutes the symbol when the DEFINE command is parsed.)
If you are able to change the program, add some code to it to write the required values into symbols or logicals (see the LIB$ routines).
If you can modify the program, using LIB$SET_SYMBOL in the program defines a DCL symbol (what you are calling a variable) for DCL. That's the cleanest way to do this. If it really needs to be a logical, then there are system calls that define logicals.

Pig Latin Parameter Regex Match

I have a filter in my pig script that should either take a value from the command line or, if no parameter is provided, assume no filtering and proceed.
For example, the line in the script is:
b = FILTER a by STATE matches '$VALUEMATCH';
In the command line, I may provide:
pig -param VALUEMATCH='VA' SCRIPT.pig
If I don't provide this on the command line, I basically want the script to continue using all values of STATE.
So basically I want the %default VALUEMATCH to match everything. What should the correct default statement be?
%default VALUEMATCH = '*'
does not work.
Any ideas?
Remove the = and use .*
%default VALUEMATCH '.*'
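Putting the pieces together (a sketch based on the script line from the question; with Pig's anchored matches, '.*' matches every STATE value):
%default VALUEMATCH '.*'
b = FILTER a by STATE matches '$VALUEMATCH';
Then pig SCRIPT.pig keeps all rows, while pig -param VALUEMATCH='VA' SCRIPT.pig keeps only rows whose STATE is exactly VA.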

Execute SQL from file in bash

I'm trying to load SQL from a file in bash and execute it. The SQL file needs to stay versatile, meaning it cannot be altered just to make things easy in bash (escaping special characters like *).
So I have run into some problems:
If I read my sample.sql
SELECT * FROM SAMPLETABLE
to a variable with
ab=`cat sample.sql`
and execute it
db2 `echo $ab`
I receive an SQL error, because the * has been replaced by all the files in the directory of sample.sql.
An easy solution would be to replace "*" with "\*". But I cannot do this, because the file needs to stay executable in programs like DB Visualizer etc.
Could someone give me hint in the right direction?
The DB2 command line processor has options that accept a filename as input, so you shouldn't need to load statements from a text file into a shell variable.
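(As an aside, the glob expansion in the original attempt happens because the variable is unquoted when it reaches the command line; quoting it would also work, though the file options below are cleaner. A sketch:)
ab=$(cat sample.sql)
db2 "$ab"     # the quotes stop the shell from expanding * into filenames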
This command will execute all SQL statements in the file, with newline treated as the statement terminator:
db2 -f sample.sql
This command will execute all SQL statements in the file, with semicolon treated as the statement terminator:
db2 -t -f sample.sql
Other useful CLP flags are:
-x : Suppress the column headings
-v : Echo the statement text immediately before execution
-z : Tee a copy of all CLP output to the filename immediately following this flag
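For example, a combined invocation might look like this (output.log is a placeholder name):
db2 -t -v -z output.log -f sample.sql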
Redirect stdin from the file.
db2 < sample.sql
In case you have variables in your script and want the shell to replace them before the SQL is executed in DB2, use this approach:
Contents of File.sql:
cat <<xEOF
insert into ${MY_SCHEMA}.${MY_TABLE} values(1,2);
select * from ${MY_SCHEMA}.${MY_TABLE};
xEOF
In command prompt do:
export MY_SCHEMA='STAR'
export MY_TABLE='DIMENSION'
Then you are all good to get it executed in DB2:
sh File.sql | db2 +p -t
The shell will expand the exported variables and then DB2 will execute the resulting statements.
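To sanity-check the substitution, you can run the file on its own first and inspect the expanded SQL before piping it to DB2 (a quick check, assuming the exports above):
$ sh File.sql
insert into STAR.DIMENSION values(1,2);
select * from STAR.DIMENSION;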
Hope it helps.