How to pass shell variable to pig param file - apache-pig

How can we pass a shell variable to a Pig param file? As an example, I have a shell variable defined as DB_NAME. I would like to define my Pig parameter file as
p_db_nm=$DB_NAME
I tried the above, which does not work, and I also tried echo $DB_NAME, which does not work either.
I'm aware that I can pass this using -param on the command line, but I have many variables that I would like to put in a param file, with the values defined in a shell script. I searched many topics on Google and didn't have any luck.
My question is similar to the one posted at http://grokbase.com/t/pig/user/09bdjeeftk/is-it-possible-to-use-an-env-variable-in-parameters-file but I see no workable solution posted there.
Can anyone help?

You can pass a parameter file using the -param_file option.
If the parameter file, named "pig.cfg", is defined like below,
p_db_nm=$DB_NAME
then in the shell, the pig command will look like this,
pig -param_file pig.cfg
and finally, in your Pig script, you can use those variables, named by the KEY in the cfg file (in this case, $p_db_nm).
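If Pig does not expand $DB_NAME inside the param file in your environment, as the question reports, one possible workaround is to let the shell script generate the param file, so the values are already literal by the time Pig reads it. This is only a minimal sketch: the values of DB_NAME, the second variable TBL_NAME, the temporary file that stands in for pig.cfg, and the script name script.pig are all made up for illustration.

```shell
#!/bin/sh
# Hypothetical values; in practice these come from your shell script.
DB_NAME=sales_db
TBL_NAME=orders

# Temporary param file (stands in for pig.cfg).
PARAM_FILE=$(mktemp)

# An unquoted heredoc delimiter lets the shell expand the variables,
# so the file contains literal values rather than the text $DB_NAME.
cat > "$PARAM_FILE" <<EOF
p_db_nm=$DB_NAME
p_tbl_nm=$TBL_NAME
EOF

# Show what was written.
cat "$PARAM_FILE"

# The file could then be passed to Pig (script.pig is a placeholder name):
#   pig -param_file "$PARAM_FILE" script.pig
```

Because the shell does the substitution before Pig starts, this does not depend on how Pig itself treats environment variables in param files.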

Related

Newman pass variable from GitLab

I have a pipeline on GitLab with a variable, ENV_VAR. This variable changes based on the branch the pipeline runs for.
In the same yml file I have a script that runs newman, where I want to pass this variable like this: newman run ... -e test/apis/$ENV_VAR_environment.json
But the issue I have right now is that the variable is not being passed as I want. The pipeline shows the error: cannot read the test/apis/here_should_be_the_variable_name.json
Is there a way to pass this variable into the file source?
It looks like you only need to enclose the variable name in braces:
-e test/apis/${ENV_VAR}_environment.json
because test/apis/$ENV_VAR_environment.json makes the shell look for a variable named ENV_VAR_environment, which does not exist.
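The difference is easy to reproduce locally. The following sketch assumes the job's script runs in a POSIX shell and uses a made-up value for ENV_VAR:

```shell
#!/bin/sh
# Hypothetical value; in the pipeline this is set by GitLab.
ENV_VAR=staging
# Make sure the longer name really is undefined.
unset ENV_VAR_environment

# Without braces the shell reads the name as ENV_VAR_environment,
# which is unset and expands to an empty string.
without_braces="test/apis/$ENV_VAR_environment.json"

# With braces the name ends at the closing brace.
with_braces="test/apis/${ENV_VAR}_environment.json"

echo "$without_braces"   # prints test/apis/.json
echo "$with_braces"      # prints test/apis/staging_environment.json
```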

pentaho PDI passing uservariable in command line

I am trying to run a transformation/job by passing a user variable on the command line.
I have tried passing the variable value as below.
sh pan.sh -file='test.ktr' '-param:input_directory=/path/to/directory' -level=basic
where input_directory is a variable in the transformation, which I reference as ${input_directory}.
But when I do this, pan is unable to find the variable value. It throws the error below:
Could not list the contents of "file:///home/user1/pdi8.1/data-integration8.1/${input_directory}" because it is not a folder.
Can someone help me with this? Thank you.
To pass named parameters to your job or transformation, the parameters need to be defined in its properties window. A default value is not needed, but works well for testing. Pay attention to capitalization.
So the pieces of the puzzle are:
From the command line, pass the parameter like -param:yourparam=yourvalue
Define this same parameter in the highest-level job or transformation
Use it as you would use any variable, with ${yourparam}
I think the parameter names used in the job should be written as ${PARAM_NAME1}.
On the command line I follow the convention below:
call "{Replace with kitchen.bat File Path}" /file:"{Replace with JOB File Path}" "-param:PARAM_NAME1=PARAM_VALUE1" "-param:PARAM_NAME2=PARAM_VALUE2"

Powershell: Specify file path as variable

I am running the following SQL query through a PowerShell script and need to run the script multiple times against different files. What I am trying to figure out is how to specify a file path as a variable when I run the script.
update [$Db_name].[dbo].[$BatchTable]
set [$Db_name].[dbo].[$BatchTable].Wave = 'Wave1.1'
from [$Db_name].[dbo].[$BatchTable]
inner join OPENROWSET(BULK 'FilePath\file.csv',
FORMATFILE= 'E:\import.xml') AS a
on ([$Db_name].[dbo].[$BatchTable].Name= a.Name) and
([$Db_name].[dbo].[$BatchTable].Domain = a.Domain)
The 'FilePath\file.csv' is the file path I need to define as a variable so that my code would instead look like this:
inner join OPENROWSET(BULK '$INPUTFILEPATH',
FORMATFILE= 'E:\import.xml') AS a
Any help or potentially better methods to accomplish this would help very much.
From the command line I want to be able to run the script like this:
CMD: updatescript.ps1 $INPUTFILEPATH = C:\Documents\myfile.csv
Again, I'm not sure this is the best way to go about it.
You're nearly there.
You will need to add a parameter block at the very start of your script e.g.
Param(
[Parameter(Mandatory=$true)]
[ValidateScript({Test-Path $_ -PathType 'leaf'})]
[string] $InputFilePath
)
This creates a mandatory (not optional) string parameter called InputFilePath. The ValidateScript attribute holds the code used to validate the parameter; in this case it checks that the file exists, using the Test-Path cmdlet with a path type of 'leaf' (to check for a directory instead, use 'container').
When running your script use the syntax below:
updatescript.ps1 -INPUTFILEPATH "C:\Documents\myfile.csv"
and in the script use the variable as the path exactly as in your question:
inner join OPENROWSET(BULK '$INPUTFILEPATH',
FORMATFILE= 'E:\import.xml') AS a
NOTE: in PowerShell, when passing parameters to a script you only need to type enough leading characters to uniquely identify the parameter among the others in your param block; in this case -I works just as well as -InputFilePath.
You can pass command line parameters to the PowerShell script using param.
Example:
param(
[string]$INPUTFILEPATH
)
And then call the script as follows:
updatescript.ps1 -INPUTFILEPATH C:\Documents\myfile.csv

Check if Windows batch variable starts with a specific string

How can I find out (with a Windows batch command) whether, for example, a variable starts with ABC?
I know that I can compare a variable if I know its whole content (if "%variable%"=="abc"), but I want it to check only the beginning.
I also need to find out where the batch file is located, so if there is another command that reveals the file's location, please let me know.
Use the variable substring syntax:
IF "%variable:~0,3%"=="ABC" [...]
If you need the path to the batch file without the batch file name, you can use the variable:
%~dp0
Syntax for this is explained in the help for the for command, although this variable syntax extends beyond just the for command syntax.
To find the batch file's location, use %0 (the path to the current batch file, as it was invoked) or the %CD% variable, which gives the current working directory.

WebHCat & Pig - how to pass a parameter file to the job?

I am using HCatalog's WebHCat API to run Pig jobs, such as documented here:
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig
I have no problem running a simple job, but I would like to attach a parameters file to the job, as one can do with the pig command line's -param_file option.
I assume this is possible through the request's arg parameter, so I tried multiple things, such as passing:
'arg': '-param_file /path/to/param.file'
or:
'arg': {'param_file': '/path/to/param.file'}
Neither seems to work, and the error stacks don't say much.
I would love to know if this is possible, and if so, how to correctly achieve this.
Many thanks
Correct usage:
'arg': ['-param_file', '/path/to/param.file']
Explanation:
When the value is passed in arg as a dict,
'arg': {'-param_file': '/path/to/param.file'}
WebHCat generates only "-param_file" on the command line.
Pig then throws the following error:
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Can not create a Path from a null string
Using a comma instead of a colon (that is, a list instead of a dict) passes the path to the file as a second argument.
WebHCat will then generate "-param_file" "/path/to/param.file".
P.S.: I am using the Requests library in Python to make the REST calls.
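For comparison, the same two-argument form can be sketched with curl by repeating the arg field once per command-line token. The host, port, user name, and file paths below are placeholders, sending user.name as a form field is an assumption about your WebHCat setup, and the snippet only prints the arguments it would hand to curl rather than sending anything.

```shell
#!/bin/sh
# Placeholder values; replace with your own server, user, and paths.
WEBHCAT_URL="http://localhost:50111/templeton/v1/pig"
USER_NAME="hadoop_user"
PIG_SCRIPT="/path/to/script.pig"
PARAM_FILE="/path/to/param.file"

# Each -d arg=... becomes one separate argument on the Pig command line,
# so the flag and its value are sent as two arg fields.
set -- -s \
  -d "user.name=$USER_NAME" \
  -d "file=$PIG_SCRIPT" \
  -d "arg=-param_file" \
  -d "arg=$PARAM_FILE" \
  "$WEBHCAT_URL"

# Dry run: list the arguments, one per line.
# To actually send the request, run: curl "$@"
printf '%s\n' "$@"
```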