Pig Action in Oozie - Unable to use pig Parameter file in workflow - apache-pig

I am trying to use Pig's parameter file in an Oozie workflow. I provided the parameter file with the argument element, but the action ended in a launcher error. Below is the error:
" APP[visit_c] JOB[0000079-160420073357222-oozie-oozi-W] ACTION[0000079-160420073357222-oozie-oozi-W#pcount] Launcher ERROR, reason: Main class [org.apache.oozie.action.hadoop.PigMain], exit code [2] "
Do we have an option to pass parameters in a file to Oozie? Thanks in advance for your help.
..
..
<script>${pigScript}</script>
<argument>-param_file</argument>
<argument>parameter_file</argument>
..
..

The issue was resolved after referring to the Pig Cookbook (https://cwiki.apache.org/confluence/display/OOZIE/Pig+Cookbook). Case 3 on that page shows the options for using a parameter file. I just moved the parameter file into the 'lib' directory under the workflow application.
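For reference, a Pig parameter file is plain text with one name=value pair per line (lines starting with # are comments). A minimal sketch, with hypothetical parameter names, that validates such a file's format before packaging it into the workflow's lib/ directory:

```python
# Sketch: validate a Pig parameter file (name=value per line, '#' comments).
# The parameter names below are hypothetical examples.

def parse_param_file(text):
    """Return a dict of name -> value, skipping blank lines and comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError("malformed line: %r" % line)
        params[name.strip()] = value.strip()
    return params

sample = """
# parameters for the pcount Pig script
input_dir=/user/me/visits
output_dir=/user/me/counts
"""
print(parse_param_file(sample))
```

Running a check like this locally catches malformed lines before they surface as an opaque `exit code [2]` from the Oozie launcher.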

Related

Error serializing export interface, Unable to load transformation [null] : can't find directory

I used a Transformation Executor to call another transformation. Under Browse I defined the variable as ${trans_path}; however, when I run it on the server it fails with the error "Unable to load transformation [null] : can't find directory".
Before calling the Transformation Executor, have a Write to Log step output the value of that variable. That will let you check whether the variable is being passed correctly.
The log output should appear in your catalina.out file under Tomcat's log folder.

Hive query file execution is failing through oozie

I have a Hive query file that uses a UDF. When I run this query file with "hive -f myqfile.q", it executes properly and the data is populated in my final table.
But when the same query file is run through the Oozie workflow, it fails with the error message below:
FAILED: SemanticException: [Error: 10014]: Line: 29:17 Wrong arguments '"start"': No method matching for class com.abc.xyz.hive.udf.GetRowKeyRange with (string, string, string, string). Possible choices: _FUNC_(string, string, string, string, string)
Intercepting System.exit(10014)
Failing Oozie Launcher, Main class [org.apache.oozie.action.hadoop.HiveMain], exit code [10014]
In the error message above, "start" is the value of one of the parameters to my Hive UDF.
The jar path present in the .q file is correct (manual execution works), the Hive UDF is created inside the .q file, and the class com.abc.xyz.hive.udf.GetRowKeyRange has an evaluate method with only 4 parameters.
I am not sure how this error is coming up; I tried to figure it out but couldn't find the reason. Can someone help me with this?
Is the .jar containing this UDF also sitting on HDFS? Oozie might not be able to follow the jar path if it is local.
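Since the error reports a 5-argument signature while the local jar has a 4-argument evaluate method, it is worth confirming which jar Oozie is actually shipping. A jar is just a zip archive, so its classes can be listed directly; a minimal sketch, building a stand-in jar in memory for illustration (in practice you would pull the jar from the workflow's lib/ directory on HDFS and inspect it the same way):

```python
import io
import zipfile

# Build a stand-in jar in memory purely for illustration; the class path
# matches the UDF from the question, the bytes are a dummy placeholder.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as jar:
    jar.writestr("com/abc/xyz/hive/udf/GetRowKeyRange.class", b"\xca\xfe\xba\xbe")

# List the .class entries the jar actually contains.
with zipfile.ZipFile(buf) as jar:
    classes = [n for n in jar.namelist() if n.endswith(".class")]

# If the class is missing here, or a second jar on the Oozie classpath also
# contains it, Oozie may resolve a different evaluate() signature than the
# one your local "hive -f" run picks up.
print(classes)
```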

Is it possible to manage a NO FILE error in Pig?

I'm trying to load a simple file:
log = load 'file_1.gz' using TextLoader AS (line:chararray);
dump log;
And I get an error:
2014-04-08 11:46:19,471 [main] ERROR org.apache.pig.tools.pigstats.SimplePigStats - ERROR 2997: Unable to recreate exception from backend error: org.apache.pig.backend.executionengine.ExecException: ERROR 2118: Input Pattern hdfs://hadoop1:8020/pko/file*gz matches 0 files
at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigInputFormat.getSplits(PigInputFormat.java:288)
at org.apache.hadoop.mapred.JobClient.writeNewSplits(JobClient.java:1054)
Is it possible to manage such a situation before the error appears?
Input Pattern hdfs://hadoop1:8020/pko/file*gz matches 0 files
The error means the input file doesn't exist at the given HDFS path.
log = load 'file_1.gz' using TextLoader AS (line:chararray);
Since you haven't specified the absolute path of file_1.gz, Pig resolves it against the HDFS home directory of the user running the Pig script.
Unfortunately, in the current version of Pig (0.15.0) it is impossible to manage these errors without using UDFs.
I suggest creating a Java or Python script that uses try/catch to take care of this.
Here's a good page that might be of some use to you: https://wiki.apache.org/pig/PigErrorHandlingInScripts
Good luck learning Pig!
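As a sketch of that approach: check that the input pattern matches at least one file before launching the Pig job, and handle the empty case yourself. This example uses a local glob and temporary files purely for illustration; against HDFS you would run the equivalent existence test (for example `hadoop fs -ls` on the pattern) instead:

```python
import glob
import os
import tempfile

def input_exists(pattern):
    """Return True if the glob pattern matches at least one file (local sketch)."""
    return len(glob.glob(pattern)) > 0

# Create one sample input file purely for illustration.
workdir = tempfile.mkdtemp()
open(os.path.join(workdir, "file_1.gz"), "wb").close()

pattern = os.path.join(workdir, "file*gz")
if input_exists(pattern):
    # Here you would actually launch the Pig job, e.g. via subprocess.
    print("input present, running pig")
else:
    print("no input files, skipping pig run")
```

Guarding the launch this way turns the ERROR 2118 "matches 0 files" failure into an ordinary, handleable branch of your driver script.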
I'm facing this issue as well. My load command is:
DATA = LOAD '${qurwf_folder_input}/data/*/' AS (...);
I want to load all files from the data subfolders, but the data folder is empty and I got the same error as you. What I did, in my particular case, was to create an empty folder inside the data directory, so the LOAD returns an empty dataset and the script does not fail.
By the way, I'm using an Oozie workflow to run the scripts, and in the prepare step I create the empty folders.

WebHCat & Pig - how to pass a parameter file to the job?

I am using HCatalog's WebHCat API to run Pig jobs, such as documented here:
https://cwiki.apache.org/confluence/display/Hive/WebHCat+Reference+Pig
I have no problem running a simple job, but I would like to attach a parameter file to the job, as one can do with the pig command line's --param_file parameter.
I assume this is possible through the request's arg parameter, so I tried multiple things, such as passing:
'arg': '-param_file /path/to/param.file'
or:
'arg': {'param_file': '/path/to/param.file'}
Neither seems to work, and the error stacks don't say much.
I would love to know if this is possible, and if so, how to correctly achieve this.
Many thanks
Correct usage:
'arg': ['-param_file', '/path/to/param.file']
Explanation:
When the value is passed in arg as a dict,
'arg': {'-param_file': '/path/to/param.file'}
WebHCat generates only "-param_file" on the command line, and Pig throws the following error:
ERROR org.apache.pig.Main - ERROR 2999: Unexpected internal error. Can not create a Path from a null string
Passing a list instead (a comma in place of the colon) sends the path to the file as a second argument, so WebHCat generates "-param_file" "/path/to/param.file".
P.S.: I am using the Requests library in Python to make the REST calls.
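With Requests, the way to get a repeated arg field into the form body is to pass a list of tuples rather than a nested dict. A minimal sketch with hypothetical host, user, and paths; the POST line is shown as a comment, and the body encoding it would produce is printed instead:

```python
from urllib.parse import urlencode

# Hypothetical WebHCat endpoint and values.
url = "http://webhcat-host:50111/templeton/v1/pig"
payload = [
    ("user.name", "myuser"),
    ("file", "/user/myuser/script.pig"),   # Pig script on HDFS
    ("arg", "-param_file"),                # each ("arg", ...) tuple becomes
    ("arg", "/path/to/param.file"),        # one repeated form field
]

# requests.post(url, data=payload) would encode the body exactly like this:
body = urlencode(payload)
print(body)
```

The repeated arg fields are what WebHCat turns into the two consecutive command-line tokens "-param_file" "/path/to/param.file".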

Rebol Is it possible to get the name of the script currently executing?

I'm executing multiple libraries from user.r.
I can get the path of the script from system/script/path, but I can't see how I can get the name of the script. So am I obliged to hardcode the file name in a header property (File), like below?
REBOL [
    Title: "Lib1"
    File: "lib1.r"
    script-path: ""
]
system/script/header/script-path: rejoin [system/script/path system/script/header/file]
probe system/script/header/script-path
input
system/options/script gives only the full name and path of the first script passed on the command line (not when it is executed in the console), and not the paths of subsequent scripts called by that first one.
What I want is the full path of those subsequent scripts.
So it seems there's no solution!
Try help system/options and you will find the information you are looking for.