students = load '/home/vm4learning/Desktop/students-db.txt' using PigStorage('|') as (rnum, sname, name, age, gender, class, subject, marks);
I am facing a syntax error while using parameter substitution for /home/vm4learning/Desktop/students-db.txt.
What is the correct command and syntax to use here?
Thanks
You need to specify an HDFS path in your Pig LOAD statement.
First copy your input file into HDFS; then you can reference that HDFS path in your Pig script.
You can copy the input file into HDFS with hadoop fs -put:
hadoop fs -put /home/vm4learning/Desktop/students-db.txt /user/input
Then use that path in your Pig script:
students = load '/user/input/students-db.txt' using PigStorage('|') as (.....);
UPDATE:
Save your Pig script in a file with the .pig extension.
process.pig:
students = load '$inputPath' using PigStorage('|') as (.....);
Now, from the terminal, you can execute your Pig file and pass the input path as an argument:
pig -p inputPath=/user/input/students-db.txt process.pig
To check the substituted script without actually running it, use dryrun:
pig -dryrun -param key=value -param key2=value2 filename.pig
I have a "solution.pig" file which contains all the load, join and dump queries. I need to run them by typing "solution.pig" at the grunt> prompt and save all the results in another file. How can I do that?
You can run the file directly with pig -f solution.pig; there is no need to open the grunt REPL.
Inside the file, you can use as many STORE commands as you want to save results to files, rather than DUMP.
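For example, the end of solution.pig might look something like this (the relation names and output paths here are placeholders, not from the original question):

```pig
-- hypothetical relations built earlier in the script
joined = JOIN students BY rnum, scores BY rnum;
-- persist the result to an HDFS directory instead of DUMPing it to the console
STORE joined INTO '/user/output/joined' USING PigStorage(',');
```

Each STORE writes its relation to its own output directory, so the results survive after the script finishes.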
I have a command to move files between S3 folders, and I get the bucket name from a context variable. I placed the command in an array line:
"aws s3 mv s3://"+context.bucket+"/Egress/test1/abc/xyz.dat s3://"+context.bucket+"/Egress/test1/abc/archive/archive_xyz.dat"
The command fetches the bucket name from the context variable, but fails with a "no such file or directory" (error=2) error.
I think it is due to the quotes (") at the beginning and end.
Is there any way to solve this? Please help.
You probably want to pass the command as an array: /bin/bash (or cmd on Windows) as the first element, then your command string as the argument, so that the shell does the parsing rather than the caller.
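As a rough illustration of why the array form matters, here is the same pattern in bash, with echo standing in for aws and a made-up bucket value; in Java terms the array would be {"/bin/bash", "-c", command}:

```shell
bucket="my-bucket"                      # stands in for context.bucket
cmd="echo s3://$bucket/Egress/test1/abc/xyz.dat"
# the whole command string is handed to bash as a single argument,
# so bash (not the calling program) does the quoting and word-splitting
/bin/bash -c "$cmd" > moved.txt
cat moved.txt
```

Because the string reaches bash intact, embedded quotes and spaces no longer confuse the caller's argument parsing.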
While trying to store output to a CSV file in Pig, the command runs successfully, but a new folder is created at the destination instead of a file with that name.
Can you please help me?
This is the command I used:
STORE A into '/home/cloudera/Downloads/res.csv';
The STORE command writes its output to HDFS, producing one output file per reducer, so the result is split across as many part files as there were reducers. If you want the results in a single CSV file, you have to merge the part files, write the merged file to the local file system, and then copy it back to the location of your choice.
You can run the Hadoop file system commands from within your Pig script:
fs -getmerge /home/cloudera/Downloads/res.csv /your/local/dir/res.csv
fs -copyFromLocal /your/local/dir/res.csv /home/cloudera/Downloads
Or, from the shell (the pipe will not work inside grunt):
hadoop fs -cat /home/cloudera/Downloads/res.csv/part-* | hadoop fs -put - /home/cloudera/Downloads/merged.csv
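What getmerge does is simply concatenate every part file in the output directory into one file; here is a local stand-in with fake part files (all file names here are invented for illustration):

```shell
# fake a Pig/MapReduce output directory with two reducer part files
mkdir -p res_dir
printf '1,a\n' > res_dir/part-r-00000
printf '2,b\n' > res_dir/part-r-00001
# local equivalent of: hadoop fs -getmerge res_dir res.csv
cat res_dir/part-* > res.csv
wc -l < res.csv
```

The merged res.csv contains both rows in part-file order, which is exactly what getmerge produces on HDFS.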
I have a Pig script where I want to pass the contents of a file as a parameter. For example, the file could contain something like this:
asdfadfafd""""""
adfadfaf'' '''adsfa
adsfadfadfafdafadf
I want to pass these contents as a single parameter to the pig script. How can I do it?
OK. You can pass a single parameter, or multiple parameters, to a Pig script if you invoke it from a shell script.
Simply initialize a variable inside the shell script and then pass its value to the Pig script.
Let's say your shell script is named demo.sh:
input_data="hello"
pig -param input_data=$input_data /user/cloudera/pigscripts/demo.pig
Let's assume the code below is your Pig script; you can access the parameter like this:
A = LOAD 'input.txt' USING PigStorage(',') AS(id:int,name:chararray);
B = FILTER A by name == '$input_data';
Similarly, you can pass a file that contains properties to your Pig script:
pig -param_file '/user/cloudera/propertyfiles/file1.txt' demo.pig
All the variables defined inside file1.txt are then available inside the Pig script via $.
There are multiple ways. One is:
pig_script.pig:-
R1 = LOAD '$INPUTFILES1' USING PigStorage(',') AS (show_name:chararray, no_of_viewer:int);
pig_param.param:-
INPUTFILES1 = hdfs://192.168.0.107/path-to/input-file
execution command:-
pig -param_file hdfs://192.168.0.107/path-to/pig_param.param hdfs://192.168.0.107/path-to/pig_script.pig
I have a list of file names stored in filenames.txt. Is it possible to load them all together using a single LOAD command?
They are not in the same directory, nor do they share a similar format, so it is not like using a glob such as 201308* to load 20130801.gz through 20130831.gz.
Plus, there are too many files in the list, preventing me from doing something like this:
shell: pig -f script.pig -param input=/user/training/test/{20100810..20100812}
pig: temp = LOAD '$input' USING SomeLoader() AS (...);
Thanks in advance for insights!
If the number of files is reasonably small (i.e. the resulting command line fits within ARG_MAX), you can try concatenating the lines of the file into one string:
pig -param input=`cat filenames.txt | tr "\n" ","` -f script.pig
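To see what that backtick expression expands to, here is a self-contained run with a throwaway filenames.txt (the paths are invented); note that tr alone leaves a trailing comma, which an extra sed step can strip:

```shell
printf '/data/a/part1.gz\n/data/b/other.txt\n' > filenames.txt
# join the lines with commas; sed removes the trailing comma tr leaves behind
input=$(tr "\n" "," < filenames.txt | sed 's/,$//')
echo "$input"
```

Pig's LOAD accepts such a comma-separated list of paths in a single statement, so this string can be passed straight through as the $input parameter.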
script.pig:
A = LOAD '$input' ....
It would probably be better to list directories rather than the individual files, if that is an option for you.