Pig basic program error - apache-pig

I am getting the error below while running a Pig script: [error screenshot not included]

Please read the Pig manual carefully
https://pig.apache.org/docs/r0.9.1/start.html
and note that -x expects the execution mode to be specified (either local or mapreduce). So the correct command would be:
pig -x local wordcount.pig
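For completeness, mapreduce is the other valid mode, and it is also what Pig defaults to when -x is omitted:
pig -x mapreduce wordcount.pig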

Related

Snowflake put command throws HTTPError('400 Client Error: Bad Request for url - )

I am receiving the error below (a 400 error) while using the PUT command to upload a file to Snowflake.
It works for other files but fails for only this one file.
Error link - https://sfc-uk-ds1-customer-stage.s3.amazonaws.com/drko-s-ukss0127/tables/2323577421858/FACT_RFM_SEGMENTATION.csv?partNumber=83&uploadId=PosKnuUecwKdJdFOQopotAdcwdk2IJ2wtgwsrvbDD_mSas7L.vD.7Bz8bXX1M_QAZKyVWiaxdf5I6ime9FWSwQHI0BpV17WGVRgfpMSd5_hhm92jNGI3a2JrRiTvsblz
Snowflake PUT command:
user_details#XSMALL_WHSE@DEV_DB.FACT>put file://D:\snowflake\lab_db_csv_files\file_name.csv @"FACT".%"table_name" auto_compress=False;
I tried with both auto_compress=False and auto_compress=True, but it does not work in either case.
I remember one of the versions had an issue with the PUT command.
Check your current version by running:
snowsql --version
Then try a different version:
snowsql --version 1.2.18
The above command will launch SnowSQL using version 1.2.18; then test again.
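To retry under the pinned version, connect with it and re-run the same PUT from the SnowSQL prompt (the account and user values here are placeholders, not from the post):
snowsql --version 1.2.18 -a <account_name> -u <user_name>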

How to keep the snakemake shell file while running on a cluster

While running my Snakemake file on the cluster I keep getting an error:
snakemake -j 20 --cluster "qsub -o out.txt -e err.txt -q debug" \
    -s seadragon/scripts/viral_hisat.snake --config json="<input file>" output="<output file>"
This gives me the following error:
Error in job run_salmon while creating output file
/gpfs/home/user/seadragon/output/quant_v2_4/test.
ClusterJobException in line 58 of seadragon/scripts/viral_hisat.snake:
Error executing rule run_salmon on cluster (jobid: 1, external: 156618.sn-mgmt.cm.cluster, jobscript: /gpfs/home/user/.snakemake/tmp.j9nb0hyo/snakejob.run_salmon.1.sh). For detailed error see the cluster log.
Will exit after finishing currently running jobs.
Exiting because a job execution failed. Look above for error message
I can't find any way to track down the error, since my cluster does not give me a way to store the log files; on top of that, the /gpfs/home/user/.snakemake/tmp.j9nb0hyo/snakejob.run_salmon.1.sh file is deleted immediately after the job finishes.
Please let me know if there is a way to keep this shell file even if Snakemake fails.
I am not a qsub user anymore, but if I remember correctly, stdout and stderr are stored in the working directory, under the job ID that Snakemake reports as external in the error message.
You need to redirect standard output and standard error to files yourself instead of relying on the cluster or Snakemake to do this for you.
Instead of the following
my_script.sh
Run the following
my_script.sh > output_file.txt 2> error_file.txt
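If you want each cluster job to keep its own log files rather than every job overwriting a single out.txt/err.txt, you can also point qsub's -o and -e at a directory; most qsub implementations (SGE, PBS/Torque) then write one <jobname>.o<jobid>/<jobname>.e<jobid> pair per job there. A sketch against the command from the question (the cluster_logs path is an assumption):
mkdir -p cluster_logs
snakemake -j 20 --cluster "qsub -o cluster_logs/ -e cluster_logs/ -q debug" \
    -s seadragon/scripts/viral_hisat.snake --config json="<input file>" output="<output file>"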

ORA-12545: Connect failed because target host or object does not exist while connecting through the shell

I am trying to run SQL scripts from a shell script. The scripts are working fine: they connect to the database and apply the SQL files. The only thing I don't understand is why the error message below is logged every time.
Error Message :
ERROR:
ORA-12545: Connect failed because target host or object does not exist
Shell Script:
/opt/ORACLE/app/oracle/product/11.2.0/client_1/bin/sqlplus -s <<eoj >>$LOG_FIL 2>&1
${DBUSER1}/${DBPASS}@${hostBillingDBSID}
@${SQLParm} $RPT_FIL
eoj
Try the below.
Shell Script:
# let's include the Oracle installation in the PATH variable
export PATH=$PATH:/opt/ORACLE/app/oracle/product/11.2.0/client_1/bin
# now just use sqlplus, instead of the full path reference
sqlplus -s ${DBUSER1}/${DBPASS}@${hostBillingDBSID} <<eoj >>$LOG_FIL 2>&1
@${SQLParm} $RPT_FIL
eoj
The user/password (connection string) has to be passed as a command-line argument to sqlplus.
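Since ORA-12545 usually means the host in the connect identifier cannot be resolved, a quick sanity check is to test the TNS alias from the same shell; this assumes the Oracle client's tnsping utility is on the PATH:
tnsping ${hostBillingDBSID}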

SGE Command Not Found, Undefined Variable

I'm attempting to set up a new compute cluster and am currently experiencing errors when using the qsub command in SGE. Here's a simple experiment that shows the problem:
test.sh
#!/usr/bin/zsh
test="hello"
echo "${test}"
test.sh.eXX
test=hello: Command not found.
test: Undefined variable.
test.sh.oXX
Warning: no access to tty (Bad file descriptor).
Thus no job control in this shell.
If I run the script on the head node (sh test.sh), the output is correct. I submit the job to SGE by typing "qsub test.sh".
If I submit the exact same script in the same way on an established HPC compute cluster, it works exactly as expected. What setting could be causing this problem?
Thanks for any help on this matter.
Most likely the queues on your cluster are set to posix_compliant mode with a default shell of /bin/csh. The posix_compliant setting means your #! line is ignored. You can either change the queues to unix_behavior or specify the required shell using qsub's -S option.
#$ -S /bin/sh
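For the zsh script in the question, that could look like either of the following; the /usr/bin/zsh path is taken from the script's shebang, so confirm it exists on the compute nodes:
qsub -S /usr/bin/zsh test.sh
or, as a directive embedded at the top of test.sh right after the shebang:
#!/usr/bin/zsh
#$ -S /usr/bin/zsh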

Setting additional jars property in Pig while running in local mode

When Pig is running in distributed (HDFS) mode, you can pass additional jars to it from the command line using the following syntax, so that you don't have to register them explicitly with REGISTER calls:
pig -Dpig.additional.jars=jar1.jar:jar2.jar -f pigfile.pig
But when I do the same thing while running in local mode, it fails:
pig -x local -Dpig.additional.jars=jar1.jar:jar2.jar -f pigfile.pig
Does anyone know how to register additional jars while running Pig in local mode?
Properties should be passed before any Pig-specific options:
pig -Dpig.additional.jars=jar1.jar:jar2.jar -x local -f pigfile.pig
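If you would rather not repeat the property on every invocation, the same system property can usually be supplied through the PIG_OPTS environment variable, which the bin/pig launcher passes to the JVM; treat this as a sketch and verify against your Pig version's launcher script:
export PIG_OPTS="-Dpig.additional.jars=jar1.jar:jar2.jar"
pig -x local -f pigfile.pig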