Where is s3-dist-cp of EMR 6.2.0? - amazon-s3

I created an EMR Spark cluster with the following configuration:
Then I SSHed into the master node and ran the command s3-dist-cp, but got the following error:
s3-dist-cp: command not found
I searched the whole disk but found nothing:
sudo find / -name "*s3-dist-cp*"
Where is the s3-dist-cp command? Thanks!

It turns out I had to select "Hadoop" as one of the applications when creating the cluster.
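For clusters created from the command line instead of the console, the same choice can be sketched with the AWS CLI. The cluster name, instance type, and instance count below are placeholder assumptions, not values from the original setup; the relevant part is that Hadoop is listed under --applications next to Spark.

```shell
#!/bin/bash
# Sketch only: cluster name, instance type and instance count are placeholders.
# The point is that Hadoop is listed explicitly alongside Spark.
apps="Name=Hadoop Name=Spark"

cmd="aws emr create-cluster \
--name example-cluster \
--release-label emr-6.2.0 \
--applications $apps \
--instance-type m5.xlarge \
--instance-count 3 \
--use-default-roles"

# Print the command for review; run it yourself once it looks right.
echo "$cmd"
```

Printing the command first, instead of running it directly, avoids launching a billable cluster by accident while adjusting the placeholders.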


gsutil doesn't work: "AttributeError: 'SymbolDatabase' object has no attribute 'RegisterServiceDescriptor'"

The gsutil command in my VM is failing with the following error:
packages/google/iam/v1/iam_policy_pb2.py", line 296, in
AttributeError: 'SymbolDatabase' object has no attribute 'RegisterServiceDescriptor'
When did this issue start to appear? Was it after a configuration change to this VM? If it was not caused by a configuration change, the steps below should help.
SSH into the instance and run the following command to see which Cloud SDK version and gsutil version you are using:
gcloud version
Since this appears to be a gsutil issue, updating gsutil might help:
sudo gcloud components update gsutil
Enter ‘N’ at the ‘Do you want to run install instead (y/N)?’ prompt and you should be able to update gsutil. If the Cloud SDK component manager is not enabled, you might have to use ‘sudo apt-get install google-cloud-sdk’ instead, which should give you the same result.
Check whether the steps above help.

Hive script not running in crontab with hadoop must be in the path error

Even after setting the Hadoop home and prefix paths in .bashrc and /etc/profile, I am still getting the same error: Cannot find hadoop installation: $HADOOP_HOME or $HADOOP_PREFIX must be set or hadoop must be in the path
I get this error when I run the script from crontab; from the hive> prompt it works fine.
Please help me solve this.
Set $HADOOP_HOME in $HIVE_HOME/conf/hive-env.sh.
Alternatively, try loading the user's bash profile at the top of the script, as below:
. ~/.bash_profile
.bashrc is usually sourced from the user's .bash_profile, so this loads the user-specific configuration as well.
See the similar question "Hbase commands not working in script executed via crontab".
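cron starts jobs with a minimal environment and does not read .bashrc or /etc/profile, which is why variables set there are missing when the script runs from crontab. A minimal sketch of a wrapper along the lines suggested above follows. The function names and the Hive script path are hypothetical, and the check only approximates the condition described in the error message.

```shell
#!/bin/bash
# Hypothetical helper: load the login profile if it exists, since cron
# does not do this on its own.
load_profile() {
    if [ -f "$HOME/.bash_profile" ]; then
        . "$HOME/.bash_profile"
    fi
}

# Hypothetical helper: succeed if HADOOP_HOME or HADOOP_PREFIX is set,
# or if a hadoop executable can be found on PATH.
hadoop_env_ok() {
    [ -n "${HADOOP_HOME:-}" ] && return 0
    [ -n "${HADOOP_PREFIX:-}" ] && return 0
    command -v hadoop >/dev/null 2>&1
}

# Intended use inside the script that crontab calls (path is a placeholder):
#   load_profile
#   hadoop_env_ok || { echo "Hadoop environment not set" >&2; exit 1; }
#   hive -f /path/to/script.hql
```

Failing early with a clear message makes the cron mail or log show the real cause instead of the generic Hive error.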

Apache Hadoop 2.6: Pseudo Distribution Mode Setup

I am setting up Apache Hadoop 2.6 for Pseudo-Distributed Operation by following the instructions provided in the link:
I am facing an issue after I execute the command: $ bin/hdfs dfs -put etc/hadoop input
The error message is: put:'input': No such file or directory
How can I resolve this?
Also, I have edited hadoop-env.sh to add the statement export HADOOP_PREFIX=/usr/local/hadoop, but I cannot understand why the shell prints the warning: /usr/local/hadoop/etc/hadoop/hadoop-env.sh: line 32: export:='/usr/local/hadoop': not a valid identifier
Thanks for the help.
I have fixed this problem.
I created the directory with $ bin/hdfs dfs -mkdir /user/root and the problem was solved, since I was logged in as root on Ubuntu. Earlier I was using the wrong username, which caused the issue.
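A relative HDFS path such as input is resolved against the current user's HDFS home directory, /user/<username>, which does not exist on a freshly formatted file system. The sketch below derives that directory from the local username so the wrong name is not typed by hand. The helper name hdfs_home_dir is hypothetical, and the hdfs commands are left as comments because they need a running HDFS.

```shell
#!/bin/bash
# Hypothetical helper: print the HDFS home directory for a user,
# defaulting to the user running this script.
hdfs_home_dir() {
    printf '/user/%s\n' "${1:-$(id -un)}"
}

# With HDFS running, create the home directory and retry the upload:
#   bin/hdfs dfs -mkdir -p "$(hdfs_home_dir)"
#   bin/hdfs dfs -put etc/hadoop input
echo "HDFS home directory: $(hdfs_home_dir)"
```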

Hadoop + Hive - hcatalog won't startup

I just installed a single-node Hadoop 2.2.0 cluster running on Ubuntu.
I tried a couple of basic example calculations and they work fine.
I then tried to set up Hive 0.12.0, which includes hcatalog.
I followed this tutorial.
When I try to start hcatalog, I always get the following error:
bash $HIVE_HOME/hcatalog/sbin/hcat_server.sh start
dirname: missing operand
Try `dirname --help' for more information.
Started metastore server init, testing if initialized correctly...
/usr/local/hive/hcatalog/sbin/hcat_server.sh: line 91: /usr/local/hive-0.12.0/hcatalog/sbin/../var/log/hcat.out: No such file or directory
Metastore startup failed, see /usr/local/hive-0.12.0/hcatalog/sbin/../var/log/hcat.err
But there's no hcat.err file at all, so I'm blocked right now.
Any help would be much appreciated!
Thanks in advance,
I worked out that hcat was not executable in the Hive installation I had downloaded.
So just run sudo chmod a+x hcat and it works.
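To confirm this kind of problem before changing permissions, the executable bit can be checked first. The sketch below uses a hypothetical helper, ensure_executable, and a placeholder path, since the location of hcat depends on the installation.

```shell
#!/bin/bash
# Hypothetical helper: add execute permission for all users only if the
# file is not already executable.
ensure_executable() {
    if [ ! -x "$1" ]; then
        chmod a+x "$1"
    fi
}

# Example call (placeholder path; prefix with sudo if you do not own the file):
#   ensure_executable /path/to/hcat
```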

Download using gsutil

I was using gsutil to download a trace file from google storage.
The command I used was:
gsutil/gsutil cp gs://clusterdata-2011-1/task_usage/part-00499-of-00500.csv.gz ./
But I got an error:
GSResponseError: Status=404, code=NoSuchKey, reason=Not Found.
However, when I ran the ls command in gsutil, the file was listed.
Any suggestion is appreciated.
It works now. The cause may have been the gsutil version, or the server may not have been working the last time I tried.