Cannot use Apache Flink in Amazon EMR - hadoop-yarn

I cannot start a YARN session of Apache Flink on Amazon EMR. The error message I get is:
$ tar xvfj flink-0.9.0-bin-hadoop26.tgz
$ cd flink-0.9.0
$ ./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096
...
Diagnostics: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
java.io.FileNotFoundException: File file:/home/hadoop/.flink/application_1439466798234_0008/flink-conf.yaml does not exist
...
I am using Flink version 0.9 and Amazon's Hadoop (EMR release) 4.0.0. Any ideas or hints?
The full log can be found here: https://gist.github.com/headmyshoulder/48279f06c1850c62c28c

From the log:
The file system scheme is 'file'. This indicates that the specified Hadoop configuration path is wrong and the system is using the default Hadoop configuration values. The Flink YARN client needs to store its files in a distributed file system
Flink failed to read the Hadoop configuration files. They are either picked up from the environment variables, e.g. HADOOP_HOME, or you can set the configuration dir in the flink-conf.yaml before you execute your YARN command.
Flink needs to read the Hadoop configuration to know how to upload the Flink jar to the cluster file system such that the newly created YARN cluster can access it. If Flink fails to resolve the Hadoop configuration, it uses the local file system for uploading the jar. That means that the jar will be put on the machine you launch your cluster from. Thus, it won't be accessible from the Flink YARN cluster.
Please see the Flink configuration page for more information.
Edit: On Amazon EMR, export HADOOP_CONF_DIR=/etc/hadoop/conf lets Flink discover the Hadoop configuration directory.
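For reference, a minimal sketch of the full launch sequence on EMR, assuming the stock configuration path /etc/hadoop/conf (the flink-conf.yaml key in the comment is the alternative mentioned above):
export HADOOP_CONF_DIR=/etc/hadoop/conf   # let the Flink YARN client find the cluster config
# alternative (if I recall the key correctly): set "fs.hdfs.hadoopconf: /etc/hadoop/conf" in conf/flink-conf.yaml
cd flink-0.9.0
./bin/yarn-session.sh -n 4 -jm 1024 -tm 4096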

If I were you, I would try this:
./bin/yarn-session.sh -n 1 -jm 768 -tm 768

Related

How to submit a Spark job whose jar is hosted in an S3 object store

I have a Spark cluster running on YARN, and I want to put my job's jar into a 100% S3-compatible object store. From what I found on Google, submitting the job seems to be as simple as:
spark-submit --master yarn --deploy-mode cluster <...other parameters...> s3://my_bucket/jar_file
However, the S3 object store requires a username and password for access. How can I configure those credentials so that Spark can download the jar from S3?
Many thanks!
You can use the Default Credential Provider Chain from the AWS docs:
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
./bin/spark-submit \
--master local[2] \
--class org.apache.spark.examples.SparkPi \
s3a://your_bucket/.../spark-examples_2.11-2.4.6-SNAPSHOT.jar
I needed to download the following jars from Maven and put them into the Spark jars directory in order to use the s3a scheme with spark-submit (note: you can use the --packages directive to reference these dependencies from inside your jar, but not from spark-submit itself):
# build the Spark `assembly` project
sbt "project assembly" package
cd assembly/target/scala-2.11/jars/
wget https://repo1.maven.org/maven2/com/amazonaws/aws-java-sdk/1.7.4/aws-java-sdk-1.7.4.jar
wget https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.7/hadoop-aws-2.7.7.jar
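If exporting environment variables on the submitting machine is not enough (for example with --deploy-mode cluster, where the driver runs on a YARN node), the same credentials can also be passed as Hadoop S3A properties. A rough sketch, with a hypothetical bucket and endpoint; keep in mind the keys end up in the process arguments, so a credentials file or instance profile is usually preferable:
./bin/spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class org.apache.spark.examples.SparkPi \
  --conf spark.hadoop.fs.s3a.access.key=your_access_key \
  --conf spark.hadoop.fs.s3a.secret.key=your_secret_key \
  --conf spark.hadoop.fs.s3a.endpoint=https://your-object-store.example.com \
  s3a://your_bucket/spark-examples_2.11-2.4.6-SNAPSHOT.jar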

After modifying the config files in /etc/presto/conf, how to restart presto-server

On AWS EMR, after modifying the config files in /etc/presto/conf, how can we restart presto-server? Just on the master node or on all nodes?
On EMR you can restart Presto with
sudo stop presto
sudo start presto
You should do this on every node where you modified the config file. You should also update the config file on every node, as appropriate.
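As a rough sketch of doing that across the whole cluster, assuming a hypothetical workers.txt file listing the private DNS names of your core/task nodes and passwordless SSH as the hadoop user:
# restart Presto on the master node
sudo stop presto
sudo start presto
# restart Presto on every worker node listed in the (hypothetical) workers.txt
while read -r host; do
  ssh -o StrictHostKeyChecking=no "hadoop@${host}" 'sudo stop presto; sudo start presto'
done < workers.txt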

How to fix an exception while running a Spark SQL program locally on Windows 10 with Hive support enabled?

I am working with Spark SQL 2.3.1 and I am trying to enable Hive support while creating a session, as below:
.enableHiveSupport()
.config("spark.sql.warehouse.dir", "c://tmp//hive")
I ran the command below:
C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe chmod 777 C:\tmp\hive
While running my program, I get:
Caused by: java.lang.RuntimeException: java.lang.RuntimeException: The root scratch dir: /tmp/hive on HDFS should be writable. Current permissions are: rw-rw-rw-
at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:522)
How do I fix this issue and run on my local Windows machine?
Try to use this command:
hadoop fs -chmod -R 777 /tmp/hive/
This is a Spark exception, not a Windows one. You need to set the correct permissions on the HDFS folder, not only on your local directory.
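If you are running purely locally (no HDFS cluster), the equivalent check goes through winutils instead. A rough sketch, assuming the same winutils install used above and that the Hive scratch dir resolves to C:\tmp\hive:
C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe chmod -R 777 C:\tmp\hive
C:\Software\hadoop\hadoop-2.7.1\bin>winutils.exe ls C:\tmp\hive
The ls output should show drwxrwxrwx for the directory; if the permissions still come back as rw-rw-rw-, Spark will keep raising the same SessionState error.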

What is the default location for Redis AOF file for Ubuntu?

Background
Yesterday our machine crashed unexpectedly and our AOF file for Redis got corrupted.
Upon trying to start the service with sudo systemctl start redis-server we are greeted with the following logs:
Bad file format reading the append only file: make a backup of your
AOF file, then use ./redis-check-aof --fix
Research
Apparently this is a simple error to fix: just execute ./redis-check-aof --fix <filename>.
Except I don't have the smallest idea of where that file is.
I have searched the GitHub discussions for this issue, but unfortunately none of them provides the location of the file:
https://github.com/antirez/redis/issues/4931
The persistence documentation also doesn't mention the location of this file:
https://redis.io/topics/persistence
Specs
These are the specs of the system where I am running Redis:
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 16.04.3 LTS
Release: 16.04
Codename: xenial
Question
Where is this file located?
You have two choices:
Find the configuration file for Redis; it is normally named redis.conf. The dir and appendfilename settings specify the directory and file name of the AOF file.
Connect to Redis with redis-cli and use the CONFIG GET command to read the dir setting, i.e. CONFIG GET dir. The AOF file should be located under this directory (see the sketch below).
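A quick sketch of that second approach from a shell, assuming the server is reachable on the default localhost:6379 and redis-check-aof is on the PATH:
redis-cli CONFIG GET dir             # e.g. 1) "dir"            2) "/var/lib/redis"
redis-cli CONFIG GET appendfilename  # e.g. 1) "appendfilename" 2) "appendonly.aof"
# make a backup before repairing, as the error message advises
sudo cp /var/lib/redis/appendonly.aof /var/lib/redis/appendonly.aof.bak
sudo redis-check-aof --fix /var/lib/redis/appendonly.aof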
The path is typically /var/lib/redis/appendonly.aof, so you will need to run sudo redis-check-aof --fix /var/lib/redis/appendonly.aof
In case you use Docker with a volume mounted at /data, the path to appendonly.aof will be /data/appendonly.aof.
In my case, I was using Docker. I started the Redis server without --appendonly yes, and it came up without any issues. Then I ran CONFIG GET dir as #for-stack said and got this output:
1) "dir"
2) "/data"
So I checked under the /data path and found the file appendonly.aof
Then I ran /usr/local/bin/redis-check-aof --fix /data/appendonly.aof to fix the issue.
I ran /path/redis-check-aof --fix /data/appendonly.aof to fix this.
Thanks all.

How to use Apache Hive with a fully distributed cluster

I am using Hadoop 1.2.1 with 3 datanodes and one namenode. My HBase version is 0.94.14. I have configured Apache Hive 1.0 on the namenode machine.
I have to import HBase table data into Hive. When I run a query, it gives the following error in the log file:
ERROR org.apache.hadoop.hbase.mapreduce.TableInputFormatBase - Cannot resolve the host name for /192.168.3.9 because of javax.naming.NameNotFoundException: DNS name not found [response code 3]; remaining name '9.3.168.192.in-addr.arpa'
What is the problem with my setup? I have followed this tutorial for the Hadoop installation.
The following warning appears in the Hadoop namenode log file when I run a query in Hive:
WARN org.apache.hadoop.hdfs.server.namenode.FSEditLog: Cannot roll edit log, edits.new files already exists in all healthy directories:
Does Hive need any information about how many datanodes Hadoop has?
Also, my HMaster is running on another machine, and I have configured Hive on the namenode machine.
Your Hadoop, ZooKeeper, HBase and Hive should all be up and running (a quick jps check is sketched below).
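As a quick sanity check (a sketch; the exact daemon list depends on your setup), running jps on each machine should show the expected processes:
# on the namenode / Hive machine (Hadoop 1.x)
jps   # expect NameNode, SecondaryNameNode, JobTracker
# on each datanode
jps   # expect DataNode, TaskTracker
# on the HBase master machine
jps   # expect HMaster plus a ZooKeeper process (HQuorumPeer or QuorumPeerMain)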
1) Copy these files to the Hadoop library:
sudo cp /usr/lib/hive/lib/hive-common-0.7.0-cdh3u0.jar /usr/lib/hadoop/lib/
sudo cp /usr/lib/hive/lib/hbase-0.90.1-cdh3u0.jar /usr/lib/hadoop/lib/
2) Stop HBase and Hadoop using the following commands:
/usr/lib/hadoop/bin/stop-all.sh
/usr/lib/hbase/bin/stop-hbase.sh
3) Restart Hadoop and HBase using the following commands:
/usr/lib/hadoop/bin/start-all.sh
/usr/lib/hbase/bin/start-hbase.sh