Create an external Hive metastore on an EC2 machine behind a load balancer - hive-metastore

I would like to create an external Hive metastore service backed by MySQL. How can I set up an external Hive metastore service on an EC2 machine, exposing an API that my EMR, HDInsight, or Databricks clusters can connect to?
There are lots of articles on using a metastore service, but I cannot find any on setting up the Hive metastore service itself on an EC2 machine.
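A rough sketch of one way to do this (hostnames and credentials are placeholders; it assumes the Hive binaries and the MySQL JDBC driver are already on the EC2 machine): point hive-site.xml at MySQL, initialize the schema once, then run the metastore as a standalone Thrift service that the clusters reach through the load balancer on port 9083.

<!-- hive-site.xml on the EC2 machine: the metastore's connection to MySQL (placeholder values) -->
<property><name>javax.jdo.option.ConnectionURL</name><value>jdbc:mysql://mysql-host:3306/metastore_db</value></property>
<property><name>javax.jdo.option.ConnectionDriverName</name><value>com.mysql.cj.jdbc.Driver</value></property>
<property><name>javax.jdo.option.ConnectionUserName</name><value>hive</value></property>
<property><name>javax.jdo.option.ConnectionPassword</name><value>secret</value></property>

# one-time schema initialization in MySQL, then start the Thrift metastore service (port 9083 by default)
schematool -dbType mysql -initSchema
hive --service metastore

EMR, HDInsight, or Databricks clusters would then set hive.metastore.uris to thrift://<load-balancer-dns>:9083.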

Related

Can AWS Glue catalog point to a data location in Azure ADLS?

We are trying to configure the AWS Databricks Runtime to use the AWS Glue Data Catalog as its metastore. In this environment, Azure ADLS is one of the source systems. In that case, can the AWS Glue catalog point to a data location in Azure ADLS?
The AWS Glue catalog can speak JDBC, so if you can configure Azure ADLS to speak JDBC, which it seems like you can, you should be able to do this.
Glue catalog documentation

Configure Hive metastore for Presto and query data from S3 and Apache Kudu

I am pretty new to Presto and Hive. In one of our applications, we want to use Presto to query data from Apache Kudu and AWS S3. As far as I know, Presto has its own catalog (metadata) service, but we want to configure a Hive metastore (without Hadoop and Hive) so that in the future other applications (e.g. Spark) can use the Hive metastore to query data from Kudu and S3. I am using the latest versions of Presto and Kudu.
Could someone help me configure this system?
Thanks and regards
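One common way to wire this up (a sketch with placeholder hosts): Presto reaches the S3-backed tables through its Hive connector pointed at a standalone metastore, while Kudu is queried through Presto's separate Kudu connector, which does not need the metastore at all.

# etc/catalog/hive.properties -- Hive connector; the metastore host is a placeholder
connector.name=hive-hadoop2
hive.metastore.uri=thrift://metastore-host:9083
hive.s3.aws-access-key=<access-key>    # or rely on instance-profile credentials
hive.s3.aws-secret-key=<secret-key>

# etc/catalog/kudu.properties -- Kudu connector talks to the Kudu masters directly
connector.name=kudu
kudu.client.master-addresses=kudu-master-1:7051,kudu-master-2:7051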

What is the use of Hive server and metastore server?

I am new to Hive, and some questions are confusing me very much.
First, after installing Hive, I just run hive, and then I can create and select tables. Where is the Hive server, and what is the use of it?
Second, what is the use of the metastore server? I know we need the metastore to access the metadata about Hive tables. Does that mean that if I start a metastore server, I can query it from another app and get that information?
The metastore server talks to the backend, such as Derby/MySQL, to store and retrieve table metadata. If any Hive component wants to get or set metadata, it calls the metastore APIs, such as getTable(tableName), createDatabase(dbName), etc. Basically, the metastore abstracts the backend and provides a backend-independent (Derby/MySQL/Postgres) API layer. Similar to HiveServer, it can also run as a server. If there is no metastore server running, then the Driver will load the metastore in its own process. If the metastore is running as a separate server, then the Driver object communicates with the metastore over the network.
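To make the embedded-versus-remote distinction concrete (a small sketch; the host name is a placeholder): which mode the Driver uses is controlled by hive.metastore.uris in hive-site.xml.

<!-- hive-site.xml: if hive.metastore.uris is unset, the Driver embeds the
     metastore in its own process; if set, the Driver talks Thrift to it. -->
<property>
  <name>hive.metastore.uris</name>
  <value>thrift://metastore-host:9083</value>
</property>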

Can I set up a Hive metastore w/o Hadoop on AWS and use RDS as the DB?

I want to have a central Hive metastore to consume from Databricks, Spectrum, etc.
Is it possible to set it up without installing Hadoop?
Yes, a Hive metastore installation does not require Hadoop.
Querying data from the Hive metastore requires a Hive client (within Spark) and a Hadoop-compatible filesystem (such as S3).
The AWS Glue Data Catalog is the recommended system nowadays, not RDS.
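For the Databricks/Spark side, a minimal sketch of pointing a cluster at an external metastore backed by RDS (host, database, and credentials below are placeholders; on Databricks these typically go into the cluster's Spark config):

# Option A: talk to a running Thrift metastore service
spark.hadoop.hive.metastore.uris thrift://metastore-host:9083

# Option B: let the cluster talk to the RDS MySQL database directly
spark.hadoop.javax.jdo.option.ConnectionURL jdbc:mysql://my-rds-host:3306/metastore_db
spark.hadoop.javax.jdo.option.ConnectionDriverName com.mysql.cj.jdbc.Driver
spark.hadoop.javax.jdo.option.ConnectionUserName hive
spark.hadoop.javax.jdo.option.ConnectionPassword secret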

Cannot connect to Spark Thrift Server using JDBC, keeps using Hive

I am using Azure HDInsight and want to connect to the Thrift Server using JDBC in a similar way as described here: Thrift JDBC/ODBC Server.
However, it always connects to Hive and not to the Spark Thrift Server. While they both look similar and I can query data, I want to exploit the Spark execution engine, as I mainly use Spark 2 and sometimes need a JDBC connection. The Spark engine is also probably faster than Hive/Tez.
The connection string looks like this:
jdbc:hive2://hdinsight-name.azurehdinsight.net:443/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2
Drivers tried:
1. maven:/org.spark-project.hive:hive-jdbc:1.2.1.spark2
2. maven:/org.apache.hive:hive-jdbc
Update: It looks like the Spark Thrift Server is not exposed to the public: Ports used in HDInsight
I was able to connect to the Spark Thrift Server from a JDBC client with the following workaround.
The Spark Thrift Server runs on port 10002, which is not publicly accessible, as documented in the Azure HDInsight docs. So here is an alternative way to connect to Spark SQL from a local JDBC client.
Background:
I connected to the cluster head node via SSH.
ssh user@cluster-name-ssh.azurehdinsight.net
From here, I was able to connect to the Spark Thrift Server using the Beeline client.
beeline -u 'jdbc:hive2://localhost:10002/;transportMode=http'
With Beeline, I can run SQL queries using the Spark engine.
Solution:
So I set up SSH port forwarding on my local machine (forwarding local port 10002 to the cluster head node):
ssh -L 10002:localhost:10002 user@cluster-name-ssh.azurehdinsight.net
Now, I can use this port in a JDBC client to connect to Spark SQL.
jdbc:hive2://localhost:10002/;transportMode=http
With that, you can use Spark SQL from your local JDBC client.
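A quick way to verify the tunnel before configuring a full JDBC client is to run the same Beeline command locally against the forwarded port:
beeline -u 'jdbc:hive2://localhost:10002/;transportMode=http'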