How to configure locally installed hive to use Azure Datalake as warehouse? - hive

I have installed Hive on my local Windows system and configured it to use MySQL as the metastore. I now need to configure it to use Azure Data Lake as the warehouse.
How can I configure Hive to use Azure Data Lake as the warehouse?

This is possible, but you have to configure your local Hadoop installation accordingly.
Ensure that you have the latest ADLS libraries, then modify core-site.xml so that Hadoop can access the Azure Data Lake Store:
<configuration>
<property>
<name>dfs.adls.oauth2.access.token.provider.type</name>
<value>ClientCredential</value>
</property>
<property>
<name>dfs.adls.oauth2.refresh.url</name>
<value>YOUR TOKEN ENDPOINT</value>
</property>
<property>
<name>dfs.adls.oauth2.client.id</name>
<value>YOUR CLIENT ID</value>
</property>
<property>
<name>dfs.adls.oauth2.credential</name>
<value>YOUR CLIENT SECRET</value>
</property>
<property>
<name>fs.adl.impl</name>
<value>org.apache.hadoop.fs.adl.AdlFileSystem</value>
</property>
<property>
<name>fs.AbstractFileSystem.adl.impl</name>
<value>org.apache.hadoop.fs.adl.Adl</value>
</property>
</configuration>
A step by step guide can be found here.
Finally ensure that in the hive-site.xml your "hive.metastore.warehouse.dir" points to the ADL.
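As a minimal sketch of that last step, the hive-site.xml entry might look like the following. The account name YOUR_ACCOUNT and the /hive/warehouse path are placeholders rather than values from the question, and the adl:// URI form assumes the AdlFileSystem connector configured in core-site.xml above.

```xml
<configuration>
  <property>
    <name>hive.metastore.warehouse.dir</name>
    <!-- Placeholder account and path: replace with your own ADLS account and folder -->
    <value>adl://YOUR_ACCOUNT.azuredatalakestore.net/hive/warehouse</value>
  </property>
</configuration>
```

Tables created without an explicit LOCATION would then be stored under that folder.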

This is not a supported use case of Azure Data Lake. Azure Data Lake is a cloud-based data lake and currently supports HDInsight, Azure Data Lake Analytics, or Azure Databricks as its compute engines. It cannot connect to a locally run instance of Hive.

Related

Azure Synapse Analytics connection to MongoDB Atlas

I'm new to Azure Synapse Analytics. I'm trying to copy data from my MongoDB Atlas cluster to a data lake.
I'm trying to use a private endpoint to authorize the connection from my Azure Synapse workspace, but I get a timeout every time I test the connection from the MongoDB linked service. Any ideas on how to get my MongoDB Atlas databases to communicate with Azure Synapse Analytics without allowing all IP addresses? Thanks

How to create a linked service from Azure Analysis Services to an Azure Synapse SQL pool

How can I pull data from a cube hosted on Azure Analysis Services and load it into Synapse SQL pools?
One solution is to use Azure Data Factory for data movement.
There's no built-in connector for Azure Analysis Services in Data Factory. But since Azure Analysis Services uses Azure Blob Storage to persist its data, you can use the connector for Azure Blob Storage.
In Data Factory, use a Copy Activity with Blob Storage as source and Azure Synapse Analytics as sink.
More on Azure Data Factory here: https://learn.microsoft.com/en-us/azure/data-factory/
Available connectors in Data Factory: https://learn.microsoft.com/en-us/azure/data-factory/connector-overview

Create an external Hive metastore on an EC2 machine behind a load balancer

I would like to create an external Hive metastore service connected to MySQL. How can I set up such a service on an EC2 machine so that it exposes an API my EMR, HDInsight, or Databricks cluster can connect to?
There are many articles on using a metastore service, but I cannot find any on setting up a Hive metastore service on an EC2 machine.

Can I set up a Hive metastore without Hadoop on AWS and use RDS as the database?

I want to have a central Hive metastore that can be consumed from Databricks, Spectrum, etc.
Is it possible to set it up without installing Hadoop?
Yes, Hive metastore installation does not require Hadoop.
Querying data through the Hive metastore requires a Hive client (within Spark) and a Hadoop-compatible filesystem (such as S3).
AWS Glue Data Catalog is the recommended system nowadays, not RDS.
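If you do back the metastore with a MySQL database on RDS, the metastore's hive-site.xml would carry JDBC settings roughly like the ones below. This is only a sketch: the endpoint, database name, user name, and password are placeholders, and the driver class assumes MySQL Connector/J 8 (older connector versions use com.mysql.jdbc.Driver).

```xml
<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <!-- Placeholder RDS endpoint and database name -->
    <value>jdbc:mysql://YOUR_RDS_ENDPOINT:3306/hive_metastore</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionDriverName</name>
    <!-- Assumes MySQL Connector/J 8 is on the metastore classpath -->
    <value>com.mysql.cj.jdbc.Driver</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionUserName</name>
    <value>YOUR_DB_USER</value>
  </property>
  <property>
    <name>javax.jdo.option.ConnectionPassword</name>
    <value>YOUR_DB_PASSWORD</value>
  </property>
</configuration>
```

Clients such as Spark or Databricks would then reach the service through its Thrift endpoint, set in hive.metastore.uris, rather than connecting to the database directly.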

Hive 1.0 - REMOTE MySQL Metastore configuration

On EMR 4.2 (Hive 1.0), I want to connect to a remote MySQL metastore.
<property>
<name>hive.metastore.uris</name>
<value>thrift://hive-metastore-remotemysql.aws.com:9083</value>
<description>Thrift URI of the remote metastore service</description>
</property>
This remote metastore is on Hive 0.12, and I still want to connect to the same metastore from a new cluster. Because of the new hive-site.xml format, I cannot give a proper value to hive.metastore.uris, as port 9083 does not exist on the remote host. If I set it to local, Hive does not know about all the databases.
Has anyone faced this and solved it?
Thanks!