I have configured an ODBC connection in the Windows ODBC Data Source Administrator and tested it successfully.
I want to connect to a Hive database (the one behind the ODBC connection) using Spark. As I understand it, the connection should be configured in hive-site.xml, which is then placed in the /config folder. But how do I configure hive-site.xml to use an ODBC connection? Can anyone provide an example hive-site.xml?
I am using the Cloudera ODBC driver.
Environment: Windows, Spark 2.3.3
If I log in to the remote machine dlw2nia-bd01, start Beeline, and run this connection string:
!connect jdbc:hive2://dlw2nia-bd02.walgreens.com:2181,dlw2nia-bd03.walgreens.com:2181,dlw2nia-bd10.walgreens.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
I can connect to Hive. In fact, I see:
Connected to: Apache Hive (version 1.2.1000.2.6.4.0-91)
Driver: Hive JDBC (version 1.2.1000.2.6.4.0-91)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://dlw2nia-bd02.walgreens.com:21>
However, I would like to use a graphical client to browse the Hive tables more comfortably.
I installed the Cloudera JDBC driver and tried to configure SQuirreL to connect to Hive, but the connection fails. Another SQL client such as DBeaver or SQL Developer would also be fine.
I was thinking of copying the JDBC drivers that Beeline uses and loading them into an SQL client, but I don't know where to find those drivers on the remote machine.
Can you help me configure a client to connect to Hive?
I am trying to configure Spark for use with Logi Analytics, so that I can query data with Spark SQL and visualize it in Logi Analytics. Any suggestions on connecting Apache Spark to Logi Analytics would be helpful.
Spark SQL is not specifically supported by Logi Info, but you may be able to use an ODBC driver and a Connection element to connect to the Spark Thrift Server.
First, download a Spark ODBC driver (for example Simba Spark or CData) and configure a DSN for it.
Then, in the Logi Connections element, add an ODBC connection element and set its connection string attribute to the name of your DSN (e.g. 'DSN=NameOfYourDSN;').
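Before wiring the DSN into Logi, it may help to check that the DSN itself works. The following is only a minimal sketch: it assumes the third-party `pyodbc` package is installed and that the DSN points at a Spark Thrift Server that accepts `SHOW TABLES`. The helper names `build_dsn_connection_string` and `list_tables` are made up for illustration and are not part of Logi or of any ODBC driver.

```python
def build_dsn_connection_string(dsn_name, **extra):
    """Return an ODBC connection string such as 'DSN=NameOfYourDSN;'.

    Extra keyword arguments (for example UID or PWD) are appended as
    additional key=value pairs.
    """
    if not dsn_name or ";" in dsn_name or "=" in dsn_name:
        raise ValueError("DSN name must be non-empty and contain no ';' or '='")
    parts = [f"DSN={dsn_name}"]
    parts += [f"{key}={value}" for key, value in extra.items()]
    return ";".join(parts) + ";"


def list_tables(dsn_name):
    """Connect through the DSN and return the rows of SHOW TABLES."""
    # Imported lazily so the string helper works without pyodbc installed.
    import pyodbc  # third-party package, assumed to be installed

    # autocommit=True is an assumption: Hive/Spark ODBC drivers usually
    # do not support transactions.
    conn = pyodbc.connect(build_dsn_connection_string(dsn_name), autocommit=True)
    try:
        cursor = conn.cursor()
        cursor.execute("SHOW TABLES")
        return cursor.fetchall()
    finally:
        conn.close()
```

If `list_tables("NameOfYourDSN")` returns rows, the same DSN string should be usable in the Logi connection element.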
I am using Azure HDInsight and want to connect to the Spark Thrift Server over JDBC, in a similar way to what is described here: Thrift JDBC/ODBC Server.
However, the connection always ends up at Hive and not at the Spark Thrift Server. The two look similar and I can query data either way, but I want to use the Spark execution engine because I work mainly with Spark2 and sometimes need a JDBC connection. The Spark engine is also probably faster than Hive/Tez.
The connection string looks like this:
jdbc:hive2://hdinsight-name.azurehdinsight.net:443/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2
Drivers tried:
1. maven:/org.spark-project.hive:hive-jdbc:1.2.1.spark2
2. maven:/org.apache.hive:hive-jdbc
Update: It looks like the Spark Thrift Server is not exposed publicly; see Ports used in HDInsight.
I was able to connect to the Spark Thrift Server from a JDBC client with the following workaround.
The Spark Thrift Server runs on port 10002, which is not publicly accessible, as documented in the Azure HDInsight docs. So here is an alternative way to connect to Spark SQL from a local JDBC client.
Background:
I connected to the cluster head node via SSH.
ssh user@cluster-name-ssh.azurehdinsight.net
From there, I was able to connect to the Spark Thrift Server using the Beeline client.
beeline -u 'jdbc:hive2://localhost:10002/;transportMode=http'
With Beeline, I can run SQL queries using the Spark engine.
Solution:
So I set up SSH port forwarding on my local machine (forwarding local port 10002 to port 10002 on the cluster head node):
ssh -L 10002:localhost:10002 user@cluster-name-ssh.azurehdinsight.net
Now I can use this local port in a JDBC client to connect to Spark SQL.
jdbc:hive2://localhost:10002/;transportMode=http
With that, you can use Spark SQL from your local JDBC client.
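If the JDBC client fails to connect, a quick check is whether the forwarded port is actually listening locally. Here is a minimal Python sketch of such a check, using only the standard library. The function names `is_port_open` and `spark_thrift_jdbc_url` are illustrative, and the default port 10002 simply mirrors the tunnel described above.

```python
import socket


def is_port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and unreachable hosts.
        return False


def spark_thrift_jdbc_url(local_port=10002):
    """Build the JDBC URL for the locally forwarded Spark Thrift Server port."""
    return f"jdbc:hive2://localhost:{local_port}/;transportMode=http"


if __name__ == "__main__":
    if is_port_open("localhost", 10002):
        print("Tunnel is up; use", spark_thrift_jdbc_url())
    else:
        print("Nothing is listening on localhost:10002; is the ssh -L session still running?")
```

This only confirms that something accepts TCP connections on the local end of the tunnel. It does not verify that the Spark Thrift Server behind it is healthy.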
I am trying to set up a SAS ODBC connection to a SAS server installed on Linux.
I have the SAS ODBC driver installed on my local (Windows) machine, and now I need to connect to the SAS session on the SAS server.
My approach:
Create a port forward using PuTTY on Windows and then configure ODBC to use it.
Is my approach correct?
When I test the connection from Tableau Desktop to an Apache Hive server, it reports an error saying that the drivers have not been installed. Tableau provides drivers only for Cloudera, Hortonworks and MapR, not for Apache Hive.
How can I connect from Tableau Desktop to Apache Hive?
I got it working using the MapR driver.
Reference http://doc.mapr.com/display/MapR/Hive+ODBC+Connector#HiveODBCConnector-InstallingtheHiveODBCConnectoronWindows