Differences between beeline embedded and remote modes? - hive

What are the real differences between Beeline's embedded and remote modes? Do they just refer to connecting to a remote Hive server versus a local Hive server? When we connect through Beeline in embedded mode, does the client run in the same JVM as Hive?

Related

How connect to Hive with Squirrel and beeline command

If I log in to the remote machine dlw2nia-bd01, start beeline, and run this connection string:
!connect jdbc:hive2://dlw2nia-bd02.walgreens.com:2181,dlw2nia-bd03.walgreens.com:2181,dlw2nia-bd10.walgreens.com:2181/;serviceDiscoveryMode=zooKeeper;zooKeeperNamespace=hiveserver2
I can connect to Hive. In fact, I see:
Connected to: Apache Hive (version 1.2.1000.2.6.4.0-91)
Driver: Hive JDBC (version 1.2.1000.2.6.4.0-91)
Transaction isolation: TRANSACTION_REPEATABLE_READ
0: jdbc:hive2://dlw2nia-bd02.walgreens.com:21>
However, I would like to use a graphical client to browse the Hive tables more easily.
I installed the Cloudera JDBC driver and tried to configure SQuirreL to connect to Hive, but I'm having trouble connecting. Another SQL client such as DBeaver or SQL Developer would also be fine.
I was thinking of downloading the JDBC drivers that Beeline uses and using them with an SQL client, but I don't know where to find the drivers on the remote machine.
Can you help me configure a client to connect to Hive?

How to configure hive-site.xml for ODBC connection

I have configured an ODBC connection in the Windows ODBC Data Source Administrator and tested it successfully.
I want to connect to the Hive database (the ODBC connection) using Spark. As I understand it, the connection should be configured in hive-site.xml and placed in the /config folder. But how do I configure hive-site.xml to use the ODBC connection? Can anyone provide an example hive-site.xml?
I am using the Cloudera ODBC driver.
Environment: Windows, Spark 2.3.3

Can not connect to Spark Thrift Server using JDBC, keeps using Hive

I am using Azure HDInsight and want to connect to the Spark Thrift Server using JDBC, in a similar way to what is described here: Thrift JDBC/ODBC Server.
However, it always connects to Hive and not to the Spark Thrift Server. Both look similar and I can query data either way, but I want to use the Spark execution engine because I work mainly with Spark2 and sometimes need a JDBC connection. The Spark engine is also probably faster than Hive on Tez.
The connection string looks like this:
jdbc:hive2://hdinsight-name.azurehdinsight.net:443/default;ssl=true?hive.server2.transport.mode=http;hive.server2.thrift.http.path=/hive2
Drivers tried:
1. maven:/org.spark-project.hive:hive-jdbc:1.2.1.spark2
2. maven:/org.apache.hive:hive-jdbc
Update: It looks like the Spark Thrift Server is not exposed publicly: Ports used in HDInsight
I was able to connect to the Spark Thrift Server from a JDBC client with the following workaround.
The Spark Thrift Server runs on port 10002, which is not publicly accessible, as documented here in the Azure HDInsight docs. So here is an alternative way to connect to Spark SQL from a local JDBC client.
Background:
I connected to cluster head node via SSH.
ssh user@cluster-name-ssh.azurehdinsight.net
From here, I was able to connect to Spark Thrift Server using Beeline client.
beeline -u 'jdbc:hive2://localhost:10002/;transportMode=http'
With Beeline, I can run SQL queries using Spark engine.
Solution:
So I set up SSH port forwarding on my local machine (forwarding local port 10002 to port 10002 on the cluster head node):
ssh -L 10002:localhost:10002 user@cluster-name-ssh.azurehdinsight.net
Now I can use this local port in the JDBC client to connect to Spark SQL:
jdbc:hive2://localhost:10002/;transportMode=http
With that, you can use Spark SQL from your local JDBC client.
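As a rough illustration of the workaround above, the following Python sketch builds the SSH port-forwarding command and the matching JDBC URL from a single port setting, so the two cannot drift apart. The function names are hypothetical helpers, not part of any HDInsight or Hive tooling; the SSH target is assumed to be in the usual user@host form, and the `-N` flag (forward ports only, run no remote command) is an addition that the original command does not use.

```python
import shlex


def ssh_tunnel_command(ssh_target, local_port=10002, remote_port=10002):
    """Build the argument list for SSH local port forwarding.

    ssh_target is assumed to look like "user@cluster-name-ssh.azurehdinsight.net".
    """
    forward = f"{local_port}:localhost:{remote_port}"
    # -N: only forward ports, do not start a remote shell (assumption, optional)
    return ["ssh", "-N", "-L", forward, ssh_target]


def spark_thrift_jdbc_url(local_port=10002):
    """Build the JDBC URL that points at the local end of the tunnel."""
    return f"jdbc:hive2://localhost:{local_port}/;transportMode=http"


# Print the command to run in a terminal and the URL to paste into the JDBC client.
print(shlex.join(ssh_tunnel_command("user@cluster-name-ssh.azurehdinsight.net")))
print(spark_thrift_jdbc_url())
```

If local port 10002 is already taken, passing a different `local_port` to both helpers keeps the tunnel and the URL consistent.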

HSQLDB Database multiple connections

Is it possible to connect to an HSQLDB database over multiple connections?
I have 2 connections using the same JDBC URL and the same hsqllib.jar, and it appears that each one gets a "new" database.
I imagine each connection is initializing the database in its own memory?
You will need to run HSQLDB in server mode, as a separate process, and then connect both instances of your application to that server. The documentation describes how to start it in server mode. For example, the following would start an in-memory database named database1:
java -cp ../lib/hsqldb.jar org.hsqldb.Server -database.0 mem:database1 -dbname.0 database1
You can then connect to that instance from your application using the following URL (assuming that everything is running on the same machine):
jdbc:hsqldb:hsql://localhost/database1

Access Hive Tables in SQLClient but not from the Putty

I am new to Hive, MapReduce and Hadoop.
I am using PuTTY to connect to Hive and access records in the tables. I open PuTTY, type vip.name.com as the host name, and click Open. Then I enter my username and password, followed by a few commands to get to the Hive prompt. Here is what I did:
$ bash
bash-3.00$ hive
Hive history file=/tmp/rkost/hive_job_log_rkost_201207010451_1212680168.txt
hive> set mapred.job.queue.name=mdhi-technology;
hive> select * from table LIMIT 1;
So my question is:
Is there another way to do the same thing from a SQL client such as SQL Developer or SQuirreL SQL Client, instead of from the command prompt? If so, what is the step-by-step process, given that in my example I log in to vip.name.com from PuTTY?
Likewise, if I need to do this from a JDBC program on my Windows machine, how do I access the Hive tables and get the results back? I know how to do this with Oracle tables; my only confusion is that I use the hostname vip.name.com to log in through PuTTY. I hope the question is clear. Any suggestions will be appreciated.
In short: can I do the same thing from a SQL client instead of logging in through PuTTY?
Update:
I tried what Mark suggested, but I always get: Hive: Could not establish connection to vip.host.com:10000/default: java.net.ConnectException: Connection timed out: connect
What you are doing with PuTTY is SSHing into a machine that has Hive installed and set up, and then issuing Hive queries from the Hive command line. That is one way of issuing Hive queries. There are other ways that don't require SSH; the one you probably need is a connection via JDBC.
Here is an article which describes how to connect to a Hive installation on Amazon's EMR cluster using SQuirreL via JDBC. The article might appear to be Amazon-specific, but it's not. As long as you have a Hive server running on one of the nodes of the cluster, and no firewall blocking the connection between the client machine and the machine running Hive, you should be able to connect.
A couple of things to keep in mind about the above link:
You can ignore step 3, where it asks you to create an SSH tunnel, unless you are using EMR.
The port that you enter in your connection URI might be different in your case. Replace localhost with the fully qualified domain name of the machine that Hive is running on. To find out which port the Hive server is listening on, you can look in your Hive server nanny log file in the log directory (whose location depends on your installation) or run a simple netstat -a command on that machine. I believe 10000 is the default port number, so it might make sense to try 10000 directly.
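Since the update in the question reports a connection timeout on port 10000, it can help to check from the client machine whether that port is reachable at all before configuring any SQL client. The snippet below is a minimal sketch using only the Python standard library; `port_is_open` is a hypothetical helper name. A successful TCP connection only shows that something is listening on that port, not that it is a Hive server.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        # create_connection resolves the host name and performs the TCP handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures
        return False
```

For example, `port_is_open("vip.name.com", 10000)` returning False from the Windows machine would suggest that a firewall sits in between or that the Hive server is not listening on that port, which would match the timeout described in the update.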