How to run MSCK on Hive Standalone Metastore server via thrift client - hive

I'm using Hive as my metastore database and the Hive Standalone Metastore to handle DDL, through this Thrift client, which implements the server's Thrift mapping.
I want to run MSCK (or something similar) to bulk-add partitions to new Hive tables.
But as far as I know, this Thrift mapping file doesn't expose an msck method.
I do see something about Msck implemented inside the standalone server (I think it was implemented in JIRA HIVE-17824), but there is nothing for it in the HiveMetastore class (which, as I understand it, is the mapping of the Thrift server methods).
Does anyone know whether I can run MSCK through the standalone Hive server via the Thrift client?

With Python, I am currently using this client successfully: PyHive.
You can also do it from DBeaver (if the command has to be run by a person): DBeaver.
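As a minimal sketch of the PyHive route: the statement goes to HiveServer2, which runs the MSCK logic itself, and not to the standalone metastore. The helper names (build_msck_statement, repair_table, repair_with_pyhive), the identifier check, and the default port 10000 are assumptions made for illustration; only hive.Connection, cursor() and execute() come from PyHive.

```python
import re

# Conservative pattern for a plain table name; an assumption for this sketch,
# chosen so user input is never interpolated into the statement unchecked.
_IDENTIFIER = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")


def build_msck_statement(table):
    """Return the HiveQL statement that asks Hive to repair a table's partitions."""
    if not _IDENTIFIER.match(table):
        raise ValueError("unexpected table name: %r" % (table,))
    return "MSCK REPAIR TABLE " + table


def repair_table(cursor, table):
    """Run MSCK REPAIR TABLE on any DB-API style cursor and return the statement."""
    statement = build_msck_statement(table)
    cursor.execute(statement)
    return statement


def repair_with_pyhive(host, table, port=10000, username=None, database="default"):
    """Open a PyHive connection to HiveServer2 and repair one table.

    Host, port, username and database are placeholders for your own setup.
    """
    from pyhive import hive  # third-party; imported lazily so the helpers above work without it

    connection = hive.Connection(host=host, port=port, username=username, database=database)
    try:
        repair_table(connection.cursor(), table)
    finally:
        connection.close()
```

Calling repair_with_pyhive("hive-host.example.com", "events") would then issue the repair for a hypothetical table named events in the default database.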
EDIT (I did not realize that the question was about sending the command directly to the Hive metastore):
The IMetaStoreClient interface (the protocol between the Hive client and the Hive metastore server) does not implement the MSCK command because it does not need to. Let me explain the logic behind the MSCK command:
Check whether the table exists in the Hive metastore.
Scan for new partitions in the physical file system where the table stores its data. See the code in checkMetastore.
Create/add those new partitions. See the code in createPartitionsInBatches, which ends up calling the add_partitions method of the Hive metastore client.
See add_partitions. It is at this point, and not before, that the client application sends data to the Hive metastore server.
Drop partitions that are no longer in the file system. See the code in dropPartitionsInBatches, which ends up calling the dropPartitions method of the Hive metastore client.
See dropPartitions. Again, it is at this point, and not before, that the client application sends data to the Hive metastore server.
MSCK is not really a Hive metastore command. It requires logic implemented by the client that runs the MSCK command. In your case, you should add that logic to the client you want to use.
Spark, for example, already implements that logic when running MSCK.
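The client-side part of that logic can be sketched roughly as follows. This is not Hive's implementation: it assumes a local directory stands in for the table location (a real client would list HDFS or an object store), it ignores the escaping Hive applies to partition values, and the function names are made up for illustration. The resulting batches are what a client would then hand to the metastore's add_partitions and drop calls.

```python
import os


def scan_partitions(table_root, partition_keys):
    """Collect partition value tuples from nested key=value directories."""
    found = set()

    def walk(path, depth, values):
        if depth == len(partition_keys):
            found.add(tuple(values))
            return
        for entry in sorted(os.listdir(path)):
            full = os.path.join(path, entry)
            key, sep, value = entry.partition("=")
            # Only descend into directories named <expected key>=<value>.
            if os.path.isdir(full) and sep and key == partition_keys[depth]:
                walk(full, depth + 1, values + [value])

    walk(table_root, 0, [])
    return found


def diff_partitions(on_disk, in_metastore):
    """Return (to_add, to_drop): partitions missing from the metastore, and stale ones."""
    to_add = sorted(on_disk - in_metastore)
    to_drop = sorted(in_metastore - on_disk)
    return to_add, to_drop


def batches(items, size):
    """Yield fixed-size chunks so each metastore call stays reasonably small."""
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

In use, in_metastore would come from listing the table's existing partitions through the Thrift client, and each batch from to_add would be converted into partition objects before being sent.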

Related

Connecting to Hive to execute queries with Kerberos

I am trying to connect to Hive databases with a client. I tried DBeaver and downloaded the Hive driver, but then I noticed that there is a Kerberos instance in the middle, and it seems that the DBeaver driver doesn't support Kerberos.
Is there a Windows client for querying Hive databases that is easy to plug in, given the Kerberos instance?
Thanks in advance.

what is the use of hive server and metastore server?

I am new to Hive, and a couple of questions are confusing me.
First, after installing Hive, I just run hive and can then create and select from tables. Where is the Hive server, and what is it for?
Second, what is the metastore server for? I know we need the metastore to access the metadata about Hive tables. Does that mean that if I start a metastore server, I can query it from another app and get that information?
The metastore server talks to a backend such as Derby or MySQL to store and retrieve table metadata. Any Hive component that wants to get or set metadata calls the metastore APIs, such as getTable(tableName) and createDatabase(dbName). In other words, the metastore provides an API layer that is independent of the backend (Derby/MySQL/Postgres). Like HiveServer, it can also run as a server. If no metastore server is running, the Driver loads the metastore in its own process. If the metastore is running as a separate server, the Driver object communicates with it over the network.

HIVE - How it works without a meta store?

I installed Hive 1.2.1 and configured it to work with Hadoop 2.7.
But I didn't set up a metastore for Hive with Derby or MySQL.
I also don't have a copy of hive-site.xml under $HIVE_HOME/conf.
My question is: how am I still able to create databases and tables in Hive? Where is all this metadata stored?
Appreciate your insight.
Thanks in advance.
By default, Hive uses Derby and starts the metastore (based on Derby) in embedded mode. The metastore and HiveServer run in the same process. I believe Hive initializes the metastore for you in embedded mode.
http://www.cloudera.com/documentation/archive/cdh/4-x/4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html

How to load SQL data into the Hortonworks?

I have installed the Hortonworks Sandbox on my PC. I tried it with a CSV file, and the data loads into a table in a structured manner, which is fine (Hive + Hadoop). Now I want to migrate my current SQL database (MS SQL 2008 R2) into the Sandbox. How do I do this? I also want to connect to it from my project (VS 2010, C#).
Is it possible to connect through ODBC?
I heard that Sqoop is used for transferring data from SQL to Hadoop, so how can I do this migration with Sqoop?
You could write your own job to migrate the data, but Sqoop would be more convenient. To use it, you have to download Sqoop and the appropriate connector, which in your case is the Microsoft SQL Server Connector for Apache Hadoop. You can download it from here. Please go through the Sqoop user guide; it covers all of this in detail.
And Hive does support ODBC. You can find more on this at this page.
I wrote down the steps you need to go through in the Hortonworks Sandbox to install the JDBC driver and get it to work: http://hortonworks.com/community/forums/topic/import-microsoft-sql-data-into-sandbox/
To connect to Hadoop from your C# project, you can use the Hortonworks Hive ODBC driver from http://hortonworks.com/thankyou-hdp13/#addon-table. Read the PDF (which is also on that page) to see how it works (I used Hive Server Type 2 with the user name sandbox).

Access Hive tables from a SQL client instead of PuTTY

I am new to Hive, MapReduce and Hadoop.
I am using PuTTY to connect to Hive and access records in its tables. I opened PuTTY, typed vip.name.com as the host name, and clicked Open. Then I entered my username and password, followed by a few commands to get to the Hive SQL prompt. Here is what I did:
$ bash
bash-3.00$ hive
Hive history file=/tmp/rkost/hive_job_log_rkost_201207010451_1212680168.txt
hive> set mapred.job.queue.name=mdhi-technology;
hive> select * from table LIMIT 1;
So my question is:
Is there a way to do the same thing from a SQL client such as SQL Developer or SQuirreL SQL Client instead of from the command prompt? If so, what is the step-by-step process, given that in my example I log in to vip.name.com from PuTTY?
Similarly, if I need to do this from a JDBC program on my Windows machine, how can I access Hive tables and get the results back? I know how to do this with Oracle tables. My only confusion is that I use the hostname vip.name.com to log in through PuTTY. I hope the question is clear. Any suggestions will be appreciated.
In short, my question is: can I do the same thing from a SQL client instead of logging in through PuTTY?
Update:
I tried doing it the way Mark suggested, but I always get: Hive: Could not establish connection to vip.host.com:10000/default: java.net.ConnectionException: Connection timed out: connect
What you are doing with PuTTY is SSH'ing into a machine that has Hive installed and set up, and then issuing Hive queries from the Hive command line. That is one way of issuing Hive queries. There are other ways that don't require SSH; the one you probably need is a connection via JDBC.
Here is an article that describes how to connect to a Hive installation on an Amazon EMR cluster using SQuirreL via JDBC. The article might appear to be Amazon-specific, but it's not. As long as you have a Hive server running on one of the nodes of the cluster, and no firewall blocking the connection between the client machine and the one running Hive, you should be able to connect.
A couple of things to keep in mind about the above link:
You can ignore step 3, where it asks you to create an SSH tunnel, unless you are using EMR.
The port that you enter in your connection URI might be different in your case. Replace localhost with the fully qualified domain name of the machine that Hive is running on. To find out which port the Hive server is listening on, you can look in your Hive server nanny log file in the log directory (whose location depends on your installation) or run a simple netstat -a command. I believe 10000 is the default port number, so it might make sense to try 10000 directly.
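Before configuring a SQL client, it can help to confirm that the Hive server port is reachable from the client machine at all, since a firewall or a wrong port typically shows up as a connection timeout. A minimal sketch using only the Python standard library; the function name can_connect is made up for illustration, and the host and port are whatever applies to your cluster:

```python
import socket


def can_connect(host, port, timeout=5.0):
    """Return True if a TCP connection to host:port can be opened within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False
```

For example, can_connect("vip.name.com", 10000) returning False from the Windows machine would suggest a network or port problem rather than a driver problem.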