I am a bit new to MapR HBase, but I have worked with HBase on HDP/Cloudera. We have an HBase cluster on HDP and we are planning to migrate its HBase data to a MapR HBase cluster.
What would be an appropriate approach to take here? (Downtime is not a problem for us at the moment.)
Should we use the Export/Import utilities, the CopyTable command, etc.?
You would have to create the destination table by hand and then use the CopyTable command. For details, see
http://doc.mapr.com/display/MapR/Migrating+Between+Apache+HBase+Tables+and+MapR+Tables
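In outline, it looks roughly like the sketch below (not verified against the linked doc; the table name and MapR path are placeholders, and you create the destination MapR table first):

    # Rough sketch only; adjust the table name and destination path.
    # Create the destination MapR table first, e.g.:
    #   maprcli table create -path /user/mapr/mytable
    # Then copy the source HBase table into it:
    hbase org.apache.hadoop.hbase.mapreduce.CopyTable \
        --new.name=/user/mapr/mytable \
        mytable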
I'm using Hive as my metastore database, together with the Hive Standalone Metastore for handling the DDLs, via this Thrift client, which implements the server's Thrift mapping.
I want to perform an MSCK (or something equivalent) to bulk-add partitions to new Hive tables.
But as far as I know, this Thrift mapping file doesn't expose an msck method.
That said, I can see that there is an Msck implementation inside the standalone server (I think it was added in JIRA HIVE-17824), but there is nothing in the HiveMetastore class (which, as I understand it, is the mapping of the Thrift server's methods).
Does anyone know whether I can run MSCK through the standalone Hive server via a Thrift client?
With Python I am currently using this client with success: PyHive.
And you can also do it from DBeaver (if the command must be run by a human): dbeaver.
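For reference, a minimal PyHive sketch (host, port, username and table name are placeholders; this goes through HiveServer2, which runs the MSCK logic and talks to the metastore for you):

    from pyhive import hive

    # Placeholders: adjust host, port, username and the table name.
    conn = hive.connect(host='hiveserver2-host', port=10000, username='me')
    cursor = conn.cursor()
    cursor.execute('MSCK REPAIR TABLE mydb.mytable')  # HiveServer2 does the work
    cursor.close()
    conn.close()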
EDIT (I did not realize the question was about sending the command directly to the Hive metastore):
The interface called IMetaStoreClient (the protocol between the Hive client and the Hive metastore server) does not implement the MSCK command because it does not need to. Let me explain the logic behind the MSCK command:
1. Check whether the table exists in the Hive metastore.
2. Scan for new partitions in the physical file system where the table stores its data. See the code in checkMetastore.
3. Create/add those new partitions. See the code in createPartitionsInBatches; it ends up using the Hive metastore client method called add_partitions.
See add_partitions. It is at this point, and not before, that the client application sends data to the Hive metastore server.
4. Drop the partitions that are no longer in the file system. See the code in dropPartitionsInBatches, which ends up using the Hive metastore client method called dropPartitions.
See dropPartitions. Again, it is at this point, and not before, that the client application sends data to the Hive metastore server.
MSCK is not really a Hive metastore command; it requires logic implemented by the client running the MSCK command. In your case, you would have to add that logic to whichever client you want to use, along the lines of the sketch below.
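As a rough illustration of that client-side logic against the metastore Thrift API (this sketch assumes the hmsclient Python package, but any generated ThriftHiveMetastore client exposes the same methods; the host, database, table and the listing helper are placeholders):

    from hmsclient import hmsclient

    def list_partition_dirs(location):
        # Placeholder helper: list the partition directories under `location`
        # with whatever filesystem client you use (s3, hdfs, local, ...).
        return []

    with hmsclient.HMSClient(host='metastore-host', port=9083) as c:
        # 1. The table must already exist in the metastore.
        table = c.get_table('mydb', 'mytable')

        # 2. Compare the partitions on disk with the partitions registered
        #    in the metastore.
        registered = set(c.get_partition_names('mydb', 'mytable', -1))
        on_disk = set(list_partition_dirs(table.sd.location))

        # 3. New partitions would be registered via c.add_partitions([...]),
        #    building Partition objects from the table's StorageDescriptor.
        to_add = on_disk - registered

        # 4. Stale partitions would be removed via c.drop_partition(...).
        to_drop = registered - on_disk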
Spark, for example, already implements that logic when you run MSCK through it.
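For instance, a Spark session pointed at the same metastore can run it directly (metastore URI and table name are placeholders):

    from pyspark.sql import SparkSession

    # Placeholders: metastore URI and table name.
    spark = (SparkSession.builder
             .appName('msck-example')
             .config('hive.metastore.uris', 'thrift://metastore-host:9083')
             .enableHiveSupport()
             .getOrCreate())

    # Spark implements the MSCK logic itself and calls the metastore APIs.
    spark.sql('MSCK REPAIR TABLE mydb.mytable')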
I am pretty new to Presto and Hive. In one of our applications we want to use Presto to query data from Apache Kudu and AWS S3. As far as I know, Presto has its own catalog (meta) service, but we want to configure a Hive metastore (without Hadoop and Hive) so that in the future other applications (e.g. Spark) can use the Hive metastore to query data from Kudu and S3. I am using the latest versions of Presto and Kudu.
Could someone help me configure this system?
Thanks and regards
I am new to Hive, and a few things are confusing me very much.
First, after installing Hive, I just run hive and can then create and select from tables. Where is the Hive server, and what is it used for?
Second, what is the use of the metastore server? I know we need the metastore to access the metadata about Hive tables; does that mean that if I start a metastore server I can query it from another app and get that information?
The metastore server talks to a backend such as Derby/MySQL to store and retrieve table metadata. If any Hive component wants to get or set metadata, it calls the metastore APIs, such as getTable(tableName), createDatabase(dbName), etc. Basically, the metastore abstracts the backend (Derby/MySQL/Postgres) and provides a backend-independent API layer. Similar to HiveServer, it can also run as a server. If there is no metastore server running, then the Driver will load the metastore in its own process. If the metastore is running as a separate server, then the Driver object communicates with it over the network.
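So yes: when the metastore runs as a separate server (default port 9083), any application can call those APIs over Thrift. A minimal Python sketch, assuming the hmsclient package (host and table name are placeholders):

    from hmsclient import hmsclient

    # Placeholders: metastore host and table name.
    with hmsclient.HMSClient(host='metastore-host', port=9083) as c:
        print(c.get_all_databases())
        print(c.get_table('default', 'mytable').sd.location)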
I want to have a central Hive metastore to consume from Databricks, Spectrum, etc.
Is it possible to set this up without installing Hadoop?
Yes, a Hive metastore installation does not require Hadoop.
Querying data from the Hive metastore requires a Hive client (within Spark) and a Hadoop-compatible filesystem (such as S3).
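As a sketch, a Spark application pointed at a standalone metastore with S3-backed tables could look like this (metastore URI, credentials and table name are placeholders; the s3a settings assume the hadoop-aws jars are on the classpath):

    from pyspark.sql import SparkSession

    # Placeholders: metastore URI, S3 credentials, table name.
    spark = (SparkSession.builder
             .config('hive.metastore.uris', 'thrift://metastore-host:9083')
             .config('spark.hadoop.fs.s3a.access.key', '<access-key>')
             .config('spark.hadoop.fs.s3a.secret.key', '<secret-key>')
             .enableHiveSupport()
             .getOrCreate())

    # The metastore supplies the schema and the s3a:// location;
    # Spark's Hive client and the S3A filesystem do the actual reading.
    spark.sql('SELECT * FROM mydb.mytable LIMIT 10').show()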
The AWS Glue Data Catalog is the recommended option nowadays, not RDS.
I installed Hive 1.2.1 and configured it to work with Hadoop 2.7.
But I didn't set up a metastore for Hive with Derby or MySQL.
And I also don't have a copy of hive-site.xml under $HIVE_HOME/conf.
My question is: how am I still able to create databases and tables in Hive? Where is all this metadata stored?
Appreciate your insight.
Thanks in advance.
By default Hive uses Derby and starts the metastore (backed by Derby) in embedded mode. The metastore and HiveServer run in the same process. I believe Hive initializes the metastore for you in embedded mode.
http://www.cloudera.com/documentation/archive/cdh/4-x/4-2-0/CDH4-Installation-Guide/cdh4ig_topic_18_4.html
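Concretely, with no hive-site.xml Hive falls back to its built-in default connection URL, so a metastore_db directory (plus a derby.log) is created in whatever directory you launched hive from; that is where your metadata ends up:

    <property>
      <name>javax.jdo.option.ConnectionURL</name>
      <value>jdbc:derby:;databaseName=metastore_db;create=true</value>
    </property>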