Is it possible to create sub schema in Hive to have:
db_name.sub_schema.table_name
It is not possible in Hive. In Hive, database and schema are essentially the same thing. From the Hive documentation:
The uses of SCHEMA and DATABASE are interchangeable – they mean the
same thing. CREATE DATABASE was added in Hive 0.6 (HIVE-675). The
WITH DBPROPERTIES clause was added in Hive 0.7 (HIVE-1836).
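As a quick sketch of what this means in practice: the two keywords are interchangeable, and since there is only one level of namespacing, a common workaround (an approach of my own suggestion, not something from the Hive docs) is to encode the "sub-schema" in the database or table name:

```sql
-- SCHEMA and DATABASE are synonyms in Hive:
CREATE DATABASE db_name;
CREATE SCHEMA other_db;   -- exactly equivalent to CREATE DATABASE other_db

-- There is no db_name.sub_schema.table_name level, so one workaround
-- is a naming convention that flattens the sub-schema into the name:
CREATE DATABASE db_name_sub_schema;
-- or
CREATE TABLE db_name.sub_schema_table_name (id INT);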
I want to insert JSON into a Hive database.
I am trying to transform JSON to SQL using the ConvertJsonToSQL NiFi processor. How can I add a PARTITION (....) clause to my query?
Can I do this, or should I use the ReplaceText processor to build the query?
What version of Hive are you using? There are Hive 1.2 and Hive 3 versions of PutHiveStreaming and PutHive3Streaming (respectively) that let you put the data directly into Hive without having to issue HiveQL statements. For external Hive tables in ORC format, there are also ConvertAvroToORC (for Hive 1.2) and PutORC (for Hive 3) processors.
Assuming those don't work for your use case, you may also consider ConvertRecord with a FreeFormTextRecordSetWriter that generates the HiveQL with the PARTITION statement and such. It gives a lot more flexibility than trying to patch a SQL statement to turn it into HiveQL for a partitioned table.
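As a rough sketch of the ConvertRecord approach: the FreeFormTextRecordSetWriter's Text property can reference record fields with NiFi Expression Language, so the writer emits one HiveQL statement per record. The table and field names below (mytable, country, year, name, age) are assumptions for illustration only:

```sql
-- Hypothetical template emitted per record by FreeFormTextRecordSetWriter;
-- ${country}, ${year}, ${name}, ${age} are record field references.
INSERT INTO mytable PARTITION (country='${country}', year=${year})
VALUES ('${name}', ${age});
```

The resulting flowfile of statements can then be sent to a PutHiveQL processor, which gives you full control over the PARTITION clause instead of patching ConvertJsonToSQL's output.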
EDIT: I forgot to mention that the Hive 3 NAR/components are not included with the NiFi release due to space reasons. You can find the Hive 3 NAR for NiFi 1.11.4 here.
My hive version is 1.2.0
I am doing hive hbase integration where my hbase table already present.
While creating the Hive table, I was checking whether I can use a few of Hive's built-in date functions for virtual/derived columns, something like this -
create external table `Hive_Test`(
*existing hbase columns*,
*new_column* AS to_date(from_unixtime(unix_timestamp(*existing_column*,'yyyy/MM/dd HH:mm:ss')...
)CLUSTERED BY (..) SORTED BY (new_column) INTO n BUCKETS
..
WITH SERDEPROPERTIES(
'hbase.columns.mapping'=':key,cf:*,:timestamp',
..
)
If there is any other way where I can use built-in functions capability in create table, then please let me know.
Thanks.
With reference to Hive Computed Column, I think you are trying to define derivation logic at table-creation time, which is not possible in Hive.
You can refer to this article on Apache Hive derived column support and alternatives.
A better approach is to create a view on top of the non-native table created for the Hive-HBase integration; in the view you can apply almost any mapping or built-in function your business logic needs.
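A minimal sketch of the view approach, assuming hypothetical table and column names (hbase_events, event_ts, etc.) in place of your real ones:

```sql
-- HBase-backed table stores only the raw columns;
-- 'hbase.columns.mapping' is the real property name for the mapping.
CREATE EXTERNAL TABLE hbase_events (
  rowkey   STRING,
  event_ts STRING
)
STORED BY 'org.apache.hadoop.hive.hbase.HBaseStorageHandler'
WITH SERDEPROPERTIES ('hbase.columns.mapping' = ':key,cf:event_ts')
TBLPROPERTIES ('hbase.table.name' = 'events');

-- The derived column lives in a view, where built-in functions are allowed:
CREATE VIEW hbase_events_v AS
SELECT rowkey,
       event_ts,
       to_date(from_unixtime(unix_timestamp(event_ts, 'yyyy/MM/dd HH:mm:ss'))) AS event_date
FROM hbase_events;
```

Queries then read from hbase_events_v and get the derived event_date column computed at read time.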
For Pig, the default schema is ByteArray. Is there a default schema for Hive if we don't mention a schema in Hive? I tried to look at some Hive documentation but couldn't find any.
Hive is schema-on-read --- I am not sure this is the answer; if someone could give more insight on this, that would be great.
Hive does the best that it can to
read the data. You will get lots of null values if there aren’t enough fields in each record
to match the schema. If some fields are numbers and Hive encounters nonnumeric
strings, it will return nulls for those fields. Above all else, Hive tries to recover from all
errors as best it can.
There is no default schema in Hive. In order to query data in Hive, you first have to create a table describing the layout of your data (for example, using create external table ... location).
So you basically have to tell Hive the schema before querying the data.
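A minimal sketch of what that looks like, assuming a hypothetical comma-delimited file under /data/sales:

```sql
-- Every column type must be declared explicitly; Hive applies this
-- schema only when the data is read (schema-on-read), so malformed
-- fields become NULLs at query time rather than load-time errors.
CREATE EXTERNAL TABLE sales (
  id    INT,
  item  STRING,
  price DOUBLE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
LOCATION '/data/sales';
```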
I have just started reading about Hive and I have a doubt. When I create a database called 'xyz' in Hive, it creates a folder 'xyz.db'. However, Hive uses metastore_db to store the table schema. So what is the use of this 'xyz.db' folder?
Regards
Sivagururaja.
It is the default directory on HDFS where the data files for that database's tables are stored.
metastore_db is a separate database (MySQL, PostgreSQL, Derby, etc.) which stores the table schemas used to read the files in xyz.db.
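You can see the directory for yourself with DESCRIBE DATABASE, which is a real Hive command (the 'xyz' name just mirrors the question):

```sql
CREATE DATABASE xyz;
DESCRIBE DATABASE xyz;
-- The output includes an hdfs:// location ending in .../xyz.db;
-- data files for any table created in xyz land under that directory.
```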
I have changed the Hive metastore from Derby to MySQL as described in this guide:
https://ccp.cloudera.com/display/CDHDOC/Hive+Installation
Please tell me how I can verify that it has actually been changed to MySQL.
You can query the metastore schema in your MySQL database.
Something like:
SELECT * FROM TBLS;
run against your MySQL database should show you the names of your Hive tables.
Add a new table in Hive and verify that the above query returns updated results.
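As a hedged sketch of the full round trip (the metastore database name 'metastore' is an assumption from the Cloudera defaults; TBLS and its TBL_NAME column are real Hive metastore schema objects):

```sql
-- In the MySQL client:
USE metastore;
SELECT TBL_NAME, TBL_TYPE FROM TBLS;

-- Then, in the Hive shell, create a throwaway table:
--   CREATE TABLE verify_metastore_test (id INT);

-- Re-run the SELECT above in MySQL; verify_metastore_test should now
-- appear in the results, confirming Hive is writing to MySQL rather
-- than the local Derby metastore_db.
```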