What is the use of mydatabase.db in Hive? - hive

Just now I start reading about Hive and I have a doubt. When I create a database called 'xyz' in Hive, it creates a folder 'xyz.db'. Anyway Hive is using metastore_db to store the table schema. Then what is the use of this 'xyz.db' folder?
Regards
Sivagururaja.

It is the default directory where the data files for the tables are stored on HDFS.
metastore_db is an external db (mysql, postgres, derby, etc..) which stores the table schema to be used to read the files in xyz.db.

Related

How to dynamically create table in Snowflake getting schema from parquet file which stored in AWS

Could you help me to load a couple of parquet files to Snowflake.
I've got about 250 parquet-files which stored in AWS stage.
250 files = 250 different tables.
I'd like to dynamically load them into Snowflake tables.
So, I need:
Get schema from parquet file... I've read that I could get the schema from parquet file using parquet-tools (Apache).
Create table using schema from the parquet file
Load data from parquet-file to this table.
Could anyone help me how to do that? Does exist the most efficient way to realize it? (by using GUI Snowflake, for example). Can't find it.
Thanks.
If the schema of the files is same you can put them in a single stage and use the Infer-Schema function. This will give you the schema of the parquet files.
https://docs.snowflake.com/en/sql-reference/functions/infer_schema.html
In case all files have different schema then I'm afraid you have to infer the schema on each file.

How Can I create a Hive Table on top of a Parquet File

Facing issue on creating hive table on top of parquet file. Can someone help me on the same.? I have read many articles and followed the guidelines but not able to load a parquet file in Hive Table.
According "Using Parquet Tables in Hive" it is often useful to create the table as an external table pointing to the location where the files will be created, if a table will be populated with data files generated outside of Hive.
hive> create external table parquet_table_name (<yourParquetDataStructure>)
STORED AS PARQUET
LOCATION '/<yourPath>/<yourParquetFile>';

In sqoop export, Avro table to define schema in RDBMS

I'm loading data from HDFS to mySQL using SQOOP, in this data one record has got more than 70 fields, making it difficult to define the schema while creating the table in RDBMS.
Is there a way to use AVRO tables to dynamically create the table with schema in RDBMS using SQOOP?
Or is there any some tool which does the same?
This is not supported in sqoop today. From the sqoop documentation
The export tool exports a set of files from HDFS back to an RDBMS. The
target table must already exist in the database. The input files are
read and parsed into a set of records according to the user-specified
delimiters.

Where does hive stores its table?

I am new to Hadoop and I just started working on Hive, I my understanding it provides a query language to process data in HDFS. With HiveQl we can create tables and load data into it from HDFS.
So my question is: where are those tables stored? Specifically if we have 100 GB file in our HDFS and we want to make a hive table out of that data what will be the size of that table and where is it stored?
If my understanding about this concept is wrong please correct me ..
If the table is 100GB you should consider an Hive External Table (as opposed to a "managed table", for the difference, see this).
With an external table the data itself will be still stored on the HDFS in the file path that you specify (note that you may specify a directory of files as long as they all have the same structure), but Hive will create a map of it in the meta-store whereas the managed table will store the data "in Hive".
When you drop a managed table, it drops the underlying data as opposed to dropping a hive external table which only drops the meta-data from the meta-store referencing that data.
Either way you are using only 100GB as viewed by the user and are taking advantage of the HDFS' robustness though duplication of the data.
Hive will create a directory on HDFS. If you didn't specify any location it will create a directory at /user/hive/warehouse on HDFS. After load command the files are moved to the /warehouse/tablename. You can also point to the HDFS directory if it contains partitions (if the files are partitioned), or use external table concept.

How to export data from Hive to Cassandra?

Is there a way to export data from Hortonworks Hive to Apache Cassandra without using ETL tools?
You can create a database "export" and a table "myview" inside of it.
create database export;
use export;
create table myview as select <put your query here>
Then use "desc" to get the full directory path:
describe myview.
From the directory path, you can point cassandra to that hdfs location and import it from hdfs.
Disclaimer: this process is only for smaller tables since your "myview" is not partitioned. It is not suitable for large ones that should have partitions defined on them.