I am creating and inserting into tables in Hive, and the files are created on HDFS, with some on external storage (S3).
Assuming I created 10 tables, is there any system table in Hive where I can find information about the tables created by a user? (For example, in Teradata we have DBC.TablesV, which holds information about all the user-defined tables.)
You can find where your metastore is configured to be in the hive-site.xml file.
Its usual location is under /etc/hive/{$hadoop_version}/ or /etc/hive/conf/.
grep for "hive.metastore.uris" or "javax.jdo.option.ConnectionURL" to see which db you are using for the metastore. The credentials should also be there.
If, for example, your metastore is on a MySQL server, you can run queries like
SELECT * FROM TBLS;
SELECT * FROM PARTITIONS;
etc.
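For instance, a sketch of a query that lists every table together with its database and owner (assuming the standard Hive metastore schema; exact column names can vary across Hive versions):
SELECT d.NAME AS db_name, t.TBL_NAME, t.OWNER, t.TBL_TYPE
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
ORDER BY d.NAME, t.TBL_NAME;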
You can't query (as in SELECT ... FROM ...) the metadata from within Hive.
You do, however, have commands that display that information, e.g. show databases, show tables, desc MyTable, etc.
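For example, from the Hive shell:
SHOW DATABASES;
SHOW TABLES;
DESCRIBE MyTable;  -- MyTable is a placeholder table name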
I'm not sure I understood your question 100%. If you mean the information about the creation of the table, like the query itself, the location on HDFS, table properties, etc., you can try:
SHOW CREATE TABLE <table>;
If you need to retrieve a list of the column names and datatypes, try:
DESCRIBE <table>;
Related
I have created a table using Drill and it is located at
/user/abc/drill/Drilltable.
Now I would like to load the data from DrillTable into HiveTable, which is located at the path
/user/hive/warehouse/userxyz.db
I am using the below statement to load the data:
INSERT INTO TABLE HiveTable select * from DrillTable;
I get the error
Table not found
and I am a bit confused about how to let Hive know the path of the Drill table.
What would be the right way to handle this?
Hive might be confused about the schema of the Drill data, as well as its location. If you're willing to experiment, try something like this:
Store the data in a Drill format you can model in Hive, CSV for example, as described in this post.
In Hive, create an external table that defines the schema and location of the textual data. You can then (optionally) convert the external table to a managed table. For example:
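A minimal sketch, assuming the Drill output was written as comma-delimited text under /user/abc/drill/Drilltable, with hypothetical column names (match them to DrillTable's actual schema):
CREATE EXTERNAL TABLE drill_staging (
  id INT,       -- hypothetical column
  name STRING   -- hypothetical column
)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
LOCATION '/user/abc/drill/Drilltable';

INSERT INTO TABLE HiveTable
SELECT * FROM drill_staging;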
In Impala, is there a way to check which tables in the database contain a specific column name?
Something like:
select tablename, columnname
from dbc.columns
where databasename = 'mydatabasename'
and columnname like '%findthis%'
order by tablename
The above query works in a Teradata environment but throws an error in Impala.
Thanks.
Impala shares the metastore with Hive. Unlike a traditional RDBMS, Hive stores its metadata in a separate database, in most cases MySQL or Postgres. If you have access to the metastore database, you can run SELECT on the table TBLS to get the details about tables and on COLUMNS_V2 to get the details about columns.
If you do not have access to the metastore, the only option is to describe each table to get the column names. If you have a lot of databases and tables, you could write a shell script that gets the list of tables using "show tables" and loops over them, describing each with "desc tablename".
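If you do have metastore access, a sketch of the lookup (assuming a MySQL-backed metastore with the standard Hive schema; names may differ slightly between Hive versions):
SELECT d.NAME AS database_name, t.TBL_NAME AS table_name, c.COLUMN_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.CD_ID = c.CD_ID
WHERE d.NAME = 'mydatabasename'
  AND c.COLUMN_NAME LIKE '%findthis%'
ORDER BY t.TBL_NAME;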
Can anyone please suggest how to create a partitioned table in BigQuery?
Example: Suppose I have log data in Google Storage for the year 2016. I stored all the data in one bucket, partitioned by year, month, and date. Here I want to create a table partitioned by date.
Thanks in advance.
Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of a table. For example, to load data for May 1st, 2016, you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables
There are two options:
Option 1
You can load each daily file into a separate respective table named YourLogs_YYYYMMDD.
See details on how to Load Data from Cloud Storage
After the tables are created, you can access them either using Table wildcard functions (Legacy SQL) or using a Wildcard Table (Standard SQL), as sketched below. See also Querying Multiple Tables Using a Wildcard Table for more examples.
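A sketch of the Standard SQL wildcard approach (myproject and mydataset are hypothetical names):
SELECT *
FROM `myproject.mydataset.YourLogs_*`
WHERE _TABLE_SUFFIX BETWEEN '20160101' AND '20160131';  -- scans only the January 2016 tables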
Option 2
You can create a Date-Partitioned Table (just one table, YourLogs), but you will still need to load each daily file into its respective partition - see Creating and Updating Date-Partitioned Tables.
After the table is loaded, you can easily Query Date-Partitioned Tables, as sketched below.
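For instance, a minimal sketch of reading a single day's partition (mydataset is a hypothetical name):
SELECT *
FROM mydataset.YourLogs
WHERE _PARTITIONTIME = TIMESTAMP('2016-05-01');  -- prunes the scan to the 20160501 partition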
Having partitions for an External Table is not allowed as of now. There is a Feature Request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.
I am moving data around within Impala (not my design), and I have lost some data. I need to copy the data from the parquet tables back to their original non-parquet tables. Originally, the developers had done this with a simple one-liner in a script. Since I don't know anything about databases, and especially about Impala, I was hoping you could help me out. This is the one-liner that was used to translate to a parquet table, and I need it reversed.
impala-shell -i <ipaddr>
use db;
INVALIDATE METADATA <text_table>;
CREATE TABLE <parquet_table> LIKE <text_table> STORED AS PARQUET;
INSERT OVERWRITE <parquet_table> SELECT * FROM <text_table>;
Thanks.
Have you tried simply doing
CREATE TABLE <text_table>
AS
SELECT *
FROM <parquet_table>;
Per the Cloudera documentation, this should be possible.
NOTE: Ensure that your target table does not already exist, or use a table name that does not already exist, so that you do not accidentally overwrite other data.
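If you need to guarantee the original text format rather than relying on Impala's default file format for CREATE TABLE ... AS SELECT, a sketch (placeholders kept from the question):
CREATE TABLE <text_table> STORED AS TEXTFILE
AS SELECT * FROM <parquet_table>;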
Here is our setup -
We have Hive that uses MySQL on another machine as a metastore.
I can start the Hive command line shell and create a table and describe it.
But when I log on to the other machine, where MySQL is used as the metastore, I cannot see the Hive table details in MySQL.
e.g., here are the Hive commands:
hive> create table student(name STRING, id INT);
OK
Time taken: 7.464 seconds
hive> describe student;
OK
name string
id int
Time taken: 0.408 seconds
hive>
Next, I log on to the machine where MySQL is installed and this MySQL is used as Hive metastore. I use the "metastore" database. But if I want to list the tables, I cannot see the table or the table info I have created in Hive.
How can I see the Hive table information in the metastore?
First, find which MySQL database the metastore is stored in. This is going to be in your hive-site.xml (the connection URL). Then, once you connect to MySQL, you can:
use metastore;
show tables;
select * from TBLS;  -- this will give you the list of your Hive tables
Another useful query, if you want to find which other tables a particular column belongs to:
SELECT c.column_name, t.tbl_name, c.comment, c.type_name, c.integer_idx,
       t.tbl_id, t.create_time, t.owner, t.retention, s.sd_id, t.tbl_type,
       s.input_format, s.is_compressed, s.location,
       s.num_buckets, s.output_format, s.serde_id, s.cd_id
FROM TBLS t
JOIN SDS s ON t.sd_id = s.sd_id
JOIN COLUMNS_V2 c ON s.cd_id = c.cd_id
WHERE c.column_name = 'my_col'
-- AND t.tbl_name = 'my_table'
ORDER BY t.create_time;
You can query the metastore schema in your MySQL database. Something like:
mysql> select * from TBLS;
More details on how to configure a MySQL metastore to store metadata for Hive, and how to verify and see the stored metadata, are here.
While setting up Hadoop services, or any other services, admins in most scenarios use a relational database to store the metadata information of services like Hive and Oozie.
So, find which database (MySQL, PostgreSQL, SQL Server, etc.) your Hive is backed by, and you can see the metadata information in the TBLS table.
While upgrading Hive, you have to take a backup of these tables.