In Impala, is there a way to check which tables in the database contain a specific column name?
Something like:
select tablename, columnname
from dbc.columns
where databasename = 'mydatabasename'
and columnname like '%findthis%'
order by tablename
The above query works in a Teradata environment, but throws an error in Impala.
Thanks,
Impala shares its metastore with Hive. Unlike a traditional RDBMS, Hive stores its metadata in a separate database, in most cases MySQL or PostgreSQL. If you have access to the metastore database, you can run SELECT on the TBLS table to get details about the tables and on COLUMNS_V2 to get details about the columns.
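As a rough illustration of the metastore approach, the sketch below joins DBS, TBLS, SDS and COLUMNS_V2 to find the tables whose columns match a pattern. It is only a sketch under assumptions: an in-memory SQLite database stands in for the real MySQL/PostgreSQL metastore, only the columns needed for the join are created, and the sample rows and the helper name find_tables_with_column are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the real metastore database (assumption).
# Only the columns needed for the join are created; the real tables have more.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE SDS (SD_ID INTEGER PRIMARY KEY, CD_ID INTEGER);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, SD_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE COLUMNS_V2 (CD_ID INTEGER, COLUMN_NAME TEXT, TYPE_NAME TEXT, INTEGER_IDX INTEGER);
-- Hypothetical sample rows, for illustration only.
INSERT INTO DBS VALUES (1, 'mydatabasename');
INSERT INTO SDS VALUES (10, 100), (11, 101);
INSERT INTO TBLS VALUES (1, 1, 10, 'orders'), (2, 1, 11, 'customers');
INSERT INTO COLUMNS_V2 VALUES
  (100, 'order_id', 'int', 0), (100, 'findthis_code', 'string', 1),
  (101, 'customer_id', 'int', 0);
""")

# TBLS points at SDS (storage descriptor), which points at the column list.
FIND_COLUMN_SQL = """
SELECT t.TBL_NAME, c.COLUMN_NAME
FROM DBS d
JOIN TBLS t ON t.DB_ID = d.DB_ID
JOIN SDS s ON s.SD_ID = t.SD_ID
JOIN COLUMNS_V2 c ON c.CD_ID = s.CD_ID
WHERE d.NAME = ? AND c.COLUMN_NAME LIKE ?
ORDER BY t.TBL_NAME
"""

def find_tables_with_column(connection, database, pattern):
    """Return (table, column) pairs in `database` whose column name matches `pattern`."""
    return connection.execute(FIND_COLUMN_SQL, (database, pattern)).fetchall()

matches = find_tables_with_column(conn, "mydatabasename", "%findthis%")
print(matches)
```

Against a real metastore the same SELECT should work with a MySQL or PostgreSQL driver, though the parameter placeholder style may differ (for example %s instead of ?). Partition columns are stored in PARTITION_KEYS rather than COLUMNS_V2, so this query would not see them.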
If you do not have access to the metastore, the only option is to describe each table to get its column names. If you have a lot of databases and tables, you could write a shell script that gets the list of tables with "show tables" and loops over them, describing each one with "desc tablename".
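The describe-each-table loop could look roughly like the sketch below, written in Python rather than shell. It assumes impala-shell is on the PATH and uses its -B (delimited output), -d (database) and -q (query) options; the function names are hypothetical, and the query runner is passed in as a parameter so the loop can be tried without a cluster.

```python
import fnmatch
import subprocess

def run_impala(query, database):
    """Run one statement through impala-shell and return its rows as lists of fields."""
    out = subprocess.run(
        ["impala-shell", "-B", "-d", database, "-q", query],
        check=True, capture_output=True, text=True,
    ).stdout
    # With -B the output is tab-delimited, one row per line.
    return [line.split("\t") for line in out.splitlines() if line.strip()]

def find_tables_with_column(database, pattern, run=run_impala):
    """Return sorted (table, column) pairs whose column name matches a glob pattern."""
    hits = []
    for (table, *_rest) in run("show tables", database):
        # The first field of each describe row is the column name.
        for (column, *_rest) in run(f"describe {table}", database):
            if fnmatch.fnmatch(column.lower(), pattern.lower()):
                hits.append((table, column))
    return sorted(hits)
```

A call such as find_tables_with_column("mydatabasename", "*findthis*") starts one impala-shell process per table, so it is likely to be slow on large databases; querying the metastore directly avoids that if you have access.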
Related
I am looking for a single Hive query that shows a list of the tables and columns of a database in this format:
Table name, Column name
Note: I know this question has already been asked, but I didn't find any answer that works in DBeaver, and I don't have access to the Linux terminal for the database.
Is there any way to list the partitioned tables in Hive?
I found a way to do this in SQL Server:
https://dba.stackexchange.com/questions/14996/how-do-i-get-a-list-of-all-the-partitioned-tables-in-my-database
I want to list only the partitioned tables under a specific database, so that I don't have to check the DDLs of numerous tables to find out whether each table is partitioned. Is there any similar functionality in Hive? Please suggest.
You can connect directly to the Hive metastore database and get the information about which tables are partitioned.
You need to know the following, which may vary with your cluster configuration:
The database (e.g. PostgreSQL, MySQL) in which the Hive metastore is configured to store the table metadata.
The name of the database that holds the metastore tables; usually it is metastore.
TBLS is the table that stores Hive table information, DBS stores Hive database information, and PARTITION_KEYS stores the partition key columns of each table (PARTITIONS stores the individual partitions).
DB_ID is the foreign key to DBS in TBLS, and TBL_ID is the foreign key to TBLS in PARTITION_KEYS.
Join the tables like below:
select d."NAME" as DATABASE_NAME, t."TBL_NAME" as TABLE_NAME, p."PKEY_NAME" as PARTITION_KEY_NAME
from "PARTITION_KEYS" p
join "TBLS" t on p."TBL_ID"=t."TBL_ID"
join "DBS" d on t."DB_ID"=d."DB_ID"
where d."NAME"='filterdbname' AND p."PKEY_NAME" is not null;
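To try the join without touching a live metastore, here is a minimal sketch that runs an equivalent query against an in-memory SQLite stand-in. It assumes the partition key columns live in PARTITION_KEYS, as in the standard metastore schema; the trimmed-down table definitions and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the metastore database (assumption);
# only the columns used by the join are created.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DBS (DB_ID INTEGER PRIMARY KEY, NAME TEXT);
CREATE TABLE TBLS (TBL_ID INTEGER PRIMARY KEY, DB_ID INTEGER, TBL_NAME TEXT);
CREATE TABLE PARTITION_KEYS (TBL_ID INTEGER, PKEY_NAME TEXT, PKEY_TYPE TEXT, INTEGER_IDX INTEGER);
-- Hypothetical sample rows: 'events' is partitioned, 'lookup' is not.
INSERT INTO DBS VALUES (1, 'filterdbname');
INSERT INTO TBLS VALUES (1, 1, 'events'), (2, 1, 'lookup');
INSERT INTO PARTITION_KEYS VALUES (1, 'year', 'int', 0), (1, 'month', 'int', 1);
""")

PARTITIONED_SQL = """
SELECT t.TBL_NAME, p.PKEY_NAME
FROM PARTITION_KEYS p
JOIN TBLS t ON p.TBL_ID = t.TBL_ID
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE d.NAME = ?
ORDER BY t.TBL_NAME, p.INTEGER_IDX
"""

# Group the partition key columns by table, in declaration order.
partitioned = {}
for table, key in conn.execute(PARTITIONED_SQL, ("filterdbname",)):
    partitioned.setdefault(table, []).append(key)

print(partitioned)
```

Only tables with at least one row in PARTITION_KEYS appear in the result, so unpartitioned tables such as lookup are filtered out by the inner join itself.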
This is the SQL approach. If a programmatic approach is needed, the HiveMetaStoreClient API can be used to query the metastore instead.
A metastore connection has to be set up first. In Java, the pseudocode is:
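import org.apache.hadoop.hive.conf.HiveConf;
import org.apache.hadoop.hive.metastore.HiveMetaStoreClient;
// address and port are placeholders for the metastore Thrift endpoint, e.g. "thrift://host" and 9083
HiveConf conf = new HiveConf();
conf.setVar(HiveConf.ConfVars.METASTOREURIS, address + ":" + port);
HiveMetaStoreClient hiveMetaStoreClient = new HiveMetaStoreClient(conf);
// e.g. hiveMetaStoreClient.getTable(dbName, tableName).getPartitionKeys() returns a table's partition columns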
I am trying to migrate data from table A of database DB1 to table B of database DB2 using Java and Oracle.
I am using Java 1.8; my source database is Oracle 11g and my destination database is Oracle 12c.
I created the structure (schema, tables) of the destination database in the source database, and I am migrating with an *insert into dest select * from source* query from Java. But the source table has millions of records, so this takes a long time, and later I want to export the migrated data into my actual destination, which will take time too.
As far as I know, I can't use a prepared statement with two connections: my table has 400 to 500 columns, so binding that many columns in a prepared statement is not a good idea. The structures of the source and destination tables are also different. I put the field mapping in a properties file, mapping each old field to its new field for the insert into ... select query; for example, the source column col0001 corresponds to ref_no in the destination. So this too prevents me from using a prepared statement. But with a plain statement in Java I can only migrate data within a single database.
I also tried a dblink, but I am not able to migrate data of the CLOB datatype that way.
Please suggest a solution if anyone has done something like this before.
For a one-off copy, you can do a direct-path insert over a database link:
insert /*+ APPEND */ into local_table select * from table@database_link;
I am creating tables and inserting into them in Hive; the files are created on HDFS and some on external S3 storage.
Suppose I have created 10 tables: is there any system table in Hive where I can find information about the tables created by the user? (For example, in Teradata we have DBC.tablesv, which holds information about all the user-defined tables.)
You can find where your metastore is configured in the hive-site.xml file.
Its usual location is under /etc/hive/{$hadoop_version}/ or /etc/hive/conf/.
Grep for "hive.metastore.uris" or "javax.jdo.option.ConnectionURL" to see which database you are using for the metastore: hive.metastore.uris gives the address of the metastore service, and javax.jdo.option.ConnectionURL gives the JDBC URL of the database behind it. The credentials should also be there.
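As an alternative to grep, the relevant properties can be read with a few lines of Python. This is a sketch only: the sample XML, host names and the helper name metastore_settings are hypothetical; the property names are the standard Hive ones.

```python
import xml.etree.ElementTree as ET

# Hypothetical excerpt of a hive-site.xml; the values are made up for illustration.
SAMPLE = """<configuration>
  <property>
    <name>javax.jdo.option.ConnectionURL</name>
    <value>jdbc:mysql://metastore-host:3306/metastore</value>
  </property>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://metastore-host:9083</value>
  </property>
</configuration>"""

# Properties that tell you where the metastore lives and which user it connects as.
WANTED = (
    "hive.metastore.uris",
    "javax.jdo.option.ConnectionURL",
    "javax.jdo.option.ConnectionUserName",
)

def metastore_settings(xml_text):
    """Return the metastore-related properties found in a hive-site.xml document."""
    root = ET.fromstring(xml_text)
    found = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        if name in WANTED:
            found[name] = prop.findtext("value")
    return found

settings = metastore_settings(SAMPLE)
print(settings)
```

For a real cluster you would pass in the contents of the actual file instead, for example metastore_settings(open("/etc/hive/conf/hive-site.xml").read()), assuming that is where your distribution keeps it.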
If, for example, your metastore is on a MySQL server, you can run queries like
SELECT * FROM TBLS;
SELECT * FROM PARTITIONS;
etc
You can't query the metadata (as in SELECT ... FROM ...) from within Hive.
You do, however, have commands that display that information, e.g. show databases, show tables, desc MyTable, etc.
I'm not sure I fully understood your question. If you mean the information about how the table was created (the query itself, the location on HDFS, the table properties, etc.), you can try:
SHOW CREATE TABLE <table>;
If you need to retrieve the list of column names and data types, try:
DESCRIBE <table>;
Here is our setup -
We have Hive that uses MySQL on another machine as a metastore.
I can start the Hive command line shell and create a table and describe it.
But when I log on to the other machine, where MySQL is used as the metastore, I cannot see the Hive table details in MySQL.
For example, here are the Hive commands:
hive> create table student(name STRING, id INT);
OK
Time taken: 7.464 seconds
hive> describe student;
OK
name string
id int
Time taken: 0.408 seconds
hive>
Next, I log on to the machine where MySQL is installed; this MySQL instance is used as the Hive metastore. I use the "metastore" database, but when I list the tables, I cannot see the table I created in Hive or any information about it.
How can I see the Hive table information in the metastore?
First, find which MySQL database the metastore is stored in. This is given by the connection URL in your hive-site.xml. Then, once you connect to MySQL, you can run:
use metastore;
show tables;
select * from TBLS; -- this will give you the list of your Hive tables
Another useful query, if you want to find which tables contain a particular column:
SELECT c.column_name, t.tbl_name, c.comment, c.type_name, c.integer_idx,
t.tbl_id, t.create_time, t.owner, t.retention, t.sd_id, t.tbl_type, s.input_format, s.is_compressed, s.location,
s.num_buckets, s.output_format, s.serde_id, s.cd_id
FROM TBLS t
JOIN SDS s ON t.SD_ID = s.SD_ID
JOIN COLUMNS_V2 c ON s.cd_id = c.cd_id
WHERE c.column_name = 'my_col'
-- AND t.tbl_name = 'my_table'
ORDER BY t.create_time;
You can query the metastore schema in your MySQL database. Something like:
mysql> select * from TBLS;
More details on how to configure a MySQL metastore to store metadata for Hive and verify and see the stored metadata here.
*When setting up Hadoop services, or any other services, admins in most scenarios use a relational database to store the metadata of services such as Hive and Oozie (this is mandatory too).
So, find which database (MySQL, PostgreSQL, SQL Server, etc.) your Hive is backed by, and you can see the metadata information in the TBLS table.*
When upgrading Hive, you have to take a backup of these metastore tables.