Is there a Hive query to list only the views available in a particular database?
In MySQL, I think the query would be:
SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE LIKE 'VIEW' AND TABLE_SCHEMA LIKE 'database_name';
I want something similar for HiveQL.
There is no INFORMATION_SCHEMA implementation in Hive currently.
There is an Open JIRA which you can view at the following link:
https://issues.apache.org/jira/browse/HIVE-1010
However, if the Hive metastore is configured using a Derby or MySQL server, then you can access the information you require.
The different ways of configuring a Hive Metastore can be found at:
http://www.thecloudavenue.com/2013/11/differentWaysOfConfiguringHiveMetastore.html
Here is a detailed E/R Diagram of the Metastore:
https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf
After configuring this metastore, you can obtain the information you want with a query like the following (in the metastore schema the column is TBL_TYPE, and views are stored with the type 'VIRTUAL_VIEW'):
SELECT * FROM TBLS WHERE TBL_TYPE = 'VIRTUAL_VIEW';
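To restrict the result to a single database, you can join TBLS with DBS. This is only a sketch against the standard metastore schema; table and column names can vary slightly between Hive versions:
SELECT t.TBL_NAME
FROM TBLS t
JOIN DBS d ON t.DB_ID = d.DB_ID
WHERE t.TBL_TYPE = 'VIRTUAL_VIEW'
  AND d.NAME = 'database_name';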
If you are stuck, like me, with an older version of Hive and cannot use SHOW VIEWS, then use the following script (find_views.sh):
#!/bin/sh
# List all views in the Hive database passed as the first argument
export db=$1
tables=`hive -e "use $db; show tables;"`
# Build one SHOW CREATE TABLE statement per table
show_table_command=''
for table in $tables; do
  show_table_command="$show_table_command SHOW CREATE TABLE ${db}.${table};"
done
hive -e "$show_table_command" > show_table
# Keep only the names of objects created with CREATE VIEW
sed_command="s/$db\.//g"
cat show_table | grep "CREATE VIEW" | cut -d" " -f3 | sed 's/`//g' | sed ${sed_command} > all_views
Run this script as sh find_views.sh dbname. The file all_views will then contain the list of views.
You can also use this same technique to find only tables by replacing "CREATE VIEW" with "CREATE TABLE" or "CREATE EXTERNAL TABLE" and adjusting the sed and cut statements accordingly.
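For example, a rough sketch of the adjusted pipeline (the cut field numbers assume the same SHOW CREATE TABLE output format as above; verify them against your own output):
grep "CREATE TABLE" show_table | cut -d" " -f3 | sed 's/`//g' | sed ${sed_command} > all_managed_tables
grep "CREATE EXTERNAL TABLE" show_table | cut -d" " -f4 | sed 's/`//g' | sed ${sed_command} > all_external_tables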
SHOW VIEWS [IN/FROM <dbName>] [<pattern>]
(much like the SHOW TABLES command) can be tracked through https://issues.apache.org/jira/browse/HIVE-14558. A patch is available and there is a chance it will land in Hive 2.0.
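Once you are on a version that includes it, usage would look something like this (the database name and pattern below are placeholders):
SHOW VIEWS IN my_db;
SHOW VIEWS IN my_db LIKE 'sales*';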
Related
In Hive, typing show create table table_name does not show the full output, especially when the table has many columns. What command should be added in order to show the whole output?
Are you using the hive or beeline shell? It might be related to the interface you are using; shell tools in particular can be limited for long results.
Could you try it from Hue or a graphical desktop tool like DBeaver?
I was looking at the documentation but I haven't found a way to drop multiple tables using wildcards.
I was trying to do something like this but it doesn't work:
DROP TABLE
TABLE_DATE_RANGE([clients.sessions_],
TIMESTAMP('2017-01-01'),
TIMESTAMP('2017-05-31'))
For a dataset stats and tables like daily_table_20181017 that follow a date naming convention, I would go with a simple script and the bq command-line tool:
for table in `bq ls --max_results=10000000 stats | grep TABLE | grep daily_table | awk '{print $1}'`; do
  echo stats.$table
  bq rm -f -t stats.$table
done
DROP TABLE [table_name]; is now supported in BigQuery, so here is a purely SQL/BigQuery UI solution.
select concat("drop table ",table_schema,".", table_name, ";" )
from <dataset-name>.INFORMATION_SCHEMA.TABLES
where table_name like "partial_table_name%"
order by table_name desc
Audit the output to make sure you are dropping the correct tables, then copy and paste it back into BigQuery to drop the listed tables.
DDL, e.g. DROP TABLE, doesn't exist yet in BigQuery. However, I know Google is currently working on it.
In the meantime, you'll need to use the API to delete tables. For example, using the bq command-line tool:
bq rm -f -t dataset.table
If you want to do bulk deletes, then you can use some bash/awk magic (a sketch follows below). Or, if you prefer, call the REST API directly with e.g. the Python client.
See here too.
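A minimal sketch of that bash approach, mirroring the bq ls / bq rm pattern above; the dataset name mydataset and the temp_ prefix are placeholders:
for t in $(bq ls --max_results=10000 mydataset | grep TABLE | awk '{print $1}' | grep '^temp_'); do
  bq rm -f -t mydataset.$t
done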
I just used Python to loop this and solve it using @Graham's example:
from subprocess import call
return_code = call('bq rm -f -t dataset.' + table_name +'_'+ period + '', shell=True)
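A minimal sketch of the surrounding loop; the dataset name, table prefix, and list of periods are assumptions for illustration:
from subprocess import call

# Delete one date-suffixed table per period
periods = ['201701', '201702', '201703']
for period in periods:
    return_code = call('bq rm -f -t dataset.table_prefix_' + period, shell=True)
    print(period, return_code)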
For a long time @Graham's approach worked for me. Just recently the BQ CLI stopped working effectively and froze every time I ran the above command. Hence I dug around for a new approach and used some parts of Google Cloud's official documentation. I followed the approach below using a Jupyter notebook.
from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
client = bigquery.Client.from_service_account_json('/folder/my_service_account_credentials.json')

dataset_id = 'project_id.dataset_id'
dataset = client.get_dataset(dataset_id)

# Creating a list of all tables in the above dataset
tables = list(client.list_tables(dataset))  # API request(s)

## Filtering out relevant wildcard tables to be deleted
## Mention a substring that's common in all your tables that you want to delete
tables_to_delete = ["{}.{}.{}".format(dataset.project, dataset.dataset_id, table.table_id)
                    for table in tables if "search_sequence_" in table.table_id]

for table in tables_to_delete:
    client.delete_table(table)
    print("Deleted table {}".format(table))
To build off of @Dengar's answer, you can use procedural SQL in BigQuery to run all of those delete statements in a FOR loop, like so:
FOR record IN (
  select concat(
    "drop table ",
    table_schema, ".", table_name, ";") as del_stmt
  from <dataset_name>.INFORMATION_SCHEMA.TABLES
  order by table_name) DO
  -- execute each generated DROP TABLE statement
  EXECUTE IMMEDIATE
    FORMAT("""
      %s
    """, record.del_stmt);
END FOR;
Add a WHERE condition if you do not want to delete all tables in the dataset.
With scripting and the table INFORMATION_SCHEMA available, the following can also be used directly in the UI.
I would not recommend this for removing a large number of tables.
FOR tn IN (SELECT table_name FROM yourDataset.INFORMATION_SCHEMA.TABLES WHERE table_name LIKE "filter%")
DO
EXECUTE IMMEDIATE FORMAT("DROP TABLE yourDataset.%s", tn.table_name);
END FOR;
I have tables from different databases, and I want to create a data warehouse database that contains replicas of tables from those different databases. I want the data in the warehouse to be synced with the data from the source tables every day. I am using PostgreSQL.
I tried to do this using psql:
pg_dump -t table_to_copy source_db | psql target_db
However, it didn't work, as it keeps raising errors like "table does not exist".
It all worked when I dumped the whole database rather than a single table, but I want the data to be synced and I want to copy tables from different databases, not the whole database.
How can I do this?
Thanks!
Probably you need FDW - a Foreign Data Wrapper. You can create foreign tables for the different external databases in different schemas on the local db. All of the tables are then accessible through local queries. For storing a snapshot you can use local tables with just INSERT INTO local_table_YYYY_MM SELECT * FROM remote_table;
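A minimal sketch with postgres_fdw; the server, schema, table, and credential names below are placeholders for illustration:
CREATE EXTENSION IF NOT EXISTS postgres_fdw;

CREATE SERVER src_server
  FOREIGN DATA WRAPPER postgres_fdw
  OPTIONS (host 'localhost', dbname 'source_db');

CREATE USER MAPPING FOR CURRENT_USER
  SERVER src_server
  OPTIONS (user 'postgres', password 'secret');

-- expose the remote table locally in its own schema
CREATE SCHEMA IF NOT EXISTS src;
IMPORT FOREIGN SCHEMA public LIMIT TO (table_to_copy)
  FROM SERVER src_server INTO src;

-- daily snapshot into a local table
INSERT INTO local_table_2017_01 SELECT * FROM src.table_to_copy;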
pg_dump -t <table name> <source DB> | psql -d <target DB>
(Check that the table name is correct; that is likely why it tells you the table doesn't exist.)
pg_dump allows the dumping of only select tables:
pg_dump -Fc -f output.dump -t tablename databasename
(dump 'tablename' from database 'databasename' into file 'output.dump' in pg_dump's binary custom format)
You can restore that with pg_restore:
pg_restore -d databasename output.dump
If the table itself already exists in your target database, you can import only the rows by adding the --data-only flag.
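For example, a sketch reusing the names from the question:
pg_dump --data-only -t table_to_copy source_db | psql target_db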
Dblink
You cannot perform cross-database queries like in SQL Server; PostgreSQL does not support this. The dblink extension of PostgreSQL is used to connect one database to another database. You have to install and configure dblink to execute cross-database queries.
Here is a step-by-step script and example for executing a cross-database query in PostgreSQL. Please visit this post:
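As a rough sketch of what a dblink query looks like (the connection string, target table, and column list are placeholders you would adjust):
CREATE EXTENSION IF NOT EXISTS dblink;

INSERT INTO local_table_copy
SELECT *
FROM dblink('host=localhost dbname=source_db user=postgres password=secret',
            'SELECT id, name FROM table_to_copy')
     AS t(id integer, name text);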
I have tables with a comment. I added the comment using:
ALTER TABLE table1 SET TBLPROPERTIES ('comment' = 'Hello World!');
ALTER TABLE table2 SET TBLPROPERTIES ('comment' = 'Hello World!');
...
Now my question is: is there a table storing the table properties?
I want to write a query returning the following data :
+------------+--------------+
| Table | Comment |
+------------+--------------+
| table1 | Hello World! |
| table2 | Hello World! |
+------------+--------------+
Thanks!
Unfortunately, I couldn't find any easier way for a query to return comments than what @Rijul suggested.
But if you're on Cloudera and you just want to be able to see the comments, this could help:
In the Hue query editor, right-click on the table (or view) name and choose "Show details". Under the Details tab, you can see the comment for that table.
Yes, there is an embedded database which stores all the metadata of Hive table schemas and other properties.
By default, when you set up your Hadoop cluster and Hive, Apache Derby is used for storing the Hive metadata, although you can change the metastore database to Postgres or MySQL etc. while creating your cluster.
So the answer to your question is: you have to manually install the Apache Derby drivers and query the Derby database from the command line for your desired output, assuming your cluster is using Derby. Either way, you have to find out what is used in your case.
More information on the Hive metastore:
http://www.cloudera.com/documentation/enterprise/latest/topics/cdh_ig_hive_metastore_configure.html
Information about Derby:
https://cwiki.apache.org/confluence/display/Hive/AdminManual+MetastoreAdmin#AdminManualMetastoreAdmin-Local/EmbeddedMetastoreDatabase(Derby)
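For example, assuming the standard metastore schema (table and column names can vary slightly between Hive versions), a query against the metastore database along these lines should return the table/comment pairs you asked for:
SELECT t.TBL_NAME AS table_name, p.PARAM_VALUE AS table_comment
FROM TBLS t
JOIN TABLE_PARAMS p ON p.TBL_ID = t.TBL_ID
WHERE p.PARAM_KEY = 'comment';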
DESCRIBE FORMATTED tablename;
This command can help you get the comments along with much more information.
I installed an application that uses a PostgreSQL server, but I don't know the name of the database and the tables it uses. Is there any command to see the name of the database and the tables of this application?
If you are able to view the database using the psql terminal command:
> psql -h hostname -U username dbname
...then, in the psql shell, \d ("describe") will show you a list of all the relations in the database. You can use \d on specific relations as well, e.g.
db_name=# \d table_name
Table "public.table_name"
Column | Type | Modifiers
---------------+---------+-----------
id | integer | not null
... etc ...
Using psql on Linux, you can use the \l command to list databases, \c dbname to connect to that db, and the \d command to list tables in the db.
Short answer: connect to the default database with psql, and list all databases with '\l'.
Then connect to your database of interest, and list tables with '\dt'.
Slightly longer answer: a PostgreSQL server installation usually has a "data directory" (it can have more than one if there are two server instances running, but that's quite unusual), which defines what PostgreSQL calls "a cluster". Inside it, you can have several databases; you usually have at least the defaults 'template0' and 'template1', plus your own database(s).
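An example session (mydb is a placeholder for your application's database):
$ psql -U postgres
postgres=# \l
postgres=# \c mydb
mydb=# \dt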