I have written the following query in SQL Server to view table and column names:
select * from information_schema.COLUMNS where column_name like '%name%'
Is there a similar query that can be written in Hive for a similar result?
If not, how can I find the list of tables in a particular database that contain a particular column?
I don't think there is any option available in Hive. You can use shell scripting to get the same output. Something like this:
# Replace column_name below with the column you are searching for.
# Results are echoed directly rather than accumulated in a variable,
# because each piped `while read` loop runs in a subshell and would
# discard any variable assignments made inside it.
hive -S -e 'show databases' | while read database
do
  hive -S -e "show tables in $database" | while read table
  do
    if hive -S -e "describe $database.$table" | grep -q "column_name"; then
      hive -S -e "show columns in $database.$table" | while read column
      do
        echo "$database.$table,$column"
      done
    fi
  done
done
I have a partitioned table in BigQuery which needs to be unpartitioned. It is currently empty, so I do not need to worry about losing data.
I deleted the table from the console by clicking on it > choosing Delete > typing "delete",
but the table is still there, even after I refreshed the page and waited for 30 minutes.
Then I tried the cloud shell terminal:
bq rm --table project:dataset.table_name
It asked me to confirm and then deleted the table, but the table is still there!
When I then try to create an unpartitioned table with the same name, I get an error that a table with this name already exists.
I have done this many times before; I am not sure why the table is not being removed.
Deleting partitions from the Google Cloud Console is not supported. You need to execute the bq rm command with the -t shortcut in Cloud Shell.
Before deleting the partitioned table, I suggest verifying that you are going to delete the correct tables (partitions).
You can execute these commands:
for table in $(bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'); do
  echo $MY_DATASET_ID.$table
done
For the variable $MY_PART_TABLE_NAME, write the table name without the partition date/value; for example, for a partition named "table_name_20200818", use "table_name".
After verifying that these are the correct partitions, you need to execute this command:
for table in $(bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'); do
  echo $MY_DATASET_ID.$table
  bq rm -f -t $MY_DATASET_ID.$table
done
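Since `bq ls` output is easy to misparse, it may help to see what the grep/awk filter actually selects before pointing it at real data. The sketch below simulates the listing with fabricated sample lines (not real `bq ls` output) and applies the same pipeline:

```shell
# Simulated `bq ls` listing (sample data; real output also has a header row)
printf '  table_name_20200818   TABLE\n  table_name_20200819   TABLE\n  other_tbl   TABLE\n' > ls_output.txt

# Same filter as in the loop above: keep TABLE rows matching the name prefix,
# then print the first column (the table id)
grep TABLE ls_output.txt | grep table_name | awk '{print $1}'
```

Only the two partition tables survive the filter; `other_tbl` is dropped because it does not contain the name prefix.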
Using psql we can export a query output to a csv file.
psql -d somedb -h localhost -U postgres -p 5432 -c "\COPY (select * from sometable ) TO 'sometable.csv' DELIMITER ',' CSV HEADER;"
However I need to export the query output to a new table in a new sqlite3 database.
I also looked at pg_dump, but haven't been able to figure it out a way with it.
The reason I want to export it as a new table in a new sqlite3 db without any intermediate CSV conversion is that the query output will run into GBs and I have disk-space constraints, so rather than exporting to CSV and then creating a new sqlite3 db from it, I need to do this in one shot.
My solution uses standard INSERT SQL statements.
It requires the target table to have the same schema. The grep command removes problematic lines, such as SET statements, comment lines (--), and blank lines.
pg_dump --data-only --inserts --table=sometable DBNAME | grep -v -e '^SET' -e '^$' -e '^--' | sqlite3 ./target.db
I hope this will help you.
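As a minimal, self-contained illustration of the piping step (using a made-up table name and data, and assuming the sqlite3 CLI is installed): INSERT statements streamed on stdin land directly in the SQLite database, with no intermediate file, which is exactly what the pg_dump pipeline above relies on:

```shell
# Create the target table in a fresh SQLite db (hypothetical schema)
sqlite3 target_demo.db "CREATE TABLE sometable (id INTEGER, name TEXT);"

# Stream INSERT statements straight into sqlite3, the way the filtered
# pg_dump output would arrive
printf "INSERT INTO sometable VALUES (1,'a');\nINSERT INTO sometable VALUES (2,'b');\n" \
  | sqlite3 target_demo.db

sqlite3 target_demo.db "SELECT COUNT(*) FROM sometable;"
```

One caveat: depending on your PostgreSQL version, pg_dump may schema-qualify the table name (e.g. `INSERT INTO public.sometable ...`), which SQLite will reject; if so, you may also need to strip the qualifier with an extra sed step.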
I use the bq command-line tool to run queries, e.g.:
bq query "select * from table"
What if I store the query in a file and run the query from that file? Is there a way to do that?
The other answers seem to be either outdated or needlessly brittle. As of 2019, bq query reads from stdin, so you can just redirect your file into it:
bq query < myfile.sql
Query parameters are passed like this:
bq query --parameter name:type:value < myfile.sql
There is another way.
Try this:
bq query --flagfile=[your file with absolute path]
Ex:
bq query --flagfile=/home/user/abc.sql
You can run a query from a text file with a little bit of shell magic:
$ echo "SELECT 17" > qq.txt
$ bq query "$(cat qq.txt)"
Waiting on bqjob_r603d91b7e0435a0f_00000150c56689c6_1 ... (0s) Current status: DONE
+-----+
| f0_ |
+-----+
| 17 |
+-----+
Note this works on any Unix variant (including macOS). If you're using Windows, this should work under PowerShell but not in the default cmd prompt.
If you are using standard SQL (not legacy SQL):
**Steps:**
1. Create a .sql file (you can use any extension).
2. Put your query in it. Make sure there is a semicolon (;) at the end of the query.
3. Go to the command line and execute the commands below.
4. If you want to add parameters, you have to specify them sequentially.
Example:
bq query --use_legacy_sql=False "$(cat /home/airflow/projects/bql/query/test.sql)"
for parameter
bq query --use_legacy_sql=False --parameter=country::USA "$(cat /home/airflow/projects/bql/query/test.sql)"
cat >/home/airflow/projects/bql/query/test.sql
select * from l1_gcb_trxn.account where country=#country;
This thread offers a good solution:
bq query `cat my_query.sql`
bq query --replace --use_legacy_sql=false --destination_table=syw-analytics:store_ranking.SHC_ENGAGEMENT_RANKING_TEST \
"SELECT RED,
DEC,
REDEM
from \`syw.abc.xyz\`"
Is there a Hive query to list only the views available in a particular database.
In MySQL, I think the query is:
SELECT TABLE_NAME FROM information_schema.TABLES WHERE TABLE_TYPE LIKE 'VIEW' AND TABLE_SCHEMA LIKE 'database_name';
I want something similar for HiveQL.
There is no INFORMATION_SCHEMA implementation in Hive currently.
There is an open JIRA which you can view at the following link:
https://issues.apache.org/jira/browse/HIVE-1010
However, if the Hive Metastore is configured to use a Derby or MySQL server, then you can access the information you require.
The different ways of configuring a Hive Metastore can be found at:
http://www.thecloudavenue.com/2013/11/differentWaysOfConfiguringHiveMetastore.html
Here is a detailed E/R Diagram of the Metastore:
https://issues.apache.org/jira/secure/attachment/12471108/HiveMetaStore.pdf
After configuring this Metastore, you can obtain the information you want by a query like:
SELECT * FROM TBLS WHERE TBL_TYPE = 'VIRTUAL_VIEW';
If you are stuck like me with older version of Hive and cannot use SHOW VIEWS then use the following script
export db=$1
tables=`hive -e "use $db; show tables;"`
show_table_command=''
for table in $tables; do
show_table_command="$show_table_command SHOW CREATE TABLE ${db}.${table};"
done
hive -e "$show_table_command" > show_table
sed_command="s/$db\.//g"
cat show_table | grep "CREATE VIEW" | cut -d" " -f3 | sed 's/`//g' | sed ${sed_command} > all_views
Run this as sh find_views.sh dbname. The file all_views then contains the list of views.
You can also use this same technique to find only tables, by replacing "CREATE VIEW" with "CREATE TABLE" or "CREATE EXTERNAL TABLE" and adjusting the sed and cut statements accordingly.
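As a sketch of that adjustment, using simulated SHOW CREATE TABLE output (fabricated db/table names, not real Hive output): for "CREATE EXTERNAL TABLE" the backtick-quoted name is the fourth space-separated field rather than the third, so the cut changes to -f4:

```shell
# Simulated show_table contents (hypothetical names; real Hive output has more lines)
printf 'CREATE EXTERNAL TABLE `db1.ext1` (\nCREATE TABLE `db1.t1` (\nCREATE VIEW `db1.v1` AS\n' > show_table_demo

# Same pipeline as above, adjusted: match external tables and take field 4,
# then strip backticks and the database prefix
grep "CREATE EXTERNAL TABLE" show_table_demo | cut -d" " -f4 | sed 's/`//g' | sed 's/db1\.//g'
```

This prints only ext1, the lone external table in the sample.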
SHOW VIEWS [IN/FROM <dbName>] [<pattern>]
(much like the SHOW TABLES command) can be tracked through https://issues.apache.org/jira/browse/HIVE-14558; a patch is available and there is a chance it will land in Hive 2.0.
In the command line, I'm trying to restore some (not all) tables of data from a MySQL SQL script file. However, my database tables have a prefix and the SQL file's tables do not.
Is there a way within the command line to specify a prefix on restore?
mysql -uroot -p databasename < script_with_no_prefix.sql
You can pick out the tables you need using a sed command. For example, if your table prefix is "prefix_", you could use this:
$ sed -n -e '/^-- Table structure for table `prefix_/,/^UNLOCK TABLES/p' \
script_with_no_prefix.sql | mysql -uroot -p databasename
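To see what that sed range actually extracts, here is a self-contained run against a tiny fabricated dump (hypothetical table names, not a real mysqldump file):

```shell
# Fabricated dump fragment with one prefixed and one unprefixed table
printf -- '-- Table structure for table `prefix_users`\nCREATE TABLE `prefix_users` (id INT);\nUNLOCK TABLES;\n-- Table structure for table `orders`\nCREATE TABLE `orders` (id INT);\nUNLOCK TABLES;\n' > script_demo.sql

# Same range as above: print from each prefixed table's header comment
# through the UNLOCK TABLES line that ends its section
sed -n -e '/^-- Table structure for table `prefix_/,/^UNLOCK TABLES/p' script_demo.sql
```

Only the prefix_users section (three lines) is printed; the orders section never matches the start-of-range pattern, so it is skipped entirely.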