I don't want to delete tables one by one.
What is the fastest way to do it?
Basically, you need to remove all partitions in order for the partitioned BQ table to be dropped.
Assuming you already have gcloud installed, do the following:
Using the terminal, check/set the GCP project you are logged in to:
$> gcloud config list - to check whether you are using the proper GCP project.
$> gcloud config set project <your_project_id> - to set the required project
Export variables:
$> export MY_DATASET_ID=dataset_name;
$> export MY_PART_TABLE_NAME=table_name_; - specify the table name without the partition date/value, so the real partition table name in this example looks like "table_name_20200818"
Double-check that you are going to delete the correct table/partitions by running this (it will just list all partitions for your table):
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; done
After checking, run almost the same command, adding a bq rm call parameterized by that iteration to actually DELETE all partitions and, finally, the table itself:
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; bq rm -f -t $MY_DATASET_ID.$table; done
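If you prefer not to shell out to bq, here is a minimal sketch of the same loop using the Python client (the dataset name and table prefix below are placeholders you need to replace):
# Minimal sketch: delete every table in a dataset whose name starts with a prefix.
# DATASET_ID and TABLE_PREFIX are placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

DATASET_ID = "dataset_name"     # same idea as MY_DATASET_ID above
TABLE_PREFIX = "table_name_"    # table name without the partition date/value

for table in client.list_tables(DATASET_ID):
    if table.table_id.startswith(TABLE_PREFIX):
        full_id = f"{table.project}.{table.dataset_id}.{table.table_id}"
        print(f"deleting {full_id}")
        client.delete_table(full_id, not_found_ok=True)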
The process for deleting a time-partitioned table and all the partitions in it is the same as the process for deleting a standard table.
So if you delete a partitioned table without specifying a partition, it will delete all the partitions. You don't have to delete them one by one.
DROP TABLE <tablename>
You can also delete programmatically (e.g., in Java).
Use the sample code in DeleteTable.java and change the flow so it iterates over a list of all the tables and partitions to be deleted.
If you only need to delete specific partitions, you can refer to a table partition (e.g., a daily partition) in the following way:
String mainTable = "TABLE_NAME"; // i.e. my_table
String partitionId = "YYYYMMDD"; // i.e. 20200430
String decorator = "$";
String tableName = mainTable+decorator+partitionId;
Here is the guide to running the Java BigQuery samples; make sure to set your project in Cloud Shell:
gcloud config set project <project-id>
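The same "$" decorator can also be used from the Python client, which should simply pass the decorated table ID through to tables.delete (the same call that bq rm -f -t 'dataset.table$YYYYMMDD' makes). This is only a rough sketch; all names below are made up:
# Rough sketch: delete a single partition by appending the "$" decorator.
# Project, dataset, table, and partition values are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

main_table = "my-project.my_dataset.my_table"   # hypothetical table
partition_id = "20200430"                       # partition to drop (YYYYMMDD)

client.delete_table(f"{main_table}${partition_id}", not_found_ok=True)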
I have a partitioned table in BigQuery that needs to be unpartitioned. It is empty at the moment, so I do not need to worry about losing information.
I deleted the table by clicking on it > choosing delete > typing delete,
but the table is still there even after I refreshed the page or waited for 30 minutes.
Then I tried the Cloud Shell terminal:
bq rm --table project:dataset.table_name
It asked me to confirm and then deleted the table, but the table is still there!
When I try to create an unpartitioned table with the same name, it gives an error that a table with this name already exists!
I have done this many times before; I'm not sure why the table does not get removed.
Deleting partitions from the Google Cloud Console is not supported. You need to execute the bq rm command with the -t flag in Cloud Shell.
Before deleting the partitioned table, I suggest verifying that you are going to delete the correct tables (partitions).
You can execute these commands:
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; done
For the variable $MY_PART_TABLE_NAME, use the table name without the partition date/value; for example, for partitions named like "table_name_20200818", set it to "table_name_".
After verifying that these are the correct partitions, you need to execute this command:
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; bq rm -f -t $MY_DATASET_ID.$table; done
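If you'd rather do that verification step without grep/awk, here is a small Python-client sketch that only lists the matching partitions (the dataset name and prefix are placeholders); nothing is deleted:
# Lists partition tables matching a prefix; nothing is deleted here.
from google.cloud import bigquery

client = bigquery.Client()

for table in client.list_tables("my_dataset"):        # hypothetical dataset
    if table.table_id.startswith("table_name_"):      # name without the date/value suffix
        print(f"{table.dataset_id}.{table.table_id}")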
I have a legacy unpartitioned BigQuery table that streams logs from various sources (let's say table BigOldA). The aim is to transfer it to a new day-partitioned table (let's say PartByDay), which was done with the help of the following link:
https://cloud.google.com/bigquery/docs/creating-column-partitions#creating_a_partitioned_table_from_a_query_result
bq query \
--allow_large_results \
--replace=true \
--destination_table <project>:<data-set>.<PartByDay> \
--time_partitioning_field REQUEST_DATETIME \
--use_legacy_sql=false 'SELECT * FROM `<project>.<data-set>.<BigOldA>`'
I have migrated the historical data to the new table, but I cannot delete it from table BigOldA, as I am running into the known problem that DML statements over tables with a streaming buffer are not supported yet:
Error: UPDATE or DELETE DML statements are not supported over
table <project>:<data-set>.BigOldA with streaming buffer
I was planning to run batch jobs every day, transferring T-1 data from table BigOldA to table PartByDay and deleting it periodically, so that I could still maintain the streaming buffer data in table BigOldA and start using the PartByDay table for analytics. Now I am not sure it's achievable.
I am looking for an alternative solution or best practice for periodically transferring data from a streaming-buffer table to a partitioned table. Also, as the data streams in from independent production sources, it's not possible to point all of them at PartByDay, and the streamingBuffer property from tables.get is never null.
You could just delete the original table and then rename the migrated table to the original name after you've run your history job. This assumes your streaming component to BigQuery is fault tolerant. If it's designed well, you shouldn't lose any data. Whatever is streaming to BigQuery should be able to store events until the table comes back online. It shouldn't change anything for your components that are streaming once the table is partitioned.
If anyone is interested in the script, here it is.
#!/bin/sh
# This script
# 1. copies the data into a new time-partitioned table
# 2. deletes the original unpartitioned table
# 3. copies the partitioned table back to the original table name
# TODO 4. delete the intermediate copied table
set -e
source_project="<source-project>"
source_dataset="<source-dataset>"
source_table="<source-table-to-partition>"
destination_project="<destination-project>"
destination_dataset="<destination-dataset>"
partition_field="<timestamp-partition-field>"
destination_table="<table-copy-partition>"
source_path="$source_project.$source_dataset.$source_table"
source_l_path="$source_project:$source_dataset.$source_table"
destination_path="$destination_project:$destination_dataset.$destination_table"
echo "copying table from $source_path to $destination_path"
query=$(cat <<-END
SELECT * FROM \`$source_path\`
END
)
echo "deleting old table"
bq rm -f -t $destination_path
echo "running the query: $query"
bq query --quiet=true --use_legacy_sql=false --apilog=stderr --allow_large_results --replace=true --destination_table $destination_path --time_partitioning_field $partition_field "$query"
echo "removing the original table: $source_path"
bq rm -f -t $source_l_path
echo "table deleted"
echo "copying the partition table to the original source path"
bq cp -f -n $destination_path $source_l_path
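For what it's worth, here is a rough Python-client sketch of the same flow (all project/dataset/table/field names are placeholders, and it assumes the destination table does not already exist with a conflicting configuration):
# Rough sketch: query the unpartitioned table into a time-partitioned
# destination, then drop the original. All names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

source = "my-project.my_dataset.BigOldA"
destination = "my-project.my_dataset.PartByDay"

job_config = bigquery.QueryJobConfig(
    destination=destination,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
    time_partitioning=bigquery.TimePartitioning(field="REQUEST_DATETIME"),
)
client.query(f"SELECT * FROM `{source}`", job_config=job_config).result()

client.delete_table(source)  # remove the unpartitioned original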
I was looking at the documentation but I haven't found a way to drop multiple tables using wildcards.
I was trying to do something like this but it doesn't work:
DROP TABLE
TABLE_DATE_RANGE([clients.sessions_],
TIMESTAMP('2017-01-01'),
TIMESTAMP('2017-05-31'))
For a dataset named stats and tables like daily_table_20181017 that follow a date naming convention, I would go with a simple script and the bq command-line tool:
for table in `bq ls --max_results=10000000 stats |grep TABLE |grep daily_table |awk '{print $1}'`; do echo stats.$table; bq rm -f -t stats.$table; done
DROP TABLE [table_name]; is now supported in BigQuery, so here is a purely SQL / BigQuery UI solution.
select concat("drop table ",table_schema,".", table_name, ";" )
from <dataset-name>.INFORMATION_SCHEMA.TABLES
where table_name like "partial_table_name%"
order by table_name desc
Audit that you are dropping the correct tables, then copy and paste the generated statements back into BigQuery to drop the listed tables.
DDL, e.g. DROP TABLE, doesn't exist yet in BigQuery. However, I know Google is currently working on it.
In the meantime, you'll need to use the API to delete tables. For example, using the bq tool from the Cloud SDK:
bq rm -f -t dataset.table
If you want to do bulk deletes, then you can use some bash/awk magic. Or, if you prefer, call the REST API directly with e.g. the Python client.
See here too.
I just used Python to loop over this and solve it, following Graham's example:
from subprocess import call

# table_name and period are set by your own surrounding loop
return_code = call('bq rm -f -t dataset.' + table_name + '_' + period, shell=True)
For a long time @Graham's approach worked for me, but just recently the BQ CLI stopped working effectively and froze every time I ran the above command. Hence I dug around for a new approach and used some parts of the official Google Cloud documentation. I followed the approach below using a Jupyter notebook.
from google.cloud import bigquery

# TODO(developer): Construct a BigQuery client object.
client = bigquery.Client.from_service_account_json('/folder/my_service_account_credentials.json')

dataset_id = 'project_id.dataset_id'
dataset = client.get_dataset(dataset_id)

# Creating a list of all tables in the above dataset
tables = list(client.list_tables(dataset))  # API request(s)

# Filtering out the relevant wildcard tables to be deleted.
# Mention a substring that's common to all the tables you want to delete.
tables_to_delete = ["{}.{}.{}".format(dataset.project, dataset.dataset_id, table.table_id)
                    for table in tables if "search_sequence_" in table.table_id]

for table in tables_to_delete:
    client.delete_table(table)
    print("Deleted table {}".format(table))
To build off of @Dengar's answer, you can use procedural SQL in BigQuery to run all of those DROP statements in a FOR loop, like so:
FOR record IN (
  SELECT CONCAT("drop table ", table_schema, ".", table_name, ";") AS del_stmt
  FROM <dataset_name>.INFORMATION_SCHEMA.TABLES
  ORDER BY table_name
) DO
  -- execute each generated DROP TABLE statement
  EXECUTE IMMEDIATE FORMAT("""
    %s
  """, record.del_stmt);
END FOR;
Add a WHERE condition if you do not want to delete all tables in the dataset.
With scripting and the table INFORMATION_SCHEMA available, the following can also be used directly in the UI.
I would not recommend this for removing a large number of tables.
FOR tn IN (SELECT table_name FROM yourDataset.INFORMATION_SCHEMA.TABLES WHERE table_name LIKE "filter%")
DO
EXECUTE IMMEDIATE FORMAT("DROP TABLE yourDataset.%s", tn.table_name);
END FOR;
I'm using the Google SDK bq command. How do I change the name of a table? I'm not seeing this at https://cloud.google.com/bigquery/bq-command-line-tool
You have to copy to a new table and delete the original.
$ bq cp dataset.old_table dataset.new_table
$ bq rm -f -t dataset.old_table
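The same copy-then-delete "rename" can be done with the Python client; here is a minimal sketch (the table names are examples only):
# Minimal sketch: "rename" a table by copying it and deleting the original.
from google.cloud import bigquery

client = bigquery.Client()

old_table = "my-project.my_dataset.old_table"
new_table = "my-project.my_dataset.new_table"

client.copy_table(old_table, new_table).result()  # wait for the copy job to finish
client.delete_table(old_table)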
I don't think there is a way to just rename a table.
What you can do is COPY the table to a new table with the desired name (copying is free of charge) and then delete the original table.
The only drawback I see with this is that if you have long-term stored data, I think you will lose the long-term storage discount (50%) for that data.
I'm using Hive to aggregate stats, and I want to do a breakdown by the industry our customers fall under. Ideally, I'd like to write the stats for each industry to a separate output file per industry (e.g. industry1_stats, industry2_stats, etc.). I have a list of various industries our customers are in, but that list isn't pre-set.
So far, everything I've seen in the Hive documentation indicates that I need to know what tables I'd want beforehand and hard-code those into my Hive script. Is there a way to do this dynamically, either in the Hive script itself (preferable) or through some external code before kicking off the Hive script?
I would suggest going with a shell script.
Get the list of industry names:
hive -e 'select distinct industry_name from [dbname].[table_name];' > list
Iterate over every line, passing each line (an industry name) from list as an argument into the while loop:
tail -n +1 list | while IFS=' ' read -r industry_name
do
hive -hiveconf MY_VAR=$industry_name -f my_script.hql
done
Save the shell script as test.sh,
and in my_script.hql:
use uvtest;
create table ${hiveconf:MY_VAR} (id INT, name CHAR(10));
You'll have to place both test.sh and my_script.hql in the same folder.
The command below should create all the tables from the list of industry names.
sh test.sh
Follow this link for using Hive in shell scripts:
https://www.mapr.com/blog/quick-tips-using-hive-shell-inside-scripts
I wound up achieving this using Hive's dynamic partitioning (each partition writes to a separate directory on disk, so I can just iterate through those files). The official Hive documentation on partitioning and this blog post were particularly helpful for me.