I have a partitioned table in BigQuery that needs to be unpartitioned. It is empty at the moment, so I do not need to worry about losing data.
I deleted the table by clicking on it > choosing Delete > typing "delete",
but the table is still there, even after I refreshed the page and waited 30 minutes.
Then I tried the cloud shell terminal:
bq rm --table project:dataset.table_name
It asked me to confirm and then deleted the table, but the table is still there!
When I try to create an unpartitioned table with the same name, it gives an error that a table with this name already exists.
I have done this many times before; why does the table not get removed?
Deleting partitions from Google Cloud Console is not supported. You need to execute the command bq rm with the -t shortcut in the Cloud Shell.
Before deleting the partitioned table, I suggest verifying you are going to delete the correct tables (partitions).
You can execute these commands:
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; done
Set $MY_DATASET_ID to your dataset name. For $MY_PART_TABLE_NAME, use the table name with the partition date/value removed: for a partition named "table_name_20200818", set it to "table_name_".
After verifying that these are the correct partitions, you need to execute this command:
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; bq rm -f -t $MY_DATASET_ID.$table; done
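The grep filter above matches the name anywhere in the line, so a prefix such as table_name_ would also catch other_table_name_20200818. A minimal sketch of a stricter listing step, assuming bq ls prints the table ID in the first column and the type in the second; list_shards is a hypothetical helper name, and table IDs containing spaces are not handled.

```shell
# Hypothetical helper: reads `bq ls` output on stdin and prints
# DATASET.TABLE for every TABLE row whose ID *starts with* PREFIX.
# Assumes column 1 is the table ID and column 2 is the type; the header
# and dashed separator rows are skipped because their column 2 is not TABLE.
list_shards() {
  local dataset="$1" prefix="$2"
  awk -v ds="$dataset" -v p="$prefix" \
    '$2 == "TABLE" && index($1, p) == 1 { print ds "." $1 }'
}

# Usage (lists only, deletes nothing):
# bq ls --max_results=10000000 "$MY_DATASET_ID" | list_shards "$MY_DATASET_ID" "$MY_PART_TABLE_NAME"
```

The printed names can be reviewed first and then passed, one per line, to bq rm -f -t.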
My Python script accidentally created a table named "ext_data_content_modec --replace", which we want to delete.
However, BQ doesn't seem to recognize a table name containing spaces and a flag-like keyword (--replace).
We have tried many variants of bq rm, as well as deleting it from the BQ console, but nothing works.
For example, see below (etlt_dsc is the dataset name).
$ bq rm 'etlt_dsc.ext_data_content_modec --replace'
BigQuery error in rm operation: Not found: Table boeing-prod-atm-next-dsc:etlt_dsc.ext_data_content_modec --replace
Besides the above, we tried the following commands, but none of them worked:
bq rm "etlt_dsc.ext_data_content_modec --replace"
bq rm [etlt_dsc.ext_data_content_modec --replace]
bq rm [etlt_dsc.ext_data_content_modec --replace']
bq rm etlt_dsc.ext_data_content_modec \--replace
Does anyone have any input for us, please?
You can try this:
$ bq ls mydataset
tableId Type Labels Time Partitioning Clustered Fields
---------------------------------- ------- -------- ------------------- ------------------
ext_data_content_modec --replace TABLE
$
$ bq rm "mydataset.ext_data_content_modec --replace"
rm: remove table 'data-lab:mydataset.ext_data_content_modec --replace'? (y/N) y
$
$ bq ls mydataset
$
I was able to figure out the solution.
Removing the table with bq rm didn't work, but we were able to drop it from the BQ console.
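The answer does not show the statement that was used. Assuming "drop" here means a DROP TABLE DDL statement, the table name has to be wrapped in backticks so the space and the "--" become part of the identifier. A minimal sketch that builds such a statement; drop_table_stmt is a hypothetical helper, and running it through bq query (rather than pasting it into the console) is an assumption.

```shell
# Hypothetical helper: builds a GoogleSQL DROP TABLE statement, quoting the
# table name with backticks so spaces and "--replace" are part of the name.
drop_table_stmt() {
  local dataset="$1" table="$2"
  printf 'DROP TABLE %s.`%s`\n' "$dataset" "$table"
}

# Usage (standard SQL, not legacy SQL):
# bq query --use_legacy_sql=false "$(drop_table_stmt etlt_dsc 'ext_data_content_modec --replace')"
```

The same statement text can also be pasted directly into the console query editor.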
I have written the following query in SQL Server to view table and column names:
select * from information_schema.COLUMNS where column_name like '%name%'
Is there a similar query I can write in Hive to get the same result?
If not, how can I find the list of tables in a particular database that contain a particular column?
I don't think there is any option available in Hive. You can use shell scripting to get the same output. Something like this:
column_name="<column_name>"   # the column you are searching for
hive -S -e 'show databases' | while read -r database
do
  hive -S -e "show tables in $database" | while read -r table
  do
    if hive -S -e "describe $database.$table" | grep -qw "$column_name"; then
      # Print each match directly: a variable appended to inside a piped
      # while loop lives in a subshell and would be empty after the loop.
      hive -S -e "show columns in $database.$table" | while read -r column
      do
        echo "$database.$table,$column"
      done
    fi
  done
done
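grep -w in the script still matches the name anywhere in the describe output, including in data types or comments. If you need an exact column-name match (instead of the substring match that LIKE '%name%' gives in SQL Server), a minimal sketch is to compare the first field of each describe row; has_column is a hypothetical helper, and it assumes describe prints the column name as the first whitespace-separated field.

```shell
# Hypothetical helper: succeeds if the `describe` output on stdin contains a
# row whose first field is exactly COLUMN; fails otherwise.
has_column() {
  awk -v c="$1" '$1 == c { found = 1 } END { exit !found }'
}

# Usage inside the loop, replacing the grep test:
# if hive -S -e "describe $database.$table" | has_column "$column_name"; then
```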
I don't want to delete tables one by one.
What is the fastest way to do it?
Basically, you need to remove all partitions of the partitioned BQ table for it to be dropped.
Assuming you already have gcloud installed, do the following:
In a terminal, check (or set) the GCP project you are logged in to:
$> gcloud config list - to check that you are using the proper GCP project.
$> gcloud config set project <your_project_id> - to set the required project
Export variables:
$> export MY_DATASET_ID=dataset_name;
$> export MY_PART_TABLE_NAME=table_name_; - specify the table name without the partition date/value; the real partition table name in this example looks like "table_name_20200818".
Double-check that you are going to delete the correct table/partitions by running this (it only lists the partitions of your table):
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; done
After checking, run almost the same command, adding a bq rm call for each iteration, to actually DELETE all partitions and, eventually, the table itself:
for table in `bq ls --max_results=10000000 $MY_DATASET_ID | grep TABLE | grep $MY_PART_TABLE_NAME | awk '{print $1}'`; do echo $MY_DATASET_ID.$table; bq rm -f -t $MY_DATASET_ID.$table; done
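Instead of deleting inside the loop, a more cautious variant is to generate the rm commands, review them, and only then run them. A minimal sketch: rm_commands is a hypothetical helper that expects one DATASET.TABLE per line on stdin (for example, the output of the listing command above) and only prints commands; shards.txt in the usage lines is a placeholder file name.

```shell
# Hypothetical helper: turns DATASET.TABLE names on stdin into
# `bq rm -f -t` commands. Nothing is executed; the commands are printed.
rm_commands() {
  local name
  while IFS= read -r name; do
    [ -n "$name" ] || continue          # skip blank lines
    printf 'bq rm -f -t %q\n' "$name"   # %q quotes the name safely for bash
  done
}

# Usage:
# rm_commands < shards.txt          # review the commands
# rm_commands < shards.txt | bash   # then actually run them
```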
The process for deleting a time-partitioned table and all the partitions in it is the same as the process for deleting a standard table.
So if you delete a partitioned table without specifying a partition, all of its partitions are deleted. You don't have to delete them one by one.
DROP TABLE <tablename>
You can also delete tables programmatically (e.g., in Java).
Use the DeleteTable.java sample and change its flow to iterate over a list of all the tables and partitions to be deleted.
If you need to delete only specific partitions, you can refer to a single table partition (e.g., a daily partition) with the "$" partition decorator:
String mainTable = "TABLE_NAME"; // i.e. my_table
String partitionId = "YYYYMMDD"; // i.e. 20200430
String decorator = "$";
String tableName = mainTable+decorator+partitionId;
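The same decorator also works from the command line, but there the "$" must be kept away from shell expansion, for example with single quotes: bq rm -f -t 'mydataset.my_table$20200430'. A small sketch with partition_ref as a hypothetical helper and mydataset.my_table / 20200430 as placeholder values:

```shell
# Hypothetical helper: builds TABLE$PARTITION_ID. The printf format is
# single-quoted, so the shell keeps the "$" literal instead of expanding it.
partition_ref() {
  local table="$1" partition_id="$2"
  printf '%s$%s\n' "$table" "$partition_id"
}

# Usage (the output of $(...) is not expanded again, so "$2020..." survives):
# bq rm -f -t "$(partition_ref mydataset.my_table 20200430)"
```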
Follow the guide for running the Java BigQuery samples, and make sure to set your project in Cloud Shell:
gcloud config set project <project-id>
I'm using the Google SDK bq command. How do I change the name of a table? I'm not seeing this at https://cloud.google.com/bigquery/bq-command-line-tool
You have to copy to a new table and delete the original.
$ bq cp dataset.old_table dataset.new_table
$ bq rm -f -t dataset.old_table
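The two steps can be chained so that the original is only removed when the copy succeeded. A minimal sketch, assuming bq is installed and authenticated; rename_table is a hypothetical helper name.

```shell
# Hypothetical helper: "renames" a table by copying it, then deleting the
# original. The && means the rm runs only if the copy exited successfully.
rename_table() {
  local src="$1" dst="$2"
  bq cp "$src" "$dst" && bq rm -f -t "$src"
}

# Usage:
# rename_table dataset.old_table dataset.new_table
```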
I don't think there is a way to just rename a table.
What you can do is COPY the table to a new table with the desired name (copying is free of charge) and then delete the original table.
The only drawback I see is that if you have long-term storage data, I think you will lose the long-term storage discount (50%) for that data.
I have a dataset with around 200k tables that I'm trying to delete. I've been using the commandline tool to run bq rm -r -f datasetID, but it has only deleted about 4% in 24 hours. (I can only guess at the amount by logging into the web UI and seeing what tables are left). Is there a faster way to get that done?
Quite late, but here is how I did it:
Install jq and GNU parallel first. Substitute PROJECT_ID with your project's ID.
bq ls --project_id PROJECT_ID --max_results=100000 --format=prettyjson | jq '.[] | .id' | parallel --bar -P 10 bq --project_id PROJECT_ID rm -r -f -d
You might need to tune the value of the -P parameter for a better deletion rate.
Warning: this deletes all the tables and datasets in your project. You can perform a dry run with echo, analyze the output, and only then run the command above:
bq ls --project_id PROJECT_ID --max_results=100000 --format=prettyjson | jq '.[] | .id' | parallel --bar -P 10 echo bq --project_id PROJECT_ID rm -r -f -d
Deleted 100K tables across 9K datasets in 15 minutes.
One way to do so would be to iterate through the tables and delete them individually (possibly in parallel). Or an even faster way could be to set an expiration time on the tables that is only a very short time in the future.
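The answer does not give a command for the expiration approach. One way to set a per-table expiration from the CLI is bq update --expiration, where the value is the table's lifetime in seconds from now. A minimal sketch, assuming one DATASET.TABLE per line on stdin; expire_tables is a hypothetical helper, the one-hour default is an assumption chosen to stay within the minimum bq accepts, and this still makes one API call per table (BigQuery then removes the expired tables itself).

```shell
# Hypothetical helper: sets a short expiration (in seconds, default 3600)
# on every DATASET.TABLE read from stdin.
expire_tables() {
  local seconds="${1:-3600}" name
  while IFS= read -r name; do
    [ -n "$name" ] || continue                 # skip blank lines
    bq update --expiration "$seconds" "$name"  # lifetime counted from now
  done
}

# Usage:
# bq ls --max_results=100000 mydataset | awk '$2 == "TABLE" { print "mydataset." $1 }' | expire_tables 3600
```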
This is not a highly optimized path since we don't often get users who want to delete that many tables at once.