We deleted some datasets and now we want them back (they were deleted less than 24 hours ago). I checked the documentation and followed these steps:
1. Recreate the dataset with the same name
2. bq cp mydataset.mytable#-3600000 mydataset.newtable
I also recreated the table schema in the new dataset and then tried the above, but it gives me an error saying the dataset is not present in my region.
I also tried a SELECT:
SELECT * FROM `mydataset.mytable` FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 48 HOUR);
But no luck.
Can anyone help here? What am I missing?
Is there a time limit on this kind of restoration when a whole dataset was deleted? (For a table we can go back 7 days.)
Note: for deletion we followed these steps:
1. DROP command to drop each table
2. once all tables were dropped, we deleted the dataset
Deleting a dataset is permanent and cannot be undone. After you delete
a dataset, it cannot be recovered or restored.
BigQuery allows you to recover deleted tables. If you are within the dataset's time travel window, you can restore tables; see the documentation below.
You can restore the tables from a deleted dataset after recreating it with the same name. However, you have to know the table names.
# 1) Delete the dataset
bq rm -r -f -d TEST_DATASET
# 2) Create a new dataset for restore
bq mk -d --location=us-east4 TEST_DATASET_RESTORED
# 3) Recreate the origin dataset with the same name
bq mk -d --location=us-east4 aw-sndbx-dataplatform-01:TEST_DATASET
# 4) Restore the table with the snapshot decorator (#0 references the oldest available snapshot)
bq cp TEST_DATASET.test_table#0 TEST_DATASET_RESTORED.test_table
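If you then want the table back under its original dataset name, a plain bq cp with no decorator should do it (a small optional follow-up, using the same names as above):

# 5) (Optional) Copy the restored table back into the recreated dataset
bq cp TEST_DATASET_RESTORED.test_table TEST_DATASET.test_table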
Related
It seems that the BigQuery CLI supports restoring tables in a dataset after they have been deleted by using BigQuery Time Travel functionality -- as in:
bq cp dataset.table#TIME_AGO_UNIX dataset.table
However, this assumes we know the names of the tables. I want to write a script to iterate over all the tables that were in the dataset at TIME_AGO_UNIX time.
How would I go about finding those tables at that time?
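As far as I know, bq cannot list the tables of a dataset as of a past time, so a script needs the table names from an external record (for example, Cloud Audit Logs or a saved bq ls listing). A minimal sketch under that assumption; the file tables.txt, the dataset names, and the timestamp are all placeholders:

#!/bin/sh
# Restore each table named in tables.txt (one name per line)
# as of TIME_AGO_UNIX (milliseconds since the epoch).
TIME_AGO_UNIX=1418864998000
while read -r table; do
  bq cp "mydataset.${table}#${TIME_AGO_UNIX}" "mydataset_restored.${table}"
done < tables.txt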
We have a BigQuery dataset with a long list of tables (with data) in it. I am taking over a data pipeline and want to familiarize myself with it by running tests, so I want to duplicate the dataset and its tables without copying and then truncating the data. Essentially, I want to recreate those tables in a test dataset using only their schema. How can this be done with the bq client?
You have a couple of options, considering you want to copy only the schema and not the data:
1. Extract the schema of each table and then create a new, empty one from it:
$ bq show --schema --format=prettyjson [PROJECT_ID]:[DATASET].[TABLE] > [SCHEMA_FILE]
$ bq mk --table [PROJECT_ID]:[NEW_DATASET].[TABLE] [SCHEMA_FILE]
2. Run a query with LIMIT 0 and set a destination table:
bq query "SELECT * FROM [DATASET].[TABLE] LIMIT 0" --destination_table [NEW_DATASET].[TABLE]
Is it possible to query a recently expired view in BigQuery and save a snapshot? (It expired 2 hours ago.)
You can try the Managing tables documentation. It has examples of how to do that in the section Restoring deleted tables.
You can undelete a table within seven days of deletion, including explicit deletions and implicit deletions due to table expiration. After seven days, it is not possible to undelete a table using any method, including opening a support ticket.
You can restore a deleted table by:
Using the # snapshot decorator in the bq command-line tool
Using the client libraries
To restore a table, use a table copy operation with the # snapshot decorator. First, determine a UNIX timestamp of when the table existed (in milliseconds). Then, use the bq copy command with the snapshot decorator.
For example, enter the following command to copy mydataset.mytable at the time 1418864998000 into a new table mydataset.newtable.
bq cp mydataset.mytable#1418864998000 mydataset.newtable
(Optional) Supply the --location flag and set the value to your location.
You can also specify a relative offset. The following example copies the version of a table from one hour ago:
bq cp mydataset.mytable#-3600000 mydataset.newtable
For more information, see Restore a table from a point in time.
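If you would rather compute the millisecond timestamp than hard-code one, a small shell sketch (plain arithmetic around date +%s), here for two hours ago:

# UNIX timestamp in milliseconds for two hours ago
SNAPSHOT_MS=$(( ($(date +%s) - 2 * 3600) * 1000 ))
bq cp "mydataset.mytable#$SNAPSHOT_MS" mydataset.newtable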
I have a dataset consisting of ~600 tables. I want to create a new dataset with the same name and no tables using the CLI.
At the moment I'm iterating through all the tables and dropping them with bq rm one by one, but it takes ~20 minutes. Can I simply drop the dataset without removing the tables first?
Use the -r flag. For example:
bq rm -rf dataset_name
The -f flag means "force", so the command won't prompt for confirmation.
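The same works with a fully qualified dataset; the -d flag makes it explicit that the target is a dataset (the project id below is a placeholder):

bq rm -r -f -d my-project:dataset_name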
You can also simply use the contextual menu in the BigQuery UI to delete a dataset.
When you try to delete a dataset that still has tables in it, the UI shows a confirmation form.
Just type your dataset name to confirm the deletion, and you are done!
I have a legacy unpartitioned BigQuery table that streams logs from various sources (let's say table BigOldA). The aim is to transfer the data to a new day-partitioned table (let's say PartByDay), which I did with the help of the following link:
https://cloud.google.com/bigquery/docs/creating-column-partitions#creating_a_partitioned_table_from_a_query_result
bq query \
  --allow_large_results \
  --replace=true \
  --destination_table <project>:<data-set>.<PartByDay> \
  --time_partitioning_field REQUEST_DATETIME \
  --use_legacy_sql=false 'SELECT * FROM `<project>.<data-set>.<BigOldA>`'
I have migrated the historical data to the new table, but I cannot delete those rows from table BigOldA because I run into the known limitation that DML statements are not yet supported on tables with an active streaming buffer:
Error: UPDATE or DELETE DML statements are not supported over
table <project>:<data-set>.BigOldA with streaming buffer
I was planning to run batch jobs every day, transferring T-1 data from table BigOldA to table PartByDay and deleting it periodically, so that I could keep streaming into BigOldA while using PartByDay for analytics. Now I am not sure this is achievable.
I am looking for an alternative solution or best practice for periodically transferring and maintaining a streaming-buffer table alongside a partitioned table. As the data streams from independent production sources, it is not possible to repoint all sources to PartByDay, and the streamingBuffer property from tables.get is never null.
You could just delete the original table and then rename the migrated table to the original name after you've run your history job. This assumes your streaming component to BigQuery is fault tolerant: if it's designed well, you shouldn't lose any data, because whatever is streaming to BigQuery should be able to buffer events until the table comes back online. Once the table is partitioned, nothing should change for the components that stream into it.
If anyone is interested in the script, here you go.
#!/bin/sh
# This script
# 1. copies the data into a partitioned table
# 2. deletes the unpartitioned table
# 3. copies the partitioned table back to the original table name
# TODO 4. deletes the intermediate copied table
set -e
source_project="<source-project>"
source_dataset="<source-dataset>"
source_table="<source-table-to-partition>"
destination_project="<destination-project>"
destination_dataset="<destination-dataset>"
partition_field="<timestamp-partition-field>"
destination_table="<table-copy-partition>"
source_path="$source_project.$source_dataset.$source_table"
source_l_path="$source_project:$source_dataset.$source_table"
destination_path="$destination_project:$destination_dataset.$destination_table"
echo "copying table from $source_path to $destination_path"
query=$(cat <<-END
SELECT * FROM \`$source_path\`
END
)
echo "deleting old table"
bq rm -f -t "$destination_path"
echo "running the query: $query"
bq query --quiet=true --use_legacy_sql=false --apilog=stderr \
  --allow_large_results --replace=true \
  --destination_table "$destination_path" \
  --time_partitioning_field "$partition_field" \
  "$query"
echo "removing the original table: $source_path"
bq rm -f -t "$source_l_path"
echo "table deleted"
echo "copying the partition table to the original source path"
bq cp -f -n "$destination_path" "$source_l_path"
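For the TODO step 4, removing the intermediate partitioned copy is the same bq rm call used earlier in the script; run it only after you have verified the final table:

# 4) delete the intermediate copied table once verified
bq rm -f -t "$destination_path"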