Get table names in dataset after dataset truncate - google-bigquery

It seems that the BigQuery CLI supports restoring tables in a dataset after they have been deleted by using BigQuery Time Travel functionality -- as in:
bq cp dataset.table#TIME_AGO_UNIX dataset.table
However, this assumes we know the names of the tables. I want to write a script to iterate over all the tables that were in the dataset at TIME_AGO_UNIX time.
How would I go about finding those tables at that time?

Related

Duplicate several tables in bigquery project at once

In our BQ export schema, we have one table for each day as per the screenshot below.
I want to copy the tables before a certain date (2021-feb-07). I know how to copy one day at a time via the UI, but is there not a way to use the cloud console to write a code for copying the selected date range, all at once? Or maybe an sql command directly from a query window?
I think you should transform your sharding tables into a partitioned table. So you can handled your tables with just a single query. As mention in the official documentation, partitioned tables perform better.
To make the conversion, you can just execute the following commands in the console.
bq partition \
--time_partitioning_type=DAY \
--time_partitioning_expiration 259200 \
mydataset.sourcetable_ \
mydataset.mytable_partitioned
This will make your sharded tables sourcetable_(xxx) into a single partitioned table mytable_partitioned which can be query with just a single query trough your entire set of data entries.
SELECT
*
FROM
`myprojectid.mydataset.mytable_partitioned`
WHERE
_PARTITIONTIME BETWEEN TIMESTAMP('2022-01-01') AND TIMESTAMP('2022-01-03')
For more details about the conversion commands you can check this link. Also, I recommend to check the links about querying partionated tables and partiotioned tables for more details.

How to ETL from BigQuery to BigQuery?

I have an external data source that will load data into several tables, in batch mode, several times a day. However, the tables need cleanings, transferring, joining, etc...
Once raw tables are transformed they are visualized on data visualization software (PowerBI or Data Studio).
Now the question is how to update the result tables every time new data is loaded into raw tables?.
In BigQuery creating a table from other tables then when updating the source tables they don't update the created tables, right?
Another solution is to use something like cloud composer to query source tables and load data into destination tables.

Write Bigquery results to Bigquery in Datalab

I ran a query abc and got the result table m in Datalab.
Is there any way I can create a new table in Bigquery and write the content of table m to it in Datalab?
%%bq query --name abc
select *
from `test`
m=abc.execute().result()
Yes you can, Datalab supports a number of BigQuery magic commands, among them is create, which allows for the creation of datasets and tables.
See the documentation here:
http://googledatalab.github.io/pydatalab/datalab.magics.html
And specifically this section, which details all the available BigQuery commands:
positional arguments:
{sample,create,delete,dryrun,udf,execute,pipeline,table,schema,datasets,tables,extract,load}
commands
sample Display a sample of the results of a BigQuery SQL
query. The cell can optionally contain arguments for
expanding variables in the query, if -q/--query was
used, or it can contain SQL for a query.
create Create a dataset or table.
delete Delete a dataset or table.
dryrun Execute a dry run of a BigQuery query and display
approximate usage statistics
udf Create a named Javascript BigQuery UDF
execute Execute a BigQuery SQL query and optionally send the
results to a named table. The cell can optionally
contain arguments for expanding variables in the
query.
pipeline Define a deployable pipeline based on a BigQuery
query. The cell can optionally contain arguments for
expanding variables in the query.
table View a BigQuery table.
schema View a BigQuery table or view schema.
datasets List the datasets in a BigQuery project.
tables List the tables in a BigQuery project or dataset.
extract Extract BigQuery query results or table to GCS.
load Load data from GCS into a BigQuery table.

Finding Bigquery table size across all projects

We are maintaining a table in Bigquery that captures all the activity logs from the Stack driver logs. This table helps me list all the tables present, User, who created the table, what was the last command run on the table etc across projects and data sets in our organization. Along with this information, I also want the table size for the tables I am trying to check.
I can Join with the TABLES and TABLE_SUMMARY however I need to explicitly specify the project and dataset I want to query, but my driving table has details of multiple projects and Datasets.
Is there any other metadata table I can get the table size from, or any logs that I can load into a Bigquery table to join and get the desired results
You can use the bq command line tool. With the command:
bq show --format=prettyjson
This provides the numBytes, datasetId, projectId and more.
With a script you can use:
bq ls
and loop through the datasets and tables in each projects to get the information needed. Keep in mind that you can also use API or a client library.

Create Partition table in Big Query

Can anyone please suggest how to create partition table in Big Query ?.
Example: Suppose I have one log data in google storage for the year of 2016. I stored all data in one bucket partitioned by year , month and date wise. Here I want create table with partitioned by date.
Thanks in Advance
Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of a table. For example, to load data for May 1st, 2016 -- you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables
There are two options:
Option 1
You can load each daily file into separate respective table with name as YourLogs_YYYYMMDD
See details on how to Load Data from Cloud Storage
After tables created, you can access them either using Table wildcard functions (Legacy SQL) or using Wildcard Table (Standar SQL). See also Querying Multiple Tables Using a Wildcard Table for more examples
Option 2
You can create Date-Partitioned Table (just one table - YourLogs) - but you still will need to load each daily file into respective partition - see Creating and Updating Date-Partitioned Tables
After table is loaded you can easily Query Date-Partitioned Tables
Having partitions for an External Table is not allowed as for now. There is a Feature Request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.