I need to replicate tables from Prod to Test within BigQuery. Apart from BQ export/import, are there any replication utilities or tools within BigQuery?
Thanks.
To copy a table in BigQuery you can use several methods:
The bq command-line tool (bq cp)
Copy an entire dataset with the BigQuery Data Transfer Service (dataset copy)
Copy using the BigQuery UI
You can also query the table and write its results to a new table
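For the bq route with several tables, here is a minimal sketch that only builds the `bq cp` command lines (it does not run them). The project names `prod-project` and `test-project`, the dataset `sales`, and the table list are hypothetical placeholders, and the destination dataset is assumed to already exist in the Test project. The `-f` flag tells `bq cp` to overwrite an existing destination table without prompting.

```python
import shlex


def build_copy_commands(src_project, dst_project, dataset, tables, overwrite=True):
    """Return one `bq cp` argv list per table, copying src_project -> dst_project."""
    commands = []
    for table in tables:
        cmd = ["bq", "cp"]
        if overwrite:
            cmd.append("-f")  # overwrite the destination table without prompting
        cmd.append(f"{src_project}:{dataset}.{table}")  # source table
        cmd.append(f"{dst_project}:{dataset}.{table}")  # destination table
        commands.append(cmd)
    return commands


if __name__ == "__main__":
    # Hypothetical project, dataset and table names.
    for cmd in build_copy_commands("prod-project", "test-project", "sales", ["orders", "customers"]):
        print(shlex.join(cmd))
        # To actually run it once reviewed: subprocess.run(cmd, check=True)
```

Printing the commands first makes it easy to review a Prod-to-Test refresh before executing it.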
You can try these options:
BigQuery Data Transfer Service:
https://cloud.google.com/bigquery-transfer/docs/working-with-transfers
Copy Tables:
bq cp source-project:dataset.table target-project:dataset.table
CREATE TABLE AS SELECT (CTAS):
CREATE TABLE `target-project.dataset.table` AS SELECT * FROM `source-project.dataset.table`
BigQuery API Client Libraries:
https://cloud.google.com/bigquery/docs/reference/libraries
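The CTAS option can be repeated per table. The sketch below only generates the SQL text; the project, dataset and table names are hypothetical, and it uses `CREATE OR REPLACE TABLE` so that re-running the refresh replaces the Test copy. Note that a plain CTAS like this does not carry over partitioning or clustering unless you add `PARTITION BY` / `CLUSTER BY` clauses yourself.

```python
import re

# Conservative check so that arbitrary text cannot be spliced into the SQL.
_NAME = re.compile(r"^[A-Za-z0-9_-]+$")


def build_ctas(src_project, dst_project, dataset, table):
    """Return a CREATE OR REPLACE TABLE ... AS SELECT statement copying one table."""
    for part in (src_project, dst_project, dataset, table):
        if not _NAME.match(part):
            raise ValueError(f"unexpected identifier: {part!r}")
    return (
        f"CREATE OR REPLACE TABLE `{dst_project}.{dataset}.{table}` AS\n"
        f"SELECT * FROM `{src_project}.{dataset}.{table}`;"
    )


if __name__ == "__main__":
    # Hypothetical names; run the output in the BigQuery console
    # or with `bq query --use_legacy_sql=false`.
    for t in ["orders", "customers"]:
        print(build_ctas("prod-project", "test-project", "sales", t))
```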
I have linked Firebase events to BigQuery and my goal is to pull the events into S3 from BigQuery using AWS Glue.
When you link Firebase to BigQuery, it creates a default dataset and one date-sharded table per day, something like this:
analytics_456985675.events_20230101
analytics_456985675.events_20230102
I'm used to querying the events in BigQuery using a wildcard table:
SELECT
...
FROM `analytics_456985675.events_*`
WHERE date >= [date]
However, when configuring the Glue ETL job, it refuses to accept the table name analytics_456985675.events_*, and I get this error message:
It seems the Glue job only works when I specify a single table.
How can I create a Glue ETL job that pulls data from BigQuery incrementally if I have to specify a single table (one daily shard)?
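If the Glue job only accepts a single table, one possible workaround is to enumerate the daily shards yourself and read them one at a time, for example only the shards after the last date already loaded. The helper below is a sketch that just computes the shard names for a date range from the `events_YYYYMMDD` pattern shown above; how each name is passed to the Glue BigQuery connector is not shown and depends on your job configuration.

```python
from datetime import date, timedelta


def daily_shards(dataset, prefix, start, end):
    """Return names like dataset.events_20230101 for every day in [start, end]."""
    if end < start:
        return []
    days = (end - start).days
    return [
        f"{dataset}.{prefix}{(start + timedelta(days=i)).strftime('%Y%m%d')}"
        for i in range(days + 1)
    ]


if __name__ == "__main__":
    # Incremental idea: start = (last loaded day + 1), end = yesterday.
    for name in daily_shards("analytics_456985675", "events_", date(2023, 1, 1), date(2023, 1, 3)):
        print(name)
```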
I'd like to export data from BigQuery to Google Cloud Storage using a script. For multiple tables, I want to loop over them, save each one in CSV format, and overwrite any existing file.
Also, how can we schedule this script?
Any help would be appreciated.
Thanks in advance.
A common way to approach this problem is to use Airflow and write a DAG that meets your requirements.
But if you want to iterate over tables and dump them to GCS on a regular basis using only BigQuery, the following could be another option.
1. Export Data
You can export data to GCS with the EXPORT DATA statement in a BigQuery script.
EXPORT DATA OPTIONS(
uri='gs://bucket/folder/*.csv',
format='CSV',
overwrite=true,
header=true,
field_delimiter=';') AS
SELECT field1, field2 FROM mydataset.table1 ORDER BY field1 LIMIT 10
https://cloud.google.com/bigquery/docs/reference/standard-sql/other-statements
2. Loops and Dynamic SQL
If you have a list of tables you want to dump, you can loop over them with a BigQuery FOR loop.
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#loops
You also need to generate the EXPORT DATA statement dynamically for each table. To do so, you can use EXECUTE IMMEDIATE (dynamic SQL).
https://cloud.google.com/bigquery/docs/reference/standard-sql/procedural-language#execute_immediate
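Combining the EXPORT DATA statement with a FOR loop and EXECUTE IMMEDIATE, a BigQuery script could look like the text produced by the Python helper below. This is a sketch: `mydataset`, `gs://bucket/folder` and the table names are placeholders, and each table is written to its own folder so that one table's export does not collide with another's.

```python
import re

_TABLE = re.compile(r"^[A-Za-z0-9_]+$")  # reject names that could break the SQL


def build_export_script(dataset, tables, gcs_prefix):
    """Assemble a BigQuery script exporting each table to its own GCS folder as CSV."""
    for t in tables:
        if not _TABLE.match(t):
            raise ValueError(f"unexpected table name: {t!r}")
    table_list = ", ".join(f"'{t}'" for t in tables)
    # FOR ... IN iterates over the rows of the subquery; t.table_name is the current table.
    # FORMAT fills both %s placeholders with the table name before EXECUTE IMMEDIATE runs it.
    return f'''FOR t IN (SELECT table_name FROM UNNEST([{table_list}]) AS table_name)
DO
  EXECUTE IMMEDIATE FORMAT("""
    EXPORT DATA OPTIONS(
      uri='{gcs_prefix}/%s/*.csv',
      format='CSV',
      overwrite=true,
      header=true,
      field_delimiter=';') AS
    SELECT * FROM {dataset}.%s
  """, t.table_name, t.table_name);
END FOR;'''


if __name__ == "__main__":
    print(build_export_script("mydataset", ["table1", "table2"], "gs://bucket/folder"))
```

The printed script can be run in the BigQuery console or used as the body of a scheduled query.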
3. Scheduling
BigQuery can run a query on a schedule (scheduled queries), and you can use that feature for your purpose.
https://cloud.google.com/bigquery/docs/scheduling-queries#set_up_scheduled_queries
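To create the schedule from the command line instead of the console, a sketch is to build a `bq mk --transfer_config` call with `--data_source=scheduled_query`. The display name and the `every 24 hours` schedule are placeholders, and no destination table is set on the assumption that the script writes to GCS itself; verify the flags against the scheduled-queries documentation for your `bq` version.

```python
import json
import shlex


def build_schedule_command(script_sql, display_name, schedule="every 24 hours"):
    """Return the argv for a scheduled query that runs `script_sql` (no destination table)."""
    params = json.dumps({"query": script_sql})  # the query text is passed inside --params
    return [
        "bq", "mk",
        "--transfer_config",
        "--data_source=scheduled_query",
        f"--display_name={display_name}",
        f"--schedule={schedule}",
        f"--params={params}",
    ]


if __name__ == "__main__":
    cmd = build_schedule_command("SELECT 1", "Nightly CSV export")
    print(shlex.join(cmd))  # run with subprocess.run(cmd, check=True) once reviewed
```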
We are using a Google Dataproc cluster and the spark-sql shell.
We are able to create a table as follows:
CREATE TABLE table_bq
USING bigquery
OPTIONS (
project 'project',
dataset 'dataset',
table 'bq_table'
);
This connects to BigQuery for all query purposes; however, when we try to run
INSERT OVERWRITE TABLE table_bq SELECT ....;
It fails with the error:
) does not allow insertion.;;
Any pointers on how we can load data into BigQuery from spark-sql?
Note: I have seen examples of writing data to BigQuery from Spark with a DataFrame; my question is whether there is any way to do it with spark-sql.
I am creating an Airflow pipeline where I use BigQueryOperator to query my BigQuery tables and BigQueryToCloudStorageOperator to export the result table to GCS as CSV.
I need to move the CSV into a MySQL database, where it should be stored as a table.
Could I get any advice or ideas on how to implement this? Thanks!
Since your use case is to query data in BigQuery and store it in your MySQL database, you can use BigQueryToMySqlOperator:
Fetches the data from a BigQuery table (alternatively fetch data for
selected columns) and insert that data into a MySQL table.
I ran a query, abc, and got the result table m in Datalab.
Is there any way I can create a new table in BigQuery and write the contents of table m to it from Datalab?
%%bq query --name abc
select *
from `test`
m=abc.execute().result()
Yes, you can. Datalab supports a number of BigQuery magic commands, among them create, which creates datasets and tables.
See the documentation here:
http://googledatalab.github.io/pydatalab/datalab.magics.html
In particular, this section lists all the available BigQuery commands:
positional arguments:
{sample,create,delete,dryrun,udf,execute,pipeline,table,schema,datasets,tables,extract,load}
commands
sample Display a sample of the results of a BigQuery SQL
query. The cell can optionally contain arguments for
expanding variables in the query, if -q/--query was
used, or it can contain SQL for a query.
create Create a dataset or table.
delete Delete a dataset or table.
dryrun Execute a dry run of a BigQuery query and display
approximate usage statistics
udf Create a named Javascript BigQuery UDF
execute Execute a BigQuery SQL query and optionally send the
results to a named table. The cell can optionally
contain arguments for expanding variables in the
query.
pipeline Define a deployable pipeline based on a BigQuery
query. The cell can optionally contain arguments for
expanding variables in the query.
table View a BigQuery table.
schema View a BigQuery table or view schema.
datasets List the datasets in a BigQuery project.
tables List the tables in a BigQuery project or dataset.
extract Extract BigQuery query results or table to GCS.
load Load data from GCS into a BigQuery table.