Ingesting partitioned BigQuery table in Azure Synapse

I am trying to ingest a BigQuery table (the Google Analytics export) into Azure Synapse, but the table is partitioned and a new date-suffixed table is added every day. The table names are like this:
analytics_36523423.events_20230207
analytics_36523423.events_20230208
analytics_36523423.events_20230209
How can I set it up to automatically copy data from new tables?
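For context, on the BigQuery side these daily export tables are usually addressed with a table wildcard and the _TABLE_SUFFIX pseudo-column. A minimal sketch, assuming the Synapse copy activity is configured with a custom BigQuery query rather than a fixed table name (the dataset name is taken from the question; the date filter is illustrative):

-- Illustrative query: read only yesterday's daily GA export shard via the wildcard.
SELECT *
FROM `analytics_36523423.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY))

The same _TABLE_SUFFIX filter could be parameterised by the pipeline's run date instead of CURRENT_DATE().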

Related

How to query date-partitioned Google BigQuery table using AWS Glue BigQuery Connector?

I have linked Firebase events to BigQuery, and my goal is to pull the events from BigQuery into S3 using AWS Glue.
When you link Firebase to BigQuery, it creates a default dataset and date-partitioned tables, something like this:
analytics_456985675.events_20230101
analytics_456985675.events_20230102
I'm used to querying the events in BigQuery using
SELECT
...
FROM analytics_456985675.events_*
WHERE date >= [date]
However, when I configure the Glue ETL job with the table name analytics_456985675.events_*, it refuses to work with this format and returns an error; it seems the Glue job will only work when I specify a single table.
How can I create a Glue ETL job that pulls data from BigQuery incrementally if I have to specify a single partition table?
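For comparison, the wildcard form that works inside BigQuery itself restricts the shards with the _TABLE_SUFFIX pseudo-column rather than a date column. A minimal sketch (the column names are assumed GA4 export fields and the 20230101 cut-off is only an illustrative value):

-- Illustrative BigQuery query: read all shards on or after a given date.
SELECT event_date, event_name, user_pseudo_id
FROM `analytics_456985675.events_*`
WHERE _TABLE_SUFFIX >= '20230101'

Whether the Glue BigQuery connector can pass such a query through instead of a single table name is exactly what the question is asking.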

How to create a column name containing spaces in a GCP BigQuery table?

I am migrating MSSQL data into BigQuery using the Dataflow JDBC to BigQuery template.
I am creating a table with the same schema in BigQuery and then running the Dataflow pipeline.
But some tables in MSSQL have column names that contain spaces (e.g. Employee Details). How can I create the same space-containing columns in BigQuery?
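If BigQuery's flexible column names feature is available in the project (an assumption; otherwise column names are limited to letters, numbers and underscores), a space-containing name can be written in GoogleSQL DDL by quoting it with backticks. A minimal sketch with made-up dataset and table names:

-- Hypothetical DDL: column names containing spaces, quoted with backticks.
CREATE TABLE my_dataset.employee (
  `Employee Details` STRING,
  `Employee Id` INT64
);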

Add a new column to an existing table in Delta Lake (Gen2 blob storage)

Curious to know: can we add a new column to an existing Delta Lake table stored in Gen2 blob storage? Based on the business use case, I will need to add 3 additional columns to one of the tables in the Delta lake.
ALTER TABLE didn't work for me.
Any help would be appreciated.
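For reference, the Spark SQL syntax Delta Lake supports for this is ALTER TABLE ... ADD COLUMNS. A minimal sketch with made-up table and column names:

-- Hypothetical: add three new nullable columns to an existing Delta table.
ALTER TABLE my_delta_table
ADD COLUMNS (region STRING, load_date DATE, is_active BOOLEAN);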

Update changes in Azure SQL Data Warehouse using polybase

I need help with Azure SQL Data Warehouse. I'm using PolyBase to ELT data from Azure Data Lake Storage Gen2 into Azure SQL DW. Loading data into the DW for the first time is no problem, but how do we upsert the data on subsequent/incremental loads?
The flow we are using:
ADLS Gen2 -> (PolyBase) -> external table -> (CTAS) -> staging tables -> (transformation) -> dimension tables
Every time the data changes we reload it into ADLS Gen2.
What is the best way to upsert the data, or should we also reload the data into SQL DW?
Because MERGE is not supported in Azure SQL Data Warehouse, you need to use other means to load data from the external tables into your staging tables. PolyBase can expose both the initial and the incremental data through the external table schema; the difference lies in how you perform the load from there into the staging tables.
The following is a great tutorial on how to deploy this solution: Using PolyBase to Update Tables in Data Warehouse from ADLS
Once the data is loaded to the external tables via PolyBase in an ADFv2 pipeline, a trigger executes a stored procedure in the DW that performs the load into the staging tables.
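As an illustration of what such a stored procedure might contain, one common MERGE-free pattern in a dedicated SQL pool is a CTAS-based upsert: rebuild the target from the staging rows plus the unchanged rows, then swap the tables with RENAME OBJECT. All table and column names below are made up, and this is only a sketch of one possible approach:

-- Hypothetical CTAS-based upsert into a dimension table, without MERGE.
CREATE TABLE dbo.DimCustomer_upsert
WITH (DISTRIBUTION = HASH(customer_id))
AS
SELECT s.customer_id, s.customer_name          -- new and changed rows from staging
FROM dbo.StageCustomer AS s
UNION ALL
SELECT d.customer_id, d.customer_name          -- existing rows not present in staging
FROM dbo.DimCustomer AS d
WHERE NOT EXISTS (
    SELECT 1 FROM dbo.StageCustomer AS s
    WHERE s.customer_id = d.customer_id
);

RENAME OBJECT dbo.DimCustomer TO DimCustomer_old;
RENAME OBJECT dbo.DimCustomer_upsert TO DimCustomer;
DROP TABLE dbo.DimCustomer_old;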

BigQuery: change date partitioned table to ingestion time partitioned table

I have an BigQuery date partitioned table that I want to convert to an ingestion time partitioned table (partitioned on _PARTITIONTIME), using the current date partitioning to feed into _PARTITIONTIME. How can I do this?
WHY? Because only ingestion-time partitioned tables can be incrementally loaded using BigQuery's scheduled query functionality (by using the run_date parameter as a partition decorator).
One option is to disable the scheduled query first, copy the column-based partitioned table to an ingestion-time partitioned table, and then re-enable the scheduled query. Please follow these steps:
Disable the scheduled query through the BigQuery UI (the disable option on the scheduled query).
Create a new ingestion-time partitioned table (called ingestion_time_partitioned) and copy the column-based partitioned table (called table_column_partitioned) into it; a DDL sketch for the new table is shown at the end of this answer.
Edit the scheduled query to write to the new ingestion-time partitioned table (ingestion_time_partitioned). Remember to re-enable the scheduled query and to remove the partition field (which was used for column-based partitioning).
Copying from a column-based partitioned table to an ingestion-time partitioned table will correctly map the column-based partitions to the ingestion-time partitions, and copy jobs in BigQuery are free. For more information about copying partitioned tables, see https://cloud.google.com/bigquery/docs/managing-partitioned-tables#copying_partitioned_tables
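For the table-creation step above, a minimal DDL sketch for the new ingestion-time partitioned table (the schema is made up; the actual copy is then done with a regular copy job, for example from the console or with bq cp):

-- Hypothetical DDL: an ingestion-time partitioned table (partitioned on _PARTITIONTIME).
CREATE TABLE my_dataset.ingestion_time_partitioned (
  event_name STRING,
  event_value INT64
)
PARTITION BY _PARTITIONDATE;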