I am trying to move GA4 data for two websites from BigQuery into a Snowflake table using Matillion ETL. The BigQuery tables are named in the events_YYYYMMDD format. The query I am using in the Matillion BigQuery orchestration job is below:
select * from events_* WHERE _table_suffix = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 2 day));
I have to run this job multiple times a day, since the time at which GA4 data becomes available in BigQuery is unpredictable. I will also have multiple websites whose data arrives in the same BigQuery account at uneven times, and I need to capture all of this data in Snowflake.
But running this job multiple times results in duplicate records in the Snowflake table. How can I ensure only unique records are moved from BigQuery to Snowflake?
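One common pattern (a sketch, not specific to Matillion) is to land each run into a staging table in Snowflake and then MERGE it into the target, inserting only rows that are not already present. The table names (ga4.events_staging, ga4.events) and the choice of event_timestamp, user_pseudo_id and event_name as the matching key below are assumptions to adapt, since GA4 rows are not guaranteed unique on any small column set:

-- Sketch only: assumed table names, assumed de-duplication key, trimmed column list.
MERGE INTO ga4.events AS tgt
USING ga4.events_staging AS src
  ON  tgt.event_timestamp = src.event_timestamp
  AND tgt.user_pseudo_id  = src.user_pseudo_id
  AND tgt.event_name      = src.event_name
WHEN NOT MATCHED THEN
  INSERT (event_date, event_timestamp, event_name, user_pseudo_id)
  VALUES (src.event_date, src.event_timestamp, src.event_name, src.user_pseudo_id);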
Related
The View logs routed to BigQuery document says:
When creating a sink to route your logs to BigQuery, you can use
either date-sharded tables or partitioned tables. The default
selection is a date-sharded table.
The Introduction to partitioned tables document says:
You cannot use legacy SQL to query partitioned tables or to write
query results to partitioned tables.
While there is a Query partitioned tables document detailing the available methods, this all seems like a lot of rigmarole for a simple store of log data. Is there a good reason to use BigQuery as a log sink?
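For reference, querying a partitioned sink table only needs standard SQL with a filter on the partitioning column; a minimal sketch follows (the project, dataset, table and column names here are assumptions, since the exported log schema varies by log type):

-- Sketch: read one day of logs from an assumed partitioned sink table.
SELECT timestamp, severity
FROM `my_project.my_logs_dataset.my_log_table`
WHERE TIMESTAMP_TRUNC(timestamp, DAY) = TIMESTAMP('2024-01-01')
ORDER BY timestamp;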
I have linked Firebase events to BigQuery and my goal is to pull the events into S3 from BigQuery using AWS Glue.
When you link Firebase to BigQuery, it creates a default dataset and date-sharded tables, something like this:
analytics_456985675.events_20230101
analytics_456985675.events_20230102
I'm used to querying the events in BigQuery using
Select
...
from analytics_456985675.events_*
where date >= [date]
However, when configuring the Glue ETL job, it refuses to work with this wildcard format for the table (analytics_456985675.events_*); from the error message, it seems the Glue job will only work when I specify a single table.
How can I create a Glue ETL job that pulls data from BigQuery incrementally if I have to specify a single partition table?
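If your Glue/BigQuery connector supports passing a custom query instead of a single table name (an assumption to verify for your connector version), the incremental part is ordinary BigQuery standard SQL over the sharded tables, for example:

-- Sketch: pull only yesterday's shard; dataset name taken from the question.
SELECT *
FROM `analytics_456985675.events_*`
WHERE _TABLE_SUFFIX = FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));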
It seems that the BigQuery CLI supports restoring tables in a dataset after they have been deleted by using BigQuery Time Travel functionality -- as in:
bq cp dataset.table@TIME_AGO_UNIX dataset.table
However, this assumes we know the names of the tables. I want to write a script to iterate over all the tables that were in the dataset at TIME_AGO_UNIX time.
How would I go about finding those tables at that time?
I have a BigQuery database and I want to create dynamic tables.
Ex: table_20170609 - if the date is 9th June 2017
table_20170610 - if the date is 10th June 2017
Daily I will get some Excel data and I have to upload it to the dynamically created table above. The data in the Excel file is not day-wise; it runs from the start date to today's date.
I know how to connect BigQuery to Tableau and run queries. Is there an automated method where Tableau will read the dynamic table from BigQuery and generate the report?
Current approach: I have created one table (reports); every day I rename the table reports to reports_bkp_date and create a new reports table.
I'm new to BigQuery and Tableau, so I would like to know:
How to create dynamic tables in BigQuery?
How to connect a dynamic table to Tableau (so that I don't have to change the table name manually every day)?
You have two immediate options. Firstly, create a view in BigQuery (instead of a table) which collates together all relevant tables, then connect to this view in Tableau.
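A minimal sketch of that view (standard SQL; the project and dataset names are placeholders, and the table_* prefix follows the naming in the question):

-- Sketch: a view that unions all date-suffixed tables matching the prefix.
CREATE OR REPLACE VIEW `my_project.my_dataset.reports_all` AS
SELECT *, _TABLE_SUFFIX AS table_suffix
FROM `my_project.my_dataset.table_*`;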
The better approach, given that you are having to manually upload a new table every day, is to use a wildcard table connection in Tableau and a consistent naming convention for your data tables; for example you might use DailyData_2017_* to capture all tables in the following format:
DailyData_2017_06_01
DailyData_2017_06_02
DailyData_2017_06_03
Finally, note that you can append to a table in BigQuery rather than replacing its contents. If your data is timestamped then this might work for you too.
Ben
Can anyone please suggest how to create a partitioned table in BigQuery?
Example: suppose I have log data in Google Cloud Storage for the year 2016, stored in one bucket and organized by year, month, and day. I want to create a table partitioned by date.
Thanks in Advance
Documentation for partitioned tables is here:
https://cloud.google.com/bigquery/docs/creating-partitioned-tables
In this case, you'd create a partitioned table and populate the partitions with the data. You can run a query job that reads from GCS (and filters data for the specific date) and writes to the corresponding partition of the table. For example, to load data for May 1st, 2016, you'd specify the destination_table as table$20160501.
Currently, you'll have to run several query jobs to achieve this process. Please note that you'll be charged for each query job based on bytes processed.
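For illustration, a partitioned table like the one described can be created with DDL along these lines (the names and schema are assumptions, not taken from the question); the per-day query jobs above would then target partitions such as logs$20160501:

-- Sketch: an ingestion-time (daily) partitioned table; schema is illustrative only.
CREATE TABLE `my_project.my_dataset.logs`
(
  log_time TIMESTAMP,
  message  STRING
)
PARTITION BY _PARTITIONDATE;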
Please see this post for some more details:
Migrating from non-partitioned to Partitioned tables
There are two options:
Option 1
You can load each daily file into its own table, named like YourLogs_YYYYMMDD.
See details on how to Load Data from Cloud Storage
After the tables are created, you can access them either using Table wildcard functions (Legacy SQL) or using a Wildcard Table (Standard SQL). See also Querying Multiple Tables Using a Wildcard Table for more examples
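A minimal wildcard-table query over such sharded tables might look like this (project and dataset are placeholders; the YourLogs prefix follows the naming above):

-- Sketch: standard SQL wildcard query across daily YourLogs_YYYYMMDD tables.
SELECT *
FROM `my_project.my_dataset.YourLogs_*`
WHERE _TABLE_SUFFIX BETWEEN '20160101' AND '20160107';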
Option 2
You can create a Date-Partitioned Table (just one table - YourLogs), but you will still need to load each daily file into its respective partition - see Creating and Updating Date-Partitioned Tables
After the table is loaded, you can easily Query Date-Partitioned Tables
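For instance, a single day's partition can be read by filtering on the _PARTITIONTIME pseudo-column (the project and dataset names below are placeholders):

-- Sketch: query one daily partition of an ingestion-time partitioned table.
SELECT *
FROM `my_project.my_dataset.YourLogs`
WHERE _PARTITIONTIME = TIMESTAMP('2016-05-01');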
Having partitions for an External Table is not allowed as of now. There is a Feature Request for it:
https://issuetracker.google.com/issues/62993684
(please vote for it if you're interested in it!)
Google says that they are considering it.