Is there a way to run a "backload" of data from a GA4 property to BigQuery? I've successfully connected the intraday and daily streaming but want to get 90 days of historical data.
Related
I've been running the GA4 to BigQuery streaming export for almost a month now, because the number of daily events exceeds the daily export limit (2.7 million events vs. 1 million events).
Google docs (https://support.google.com/firebase/answer/7029846?hl=en):
If the Streaming export option is enabled, a table named events_intraday_YYYYMMDD is created. This table is populated continuously as events are recorded throughout the day. This table is deleted at the end of each day once events_YYYYMMDD is complete.
According to the docs, I should have events_YYYYMMDD tables for previous days and an events_intraday_YYYYMMDD table for the current day. But that's not the case: all I'm left with are events_intraday_YYYYMMDD tables for previous days.
Am I missing something or not reading the docs correctly?
Should I or shouldn't I expect the events_YYYYMMDD tables to be automatically created and filled?
If they aren't, then I guess I have to take care of this backup myself?
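If I do end up having to do it myself, something like the following is what I have in mind (a rough sketch only; the project ID and the analytics_123456789 dataset are placeholders for my own export dataset):

```python
# Sketch: copy yesterday's intraday export into a permanent daily table
# before the intraday table is dropped. Project and dataset IDs are placeholders.
from datetime import date, timedelta

from google.cloud import bigquery

client = bigquery.Client(project="my-project")
suffix = (date.today() - timedelta(days=1)).strftime("%Y%m%d")

source = f"my-project.analytics_123456789.events_intraday_{suffix}"
target = f"my-project.analytics_123456789.events_{suffix}"

# Copy the intraday table into a table following the daily naming scheme.
job = client.copy_table(source, target)
job.result()  # wait for the copy job to finish
print(f"Copied {source} -> {target}")
```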
I have a fact table whose load runs daily and does not store the previous day's data. I want to create a fact on top of it that stores the previous days' data along with the daily data.
Best Regards,
Santosh
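One way to approach this (a rough sketch; the table names and columns below are made up for illustration, so adapt them to your schema) is a scheduled job that appends each day's load of the existing daily fact into a history fact, so previous days accumulate:

```python
# Sketch: append today's daily fact rows into a history fact table so that
# previous days are retained. Table names and columns are illustrative only.
from google.cloud import bigquery

client = bigquery.Client()

append_sql = """
INSERT INTO `my-project.mart.fact_sales_history` (snapshot_date, customer_id, amount)
SELECT CURRENT_DATE() AS snapshot_date, customer_id, amount
FROM `my-project.mart.fact_sales_daily`
"""

# Run once per day, after the daily fact has been rebuilt.
client.query(append_sql).result()
```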
I am looking at moving our Shopify data to BigQuery for reporting purposes. I paginate through the customers endpoint of the Shopify API and get all the customer-level data. I then export this into a CSV that I store on Google Cloud Storage and import into BigQuery. My question is: what is the best way to deal with incremental data loads, given that some of the entries in the current customer datamart (for example, total order count) might have changed and some new customers might have been created since the last table update? Any advice on the design pattern would be appreciated. Many thanks.
To handle incremental data that lands on GCS (source) with BigQuery as the target, you have a couple of Google options:
Dataflow: You can create a Dataflow pipeline and load the incremental data into BigQuery intermediate (staging) tables. Once the data is in the intermediate table, you can compute the current state by joining the two tables (target and intermediate) and append or merge the latest data into the target BigQuery tables.
This calculation can be done through a scheduled Dataflow pipeline or through a scheduled BigQuery query; a rough sketch of the merge step is shown after this list.
Dataprep: Here you can also build an ETL pipeline, adding the target BigQuery table as a reference.
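As one illustration of the merge step mentioned above (a hedged sketch, not a full pipeline; the bucket, dataset, table, and column names are placeholders), you could load the incremental CSV into a staging table and then MERGE it into the customer datamart:

```python
# Sketch: load an incremental customer CSV from GCS into a staging table,
# then MERGE it into the target datamart. All names below are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

# 1. Load the latest incremental extract from GCS into a staging table.
load_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_TRUNCATE,
)
client.load_table_from_uri(
    "gs://my-bucket/shopify/customers_incremental.csv",
    "my-project.staging.customers_incremental",
    job_config=load_config,
).result()

# 2. Merge staging rows into the target table: update changed customers,
#    insert new ones.
merge_sql = """
MERGE `my-project.mart.customers` AS target
USING `my-project.staging.customers_incremental` AS source
ON target.customer_id = source.customer_id
WHEN MATCHED THEN
  UPDATE SET target.total_order_count = source.total_order_count,
             target.updated_at = source.updated_at
WHEN NOT MATCHED THEN
  INSERT (customer_id, total_order_count, updated_at)
  VALUES (source.customer_id, source.total_order_count, source.updated_at)
"""
client.query(merge_sql).result()
```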
I am using Firebase Analytics and BigQuery with an average of 50-60 GB of daily data.
For the most recent daily table, a query gives a different result than it did yesterday, even though the query conditions are exactly the same, including the target date.
I just found that there is a 1-2 day gap between the table's creation date and its last modified date.
I assume the difference between the query results is because of this (calculating on a different data volume, maybe).
Does this date gap mean a single daily table needs at least 2 days to be fully loaded from the intraday table?
Thanks in advance.
(screenshot: BigQuery table info)
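For reference, this is how I check the creation and last-modified timestamps (a small sketch; the project, dataset, and table names are placeholders for my export):

```python
# Sketch: inspect when a daily export table was created and last modified.
# The project/dataset/table names are placeholders for your Firebase export.
from google.cloud import bigquery

client = bigquery.Client()
table = client.get_table("my-project.analytics_123456789.events_20240101")

print("created:      ", table.created)    # when the daily table was created
print("last modified:", table.modified)   # when data was last written to it
```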
In the documentation we can find the following information:
After you link a project to BigQuery, the first daily export of events creates a corresponding dataset in the associated BigQuery project. Then, each day, raw event data for each linked app populates a new daily table in the associated dataset, and raw event data is streamed into a separate intraday BigQuery table in real-time.
It seems that the intraday table is loaded into the main daily table each day, and if you want to access this data in real time you'll have to use the separate intraday table.
If this information doesn't help you, please provide some extra information so I can help you more efficiently.
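For example (a hedged sketch with a placeholder project and dataset), a query that combines the finalized daily tables with the real-time intraday table could look like this:

```python
# Sketch: count events across finalized daily tables and the real-time
# intraday table in one query. Project/dataset names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

sql = """
SELECT event_date, COUNT(*) AS events
FROM `my-project.analytics_123456789.events_*`
-- the events_* wildcard also matches events_intraday_* tables, so exclude them here
WHERE NOT STARTS_WITH(_TABLE_SUFFIX, 'intraday')
GROUP BY event_date

UNION ALL

SELECT event_date, COUNT(*) AS events
FROM `my-project.analytics_123456789.events_intraday_*`
GROUP BY event_date
"""

for row in client.query(sql).result():
    print(row.event_date, row.events)
```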
I am trying to explore BigQuery's ability to load CSV files (DoubleClick impression data) into a partitioned BigQuery table. My use case includes:
1. Reading daily (nightly) CSV dumps from Google Cloud Storage for my customer's (an ad agency's) 30 different clients into BQ. A daily dump may contain data from the previous day/week. All data should be loaded into the respective daily partition in BQ so as to provide daily reporting to individual clients.
2. The purpose here is to build an analytical system that gives the ad agency the ability to analyze "trends & patterns over time and across clients".
I am new to BQ and am trying to understand how to lay out the schema.
Should I create a single table with daily partitions (holding data from all 50 clients / 50 daily CSV load files)? Do the partitions need to be created well in advance?
Or should I create 50 different tables (partitioned by date), one per client, so as NOT to run into any data sharing/security concerns of the single-table option?
My customer wants a simple solution with minimal cost.
If you are going to use the transfer service (as mentioned in the comment), you don't need to create tables by hand; the transfer service will do that for you. It will schedule daily jobs and load the data into the corresponding partition. Also, if there is a short delay (2-3 days), the transfer service will still pick up the data.
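If you skip the transfer service and load the files yourself, a minimal sketch of loading one day's dump into its daily partition might look like the following (the bucket, project, dataset, and table names are placeholders, and the table is assumed to already exist with ingestion-time day partitioning):

```python
# Sketch: load a daily CSV dump from GCS into the matching date partition
# of a single ingestion-time-partitioned table, using a partition decorator.
# Bucket, project, dataset, and table names are placeholders.
from google.cloud import bigquery

client = bigquery.Client()

load_date = "20240101"  # which daily partition this dump belongs to

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.CSV,
    skip_leading_rows=1,
    autodetect=True,
    write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
)

client.load_table_from_uri(
    f"gs://my-bucket/doubleclick/impressions_{load_date}.csv",
    f"my-project.reporting.impressions${load_date}",  # "$" targets that day's partition
    job_config=job_config,
).result()
```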