I've been running the GA4 to BigQuery Streaming export for over a month now, because the number of daily events is higher than the daily export limit (currently around 1.5 million events per day).
From the Google docs (https://support.google.com/analytics/answer/7029846#tables): "If the Streaming export option is enabled, a table named events_intraday_YYYYMMDD is created. This table is populated continuously as events are recorded throughout the day. This table is deleted at the end of each day once events_YYYYMMDD is complete."
According to the docs, I should have events_YYYYMMDD tables for previous days and an events_intraday_YYYYMMDD table for the current day. But that's not the case: all I'm left with are events_intraday_YYYYMMDD tables for previous days.
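For what it's worth, listing the dataset's tables confirms this; the project and dataset IDs below are placeholders for my own:

-- List every events* table in the export dataset
-- (project and dataset IDs are placeholders).
SELECT table_name
FROM `my-project.analytics_123456789.INFORMATION_SCHEMA.TABLES`
WHERE table_name LIKE 'events%'
ORDER BY table_name;

It only ever returns events_intraday_YYYYMMDD rows, never an events_YYYYMMDD table.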
This is the same issue reported in the following posts (I actually copied and pasted from the first post):
BigQuery events_intraday_ tables are being generated daily but no daily events_ table is created
Firebase Analytics doesn't export events table to Bigquery, even though streaming export is enabled
GA4 exports only intraday tables to BigQuery
Unfortunately, none of these posts has a solution, and I don't yet have enough reputation here on SO to comment on them. I'm not currently paying for Google support because I'm still evaluating GA4, so I'm hoping someone here can provide an answer (and maybe I can then share it with the others who had the same problem).
Related
I'm trying to export my Google Analytics data from Firebase into BigQuery.
About 3 weeks ago the connector in Firebase was enabled with the "Streaming" export setting.
Just recently I decided to check BigQuery to start building some views, and noticed that there are only 3 weeks' worth of "intraday" tables, which I understand are staging tables of sorts.
However, as per the documentation, there should also be another table containing all the data, simply called "events_", but these tables are completely missing:
"You should query events_YYYYMMDD rather than
events_intraday_YYYYMMDD"
https://support.google.com/analytics/answer/9358801?authuser=0
Where is the "events_" table? Is it safe to use the events_intraday tables instead, despite what the documentation says?
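For context, the kind of view I want to build is just a simple aggregation over the daily tables, roughly like this (the project and dataset IDs are placeholders):

-- Daily event counts over the export tables
-- (project and dataset IDs are placeholders).
SELECT
  event_date,
  event_name,
  COUNT(*) AS event_count
FROM `my-project.analytics_123456789.events_*`
GROUP BY event_date, event_name;

At the moment the only way to get any rows back is to point it at events_intraday_* instead.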
Google Analytics for Firebase link to BigQuery.
My Firebase project sends over 950,000 app log records to BigQuery per day.
BigQuery has two intraday tables for the past two days.
Yesterday's intraday table disappeared around noon and was archived into an events table.
The documentation says the intraday table should only exist for the current day.
https://support.google.com/firebase/answer/7029846?hl=en&ref_topic=7029512
So why does BigQuery have two intraday tables for the past two days? Is something wrong with my settings?
Or does it just take time to process a daily log this large?
I would like to know whether this is the correct behavior for BigQuery.
If it isn't, I think something is wrong in how I linked Firebase to BigQuery.
I attached a screenshot of the two intraday tables from the past two days.
Best Regards,
I'm currently working on a data warehousing project with BigQuery.
BQ used to have this quota:
Daily destination table update limit — 1,000 updates per table per day
While this quota is still in the documentation, I understand that this has been removed, according to this blog post:
https://cloud.google.com/blog/products/data-analytics/dml-without-limits-now-in-bigquery
In our project we need live data, which requires a lot of updates. Before this blog post I would have gathered the records (e.g. on GCS) and pushed them into BQ every ~14 minutes.
With the removal of the table update limit, we could now stream all data immediately into BQ, which would actually be vital for our solution, as live data is required.
Question: Would you recommend now to stream data directly to BQ? Any objections?
I'm asking this as I think just because the quota has been removed, this doesn't automatically become a best practice. How do you handle the requirement for live data? Another option before has been external data sources with the known limitations.
Thank you for your answers!
This quota never applied to streaming. The quota mentioned in the blog applied only to updates made via DML queries, i.e. SQL statements such as INSERT, UPDATE, MERGE, and DELETE.
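For example, the removed per-table limit was counting statements like the following (the table and columns here are purely illustrative):

-- A DML UPDATE: the kind of statement the old
-- 1,000-updates-per-table-per-day quota applied to.
UPDATE `my_project.warehouse.orders`
SET status = 'shipped'
WHERE order_id = 42;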
Streaming inserts (via the tabledata.insertAll API, not a SQL statement) have different limits:
Maximum rows per second: 1,000,000
Maximum bytes per second: 1 GB
If you do need live data, definitely go with streaming. Note that it is costlier than batch loading from GCS, but if you need fresh data, this is the way to go.
Our company has many scheduled reports in BigQuery that generate aggregation tables of Google Analytics data. Because we cannot control when Google Analytics data is imported into our BigQuery environment, we keep getting days with no data.
This means we then have to manually re-run the reports for the missing days.
I have kept editing my scheduled query to push back the time of day it runs, and it is now running around 8 AM. These queries feed reports for stakeholders, and the stakeholders are asking for them earlier. Is there any way to guarantee the processing time of the Google Analytics export to BigQuery?
You may also think about a Scheduled Query solution that reruns at a later time if the requested table isn't available yet.
You can't currently add a conditional trigger to a BigQuery scheduled query.
You could manually add a fail-safe to your query that checks for yesterday's table, using a combination of the code below and DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY):
SELECT
  MAX(FORMAT_TIMESTAMP('%F %T',
    TIMESTAMP(PARSE_DATE('%Y%m%d',
      REGEXP_EXTRACT(_TABLE_SUFFIX, r'^\d\d\d\d\d\d\d\d')))))
FROM `DATASET.ga_sessions_*` AS ga_sessions
Obviously this will fail if the conditions are not met and will not retry, which I understand is not much of an improvement on your current setup.
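If you do want it all in one scheduled query, a rough sketch using BigQuery scripting could look like the following; DATASET is a placeholder and the aggregation itself is omitted:

-- Sketch only: abort the scheduled query if yesterday's export
-- table hasn't landed yet, otherwise continue with the aggregation.
DECLARE yesterday STRING DEFAULT
  FORMAT_DATE('%Y%m%d', DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY));

IF NOT EXISTS (
  SELECT 1
  FROM `DATASET.INFORMATION_SCHEMA.TABLES`
  WHERE table_name = CONCAT('ga_sessions_', yesterday)
) THEN
  RAISE USING MESSAGE = 'ga_sessions table for yesterday is not available yet';
END IF;

-- ...your aggregation INSERT / SELECT goes here...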
I've encountered this many times in the past and eventually had to move my data pipelines to another solution, as scheduled queries are still quite simplistic.
I would recommend you take a look at CRMint for simple pipelines into BigQuery:
https://github.com/google/crmint
If you still find this too simplistic, then you should look at Google Cloud Composer, where you can check that a table exists before running a particular job in a pipeline.
I am thinking of using Google BigQuery to store realtime call records, with around 3 million rows per day inserted and never updated.
I have signed up for a trial account and run some tests.
I have a few concerns before I can go ahead with development:
When streaming data via PHP, it sometimes takes around 10-20 minutes for the data to show up in my tables. This is a show-stopper for us, because network support engineers need this data updated in real time to troubleshoot quality issues.
Partitions: we can store data in partitions divided by day, but that means a single partition is around 2.5 GB on any given day, which pushes my cost of querying the data into the thousands per month. Is there any other way to bring down the cost here? We could store data partitioned per hour, but there is no such support available.
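For reference, the daily-partitioned layout we have been testing looks roughly like this (table and column names are placeholders):

-- Call records partitioned by day; each daily partition
-- ends up at roughly 2.5 GB (names are placeholders).
CREATE TABLE `my_project.telecom.call_records`
(
  call_id STRING,
  call_start TIMESTAMP,
  duration_seconds INT64,
  mos_score FLOAT64
)
PARTITION BY DATE(call_start);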
If not BigQuery, what other solutions are out there on the market that can deliver similar performance and solve these problems?
You have the "Streaming insert" option, which makes records searchable within a few seconds (it has its price).
See: streaming-data-into-bigquery
Check table-decorators for limiting query scan.
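For example (legacy SQL only, and the table name is just a placeholder), a relative time-range decorator restricts the scan to recently streamed rows:

-- Legacy SQL: scan only the rows from the last hour
-- (3600000 ms) instead of the whole table.
SELECT call_id, call_start, mos_score
FROM [my_project:telecom.call_records@-3600000-]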