I've been running the GA4 to BigQuery streaming export for almost a month now, because the number of daily events exceeds the daily export limit (2.7 million events vs. 1 million events).
Google docs (https://support.google.com/firebase/answer/7029846?hl=en):
If the Streaming export option is enabled, a table named events_intraday_YYYYMMDD is created. This table is populated continuously as events are recorded throughout the day. This table is deleted at the end of each day once events_YYYYMMDD is complete.
According to the docs I should have events_YYYYMMDD tables for previous days and an events_intraday_YYYYMMDD table for the current day. But that's not the case: all I'm stuck with are events_intraday_YYYYMMDD tables for previous days.
Am I missing something or not reading the docs correctly?
Should I or shouldn't I expect the events_YYYYMMDD tables to be automatically created and filled?
If they aren't created automatically, I guess I have to take care of this backup myself?
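In case it's relevant, the manual copy I have in mind would be something like this (BigQuery Standard SQL; the project/dataset names and the date are placeholders, and I'd use a backup suffix to avoid colliding with any table Google might still create):

    -- One-off copy of a leftover intraday table into a permanent table.
    -- A scheduled query could run this shortly after midnight with the
    -- date parameterized.
    CREATE TABLE IF NOT EXISTS
      `my-project.analytics_123456789.events_backup_20240101`
    AS
    SELECT *
    FROM `my-project.analytics_123456789.events_intraday_20240101`;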
I am using Firebase and BigQuery to build a dashboard. I found a discrepancy once the data was transferred from the "intraday table" to the "regular events table".
I've been saving the intraday table for the last three days so I can compare the values after the data is transferred to the regular events table. It turns out something goes wrong during the transfer: some of the rows are dropped along the way.
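The comparison I'm running looks roughly like this (BigQuery Standard SQL; the snapshot and dataset names are placeholders for my saved copy and the final daily table):

    -- Compare per-event row counts between the saved intraday snapshot
    -- and the finalized daily table; only mismatches are returned.
    SELECT
      event_name,
      i.cnt AS intraday_rows,
      d.cnt AS daily_rows
    FROM (
      SELECT event_name, COUNT(*) AS cnt
      FROM `my-project.analytics_123456789.events_intraday_backup_20240101`
      GROUP BY event_name
    ) AS i
    FULL OUTER JOIN (
      SELECT event_name, COUNT(*) AS cnt
      FROM `my-project.analytics_123456789.events_20240101`
      GROUP BY event_name
    ) AS d
    USING (event_name)
    WHERE i.cnt IS NULL OR d.cnt IS NULL OR i.cnt != d.cnt;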
Does anyone know what needs to be done here?
A table I had been updating every day disappeared, and I found out it may have expired; I hadn't noticed the expiration setting.
Is there a simple way to recover that table?
If it is within 7 days, you may be able to fetch the data using time-travel:
https://cloud.google.com/bigquery/docs/reference/standard-sql/query-syntax#for_system_time_as_of
And once you are able to read the data, you can save the result to a destination table as a way of recovering the original.
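As a sketch (BigQuery Standard SQL; the table names and the interval are placeholders, and this only works within the roughly 7-day time-travel window):

    -- Read the table as it existed three days ago and materialize the
    -- result into a new table.
    CREATE TABLE `my_dataset.recovered_table` AS
    SELECT *
    FROM `my_dataset.my_table`
      FOR SYSTEM_TIME AS OF TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 3 DAY);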
I am using Firebase Analytics and BigQuery with an average of 50-60 GB of daily data.
For the most recent daily table, a query gives a different result than it did yesterday, even though the query conditions are exactly the same, including the target date.
I just found that there is a 1-2 day gap between the table's creation date and its last-modified date.
I assume the difference between the query results is because of this (the query is probably calculating over a different data volume).
Does this date gap mean a single daily table needs at least two days to be fully loaded from the intraday table?
Thanks in advance.
(Screenshot: BigQuery table info.)
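For reference, the dates I'm comparing come from the dataset's __TABLES__ metadata, with a query along these lines (the project/dataset names are placeholders):

    -- Inspect creation and last-modified timestamps for the daily tables.
    SELECT
      table_id,
      TIMESTAMP_MILLIS(creation_time)      AS created,
      TIMESTAMP_MILLIS(last_modified_time) AS last_modified
    FROM `my-project.analytics_123456789.__TABLES__`
    WHERE table_id LIKE 'events_2%'
    ORDER BY table_id DESC;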
In the documentation we can find the following information:
After you link a project to BigQuery, the first daily export of events creates a corresponding dataset in the associated BigQuery project. Then, each day, raw event data for each linked app populates a new daily table in the associated dataset, and raw event data is streamed into a separate intraday BigQuery table in real-time.
It seems that the intraday table is loaded into the main table each day, and if you want to access this data in real time you'll have to use the separate intraday table.
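For example, a sketch of reading the real-time data directly from the intraday table (BigQuery Standard SQL; dataset name and date are placeholders):

    -- Today's events live in the intraday table; finalized days live in
    -- the events_YYYYMMDD tables.
    SELECT event_name, COUNT(*) AS events
    FROM `my-project.analytics_123456789.events_intraday_20240101`
    GROUP BY event_name
    ORDER BY events DESC;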
If this information doesn't help you, please provide some extra information so I can help you more efficiently.
I am designing a database to store product information, and I want to keep several months of historical (price) data for future reference. However, after a set period I would like to start overwriting the oldest entries, with minimal effort needed to find them. Does anyone have a good idea of how to approach this problem? My initial design is a table named "historical data": every day, a job pulls the active data and stores it in this table with a timestamp. Does anyone have a better idea, or see what is wrong with mine?
First, I'd like to comment on your proposed solution. The weak part, of course, is that there can actually be more than one change between your intervals: if the record was changed three times during the day, you only archive the last change.
A better solution is possible, but it must be event-driven. If your database server supports events or triggers (like MS SQL Server), you can write a trigger that creates an entry in a history table. If your server does not support triggers, you can add the archiving code to your application (in the Save operation).
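For MS SQL Server, a minimal trigger sketch might look like this (all table and column names are hypothetical):

    -- Fires on every price update and copies the old values into the
    -- history table, so no intermediate change is lost.
    CREATE TRIGGER trg_Products_PriceHistory
    ON dbo.Products
    AFTER UPDATE
    AS
    BEGIN
        SET NOCOUNT ON;
        INSERT INTO dbo.PriceHistory (ProductId, Price, ChangedAt)
        SELECT d.ProductId, d.Price, SYSUTCDATETIME()
        FROM deleted AS d
        JOIN inserted AS i ON i.ProductId = d.ProductId
        WHERE i.Price <> d.Price;  -- only archive actual price changes
    END;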
You could place a trigger on your price table. That way you can archive the old price in another table on each update or delete event.
It's a much broader topic than it initially seems. Martin Fowler has a nice narrative about "things that change with time".
IMO your approach seems sound if the history you need is a snapshot of the data at the end of each day. In the past I have used a similar approach with overnight jobs (stored procedures) that pick up the day's new data, timestamp it, and then delete all data with a timestamp older than today minus x, where x is the period of data I want to keep.
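As a sketch (T-SQL; the table and column names and the 90-day window are hypothetical), the overnight job boils down to:

    -- Nightly job: snapshot the active prices with a timestamp...
    INSERT INTO HistoricalPrices (ProductId, Price, SnapshotDate)
    SELECT ProductId, Price, SYSUTCDATETIME()
    FROM Products;

    -- ...then drop snapshots older than the retention window.
    DELETE FROM HistoricalPrices
    WHERE SnapshotDate < DATEADD(DAY, -90, SYSUTCDATETIME());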
If you need to track all history changes, then you need to look at triggers.
I would like to, after a set period, start overwriting initial entries with minimal effort to find the initial entries
We store data in Archive tables using a trigger, as others have suggested. Our archive table has an additional AuditDate column and stores the "deleted" data, i.e. the previous version of each row. The current data is only stored in the actual table.
We prune the Archive table with a business rule along the lines of: "Delete all archive data more than 3 months old where there exists at least one archive record younger than 3 months; delete all archive data more than 6 months old."
So if there has been no price change in the last 3 months, you would still have a price-change record from the 3-6 months ago period.
(Ask if you need an example of the self-referencing-join to do the delete, or the Trigger to store changes in the Archive table)
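For illustration, a T-SQL sketch of the pruning delete under that rule (the Archive table and its ProductId/AuditDate columns are hypothetical):

    -- Rows older than 3 months go only if a younger archive row exists
    -- for the same product; rows older than 6 months go unconditionally.
    DELETE a
    FROM Archive AS a
    WHERE a.AuditDate < DATEADD(MONTH, -6, SYSUTCDATETIME())
       OR (a.AuditDate < DATEADD(MONTH, -3, SYSUTCDATETIME())
           AND EXISTS (SELECT 1
                       FROM Archive AS b
                       WHERE b.ProductId = a.ProductId
                         AND b.AuditDate >= DATEADD(MONTH, -3, SYSUTCDATETIME())));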