I've been running the GA4 to BigQuery Streaming export for almost a month now because our daily event volume exceeds the daily export limit (2.7 million events vs. 1 million events).
Google docs (https://support.google.com/firebase/answer/7029846?hl=en):
If the Streaming export option is enabled, a table named events_intraday_YYYYMMDD is created. This table is populated continuously as events are recorded throughout the day. This table is deleted at the end of each day once events_YYYYMMDD is complete.
According to the docs, I should have events_YYYYMMDD tables for previous days and an events_intraday_YYYYMMDD table for the current day. But that's not the case - all I have are events_intraday_YYYYMMDD tables for previous days.
Am I missing something or not reading the docs correctly?
Should I or shouldn't I expect the events_YYYYMMDD tables to be automatically created and filled?
If they are not created automatically, do I have to take care of this backup myself?
I have a fact table which runs daily, and it does not store the previous day's data. I want to create a fact table on top of it which will store the previous day's data along with the daily data.
Best Regards,
Santosh
I am using Firebase and BigQuery to make a dashboard. I found a discrepancy once the data was transferred from the "intraday table" to the "regular events table".
I've been saving the intraday table for the last three days to compare the values after the data is transferred to the regular events table. I found that something goes wrong during the transfer: some of the rows are missing from the regular table afterwards.
Does anyone know what needs to be done here?
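To narrow down where the rows go missing, it may help to compare per-event counts between the saved intraday copy and the regular table. Below is a minimal sketch in Python, assuming the counts have already been fetched from each table (for example with a GROUP BY on event_name) into plain dicts. The function name `count_diff` and the sample numbers are made up for illustration and do not come from a real export.

```python
def count_diff(intraday_counts, daily_counts):
    """Return {event_name: (intraday, daily)} for every event whose counts differ."""
    diffs = {}
    # Look at every event name that appears in either table.
    for name in sorted(set(intraday_counts) | set(daily_counts)):
        a = intraday_counts.get(name, 0)
        b = daily_counts.get(name, 0)
        if a != b:
            diffs[name] = (a, b)
    return diffs


# Made-up numbers, for illustration only.
intraday = {"screen_view": 1200, "session_start": 300, "purchase": 12}
daily = {"screen_view": 1180, "session_start": 300, "purchase": 12}
print(count_diff(intraday, daily))
```

Events that show up in the result are the ones worth inspecting row by row.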
Currently I have around 1000 tables, of which I need to track around 500 across various BigQuery datasets and generate a report or create a dashboard, so that we can monitor and act promptly if a table is not refreshed.
Could someone please tell me how I can do that with minimal usage of BigQuery slots?
I think you should be able to query the last modification time as shown here:
https://cloud.google.com/bigquery/docs/dataset-metadata
You could then add a table with the max allowed time interval for a table to be updated and include that table in the query to create your own alerts.
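The alerting idea can be sketched in a few lines of Python. This is only an illustration, assuming the last modification times have already been read from the metadata into a dict and the allowed intervals are kept in a second dict; the function name `stale_tables`, the table names, and the timestamps are all hypothetical.

```python
from datetime import datetime, timedelta, timezone


def stale_tables(last_modified, max_age, now):
    """Return the sorted names of tables not updated within their allowed interval.

    last_modified: {table_name: datetime of last modification}
    max_age:       {table_name: timedelta allowed between updates}
    Tables without an entry in max_age are not monitored.
    """
    stale = []
    for table, limit in max_age.items():
        modified = last_modified.get(table)
        # A monitored table with no metadata at all is also reported.
        if modified is None or now - modified > limit:
            stale.append(table)
    return sorted(stale)


# Hypothetical sample data.
now = datetime(2024, 1, 10, 12, 0, tzinfo=timezone.utc)
last_modified = {
    "sales.orders": datetime(2024, 1, 10, 6, 0, tzinfo=timezone.utc),
    "sales.refunds": datetime(2024, 1, 8, 6, 0, tzinfo=timezone.utc),
}
max_age = {
    "sales.orders": timedelta(hours=24),
    "sales.refunds": timedelta(hours=24),
    "sales.returns": timedelta(hours=24),
}
print(stale_tables(last_modified, max_age, now))
```

Only the roughly 500 tables you care about need an entry in the interval table, so the other tables are ignored automatically.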
drftr
There is a Preview feature, INFORMATION_SCHEMA.PARTITIONS, which gives you the LAST_MODIFIED_TIME for each partition of every table in a dataset:
select *
from yourDataset.INFORMATION_SCHEMA.PARTITIONS;
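Since that view returns one row per partition, you would probably want to collapse the result to one timestamp per table before reporting on it. A rough sketch in Python, assuming the rows have already been fetched as (table_name, last_modified_time) pairs; the function name `latest_per_table` and the sample rows are hypothetical.

```python
from datetime import datetime, timezone


def latest_per_table(partition_rows):
    """Collapse (table_name, last_modified_time) partition rows to one entry per table."""
    latest = {}
    for table_name, modified in partition_rows:
        # Keep only the most recent modification time seen for each table.
        if table_name not in latest or modified > latest[table_name]:
            latest[table_name] = modified
    return latest


# Hypothetical rows, as if read from the PARTITIONS view.
rows = [
    ("orders", datetime(2024, 1, 8, tzinfo=timezone.utc)),
    ("orders", datetime(2024, 1, 10, tzinfo=timezone.utc)),
    ("refunds", datetime(2024, 1, 9, tzinfo=timezone.utc)),
]
for name, modified in sorted(latest_per_table(rows).items()):
    print(name, modified.isoformat())
```

The same reduction could be done in the query itself with MAX(last_modified_time) grouped by table_name, which keeps the result set small.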
I'm using Google's Cloud Storage & BigQuery. I am not a DBA, I am a programmer. I hope this question is generic enough to help others too.
We've been collecting data from a lot of sources and will soon start collecting data real-time. Currently, each source goes to an independent table. As new data comes in we append it into the corresponding existing table.
Our data analysis requires each record to have a timestamp. However, our source data files are too big to edit before we add them to cloud storage (4+ GB of textual data per file). As far as I know there is no way to append a timestamp column to each row before bringing them into BigQuery, right?
We are thus toying with the idea of creating daily tables for each source, but we don't know how this will work when we have real-time data coming in.
Any tips/suggestions?
Currently, there is no way to automatically add timestamps to a table, although that is a feature that we're considering.
You say your source files are too big to edit before putting them in cloud storage... does that mean that the entire source file should have the same timestamp? If so, you could import to a new BigQuery table without a timestamp, then run a query that basically copies the table but adds a timestamp. For example, SELECT all,fields, CURRENT_TIMESTAMP() FROM my.temp_table (you will likely want to use allow_large_results and set a destination table for that query). If you want to get a little bit trickier, you could use the dataset.__DATASET__ pseudo-table to get the modified time of the table, and then add it as a column to your table either in a separate query or in a JOIN. Here is how you'd use the __DATASET__ pseudo-table to get the last modified time:
SELECT MSEC_TO_TIMESTAMP(last_modified_time) AS time
FROM [publicdata:samples.__DATASET__]
WHERE table_id = 'wikipedia'
Another alternative to consider is the BigQuery streaming API (More info here). This lets you insert single rows or groups of rows into a table just by posting them directly to BigQuery. This may save you a couple of steps.
Creating daily tables is a reasonable option, depending on how you plan to query the data and how many input sources you have. If this is going to make your queries span hundreds of tables, you're likely going to see poor performance. Note that if you need timestamps because you want to limit your queries to certain dates and those dates are within the last 7 days, you can use the time range decorators (documented here).
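If you do go with daily tables, a small helper that generates the table names for a date range makes it easier to see how many tables a query would span. This is only a sketch in Python: the prefix_YYYYMMDD naming scheme, the function name `daily_table_names`, and the source name are assumptions for illustration, not something BigQuery requires.

```python
from datetime import date, timedelta


def daily_table_names(prefix, start, end):
    """Return names like prefix_YYYYMMDD for every day from start to end inclusive."""
    days = (end - start).days
    # If end is before start, range() is empty and so is the result.
    return [f"{prefix}_{(start + timedelta(days=i)):%Y%m%d}" for i in range(days + 1)]


# Hypothetical source name and dates.
names = daily_table_names("source_a", date(2024, 1, 30), date(2024, 2, 1))
print(names)
print(len(names), "tables in range")
```

Checking the length of the list before building a query is a cheap way to notice when a date range would pull in hundreds of tables.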