I have an issue setting up an AppsFlyer Cost ETL with Google BigQuery. We receive parquet files each day.
The issue is the following: each day's file contains 10 dates.
The problem is that each day's file contains 6 dates that should overwrite the data from yesterday's file. The task is to set up data transfers or scheduled queries that override the data for every date present in the newer file, so that data for a long period accumulates in one table.
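The overwrite-by-date behaviour described above can be sketched in plain Python to make the intended semantics concrete. This is only an illustration, not a BigQuery feature: the function name, the `date` and `cost` fields, and the sample rows are all made up for the example.

```python
from datetime import date


def replace_overlapping_dates(existing, incoming, date_key="date"):
    """Drop every existing row whose date appears in the incoming file,
    then add all incoming rows. Dates absent from the new file are kept."""
    incoming_dates = {row[date_key] for row in incoming}
    kept = [row for row in existing if row[date_key] not in incoming_dates]
    return sorted(kept + incoming, key=lambda row: row[date_key])


# Hypothetical sample data: yesterday's table and today's file overlap on two dates.
table = [
    {"date": date(2023, 1, 1), "cost": 10.0},
    {"date": date(2023, 1, 2), "cost": 20.0},
    {"date": date(2023, 1, 3), "cost": 30.0},
]
new_file = [
    {"date": date(2023, 1, 2), "cost": 22.0},
    {"date": date(2023, 1, 3), "cost": 33.0},
    {"date": date(2023, 1, 4), "cost": 40.0},
]

merged = replace_overlapping_dates(table, new_file)
```

Here the row for 2023-01-01 survives untouched, the two overlapping dates take the newer values, and 2023-01-04 is added.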
I am using BigQuery to analyze FirebaseAnalytics events. I use events_intraday_ for real-time analysis and events_ for daily analysis, and the data is automatically transferred from events_intraday to events_ after a certain time, but some data will disappear at that time. The table exists, but the data is clearly reduced. About 2 days out of a week's data is lost here. Please tell me why this happens.
Thanks.
Data should not be lost when moved from events_intraday_ to events_.
A common problem that is easy to fix concerns how intraday is set up. Intraday collects the data from "today" in real time, so you first need to agree with Google BigQuery on what "today" refers to. BigQuery can't guess which timezone you want to query, which is why the event_timestamp column in BigQuery is a UNIX timestamp that is always in UTC. This post explains it clearly: Firebase BigQuery server offset time
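To illustrate why the definition of "today" matters, here is a minimal sketch that converts an event_timestamp (microseconds since the UNIX epoch, in UTC) into a calendar date for a given UTC offset. The function name and the UTC+9 offset are assumptions chosen for the example, not part of the export.

```python
from datetime import datetime, timedelta, timezone

EPOCH_UTC = datetime(1970, 1, 1, tzinfo=timezone.utc)


def local_event_date(event_timestamp_micros, utc_offset_hours):
    """Return the calendar date of an event as seen from a fixed UTC offset."""
    utc_dt = EPOCH_UTC + timedelta(microseconds=event_timestamp_micros)
    local_tz = timezone(timedelta(hours=utc_offset_hours))
    return utc_dt.astimezone(local_tz).date()


# An event logged at 2023-06-30 20:00 UTC (example value).
ts = int(datetime(2023, 6, 30, 20, 0, tzinfo=timezone.utc).timestamp()) * 1_000_000

date_in_utc = local_event_date(ts, 0)   # still June 30 in UTC
date_in_jst = local_event_date(ts, 9)   # already July 1 at UTC+9 (assumed offset)
```

The same event lands on different days depending on the offset, so a day-by-day count compared across timezones can look as if rows went missing.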
Also, I am not sure your last statement is correct. "events_intraday_" and "events_" are not quite the same thing: an "events_intraday_" table contains raw, unsampled event data for the current day, while the "events_" table contains processed and aggregated event data.
This processing happens after the data is collected but before it is exported to BigQuery, which means you should expect some differences between the two tables. Generally, the affected fields are traffic sources and linked marketing products (AdWords, Campaign Manager, etc.). If these are the areas you are looking at, it's probably a GA4 processing issue.
I get reports from a 3rd-party API on a daily basis and want to store the data in a BigQuery table. Each report includes data for the last 90 days, so each new report has new records for the new day but drops the records for the 91st day. My task is to keep data in BigQuery for a period longer than 90 days.
I tried to set up a BigQuery data transfer from Cloud Storage with the "Write preference" option "Mirror", and it seems that it just overwrites my old data with the new. If I change it to "Append", it adds the data from the new report to the old and creates duplicates.
Are there any ideas on how I can append only the new records to my table using BigQuery functionality? I can't believe that it's impossible.
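One possible approach, sketched here under assumptions rather than taken from the thread: load each report into a staging table that is fully overwritten, then run a two-statement BigQuery script that deletes the dates covered by the staging table from the long-lived table and inserts the staging rows. The table names, the `report_date` column, and the helper function below are all hypothetical, and `SELECT *` assumes both tables share the same column order.

```python
def build_refresh_script(target, staging, date_column="report_date"):
    """Build a delete-then-insert script: rows older than the report window
    stay in the target, rows inside the window are replaced."""
    return (
        f"DELETE FROM `{target}`\n"
        f"WHERE {date_column} IN (SELECT DISTINCT {date_column} FROM `{staging}`);\n"
        f"INSERT INTO `{target}`\n"
        f"SELECT * FROM `{staging}`;"
    )


# Hypothetical fully qualified table names.
script = build_refresh_script(
    "my_project.my_dataset.reports",
    "my_project.my_dataset.reports_staging",
)
print(script)
```

The generated text could be used as the body of a scheduled query that runs after the staging load, assuming the schedule is configured to run it as a multi-statement script.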
1. I have a Lambda function that runs monthly. It runs an Athena query and exports the results as a CSV file to my S3 bucket.
2. I have a QuickSight dashboard that uses this CSV file as a dataset and visualizes all the rows from the report.
Everything is good and working until here.
3. Every month I get a new CSV file in my S3 bucket, and I want to add a "Visual Type" to my main dashboard that shows the difference in % from the previous CSV file (the previous month).
For example:
My dashboard focuses on the collection of missing updates.
In May I see I have 50 missing updates.
In June I got a CSV file with 25 missing updates.
Now I want this reflected in my dashboard with a "Visual Type" showing that this month we reduced the number of missing updates by 50%.
And in July I get a file with 20 missing updates, so I want to see that we reduced it by 60% compared with May.
Any idea how i can do it?
I'm not sure I quite understand where you're standing, but I'll assume that you have an S3 manifest that points to an S3 directory and not a different manifest (and dataset) per each file.
If that's the case, you could tackle the comparison by creating a calculated field that uses periodOverPeriodPercentDifference.
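To make the arithmetic concrete, here is a small sketch using the numbers from the question. It is plain Python, not QuickSight syntax. Note that a period-over-period comparison (as I understand periodOverPeriodPercentDifference with its default offset) compares each month with the one before it, whereas the July figure in the question is measured against May, so both variants are computed below.

```python
def percent_difference(current, reference):
    """Percent change of current relative to reference (negative = reduction)."""
    return (current - reference) * 100.0 / reference


# Figures taken from the question.
missing_updates = {"May": 50, "June": 25, "July": 20}

months = list(missing_updates)
baseline = missing_updates[months[0]]

results = {}
for previous, current in zip(months, months[1:]):
    results[current] = {
        "vs_previous_month": percent_difference(
            missing_updates[current], missing_updates[previous]
        ),
        "vs_first_month": percent_difference(missing_updates[current], baseline),
    }

for month, values in results.items():
    print(month, values)
```

June comes out at -50% on both measures, while July is -20% against June but -60% against May, so the choice of reference period decides which number the visual shows.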
Hope this helps!
I have run some queries against BigQuery public datasets for weather. I have used both GSOD (bigquery-public-data.noaa_gsod.gsod2019) and GHCN (bigquery-public-data.ghcn_d). In both cases, the most recent data I get is from 4 or 5 days ago. Why is that? What can I do to get more recent historic data, e.g. from this morning or yesterday, by latitude and longitude?
Those historical datasets are provided by NOAA a few days delayed. For real-time weather, you have to use one of the commercial weather data providers in the Marketplace.
I am trying to upload data into a BigQuery partitioned table using Dataflow. I have successfully uploaded data on a daily basis and fetched it on a monthly basis using BigQuery, but my goal is to upload data on a monthly or yearly basis. Is there any way to do that using Dataflow?
You can have "monthly" partitions by using the date for the start of each month. For August, for example, you would store everything in the yourtable$20170801 partition. You would need to have some application-side logic to determine the appropriate $YYYYmmdd suffix for the table into which you are writing using Dataflow.
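The application-side logic mentioned above can be as small as the following sketch, which maps any date to the partition for the first day of its month. The function name is made up for illustration; how the resulting table name is passed to Dataflow depends on your pipeline and is not shown.

```python
from datetime import date


def monthly_partition_decorator(table, record_date):
    """Return the table name with a $YYYYmmdd suffix for the first day
    of the month that record_date falls in."""
    first_of_month = record_date.replace(day=1)
    return f"{table}${first_of_month:%Y%m%d}"


# Any date in August 2017 maps to the same partition.
target = monthly_partition_decorator("yourtable", date(2017, 8, 23))
print(target)
```

Because every date in a month collapses to the same suffix, all of August's rows end up in one partition.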