BigQuery missing events when the intraday table converts to a regular table

For our project we use Firebase Analytics with the BigQuery integration to handle any events we might need. Aside from some front-end events, however, we also have some back-end events that fire once an operation is done, with no redirection to the front end.
When an event needs to be sent, I send it directly to BigQuery, specifically to today's intraday table, since that's where it would end up anyway. I have around 3 back-end events handled this way.
The problem is that 2 of my back-end events go missing when the intraday table converts to a regular table the next day. I know this for sure: yesterday, near the end of the day, I quickly wrote a query to count each of the events, and it returned results. But when I checked the regular table the next day (using the same query, only changing which table I select from), every single instance of those 2 back-end events was missing, leaving me with no data for them.
Any thoughts why this might be happening?
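To make the comparison described above repeatable, the sketch below builds the same per-event count query for either table of a given day. It assumes the standard Firebase export naming (`events_intraday_YYYYMMDD` for the streaming table, `events_YYYYMMDD` for the daily table); the helper names are hypothetical and the dataset name in the usage line is a placeholder.

```python
from datetime import date


def events_table(dataset: str, day: date, intraday: bool) -> str:
    """Return the backtick-quoted export table for one day.

    Firebase names the streaming table events_intraday_YYYYMMDD and the
    daily table events_YYYYMMDD (dataset name is caller-supplied).
    """
    prefix = "events_intraday_" if intraday else "events_"
    return f"`{dataset}.{prefix}{day.strftime('%Y%m%d')}`"


def event_count_query(dataset: str, day: date, intraday: bool) -> str:
    """Build the per-event count query, changing only the table it reads."""
    return (
        "SELECT event_name, COUNT(*) AS n\n"
        f"FROM {events_table(dataset, day, intraday)}\n"
        "GROUP BY event_name\n"
        "ORDER BY event_name"
    )
```

For example, `event_count_query("analytics_123", date(2020, 1, 2), intraday=True)` targets the intraday table, and `intraday=False` targets the daily table for the same day, so the two result sets can be compared event by event.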

Related

Fields calculated after the webhook

I use the Podio API to create an item. In the form I have a few calculations. When I retrieve the item using the API immediately after its creation, the fields are not calculated yet. The calculation is asynchronous, so that makes sense.
When I use a create hook and fetch the item based on the hook, the calculated fields are there.
Does anybody know if I can depend on this, i.e. is the create hook fired after the fields are calculated?
Yes, the JavaScript calculations are asynchronous.
Also related: MongoDB (which Podio uses on the back end) is "eventually consistent".
I faced this same problem and ended up building a queueing system for my incoming webhooks, where I waited 30 seconds before retrieving any record from Podio, so that our local reporting database would cache the updated values.
Also related to MongoDB's asynchronicity: if you are using Globiflow to trigger updates in related tables using a JavaScript-calculated field from the parent table, I found there were occasionally incorrect values.
I solved it by adding a 30-second delay in the Globiflow script before updating the related app/table with calculated fields from the parent app/table. This gave JavaScript enough time to calculate and MongoDB enough time to save the calculated value.
https://www.globiflow.com/help/wait-delay.php
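The 30-second wait described above can be sketched as a small in-process delay queue. This is a minimal illustration, not the answerer's actual system: `DelayedWebhookQueue` is a hypothetical name, and a production version would more likely sit on a persistent queue so webhooks survive a restart.

```python
import heapq
import itertools


class DelayedWebhookQueue:
    """Hold incoming webhook payloads and release each one only after a fixed delay."""

    def __init__(self, delay_seconds: float = 30.0):
        self.delay_seconds = delay_seconds
        # Heap entries: (ready_at, insertion order, payload)
        self._heap: list[tuple[float, int, dict]] = []
        self._counter = itertools.count()  # tie-breaker so payload dicts are never compared

    def push(self, payload: dict, received_at: float) -> None:
        """Schedule a payload to become available delay_seconds after it arrived."""
        ready_at = received_at + self.delay_seconds
        heapq.heappush(self._heap, (ready_at, next(self._counter), payload))

    def pop_ready(self, now: float) -> list[dict]:
        """Remove and return every payload whose delay has elapsed, oldest first."""
        ready = []
        while self._heap and self._heap[0][0] <= now:
            ready.append(heapq.heappop(self._heap)[2])
        return ready
```

A worker would call `pop_ready(time.time())` periodically and only then fetch each item from Podio, by which point the asynchronous calculations have normally finished.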

Getting calculated result from BigQuery immediately after stream inserting data?

I have a Firestore-based app where users can vote on various questions.
The app then saves each vote to a Firestore collection.
At the same time, we want to display real-time (live-updating) voting results to the users. These results should be saved in a Firestore document that the clients can subscribe to.
These voting results can be based on complex queries that we need to run across all collected votes.
BigQuery seems to be ideal for these queries.
Therefore we want a trigger that stream-inserts every newly created vote document from Firestore into BigQuery.
After inserting into BigQuery, we want to run the specific query related to the category of the vote, and save the result into the corresponding voting result document, so the user clients will be updated.
But from what I have read, you cannot count on immediate results from BigQuery after a stream insert is accepted; it can take several seconds before the row appears in a query.
We can live with that delay, but we need a way to run the query, and save the result in Firestore, only after the row is actually available in BigQuery.
What is the recommended approach for this, and are there any other tools that can help us out?
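One way to express the "run the query only once the row is visible" step is to poll with exponential backoff after the stream insert. This is a sketch under assumptions, not a documented Google pattern: `insert_row`, `row_is_visible`, `run_category_query` and `save_result` are hypothetical callables standing in for the BigQuery stream insert, a lookup of the vote by its ID, the per-category aggregation query, and the Firestore write.

```python
import time
from typing import Callable


def wait_until_visible(
    row_is_visible: Callable[[], bool],
    timeout_s: float = 60.0,
    first_delay_s: float = 1.0,
    max_delay_s: float = 8.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll with exponential backoff until row_is_visible() is True.

    Returns False if timeout_s of total waiting passes first.
    """
    delay = first_delay_s
    waited = 0.0
    while True:
        if row_is_visible():
            return True
        if waited >= timeout_s:
            return False
        step = min(delay, timeout_s - waited)  # never sleep past the timeout
        sleep(step)
        waited += step
        delay = min(delay * 2, max_delay_s)


def on_vote_created(vote, insert_row, row_is_visible, run_category_query,
                    save_result, sleep=time.sleep) -> bool:
    """Hypothetical trigger body: insert, wait for visibility, then aggregate."""
    insert_row(vote)
    if not wait_until_visible(lambda: row_is_visible(vote["id"]), sleep=sleep):
        return False  # give up; a retrying trigger could try again later
    save_result(vote["category"], run_category_query(vote["category"]))
    return True
```

Injecting `sleep` keeps the backoff testable; in a Cloud Function the default `time.sleep` would apply, and the function timeout bounds how long it can wait.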

BigQuery Google Analytics Export Processing Time Management

Our company has many scheduled reports in BigQuery that generate aggregation tables of Google Analytics data. Because we cannot control when Google Analytics data is imported into our BigQuery environment, we keep getting days with no data.
This means we then have to manually run the data for missing days.
I have edited my scheduled query to keep pushing back the time of day it runs, but it now runs around 8 AM. These queries feed reports for stakeholders, and the stakeholders are requesting them earlier. Is there any way to guarantee when the Google Analytics export to BigQuery finishes processing?
You could also consider a scheduled query solution that reruns at a later time if the requested table isn't available yet.
You can't currently add a conditional trigger to a BigQuery scheduled query.
You could manually add a fail-safe to your query that checks for yesterday's table, using a combination of the code below and DATE_SUB(CURRENT_DATE(), INTERVAL 1 DAY):
SELECT
  MAX(FORMAT_TIMESTAMP('%F %T', TIMESTAMP(PARSE_DATE('%Y%m%d',
    REGEXP_EXTRACT(_TABLE_SUFFIX, r'^\d\d\d\d\d\d\d\d')))))
FROM `DATASET.ga_sessions_*` AS ga_sessions
Obviously this will fail if the conditions are not met and will not retry, which I understand is not an advancement on your current setup.
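The "rerun later if the table isn't available yet" idea can also live outside BigQuery: a small job, run every half hour or so, checks whether yesterday's `ga_sessions_YYYYMMDD` table exists and only then launches the report queries. The sketch below assumes you already have the table suffixes (for example, the table IDs returned by the BigQuery client's `list_tables`, with the `ga_sessions_` prefix removed); the function names are hypothetical.

```python
import re
from datetime import date, timedelta
from typing import Iterable, Optional

# Daily tables only: ga_sessions_YYYYMMDD. Suffixes such as
# "intraday_20200103" (from ga_sessions_intraday_*) do not match.
DAILY_SUFFIX = re.compile(r"^\d{8}$")


def latest_daily_table(suffixes: Iterable[str]) -> Optional[date]:
    """Return the newest date among ga_sessions_YYYYMMDD suffixes, or None."""
    days = [
        date(int(s[:4]), int(s[4:6]), int(s[6:8]))
        for s in suffixes
        if DAILY_SUFFIX.match(s)
    ]
    return max(days, default=None)


def yesterday_is_ready(suffixes: Iterable[str], today: date) -> bool:
    """True once the daily table for (today - 1 day) has landed."""
    latest = latest_daily_table(suffixes)
    return latest is not None and latest >= today - timedelta(days=1)
```

Unlike a single scheduled query, the job simply exits when `yesterday_is_ready` is False and tries again on its next run, which gives the retry behavior scheduled queries lack.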
I've encountered this many times in the past and eventually had to move my data pipelines to another solution, as scheduled queries are still quite simplistic.
I would recommend you take a look at CRMint for simple pipelines into BigQuery:
https://github.com/google/crmint
If you still find this too simplistic, you should look at Google Cloud Composer, where you can check that a table exists before running a particular job in a pipeline.

How to store weekly data from Google Analytics

I have some simple weekly aggregates from Google Analytics that I'd like to store somewhere. The reason for storing them is that if I run a query against too much data in Google Analytics, the results become sampled, and I want them to be totally accurate.
What is the best way to solve this?
My thoughts are:
1) Write a process in BigQuery to append the data to a permanent dataset each week
2) Use an API that fetches the data each week and stores it in a Google spreadsheet (appending a row each time)
What is the best recommendation for my problem - and how do I go about executing it?
Checking your previous questions, we see that you already use BigQuery.
When you query the Google Analytics tables in BigQuery, the results are not sampled, because those tables contain all the data. There is no need to store the aggregates, since you can query them every time you need them.
If you do want to store them, and pay for the additional table, you can go ahead and write the results to a destination table.
If you want quick access, try creating a view.
I suggest the following:
1) Make a roll-up table for your weekly data. You can do that either by writing a query for it and running it manually, or with a script in a Google Spreadsheet that runs the same query (using the API) and is scheduled to run every week. I tried a bunch of the tutorials out there and this one is the simplest to implement.
2) Depending on the data points you want, you can even use the Google Analytics API without going through BigQuery for this request; try pulling this report of yours from here. If it works, there are a bunch of Google Sheets extensions that can make it a lot quicker to set up a weekly report, or you can just code it yourself.
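The roll-up idea in option 1 can be sketched in plain Python: sum daily values into ISO weeks, then append only completed weeks that are not stored yet, so re-running the weekly job never creates duplicate rows. `stored` is a stand-in for the permanent table or spreadsheet rows, and both function names are hypothetical.

```python
from collections import defaultdict
from datetime import date


def weekly_rollup(daily: dict[date, int]) -> dict[tuple[int, int], int]:
    """Sum daily values into (ISO year, ISO week) buckets."""
    weeks: dict[tuple[int, int], int] = defaultdict(int)
    for day, value in daily.items():
        iso_year, iso_week, _ = day.isocalendar()
        weeks[(iso_year, iso_week)] += value
    return dict(weeks)


def append_new_weeks(stored: list, daily: dict[date, int],
                     current_week: tuple[int, int]) -> list:
    """Append each completed week not already stored (safe to re-run)."""
    known = {week for week, _ in stored}
    for week, total in sorted(weekly_rollup(daily).items()):
        # Skip the week still in progress so its partial total is never frozen.
        if week < current_week and week not in known:
            stored.append((week, total))
    return stored
```

Making the append idempotent matters more than the aggregation itself: a scheduled job that fails halfway or runs twice leaves the stored history unchanged.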
Would that work for you?
Thanks!

Rest philosophy for updating and getting records

In my app I'm displaying Race objects that essentially have three states: pending, inProgress and completed. I want to display all Races that are currently pending or inProgress, but not the ones that are completed. To do this, I want to create a RESTful API for getting these resources from my server, but I'm not sure what the best (i.e. most RESTful) approach would be.
The issue is that when someone opens or refreshes the app, I need to do two things:
Perform a GET on all the Races that are currently displayed in the client to update their status.
GET all of the new pending or inProgress Races that have been created since the client last updated.
I've come up with a few different solutions, though I don't know which, if any, would be best:
Simply delete the old Race records on the client and always GET all new records
Perform 2 separate GET operations, the first which updates all the old records, and the second where I GET all the new pending / inProgress Races
Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
To me, this seems like a pretty common scenario but I haven't been able to find a specific answer to this type of problem. I'd like to see what SO thinks :)
Thanks in advance for your help!
Simply delete the old Race records on the client and always GET all new records
This is probably the easiest solution. However you shouldn't do that if you need a very smooth update on your client (for games, data visualization, etc.).
Perform 2 separate GET operations (...) / Perform a single GET operation where I specify the created date of the last client record, and GET all records that are newer.
I would definitely do it with a single operation. Rather than an update timestamp (timestamp operations are costly, and several operations could happen at the same time), I would use a sequence number. This is how CouchDB handles "changes".
Moreover, as you will see in the documentation, this solution can later be upgraded to asynchronous notifications (if you need them).
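A minimal sketch of the sequence-number approach, loosely modeled on CouchDB's `_changes` feed (which accepts a `since` parameter and returns a `last_seq`). `RaceStore`, the `GET /races?since=<seq>` endpoint it stands for, and `apply_changes` are hypothetical; a real server would assign sequence numbers inside the database write rather than with an in-memory counter.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class RaceStore:
    """In-memory stand-in for the server: every write gets the next sequence number."""
    _seq: itertools.count = field(default_factory=lambda: itertools.count(1))
    races: dict = field(default_factory=dict)  # race_id -> (seq, status)

    def save(self, race_id: str, status: str) -> int:
        seq = next(self._seq)
        self.races[race_id] = (seq, status)
        return seq

    def changes_since(self, since: int) -> dict:
        """Body for GET /races?since=<seq>: every race written after `since`."""
        changed = {
            race_id: status
            for race_id, (seq, status) in self.races.items()
            if seq > since
        }
        last_seq = max((seq for seq, _ in self.races.values()), default=since)
        return {"races": changed, "last_seq": max(last_seq, since)}


def apply_changes(local: dict, response: dict) -> int:
    """Client side: update shown races, drop completed ones, return the new cursor."""
    for race_id, status in response["races"].items():
        if status == "completed":
            local.pop(race_id, None)
        else:
            local[race_id] = status
    return response["last_seq"]
```

The client keeps `last_seq` from each response and sends it as `since` on the next refresh, so one GET both updates the races it already displays and adds newly created ones.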