BigQuery - Why is UNNEST operator not required for pulling transactions data in Google Analytics? - google-bigquery

SELECT SUM(totals.totalTransactionRevenue)
FROM `bigquery-public-data.google_analytics_sample.ga_sessions_*`
WHERE _TABLE_SUFFIX BETWEEN '20170701' AND '20170701';
Transactions are a product level scope and one session can have multiple transactions, so a session could hold an array of transactions. In that case, why is the UNNEST operator not required to run this query?
Thanks.

That's because the field you selected is a simple INTEGER field, not an ARRAY. You can check this in the Schema tab of the table: if a field is part of an ARRAY, it (or its parent field) has mode REPEATED, shown as "RECORD - REPEATED" for nested records such as hits.
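To make the Schema-tab check concrete, here is a small Python sketch that walks a schema written in BigQuery's JSON schema format (name, type, mode, fields) and reports whether reaching a field crosses a REPEATED column, which is when UNNEST is needed. The schema is a trimmed, hand-written excerpt assumed to match ga_sessions_*, and `needs_unnest` is a hypothetical helper, not a BigQuery API.

```python
# Trimmed, hand-written excerpt of the ga_sessions_* schema in BigQuery's
# JSON schema format (name / type / mode / fields). Assumed, not exhaustive.
GA_SCHEMA_EXCERPT = [
    {"name": "visitId", "type": "INTEGER", "mode": "NULLABLE"},
    {"name": "totals", "type": "RECORD", "mode": "NULLABLE", "fields": [
        {"name": "transactions", "type": "INTEGER", "mode": "NULLABLE"},
        {"name": "totalTransactionRevenue", "type": "INTEGER", "mode": "NULLABLE"},
    ]},
    {"name": "hits", "type": "RECORD", "mode": "REPEATED", "fields": [
        {"name": "type", "type": "STRING", "mode": "NULLABLE"},
        {"name": "transaction", "type": "RECORD", "mode": "NULLABLE", "fields": [
            {"name": "transactionId", "type": "STRING", "mode": "NULLABLE"},
        ]},
        {"name": "product", "type": "RECORD", "mode": "REPEATED", "fields": [
            {"name": "productSKU", "type": "STRING", "mode": "NULLABLE"},
        ]},
    ]},
]


def needs_unnest(schema, path):
    """Hypothetical helper: True if any field on the dotted path is REPEATED."""
    fields = schema
    for part in path.split("."):
        field = {f["name"]: f for f in fields}[part]  # KeyError if the path is wrong
        if field.get("mode") == "REPEATED":
            return True
        fields = field.get("fields", [])
    return False


print(needs_unnest(GA_SCHEMA_EXCERPT, "totals.totalTransactionRevenue"))  # False
print(needs_unnest(GA_SCHEMA_EXCERPT, "hits.transaction.transactionId"))  # True
```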

In your example you're summing revenue with SUM(totals.totalTransactionRevenue), not counting transactions. Note that totals contains session-level aggregated data (integers, as the previous answer explains) that has already been rolled up for you; that's why you don't need UNNEST to read a field under totals.
You are correct that if you want to query product-level data that hasn't already been aggregated in totals (for example, all of your transaction IDs from yesterday), then you need to UNNEST.
Also note that when you UNNEST, each session's totals values are repeated on every unnested row, so be careful when using both in the same query, as you could end up double counting.
This previous answer explains this further, with some examples:
Unnest and totals.timeOnSite (BigQuery and Google Analytics data)
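A toy model in plain Python of the two points above: reading totals needs no UNNEST because there is exactly one totals record per session, while unnesting hits repeats each session's totals value on every hit row. The sessions and revenue numbers are made up for illustration.

```python
# Two hypothetical sessions. `totals` is one pre-aggregated record per session;
# `hits` is a repeated record (an array), as in the GA export.
sessions = [
    {"visitId": 1, "totals": {"totalTransactionRevenue": 50},
     "hits": [{"type": "PAGE"}, {"type": "TRANSACTION"}, {"type": "EVENT"}]},
    {"visitId": 2, "totals": {"totalTransactionRevenue": 30},
     "hits": [{"type": "PAGE"}, {"type": "TRANSACTION"}]},
]

# SELECT SUM(totals.totalTransactionRevenue): one row per session, no UNNEST.
revenue = sum(s["totals"]["totalTransactionRevenue"] for s in sessions)

# FROM sessions, UNNEST(hits): one row per (session, hit), so the
# session-level totals value appears once per hit.
unnested_rows = [(s, h) for s in sessions for h in s["hits"]]
double_counted = sum(s["totals"]["totalTransactionRevenue"] for s, _ in unnested_rows)

print(revenue)         # 80
print(double_counted)  # 210 (50 * 3 hits + 30 * 2 hits)
```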

Related

Insert zeros instead of interpolating in ARIMA_PLUS (BigQuery)

I want to do ARIMA_PLUS forecasting on a series of sales records. The problem is that the records only contain sales. For the forecast we need to insert, for every product, the "non-sales": rows with the amount column set to zero for every day the product was not sold. We have two options:
Fill the database with those zero-rows (uses a lot of space)
When forecasting with ARIMA_PLUS in BigQuery, tell the model to fill with zeros instead of interpolating (interpolation is the default and seemingly the only option).
I want to follow the second option, yet I don't see how. Here is a screenshot of Google's documentation about interpolation:
The first option would be carried out with a MERGE; nevertheless, I would prefer to discard it, since it increases the size of the sales table.
I have scanned the documentation and haven't seen any solution.
You need to provide an input dataset covering the missing values with the right method for your use case.
In other words, the SQL query must solve the interpolation so that the input for the model already contains the expected data.
You can, for example, write a query that adds a linear interpolation suited to your use case.
So the first approach you mentioned can be handled in that input SQL (rather than by adding the data to the source table), and the second approach is not possible in BigQuery, as far as I know.
Here you have an example: https://justrocketscience.com/post/interpolation_sql/
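As a rough sketch of doing the zero-filling in the input query rather than in the stored table, the snippet below uses SQLite (through Python's sqlite3 module) so it runs anywhere: a calendar CTE is cross-joined with the product list, LEFT JOINed back to the sales, and COALESCE turns missing days into 0. The table and column names are hypothetical. In BigQuery the calendar would more likely come from UNNEST(GENERATE_DATE_ARRAY(...)), and the resulting SELECT would be the query the ARIMA_PLUS model is trained on; that variant is not shown here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE sales (product TEXT, sale_date TEXT, amount REAL);
INSERT INTO sales VALUES
  ('A', '2023-01-01', 10), ('A', '2023-01-03', 5),
  ('B', '2023-01-02', 7);
""")

# Dense series built at query time: every product x every day in range.
# Days without a sale get amount 0; the sales table itself is unchanged.
DENSE_QUERY = """
WITH RECURSIVE days(d) AS (
  SELECT MIN(sale_date) FROM sales
  UNION ALL
  SELECT date(d, '+1 day') FROM days
  WHERE d < (SELECT MAX(sale_date) FROM sales)
),
products AS (SELECT DISTINCT product FROM sales)
SELECT p.product, days.d AS sale_date, COALESCE(SUM(s.amount), 0) AS amount
FROM products AS p
CROSS JOIN days
LEFT JOIN sales AS s ON s.product = p.product AND s.sale_date = days.d
GROUP BY p.product, days.d
ORDER BY p.product, days.d
"""
rows = con.execute(DENSE_QUERY).fetchall()
for row in rows:
    print(row)
```

Because the zeros only exist in the query result, storage stays the same; the trade-off is that the calendar is recomputed every time the model input is read.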

Daily Retention with Filter in BigQuery

I am using a query to calculate daily retention on my Firebase Analytics data exported to BigQuery. It is working well and the numbers match with the numbers in Firebase, but when I try to filter the query by a cohort of users, the numbers don't add up.
I want to compare the results of an A/B test from Firebase, and so I've looked at the user_property "firebase_exp_2" which is my A/B test, and I've split up the users in each group (0/1). The retention numbers do not match (at all) the numbers that I can see in my A/B test results in Firebase - actually they show the opposite pattern.
The query is adapted from here: https://github.com/sagishporer/big-query-queries-for-firebase/wiki/Query:-Daily-retention
All I've changed is adding the following under the "WHERE" clause:
WHERE
event_name = 'user_engagement' AND user_pseudo_id IN
(SELECT user_pseudo_id
FROM `analytics_XXX.events_*`,
UNNEST (user_properties) user_properties
WHERE user_properties.key = 'firebase_exp_2' AND user_properties.value.string_value='1')
Firebase says that there are 6,043 users in the Control group and 6,127 in the Variant A group, but my numbers are 5,632 and 5,730, and the retained users are around 1,000 users more than what Firebase reports.
What am I doing wrong?
The export to BigQuery happens on a daily basis and each imported table is named events_YYYYMMDD. Additionally, a table is imported for events received throughout the current day. This table is named events_intraday_YYYYMMDD.
The additions you made query events_*, which is fine. The example, though, uses events_201812*, which ignores the intraday table. That would explain why your numbers are lower: you are missing users who were added to the A/B test during the current day.
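A BigQuery wildcard table matches every table whose name starts with the text before the trailing *. The Python sketch below mimics that prefix rule on hypothetical table names, to show why events_201812* skips the intraday table while events_* picks it up.

```python
# Hypothetical tables in a Firebase export dataset on 2018-12-15.
TABLES = [
    "events_20181213",
    "events_20181214",
    "events_intraday_20181215",  # today's events, still being filled
]


def matching(pattern):
    """Approximate BigQuery's wildcard rule: '*' may only be trailing,
    and it matches any table whose name starts with the prefix."""
    prefix = pattern.rstrip("*")
    return [t for t in TABLES if t.startswith(prefix)]


print(matching("events_201812*"))  # ['events_20181213', 'events_20181214']
print(matching("events_*"))        # also includes 'events_intraday_20181215'
```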

Tableau count values after a GROUP BY in SQL

I'm using Tableau to show some schools data.
My data is a table with all the school classes in the country. I need to count, for example, how many schools have both Primary and Preschool.
A simplified version of my table should look like this:
In that table, the result for this example should be 1, because only one school has both Primary and Preschool.
I want to have a multiple filter in Tableau that gives me that information.
I was thinking about the SQL query that would do this, and it needs a GROUP BY statement. An example query is in this fiddle: Database example query
In the SQL query I group by id all the schools that meet either of the conditions inside the IN(...) and then count how many of them meet both (c=2).
Is there a way to do something like this in Tableau, either using groups or sets, advanced filters, or a RAW SQL calculated field?
Thanks!
Dubafek
PS: Here is a link to my question on Tableau's forum, where you can download my test workbook: Tableau's forum question
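The GROUP BY idea from the question can be checked with a small SQLite example run from Python. Because the table lists classes, a school can appear several times with the same type, so this sketch counts distinct types in HAVING. The table and data are hypothetical stand-ins for the fiddle, which isn't reproduced here.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
CREATE TABLE classes (school_id INTEGER, school_type TEXT);
INSERT INTO classes VALUES
  (1, 'Primary'), (1, 'Primary'), (1, 'Preschool'),
  (2, 'Primary'), (2, 'Secondary'),
  (3, 'Preschool'),
  (4, 'Primary'), (4, 'Primary');
""")


def schools(having):
    # `having` is one of the two fixed expressions below, never user input.
    sql = f"""
        SELECT school_id FROM classes
        WHERE school_type IN ('Primary', 'Preschool')
        GROUP BY school_id
        HAVING {having} = 2
        ORDER BY school_id
    """
    return [row[0] for row in con.execute(sql)]


naive = schools("COUNT(*)")                    # [4]: two Primary classes, no Preschool
both = schools("COUNT(DISTINCT school_type)")  # [1]: really has both types
print(len(both))  # 1
```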
I've solved the issue using LODs (specifically INCLUDE and EXCLUDE statements).
I created two calculated fields having the aggregation I needed:
Then I made a calculated field that keeps only the School IDs whose number of types (according to the filtering) matches the number of types selected in the multiple filter (both of the fields shown above):
Finally, I used COUNTD([Condition]) to display the number of schools that have at least the selected School types.
Hope this helps someone with a similar issue.
PS: If someone wants the workbook with the solution, I've uploaded it in an answer on the Tableau forum.
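The filtering logic of this LOD solution, generalized to any set of selected types, can be sketched in Python: per school, count the distinct selected types it has, keep schools where that count equals the number of selected types, and count the remaining IDs. This illustrates the logic only, not Tableau syntax, and the data is hypothetical.

```python
from collections import defaultdict

# Hypothetical (school_id, school_type) rows, one per class.
CLASSES = [
    (1, "Primary"), (1, "Primary"), (1, "Preschool"),
    (2, "Primary"), (2, "Secondary"),
    (3, "Preschool"),
    (4, "Primary"), (4, "Primary"),
]


def count_schools_with_all(classes, selected):
    """Count schools having at least one class of every selected type."""
    selected = set(selected)
    types_per_school = defaultdict(set)
    for school_id, school_type in classes:
        if school_type in selected:  # same role as the multiple-value filter
            types_per_school[school_id].add(school_type)
    # Keep schools whose number of matching types equals the number selected,
    # then count the remaining distinct IDs (the COUNTD step).
    return sum(1 for types in types_per_school.values() if len(types) == len(selected))


print(count_schools_with_all(CLASSES, {"Primary", "Preschool"}))  # 1
print(count_schools_with_all(CLASSES, {"Primary"}))               # 3
```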

BigQuery Google Analytics sessionsWithEvent metric

I'm having trouble creating a BigQuery query that will allow for me to fetch the Google Analytics ga:sessionsWithEvent metric.
This is what I tried:
SELECT
EXACT_COUNT_DISTINCT(concat(fullvisitorid, string(visitid))) AS distinctVisitIds
FROM
(TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2016-11-30'), TIMESTAMP('2016-12-26')))
WHERE
hits.type='EVENT'
The logic in the query above seems sound: get all the rows that have a hits.type of 'EVENT' and take the exact count of distinct fullVisitorId/visitId pairs, i.e. the number of unique sessions with an event.
But the numbers I get are close to, but higher than, what I get from the Query Explorer.
Thank you.
EDIT: Addressing comment below to use wider date range with date filter
With date range +-5 days, this makes the query
SELECT
EXACT_COUNT_DISTINCT(concat(fullvisitorid, string(visitid))) AS distinctVisitIds
FROM
(TABLE_DATE_RANGE([xxxxxxxx.ga_sessions_], TIMESTAMP('2016-11-25'), TIMESTAMP('2016-12-31')))
WHERE
hits.type='EVENT'
AND ('20161130'<=date AND date<='20161226')
Unfortunately I still get the same number
Don't rely on the table dates: a table for a later day can still contain data for previous days. Instead, use a wider date range in FROM and the exact date range on the date column.
Also, AFAIK the Query Explorer does approximations.
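A Python sketch of the answer's suggestion: read a wider range of daily tables, filter on the date column itself, and count distinct (fullVisitorId, visitId) pairs that have an EVENT hit. The tables and rows are hypothetical, including a row stored in a later day's table, which is the situation the answer describes.

```python
# Hypothetical daily tables: suffix -> hit rows (fullVisitorId, visitId, date, hitType).
# The row in table 20161227 carries date 20161226 (data stored in a later day's table).
TABLES = {
    "20161129": [("a", 1, "20161129", "EVENT")],
    "20161130": [("b", 2, "20161130", "EVENT"), ("b", 2, "20161130", "PAGE")],
    "20161215": [("d", 4, "20161215", "PAGE")],
    "20161227": [("c", 3, "20161226", "EVENT")],
}


def sessions_with_event(table_from, table_to, date_from=None, date_to=None):
    """Distinct (fullVisitorId, visitId) pairs with at least one EVENT hit."""
    sessions = set()
    for suffix, rows in TABLES.items():
        if not table_from <= suffix <= table_to:
            continue  # like TABLE_DATE_RANGE: only read tables in range
        for visitor, visit, date, hit_type in rows:
            if hit_type != "EVENT":
                continue
            if date_from and not date_from <= date <= date_to:
                continue  # exact range on the date column
            sessions.add((visitor, visit))
    return len(sessions)


# Table range only: misses the 20161226 session stored in the 20161227 table.
print(sessions_with_event("20161130", "20161226"))  # 1
# Wider table range plus an exact filter on the date column.
print(sessions_with_event("20161125", "20161231", "20161130", "20161226"))  # 2
```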

How to get a count of Distinct Dimension values in an SSAS MDX Query

I am trying to write an MDX query to return some information about survey questions. I want Average response and total responses in my results. I have two types of questions. One type of question has a single response. Another type of question can have multiple responses (Pick all that apply). Each question is tied to a question ID and a respondent ID. The following query works (somewhat)
Select NON EMPTY
{
[Measures].[Average Response], [Measures].[Total Count]
} ON 0
, NON EMPTY
{
([Question].[Question ID].[Question ID].ALLMEMBERS)
} ON 1
From [Cube]
Average Response is a combination from both single responses and multiple responses (two different fact tables). The total count is also a combination of the two tables. The problem is that for single response questions, I can just count the number of respondents. For multi response questions that falls down as I can have way more responses than I do people taking the survey. I really want to know how many people provided an answer. To do this, I think I need the distinct count of respondent IDs. So I tried changing my first axis to this.
[Measures].[Average Response], [Measures].[Total Count], DISTINCTCOUNT([Respondent].[Respondent ID])
Well, that doesn't work, and I really didn't expect it to. I got "The function expects a tuple set expression for the 3 argument. A string or numeric expression was used." which is rapidly becoming my favorite SSAS error message. I am still green at this and I guess I am still thinking in SQL. How can I get an average of the responses and a count of the distinct dimension values in the same query? BTW, my query does have a slicer and I could provide it if needed, but I don't think it is relevant, as I get the same problems with or without the slicer.
When working with the MDX DistinctCount function, it returns the count of distinct, non-empty tuples of a given set of data. Perhaps try doing something like
DistinctCount({[Respondent].[Respondent ID].members * [Measures].[Total Count]})
so that way you are working with a set (e.g. {...}) of data.
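To illustrate the difference between Total Count and a distinct count of respondents for a "pick all that apply" question, here is a plain-Python sketch over hypothetical fact rows. It computes, per question, the number the question is after: respondents with at least one response row. It does not model MDX or SSAS itself.

```python
from collections import defaultdict

# Hypothetical fact rows: (question_id, respondent_id, response_value).
# Q2 is "pick all that apply", so one respondent can contribute several rows.
responses = [
    ("Q1", "r1", 4), ("Q1", "r2", 5),
    ("Q2", "r1", 1), ("Q2", "r1", 3), ("Q2", "r2", 2), ("Q2", "r2", 3), ("Q2", "r2", 5),
]

total_count = defaultdict(int)
respondents = defaultdict(set)
for question, respondent, _ in responses:
    total_count[question] += 1           # counts fact rows (responses)
    respondents[question].add(respondent)  # counts people

# question -> (total responses, distinct respondents)
summary = {q: (total_count[q], len(respondents[q])) for q in total_count}
print(summary)  # {'Q1': (2, 2), 'Q2': (5, 2)}
```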
If you're working with a larger set of data, you may want to consider creating a Distinct Count measure. The DistinctCount function is evaluated by the SSAS Formula Engine, while a Distinct Count measure allows Analysis Services to use both the Storage Engine and the Formula Engine. For more information, please refer to Analysis Services Distinct Count Optimization.