What is the meaning of event_params value for firebase conversion? - google-bigquery

I am researching Firebase conversions in BigQuery and still do not understand what a conversion means.
I ran this query to check the value of the 'firebase_conversion' key and saw that every value is 1.
Does this value mean that the event is marked as a conversion in Firebase?
SELECT event_name, event_params.value.int_value FROM [firebase-public-project:analytics_153293282.events_20181003] where event_params.key = "firebase_conversion"
Is anyone familiar with conversions?
Could you help explain how Firebase calculates the conversion rate, and how we could calculate it in BigQuery?

On top of the documentation rtenha mentioned, you can also find a specific Firebase in BigQuery section in [1]. It even has some SQL examples regarding Firebase data exploration with BigQuery.
As you say, a value of 1 in event_params.value.int_value indicates that the event is marked as a conversion, which is useful when counting events of that type.
To calculate the conversion rate, divide the number of USERS that performed a given conversion by the total number of USERS.
Here is an SQL example [2] that would:
1-create a table with a single cell: the total number of users in the desired time range
2-create a table with the number of users that performed each of the events marked as conversions
3-select, for each type of event, the ratio of users that performed that conversion to the total number of users
I hope this finds you well!
[1] https://support.google.com/firebase/answer/9037342?hl=en&ref_topic=7029512
[2]
WITH t_e AS (
  -- total number of distinct users in the period
  SELECT COUNT(DISTINCT user_id) AS total_users
  FROM table_of_events
  WHERE event_timestamp >
          UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 DAY))
    AND _TABLE_SUFFIX BETWEEN '20180501' AND '20180511'
),
t_c AS (
  -- distinct users per event that is marked as a conversion
  SELECT event_name, COUNT(DISTINCT user_id) AS total_conversions
  FROM table_of_events, UNNEST(event_params) AS ep
  WHERE ep.key = 'firebase_conversion'
    AND event_timestamp >
          UNIX_MICROS(TIMESTAMP_SUB(CURRENT_TIMESTAMP(), INTERVAL 10 DAY))
    AND _TABLE_SUFFIX BETWEEN '20180501' AND '20180511'
  GROUP BY event_name
)
SELECT event_name, t_c.total_conversions / t_e.total_users AS conversion_rate
FROM t_c, t_e
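As a quick sanity check of the arithmetic described above (users who triggered a conversion event divided by all users), here is a minimal Python sketch on made-up in-memory rows. The row layout, the `conversion_rates` helper, and the sample data are assumptions for illustration only; they are not the Firebase export schema.

```python
from collections import defaultdict

# Hypothetical, flattened events: (user_id, event_name, is_conversion).
# In the real export the flag lives in event_params under the key
# 'firebase_conversion'; here it is simplified to a boolean column.
EVENTS = [
    ("u1", "first_open", True),
    ("u1", "level_up", False),
    ("u2", "first_open", True),
    ("u2", "in_app_purchase", True),
    ("u3", "level_up", False),
    ("u4", "first_open", True),
]


def conversion_rates(events):
    """Return {event_name: converting_users / total_users}."""
    # Denominator: every distinct user seen in the period.
    all_users = {user for user, _, _ in events}
    # Numerator: distinct users per conversion event.
    users_per_event = defaultdict(set)
    for user, name, is_conversion in events:
        if is_conversion:
            users_per_event[name].add(user)
    return {name: len(users) / len(all_users)
            for name, users in users_per_event.items()}


rates = conversion_rates(EVENTS)
for name, rate in sorted(rates.items()):
    print(f"{name}: {rate:.2f}")
```

Counting distinct users rather than events keeps a user who fires the same conversion several times from inflating the rate.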

Related

Grafana User Growth Time Series with SQL Server

I'm trying to display user growth per day using a Grafana time series with SQL Server. However, I found the documentation unhelpful and my queries are incorrect.
The following returns a constant value of 1 for every day. What do I need to change to display the number of new users created per day?
Thank you very much in advance.
SELECT
$__timeGroup([created_at],'1d') as time,
COUNT(id) as value,
'users' as metric
FROM [db].[user]
WHERE $__timeFilter([created_at])
GROUP BY [created_at]
ORDER BY 1
This works for me:
SELECT
$__timeGroup(created_at, '1d') AS time,
COUNT(id) as 'New Users'
FROM [db].[user]
GROUP BY $__timeGroup(created_at, '1d')
ORDER BY 1
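One likely reason the original query returned 1 for every point is that it grouped by the raw created_at value, so each distinct timestamp formed its own group, while the working query groups by the day bucket. A minimal Python sketch of the difference, using made-up timestamps (the sample data and variable names are assumptions, and this does not reproduce Grafana's macro expansion):

```python
from collections import Counter
from datetime import datetime

# Hypothetical signup timestamps: three on one day and one on the next.
created_at = [
    datetime(2021, 3, 1, 9, 15),
    datetime(2021, 3, 1, 13, 40),
    datetime(2021, 3, 1, 22, 5),
    datetime(2021, 3, 2, 8, 0),
]

# Grouping by the full timestamp: every group holds a single row.
per_timestamp = Counter(created_at)

# Grouping by the day bucket: rows from the same day are counted together.
per_day = Counter(ts.date() for ts in created_at)

print(sorted(per_timestamp.values()))
print({day.isoformat(): n for day, n in sorted(per_day.items())})
```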

Calculating Outliers - Nested Aggregate Error

I am currently working with SQL Workbench/J and Amazon Redshift.
I am working on a query with the intent to identify the number of outliers within a data set.
My source data contains one record per day for multiple symbols. I am utilizing 30 days of trailing data. In short, for 30 days there are ten symbols with 30 records each.
I am then utilizing the following query to calculate the mean, standard deviation, and upper/lower control limits for each unique symbol based upon the 30 day data set.
select
symbol,
avg(high) as MEAN,
cast(stddev_samp(high) as dec(14,2)) STDV,
(MEAN+STDV*3) as UCL,
(MEAN-STDV*3) as LCL
from historical
group by symbol
;
My next step will be calculating how many individual values from the 'high' column exceed the upper control limit calculated value. I have tried to add the following count(case...) statement, but it is failing:
select
symbol,
avg(high) as MEAN,
cast(stddev_samp(high) as dec(14,2)) STDV,
(MEAN+STDV*3) as UCL,
(MEAN-STDV*3) as LCL,
count(case when high>avg(high) then 1 else 0 end) as outlier
from historical
group by symbol
;
The specific error is
Amazon Invalid operation: aggregate function calls may not have nested aggregate or window function
Is a count(case..) statement the right method to utilize here, or what would the recommended approach or example be?
There are a number of ways to do this, but I think all of them involve a sub-query. This is because you are comparing an aggregate (avg) to a per-row value (high) and then summing the comparison.
I'd go with a sub-query where you perform an avg() window function partitioned by symbol. This gives you the average of the group on every row; then just run the query much as you have it. Kinda like this:
select symbol, avg(high) as MEAN, cast(stddev_samp(high) as dec(14,2)) STDV, (MEAN+STDV*3) as UCL,
(MEAN-STDV*3) as LCL,
-- no "else 0": count() counts every non-null value, so a 0 would be counted too
count(case when high>group_avg then 1 end) as outlier
from (
select *, avg(high) over (partition by symbol) as group_avg
from historical ) h
group by symbol ;
(You could also replace "avg(high) as MEAN" with "min(group_avg) as MEAN" since you already computed the average in the window function. Just a possible slight optimization.)
Use window functions to calculate the values for the standard deviation and mean. Then aggregate:
select symbol, mean, STDV,
(MEAN+STDV*3) as UCL, (MEAN-STDV*3) as LCL,
sum((high > mean)::int) as outlier
from (select h.*,
avg(high) over (partition by symbol) as mean,
cast(stddev_samp(high) over (partition by symbol) as dec(14,2)) as STDV
from historical h
) h
group by symbol, mean, STDV;
Your definition of "outlier" is rather strange -- merely being higher than the average is going to happen (very roughly) about half the time. The more typical definition I have seen is outside the range of 2 standard deviations.
As a comment not directly related to the SQL: it seems unusual to me to use future data to determine outliers. I would expect a trailing 30 days to be used for that purpose. However, that is not the question you have asked here.
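To make the two definitions discussed above concrete (values above the group mean versus values above the upper control limit), here is a small Python sketch that computes the per-symbol statistics and both counts. The symbols, the sample values, and the `control_stats` helper are made up for illustration; `statistics.stdev` is the sample standard deviation, which is what stddev_samp computes.

```python
from statistics import mean, stdev

# Hypothetical 'high' values per symbol.
HISTORICAL = {
    "AAA": [10.0] * 19 + [100.0],
    "BBB": [1.0, 2.0, 3.0, 4.0, 5.0],
}


def control_stats(highs):
    """Return mean, sample stdev, control limits and two outlier counts."""
    m = mean(highs)
    s = stdev(highs)  # sample standard deviation
    ucl = m + 3 * s
    lcl = m - 3 * s
    return {
        "mean": m,
        "stdv": s,
        "ucl": ucl,
        "lcl": lcl,
        # Each per-row comparison is summed only after the group
        # statistics are known, mirroring the sub-query approach.
        "above_mean": sum(h > m for h in highs),
        "above_ucl": sum(h > ucl for h in highs),
    }


stats = {symbol: control_stats(highs) for symbol, highs in HISTORICAL.items()}
for symbol, s in stats.items():
    print(symbol, round(s["ucl"], 2), s["above_mean"], s["above_ucl"])
```

In this toy data, symbol BBB has two values above its mean but none above its upper control limit, which shows how much the choice of definition changes the count.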

Using group by with date

I have a table containing the following columns:
stats_date (YYYY-MM-DD)
registered (INT)
opened_form (INT)
Compose a query that will return the total registered, and opened_form by month for the last 3 months. Also a calculated column called conversion_rate which is the registered column divided by the opened_form.
Are you just looking for aggregation? Date/time functions differ significantly among databases, but the idea is:
select year(stats_date), month(stats_date),
sum(registered), sum(opened_form),
sum(registered) * 1.0 / sum(opened_form) as ratio
from t
group by year(stats_date), month(stats_date)
order by min(stats_date);
Of course, your database might have a different way of extracting the year and month from a date.
See page 187 of the ANSI SQL standard to understand how aggregation works. To group your column by month, check the documentation of your database; the function is usually MONTH(COLUMN_NAME).
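Since date functions differ between databases, here is one concrete, runnable variant as a sketch: it uses Python's built-in sqlite3 module, where SQLite's strftime('%Y-%m', ...) stands in for year()/month(). The table name t follows the answer above; the sample rows are made up, and the "last 3 months" filter is left out because its syntax is also database-specific.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (stats_date TEXT, registered INTEGER, opened_form INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [
        ("2023-01-05", 10, 40),
        ("2023-01-20", 20, 60),
        ("2023-02-03", 5, 50),
        ("2023-03-11", 30, 60),
    ],
)

# Multiply by 1.0 so the division is not truncated to an integer.
rows = conn.execute(
    """
    SELECT strftime('%Y-%m', stats_date) AS month,
           SUM(registered) AS registered,
           SUM(opened_form) AS opened_form,
           SUM(registered) * 1.0 / SUM(opened_form) AS conversion_rate
    FROM t
    GROUP BY strftime('%Y-%m', stats_date)
    ORDER BY month
    """
).fetchall()

for row in rows:
    print(row)
```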

Extracting DAU, MAU using BigQuery

I'm trying to extract Firebase Analytics DAU and MAU using BigQuery. The query I'm using for daily users is below -
SELECT
event_date AS day,
COUNT(DISTINCT user_id) AS daily_visitors
FROM `XXXXXXX.analytics_153729556.events_20190825`
WHERE
app_info.id = 'XXXXXXX'
AND
event_name = 'user_engagement'
GROUP BY day;
I have a few questions I would love some help with.
There is a significant (2000+) difference between the value from the query result and the value the Firebase dashboard shows for the same date(s). Is there a specific reason for this or is my query just plain wrong?
There are instances where I see dates other than that of the table selected. For example, I see 20190502 in the results when 20190501 should be the only row (based on the table name). Is this possibly because the events being dumped into the table are for an app in a different timezone? If not, what else could be the reason behind this?
I also want to extract historical MAU and DAU data, and store it on MongoDB for any future requirements that may arise. Is there a specific way in which I can extract them - after overcoming the problem I'm facing, of course?

BigQuery: SELECT in WHERE-clause with filter based on a value in the current row

I know the title is probably pretty stupid but I have a hard time phrasing it differently.
I have to use BigQuery at work atm for some report. BigQuery is connected to a Google Analytics view of ours. This gives us a dataset with 1 table for each day. The rows of the tables are user-sessions on our site, while columns have some information about the sessions.
The problem I have is the following:
I want to select sessions with transactions, but only if the user was referred to our site by a certain referrer in the last x days before the transaction happened. I'm only familiar with basic SQL and not with any advanced concepts. It's really frustrating to me because this would be a no-brainer with any proper programming language given a .csv of the data, but I'm lacking knowledge of the relevant concepts in SQL.
#standardSQL
SELECT
COUNT(*)
FROM
`dataset.ga_sessions_2017*`
WHERE
totals.transactions > 0 AND
fullVisitorId IN (SELECT
fullVisitorId
FROM
`dataset.ga_sessions_2017*`
WHERE
trafficSource.source = "xyz.com"
) AND
< date difference thing>
I could filter for the date difference like I did with the trafficSource (referrer). The problem for me is that while "xyz.com" is a static thing, I'd need to reference the date value of the current row I'm in. So the date by which I'd filter the 2nd SELECT would be dynamically changing from row to row. Can anyone guide me on how this is usually done? This seems like a thing that would come up often.
I'm not familiar with the GA tables specifically, but having written some wildcard queries in BigQuery before, I think what you're looking for can be done using the _TABLE_SUFFIX pseudo column:
CAST(_TABLE_SUFFIX AS INT64) >= 1217
Where 1217 is today's date in MMDD format minus 3 days, assuming the table names are _20171217, _20171218, etc. Otherwise you can just use REPLACE to remove underscores before casting to an int. There are also functions that will generate today's date for you if you needed this query to run automatically.
Also, I think the fullVisitorId business could be replaced with a simple WHERE trafficSource.source = "xyz.com" but it's hard to say for sure without being able to run the query myself.
So the full query would look something like this:
#standardSQL
SELECT
COUNT(*)
FROM
`dataset.ga_sessions_2017*`
WHERE
totals.transactions > 0 AND
trafficSource.source = "xyz.com" AND
CAST(_TABLE_SUFFIX AS INT64) >= 1217
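If the cutoff should be generated by the client rather than typed in by hand, the MMDD value can be computed before the query is built. A minimal Python sketch, where the `suffix_cutoff` helper and its parameters are hypothetical names; it assumes the same table naming as above and, like the integer comparison itself, only makes sense within the single year covered by the wildcard:

```python
from datetime import date, timedelta


def suffix_cutoff(today, days_back=3):
    """Return `today` minus `days_back` days as an MMDD integer."""
    cutoff_day = today - timedelta(days=days_back)
    return int(cutoff_day.strftime("%m%d"))


cutoff = suffix_cutoff(date(2017, 12, 20))
print(f"CAST(_TABLE_SUFFIX AS INT64) >= {cutoff}")
```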