Google Data Studio custom metric multiplying 2 parameters - sql

I am using Firebase for collecting events from my application.
For example, let's say I have an event print_attempt with 2 parameters, page_count and copies. Something like this:
event {
  name: print_attempt
  param {
    name: copies
    int_value: 10
  }
  param {
    name: page_count
    int_value: 5
  }
}
Now, in Google Data Studio, I want to have a metric for total pages printed. How do I multiply the two param values?
SUM(CASE WHEN Event Param Name = "page_count" THEN Event Param Value Int ELSE 0 END)
returns the sum of page_count, but the copies value is ignored.
I tried the following, but it gives me an error:
SUM(CASE
  WHEN Event Param Name = "page_count" THEN (
    Event Param Value Int *
    CASE WHEN Event Param Name = "copies" THEN Event Param Value Int ELSE 1 END)
  ELSE 0 END)
Any pointers?

I got the answer from here.
It is not possible directly, because the connector works with a flattened schema in which the int values I am trying to multiply end up in different records.
I ended up adding another int parameter to the event, e.g. total_pages, with the value page_count * copies.
Another solution could be to create a view or table where page_count and copies are separate columns of one row.
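To illustrate the second workaround, here is a minimal Python sketch of pivoting the flattened param records into one row per event and then multiplying; the row shapes and the event_id key are assumptions for illustration:

```python
# Sketch: pivot flattened event-param records into one row per event, then
# multiply page_count by copies -- the single-row layout that the Data Studio
# connector cannot produce from the flattened schema on its own.
from collections import defaultdict

# Flattened rows as the connector sees them: one row per (event, param).
rows = [
    {"event_id": 1, "param_name": "page_count", "int_value": 5},
    {"event_id": 1, "param_name": "copies", "int_value": 10},
    {"event_id": 2, "param_name": "page_count", "int_value": 2},
    {"event_id": 2, "param_name": "copies", "int_value": 3},
]

# Pivot: collect each event's params into a single dict (one "row").
events = defaultdict(dict)
for row in rows:
    events[row["event_id"]][row["param_name"]] = row["int_value"]

# With both values on the same row, the metric is a plain product.
total_pages = sum(e["page_count"] * e["copies"] for e in events.values())
print(total_pages)  # 5*10 + 2*3 = 56
```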


SQL select column group by where the ratio of a value is 1

I am using PSQL.
I have a table with a few columns. One column, event, can have 4 different values - X1, X2, Y1, Y2. Another column is the name of the service, and I want to group by that column.
My goal is a query that takes an event and verifies that, for a specific service name, count(X1) == count(X2); if not, it should display a new column with "error".
Is this even possible? I am kinda new to SQL and not sure how to write this.
So far I tried something like this
select service_name, event, count(service_name)
from service_table st
group by service_name, event;
I am getting the count of each event for a specific service_name, but I would like to verify that the count of event 1 == the count of event 2 for each service_name.
I should add that each service_name has a choice of only 2 different events.
You may not need a subquery/CTE for this, but it will work (and makes the logic easier to follow):
WITH event_counts_by_service AS (
  SELECT service_name
       , COUNT(CASE WHEN event = 'X1' THEN 1 END) AS count_x1
       , COUNT(CASE WHEN event = 'X2' THEN 1 END) AS count_x2
  FROM service_table
  GROUP BY service_name
)
SELECT service_name
     , CASE WHEN count_x1 = count_x2 THEN NULL ELSE 'Error' END AS are_counts_equal
FROM event_counts_by_service
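A quick way to sanity-check the query is to run it against an in-memory SQLite database, which accepts the same CTE and CASE syntax as PostgreSQL here (a sketch; the sample rows are made up):

```python
# Verify the count-equality logic on a tiny fixture: one balanced service
# and one unbalanced service.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE service_table (service_name TEXT, event TEXT)")
conn.executemany(
    "INSERT INTO service_table VALUES (?, ?)",
    [("svc_a", "X1"), ("svc_a", "X2"),                     # balanced -> NULL
     ("svc_b", "X1"), ("svc_b", "X1"), ("svc_b", "X2")],   # unbalanced -> 'Error'
)

query = """
WITH event_counts_by_service AS (
  SELECT service_name
       , COUNT(CASE WHEN event = 'X1' THEN 1 END) AS count_x1
       , COUNT(CASE WHEN event = 'X2' THEN 1 END) AS count_x2
  FROM service_table
  GROUP BY service_name
)
SELECT service_name
     , CASE WHEN count_x1 = count_x2 THEN NULL ELSE 'Error' END AS are_counts_equal
FROM event_counts_by_service
ORDER BY service_name
"""
print(conn.execute(query).fetchall())  # [('svc_a', None), ('svc_b', 'Error')]
```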

What is the value being returned using "" in SQL WHERE filter?

In a dataset I am cleaning, according to the schema there should be only two distinct values in the "usertype" column. Upon further analysis I discovered a third, empty value accounting for 5828994 rows of the total dataset.
I tested whether the third value was NULL and it was not; counting NULLs returned "0".
SELECT SUM(CASE WHEN usertype IS NULL THEN 1 ELSE 0 END) AS number_of_null
     , COUNT(usertype) AS number_of_non_null
FROM dataset
I filtered for a single-space value, but the result was "There is no data to display":
WHERE usertype = " "
By chance I filtered with "" and it returned the 5828994 empty rows I was looking to isolate:
WHERE usertype = ""
My question is, what value did the "" filter return?
WHERE usertype = " "
Selects where the usertype is a single space - you get no results
WHERE usertype = ""
Selects where the usertype is blank (this is not the same as NULL) - this is where you get results.
Therefore, your table has strings that are blank, but they are not considered NULL, which of course could be confusing.
If you're loading this data manually yourself, I would check the BigQuery CLI's --null_marker parameter, which gives you some options for handling this on ingestion.
If you are stuck with the data as it is, you can get in the habit of using NULLIF(), which returns NULL when its two arguments are equal.
For example,
SELECT SUM(CASE WHEN usertype IS NULL THEN 1 ELSE 0 END) AS number_of_null,
       SUM(CASE WHEN NULLIF(usertype, '') IS NULL THEN 1 ELSE 0 END) AS number_of_null_or_blank,
       COUNT(usertype) AS number_of_non_null
FROM dataset
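A small sketch using SQLite (whose NULLIF behaves the same way) makes the distinction between '' and NULL concrete, and shows what each aggregate counts; the sample values are made up:

```python
# '' and NULL are different values: COUNT() and IS NULL skip only true NULLs,
# while NULLIF(usertype, '') folds blanks into NULL so both are counted together.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dataset (usertype TEXT)")
conn.executemany("INSERT INTO dataset VALUES (?)",
                 [("member",), ("casual",), ("",), ("",), (None,)])

row = conn.execute("""
    SELECT SUM(CASE WHEN usertype IS NULL THEN 1 ELSE 0 END)             AS number_of_null,
           SUM(CASE WHEN NULLIF(usertype, '') IS NULL THEN 1 ELSE 0 END) AS number_of_null_or_blank,
           COUNT(usertype)                                               AS number_of_non_null
    FROM dataset
""").fetchone()
print(row)  # (1, 3, 4): one true NULL, plus two blanks, and COUNT() skips only the NULL
```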

SQL - How to get rows within a date period that are within another date period?

I have the following table in the database:
On the other side, I have an interface with start and end filter parameters.
I want to understand how to query the table to get only the rows whose period falls within the values introduced by the user.
There are 3 possible scenarios; if I need to create one query per scenario, that is OK:
Scenario 1: if the user only defines start = 03/01/2021, then the expected output should be rows with ids 3, 5 and 6.
Scenario 2: if the user only defines end = 03/01/2021, then the expected output should be rows with ids 1 and 2.
Scenario 3: if the user defines start = 03/01/2021 and end = 05/01/2021, then the expected output should be rows with ids 3 and 5.
Hope that makes sense.
Thanks
I will assume that start_date and end_date here are DateFields [Django-doc], and that you have a dictionary with 'start' and 'end' as (optional) keys that map to date objects, so a possible dictionary could be:
# scenario 3
from datetime import date

data = {
    'start': date(2021, 1, 3),
    'end': date(2021, 1, 5),
}
If you do not want to filter on start and/or end, then either the key is not in the dictionary data, or it maps to None.
You can make a filter with:
filtr = {
    lu: data[ky]
    for ky, lu in (('start', 'start_date__gte'), ('end', 'end_date__lte'))
    if data.get(ky)
}
result = MyModel.objects.filter(**filtr)
This will then filter the MyModel objects so that only records whose start_date and end_date are within bounds are retrieved.
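Since the comprehension itself is plain Python, you can inspect the lookup dict it produces for each scenario without Django; the helper name build_filter below is made up for illustration:

```python
# Build the keyword arguments for .filter(**filtr): a lookup appears only
# when its key is present (and truthy) in the incoming data dict.
from datetime import date

def build_filter(data):
    return {
        lu: data[ky]
        for ky, lu in (('start', 'start_date__gte'), ('end', 'end_date__lte'))
        if data.get(ky)
    }

# Scenario 1: only a start bound.
print(build_filter({'start': date(2021, 1, 3)}))
# {'start_date__gte': datetime.date(2021, 1, 3)}

# Scenario 3: both bounds.
print(build_filter({'start': date(2021, 1, 3), 'end': date(2021, 1, 5)}))
# {'start_date__gte': datetime.date(2021, 1, 3), 'end_date__lte': datetime.date(2021, 1, 5)}

# No bounds at all: an empty dict, so .filter() would apply no date constraints.
print(build_filter({}))  # {}
```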

Data Studio - Calculated Fields - Subtraction of Total Events

I would like to create a calculated field that subtracts the Total Events of the event with fewer total events from the event with more total events. Afterwards, I would like to create a line graph of this in Data Studio.
Basically, I would like to subtract the total events of the following events:
Event Category: Game
Event Action: Game Session Started
minus
Event Category: Game
Event Action: Game Session Finished
I tried CASE with functions such as ABS, SUM, etc.; however, I can't seem to find a solution. Thank you.
Try:
SUM(CASE
  WHEN Event_Category = 'Game' AND Event_Action = 'Game Session Started' THEN Total Events
  ELSE 0 END)
-
SUM(CASE
  WHEN Event_Category = 'Game' AND Event_Action = 'Game Session Finished' THEN Total Events
  ELSE 0 END)
Note that Total Events must be referenced as a field, not as the string literal 'Total Events'; otherwise the THEN and ELSE branches have mismatched types.
You may need to split into 2 calculated metrics and then use a 3rd to minus the finished from the started.
I think this may not be feasible; however, as you requested:
Field 1 - SUM(CASE WHEN Event Category = 'Game' and Event Action ='Game Session Started' THEN 1 ELSE 0 END)
Field 2 - SUM(CASE WHEN Event Category = 'Game' and Event Action ='Game Session Finished' THEN 1 ELSE 0 END)
Field 3 - (Field 1 - Field 2)
Field 4 - Count(Event Category)
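For intuition, the fields above can be simulated over plain rows in Python (a sketch; the row dicts and key names are made-up stand-ins for the GA event schema):

```python
# Simulate Field 1, Field 2, and Field 3 over a list of event rows:
# count started sessions, count finished sessions, take the difference.
events = [
    {"category": "Game", "action": "Game Session Started"},
    {"category": "Game", "action": "Game Session Started"},
    {"category": "Game", "action": "Game Session Finished"},
]

# Field 1: SUM(CASE WHEN ... 'Game Session Started' THEN 1 ELSE 0 END)
field_1 = sum(1 for e in events
              if e["category"] == "Game" and e["action"] == "Game Session Started")
# Field 2: same pattern for 'Game Session Finished'
field_2 = sum(1 for e in events
              if e["category"] == "Game" and e["action"] == "Game Session Finished")
# Field 3: sessions started but not finished
field_3 = field_1 - field_2
print(field_3)  # 2 started - 1 finished = 1
```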
In my case, the recommendations using SUM(CASE WHEN Event Category = 'Game' and Event Action = 'Game Session Started' THEN 1 ELSE 0 END) didn't work because, basically, we need to mix dimensions with metrics...
In the best case it would have to be something like this:
CASE WHEN Event Category = 'Game' and Event Action = 'Game Session Started' THEN {{UNIQUE EVENTS}} ELSE 0 END
but it does not work because we mix Event Category / Event Action (dimensions) with the calculated result {{UNIQUE EVENTS}} (a metric) - maybe in the future it will work...
To solve this, I did the following:
1. Created 2 independent tables, each of them filtered by a specific event.
2. Blended the data, adding a 3rd table with a date for the shared timeline.
3. In the blended data source, calculated SUM(Total Events (game start)) - SUM(Total Events (game finished)).

Google Data Studio incorrect calculated metrics

I am creating calculated metrics in Data Studio and I am having trouble with the results.
Metric 1 uses this formula:
COUNT_DISTINCT(CASE WHEN (Event Category = "ABC" AND Event Action = "XXX" AND Event Label = "123") THEN ga clientId (user) ELSE " " END)
(to count the events with distinct clientIds)
Metric 2 uses this formula:
COUNT_DISTINCT(CASE WHEN (Event Category = "ABC" AND Event Action = "YYY" AND Event Label = "456") THEN ga clientId (user) ELSE " " END)
(to count the events with distinct clientIds)
Metric 3 uses this formula:
COUNT_DISTINCT(CASE WHEN (Event Category = "ABC" AND Event Action = "ZZZ" AND Event Label = "789") THEN userId (user) ELSE " " END)
(to count the events with distinct userIds)
The formulas work fine, and when I do Metric 2 / Metric 1 the number is correct for a one-day time span. When I do Metric 3 / Metric 2 the number is wrong. Why is this? It doesn't make sense to me, since they are both numerical values.
Also, when I increase the date range, Metric 2 / Metric 1 is incorrect too! Any ideas why these are not working?
If you are aggregating over a certain amount of data, these calculations will not be exact; they will be approximations.
I have noticed that Google Data Studio is more accurate with data loaded directly into BigQuery than with data loaded through something else, such as a PostgreSQL connector. Otherwise, APPROX_COUNT_DISTINCT may be used.