Nested WHERE condition to pull counts from different segments of the table? - google-bigquery

I'm new to BQ/SQL.
I'm using Google Analytics dataset to pull a COUNT in the same query for two things:
Total event_name hits
Total event_name hits where a particular criteria was met
This is my query so far. How can I improve line #3 so that the second count occurs as a nested WHERE function while the first count queries the full table? Thanks.
SELECT
COUNT (event_name) AS total_events,
COUNT (event_name) AS goal WHERE event_name = 'visited x page',
FROM `foodotcom-app-plus-web.analytics_1234567.events_20200809`

Below is for BigQuery Standard SQL
#standardSQL
SELECT
COUNT(event_name) AS total_events,
COUNTIF(event_name = 'visited x page') AS goal
FROM `project.dataset.table`

select
count(event_name) as total_events,
count(case when event_name = 'visited x page' then event_name else null end) as goal
from `project.dataset.table`
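Both answers above can be tried locally. Below is a quick sketch using Python's sqlite3 as a stand-in for BigQuery (COUNTIF is BigQuery-specific, so the portable CASE form is used); the table and event names are illustrative, not the real GA export schema.

```python
import sqlite3

# Hypothetical events table standing in for the GA export.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (event_name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?)",
    [("visited x page",), ("visited y page",), ("visited x page",)],
)

# COUNT skips NULLs, so a CASE with no ELSE only counts matching rows.
row = conn.execute("""
    SELECT
      COUNT(event_name) AS total_events,
      COUNT(CASE WHEN event_name = 'visited x page' THEN 1 END) AS goal
    FROM events
""").fetchone()
print(row)  # (3, 2)
```

The same trick works with `SUM(CASE WHEN ... THEN 1 ELSE 0 END)`; COUNTIF is just BigQuery's shorthand for it.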


Group by after a partition by in MS SQL Server

I am working on some car accident data and am stuck on how to get the data in the form I want.
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
This is my code, which counts the accidents per sex for each severity. I know I can do this with GROUP BY, but I wanted to use PARTITION BY in order to work out percentages too.
However, I get a very large table (I assume one row for every underlying accident/vehicle row, each repeating the count for its sex/severity pair). When I do the following:
select
sex_of_driver,
accident_severity,
count(accident_severity) over (partition by sex_of_driver, accident_severity)
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
group by
sex_of_driver,
accident_severity
I get this:
sex_of_driver  accident_severity  (No column name)
1              1                  1
1              2                  1
-1             2                  1
-1             1                  1
1              3                  1
I won't give you the whole table, but basically, the group by has caused the count to just be 1.
I can't figure out why group by isn't working. Is this an MS SQL-Server thing?
I want to get the same result as below (obv without the CASE etc)
select
accident.accident_severity,
count(accident.accident_severity) as num_accidents,
vehicle.sex_of_driver,
CASE vehicle.sex_of_driver WHEN '1' THEN 'Male' WHEN '2' THEN 'Female' end as sex_col,
CASE accident.accident_severity WHEN '1' THEN 'Fatal' WHEN '2' THEN 'Serious' WHEN '3' THEN 'Slight' end as serious_col
from
SQL.dbo.accident as accident
inner join SQL.dbo.vehicle as vehicle on
accident.accident_index = vehicle.accident_index
where
sex_of_driver != 3
and
sex_of_driver != -1
group by
accident.accident_severity,
vehicle.sex_of_driver
order by
accident.accident_severity
You seem to have a misunderstanding here.
GROUP BY will reduce your rows to a single row per grouping (i.e. per pair of sex_of_driver, accident_severity values). Any normal aggregate you use with this, such as COUNT(*), will return the aggregate value within that group.
OVER, by contrast, gives you a windowed aggregate, which is calculated after the rows have been reduced. Therefore when you write count(accident_severity) over (partition by sex_of_driver, accident_severity), the aggregate only receives a single row in each partition, because the rows have already been reduced by the GROUP BY.
You say "I know I can do this with group by but I wanted to use a partition by in order to work out % too", but you don't need PARTITION BY to work out a percentage. All you need to calculate a percentage over the whole resultset is COUNT(*) * 1.0 / SUM(COUNT(*)) OVER (), in other words a windowed aggregate over a normal aggregate.
Note also that count(accident_severity) does not give you the number of distinct accident_severity values; it gives you the number of non-null values, which is probably not what you intend. You also have a very strange join predicate; you probably want something like a.vehicle_id = v.vehicle_id.
So you want something like this:
select
sex_of_driver,
accident_severity,
count(*) as Count,
count(*) * 1.0 /
sum(count(*)) over (partition by sex_of_driver) as PercentOfSex,
count(*) * 1.0 /
sum(count(*)) over () as PercentOfTotal
from
dbo.accident as a
inner join dbo.vehicle as v on
a.vehicle_id = v.vehicle_id
group by
sex_of_driver,
accident_severity;
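A runnable sketch of the "windowed aggregate over a normal aggregate" idea, using Python's sqlite3 with made-up data (requires SQLite >= 3.25 for window functions). SQL Server and BigQuery accept SUM(COUNT(*)) OVER () directly in a grouped query; the sketch below puts the GROUP BY in a subquery first, which is equivalent and easier to read:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accident (sex_of_driver INT, accident_severity INT)")
conn.executemany(
    "INSERT INTO accident VALUES (?, ?)",
    [(1, 1), (1, 1), (1, 2), (2, 1)],
)

# Step 1: aggregate to one row per (sex, severity).
# Step 2: window functions over the aggregated rows give the denominators.
rows = conn.execute("""
    SELECT
      sex_of_driver,
      accident_severity,
      cnt,
      cnt * 1.0 / SUM(cnt) OVER (PARTITION BY sex_of_driver) AS pct_of_sex,
      cnt * 1.0 / SUM(cnt) OVER () AS pct_of_total
    FROM (
      SELECT sex_of_driver, accident_severity, COUNT(*) AS cnt
      FROM accident
      GROUP BY sex_of_driver, accident_severity
    )
    ORDER BY sex_of_driver, accident_severity
""").fetchall()
for r in rows:
    print(r)
```

With this data, sex 1 / severity 1 has 2 of sex 1's 3 accidents (about 0.67) and 2 of the 4 total (0.5).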

How to calculate number of users who did multiple events in big query?

I tried using a JOIN but I'm not sure that is the right/smart way to do it.
I want to know the number of users who did "first_open" and "BA_HOME_SCREEN" in a particular time period.
You provided following dataset:
Date      event_name  user_id
05052021  first_open  123
25052021  ba_home     435
The goal is to count the occurrences of two different strings in the column event_name in two separated columns for each user_id.
This can be done with the pivot statement:
Select *
from (
Select "a" as user_id, "first_open" as event_name
Union all select "b" , "BA_HOME_SCREEN"
Union all select "b" , "first_open"
)
PIVOT(count(1) FOR event_name IN ("first_open", "BA_HOME_SCREEN"))
see the BigQuery reference for PIVOT
Another approach would be to use a CASE statement (an IF statement would work as well):
select user_id, sum(case when event_name="first_open" then 1 else 0 end) as first_open
from ( ..... )
group by 1
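The CASE approach can be sketched locally with Python's sqlite3 (which has no PIVOT), one conditional SUM per event; user IDs and events below are the dummy values from the answer:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (user_id TEXT, event_name TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [("a", "first_open"), ("b", "BA_HOME_SCREEN"), ("b", "first_open")],
)

# One output column per event name: the manual equivalent of PIVOT.
rows = conn.execute("""
    SELECT
      user_id,
      SUM(CASE WHEN event_name = 'first_open' THEN 1 ELSE 0 END) AS first_open,
      SUM(CASE WHEN event_name = 'BA_HOME_SCREEN' THEN 1 ELSE 0 END) AS ba_home_screen
    FROM events
    GROUP BY user_id
    ORDER BY user_id
""").fetchall()
print(rows)  # [('a', 1, 0), ('b', 1, 1)]
```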

How do I count the rows with a where clause in SQL Server?

I am pretty much stuck with a problem I am facing with SQL Server. I want to show in a query the amount of times that specific value occurs. This is pretty easy to do, but I want to take it a step further and I think the best way to explain on what I am trying to achieve is to explain it using images.
I have two tables:
Plant and
Chest
As you can see, in the chest table the column 'hoeveelheid' tells how full the chest is: 'Vol' (full) is 1 and '3/4' is 0.75. In the plant table there is a column 'Hoeveelheidperkist' which tells how many plants fit in one chest.
select DISTINCT kist.Plantnaam, kist.Plantmaat, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat
This query counts all the chests, but it does not separate the count of 'Vol' chests from '3/4' chests; I want a separate count per fullness. (The original question illustrated the current and desired output with images.) Any help would be much appreciated.
If you use GROUP BY you don't need DISTINCT,
and if you want a separate count per hoeveelheid you must add it to the GROUP BY clause:
select kist.Plantnaam, kist.Plantmaat, kist.hoeveelheid, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat, hoeveelheid
or if you want all three counts on the same row you could use conditional aggregation, e.g.:
select kist.Plantnaam, kist.Plantmaat
, sum(case when kist.hoeveelheid = 'Vol' then 1 else 0 end) as vol
, sum(case when kist.hoeveelheid = '3/4' then 1 else 0 end) as [3_4]
, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat
When you want to filter the data on the counts you have to use the HAVING clause. Whenever you use aggregate functions (SUM, COUNT, MIN, MAX) and want to filter on the aggregated value, use HAVING:
select kist.Plantnaam, kist.Plantmaat, count(*) AS 'Amount'
from kist
group by kist.plantnaam, kist.Plantmaat having count(*) = 1 -- or provide necessary conditions
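A minimal runnable illustration of HAVING (group-level filter) versus WHERE (row-level filter), using Python's sqlite3 and invented plant data that follows the question's Dutch column names:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE kist (Plantnaam TEXT, Plantmaat TEXT, hoeveelheid TEXT)")
conn.executemany(
    "INSERT INTO kist VALUES (?, ?, ?)",
    [("rose", "M", "Vol"), ("rose", "M", "3/4"), ("tulip", "S", "Vol")],
)

# HAVING is evaluated after GROUP BY, so it can filter on COUNT(*).
rows = conn.execute("""
    SELECT Plantnaam, Plantmaat, COUNT(*) AS Amount
    FROM kist
    GROUP BY Plantnaam, Plantmaat
    HAVING COUNT(*) = 1
""").fetchall()
print(rows)  # [('tulip', 'S', 1)] -- rose/M has 2 chests and is filtered out
```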

Hive query to return result from table when the condition doesn't meet

I am looking for a Hive query which returns all rows only if the dataset doesn't have subject 'History' in it. Please note the dataset will change every time.
(The two example datasets were posted as images.)
So the first dataset above has a record with subject 'History', so 0 records should be returned; the second dataset doesn't have subject 'History', so in that case all 4 records should be returned.
An analytic function will return the same history_exists flag for every row:
select id, name, age, subject, score
from
(
select t.*, max(case when t.subject='History' then true else false end) over() as history_exists --the same value for the whole dataset
from your_table t
)s
where NOT history_exists;
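The same "all-or-nothing" trick can be run locally with Python's sqlite3 (window functions need SQLite >= 3.25; sqlite has no true BOOLEAN, so the flag is 1/0). The table and sample rows are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE scores (id INT, subject TEXT, score INT)")
conn.executemany(
    "INSERT INTO scores VALUES (?, ?, ?)",
    [(1, "Math", 90), (2, "Physics", 80)],
)

# MAX(...) OVER () computes one dataset-wide flag, repeated on every row;
# the outer WHERE then keeps either all rows or none.
sql = """
    SELECT id, subject, score
    FROM (
      SELECT s.*,
             MAX(CASE WHEN s.subject = 'History' THEN 1 ELSE 0 END) OVER ()
               AS history_exists
      FROM scores s
    )
    WHERE history_exists = 0
"""
rows_before = conn.execute(sql).fetchall()
print(rows_before)  # both rows: no 'History' present yet

conn.execute("INSERT INTO scores VALUES (3, 'History', 70)")
rows_after = conn.execute(sql).fetchall()
print(rows_after)  # [] once a 'History' row exists
```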

Is there a way to calculate the average number of times an event happens when all data is stored as string?

I am working in BigQuery and using SQL to calculate the average number of ads viewed per user based on their engagement level (levels range from 1 - 5). I previously calculated the average number of days users were active based on their engagement level, but when I do the average number of ads viewed based on engagement level the query fails. My guess is that the ads-viewed value is stored as a string.
Is there a way to average the number of times 'ad viewed' occurs in a list of events, based on engagement?
I tried changing the original code I used where I extracted 'Average Days' to extract 'Ads Viewed' but that does not work.
I tried average(count(if(ads.viewed,1,0))), but that won't work either. I can't figure out what I am doing wrong.
I also checked this post (SQL average of string values) but this doesn't seem to apply.
SELECT
engagement_level,
COUNT(event="ADSVIEWED") AS AverageAds
I have also tried:
SELECT
engagement_level,
AVG(IF(event="ADSVIEWED",1,0)) AS AverageAds
But that doesn't work either.
It should put out a table of the engagement level with the corresponding average. For 'Average Days' it worked out to be Engagement Level: Average Days (1: 2.45, 2: 3.21, 3: 4.67, etc.). But it doesn't work for the ads_viewed event.
If I understand correctly, you can do this without a subquery:
SELECT engagement_level,
COUNTIF(event = 'ADSVIEWED') / COUNT(DISTINCT user_id) as avg_per_user
FROM t
GROUP BY engagement_level;
This counts the number of events and divides by the number of users. If you only want to count users who have the event:
SELECT engagement_level,
COUNT(*) / COUNT(DISTINCT user_id) as avg_per_user
FROM t
WHERE event = 'ADSVIEWED'
GROUP BY engagement_level;
... to calculate the average number of ads viewed per user based on their engagement level ...
Below is for BigQuery Standard SQL
#standardSQL
SELECT engagement_level, AVG(Ads) AverageAds FROM (
SELECT engagement_level, user_id, COUNTIF(event = 'ADSVIEWED') Ads
FROM `project.dataset.table`
GROUP BY engagement_level, user_id
)
GROUP BY engagement_level
You can test and play with the above using dummy data, as in the example below
#standardSQL
WITH `project.dataset.table` AS (
SELECT 1 user_id, 1 engagement_level, 'ADSVIEWED' event UNION ALL
SELECT 1, 1, 'a' UNION ALL
SELECT 1, 1, 'ADSVIEWED' UNION ALL
SELECT 2, 1, 'b' UNION ALL
SELECT 2, 1, 'ADSVIEWED'
)
SELECT engagement_level, AVG(Ads) AverageAds FROM (
SELECT engagement_level, user_id, COUNTIF(event = 'ADSVIEWED') Ads
FROM `project.dataset.table`
GROUP BY engagement_level, user_id
)
GROUP BY engagement_level
with result:
Row  engagement_level  AverageAds
1    1                 1.5
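The two-level aggregation above (per-user count, then average per level) can be reproduced locally with Python's sqlite3 on the same dummy data; COUNTIF is replaced by the portable CASE form:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (user_id INT, engagement_level INT, event TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, 1, "ADSVIEWED"), (1, 1, "a"), (1, 1, "ADSVIEWED"),
     (2, 1, "b"), (2, 1, "ADSVIEWED")],
)

# Inner query: ads viewed per (level, user). Outer query: average per level.
rows = conn.execute("""
    SELECT engagement_level, AVG(ads) AS average_ads
    FROM (
      SELECT engagement_level, user_id,
             COUNT(CASE WHEN event = 'ADSVIEWED' THEN 1 END) AS ads
      FROM t
      GROUP BY engagement_level, user_id
    )
    GROUP BY engagement_level
""").fetchall()
print(rows)  # [(1, 1.5)] -- user 1 saw 2 ads, user 2 saw 1, average 1.5
```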