SQL How to Query Total & Subtotal - sql

I have a table like the one below, which stores day, order_id, and order_type.
select day, order_id, order_type
from sample_table
day         order_id  order_type
2021-03-01  1         offline
2021-03-01  2         offline
2021-03-01  3         online
2021-03-01  4         online
2021-03-01  5         offline
2021-03-01  6         offline
2021-03-02  7         online
2021-03-02  8         online
2021-03-02  9         offline
2021-03-02  10        offline
2021-03-03  11        offline
2021-03-03  12        offline
Below is the desired output:
day         total_order  num_offline_order  num_online_order
2021-03-01  6            4                  2
2021-03-02  4            2                  2
2021-03-03  2            2                  0
Does anybody know how to query to get the desired output?

You need to pivot the data. A simple way to implement conditional aggregation in Vertica is to cast each boolean comparison to an integer with :: and sum the result:
select day, count(*) as total_order,
sum( (order_type = 'online')::int ) as num_online,
sum( (order_type = 'offline')::int ) as num_offline
from t
group by day;

Use case and sum:
select day,
count(1) as total_order,
sum(case when order_type='offline' then 1 else 0 end) as num_offline_order,
sum(case when order_type='online' then 1 else 0 end) as num_online_order
from sample_table
group by day
order by day

You can also use count, which only counts values that are not null:
select
day,
count(*) as total_order,
count(case when order_type='offline' then 1 else null end) as offline_orders,
count(case when order_type='online' then 1 else null end) as online_orders
from sample_table
group by day
order by day;
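
The answers above all rely on the same idea: each aggregate only sees the rows its condition lets through. As a rough sketch, the same pivot can also be written with the standard FILTER clause in dialects that implement it. This is written for PostgreSQL and is not claimed to work in Vertica. The inline rows are a small subset of the question's sample data, included only so the snippet runs on its own.

```sql
-- PostgreSQL sketch; the inline rows are a subset of the question's sample data.
with sample_table(day, order_id, order_type) as (
    values
        (date '2021-03-01', 1,  'offline'),
        (date '2021-03-01', 3,  'online'),
        (date '2021-03-03', 11, 'offline'),
        (date '2021-03-03', 12, 'offline')
)
select day,
       count(*) as total_order,
       -- each count only sees the rows that pass its filter
       count(*) filter (where order_type = 'offline') as num_offline_order,
       count(*) filter (where order_type = 'online')  as num_online_order
from sample_table
group by day
order by day;
```

Because count returns 0 when no row passes the filter, 2021-03-03 shows 0 online orders rather than NULL, which matches the desired output.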

Related

SQL query to get top 24 records, then average the first 12 and bottom 12

I'm attempting to analyze each account's performance (A_Count & B_Count) during their first year versus their second year. This should only return clients who have at least 24 months of totals (records).
Volume Table
Account  ReportDate  A_Count  B_Count
1001A    2019-01-01  47       100
1001A    2019-02-01  50       105
1002A    2019-02-01  50       105
I think I'm on the right track by wanting to grab the top 24 records for each account (only if 24 exist) and then grabbing the top 12 and bottom 12, but not sure how to get there.
I guess ideal output would be:
Account  YR1_A_Avg  YR1_B_Avg  YR2_A_Avg  YR2_B_Avg  FirstDate   LastDate
1001A    47         100        53         115        2019-01-01  2021-12-31
1002A    50         105        65         130        2019-02-01  2022-01-01
1003A    15         180        38         200        2017-05-01  2019-04-01
I'm not too worried about performance.
Assuming there are no gaps in ReportDate (per Account):
select Account
,avg(case when year_index = 1 then A_Count end) as YR1_A_Avg
,avg(case when year_index = 1 then B_Count end) as YR1_B_Avg
,avg(case when year_index = 2 then A_Count end) as YR2_A_Avg
,avg(case when year_index = 2 then B_Count end) as YR2_B_Avg
,min(ReportDate) as FirstDate
,max(ReportDate) as LastDate
from
(
select *
,count(*) over(partition by Account) as cnt
,(row_number() over(partition by Account order by ReportDate)-1)/12 +1 as year_index
from Volume
) t
where cnt >= 24 and year_index <= 2
group by Account
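
The year_index expression in the query above depends on integer division: row numbers 1 to 12 land in year 1 and 13 to 24 in year 2. A minimal sketch of just that arithmetic, written for PostgreSQL with made-up row numbers (rn is a stand-in for the row_number() result):

```sql
-- PostgreSQL sketch: rn stands in for row_number() over(partition by Account order by ReportDate).
select rn,
       (rn - 1) / 12 + 1 as year_index  -- integer division: 1..12 -> 1, 13..24 -> 2, 25 -> 3
from (values (1), (12), (13), (24), (25)) as v(rn)
order by rn;
```

In dialects where / does not truncate for integers (MySQL and Oracle, for example), the division would need to be wrapped in floor() to get the same buckets.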

SQL cohort calculations

I have a table of player activity like this:
user_id  event_name  install_date  event_date
1        active      2021-03-01    2021-03-01
1        active      2021-03-01    2021-03-01
1        active      2021-03-01    2021-03-02
2        active      2021-03-02    2021-03-02
2        active      2021-03-02    2021-03-04
2        active      2021-03-02    2021-03-04
and I want to calculate cohort retention like this:
user_id  install_date  ret0  ret1  ret2
1        2021-03-01    1     1     0
2        2021-03-02    1     0     1
Please help me write the SQL query. Thanks!
If I understand correctly, you just want to compare the event_date to the install_date and keep track of when "x" days appear between the two:
select user_id,
max(case when event_date = install_date then 1 else 0 end) as ret0,
max(case when event_date = date_add(install_date, interval 1 day) then 1 else 0 end) as ret1,
max(case when event_date = date_add(install_date, interval 2 day) then 1 else 0 end) as ret2
from t
group by user_id;
Consider the approach below. It is less verbose, easier to maintain, and easier to extend to more generic cases:
select * from (
select user_id, install_date, date_diff(event_date, install_date, day) diff
from `project.dataset.table`
)
pivot (count(diff) as ret for diff in (0, 1, 2))
Btw, if you want to output 1 or 0 in the respective columns, you can adjust the above to:
select * from (
select user_id, install_date, date_diff(event_date, install_date, day) diff
from `project.dataset.table`
group by 1,2,3
)
pivot (count(diff) as ret for diff in (0, 1, 2))

Google Big Query - Calculating monthly totals by status based on multiple date conditionals

I have a table with the following data:
customer_id  subscription_id  plan   status     trial_start  trial_end   activated_at  cancelled_at
1            jg1              basic  cancelled  2020-06-26   2020-07-14  2020-07-14    2020-09-25
2            ab1              basic  cancelled  2020-08-10   2020-08-24  2020-08-24    2021-02-15
3            cf8              basic  cancelled  2020-08-25   2020-09-04  2020-09-04    2020-10-24
4            bc2              basic  active     2020-10-12   2020-10-26  2020-10-26
5            hg4              basic  active     2021-01-09   2021-02-08  2021-02-08
6            cd5              basic  in_trial   2021-02-26
As you can see from the table, status = in_trial when a subscription is in trial. When a subscription converts from in_trial to active, an activated_at date is set. When an in_trial or active subscription is cancelled, status switches to cancelled and a cancelled_at date is set. The status column always shows only the most recent status of a subscription. A status change does not add a new row for the subscription; instead, the status value is updated in place and the corresponding date column is filled in to record when the change happened.
My goal is to calculate, month over month, how many subscriptions are in status = in_trial, how many are in status = active, and how many are in status = cancelled. Because the status column reflects only the most recent status of a subscription, the query has to determine how many subscriptions were in status = in_trial, status = active, and status = cancelled in each month based on the available date columns.
If a particular subscription had multiple statuses in a given month (for example, subscription_id = ab1 was in trial in Aug-2020 and also converted to active in Aug-2020), I want only the most recent status to be considered for that subscription. So, as example, for subscription_id = ab1 I want it to be counted as active subscription for the month of Aug-2020.
The output I am looking for is:
date in_trial active cancelled
2020-06-01 1 0 0
2020-07-01 0 1 0
2020-08-01 1 2 0
2020-09-01 0 2 1
2020-10-01 0 2 1
2020-11-01 0 2 0
2020-12-01 0 2 0
2021-01-01 1 2 0
2021-02-01 1 2 1
2021-03-01 1 2 0
Or, results can be displayed in a different format, as long as numbers are correct. Another example of output can be:
date status count
2020-06-01 in_trial 1
2020-06-01 active 0
2020-06-01 cancelled 0
2020-07-01 in_trial 0
2020-07-01 active 1
2020-07-01 cancelled 0
... ... ...
2021-03-01 in_trial 1
2021-03-01 active 2
2021-03-01 cancelled 0
Below is the query you can use to reproduce the example table provided in this question:
SELECT 1 AS customer_id, 'jg1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-06-26' AS trial_start, '2020-07-14' AS trial_end, '2020-07-14' AS activated_at, '2020-09-25' AS cancelled_at UNION ALL
SELECT 2 AS customer_id, 'ab1' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-10' AS trial_start, '2020-08-24' AS trial_end, '2020-08-24' AS activated_at, '2021-02-15' AS cancelled_at UNION ALL
SELECT 3 AS customer_id, 'cf8' AS subscription_id, 'basic' AS plan, 'cancelled' AS status, '2020-08-25' AS trial_start, '2020-09-04' AS trial_end, '2020-09-04' AS activated_at, '2020-10-24' AS cancelled_at UNION ALL
SELECT 4 AS customer_id, 'bc2' AS subscription_id, 'basic' AS plan, 'active' AS status, '2020-10-12' AS trial_start, '2020-10-26' AS trial_end, '2020-10-26' AS activated_at, '' AS cancelled_at UNION ALL
SELECT 5 AS customer_id, 'hg4' AS subscription_id, 'basic' AS plan, 'active' AS status, '2021-01-09' AS trial_start, '2021-02-08' AS trial_end, '2021-02-08' AS activated_at, '' AS cancelled_at UNION ALL
SELECT 6 AS customer_id, 'cd5' AS subscription_id, 'basic' AS plan, 'in_trial' AS status, '2021-02-26' AS trial_start, '' AS trial_end, '' AS activated_at, '' AS cancelled_at
I have been working on this problem since yesterday morning and continuing to figure out a way to do this efficiently. Thank you in advance for helping me solve this problem.
Below should work for you
select month,
count(distinct if(status = 0, customer_id, null)) in_trial,
count(distinct if(status = 1, customer_id, null)) active,
count(distinct if(status = 2, customer_id, null)) canceled
from (
select month, customer_id,
array_agg(status order by status desc limit 1)[offset(0)] status
from (
select distinct customer_id, 0 status, date_trunc(date, month) month
from `project.dataset.table`,
unnest(generate_date_array(date(trial_start), ifnull(date(trial_end), current_date()))) date
union all
select distinct customer_id, 1 status, date_trunc(date, month) month
from `project.dataset.table`,
unnest(generate_date_array(date(activated_at), ifnull(date(cancelled_at), current_date()))) date
union all
select distinct customer_id, 2 status, date_trunc(date(cancelled_at), month) month
from `project.dataset.table`
)
where not month is null
group by month, customer_id
)
group by month
# order by month

DB2/SQL aggregates with preceding weekdays

I have a query that currently gets daily records against a weekly number from a prepopulated table:
SELECT Employee,
sum(case when category = 'Shirts' then daily_total else 0 end) as Shirts_DAILY,
sum(case when category = 'Shirts' then weekly_quota else 0 end) as Shirts_QUOTA, -- this is a static column, this number is the same for every record
sum(case when category = 'Shoes' then daily_total else 0 end) as Shoes_DAILY,
sum(case when category = 'Shoes' then weekly_quota else 0 end) as Shoes_QUOTA, -- this is a static column, this number is the same for every record
CURRENT_DATE as DATE_OF_REPORT
from SalesNumbers
where date_of_report >= current_date
group by Employee;
This runs in a script nightly and returns records like this:
Employee | shirts_DAILY | shirts_QUOTA | Shoes_DAILY | Shoes_QUOTA | DATE_OF_REPORT
--------------------------------------------------------------------------------------------------------
123 15 75 14 85 2019-08-30
That's the record from last Friday night's report. I'm trying to figure out a way to add a column for each category that would take the sum of daily totals (shirts_DAILY, shoes_DAILY) for that category on the preceding weekdays (with a week running Sunday through Saturday) and divide it by that category's quota (shirts_QUOTA, shoes_QUOTA).
For example, here are the records from Sunday through Thursday:
Employee | shirts_DAILY | shirts_QUOTA | Shoes_DAILY | Shoes_QUOTA | DATE_OF_REPORT
--------------------------------------------------------------------------------------------------------
123 15 75 16 85 2019-08-25
123 4 75 2 85 2019-08-26
123 8 75 6 85 2019-08-27
123 2 75 8 85 2019-08-28
123 15 75 14 85 2019-08-29
With my new change, I want Friday night's record to take the sum of Sunday through Thursday's daily records, plus Friday's own daily total, and divide it by the quota.
Friday night's record with new column:
Employee | shirts_DAILY | shirts_QUOTA | shirtsPercent | Shoes_DAILY | Shoes_QUOTA | shoesPercent | DATE_OF_REPORT
-----------------------------------------------------------------------------------------------------------------------------------------------
123 2 75 61.3 7 85 62.4 2019-08-30
So Friday's run added 15, 4, 8, 2, 15, 2 for shirts, giving 46/75, and 7, 14, 8, 6, 2, 16 for shoes, giving 53/85. In other words, each percentage uses the sum of that category's daily totals for the week so far, including the present day's totals.
What is the best way for me to achieve this?
SELECT Employee,
sum(case when category = 'Shirts' and date_of_report >= current date then
daily_total else 0 end) as Shirts_DAILY,
sum(case when category = 'Shirts' and date_of_report >= current date then
weekly_quota else 0 end) as Shirts_QUOTA,
( sum(case when category = 'Shirts' then
daily_total else 0 end) * 100 ) /
( sum(case when category = 'Shirts' and date_of_report >= current date then
weekly_quota else 0 end) ) as Shirts_PERCENT,
CURRENT_DATE as DATE_OF_REPORT
from SalesNumbers
where date_of_report >= ( current date - ( dayofweek(current date) - 1 ) days )
group by Employee
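
The where clause in the query above finds the Sunday that starts the current week. A minimal sketch of that date arithmetic in isolation, using the Friday from the question (2019-08-30) as a fixed input instead of current date. It assumes Db2, where DAYOFWEEK returns 1 for Sunday through 7 for Saturday:

```sql
-- Db2 sketch: 2019-08-30 is the Friday from the question's example.
select date('2019-08-30')                                               as report_date,
       dayofweek(date('2019-08-30'))                                    as dow,        -- 6 for a Friday
       date('2019-08-30') - (dayofweek(date('2019-08-30')) - 1) days    as week_start  -- steps back to Sunday
from sysibm.sysdummy1;
```

For that Friday, the expression steps back five days to 2019-08-25, the Sunday row shown in the question's sample records.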

SQL count number of users every 7 days

I am new to SQL and I need to find count of users every 7 days. I have a table with users for every single day starting from April 2015 up until now:
...
2015-05-16 00:00
2015-05-16 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-17 00:00
2015-05-18 00:00
2015-05-18 00:00
...
and I need to count the number of users every 7 days (weekly) so I have data weekly.
SELECT COUNT(user_id), Activity_Date FROM TABLE_NAME
I need output like this:
TotalUsers week1 week2 week3 ..........and so on
82 80 14 16
I am using DB Visualizer to query Oracle database.
You should try the following:
Select
sum(Week1) + sum(Week2) + sum(Week3) + sum(Week4) + sum(Week5) as Total,
sum(Week1) as Week1,
sum(Week2) as Week2,
sum(Week3) as Week3,
sum(Week4) as Week4,
sum(Week5) as Week5
From (
select
case when week = 1 then 1 else 0 end as Week1,
case when week = 2 then 1 else 0 end as Week2,
case when week = 3 then 1 else 0 end as Week3,
case when week = 4 then 1 else 0 end as Week4,
case when week = 5 then 1 else 0 end as Week5
from
(
Select
CEILING(datepart(dd,visitdate)/7+1) week,
user_id
from visitor
)T
)D
Here is Fiddle
You need to add month & year in the result as well.
SELECT COUNT(user_id) FROM TABLE_NAME WHERE Activity_Date >= TRUNC(SYSDATE) - 7;
That would get the number of users for the last 7 days.
This is my test table:
user_id act_date
1 01/04/2015
2 01/04/2015
3 04/04/2015
4 05/04/2015
..
This is my query:
select week_offset, count(*) nb from (
select trunc((act_date-to_date('01042015','DDMMYYYY'))/7) as week_offset from test_date)
group by week_offset
order by 1
and this is the output:
week_offset nb
0 6
1 3
4 5
5 7
6 3
7 1
18 1
Week offset is the number of the week from 01/04/2015, and we can show the first day of the week.
See here for live testing.
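
To show the first day of each week, as mentioned above, the week offset can be turned back into a date. This is a sketch only, assuming Oracle and three made-up rows in the shape of test_date:

```sql
-- Oracle sketch: the inline rows are hypothetical stand-ins for test_date.
with test_date as (
    select date '2015-04-01' as act_date from dual union all
    select date '2015-04-05' from dual union all
    select date '2015-04-08' from dual
)
select week_offset,
       date '2015-04-01' + 7 * week_offset as week_start,  -- first day of each 7-day block
       count(*) as nb
from (
    select trunc((act_date - date '2015-04-01') / 7) as week_offset
    from test_date
)
group by week_offset
order by week_offset;
```

With these three rows, the first two fall in week 0 (starting 2015-04-01) and the third in week 1 (starting 2015-04-08).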
How do you define your weeks? Here's an approach for SQL Server that starts each seven-day block relative to the start of April. The expressions will vary according to your specific needs:
select
dateadd(
dd,
datediff(dd, cast('20150401' as date), Activity_Date) / 7 * 7,
cast('20150401' as date)
) as WeekStart,
count(*)
from T
group by datediff(dd, cast('20150401' as date), Activity_Date) / 7
Oracle:
select
trunc(Activity_date, 'DAY') as WeekStart,
count(*)
from T
group by trunc(Activity_date, 'DAY') /* D and DAY are the same thing */
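
One caveat with the Oracle version: the day that trunc(..., 'DAY') treats as the start of the week depends on the session's NLS_TERRITORY setting. If weeks should always start on Monday, the 'IW' format (ISO week) is one option. A small sketch comparing the two on a single date:

```sql
-- Oracle sketch: 2015-05-17 is a Sunday.
select trunc(date '2015-05-17', 'DAY') as territory_week_start,  -- depends on NLS_TERRITORY
       trunc(date '2015-05-17', 'IW')  as iso_week_start         -- Monday 2015-05-11
from dual;
```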