SQL Add summary row for all days with same id

The query below produces the results table that follows. How can we add a summary row that sums all of the days for the same ad id, as shown in the desired results table? Thanks.
Query:
SELECT
right(ad_id,6) AS ad_id,
CAST(date_start AS DATE) AS "Day",
objective,
SUM(impressions) AS Impressions,
sum(clicks) AS Clicks
FROM ads
WHERE date_start >= '2018-05-01' AND date_start < '2018-06-01'
GROUP BY ad_id, CAST(date_start AS DATE), objective
Order by ad_id, CAST(date_start AS DATE) desc
Results table:
+--------+----------+-------------+-------------+--------+
| ad_id | day | objective | impressions | clicks |
+--------+----------+-------------+-------------+--------+
| 36911 | 5/2/2018 | CONVERSIONS | 16689 | 160 |
| 36911 | 5/1/2018 | CONVERSIONS | 4223 | 59 |
| 37111 | 5/2/2018 | CONVERSIONS | 1964 | 9 |
| 37111 | 5/1/2018 | CONVERSIONS | 1409 | 19 |
| 279311 | 5/3/2018 | LINK_CLICKS | 309 | 10 |
| 279311 | 5/2/2018 | LINK_CLICKS | 2816 | 19 |
| 279311 | 5/1/2018 | LINK_CLICKS | 5876 | 66 |
| 279511 | 5/3/2018 | LINK_CLICKS | 3551 | 86 |
| 279511 | 5/2/2018 | LINK_CLICKS | 3334 | 76 |
| 279511 | 5/1/2018 | LINK_CLICKS | 17798 | 508 |
+--------+----------+-------------+-------------+--------+
Desired results table with summary row:
+--------+----------+-------------+-------------+--------+
| ad_id | day | objective | impressions | clicks |
+--------+----------+-------------+-------------+--------+
| 36911 | All | CONVERSIONS | 20912 | 219 |
| 36911 | 5/2/2018 | CONVERSIONS | 16689 | 160 |
| 36911 | 5/1/2018 | CONVERSIONS | 4223 | 59 |
| 37111 | All | CONVERSIONS | 3373 | 28 |
| 37111 | 5/2/2018 | CONVERSIONS | 1964 | 9 |
| 37111 | 5/1/2018 | CONVERSIONS | 1409 | 19 |
| 279311 | All | LINK_CLICKS | 9001 | 95 |
| 279311 | 5/3/2018 | LINK_CLICKS | 309 | 10 |
| 279311 | 5/2/2018 | LINK_CLICKS | 2816 | 19 |
| 279311 | 5/1/2018 | LINK_CLICKS | 5876 | 66 |
| 279511 | All | LINK_CLICKS | 24683 | 670 |
| 279511 | 5/3/2018 | LINK_CLICKS | 3551 | 86 |
| 279511 | 5/2/2018 | LINK_CLICKS | 3334 | 76 |
| 279511 | 5/1/2018 | LINK_CLICKS | 17798 | 508 |
+--------+----------+-------------+-------------+--------+

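Use grouping sets (available in Postgres 9.5 and later). The (ad_id, objective) set produces the summary rows; their day is NULL, which COALESCE turns into 'All':
SELECT right(ad_id, 6) AS ad_id,
COALESCE(CAST(CAST(date_start AS DATE) AS text), 'All') AS "Day",
objective,
SUM(impressions) AS Impressions,
sum(clicks) AS Clicks
FROM ads
WHERE date_start >= '2018-05-01' AND date_start < '2018-06-01'
GROUP BY GROUPING SETS ( (ad_id, objective), (ad_id, objective, CAST(date_start AS DATE)) )
-- in Postgres, NULLs sort first with DESC, so each summary row precedes its daily rows
Order by ad_id, CAST(date_start AS DATE) desc;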
In earlier versions of Postgres, use a CTE and union all:
with t as (
SELECT right(ad_id, 6) AS ad_id,
CAST(date_start AS DATE) AS "Day",
objective,
SUM(impressions) AS Impressions,
sum(clicks) AS Clicks
FROM ads
WHERE date_start >= '2018-05-01' AND date_start < '2018-06-01'
GROUP BY ad_id, CAST(date_start AS DATE), objective
)
select *
from t
union all
-- summary rows: "Day" is NULL here because the column is a date
select ad_id, NULL, objective, sum(impressions), sum(clicks)
from t
group by ad_id, objective
order by 1, 2 desc;
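To check the union all idea against the numbers in the question, here is a minimal sketch that runs it on SQLite through Python's sqlite3 module. This is an assumption made purely so the example is self-contained: the question targets Postgres, the table is reduced to two ads, and the simplified `ads` schema with a text `day` column is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified version of the ads table from the question.
conn.execute(
    "CREATE TABLE ads (ad_id TEXT, day TEXT, objective TEXT, "
    "impressions INTEGER, clicks INTEGER)"
)
conn.executemany(
    "INSERT INTO ads VALUES (?, ?, ?, ?, ?)",
    [
        ("36911", "2018-05-02", "CONVERSIONS", 16689, 160),
        ("36911", "2018-05-01", "CONVERSIONS", 4223, 59),
        ("37111", "2018-05-02", "CONVERSIONS", 1964, 9),
        ("37111", "2018-05-01", "CONVERSIONS", 1409, 19),
    ],
)

query = """
SELECT ad_id, day, objective, impressions, clicks
FROM (
    -- daily rows
    SELECT ad_id, day, objective,
           SUM(impressions) AS impressions, SUM(clicks) AS clicks
    FROM ads
    GROUP BY ad_id, day, objective
    UNION ALL
    -- one summary row per ad
    SELECT ad_id, 'All', objective, SUM(impressions), SUM(clicks)
    FROM ads
    GROUP BY ad_id, objective
)
-- summary row first within each ad, then days in descending order
ORDER BY ad_id, day = 'All' DESC, day DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

The summary rows come out as 20912 / 219 for ad 36911 and 3373 / 28 for ad 37111, matching the desired results table.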

Related

How to get Max date and sum of its rows SQL

I have the following table:
+------+-------------+----------+---------+
| id | date | amount | amount2 |
+------+-------------+----------+---------+
| 1 | 1/1/2020 | 1000 | 500 |
| 1 | 1/3/2020 | 1558 | 100 |
| 1 | 1/3/2020 | 126 | 200 |
| 2 | 2/5/2020 | 4921 | 500 |
| 2 | 2/5/2020 | 15 | 100 |
| 2 | 1/1/2020 | 5951 | 140 |
| 2 | 1/2/2020 | 1588 | 10 |
| 2 | 1/3/2020 | 1568 | 56 |
| 2 | 1/4/2020 | 12558 | 45 |
+------+-------------+----------+---------+
For each id, I need its max date together with the sums of amount and amount2 on that date. How can I do this? For the data above, I need the following output:
+------+-------------+----------+---------+
| id | date | amount | amount2 |
+------+-------------+----------+---------+
| 1 | 1/3/2020 | 1684 | 300 |
| 2 | 2/5/2020 | 4936 | 600 |
+------+-------------+----------+---------+
Aggregate and use MAX OVER to get the IDs' maximum dates:
select id, [date], sum_amount, sum_amount2
from
(
select
id, [date], sum(amount) as sum_amount, sum(amount2) as sum_amount2,
max([date]) over (partition by id) as max_date_for_id
from mytable group by id, [date]
) aggregated
where [date] = max_date_for_id
order by id;
Alternatively, first use dense_rank() to find the rows with the latest date for each id:
dense_rank () over (partition by id order by [date] desc)
After that, simply group by id and date and sum() the amounts:
select id, [date], sum(amount), sum(amount2)
from
(
select *,
dr = dense_rank () over (partition by id order by [date] desc)
from your_table
) t
where dr = 1
group by id, [date]
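As a quick check of the dense_rank() approach on the sample data, here is a minimal sketch using Python's sqlite3 module. SQLite (3.25 or later for window functions) is an assumption made only to keep the example runnable; the answers above use SQL Server syntax. The column is named `day` instead of `date`, and the dates are read as month/day/year and stored in ISO format, which is also an assumption.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE mytable (id INTEGER, day TEXT, amount INTEGER, amount2 INTEGER)"
)
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?, ?, ?)",
    [
        (1, "2020-01-01", 1000, 500),
        (1, "2020-01-03", 1558, 100),
        (1, "2020-01-03", 126, 200),
        (2, "2020-02-05", 4921, 500),
        (2, "2020-02-05", 15, 100),
        (2, "2020-01-01", 5951, 140),
        (2, "2020-01-02", 1588, 10),
        (2, "2020-01-03", 1568, 56),
        (2, "2020-01-04", 12558, 45),
    ],
)

query = """
SELECT id, day, SUM(amount), SUM(amount2)
FROM (
    -- rank 1 marks every row that carries the latest date of its id
    SELECT id, day, amount, amount2,
           DENSE_RANK() OVER (PARTITION BY id ORDER BY day DESC) AS dr
    FROM mytable
) AS ranked
WHERE dr = 1
GROUP BY id, day
ORDER BY id
"""
latest = conn.execute(query).fetchall()
print(latest)
```

Because dense_rank() gives the same rank to all rows sharing the latest date, both rows of 1/3/2020 for id 1 survive the filter and are summed.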

SQL Windowing Aggregation Over Two Consecutive Date

I am not an SQL expert and am finding this a bit challenging. Imagine I have the following table, but with more users:
+---------+--------+--------+-------------+
| user_id | amount | date | sum_per_day |
+---------+--------+--------+-------------+
| user8 | 300 | 7/2/20 | 300 |
| user8 | 150 | 6/2/20 | 400 |
| user8 | 250 | 6/2/20 | 400 |
| user8 | 25 | 5/2/20 | 100 |
| user8 | 25 | 5/2/20 | 100 |
| user8 | 25 | 5/2/20 | 100 |
| user8 | 25 | 5/2/20 | 100 |
| user8 | 50 | 2/2/20 | 50 |
+---------+--------+--------+-------------+
As you can see, the rows are grouped by user_id. What I would like to do is add a column called sum_over_two_day that satisfies the following conditions:
Rows are grouped by user_id
For each user, rows are grouped by date
The sum of amount is calculated over two consecutive calendar days (the current day plus the previous calendar day)
So the output would be this:
+---------+--------+--------+-------------+------------------+
| user_id | amount | date | sum_per_day | sum_over_two_day |
+---------+--------+--------+-------------+------------------+
| user8 | 300 | 7/2/20 | 300 | 700 |
| user8 | 150 | 6/2/20 | 400 | 500 |
| user8 | 250 | 6/2/20 | 400 | 500 |
| user8 | 25 | 5/2/20 | 100 | 100 |
| user8 | 25 | 5/2/20 | 100 | 100 |
| user8 | 25 | 5/2/20 | 100 | 100 |
| user8 | 25 | 5/2/20 | 100 | 100 |
| user8 | 50 | 2/2/20 | 50 | 50 |
+---------+--------+--------+-------------+------------------+
The proper way is to use a window function with a RANGE clause (in PostgreSQL, RANGE with an offset requires version 11 or later):
SELECT user_id,
amount,
date,
sum(amount) OVER (PARTITION BY user_id
ORDER BY date
RANGE BETWEEN INTERVAL '1 day' PRECEDING
AND CURRENT ROW)
AS sum_over_two_day
FROM atable
ORDER BY user_id, date;
user_id | amount | date | sum_over_two_day
---------+--------+------------+------------------
user8 | 50 | 2020-02-02 | 50
user8 | 25 | 2020-02-05 | 100
user8 | 25 | 2020-02-05 | 100
user8 | 25 | 2020-02-05 | 100
user8 | 25 | 2020-02-05 | 100
user8 | 250 | 2020-02-06 | 500
user8 | 150 | 2020-02-06 | 500
user8 | 300 | 2020-02-07 | 700
(8 rows)
Try this workaround for your problem:
select
t1.user_id,
t1.amount,
date(t1.date_),
(select sum(amount) from tab where user_id=t1.user_id and date_=t1.date_ ),
(select sum(amount) from tab where user_id=t1.user_id and date_ between t1.date_-1 and t1.date_ )
from tab t1
The same query, with a window function for the first sum:
select
t1.user_id,
t1.amount,
date(t1.date_),
sum(t1.amount) over (partition by t1.user_id,t1.date_),
(select sum(amount) from tab where user_id=t1.user_id and date_ between t1.date_-1 and t1.date_ )
from tab t1
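The RANGE idea can be tried out locally with a minimal sketch on SQLite through Python's sqlite3 module. This assumes SQLite 3.28 or later, where RANGE takes a numeric offset rather than an interval, so the date is converted to a day number with julianday(). The column name `day` and the ISO date strings are assumptions for this sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE atable (user_id TEXT, amount INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO atable VALUES ('user8', ?, ?)",
    [
        (300, "2020-02-07"),
        (150, "2020-02-06"),
        (250, "2020-02-06"),
        (25, "2020-02-05"),
        (25, "2020-02-05"),
        (25, "2020-02-05"),
        (25, "2020-02-05"),
        (50, "2020-02-02"),
    ],
)

# julianday() turns the ISO date into a day number, so "1 PRECEDING"
# means "one calendar day earlier", not "one row earlier".
query = """
SELECT user_id, amount, day,
       SUM(amount) OVER (
           PARTITION BY user_id
           ORDER BY julianday(day)
           RANGE BETWEEN 1 PRECEDING AND CURRENT ROW
       ) AS sum_over_two_day
FROM atable
ORDER BY user_id, day
"""
rows = conn.execute(query).fetchall()
# Every row of a given day carries the same two-day total.
two_day_sum = {day: total for _, _, day, total in rows}
print(two_day_sum)
```

A ROWS frame would not work here: it counts physical rows, so the four rows of 5 February would each see a different window. RANGE works on the ordering value itself, which is why the gap between 2 and 5 February is handled correctly.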

Find rows with adjoining date ranges and accumulate their durations

My PostgreSQL database stores school vacations, public holidays and weekend dates so that parents can plan their vacation. School vacations are often directly adjacent to weekends or public holidays. I want to display the total number of non-school days for a school vacation, including any adjacent weekend or public holiday.
Example Data
locations
SELECT id, name, is_federal_state
FROM locations
WHERE is_federal_state = true;
| id | name | is_federal_state |
|----|-------------------|------------------|
| 2 | Baden-Württemberg | true |
| 3 | Bayern | true |
holiday_or_vacation_types
SELECT id, name FROM holiday_or_vacation_types;
| id | name |
|----|-----------------------|
| 1 | Herbst |
| 8 | Wochenende |
"Herbst" is German for "autumn" and "Wochenende" is German for "weekend".
periods
SELECT id, starts_on, ends_on, holiday_or_vacation_type_id
FROM periods
WHERE location_id = 2
ORDER BY starts_on;
| id | starts_on | ends_on | holiday_or_vacation_type_id |
|-----|--------------|--------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 8 |
Task
I want to select all periods where location_id equals 2. And I want to calculate the duration of each period in days. That can be done with this SQL query:
SELECT id, starts_on, ends_on,
(ends_on - starts_on + 1) AS duration,
holiday_or_vacation_type_id
FROM periods
| id | starts_on | ends_on | duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 8 |
Any human looking at the calendar would see that ids 670 (weekend), 532 (fall vacation) and 533 (fall vacation) are adjacent, so they add up to a 6-day vacation period. So far I compute this in a program, but that takes quite a lot of resources (the actual table contains some 500,000 items).
Problem 1
Which SQL query would produce the following output (it adds a real_duration column)? Is that even possible with SQL?
| id | starts_on | ends_on | duration | real_duration | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 6 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 6 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 6 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | 2 | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | 2 | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | 2 | 8 |
Problem 2
Is it possible to list the adjacent periods in a part_of_range field? This would be the result. Can that be done with SQL?
| id | starts_on | ends_on | duration | part_of_range | holiday_or_vacation_type_id |
|-----|--------------|--------------|----------|---------------|-----------------------------|
| 670 | "2019-10-26" | "2019-10-27" | 2 | 670,532,533 | 8 |
| 532 | "2019-10-28" | "2019-10-30" | 3 | 670,532,533 | 1 |
| 533 | "2019-10-31" | "2019-10-31" | 1 | 670,532,533 | 1 |
| 671 | "2019-11-02" | "2019-11-03" | 2 | | 8 |
| 672 | "2019-11-09" | "2019-11-10" | 2 | | 8 |
| 673 | "2019-11-16" | "2019-11-17" | 2 | | 8 |
This is a gaps and islands problem. In this case you can use lag() to see where an island starts and then a cumulative sum.
The final operation is some aggregation (using window functions):
SELECT p.*,
(Max(ends_on) OVER (PARTITION BY location_id, grp) - Min(starts_on) OVER (PARTITION BY location_id, grp) ) + 1 AS real_duration,
Array_agg(p.id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM (SELECT p.*,
Count(*) FILTER (WHERE prev_eo < starts_on - INTERVAL '1 day') OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
FROM (SELECT id, starts_on, ends_on, location_id, holiday_or_vacation_type_id,
lag(ends_on) OVER (PARTITION BY location_id ORDER BY (starts_on)) AS prev_eo
FROM periods
) p
) p;
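The lag() plus cumulative sum steps can be traced on the sample periods with a minimal sketch using Python's sqlite3 module. SQLite is an assumption made only for a self-contained run: it has no date type or array_agg, so julianday() and GROUP_CONCAT stand in for the Postgres date arithmetic and Array_agg. Unlike the desired output in the question, single periods list their own id in part_of_range instead of being empty.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE periods (id INTEGER, location_id INTEGER, "
    "starts_on TEXT, ends_on TEXT)"
)
conn.executemany(
    "INSERT INTO periods VALUES (?, 2, ?, ?)",
    [
        (670, "2019-10-26", "2019-10-27"),
        (532, "2019-10-28", "2019-10-30"),
        (533, "2019-10-31", "2019-10-31"),
        (671, "2019-11-02", "2019-11-03"),
        (672, "2019-11-09", "2019-11-10"),
        (673, "2019-11-16", "2019-11-17"),
    ],
)

query = """
WITH flagged AS (
    -- step 1: fetch the end date of the previous period
    SELECT id, location_id, starts_on, ends_on,
           LAG(ends_on) OVER (PARTITION BY location_id ORDER BY starts_on) AS prev_eo
    FROM periods
),
grouped AS (
    -- step 2: a running count of "island starts" numbers the islands
    SELECT id, location_id, starts_on, ends_on,
           SUM(CASE WHEN prev_eo IS NULL
                      OR julianday(starts_on) - julianday(prev_eo) > 1
                    THEN 1 ELSE 0 END)
               OVER (PARTITION BY location_id ORDER BY starts_on) AS grp
    FROM flagged
)
-- step 3: aggregate per island with window functions
SELECT id,
       CAST(MAX(julianday(ends_on)) OVER (PARTITION BY location_id, grp)
            - MIN(julianday(starts_on)) OVER (PARTITION BY location_id, grp)
            + 1 AS INTEGER) AS real_duration,
       GROUP_CONCAT(id) OVER (PARTITION BY location_id, grp) AS part_of_range
FROM grouped
ORDER BY starts_on
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```

Periods 670, 532 and 533 end up in the same island with a real_duration of 6, while the three later weekends each form their own island of 2 days. The order of ids inside part_of_range is not guaranteed in this sketch.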

Display null value in record Oracle

I need to display a value of '0' in the Avg Delay Departure column for every station that appears in the Schedule table but has no delays.
Here's the Schedule table,
+---------+---------+----------+
| Station | On Time | Schedule |
+---------+---------+----------+
| AMQ | 174 | 202 |
| AMS | 21 | 27 |
| BDJ | 182 | 210 |
| BDO | 56 | 62 |
| BEJ | 59 | 62 |
| BIK | 74 | 93 |
| BKK | 81 | 87 |
| BKS | 73 | 87 |
| BMU | 60 | 60 |
| BOM | 2 | 7 |
| BPN | 413 | 452 |
+---------+---------+----------+
Here's the Avg Delay table,
+---------+---------------------+
| Station | Avg Delay Departure |
+---------+---------------------+
| AMQ | 53.21 |
| AMS | 49.5 |
| BDJ | 60.78 |
| BDO | 67.66 |
| BEJ | 46.33 |
| BIK | 47.53 |
| BKK | 55.5 |
| BKS | 67.56 |
| BOM | 45.2 |
| BPN | 53.81 |
+---------+---------------------+
Pay attention to the BMU record in the Schedule table. It has 60 schedules and 60 on time, so there is no delay. I want the BMU record to appear in the Avg Delay table with a value of '0' in the Avg Delay Departure column. My current query doesn't display it.
Here's the query for Avg Delay table,
SELECT DEPAIRPORT AS STATION, to_number(to_char(trunc(sysdate) + avg(cast(ACTUAL_BLOCKOFF_LC as date) - cast(SCHEDULED_DEPDT_LC as date)), 'sssss'))/60 as DEPAVERAGE
FROM DBODSXML4OPS.XML4OPS
WHERE ACTUAL_BLOCKOFF_LC IS NOT NULL AND SERVICETYPE IN ('J','G') AND (ACTUAL_BLOCKOFF_LC - SCHEDULED_DEPDT_LC)*24*60 > '+000000015 00:00:00.000000000'
AND STATUS IN ('Scheduled') AND
TO_CHAR(SCHEDULED_DEPDT_LC, 'yyyy-mm-dd') BETWEEN '2018-04-14' AND '2018-05-14'
GROUP BY DEPAIRPORT
ORDER BY STATION ASC;
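One likely reason BMU is missing is that the WHERE clause removes all of its rows before GROUP BY runs, so no group is left for that station. A possible way around this is to move the delay condition out of WHERE and into the aggregate, then replace the resulting NULL with 0. The sketch below shows the idea on SQLite through Python's sqlite3 module. The `departures` table, its `delay_minutes` column and the 15-minute threshold are hypothetical simplifications, not the real XML4OPS schema. In Oracle the same pattern would use AVG(CASE ...) wrapped in COALESCE or NVL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical, simplified schema: one row per departure with its delay in minutes.
conn.execute("CREATE TABLE departures (station TEXT, delay_minutes REAL)")
conn.executemany(
    "INSERT INTO departures VALUES (?, ?)",
    [
        ("AMS", 40.0),
        ("AMS", 59.0),
        ("AMS", 5.0),   # below the threshold, ignored by the average
        ("BMU", 0.0),
        ("BMU", 3.0),   # BMU has no delay above the threshold at all
    ],
)

query = """
SELECT station,
       -- the CASE yields NULL for on-time rows, which AVG skips;
       -- COALESCE turns "no delayed rows at all" into 0
       COALESCE(AVG(CASE WHEN delay_minutes > 15 THEN delay_minutes END), 0)
           AS avg_delay
FROM departures
GROUP BY station
ORDER BY station
"""
avg_delay = conn.execute(query).fetchall()
print(avg_delay)
```

Conditions that should apply to every row, such as the date range or the service type, can stay in WHERE. Only the "is delayed" test has to move into the CASE expression.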

Difference between current and previous column using OLAP functions

I was asked to create a report (using Teradata SQL OLAP functions) as below
EMPL_ID | perd_end_d | pdct_I | Year to date sal Amnt | Diff in sale amnt from Prev month
-------------------------------------------------------------------------------------------
I was given the following "sales" dataset and I have to calculate the "year to date sale amount" and the "difference between the current and previous month's sale amount":
empl_id| perd_end_d | pdct_I|sale_amnt|
----------------------------------------
E1001 | 31-01-2010 | P2003 | 2,03 |
E1003 | 31-01-2010 | P2015 | 44 |
E1003 | 31-01-2010 | P2004 | 67,6 |
E1001 | 31-01-2010 | P2002 | 135 |
E1003 | 31-01-2010 | P2003 | 545 |
E1001 | 31-01-2010 | P2001 | 1,00 |
E1002 | 31-01-2010 | P2005 | 23 |
E1002 | 31-01-2010 | P2007 | 343 |
E1006 | 28-02-2010 | P2005 | 34 |
E1006 | 28-02-2010 | P2004 | 43 |
E1001 | 28-02-2010 | P2003 | 54 |
E1001 | 28-02-2010 | P2002 | 878 |
E1003 | 28-02-2010 | P2008 | 434 |
E1001 | 28-02-2010 | P2001 | 66 |
E1007 | 28-02-2010 | P2009 | 455 |
E1007 | 28-02-2010 | P2009 | 4,54 |
E1003 | 28-02-2010 | P2007 | 56 |
E1008 | 28-02-2010 | P2009 | 786 |
E1010 | 31-01-2011 | P2001 | 300 |
E1001 | 31-01-2011 | P2002 | 200 |
E1009 | 31-01-2011 | P2003 | 100 |
E1011 | 31-01-2012 | P2004 | 700 |
E1002 | 31-01-2012 | P2005 | 400 |
E1011 | 31-01-2012 | P2003 | 600 |
E1002 | 31-01-2012 | P2007 | 500 |
---------------------------------------
I want something like below
empl_id| perd_end_d | pdct_I|sale_amnt| diff(cur_mnt_sal - prev_mnt_sal)
-------------------------------------------------------------------------
E1001 | 31-01-2010 | P2003 | 2,03 | 203 -- or may be null
E1003 | 31-01-2010 | P2015 | 44 | 159
E1003 | 31-01-2010 | P2004 | 67,6 | 632
E1001 | 31-01-2010 | P2002 | 135 | 541
E1003 | 31-01-2010 | P2003 | 545 | 410
...
So far I have managed to get the required result, but it looks ugly. How can I improve the following solution?
SELECT perd_end_d
, pdct_I
, sale_amnt
, ABS( SUM(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows between 1 preceding and 1 preceding )
- SUM(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows current row ) )"prev_mnt_sal - cur_mnt_sal"
from sandbox.sales;
and the resultset is as following
SELECT perd_end_d
, pdct_I
, sale_amnt
, ABS( min(sale_amnt) over (partition by perd_end_d
order by perd_end_d
rows between 1 preceding and 1 preceding )
- sale_amnt) as "prev_mnt_sal - cur_mnt_sal"
from sandbox.sales;
You probably want something like this:
SELECT empl_id
, perd_end_d
, sum(sale_amnt) as sumsale
-- cumulative sum of sales per employee
, SUM(sumsale)
over (partition by empl_id
order by perd_end_d
rows unbounded preceding)
-- difference between current and previous month per employee
, sumsale -
SUM(sumsale)
over (partition by empl_id
order by perd_end_d
rows between 1 preceding and 1 preceding )
from sandbox.sales
group by 1,2;
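The same shape of query can be tried out with a minimal sketch on SQLite through Python's sqlite3 module. SQLite is an assumption for runnability: it does not let a select-list alias be reused the way Teradata does, so the monthly sum is computed in a subquery first, and LAG() replaces the "1 preceding and 1 preceding" frame. The rows are a subset of the question's data, leaving out the amounts with ambiguous comma formatting, and the dates are stored in ISO format.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (empl_id TEXT, perd_end_d TEXT, sale_amnt INTEGER)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("E1001", "2010-01-31", 135),
        ("E1003", "2010-01-31", 44),
        ("E1003", "2010-01-31", 545),
        ("E1001", "2010-02-28", 54),
        ("E1001", "2010-02-28", 878),
        ("E1001", "2010-02-28", 66),
        ("E1003", "2010-02-28", 434),
        ("E1003", "2010-02-28", 56),
    ],
)

query = """
SELECT empl_id, perd_end_d, sumsale,
       -- cumulative sum of sales per employee
       SUM(sumsale) OVER (
           PARTITION BY empl_id ORDER BY perd_end_d
           ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
       ) AS running_total,
       -- difference between current and previous month per employee
       sumsale - LAG(sumsale) OVER (
           PARTITION BY empl_id ORDER BY perd_end_d
       ) AS diff_to_prev
FROM (
    SELECT empl_id, perd_end_d, SUM(sale_amnt) AS sumsale
    FROM sales
    GROUP BY empl_id, perd_end_d
) AS monthly
ORDER BY empl_id, perd_end_d
"""
report = conn.execute(query).fetchall()
for row in report:
    print(row)
```

The first month of each employee has no previous row, so its difference is NULL. Note that the running total here never resets; for a true year-to-date figure the year would presumably have to be added to the PARTITION BY.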