SQL SUM label summary - sql

Is there any way in SQL to add rows with summary statistics, for example the sum and the mean? Something like this:
| 2021-01 | 16 |
| 2020-12 | 15 |
| -------- | -------------- |
| SUM | 31 |
| Mean | 15.5 |
My code:
proc sql;
create table diff as
select today.policy_vintage
, today.number_policy as POLICY_TODAY
, prior.number_policy as POLICY_PRIOR
, today.number_policy - prior.number_policy as DIFFERENCE
, avg(prior.number_policy) as POLICY_MEAN_PRIOR
, today.number_policy - mean(prior.number_policy) as DIFFRENCE_MEAN
from policy_vintage_weekly today
LEFT JOIN
(select *
from _work.POLICY_VINTAGE_WEEKLY
where run_date < today()
having run_date = max(run_date)
) prior
ON today.policy_vintage = prior.policy_vintage
;
quit;

If your table contains:
| Date | value |
| 2021-01-00 | 16 |
| 2020-12-00 | 15 |
Then this query will get you the result you want:
SELECT * FROM test.test
union
select "SUM", sum(value) from test.test
union
select "Mean", avg(value) from test.test;
+------------+---------+
| date | value |
+------------+---------+
| 2021-01-00 | 16.0000 |
| 2020-12-00 | 15.0000 |
| SUM | 31.0000 |
| Mean | 15.5000 |
+------------+---------+
4 rows in set (0.000 sec)
Tested on Mariadb 10.6.4
That said, this kind of summary is usually better calculated in whatever client software you are using.
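One caveat with the query above: plain UNION removes duplicate rows and does not guarantee row order, so the summary rows could in principle end up anywhere. A minimal sketch of a variant that uses UNION ALL plus an explicit sort key follows. It runs against an in-memory SQLite database through Python's sqlite3 module rather than MariaDB, and the `sort_key` and `label` column names are made up for illustration.

```python
import sqlite3


def summary_rows():
    """Return the detail rows followed by a SUM row and a Mean row."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE test (date TEXT, value REAL)")
    conn.executemany(
        "INSERT INTO test VALUES (?, ?)",
        [("2021-01", 16), ("2020-12", 15)],
    )
    # sort_key (hypothetical helper column) keeps the summary rows last;
    # UNION ALL avoids the duplicate removal that plain UNION performs.
    query = """
        SELECT label, value FROM (
            SELECT 0 AS sort_key, date AS label, value FROM test
            UNION ALL
            SELECT 1, 'SUM', SUM(value) FROM test
            UNION ALL
            SELECT 2, 'Mean', AVG(value) FROM test
        )
        ORDER BY sort_key, label DESC
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(summary_rows())
```

The same SELECT should carry over to most engines with little change, but only the SQLite form shown here has been checked.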


Subtracting previous row value from current row

I'm doing an aggregation like this:
select
date,
product,
count(*) as cnt
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
order by
product asc, date asc
This produces data which looks like this:
| date | product | cnt | difference |
|------------|---------|------|------------|
| 2020-03-31 | p1 | 100 | null |
| 2020-07-31 | p1 | 1000 | 900 |
| 2020-09-30 | p1 | 900 | -100 |
| 2020-12-31 | p1 | 1100 | 200 |
| 2020-03-31 | p2 | 200 | null |
| 2020-07-31 | p2 | 210 | 10 |
| ... | ... | ... | x |
But without the difference column. How could I make such a calculation? I could pivot the date column and subtract that way, but maybe there's a better way.
I was able to use lag with partition by and order by to get this to work:
select
date,
product,
count,
count - lag(count) over (partition by product order by date, product) as difference
from(
select
date,
product,
count(*) as count
from
t1
where
yyyy_mm_dd in ('2020-03-31', '2020-07-31', '2020-09-30', '2020-12-31')
group by
1,2
) t
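To see the lag approach in action, here is a small self-contained sketch using Python's sqlite3 module (SQLite 3.25 or later is needed for window functions). The sample rows are invented, and the inner count is aliased `cnt` instead of `count` purely to avoid clashing with the function name.

```python
import sqlite3


def count_differences():
    """Count rows per (date, product) and diff each count against the previous date."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (date TEXT, product TEXT)")
    # Invented sample data: one row per event.
    events = (
        [("2020-03-31", "p1")] * 3
        + [("2020-07-31", "p1")] * 5
        + [("2020-09-30", "p1")] * 4
        + [("2020-03-31", "p2")] * 2
        + [("2020-07-31", "p2")] * 3
    )
    conn.executemany("INSERT INTO t1 VALUES (?, ?)", events)
    query = """
        SELECT date, product, cnt,
               cnt - LAG(cnt) OVER (PARTITION BY product ORDER BY date) AS difference
        FROM (
            SELECT date, product, COUNT(*) AS cnt
            FROM t1
            GROUP BY date, product
        ) AS t
        ORDER BY product, date
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in count_differences():
    print(row)
```

The first date of each product has no previous row, so LAG returns NULL and the difference is NULL (None in Python), matching the desired output in the question.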

SQL - get summary of differences vs previous month

I have a table similar to this one:
| id | store | BOMdate |
| 1 | A | 01/10/2018 |
| 1 | B | 01/10/2018 |
| 1 | C | 01/10/2018 |
|... | ... | ... |
| 1 | A | 01/11/2018 |
| 1 | C | 01/11/2018 |
| 1 | D | 01/11/2018 |
|... | ... | ... |
| 1 | B | 01/12/2018 |
| 1 | C | 01/12/2018 |
| 1 | E | 01/12/2018 |
It contains the stores that are active at BOM (beginning of month).
How do I query it to get the number of stores that are new that month, that is, those that were not active the previous month?
The output should be this:
| BOMdate | #newstores |
| 01/10/2018 | 3 | * no stores on previous month
| 01/11/2018 | 1 | * D is the only new active store
| 01/12/2018 | 2 | * store B was not active on November, E is new
I know how to count the first time that each store is active (nested select, taking the MIN(BOMdate) and then counting). But I have no idea how to check each month against its previous month.
I use SQL Server, but I am interested in the differences in other platforms if there are any.
Thanks
How do I query it to get the number of stores that are new that month, that is, those that were not active the previous month?
One option uses not exists:
select bomdate, count(*) cnt_new_stores
from mytable t
where not exists (
select 1
from mytable t1
where t1.store = t.store and t1.bomdate = dateadd(month, -1, t.bomdate)
)
group by bomdate
You can also use window functions:
select bomdate, count(*) cnt_new_stores
from (
select t.*, lag(bomdate) over(partition by store order by bomdate) lag_bomdate
from mytable t
) t
where bomdate <> dateadd(month, 1, lag_bomdate) or lag_bomdate is null
group by bomdate
You can compare a date with the previous month's date using the T-SQL DATEDIFF function.
Using NOT EXISTS, you can count the stores that did not appear in the previous month, and you can also get their names as a list using the STRING_AGG function, introduced in SQL Server 2017. Note that the outer table needs an alias; otherwise the unqualified column names inside the subquery resolve to the inner table.
select t.BOMDate, NewStoresCount=count(1),NewStores= STRING_AGG(t.store,',') from
yourtable t
where not exists
(
Select 1 from
yourtable y where y.store=t.store and DATEDIFF(m,y.BOMDate,t.BOMDate)=1
)
group by t.BOMDate
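The NOT EXISTS approach can be checked against the sample data from the question. This is a minimal sketch in SQLite (through Python's sqlite3), so `date(bomdate, '-1 month')` stands in for SQL Server's `dateadd(month, -1, bomdate)`, and the dates are stored as ISO strings, which is an assumption about the storage format.

```python
import sqlite3


def new_stores_per_month():
    """Count stores active in a month that were not active the month before."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mytable (id INTEGER, store TEXT, bomdate TEXT)")
    conn.executemany(
        "INSERT INTO mytable VALUES (1, ?, ?)",
        [
            ("A", "2018-10-01"), ("B", "2018-10-01"), ("C", "2018-10-01"),
            ("A", "2018-11-01"), ("C", "2018-11-01"), ("D", "2018-11-01"),
            ("B", "2018-12-01"), ("C", "2018-12-01"), ("E", "2018-12-01"),
        ],
    )
    # Both tables are aliased so the correlation is explicit.
    query = """
        SELECT t.bomdate, COUNT(*) AS cnt_new_stores
        FROM mytable AS t
        WHERE NOT EXISTS (
            SELECT 1
            FROM mytable AS prev
            WHERE prev.store = t.store
              AND prev.bomdate = date(t.bomdate, '-1 month')
        )
        GROUP BY t.bomdate
        ORDER BY t.bomdate
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


print(new_stores_per_month())
```

A month in which every store was already active the month before would not appear in the result at all, because the filter removes all of its rows before grouping.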

SQL: Get an aggregate (SUM) of a calculation of two fields (DATEDIFF) that has conditional logic (CASE WHEN)

I have a dataset that includes a bunch of stay data (at a hotel). Each row contains a start date and an end date, but no duration field. I need to get a sum of the durations.
Sample Data:
| Stay ID | Client ID | Start Date | End Date |
| 1 | 38 | 01/01/2018 | 01/31/2019 |
| 2 | 16 | 01/03/2019 | 01/07/2019 |
| 3 | 27 | 01/10/2019 | 01/12/2019 |
| 4 | 27 | 05/15/2019 | NULL |
| 5 | 38 | 05/17/2019 | NULL |
There are some added complications:
I am using Crystal Reports and this is a SQL Expression, which obeys slightly different rules. Basically, it returns a single scalar value. Here is some more info: http://www.cogniza.com/wordpress/2005/11/07/crystal-reports-using-sql-expression-fields/
Sometimes, the end date field is blank (they haven't booked out yet). If blank, I would like to replace it with the current timestamp.
I only want to count nights that have occurred in the past year. If the start date of a given stay is more than a year ago, I need to adjust it.
I need to get a sum by Client ID
I'm not actually any good at SQL so all I have is guesswork.
The proper syntax for a Crystal Reports SQL Expression is something like this:
(
SELECT (CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
And that's giving me the correct value for a single row, if I wanted to do this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 210 | // only days since June 4 2018 are counted
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 2 |
| 4 | 27 | 05/15/2019 | NULL | 21 |
| 5 | 38 | 05/17/2019 | NULL | 19 |
But I want to get the SUM of Duration per client, so I want this:
| Stay ID | Client ID | Start Date | End Date | Duration |
| 1 | 38 | 01/01/2018 | 01/31/2019 | 229 | // 210+19
| 2 | 16 | 01/03/2019 | 01/07/2019 | 4 |
| 3 | 27 | 01/10/2019 | 01/12/2019 | 23 | // 2+21
| 4 | 27 | 05/15/2019 | NULL | 23 |
| 5 | 38 | 05/17/2019 | NULL | 229 |
I've tried to just wrap a SUM() around my CASE but that doesn't work:
(
SELECT SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END)
)
It gives me an error that the StayDateEnd is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. But I don't even know what that means, so I'm not sure how to troubleshoot, or where to go from here. And then the next step is to get the SUM by Client ID.
Any help would be greatly appreciated!
Although the explanation and data set are almost impossible to match, I think this is an approximation to what you want.
declare @your_data table (StayId int, ClientId int, StartDate date, EndDate date)
insert into @your_data values
(1,38,'2018-01-01','2019-01-31'),
(2,16,'2019-01-03','2019-01-07'),
(3,27,'2019-01-10','2019-01-12'),
(4,27,'2019-05-15',NULL),
(5,38,'2019-05-17',NULL)
;with data as (
select *,
datediff(day,
case
when datediff(day,StartDate,getdate())>365 then dateadd(year,-1,getdate())
else StartDate
end,
isnull(EndDate,getdate())
) days
from @your_data
)
select *,
sum(days) over (partition by ClientId) as total_days
from data
https://rextester.com/HCKOR53440
You need a subquery that computes the sum grouped by client_id, joined back to your table, e.g.:
select Stay_id, client_id, Start_date, End_date, t.sum_duration
from your_table
inner join (
select Client_id,
SUM(CASE
WHEN StayDateStart < DATEADD(year,-1,CURRENT_TIMESTAMP) THEN DATEDIFF(day,DATEADD(year,-1,CURRENT_TIMESTAMP),ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
ELSE DATEDIFF(day,StayDateStart,ISNULL(StayDateEnd,CURRENT_TIMESTAMP))
END) sum_duration
from your_table
group by Client_id
) t on t.Client_id = your_table.client_id
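Because both answers depend on the current date, their results change from day to day. The sketch below pins the reference date to make the arithmetic reproducible; it uses SQLite through Python's sqlite3 (window functions need SQLite 3.25+), and the `as_of` parameter, the `stays` table, and its column names are assumptions for illustration. With `as_of` set to 2019-06-05 the open stays come out at 21 and 19 days as in the question, while stay 1 comes out at 240 days rather than the 210 quoted there.

```python
import sqlite3


def durations_per_client(as_of="2019-06-05"):
    """Clip each stay to the year before as_of, then sum the nights per client."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE stays (stay_id INTEGER, client_id INTEGER,"
        " start_date TEXT, end_date TEXT)"
    )
    conn.executemany(
        "INSERT INTO stays VALUES (?, ?, ?, ?)",
        [
            (1, 38, "2018-01-01", "2019-01-31"),
            (2, 16, "2019-01-03", "2019-01-07"),
            (3, 27, "2019-01-10", "2019-01-12"),
            (4, 27, "2019-05-15", None),
            (5, 38, "2019-05-17", None),
        ],
    )
    # Two-argument MAX() is SQLite's scalar maximum: it moves an early start
    # date up to one year before as_of. COALESCE fills an open end date.
    query = """
        WITH clipped AS (
            SELECT stay_id, client_id,
                   CAST(julianday(COALESCE(end_date, :as_of))
                        - julianday(MAX(start_date, date(:as_of, '-1 year')))
                        AS INTEGER) AS days
            FROM stays
        )
        SELECT stay_id, client_id, days,
               SUM(days) OVER (PARTITION BY client_id) AS client_days
        FROM clipped
        ORDER BY stay_id
    """
    rows = conn.execute(query, {"as_of": as_of}).fetchall()
    conn.close()
    return rows


for row in durations_per_client():
    print(row)
```

A stay that ended more than a year before `as_of` would produce a negative duration here; filtering such rows out, or clamping at zero, is left out of this sketch.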

SQL Server: most efficient way to update multiple records depending on each other

I want to update multiple records from table "a" depending on each other. The values of the table "a" look like:
+------------+---------------+-------+
| date | transfervalue | value |
+------------+---------------+-------+
| 01.03.2018 | 0 | 10 |
| 02.03.2018 | 0 | 6 |
| 03.03.2018 | 0 | 13 |
+------------+---------------+-------+
After the update the values of the table "a" should look like:
+------------+---------------+-------+
| date | transfervalue | value |
+------------+---------------+-------+
| 01.03.2018 | 0 | 10 |
| 02.03.2018 | 10 | 6 |
| 03.03.2018 | 16 | 13 |
+------------+---------------+-------+
What is the most efficient way to do this? I've tried three different solutions, but the last solution doesn't work.
Solution 1: do a loop and iterate over each day to do the update statement
Solution 2: do an update statement for each day
Solution 3: do the update for the whole timespan in one statement
The output of solution 3 was:
+------------+---------------+-------+
| date | transfervalue | value |
+------------+---------------+-------+
| 01.03.2018 | 0 | 10 |
| 02.03.2018 | 10 | 6 |
| 03.03.2018 | 6 | 13 |
+------------+---------------+-------+
You seem to want a cumulative sum:
with toupdate as (
select t.*, sum(value) over (order by date rows between unbounded preceding and 1 preceding) as running_value
from t
)
update toupdate
set transfervalue = coalesce(running_value, 0);
This should work:
select t1.*,
coalesce((select sum(value) from table1 t2 where t2.date < t1.date), 0) MyNewValue
from table1 t1
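The expected transfervalue for each day is the sum of value over all earlier days (0, then 10, then 10 + 6 = 16), which is why a statement that only looks at the single previous row yields 6 for the third day. A minimal sketch of a one-statement update built on the correlated subquery above, run in SQLite through Python's sqlite3 with ISO-formatted dates (an assumption, so that string comparison orders them correctly):

```python
import sqlite3


def fill_transfer_values():
    """Set transfervalue to the sum of value over all strictly earlier dates."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE a (date TEXT PRIMARY KEY, transfervalue INTEGER, value INTEGER)"
    )
    conn.executemany(
        "INSERT INTO a VALUES (?, 0, ?)",
        [("2018-03-01", 10), ("2018-03-02", 6), ("2018-03-03", 13)],
    )
    # The subquery reads only the value column, which the UPDATE never
    # changes, so the result does not depend on row processing order.
    conn.execute(
        """
        UPDATE a
        SET transfervalue = COALESCE(
            (SELECT SUM(prev.value) FROM a AS prev WHERE prev.date < a.date),
            0
        )
        """
    )
    rows = conn.execute(
        "SELECT date, transfervalue, value FROM a ORDER BY date"
    ).fetchall()
    conn.close()
    return rows


print(fill_transfer_values())
```

On large tables the window-function version is likely to scale better, since the correlated subquery re-scans the earlier rows for every updated row.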

Union in outer query

I'm attempting to combine multiple rows using a UNION but I need to pull in additional data as well. My thought was to use a UNION in the outer query but I can't seem to make it work. Or am I going about this all wrong?
The data I have is like this:
+------+------+-------+---------+---------+
| ID | Time | Total | Weekday | Weekend |
+------+------+-------+---------+---------+
| 1001 | AM | 5 | 5 | 0 |
| 1001 | AM | 2 | 0 | 2 |
| 1001 | AM | 4 | 1 | 3 |
| 1001 | AM | 5 | 3 | 2 |
| 1001 | PM | 5 | 3 | 2 |
| 1001 | PM | 5 | 5 | 0 |
| 1002 | PM | 4 | 2 | 2 |
| 1002 | PM | 3 | 3 | 0 |
| 1002 | PM | 1 | 0 | 1 |
+------+------+-------+---------+---------+
What I want to see is like this:
+------+---------+------+-------+
| ID | DayType | Time | Tasks |
+------+---------+------+-------+
| 1001 | Weekday | AM | 9 |
| 1001 | Weekend | AM | 7 |
| 1001 | Weekday | PM | 8 |
| 1001 | Weekend | PM | 2 |
| 1002 | Weekday | PM | 5 |
| 1002 | Weekend | PM | 3 |
+------+---------+------+-------+
The closest I've come so far is using UNION statement like the following:
SELECT * FROM
(
SELECT Weekday, 'Weekday' as 'DayType' FROM t1
UNION
SELECT Weekend, 'Weekend' as 'DayType' FROM t1
) AS X
Which results in something like the following:
+---------+---------+
| Weekday | DayType |
+---------+---------+
| 2 | Weekend |
| 0 | Weekday |
| 2 | Weekday |
| 0 | Weekend |
| 10 | Weekday |
+---------+---------+
I don't see any rhyme or reason as to what the numbers are under the 'Weekday' column, I suspect they're being grouped somehow. And of course there are several other columns missing, but since I can't put a large scope in the outer query with this as inner one, I can't figure out how to pull those in. Help is greatly appreciated.
It looks like you want to union all a pair of aggregation queries that use sum() and group by id, time, one for Weekday and one for Weekend:
select Id, DayType = 'Weekend', [time], Tasks=sum(Weekend)
from t
group by id, [time]
union all
select Id, DayType = 'Weekday', [time], Tasks=sum(Weekday)
from t
group by id, [time]
Try this:
select ID, 'Weekday' as DayType, Time, sum(Weekday) as Tasks
from t1
group by ID, Time
union all
select ID, 'Weekend', Time, sum(Weekend)
from t1
group by ID, Time
order by 1, 3, 2
Not tested, but it should do the trick. It may require 2 proc sql steps for the calculation, one for summing and one for the case when statements. If you have extra lines, just use a max statement and group by ID, Time, type_day.
Proc sql; create table want as select ID, Time,
sum(weekday) as weekdayTask,
sum(weekend) as weekendTask,
case when calculated weekdaytask>0 then weekdaytask
when calculated weekendtask>0 then weekendtask else .
end as Task,
case when calculated weekdaytask>0 then "Weekday"
when calculated weekendtask>0 then "Weekend"
end as Day_Type
from have
group by ID, Time
;quit;
Proc sql; create table want2 as select ID, Time, Day_Type, Task
from want
;quit;
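The puzzling numbers in the question's attempt come from plain UNION, which removes duplicate rows, so identical (value, DayType) pairs from different source rows collapse into one. The UNION ALL of two aggregations suggested above can be verified against the sample data; this sketch runs it in SQLite through Python's sqlite3, with column names taken from the question.

```python
import sqlite3


def tasks_by_day_type():
    """Unpivot the Weekday/Weekend columns into rows and sum them per ID and Time."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE t1 (ID INTEGER, Time TEXT, Total INTEGER,"
        " Weekday INTEGER, Weekend INTEGER)"
    )
    conn.executemany(
        "INSERT INTO t1 VALUES (?, ?, ?, ?, ?)",
        [
            (1001, "AM", 5, 5, 0), (1001, "AM", 2, 0, 2), (1001, "AM", 4, 1, 3),
            (1001, "AM", 5, 3, 2), (1001, "PM", 5, 3, 2), (1001, "PM", 5, 5, 0),
            (1002, "PM", 4, 2, 2), (1002, "PM", 3, 3, 0), (1002, "PM", 1, 0, 1),
        ],
    )
    # One aggregation per day type; UNION ALL keeps every row, and the final
    # ORDER BY applies to the combined result.
    query = """
        SELECT ID, 'Weekday' AS DayType, Time, SUM(Weekday) AS Tasks
        FROM t1
        GROUP BY ID, Time
        UNION ALL
        SELECT ID, 'Weekend', Time, SUM(Weekend)
        FROM t1
        GROUP BY ID, Time
        ORDER BY ID, Time, DayType
    """
    rows = conn.execute(query).fetchall()
    conn.close()
    return rows


for row in tasks_by_day_type():
    print(row)
```

The rows come back in the same order as the desired output table in the question.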