How do you get the average of sums in SQL (multi-level aggregation)?

I have a simplified table xx as follows:
rdate date
rtime time
rid integer
rsub integer
rval integer
primary key on (rdate,rtime,rid,rsub)
and I want to get, for each date and time, the average (across all ids) of the sums (across all subs) of the values.
By way of a sample table, I have (with consecutive identical values blanked out for readability):
rdate       rtime     rid  rsub  rval
----------  --------  ---  ----  ----
2010-01-01  00.00.00    1     1    10
                              2    20
                        2     1    30
                              2    40
            01.00.00    1     1    50
                              2    60
                        2     1    70
                              2    80
            02.00.00    1     1    90
                              2   100
2010-01-02  00.00.00    1     1   999
I can get the sums I want with:
select rdate,rtime,rid, sum(rval) as rsum
from xx
where rdate = '2010-01-01'
group by rdate,rtime,rid
which gives me:
rdate       rtime     rid  rsum
----------  --------  ---  ----
2010-01-01  00.00.00    1    30  (10+20)
                        2    70  (30+40)
            01.00.00    1   110  (50+60)
                        2   150  (70+80)
            02.00.00    1   190  (90+100)
as expected.
Now what I want is a query that will also average those sums across the ids for each time, giving me:
rdate       rtime     ravgsum
----------  --------  -------
2010-01-01  00.00.00       50  ((30+70)/2)
            01.00.00      130  ((110+150)/2)
            02.00.00      190  ((190)/1)
I'm using DB2 for z/OS but I'd prefer standard SQL if possible.

select rdate,rtime,avg(rsum) as ravgsum from (
select rdate,rtime,rid, sum(rval) as rsum
from xx
where rdate = '2010-01-01'
group by rdate,rtime,rid
) as subq
group by rdate,rtime
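To check the nested aggregation end to end, here is a minimal sketch that runs the same query against the sample rows. It assumes Python's built-in sqlite3 module as a stand-in for DB2 for z/OS and stores the dates and times as text. SQLite's AVG always returns a floating-point value, so the averages come back as 50.0, 130.0 and 190.0.

```python
import sqlite3

# Illustration only: an in-memory SQLite database stands in for the DB2 table xx.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE xx (rdate TEXT, rtime TEXT, rid INTEGER, rsub INTEGER,"
    " rval INTEGER, PRIMARY KEY (rdate, rtime, rid, rsub))"
)
conn.executemany(
    "INSERT INTO xx VALUES (?, ?, ?, ?, ?)",
    [
        ("2010-01-01", "00.00.00", 1, 1, 10), ("2010-01-01", "00.00.00", 1, 2, 20),
        ("2010-01-01", "00.00.00", 2, 1, 30), ("2010-01-01", "00.00.00", 2, 2, 40),
        ("2010-01-01", "01.00.00", 1, 1, 50), ("2010-01-01", "01.00.00", 1, 2, 60),
        ("2010-01-01", "01.00.00", 2, 1, 70), ("2010-01-01", "01.00.00", 2, 2, 80),
        ("2010-01-01", "02.00.00", 1, 1, 90), ("2010-01-01", "02.00.00", 1, 2, 100),
        ("2010-01-02", "00.00.00", 1, 1, 999),
    ],
)

# Inner level: SUM over rsub per (rdate, rtime, rid).
# Outer level: AVG of those sums over rid per (rdate, rtime).
query = """
SELECT rdate, rtime, AVG(rsum) AS ravgsum
FROM (
    SELECT rdate, rtime, rid, SUM(rval) AS rsum
    FROM xx
    WHERE rdate = '2010-01-01'
    GROUP BY rdate, rtime, rid
) AS subq
GROUP BY rdate, rtime
ORDER BY rdate, rtime
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```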

How about
select rdate,rtime, sum(rsum) / count(rsum) as sumavg
from
(select rdate, rtime, rid, sum(rval) as rsum
from xx
where rdate = '2010-01-01'
group by rdate,rtime,rid) as subq
group by rdate,rtime

Related

t-sql to summarize range of dates from flat list of dates, grouped by other columns

Suppose I had the following table:
UserId AttributeId DateStart
1 3 1/1/2020
1 4 1/9/2020
1 3 2/2/2020
2 3 3/5/2020
2 3 4/1/2020
2 3 5/1/2020
For each unique UserId/AttributeId pair, it is assumed that the DateEnd is the day prior to the next DateStart for that pair, otherwise it is null (or some default like crazy far into the future - 12/31/3000).
Applying this operation to the above table would yield:
UserId AttributeId DateStart DateEnd
1 3 1/1/2020 2/1/2020
1 4 1/9/2020 <null>
1 3 2/2/2020 <null>
2 3 3/5/2020 3/31/2020
2 3 4/1/2020 4/30/2020
2 3 5/1/2020 <null>
What T-SQL, executing in SQL Server 2008 R2, would accomplish this?
(I have changed the query.)
Try this please:
SELECT UserId, AttributeId, DateStart, MIN(DateEnd) AS DateEnd
FROM
(
    SELECT X.UserId, X.AttributeId, X.DateStart,
           DATEADD(DD, -1, Y.DateStart) AS DateEnd
    FROM TAB X
    LEFT JOIN TAB Y
      ON X.UserId = Y.UserId
     AND X.AttributeId = Y.AttributeId
     AND X.DateStart < Y.DateStart
) T
GROUP BY UserId, AttributeId, DateStart
ORDER BY DateStart
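The self-join can be tried out with the minimal sketch below. It assumes SQLite (via Python's sqlite3 module) in place of SQL Server, a table named tab, and ISO-8601 date strings, so SQLite's date(..., '-1 day') stands in for DATEADD(DD, -1, ...).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (UserId INTEGER, AttributeId INTEGER, DateStart TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, 3, "2020-01-01"), (1, 4, "2020-01-09"), (1, 3, "2020-02-02"),
     (2, 3, "2020-03-05"), (2, 3, "2020-04-01"), (2, 3, "2020-05-01")],
)

# Join each row to every later row of the same (UserId, AttributeId) pair.
# The earliest later start, minus one day, is the end date; rows with no
# later start keep a NULL end date because of the LEFT JOIN.
query = """
SELECT UserId, AttributeId, DateStart, MIN(DateEnd) AS DateEnd
FROM (
    SELECT x.UserId, x.AttributeId, x.DateStart,
           date(y.DateStart, '-1 day') AS DateEnd
    FROM tab x
    LEFT JOIN tab y
      ON x.UserId = y.UserId
     AND x.AttributeId = y.AttributeId
     AND x.DateStart < y.DateStart
) AS t
GROUP BY UserId, AttributeId, DateStart
ORDER BY UserId, DateStart
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```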
You are describing lead(), which is available from SQL Server 2012 onward (not in 2008 R2):
select t.*,
dateadd(day, -1, lead(dateStart) over (partition by userId, attributeId order by dateStart)) as dateEnd
from t;
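Here is a minimal sketch of the lead() version under the same assumptions as a SQLite stand-in: Python's sqlite3 module, a table named tab, ISO-8601 dates, and date(..., '-1 day') in place of dateadd. SQLite needs version 3.25 or later for window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (UserId INTEGER, AttributeId INTEGER, DateStart TEXT)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?, ?)",
    [(1, 3, "2020-01-01"), (1, 4, "2020-01-09"), (1, 3, "2020-02-02"),
     (2, 3, "2020-03-05"), (2, 3, "2020-04-01"), (2, 3, "2020-05-01")],
)

# LEAD fetches the next DateStart within the same (UserId, AttributeId) pair;
# subtracting one day gives DateEnd. The last row of each pair gets NULL.
query = """
SELECT UserId, AttributeId, DateStart,
       date(LEAD(DateStart) OVER (PARTITION BY UserId, AttributeId
                                  ORDER BY DateStart), '-1 day') AS DateEnd
FROM tab
ORDER BY UserId, DateStart
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```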

How to get latest records based on two columns of max

I have a table called Inventory with the below columns
item warehouse date sequence number value
111 100 2019-09-25 12:29:41.000 1 10
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 1 5
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-19 12:05:23.000 1 4
333 300 2020-01-20 12:05:23.000 1 5
Expected Output:
item warehouse date sequence number value
111 100 2019-09-26 12:29:41.000 1 20
222 200 2019-09-21 16:07:10.000 2 10
333 300 2020-01-20 12:05:23.000 1 5
For each item and warehouse, I need to pick the value with the latest date and, for that date, the latest sequence number.
I tried the code below:
select item,warehouse,sequencenumber,sum(value),max(date) as date1
from Inventory t1
where
t1.date IN (select max(date) from Inventory t2
where t1.warehouse=t2.warehouse
and t1.item = t2.item
group by t2.item,t2.warehouse)
group by t1.item,t1.warehouse,t1.sequencenumber
It's working for the latest date but not for the latest sequence number.
Can you please suggest how to write a query that gives my expected output?
You can use row_number() for this:
select *
from (
select
t.*,
row_number() over(
partition by item, warehouse
order by date desc, sequence_number desc, value desc
) rn
from mytable t
) t
where rn = 1
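A minimal sketch of the row_number() answer against the sample data, assuming SQLite 3.25+ through Python's sqlite3 module, a table named inventory, and timestamps stored as ISO-8601 text so that they sort chronologically. The date column is quoted defensively because date is also a function name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE inventory (item INTEGER, warehouse INTEGER, "date" TEXT,'
    " sequence_number INTEGER, value INTEGER)"
)
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?, ?, ?, ?)",
    [
        (111, 100, "2019-09-25 12:29:41.000", 1, 10),
        (111, 100, "2019-09-26 12:29:41.000", 1, 20),
        (222, 200, "2019-09-21 16:07:10.000", 1, 5),
        (222, 200, "2019-09-21 16:07:10.000", 2, 10),
        (333, 300, "2020-01-19 12:05:23.000", 1, 4),
        (333, 300, "2020-01-20 12:05:23.000", 1, 5),
    ],
)

query = """
SELECT item, warehouse, "date", sequence_number, value
FROM (
    SELECT t.*,
           -- Rank rows within each (item, warehouse): newest date first,
           -- then highest sequence number, then highest value as a tie-breaker.
           ROW_NUMBER() OVER (
               PARTITION BY item, warehouse
               ORDER BY "date" DESC, sequence_number DESC, value DESC
           ) AS rn
    FROM inventory t
) AS ranked
WHERE rn = 1
ORDER BY item
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```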

SQL : Sum by criteria

I'm working with Oracle and cannot achieve the query I need for the moment.
Suppose I have the following table :
- ID Date Type Value
- 1 01/12/2016 prod 1
- 2 01/01/2017 test 10
- 3 01/06/2017 test 20
- 4 01/12/2017 prod 30
- 5 15/12/2017 test 40
- 6 01/01/2018 test 50
- 7 01/06/2018 test 60
- 8 01/12/2018 prod 70
I need to sum the values between the "prod" rows, plus the value of the closing "prod" row.
The results should be :
- 1 01/12/2016 - 1
- 2 01/01/2017 - 60
- 3 01/06/2017 - 60
- 4 01/12/2017 - 60
- 5 15/12/2017 - 220
- 6 01/01/2018 - 220
- 7 01/06/2018 - 220
- 8 01/12/2018 - 220
I first had to sum the values by year without taking the types into account.
The requirement changed, and I don't see how to identify, for each row, the previous "prod" date and then sum each value up to and including the next "prod" row.
Thanks
You can define the groups using a cumulative sum on type = 'prod', in reverse, then use a window function for the final summation:
select t.*,
sum(value) over (partition by grp) as total
from (select t.*,
sum(case when type = 'prod' then 1 else 0 end) over (order by id desc) as grp
from t
) t
order by id;
To see the grouping logic, look at:
ID Date Type Value Grp
1 01/12/2016 prod 1 3
2 01/01/2017 test 10 2
3 01/06/2017 test 20 2
4 01/12/2017 prod 30 2
5 15/12/2017 test 40 1
6 01/01/2018 test 50 1
7 01/06/2018 test 60 1
8 01/12/2018 prod 70 1
This identifies the groups that need to be summed. The DESC is because "prod" ends a group. If "prod" started a group (i.e. was included with the sum on the next row), then ASC would be used.
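The grouping logic can be checked with a minimal sketch like the following. It assumes SQLite 3.25+ through Python's sqlite3 module instead of Oracle, omits the date column, and uses the lowercase 'prod' values from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, type TEXT, value INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?)",
    [(1, "prod", 1), (2, "test", 10), (3, "test", 20), (4, "prod", 30),
     (5, "test", 40), (6, "test", 50), (7, "test", 60), (8, "prod", 70)],
)

query = """
SELECT id, type, value, grp,
       SUM(value) OVER (PARTITION BY grp) AS total
FROM (
    -- Running count of 'prod' rows from the highest id down: in reverse
    -- order each 'prod' row opens a new group, so in id order it closes one.
    SELECT t.*,
           SUM(CASE WHEN type = 'prod' THEN 1 ELSE 0 END) OVER (ORDER BY id DESC) AS grp
    FROM t
) AS g
ORDER BY id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```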
Gordon Linoff's answer is great.
The query below is just a different flavor, for Oracle 12c and later (it uses MATCH_RECOGNIZE):
Setup:
ALTER SESSION SET NLS_DATE_FORMAT = 'DD/MM/YYYY';
CREATE TABLE TEST_TABLE(
THE_ID INTEGER,
THE_DATE DATE,
THE_TYPE CHAR(4),
THE_VALUE INTEGER);
INSERT INTO TEST_TABLE VALUES (1,TO_DATE('01/12/2016'),'prod',1);
INSERT INTO TEST_TABLE VALUES (2,TO_DATE('01/01/2017'),'test',10);
INSERT INTO TEST_TABLE VALUES (3,TO_DATE('01/06/2017'),'test',20);
INSERT INTO TEST_TABLE VALUES (4,TO_DATE('01/12/2017'),'prod',30);
INSERT INTO TEST_TABLE VALUES (5,TO_DATE('15/12/2017'),'test',40);
INSERT INTO TEST_TABLE VALUES (6,TO_DATE('01/01/2018'),'test',50);
INSERT INTO TEST_TABLE VALUES (7,TO_DATE('01/06/2018'),'test',60);
INSERT INTO TEST_TABLE VALUES (8,TO_DATE('01/12/2018'),'prod',70);
COMMIT;
Query:
SELECT
THE_ID, THE_DATE, MAX(RUNNING_GROUP_SUM) OVER (PARTITION BY THE_MATCH_NUMBER) AS GROUP_SUM
FROM TEST_TABLE
MATCH_RECOGNIZE (
ORDER BY THE_ID
MEASURES
MATCH_NUMBER() AS THE_MATCH_NUMBER,
RUNNING SUM(THE_VALUE) AS RUNNING_GROUP_SUM
ALL ROWS PER MATCH
AFTER MATCH SKIP PAST LAST ROW
PATTERN (TEST_TARGET{0,} PROD_TARGET)
DEFINE TEST_TARGET AS THE_TYPE = 'test',
PROD_TARGET AS THE_TYPE = 'prod')
ORDER BY THE_ID ASC;
Result:
THE_ID THE_DATE GROUP_SUM
---------- ---------- ----------
1 01/12/2016 1
2 01/01/2017 60
3 01/06/2017 60
4 01/12/2017 60
5 15/12/2017 220
6 01/01/2018 220
7 01/06/2018 220
8 01/12/2018 220

SQL difference in counter between two dates

I have a table like the below:
ID MachineID Customer TimeStamp Counter type
1 A ABC 2017-10-25 3:08PM 1952 1
2 A ABC 2017-10-25 3:00PM 1940 1
3 A ABC 2017-10-25 12:05PM 1920 1
4 A ABC 2017-10-25 9:00AM 1900 1
5 B BCD 2017-10-25 3:11PM 1452 1
6 B BCD 2017-10-25 3:10PM 1440 1
7 B BCD 2017-10-25 12:15PM 1420 1
8 B BCD 2017-10-25 9:30AM 1400 1
9 A ABC 2017-10-23 3:08PM 1900 1
10 A ABC 2017-10-23 3:00PM 1840 1
11 A ABC 2017-10-23 12:05PM 1820 1
12 A ABC 2017-10-23 9:00AM 1800 1
13 B BCD 2017-10-23 3:11PM 1399 1
14 B BCD 2017-10-23 3:10PM 1340 1
15 B BCD 2017-10-23 12:15PM 1320 1
16 B BCD 2017-10-23 9:30AM 1300 1
The counter value increases whenever there is a click. I am trying to calculate the number of clicks for each day by taking the maximum counter value at the end of the day and subtracting the previous day's maximum counter value, and so on.
How do I do this in SQL Server? I have to repeat this for each customer and machine.
Try this. I am using the LAG function to achieve this. You can use a WHERE clause to filter for the specific date you want:
Create table #counter(ID int, timeStamp datetime, Counter int, type int)
insert into #counter values
(1, '20171024 3:08PM' ,1952, 1),
(1, '20171025 3:00PM' ,1964, 1)
Select iq.*, (iq."counter" - iq.yesterday_counter) as today_count
from
(select id,
cast("timestamp" as date) as today_date,
"counter",
LAG("counter") over (order by cast("timestamp" as date)) yesterday_counter
from #counter
) iq
output:
id today_date counter yesterday_counter today_count
----------- ---------- ----------- ----------------- -----------
1 2017-10-24 1952 NULL NULL
1 2017-10-25 1964 1952 12
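The LAG query above works on a single counter series with one row per day. Since the question asks for the difference between daily maximums for each machine and customer, here is a minimal sketch that combines a per-day MAX(Counter) with LAG partitioned by MachineID and Customer. It assumes SQLite 3.25+ via Python's sqlite3 module, a hypothetical table named counters without the ID and type columns, and 24-hour ISO-8601 timestamps. "Previous day" here means the previous day that has readings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (MachineID TEXT, Customer TEXT, TimeStamp TEXT, Counter INTEGER)")
conn.executemany(
    "INSERT INTO counters VALUES (?, ?, ?, ?)",
    [
        ("A", "ABC", "2017-10-25 15:08", 1952), ("A", "ABC", "2017-10-25 15:00", 1940),
        ("A", "ABC", "2017-10-25 12:05", 1920), ("A", "ABC", "2017-10-25 09:00", 1900),
        ("B", "BCD", "2017-10-25 15:11", 1452), ("B", "BCD", "2017-10-25 15:10", 1440),
        ("B", "BCD", "2017-10-25 12:15", 1420), ("B", "BCD", "2017-10-25 09:30", 1400),
        ("A", "ABC", "2017-10-23 15:08", 1900), ("A", "ABC", "2017-10-23 15:00", 1840),
        ("A", "ABC", "2017-10-23 12:05", 1820), ("A", "ABC", "2017-10-23 09:00", 1800),
        ("B", "BCD", "2017-10-23 15:11", 1399), ("B", "BCD", "2017-10-23 15:10", 1340),
        ("B", "BCD", "2017-10-23 12:15", 1320), ("B", "BCD", "2017-10-23 09:30", 1300),
    ],
)

query = """
WITH daily AS (
    -- End-of-day reading: the largest counter per machine, customer and day.
    SELECT MachineID, Customer, date(TimeStamp) AS day, MAX(Counter) AS max_counter
    FROM counters
    GROUP BY MachineID, Customer, date(TimeStamp)
)
SELECT MachineID, Customer, day, max_counter,
       -- Subtract the previous day's reading for the same machine and customer.
       max_counter - LAG(max_counter) OVER (
           PARTITION BY MachineID, Customer ORDER BY day
       ) AS clicks
FROM daily
ORDER BY MachineID, Customer, day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```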
A SQL query to get the max counter for each day is:
SELECT CAST(timeStamp as date) AS [dateval]
,MAX(Counter) AS [maxCounter]
FROM YOURDATASET
GROUP BY CAST(timeStamp as date)
This converts the datetime to a date, cutting off the time, and then takes MAX(Counter).
One method to get the difference is to save the result in a temporary structure (a table variable below), then query it to get the difference.
The question is whether your previous date is always exactly the previous day, or whether you skip days between counts, take the weekend off, etc. In the latter case, you have to select the greatest date earlier than the date being examined.
For example:
DECLARE @temp TABLE (dateval date, maxCounter int)

INSERT INTO @temp (dateval, maxCounter)
SELECT CAST(timeStamp AS date) AS [dateval]
      ,MAX(Counter)
FROM YOURDATASET
GROUP BY CAST(timeStamp AS date)

-- Subtract the max counter of the latest earlier date from each day's max counter
SELECT T1.dateval
      ,T1.maxCounter
       -
       (SELECT T2.maxCounter
        FROM @temp T2
        WHERE T2.dateval = (SELECT MAX(T3.dateval)
                            FROM @temp T3
                            WHERE T3.dateval < T1.dateval)
       ) AS [Difference]
FROM @temp T1
ORDER BY T1.dateval
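The same previous-date lookup can also be sketched without a table variable or window functions, using a CTE and correlated subqueries, and partitioned by machine and customer as the question requires. This is a minimal sketch assuming SQLite via Python's sqlite3 module and a hypothetical counters table with ISO-8601 timestamps (ID and type columns omitted).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE counters (MachineID TEXT, Customer TEXT, TimeStamp TEXT, Counter INTEGER)")
conn.executemany(
    "INSERT INTO counters VALUES (?, ?, ?, ?)",
    [
        ("A", "ABC", "2017-10-25 15:08", 1952), ("A", "ABC", "2017-10-25 15:00", 1940),
        ("A", "ABC", "2017-10-25 12:05", 1920), ("A", "ABC", "2017-10-25 09:00", 1900),
        ("B", "BCD", "2017-10-25 15:11", 1452), ("B", "BCD", "2017-10-25 15:10", 1440),
        ("B", "BCD", "2017-10-25 12:15", 1420), ("B", "BCD", "2017-10-25 09:30", 1400),
        ("A", "ABC", "2017-10-23 15:08", 1900), ("A", "ABC", "2017-10-23 15:00", 1840),
        ("A", "ABC", "2017-10-23 12:05", 1820), ("A", "ABC", "2017-10-23 09:00", 1800),
        ("B", "BCD", "2017-10-23 15:11", 1399), ("B", "BCD", "2017-10-23 15:10", 1340),
        ("B", "BCD", "2017-10-23 12:15", 1320), ("B", "BCD", "2017-10-23 09:30", 1300),
    ],
)

query = """
WITH daily AS (
    SELECT MachineID, Customer, date(TimeStamp) AS day, MAX(Counter) AS max_counter
    FROM counters
    GROUP BY MachineID, Customer, date(TimeStamp)
)
SELECT d1.MachineID, d1.Customer, d1.day,
       -- Look up the max counter of the latest earlier day for the same
       -- machine and customer; NULL when there is no earlier day.
       d1.max_counter - (
           SELECT d2.max_counter
           FROM daily d2
           WHERE d2.MachineID = d1.MachineID
             AND d2.Customer = d1.Customer
             AND d2.day = (SELECT MAX(d3.day)
                           FROM daily d3
                           WHERE d3.MachineID = d1.MachineID
                             AND d3.Customer = d1.Customer
                             AND d3.day < d1.day)
       ) AS difference
FROM daily d1
ORDER BY d1.MachineID, d1.Customer, d1.day
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```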

How to filter first appearance in table only

Here is the table structure:
tblApplicants:
applicantID (index) | ApplyingForYear (nvarchar)
------------------------------------------------------
1 2013/14
11 2013/14
13 2013/14
12 2013/14
15 2013/14
21 2012/13
tblApplicantSchools_shadow:
id (index) | applicantID | updated (datetime) | statusID (int) | schoolID (int)
-----------------------------------------------------------------------------------------------------
1 11 2012-09-24 00:00:00.000 3 2
1 13 2012-10-24 00:00:00.000 4 2
2 15 2012-11-24 00:00:00.000 3 4
3 13 2012-03-24 00:00:00.000 4 3
4 12 2012-09-24 00:00:00.000 4 1
5 21 2012-11-03 00:00:00.000 5 2
6 11 2012-09-04 00:00:00.000 4 4
What I need to do is:
- get all applicants that have an ApplyingForYear of '2013/14' in tblApplicants
- and that have a statusID of 4 in tblApplicantSchools_shadow
- count each applicant only once, even if they appear two or more times in tblApplicantSchools_shadow
- group the number of distinct applicants (as per the above) by the updated date column, by week
So, based on the sample data above, 3 rows should come out, because ApplicantID 13 appears twice and I only want to count him once.
This is how the result should look:
Datesubmitted TotalAppsPerWeek
-------------------------------------------------------
2012-10-24 00:00:00.000 1
2012-09-24 00:00:00.000 1
2012-09-04 00:00:00.000 1
This is what I have so far, but it results in 4 rows, not 3 :(
select
DATEADD(ww,(DATEDIFF(ww,0,[tblApplicantSchools_shadow].updated)),0) AS Datesubmitted,
count(DISTINCT [tblApplicantSchools_shadow].applicantID) as TotalAppsPerWeek
FROM tblApplicants
INNER JOIN tblApplicantSchools_shadow
ON tblApplicantS.ApplicantID = tblApplicantSchools_shadow.applicantID
WHERE
ApplyingForYear = '2013/14'
AND [tblApplicantSchools_shadow].statusID = 4
GROUP BY
DATEADD(ww, (DATEDIFF(ww, 0, [tblApplicantSchools_shadow].updated)), 0)
And here is a Fiddle: http://sqlfiddle.com/#!3/3aa61/42
From your title, I'm assuming the one row you want from each applicant is the one with the smallest id. You can select one row per applicant ID with the ROW_NUMBER() function:
;with latestApplication AS
(
SELECT DATEADD(ww,(DATEDIFF(ww,0,[tblApplicantSchools_shadow].updated)),0)
AS Datesubmitted,
[tblApplicantSchools_shadow].applicantID,
ROW_NUMBER() OVER (PARTITION BY [tblApplicantSchools_shadow].applicantID
ORDER BY [tblApplicantSchools_shadow].id)
AS rn
FROM tblApplicants
INNER JOIN tblApplicantSchools_shadow
ON tblApplicantS.ApplicantID = tblApplicantSchools_shadow.applicantID
WHERE ApplyingForYear = '2013/14'
AND [tblApplicantSchools_shadow].statusID = 4
)
select Datesubmitted, COUNT(1) AS TotalAppsPerWeek
FROM latestApplication
WHERE rn = 1
group by Datesubmitted
order by Datesubmitted DESC
http://sqlfiddle.com/#!3/3aa61/57
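A minimal sketch of the ROW_NUMBER() approach against the sample data, assuming SQLite 3.25+ via Python's sqlite3 module. For simplicity it groups by calendar date (date(updated)) instead of SQL Server's DATEADD/DATEDIFF week bucket, and it names the count TotalApps.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tblApplicants (applicantID INTEGER, ApplyingForYear TEXT)")
conn.execute(
    "CREATE TABLE tblApplicantSchools_shadow (id INTEGER, applicantID INTEGER,"
    " updated TEXT, statusID INTEGER, schoolID INTEGER)"
)
conn.executemany(
    "INSERT INTO tblApplicants VALUES (?, ?)",
    [(1, "2013/14"), (11, "2013/14"), (13, "2013/14"),
     (12, "2013/14"), (15, "2013/14"), (21, "2012/13")],
)
conn.executemany(
    "INSERT INTO tblApplicantSchools_shadow VALUES (?, ?, ?, ?, ?)",
    [
        (1, 11, "2012-09-24 00:00:00.000", 3, 2),
        (1, 13, "2012-10-24 00:00:00.000", 4, 2),
        (2, 15, "2012-11-24 00:00:00.000", 3, 4),
        (3, 13, "2012-03-24 00:00:00.000", 4, 3),
        (4, 12, "2012-09-24 00:00:00.000", 4, 1),
        (5, 21, "2012-11-03 00:00:00.000", 5, 2),
        (6, 11, "2012-09-04 00:00:00.000", 4, 4),
    ],
)

query = """
WITH firstApplication AS (
    SELECT date(s.updated) AS Datesubmitted,
           s.applicantID,
           -- Number each applicant's qualifying rows by id; rn = 1 is the lowest id.
           ROW_NUMBER() OVER (PARTITION BY s.applicantID ORDER BY s.id) AS rn
    FROM tblApplicants a
    JOIN tblApplicantSchools_shadow s ON a.applicantID = s.applicantID
    WHERE a.ApplyingForYear = '2013/14'
      AND s.statusID = 4
)
SELECT Datesubmitted, COUNT(*) AS TotalApps
FROM firstApplication
WHERE rn = 1
GROUP BY Datesubmitted
ORDER BY Datesubmitted DESC
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```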