How to create a new SQL table with Mean, Median, and Mode? - sql

OK, so I am new to SQL, which is why I am asking this question.
I have got a table called: kpi_notification_metrics_per_month
This table has 2 columns:
Date
NotificationCount
I want to create a brand new table that will show
Mean
Median
Mode
For the NotificationCount column.
Example table:
Date NotificationCount
01/04/2018 00:00 0
31/03/2018 00:00 0
25/03/2018 00:00 0
24/03/2018 00:00 0
22/03/2018 00:00 0
18/03/2018 00:00 0
17/03/2018 00:00 0
14/03/2018 00:00 0
11/03/2018 00:00 0
07/04/2018 00:00 1
26/03/2018 00:00 1
21/03/2018 00:00 1
15/03/2018 00:00 1
13/03/2018 00:00 1
12/03/2018 00:00 1
10/03/2018 00:00 1
08/04/2018 00:00 2
30/03/2018 00:00 2
09/03/2018 00:00 2
08/03/2018 00:00 2
20/03/2018 00:00 3
19/03/2018 00:00 4
02/04/2018 00:00 9
23/03/2018 00:00 11
27/03/2018 00:00 22
03/04/2018 00:00 28
28/03/2018 00:00 34
04/04/2018 00:00 39
05/04/2018 00:00 43
29/03/2018 00:00 47
06/04/2018 00:00 50
16/03/2018 00:00 140
Expected results:
Mean Median Mode
13.90625 1 0

Here is how to do this in Oracle:
select
avg(notificationcount) as statistic_mean,
median(notificationcount) as statistic_median,
stats_mode(notificationcount) as statistic_mode
from mytable;
No need for another table. You can (and should) always query the data ad hoc. For convenience you can create a view as jarlh has suggested in the request comments.

Mean: Use Avg()
Select Avg(NotificationCount)
From kpi_notification_metrics_per_month
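Median: Take the TOP 50 Percent of the rows in ascending order and keep the largest value, take the TOP 50 Percent in descending order and keep the smallest value, then average the two.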
Select ((
Select Top 1 NotificationCount
From (
Select Top 50 Percent NotificationCount
From kpi_notification_metrics_per_month
Where NotificationCount Is NOT NULL
Order By NotificationCount
) As A
Order By NotificationCount DESC) +
(
Select Top 1 NotificationCount
From (
Select Top 50 Percent NotificationCount
From kpi_notification_metrics_per_month
Where NotificationCount Is NOT NULL
Order By NotificationCount DESC
) As A
Order By NotificationCount Asc)) / 2
Mode: Group by value, count the rows in each group, and take the top 1 row (with ties) ordered by that count in DESC order.
SELECT TOP 1 with ties NotificationCount
FROM kpi_notification_metrics_per_month
WHERE NotificationCount IS Not NULL
GROUP BY NotificationCount
ORDER BY COUNT(*) DESC
All of these worked in SQL Server 2014.
Reference: http://blogs.lessthandot.com/index.php/datamgmt/datadesign/calculating-mean-median-and-mode-with-sq/
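As a cross-check of the expected numbers, here is a minimal Python sketch (not part of either answer) that computes the three statistics for the sample data with the standard statistics module. The list counts is simply the NotificationCount column from the question, typed in by hand.

```python
import statistics

# NotificationCount values from the example table in the question (32 rows).
counts = (
    [0] * 9 + [1] * 7 + [2] * 4
    + [3, 4, 9, 11, 22, 28, 34, 39, 43, 47, 50, 140]
)

mean = statistics.mean(counts)      # arithmetic mean
median = statistics.median(counts)  # averages the two middle values for an even row count
mode = statistics.mode(counts)      # most frequent value

print(mean, median, mode)
```

With 32 rows the two middle values are 1 and 2, so the interpolated median is 1.5. If NotificationCount is an integer column, the SQL Server query above would give 1 instead, because / 2 is then integer division (and Avg() on an integer column likewise returns an integer); dividing by 2.0 or casting to a decimal type is one way around that.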

Related

Count median days per ID between one zero and the first transaction after the last zero in a running balance

I have a running balance sheet showing customer balances after inflows and (outflows) by date. It looks something like this:
ID DATE AMOUNT RUNNING AMOUNT
-- ---------------- ------- --------------
10 27/06/2019 14:30 100 100
10 29/06/2019 15:26 -100 0
10 03/07/2019 01:56 83 83
10 04/07/2019 17:53 15 98
10 05/07/2019 15:09 -98 0
10 05/07/2019 15:53 98.98 98.98
10 05/07/2019 19:54 -98.98 0
10 07/07/2019 01:36 90.97 90.97
10 07/07/2019 13:02 -90.97 0
10 07/07/2019 16:32 39.88 39.88
10 08/07/2019 13:41 50 89.88
20 08/01/2019 09:03 890.97 890.97
20 09/01/2019 14:47 -91.09 799.88
20 09/01/2019 14:53 100 899.88
20 09/01/2019 14:59 -399 500.88
20 09/01/2019 18:24 311 811.88
20 09/01/2019 23:25 50 861.88
20 10/01/2019 16:18 -861.88 0
20 12/01/2019 16:46 894.49 894.49
20 25/01/2019 05:40 -871.05 23.44
I have attempted using lag() but I seem not to understand how to use it yet.
SELECT ID, MEDIAN(DIFF) MEDIAN_AGE
FROM
(
SELECT *, DATEDIFF(day, Lag(DATE, 1) OVER(ORDER BY ID), DATE
)AS DIFF
FROM TABLE 1
WHERE RUNNING AMOUNT = 0
)
GROUP BY ID;
The expected result would be:
ID MEDIAN_AGE
-- ----------
10 1
20 2
Please help in writing out the query that gives the expected result.
As already pointed out, you are using syntax that isn't valid for Oracle, including functions that don't exist and column names that aren't allowed.
You seem to want to calculate the number of days between a zero running-amount and the following non-zero running-amount; lead() is probably easier than lag() here, and you can use a case expression to only calculate it when needed:
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table;
ID DATE_ AMOUNT RUNNING_AMOUNT DIFF
---------- -------------------- ---------- -------------- ----------
10 2019-06-27 14:30:00 100 100
10 2019-06-29 15:26:00 -100 0 3.4375
10 2019-07-03 01:56:00 83 83
10 2019-07-04 17:53:00 15 98
10 2019-07-05 15:09:00 -98 0 .0305555556
10 2019-07-05 15:53:00 98.98 98.98
10 2019-07-05 19:54:00 -98.98 0 1.2375
10 2019-07-07 01:36:00 90.97 90.97
10 2019-07-07 13:02:00 -90.97 0 .145833333
10 2019-07-07 16:32:00 39.88 39.88
10 2019-07-08 13:41:00 50 89.88
20 2019-01-08 09:03:00 890.97 890.97
20 2019-01-09 14:47:00 -91.09 799.88
20 2019-01-09 14:53:00 100 899.88
20 2019-01-09 14:59:00 -399 500.88
20 2019-01-09 18:24:00 311 811.88
20 2019-01-09 23:25:00 50 861.88
20 2019-01-10 16:18:00 -861.88 0 2.01944444
20 2019-01-12 16:46:00 894.49 894.49
20 2019-01-25 05:40:00 -871.05 23.44
Then use the median() function, rounding if desired to get your expected result:
select id, median(diff) as median_age, round(median(diff)) as median_age_rounded
from (
select id, date_, amount, running_amount,
case when running_amount = 0 then
lead(date_) over (partition by id order by date_) - date_
end as diff
from your_table
)
group by id;
ID MEDIAN_AGE MEDIAN_AGE_ROUNDED
---------- ---------- ------------------
10 .691666667 1
20 2.01944444 2
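The same logic can be checked outside the database. The following is a rough Python sketch, assuming the sample rows from the question are typed in as (id, timestamp, running amount) tuples; the function name median_days_after_zero is made up for illustration. Pairing each row with the next row for the same id plays the role of lead().

```python
from datetime import datetime
from statistics import median

# (id, timestamp, running amount) rows copied from the question's sample data.
rows = [
    (10, "27/06/2019 14:30", 100), (10, "29/06/2019 15:26", 0),
    (10, "03/07/2019 01:56", 83), (10, "04/07/2019 17:53", 98),
    (10, "05/07/2019 15:09", 0), (10, "05/07/2019 15:53", 98.98),
    (10, "05/07/2019 19:54", 0), (10, "07/07/2019 01:36", 90.97),
    (10, "07/07/2019 13:02", 0), (10, "07/07/2019 16:32", 39.88),
    (10, "08/07/2019 13:41", 89.88),
    (20, "08/01/2019 09:03", 890.97), (20, "09/01/2019 14:47", 799.88),
    (20, "09/01/2019 14:53", 899.88), (20, "09/01/2019 14:59", 500.88),
    (20, "09/01/2019 18:24", 811.88), (20, "09/01/2019 23:25", 861.88),
    (20, "10/01/2019 16:18", 0), (20, "12/01/2019 16:46", 894.49),
    (20, "25/01/2019 05:40", 23.44),
]

def median_days_after_zero(rows):
    """Median gap in days between each zero balance and the next row of the same id."""
    by_id = {}
    for id_, text, running in rows:
        stamp = datetime.strptime(text, "%d/%m/%Y %H:%M")
        by_id.setdefault(id_, []).append((stamp, running))
    result = {}
    for id_, items in by_id.items():
        items.sort()  # order by date within each id
        # Pair every row with its successor; keep the gap only for zero balances.
        gaps = [
            (nxt[0] - cur[0]).total_seconds() / 86400
            for cur, nxt in zip(items, items[1:])
            if cur[1] == 0
        ]
        if gaps:
            result[id_] = median(gaps)
    return result

medians = median_days_after_zero(rows)
print({id_: round(value) for id_, value in medians.items()})
```

A zero balance on the last row of an id has no successor and is skipped here, which corresponds to lead() returning NULL and median() ignoring it.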

Want to get sum till date for past 30 days

Following is the code I wrote to get the records
SELECT run_time, SUM(rec_cnt) reg_cnt FROM(
select run_time,rec_cnt from
(select TO_DATE(TO_CHAR(LST_UPDT_TIME,'DD-MON-YYYY'),'DD-MON-YYYY') run_time,max(Running_Total) rec_cnt from (
SELECT
LST_UPDT_TIME,
(
SELECT COUNT(*)
FROM DM_REG_SMRY T2
WHERE T2.LST_UPDT_TIME <= T1.LST_UPDT_TIME AND REG_STS_ID = 14
) AS Running_Total
FROM
DM_REG_SMRY T1
order by T1.LST_UPDT_TIME
)
group by TO_DATE(TO_CHAR(LST_UPDT_TIME,'DD-MON-YYYY'),'DD-MON-YYYY')
order by TO_DATE(TO_CHAR(LST_UPDT_TIME,'DD-MON-YYYY'),'DD-MON-YYYY')
)
UNION
(SELECT TRUNC(SYSDATE+1 - ROWNUM) run_time , 0 as rec_cnt FROM DUAL CONNECT BY ROWNUM <= 30)
)GROUP BY run_time
ORDER BY run_time;
I got the following output:
18-06-2015 00:00 6
19-06-2015 00:00 7
20-06-2015 00:00 0
21-06-2015 00:00 0
22-06-2015 00:00 0
23-06-2015 00:00 0
24-06-2015 00:00 12
25-06-2015 00:00 0
26-06-2015 00:00 0
27-06-2015 00:00 0
28-06-2015 00:00 0
29-06-2015 00:00 0
30-06-2015 00:00 0
01-07-2015 00:00 0
02-07-2015 00:00 0
03-07-2015 00:00 49
04-07-2015 00:00 0
05-07-2015 00:00 0
06-07-2015 00:00 0
07-07-2015 00:00 0
08-07-2015 00:00 0
09-07-2015 00:00 0
10-07-2015 00:00 49
11-07-2015 00:00 0
12-07-2015 00:00 0
13-07-2015 00:00 65
14-07-2015 00:00 77
15-07-2015 00:00 101
16-07-2015 00:00 0
17-07-2015 00:00 0
But I want the last non-zero value to be repeated in place of each zero.
Please help.
I'm not 100% sure, but if I understand correctly, you want to count the rows in DM_REG_SMRY with REG_STS_ID=14 accumulated over the past 30 days (starting with SYSDATE-(30-1) and ending with today, SYSDATE-(1-1)), and you want that accumulated count for each date.
This means that if you have on DM_REG_SMRY (with REG_STS_ID=14):
6 rows on 18-06-2015
1 row on 19-06-2015
5 rows on 24-06-2015
37 rows on 03-07-2015
16 rows on 13-07-2015
12 rows on 14-07-2015
24 rows on 15-07-2015
you really want this result:
18-06-2015 00:00 6
19-06-2015 00:00 7
20-06-2015 00:00 7
21-06-2015 00:00 7
22-06-2015 00:00 7
23-06-2015 00:00 7
24-06-2015 00:00 12
25-06-2015 00:00 12
26-06-2015 00:00 12
27-06-2015 00:00 12
28-06-2015 00:00 12
29-06-2015 00:00 12
30-06-2015 00:00 12
01-07-2015 00:00 12
02-07-2015 00:00 12
03-07-2015 00:00 49
04-07-2015 00:00 49
05-07-2015 00:00 49
06-07-2015 00:00 49
07-07-2015 00:00 49
08-07-2015 00:00 49
09-07-2015 00:00 49
10-07-2015 00:00 49
11-07-2015 00:00 49
12-07-2015 00:00 49
13-07-2015 00:00 65
14-07-2015 00:00 77
15-07-2015 00:00 101
16-07-2015 00:00 101
17-07-2015 00:00 101
If this is what you really want then a possible solution is:
SELECT runtime,
(SELECT COUNT(*)
FROM DM_REG_SMRY
WHERE LST_UPDT_TIME >= TRUNC(SYSDATE - (30-1))
AND LST_UPDT_TIME < (runtime+1)
AND REG_STS_ID = 14
) reg_cnt
FROM
(
SELECT TRUNC(SYSDATE - (LEVEL - 1)) runtime
FROM DUAL
CONNECT BY LEVEL <= 30
) dates
ORDER BY runtime;
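The idea behind this query (generate every date in the window, then count everything up to and including that date) can be sketched in a few lines of Python. This is only an illustration under assumptions: the per-day counts are the day-to-day increases in the desired output above, and the fixed date end stands in for TRUNC(SYSDATE).

```python
from datetime import date, timedelta
from itertools import accumulate

# Rows per day with REG_STS_ID = 14, taken from the increases in the desired output.
daily_rows = {
    date(2015, 6, 18): 6,
    date(2015, 6, 19): 1,
    date(2015, 6, 24): 5,
    date(2015, 7, 3): 37,
    date(2015, 7, 13): 16,
    date(2015, 7, 14): 12,
    date(2015, 7, 15): 24,
}

end = date(2015, 7, 17)  # assumed stand-in for TRUNC(SYSDATE)
# 30 consecutive days ending at `end`, like the CONNECT BY LEVEL <= 30 subquery.
calendar = [end - timedelta(days=n) for n in range(29, -1, -1)]

# Running total over the calendar: days without rows add 0,
# so the previous total is carried forward instead of dropping to 0.
running = dict(zip(calendar, accumulate(daily_rows.get(d, 0) for d in calendar)))

for day, total in running.items():
    print(day.strftime("%d-%m-%Y"), total)
```

Because the total is computed per calendar day rather than per day that has data, no UNION with zero rows is needed and gaps never show up as 0.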

how to find the date difference in hours between two records with nearest datetime value and it must be compared in same group

How to find the date difference in hours between two records with nearest datetime value and it must be compared in same group?
Sample Data as follows:
Select * from tblGroup
Group FinishedDatetime
1 03-01-2009 00:00
1 13-01-2009 22:00
1 08-01-2009 03:00
2 01-01-2009 10:00
2 13-01-2009 20:00
2 10-01-2009 10:00
3 27-10-2008 00:00
3 29-10-2008 00:00
Expected Output :
Group FinishedDatetime Hours
1 03-01-2009 00:00 123
1 13-01-2009 22:00 139
1 08-01-2009 03:00 117
2 01-01-2009 10:00 216
2 13-01-2009 20:00 82
2 10-01-2009 10:00 82
3 27-10-2008 00:00 48
3 29-10-2008 00:00 48
Try this:
Select t1.[Group], t1.FinishedDatetime, DATEDIFF(HOUR, z.FinishedDatetime, t1.FinishedDatetime) AS Hours
FROM tblGroup t1
OUTER APPLY(SELECT TOP 1 *
FROM tblGroup t2
WHERE t2.[Group] = t1.[Group] AND t2.FinishedDatetime<t1.FinishedDatetime
ORDER BY FinishedDatetime DESC)z
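The OUTER APPLY query compares each row only with the previous row in its group, so the first row of every group gets NULL. For the "nearest record" reading of the question, here is a hedged Python sketch that looks at both the previous and the next row; the function name nearest_gap_hours is hypothetical, and all timestamps are assumed to be in DD-MM-YYYY HH:MM form.

```python
from datetime import datetime

# (group, finished timestamp) rows from the question's sample data.
rows = [
    (1, "03-01-2009 00:00"),
    (1, "13-01-2009 22:00"),
    (1, "08-01-2009 03:00"),
    (2, "01-01-2009 10:00"),
    (2, "13-01-2009 20:00"),
    (2, "10-01-2009 10:00"),
    (3, "27-10-2008 00:00"),
    (3, "29-10-2008 00:00"),
]

def nearest_gap_hours(rows):
    """Map (group, timestamp text) to the hours to the closest other row in that group."""
    by_group = {}
    for group, text in rows:
        stamp = datetime.strptime(text, "%d-%m-%Y %H:%M")
        by_group.setdefault(group, []).append((stamp, text))
    result = {}
    for group, items in by_group.items():
        items.sort()  # chronological order within the group
        for i, (stamp, text) in enumerate(items):
            # In sorted order the nearest row is always an immediate neighbour.
            neighbours = items[max(i - 1, 0):i] + items[i + 1:i + 2]
            gaps = [abs((stamp - other).total_seconds()) / 3600 for other, _ in neighbours]
            result[(group, text)] = min(gaps) if gaps else None
    return result

hours = nearest_gap_hours(rows)
for key, value in hours.items():
    print(key, value)
```

Under this reading the 08-01-2009 03:00 row in group 1 comes out as 123 hours (its distance to 03-01-2009 00:00), so the 117 shown in the question may follow a different rule or be a typo.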

SQL Top 1 query on multiple columns

I have a script which returns the following table. If I put the script in a subquery and give it an alias, what query would return the top row by EVENT_DATE for each CARE_ID? This has to be compatible with SQL Server 2000. Thank you.
CARE_ID EVENT_ID EVENT_TYPE EVENT_DATE
3 18 B 13/07/2010 00:00
78 11 C 27/07/2009 00:00
78 9 T 28/07/2009 00:00
151 49 T 21/03/2010 00:00
217 102 C 30/03/2010 00:00
355 111 C 16/07/2010 00:00
355 56 T 17/07/2010 00:00
364 774 C 23/08/2012 00:00
369 117 C 28/07/2010 00:00
631 74 T 15/01/2010 00:00
631 148 C 02/02/2010 00:00
1066 91 T 15/11/2010 00:00
2123 280 T 10/07/2011 00:00
2265 448 C 31/05/2011 00:00
2512 183 B 04/02/2014 00:00
2691 906 C 12/01/2014 00:00
2694 307 T 15/06/2011 00:00
2694 544 C 02/07/2011 00:00
2892 85 B 19/12/2011 00:00
2892 641 C 13/02/2012 00:00
3038 660 C 09/08/2011 00:00
3162 407 T 15/04/2012 00:00
3178 780 C 01/09/2012 00:00
3311 175 B 27/01/2014 00:00
3344 869 C 01/10/2013 00:00
3426 474 T 13/07/2013 00:00
3606 479 T 03/01/2014 00:00
3770 917 C 11/01/2014 00:00
This is somewhat inefficient, but I see no better way to do it in SQL Server 2000:
select
t1.care_id,
t1.event_id,
t1.event_type,
t1.event_date
from TheTable t1
join TheTable t2
on t1.care_id = t2.care_id
and t1.event_date >= t2.event_date
group by
t1.care_id,
t1.event_id,
t1.event_type,
t1.event_date
having count(*) = 1
As written, the query returns the oldest record per care_id: with >=, a row is matched only by itself when no other row for that care_id has an earlier or equal event_date. If you need the most recent record, change the >= to <=.
SQLFiddle: http://www.sqlfiddle.com/#!3/98536/6
A potential issue with the query above is that if two records share the same boundary event_date (the earliest with >=, the latest with <=), it will return no row for that care_id. Let me know if such cases are possible in your data set.
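To see which row the self-join keeps, here is a small sketch that runs the same query through Python's sqlite3 module on a few rows from the question. SQLite is an assumption made purely for illustration (the answer targets SQL Server 2000), and the dates are rewritten in ISO form so that they compare correctly as text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table TheTable (care_id int, event_id int, event_type text, event_date text)"
)
conn.executemany(
    "insert into TheTable values (?, ?, ?, ?)",
    [
        (3, 18, "B", "2010-07-13"),
        (78, 11, "C", "2009-07-27"),
        (78, 9, "T", "2009-07-28"),
        (355, 111, "C", "2010-07-16"),
        (355, 56, "T", "2010-07-17"),
    ],
)

# Same shape as the query above; {op} is the comparison being tested.
query = """
select t1.care_id, t1.event_id, t1.event_type, t1.event_date
from TheTable t1
join TheTable t2
  on t1.care_id = t2.care_id
 and t1.event_date {op} t2.event_date
group by t1.care_id, t1.event_id, t1.event_type, t1.event_date
having count(*) = 1
order by t1.care_id
"""

with_ge = conn.execute(query.format(op=">=")).fetchall()  # rows kept with >=
with_le = conn.execute(query.format(op="<=")).fetchall()  # rows kept with <=

print(with_ge)
print(with_le)
```

With >= the only row whose join produces exactly one match is the one with the earliest event_date for its care_id; with <= it is the one with the latest.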
Try this, assuming the earliest date is the top row:
select x.care_id,min(x.event_date) as FirstDate
from <table> x
group by x.care_id
To get all the columns, you need to join that result back to the table:
select x.care_id,a.event_id,a.event_type,x.firstDate as Event_date
from <table> a
join (select b.care_id,min(b.event_date) as FirstDate
from <table> b
group by b.care_id ) x
on a.care_id=x.care_id and a.event_date=x.firstDate
I just typed this in on the fly, but it should get you what you need.
Caveat: if a care_id has several rows with the same earliest event date, you will get duplicate rows for it.

SQL Server 2000 - Row of data based on closest date

I have two tables as below and I want to return the rows for CARE_ID and WHO_STATUS where the MDT_DATE is the closest date that is <= the earliest SURGERY_DATE for each CARE_ID.
For instance for CARE_ID 5 the closest MDT_DATE which is <= the earliest SURGERY_DATE of 18/07/2009 is 17/07/2009 so the WHO_STATUS would be 2, and so on.
The script below works fine in SQL Server 2005 but it isn't backwards compatible with SQL Server 2000.
How could I rework this script so it will run in SQL Server 2000?
CARE_ID SURGERY_DATE
5 18/07/2009 00:00
5 23/07/2009 00:00
5 23/07/2009 00:00
5 23/07/2009 00:00
5 01/09/2009 00:00
5 03/09/2009 00:00
70 20/07/2009 00:00
70 21/07/2009 00:00
76 03/03/2010 00:00
78 08/07/2009 00:00
81 27/07/2009 00:00
82 27/07/2009 00:00
83 30/07/2009 00:00
86 29/07/2009 00:00
91 30/07/2009 00:00
103 03/08/2009 00:00
106 05/08/2009 00:00
125 07/08/2009 00:00
172 19/05/2010 00:00
CARE_ID MDT_DATE WHO_STATUS
5 17/07/2009 00:00 2
5 03/11/2009 00:00 1
70 23/03/2010 00:00 0
81 03/11/2009 00:00 1
81 18/11/2009 00:00 1
81 27/11/2009 00:00 3
81 27/03/2010 00:00 1
103 03/12/2008 00:00 0
103 04/01/2009 00:00 2
103 06/01/2010 00:00 1
103 08/02/2010 00:00 1
103 14/01/2013 00:00 1
172 20/07/2009 00:00 4
172 08/01/2010 00:00 3
172 25/09/2010 00:00 1
The query (working in SQL Server 2005):
SELECT t1.*,t2.WHO_STATUS
FROM (SELECT ROW_NUMBER() OVER (PARTITION BY CARE_ID ORDER BY SURGERY_DATE) AS Seq,*
FROM Table1)t1
CROSS APPLY(SELECT TOP 1 WHO_STATUS FROM Table2
WHERE CARE_ID = t1.CARE_ID
AND MDT_DATE <= t1.SURGERY_DATE
ORDER BY MDT_DATE DESC)t2
WHERE t1.Seq=1
You can use a correlated subquery for this:
select t1.*,
(select top 1 who_status
from table2 t2
where t2.care_id = t1.care_id and
t2.mdt_date <= t1.surgery_date
order by t2.mdt_date desc
) as who_status
from Table1 t1;
This will also work in SQL Server 2005.
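To check the expected statuses by hand, here is a rough Python sketch over a subset of the two tables. The function name status_at_first_surgery is made up for illustration. It keeps only the earliest surgery per CARE_ID, as the SQL Server 2005 query does with Seq = 1, whereas the correlated subquery as written returns a status for every row of Table1.

```python
from datetime import date

surgeries = [  # (care_id, surgery_date), a subset of the first table
    (5, date(2009, 7, 18)), (5, date(2009, 7, 23)), (5, date(2009, 9, 1)),
    (81, date(2009, 7, 27)),
    (103, date(2009, 8, 3)),
    (172, date(2010, 5, 19)),
]
mdts = [  # (care_id, mdt_date, who_status), a subset of the second table
    (5, date(2009, 7, 17), 2), (5, date(2009, 11, 3), 1),
    (81, date(2009, 11, 3), 1),
    (103, date(2008, 12, 3), 0), (103, date(2009, 1, 4), 2), (103, date(2010, 1, 6), 1),
    (172, date(2009, 7, 20), 4), (172, date(2010, 1, 8), 3), (172, date(2010, 9, 25), 1),
]

def status_at_first_surgery(surgeries, mdts):
    """WHO_STATUS of the latest MDT on or before each care_id's earliest surgery."""
    first = {}
    for care_id, day in surgeries:
        first[care_id] = min(day, first.get(care_id, day))
    result = {}
    for care_id, surgery in first.items():
        candidates = [
            (day, status)
            for cid, day, status in mdts
            if cid == care_id and day <= surgery
        ]
        # max() picks the latest qualifying date; None when no MDT qualifies.
        result[care_id] = max(candidates)[1] if candidates else None
    return result

statuses = status_at_first_surgery(surgeries, mdts)
print(statuses)
```

CARE_ID 81 has no MDT_DATE on or before its first surgery, so it maps to None here; the correlated subquery would return NULL for it in the same way.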