Summarising (permanently) data in a SQL table - sql

Greetings, Stackers.
I have a huge number of data-points in a SQL table, and I want to summarise them in a way reminiscent of RRD.
Assuming a table such as
ID | ENTITY_ID | SCORE_DATE | SCORE | SOME_OTHER_DATA
----+-----------+------------+-------+-----------------
1 | A00000001 | 01/01/2010 | 100 | some data
2 | A00000002 | 01/01/2010 | 105 | more data
3 | A00000003 | 01/01/2010 | 104 | various text
... | ......... | .......... | ..... | ...
... | A00009999 | 01/01/2010 | 101 |
... | A00000001 | 02/01/2010 | 104 |
... | A00000002 | 02/01/2010 | 119 |
... | A00000003 | 02/01/2010 | 119 |
... | ......... | .......... | ..... |
... | A00009999 | 02/01/2010 | 101 | arbitrary data
... | ......... | .......... | ..... | ...
... | A00000001 | 01/02/2010 | 104 |
... | A00000002 | 01/02/2010 | 119 |
... | A00000003 | 01/01/2010 | 119 |
I want to end up with one record per entity, per month:
ID | ENTITY_ID | SCORE_DATE | SCORE |
----+-----------+------------+-------+
... | A00000001 | 01/01/2010 | 100 |
... | A00000002 | 01/01/2010 | 105 |
... | A00000003 | 01/01/2010 | 104 |
... | A00000001 | 01/02/2010 | 100 |
... | A00000002 | 01/02/2010 | 105 |
... | A00000003 | 01/02/2010 | 104 |
(I don't care about SOME_OTHER_DATA; I'll pick something, probably either the first or last record.)
What's an easy way of doing this on a regular basis, so that anything in the last calendar month is summarised in this way?
At the moment my plan is roughly:
For each EntityID
    For each month
        Find average score for all records in given month
        Update first record with results of previous step
        Delete all records that aren't the first
I can't think of a neat way of doing it though, that doesn't involve lots of updates and iteration.
This can either be done in a SQL Stored Procedure, or it can be incorporated into the .Net app that's generating this data, so the solution doesn't really need to be "one big SQL script", but can be :)
(SQL-2005)

This will give you averages for all of your data:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
To restrict to a given month, e.g., last February, you can do:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where year(SCORE_DATE) = 2010 and month(SCORE_DATE) = 2
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
This version would actually perform better, but the parameters are a little less friendly to deal with:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where SCORE_DATE >= '2/1/2010' and SCORE_DATE < '3/1/2010'
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
If you want a query that always returns last month's data, you can do this:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where year(SCORE_DATE) = year(dateadd(month, -1, getdate())) and month(SCORE_DATE) = month(dateadd(month, -1, getdate()))
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
A better-performing version:
select ENTITY_ID, year(SCORE_DATE) as Year, month(SCORE_DATE) as Month, avg(SCORE) as Avg
from MyTable
where SCORE_DATE >= dateadd(month, ((year(getdate()) - 1900) * 12) + month(getdate())-2, 0)
and SCORE_DATE < dateadd(month, ((year(getdate()) - 1900) * 12) + month(getdate())-1, 0)
group by ENTITY_ID, year(SCORE_DATE), month(SCORE_DATE)
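The month arithmetic behind the half-open range (first day of last month inclusive, first day of this month exclusive) can be checked outside the database. The following is a minimal Python sketch of the same idea, not a translation of the T-SQL; the function name `previous_month_range` is hypothetical and introduced only for illustration.

```python
from datetime import date

def previous_month_range(today):
    """Return (start, end) for the calendar month before `today`; end is exclusive."""
    # Number of whole months up to the first day of the current month.
    months = today.year * 12 + (today.month - 1)
    start_year, start_month = divmod(months - 1, 12)  # one month back
    end_year, end_month = divmod(months, 12)          # first day of this month
    return date(start_year, start_month + 1, 1), date(end_year, end_month + 1, 1)

print(previous_month_range(date(2010, 3, 15)))  # February 2010
print(previous_month_range(date(2010, 1, 5)))   # December 2009 (crosses the year boundary)
```

Counting months from a fixed origin is what lets the January case roll back to December of the previous year without special handling.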

Give this a try:
--I am using @table variables here; you will want to use your actual table in place of @YourTable and a #TempTable for @YourTable2, with a PK on ID
SET NOCOUNT ON
DECLARE @YourTable table (ID int,ENTITY_ID char(9),SCORE_DATE datetime,SCORE int ,SOME_OTHER_DATA varchar(100))
DECLARE @YourTable2 table (ID int)
INSERT INTO @YourTable VALUES (1 , 'A00000001','01/01/2010',100,'some data')
INSERT INTO @YourTable VALUES (2 , 'A00000002','01/01/2010',105,'more data')
INSERT INTO @YourTable VALUES (3 , 'A00000003','01/01/2010',104,'various text')
INSERT INTO @YourTable VALUES (4 , 'A00009999','01/01/2010',101,null)
INSERT INTO @YourTable VALUES (5 , 'A00000001','02/01/2010',104,null)
INSERT INTO @YourTable VALUES (6 , 'A00000002','02/01/2010',119,null)
INSERT INTO @YourTable VALUES (7 , 'A00000003','02/01/2010',119,null)
INSERT INTO @YourTable VALUES (8 , 'A00009999','02/01/2010',101,'arbitrary data')
INSERT INTO @YourTable VALUES (9 , 'A00000001','01/02/2010',104,null)
INSERT INTO @YourTable VALUES (10, 'A00000002','01/02/2010',119,null)
INSERT INTO @YourTable VALUES (11, 'A00000003','01/01/2010',119,null)
SET NOCOUNT OFF
SELECT 'BEFORE',* FROM @YourTable ORDER BY ENTITY_ID,SCORE_DATE
UPDATE y
SET SCORE=dt_a.AvgScore
OUTPUT INSERTED.ID --capture all updated rows
INTO @YourTable2
FROM @YourTable y
INNER JOIN (SELECT --get avg score for each ENTITY_ID per month
ENTITY_ID
,AVG(SCORE) as AvgScore
, DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0) AS MonthOf,DATEADD(month,1,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0)) AS MonthNext
FROM @YourTable
--group by 1st day of current month and 1st day of next month
--so an index can be used when joining derived table to UPDATE table
GROUP BY ENTITY_ID, DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0),DATEADD(month,1,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0))
) dt_a ON y.ENTITY_ID=dt_a.ENTITY_ID AND y.SCORE_DATE>=dt_a.MonthOf AND y.SCORE_DATE<dt_a.MonthNext
INNER JOIN (SELECT --get first row for each ENTITY_ID per month
ID,ENTITY_ID,SCORE_DATE,SCORE
FROM (SELECT
ID,ENTITY_ID,SCORE_DATE,SCORE
,ROW_NUMBER() OVER(PARTITION BY ENTITY_ID,DATEADD(month,DATEDIFF(month,0,SCORE_DATE),0) ORDER BY ENTITY_ID,SCORE_DATE) AS RowRank
FROM @YourTable
) dt
WHERE dt.RowRank=1
) dt_f ON y.ID=dt_f.ID
DELETE @YourTable
WHERE ID NOT IN (SELECT ID FROM @YourTable2)
SELECT 'AFTER ',* FROM @YourTable ORDER BY ENTITY_ID,SCORE_DATE
OUTPUT:
ID ENTITY_ID SCORE_DATE SCORE SOME_OTHER_DATA
------ ----------- --------- ----------------------- ----------- ----------------------------------------------------------------------------------------------------
BEFORE 1 A00000001 2010-01-01 00:00:00.000 100 some data
BEFORE 9 A00000001 2010-01-02 00:00:00.000 104 NULL
BEFORE 5 A00000001 2010-02-01 00:00:00.000 104 NULL
BEFORE 2 A00000002 2010-01-01 00:00:00.000 105 more data
BEFORE 10 A00000002 2010-01-02 00:00:00.000 119 NULL
BEFORE 6 A00000002 2010-02-01 00:00:00.000 119 NULL
BEFORE 3 A00000003 2010-01-01 00:00:00.000 104 various text
BEFORE 11 A00000003 2010-01-01 00:00:00.000 119 NULL
BEFORE 7 A00000003 2010-02-01 00:00:00.000 119 NULL
BEFORE 4 A00009999 2010-01-01 00:00:00.000 101 NULL
BEFORE 8 A00009999 2010-02-01 00:00:00.000 101 arbitrary data
(11 row(s) affected)
(8 row(s) affected)
(3 row(s) affected)
ID ENTITY_ID SCORE_DATE SCORE SOME_OTHER_DATA
------ ----------- --------- ----------------------- ----------- ----------------------------------------------------------------------------------------------------
AFTER 1 A00000001 2010-01-01 00:00:00.000 102 some data
AFTER 5 A00000001 2010-02-01 00:00:00.000 104 NULL
AFTER 2 A00000002 2010-01-01 00:00:00.000 112 more data
AFTER 6 A00000002 2010-02-01 00:00:00.000 119 NULL
AFTER 3 A00000003 2010-01-01 00:00:00.000 111 various text
AFTER 7 A00000003 2010-02-01 00:00:00.000 119 NULL
AFTER 4 A00009999 2010-01-01 00:00:00.000 101 NULL
AFTER 8 A00009999 2010-02-01 00:00:00.000 101 arbitrary data
(8 row(s) affected)

Related

creating complete historical timeline from overlapping intervals

I have the table below, which contains a code, from, to and hours. The problem is that I have overlapping dates in the intervals. Instead, I want to create a complete historical timeline: when the code is identical and there is an overlap, the hours should be summed, as in the desired result.
** table **
+------+------------+------------+-------+
| code | from       | to         | hours |
+------+------------+------------+-------+
| 1    | 2013-05-01 | 2013-09-30 | 37    |
| 1    | 2013-05-01 | 2014-02-28 | 10    |
| 1    | 2013-10-01 | 9999-12-31 | 5     |
+------+------------+------------+-------+
desired result:
+------+------------+------------+-------+
| code | from       | to         | hours |
+------+------------+------------+-------+
| 1    | 2013-05-01 | 2013-09-30 | 47    |
| 1    | 2013-10-01 | 2014-02-28 | 15    |
| 1    | 2014-02-29 | 9999-12-31 | 5     |
+------+------------+------------+-------+
Oracle Setup:
CREATE TABLE Table1 ( code, "FROM", "TO", hours ) AS
SELECT 1, DATE '2013-05-01', DATE '2013-09-30', 37 FROM DUAL UNION ALL
SELECT 1, DATE '2013-05-01', DATE '2014-02-28', 10 FROM DUAL UNION ALL
SELECT 1, DATE '2013-10-01', DATE '9999-12-31', 5 FROM DUAL;
Query:
SELECT *
FROM (
SELECT code,
dt AS "FROM",
LEAD( dt ) OVER ( PARTITION BY code ORDER BY dt ASC, value DESC, ROWNUM ) AS "TO",
hours
FROM (
SELECT code,
dt,
SUM( hours * value ) OVER ( PARTITION BY code ORDER BY dt ASC, VALUE DESC ) AS hours,
value
FROM table1
UNPIVOT ( dt FOR value IN ( "FROM" AS 1, "TO" AS -1 ) )
)
)
WHERE "FROM" + 1 < "TO";
Results:
CODE FROM TO HOURS
---- ---------- ---------- -----
1 2013-05-01 2013-09-30 47
1 2013-10-01 2014-02-28 15
1 2014-02-28 9999-12-31 5
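The underlying idea (cut the timeline at every interval boundary, then sum the hours of the intervals covering each piece) can be sketched outside SQL. This is a minimal Python illustration, not the Oracle query above; the function name `build_timeline` is hypothetical, and it assumes the from/to dates are inclusive. Day ordinals are used so that the 9999-12-31 end date does not overflow when one day is added.

```python
from datetime import date

def build_timeline(rows):
    """rows: (code, start, end_inclusive, hours) -> non-overlapping segments per code."""
    result = []
    for code in sorted({r[0] for r in rows}):
        # Half-open day ranges [start, end + 1) expressed as integer ordinals.
        spans = [(s.toordinal(), e.toordinal() + 1, h)
                 for c, s, e, h in rows if c == code]
        cuts = sorted({p for s, e, _ in spans for p in (s, e)})
        for lo, hi in zip(cuts, cuts[1:]):
            total = sum(h for s, e, h in spans if s <= lo and hi <= e)
            if total:  # skip gaps (and segments whose hours sum to zero)
                result.append((code, date.fromordinal(lo),
                               date.fromordinal(hi - 1), total))
    return result

table = [
    (1, date(2013, 5, 1), date(2013, 9, 30), 37),
    (1, date(2013, 5, 1), date(2014, 2, 28), 10),
    (1, date(2013, 10, 1), date(9999, 12, 31), 5),
]
for segment in build_timeline(table):
    print(segment)
```

With inclusive end dates, the last segment in this sketch starts on 2014-03-01, the day after the second segment ends. Adjacent segments that happen to have the same total are not merged.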

Identify two rows with 1 year or more of difference

I have a table called finance in which I store all customer payments. The main columns are: ID,COSTUMERID,DATEPAID,AMOUNTPAID.
What I need is a list of dates by COSTUMERID, containing the date of the first payment and of any other payment made more than 1 year after the last one. Example:
+----+------------+------------+------------+
| ID | COSTUMERID | DATEPAID | AMOUNTPAID |
+----+------------+------------+------------+
| 1 | 1 | 2015-01-10 | 10 |
| 2 | 1 | 2016-01-05 | 30 |
| 2 | 1 | 2017-02-20 | 30 |
| 3 | 2 | 2016-03-15 | 100 |
| 4 | 2 | 2017-02-15 | 100 |
| 5 | 3 | 2017-05-01 | 25 |
+----+------------+------------+------------+
What I expect as result:
+------------+------------+
| COSTUMERID | DATEPAID |
+------------+------------+
| 1 | 2015-01-10 |
| 1 | 2017-02-20 |
| 2 | 2016-03-15 |
| 3 | 2017-05-01 |
+------------+------------+
Customer 1 has 2 dates: the first one, plus one more that is more than 1 year after the last one.
I hope I have made myself clear.
I think you just want lag():
select t.*
from (select t.*,
lag(datepaid) over (partition by customerid order by datepaid) as prev_datepaid
from t
) t
where prev_datepaid is null or
datepaid > dateadd(year, 1, prev_datepaid);
Gordon's solution is correct, as long as you are only looking at the previous row (previous payment) diff, but I wonder if Antonio is looking for payments greater than one year from the last 1 year payment, in which case this becomes a more complex problem to solve. Take the following example:
CREATE TABLE #Test (
CustomerID smallint
,DatePaid date
,AmountPaid smallint )
INSERT INTO #Test
SELECT 1, '2015-1-10', 10
INSERT INTO #Test
SELECT 1, '2016-1-05', 30
INSERT INTO #Test
SELECT 1, '2017-2-20', 30
INSERT INTO #Test
SELECT 1, '2017-6-30', 50
INSERT INTO #Test
SELECT 1, '2018-3-5', 50
INSERT INTO #Test
SELECT 1, '2018-5-15', 50
INSERT INTO #Test
SELECT 2, '2016-3-15', 100
INSERT INTO #Test
SELECT 2, '2017-6-15', 100
;WITH CTE AS (
SELECT
CustomerID
,DatePaid
,LAG(DatePaid) OVER (PARTITION BY CustomerID ORDER BY DatePaid) AS PreviousPaidDate
,AmountPaid
FROM #Test )
SELECT
*
,-DATEDIFF(DAY, DatePaid, PreviousPaidDate) AS DayDiff
,CASE WHEN DATEDIFF(DAY, PreviousPaidDate, DatePaid) >= 365 THEN 1 ELSE 0 END AS Paid
FROM CTE
Row 5 (2018-03-05) is more than 1 year after the last qualifying payment (2017-02-20), but comparing each row only with the previous row doesn't detect this. This may or may not matter, but I wanted to point it out in case that is what he means.
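The "compare against the last qualifying payment" reading is easiest to see as a loop that carries an anchor date forward. Below is a minimal Python sketch of that interpretation, using the sample rows above; the function name `anchored_payments` is hypothetical, and "1 year" is taken as 365 days, matching the CASE expression in the query.

```python
from datetime import date

def anchored_payments(payments, min_gap_days=365):
    """Keep each customer's first payment, then any payment at least
    `min_gap_days` after the most recently kept one."""
    last_kept = {}  # customer id -> date of the last kept payment
    kept = []
    for customer, paid in sorted(payments):
        anchor = last_kept.get(customer)
        if anchor is None or (paid - anchor).days >= min_gap_days:
            kept.append((customer, paid))
            last_kept[customer] = paid
    return kept

payments = [
    (1, date(2015, 1, 10)), (1, date(2016, 1, 5)), (1, date(2017, 2, 20)),
    (1, date(2017, 6, 30)), (1, date(2018, 3, 5)), (1, date(2018, 5, 15)),
    (2, date(2016, 3, 15)), (2, date(2017, 6, 15)),
]
for row in anchored_payments(payments):
    print(row)
```

Here 2018-03-05 is kept because it is measured against 2017-02-20 rather than against the intervening 2017-06-30 payment, which is exactly the case a plain LAG() comparison misses.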

Group by date and count of entries

I'll make it short, the table looks like this:
| id (int) | registerDate (DATETIME)
|----------|-----------------
| 1 | 2014-07-29 12:00:00
| 2 | 2014-08-01 12:00:00
| 3 | 2014-08-01 12:00:00
| 4 | 2014-08-01 12:00:00
| 5 | 2014-08-02 12:00:00
| 6 | 2014-08-02 12:00:00
| 7 | 2014-08-04 12:00:00
If today is 2014-08-05, I want results like this:
| registerDate (DATETIME) | count (int)
| 2014-08-04 | 1
| 2014-08-03 | 0
| 2014-08-02 | 2
| 2014-08-01 | 1
| 2014-07-31 | 0
| 2014-07-30 | 0
| 2014-07-29 | 1
So I want the count of registered users in the past week (daily).
I tried to find it out on google (unsuccessfully) - however, I hope you can help.
SELECT registerDate, count(registerDate) FROM [TABLE] WHERE
registerDate between (GETDATE()-7) and GETDATE()
group by registerDate
order by registerDate desc
This will take a table like:
2 |1905-06-26 00:00:00.000
4 |2014-08-03 00:00:00.000
5 |2014-08-02 00:00:00.000
1 |2014-08-01 00:00:00.000
3 |2014-07-01 00:00:00.000
6 |2010-07-01 00:00:00.000
7 |2015-07-01 00:00:00.000
8 |2014-08-28 00:00:00.000
9 |2014-08-26 00:00:00.000
10 |2014-08-26 00:00:00.000
And create:
2014-08-28 00:00:00.000 | 1
2014-08-26 00:00:00.000 | 2
The problem with this is it doesn't show the days that are not in the table.
Give me a little more time, I'll have an updated version.
EDIT:
Now the more complex one...
-- Declare how far back you want to go
DECLARE @DAYSBACK int = 6
-- Select into a temp table, one row per calendar day
select CONVERT(date, registerDate) as RegDate, count(registerDate) as DateCount
INTO #temptable
from Temp where registerDate between (GETDATE()-6) and GETDATE()
group by CONVERT(date, registerDate)
-- Check whether each day exists; if not, insert a zero row
WHILE @DAYSBACK >= 0 BEGIN
IF NOT EXISTS (select top 1 1 from #temptable
where RegDate= CONVERT(date, (GETDATE()-@DAYSBACK))
group by RegDate)
INSERT INTO #temptable values ((GETDATE()-@DAYSBACK), 0)
SET @DAYSBACK = @DAYSBACK -1
END
-- Select what you want
select * from #temptable order by RegDate desc
-- Drop the table you created.
DROP TABLE #temptable
Using the same table as above, it will output:
Register Date | Date Count
--------------------------
2014-08-28 | 1
2014-08-27 | 0
2014-08-26 | 2
2014-08-25 | 0
2014-08-24 | 0
2014-08-23 | 0
2014-08-22 | 0
Try something like this:
select registerDate = convert(date,t.registerDate) ,
registrations = count(*)
from dbo.my_special_registration_table t
where t.registerDate >= dateadd(day,-7,convert(date,getdate()))
group by convert(date,t.registerDate)
order by 1
If you try to filter out registrations older than 7 days using something like datediff():
where datediff(day,t.registerDate,getdate()) <= 7
you turn the column registerDate into an expression. As a result the query optimizer can't make use of any indices that might apply, forcing a table scan. If your table is large, performance is likely to be ... suboptimal.
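The zero-filling step, which a plain GROUP BY cannot do on its own, amounts to generating the list of days first and then looking up each day's count. A minimal Python sketch of that idea, using the sample rows from the question; the function name `daily_counts` is hypothetical, and "today" is passed in explicitly so the result is reproducible.

```python
from collections import Counter
from datetime import date, datetime, timedelta

def daily_counts(timestamps, today, days=7):
    """Count rows per calendar day for the `days` days before `today`, newest first."""
    counts = Counter(ts.date() for ts in timestamps)
    window = [today - timedelta(days=n) for n in range(1, days + 1)]
    # A Counter returns 0 for days with no rows, which fills the gaps.
    return [(day, counts[day]) for day in window]

rows = [
    datetime(2014, 7, 29, 12), datetime(2014, 8, 1, 12), datetime(2014, 8, 1, 12),
    datetime(2014, 8, 1, 12), datetime(2014, 8, 2, 12), datetime(2014, 8, 2, 12),
    datetime(2014, 8, 4, 12),
]
for day, n in daily_counts(rows, date(2014, 8, 5)):
    print(day, n)
```

For these rows the sketch reports 3 for 2014-08-01, since three sample rows share that date.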

SQL - Grouping with aggregation

I have a table (TABLE1) that lists all employees with their Dept IDs, the date they started and the date they were terminated (NULL means they are current employees).
I would like to have a result set (TABLE2) in which every row represents a day (the Date field), from the day the first employee started (20090101 in the sample table below) until today. I would like to group the employees by DeptID and calculate the total number of employees for each row of TABLE2.
How do I write this query? Thanks in advance for your help.
TABLE1
DeptID EmployeeID StartDate EndDate
--------------------------------------------
001 123 20100101 20120101
001 124 20090101 NULL
001 234 20110101 20120101
TABLE2
DeptID Date EmployeeCount
-----------------------------------
001 20090101 1
001 20090102 1
... ... 1
001 20100101 2
001 20100102 2
... ... 2
001 20110101 3
001 20110102 3
... ... 3
001 20120101 1
001 20120102 1
001 20120103 1
... ... 1
This will work if you have a date lookup table. You will need to specify the department ID.
Query
SELECT d.dt, SUM(e.ecount) AS RunningTotal
FROM dates d
INNER JOIN
(SELECT b.dt,
CASE
WHEN c.ecount IS NULL THEN 0
ELSE c.ecount
END AS ecount
FROM dates b
LEFT JOIN
(SELECT a.DeptID, a.dt, SUM([count]) AS ecount
FROM
(SELECT DeptID, EmployeeID, 1 AS [count], StartDate AS dt FROM TABLE1
UNION ALL
SELECT DeptID, EmployeeID,
CASE
WHEN EndDate IS NOT NULL THEN -1
ELSE 0
END AS [count], EndDate AS dt FROM TABLE1) a
WHERE a.dt IS NOT NULL AND DeptID = 1
GROUP BY a.DeptID, a.dt) c ON c.dt = b.dt) e ON e.dt <= d.dt
GROUP BY d.dt
Result
| DT | RUNNINGTOTAL |
-----------------------------
| 2009-01-01 | 1 |
| 2009-02-01 | 1 |
| 2009-03-01 | 1 |
| 2009-04-01 | 1 |
| 2009-05-01 | 1 |
| 2009-06-01 | 1 |
| 2009-07-01 | 1 |
| 2009-08-01 | 1 |
| 2009-09-01 | 1 |
| 2009-10-01 | 1 |
| 2009-11-01 | 1 |
| 2009-12-01 | 1 |
| 2010-01-01 | 2 |
| 2010-02-01 | 2 |
| 2010-03-01 | 2 |
| 2010-04-01 | 2 |
| 2010-05-01 | 2 |
| 2010-06-01 | 2 |
| 2010-07-01 | 2 |
| 2010-08-01 | 2 |
| 2010-09-01 | 2 |
| 2010-10-01 | 2 |
| 2010-11-01 | 2 |
| 2010-12-01 | 2 |
| 2011-01-01 | 3 |
| 2011-02-01 | 3 |
| 2011-03-01 | 3 |
| 2011-04-01 | 3 |
| 2011-05-01 | 3 |
| 2011-06-01 | 3 |
| 2011-07-01 | 3 |
| 2011-08-01 | 3 |
| 2011-09-01 | 3 |
| 2011-10-01 | 3 |
| 2011-11-01 | 3 |
| 2011-12-01 | 3 |
| 2012-01-01 | 1 |
Schema
CREATE TABLE TABLE1 (
DeptID tinyint,
EmployeeID tinyint,
StartDate date,
EndDate date)
INSERT INTO TABLE1 VALUES
(1, 123, '2010-01-01', '2012-01-01'),
(1, 124, '2009-01-01', NULL),
(1, 234, '2011-01-01', '2012-01-01')
CREATE TABLE dates (
dt date)
INSERT INTO dates VALUES
('2009-01-01'), ('2009-02-01'), ('2009-03-01'), ('2009-04-01'), ('2009-05-01'),
('2009-06-01'), ('2009-07-01'), ('2009-08-01'), ('2009-09-01'), ('2009-10-01'),
('2009-11-01'), ('2009-12-01'), ('2010-01-01'), ('2010-02-01'), ('2010-03-01'),
('2010-04-01'), ('2010-05-01'), ('2010-06-01'), ('2010-07-01'), ('2010-08-01'),
('2010-09-01'), ('2010-10-01'), ('2010-11-01'), ('2010-12-01'), ('2011-01-01'),
('2011-02-01'), ('2011-03-01'), ('2011-04-01'), ('2011-05-01'), ('2011-06-01'),
('2011-07-01'), ('2011-08-01'), ('2011-09-01'), ('2011-10-01'), ('2011-11-01'),
('2011-12-01'), ('2012-01-01')
You need something along these lines.
SELECT *
, ( SELECT COUNT(EmployeeID) AS EmployeeCount
FROM TABLE1 AS f
WHERE t.[Date] BETWEEN f.StartDate AND f.EndDate
)
FROM ( SELECT DeptID
, StartDate AS [Date]
FROM TABLE1
UNION
SELECT DeptID
, EndDate AS [Date]
FROM TABLE1
) AS t
EDIT: since the OP clarified that he wants all the dates, here is the updated solution.
I have excluded an employee from the count if their job ends on that date. If you want to include them, change t.[Date] < f.EndDate to t.[Date] <= f.EndDate in the solution below. I also assume that a NULL value in EndDate means the employee still works for the department.
DECLARE @StartDate DATE = (SELECT MIN(StartDate) FROM Table1)
,@EndDate DATE = (SELECT MAX(EndDate) FROM Table1)
;WITH CTE AS
(
SELECT DISTINCT DeptID,@StartDate AS [Date] FROM Table1
UNION ALL
SELECT c.DeptID, DATEADD(dd,1,c.[Date]) AS [Date] FROM CTE AS c
WHERE c.[Date]<=@EndDate
)
SELECT * ,
EmployeeCount=( SELECT COUNT(EmployeeID)
FROM TABLE1 AS f
WHERE f.DeptID=t.DeptID AND t.[Date] >= f.StartDate
AND ( t.[Date] < f.EndDate OR f.EndDate IS NULL )
)
FROM CTE AS t
ORDER BY 1
OPTION ( MAXRECURSION 0 )
Here is a SQL Fiddle demo. I have added another department and added an employee to it.
http://sqlfiddle.com/#!3/5c4ec/1

Finding simultaneous events in a database between times

I have a database that stores phone call records. Each phone call record has a start time and an end time. I want to find the maximum number of phone calls happening simultaneously, in order to know whether we have exceeded the number of available phone lines in our phone bank. How could I go about solving this problem?
Disclaimer: I'm writing my answer based on the following (excellent) post:
https://www.itprotoday.com/sql-server/calculating-concurrent-sessions-part-3 (Parts 1 and 2 are also recommended)
The first thing to understand about this problem is that most of the solutions currently found on the internet have one of two basic issues:
1. The result is not the correct answer (for example, if range A overlaps with B and C but B doesn't overlap with C, they are counted as 3 overlapping ranges).
2. The way they compute it is very inefficient (because it is O(n^2) and/or they loop over every second in the period).
The common performance problem in solutions like the one proposed by Unreasons is that they are quadratic: for each call you need to check all the other calls to see whether they overlap.
There is a common algorithmic solution that is linear once the events are sorted: list all the "events" (call start and call end) ordered by date, add 1 for a start and subtract 1 for a hang-up, and remember the maximum. That can be implemented easily with a cursor (the solution proposed by Hafhor seems to work that way), but cursors are not the most efficient way to solve problems.
The referenced article has excellent examples, different solutions, and a performance comparison of them. The proposed solution is:
WITH C1 AS
(
SELECT starttime AS ts, +1 AS TYPE,
ROW_NUMBER() OVER(ORDER BY starttime) AS start_ordinal
FROM Calls
UNION ALL
SELECT endtime, -1, NULL
FROM Calls
),
C2 AS
(
SELECT *,
ROW_NUMBER() OVER( ORDER BY ts, TYPE) AS start_or_end_ordinal
FROM C1
)
SELECT MAX(2 * start_ordinal - start_or_end_ordinal) AS mx
FROM C2
WHERE TYPE = 1
Explanation
Suppose we have this set of data:
+-------------------------+-------------------------+
| starttime | endtime |
+-------------------------+-------------------------+
| 2009-01-01 00:02:10.000 | 2009-01-01 00:05:24.000 |
| 2009-01-01 00:02:19.000 | 2009-01-01 00:02:35.000 |
| 2009-01-01 00:02:57.000 | 2009-01-01 00:04:04.000 |
| 2009-01-01 00:04:12.000 | 2009-01-01 00:04:52.000 |
+-------------------------+-------------------------+
This is a way to implement the same idea with a query, adding 1 for each call start and subtracting 1 for each call end.
SELECT starttime AS ts, +1 AS TYPE,
ROW_NUMBER() OVER(ORDER BY starttime) AS start_ordinal
FROM Calls
This part of the C1 CTE takes the starttime of each call and numbers it:
+-------------------------+------+---------------+
| ts | TYPE | start_ordinal |
+-------------------------+------+---------------+
| 2009-01-01 00:02:10.000 | 1 | 1 |
| 2009-01-01 00:02:19.000 | 1 | 2 |
| 2009-01-01 00:02:57.000 | 1 | 3 |
| 2009-01-01 00:04:12.000 | 1 | 4 |
+-------------------------+------+---------------+
Now this code
SELECT endtime, -1, NULL
FROM Calls
will generate all the "endtimes" without row numbering:
+-------------------------+----+------+
| endtime | | |
+-------------------------+----+------+
| 2009-01-01 00:02:35.000 | -1 | NULL |
| 2009-01-01 00:04:04.000 | -1 | NULL |
| 2009-01-01 00:04:52.000 | -1 | NULL |
| 2009-01-01 00:05:24.000 | -1 | NULL |
+-------------------------+----+------+
Now making the UNION to have the full C1 CTE definition, you will have both tables mixed
+-------------------------+------+---------------+
| ts | TYPE | start_ordinal |
+-------------------------+------+---------------+
| 2009-01-01 00:02:10.000 | 1 | 1 |
| 2009-01-01 00:02:19.000 | 1 | 2 |
| 2009-01-01 00:02:57.000 | 1 | 3 |
| 2009-01-01 00:04:12.000 | 1 | 4 |
| 2009-01-01 00:02:35.000 | -1 | NULL |
| 2009-01-01 00:04:04.000 | -1 | NULL |
| 2009-01-01 00:04:52.000 | -1 | NULL |
| 2009-01-01 00:05:24.000 | -1 | NULL |
+-------------------------+------+---------------+
C2 is computed sorting and numbering C1 with a new column
C2 AS
(
SELECT *,
ROW_NUMBER() OVER( ORDER BY ts, TYPE) AS start_or_end_ordinal
FROM C1
)
+-------------------------+------+-------+--------------+
| ts | TYPE | start | start_or_end |
+-------------------------+------+-------+--------------+
| 2009-01-01 00:02:10.000 | 1 | 1 | 1 |
| 2009-01-01 00:02:19.000 | 1 | 2 | 2 |
| 2009-01-01 00:02:35.000 | -1 | NULL | 3 |
| 2009-01-01 00:02:57.000 | 1 | 3 | 4 |
| 2009-01-01 00:04:04.000 | -1 | NULL | 5 |
| 2009-01-01 00:04:12.000 | 1 | 4 | 6 |
| 2009-01-01 00:04:52.000 | -1 | NULL | 7 |
| 2009-01-01 00:05:24.000 | -1 | NULL | 8 |
+-------------------------+------+-------+--------------+
And this is where the magic occurs: at any time, #start - #end is the number of concurrent calls at that moment.
For each TYPE = 1 (start event) row we have the #start value in the 3rd column, and we also have #start + #end in the 4th column:
#start_or_end = #start + #end
#end = (#start_or_end - #start)
#start - #end = #start - (#start_or_end - #start)
#start - #end = 2 * #start - #start_or_end
so in SQL:
SELECT MAX(2 * start_ordinal - start_or_end_ordinal) AS mx
FROM C2
WHERE TYPE = 1
In this case, with the proposed set of calls, the result is 2.
The referenced article includes a small improvement that groups the result by, for example, a service, a "phone company" or a "phone central"; the same idea can also be used to group by time slot and get the maximum concurrency hour by hour for a given day.
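The add-1/subtract-1 idea that the query encodes with row numbers can also be written as a direct running count. The following is a minimal Python sketch of that event sweep, using the four sample calls from the explanation; the function name `max_concurrent` is hypothetical. Sorting the (timestamp, delta) pairs puts -1 before +1 at equal timestamps, which mirrors the ORDER BY ts, TYPE in C2.

```python
from datetime import datetime

def max_concurrent(calls):
    """calls: iterable of (start, end) pairs -> peak number of simultaneous calls."""
    events = []
    for start, end in calls:
        events.append((start, +1))  # a call begins
        events.append((end, -1))    # a call ends
    events.sort()  # by timestamp, ends before starts on ties
    current = best = 0
    for _, delta in events:
        current += delta
        best = max(best, current)
    return best

calls = [
    (datetime(2009, 1, 1, 0, 2, 10), datetime(2009, 1, 1, 0, 5, 24)),
    (datetime(2009, 1, 1, 0, 2, 19), datetime(2009, 1, 1, 0, 2, 35)),
    (datetime(2009, 1, 1, 0, 2, 57), datetime(2009, 1, 1, 0, 4, 4)),
    (datetime(2009, 1, 1, 0, 4, 12), datetime(2009, 1, 1, 0, 4, 52)),
]
print(max_concurrent(calls))
```

For the sample calls this gives the same peak of 2 as the query.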
Given that the maximum number of simultaneous calls must occur at some call's StartTime, you can use:
SELECT TOP 1 count(*) as CountSimultaneous
FROM PhoneCalls T1, PhoneCalls T2
WHERE
T1.StartTime between T2.StartTime and T2.EndTime
GROUP BY
T1.CallID
ORDER BY CountSimultaneous DESC
The query will return for each call the number of simultaneous calls. Either order them descending and select first one or SELECT MAX(CountSimultaneous) from the above (as subquery without ordering and without TOP).
try this:
DECLARE @Calls table (callid int identity(1,1), starttime datetime, endtime datetime)
INSERT @Calls (starttime,endtime) values ('6/12/2010 10:10am','6/12/2010 10:15am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 11:10am','6/12/2010 10:25am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 12:10am','6/12/2010 01:15pm')
INSERT @Calls (starttime,endtime) values ('6/12/2010 11:10am','6/12/2010 10:35am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 12:10am','6/12/2010 12:15am')
INSERT @Calls (starttime,endtime) values ('6/12/2010 10:10am','6/12/2010 10:15am')
DECLARE @StartDate datetime
,@EndDate datetime
SELECT @StartDate='6/12/2010'
,@EndDate='6/13/2010'
;with AllDates AS
(
SELECT @StartDate AS DateOf
UNION ALL
SELECT DATEADD(second,1,DateOf) AS DateOf
FROM AllDates
WHERE DateOf<@EndDate
)
SELECT
a.DateOf,COUNT(c.callid) AS CountOfCalls
FROM AllDates a
INNER JOIN @Calls c ON a.DateOf>=c.starttime and a.DateOf<=c.endtime
GROUP BY a.DateOf
ORDER BY 2 DESC
OPTION (MAXRECURSION 0)
OUTPUT:
DateOf CountOfCalls
----------------------- ------------
2010-06-12 10:10:00.000 3
2010-06-12 10:10:01.000 3
2010-06-12 10:10:02.000 3
2010-06-12 10:10:03.000 3
2010-06-12 10:10:04.000 3
2010-06-12 10:10:05.000 3
2010-06-12 10:10:06.000 3
2010-06-12 10:10:07.000 3
2010-06-12 10:10:08.000 3
2010-06-12 10:10:09.000 3
2010-06-12 10:10:10.000 3
2010-06-12 10:10:11.000 3
2010-06-12 10:10:12.000 3
2010-06-12 10:10:13.000 3
2010-06-12 10:10:14.000 3
2010-06-12 10:10:15.000 3
2010-06-12 10:10:16.000 3
2010-06-12 10:10:17.000 3
2010-06-12 10:10:18.000 3
2010-06-12 10:10:19.000 3
2010-06-12 10:10:20.000 3
2010-06-12 10:10:21.000 3
2010-06-12 10:10:22.000 3
2010-06-12 10:10:23.000 3
2010-06-12 10:10:24.000 3
2010-06-12 10:10:25.000 3
2010-06-12 10:10:26.000 3
2010-06-12 10:10:27.000 3
....
Add a TOP 1, or put this query in a derived table and aggregate it further if necessary.
SELECT COUNT(*) FROM calls
WHERE '2010-06-15 15:00:00' BETWEEN calls.starttime AND calls.endtime
and repeat this for every second.
The only practical method I can think of is as follows:
Split the period you want to analyze in arbitrary "buckets", say, 24 1-hour buckets over the day. For each Bucket count how many calls either started or finished between the start or the end of the interval
Note that the 1-hour limit is not a hard-and-fast rule. You could make this shorter or longer, depending on how precise you want the calculation to be.
You could make the actual "length" of the bucket a function of the average call duration.
So, let's assume that your average call is 3 minutes. If it is not too expensive in terms of calculations, use buckets that are 3 times longer than your average call (9 minutes) this should be granular enough to give precise results.
-- assuming calls table with columns starttime and endtime
declare @s datetime, @e datetime;
declare @t table(d datetime); -- one row per phone line, holding the time it becomes free
declare c cursor for select starttime,endtime from calls order by starttime;
open c
while(1=1) begin
    fetch next from c into @s,@e
    if @@FETCH_STATUS<>0 break;
    -- reuse a line that is already free; otherwise add a new line
    update top(1) @t set d=@e where d<=@s;
    if @@ROWCOUNT=0 insert @t(d) values(@e);
end
close c
deallocate c
select COUNT(*) as MaxConcurrentCalls from @t
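The cursor logic above is a greedy line assignment: each call reuses a line that has already been freed, and a new line is added only when every line is busy. A minimal Python sketch of the same idea follows; the function name `lines_needed` and the sample calls are assumptions for illustration. It uses a heap to pick the earliest-finishing line, whereas the cursor picks any free row, but the number of lines opened is the same.

```python
import heapq
from datetime import datetime

def lines_needed(calls):
    """Greedy assignment: reuse a freed line if one exists, else open a new one."""
    busy_until = []  # min-heap of end times, one entry per phone line
    for start, end in sorted(calls):
        if busy_until and busy_until[0] <= start:
            heapq.heapreplace(busy_until, end)  # earliest-finishing line is free
        else:
            heapq.heappush(busy_until, end)     # every line is busy: add one
    return len(busy_until)

calls = [
    (datetime(2009, 1, 1, 0, 2, 10), datetime(2009, 1, 1, 0, 5, 24)),
    (datetime(2009, 1, 1, 0, 2, 19), datetime(2009, 1, 1, 0, 2, 35)),
    (datetime(2009, 1, 1, 0, 2, 57), datetime(2009, 1, 1, 0, 4, 4)),
    (datetime(2009, 1, 1, 0, 4, 12), datetime(2009, 1, 1, 0, 4, 52)),
]
print(lines_needed(calls))
```

As in the cursor version, a call that starts exactly when another ends is allowed to reuse that line.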