I have a hospital database which looks something like this
id | patient_name | admitDate | DischargeDate |RoomCategory
1 | john |3/01/2011 | 5/01/2011 |Category1
2 | lisa |3/01/2011 | 4/01/2011 |Category2
3 | ron |5/01/2011 | 10/01/2011 |Category1
4 | howard |6/01/2012 | 10/01/2012 |Category3
5 | john |6/05/2011 | 7/05/2011 |Category4
6 | rammy |6/02/2011 | 7/03/2011 |Category4
I have to calculate the number of patients in hospital on each day (both admit and discharge date to be counted) and group them by category
Suppose on 3/01/2011 we have 2 patients, one in category 1 and one in category 2. On 4/01/2011 we again have the same 2 patients. By 5/01/2011 lisa (id 2) has been discharged, so only 1 of them remains (in category 1), but ron (id 3) is admitted that day, so he has to be counted as well.
The output should look something like this
Date | Category1 | Category2 | Category3 |Category4
3/01/2011 | 1 | 1 | 0 | 0
4/01/2011 | 1 | 1 | 0 | 0
5/01/2011 | 2 | 0 | 0 | 0
I am not able to figure out how to list all the dates which might have a patient, because the actual table is huge and a lot of dates don't have any patients. I also can't work out how to count the patients separately under each category.
I have 15 categories in total in my actual table so using where for each one of them separately wouldn't be very efficient.
You have two problems here: first you need a calendar table, and then you need a pivot. I suggest you invest in creating a permanent calendar table, but I use an inline one here. You can then pivot the values into columns. I use conditional aggregation for that, as it is transferable and less restrictive.
SELECT *
INTO dbo.YourTable
FROM (VALUES(1,'john ',CONVERT(date,'3/01/2011'),CONVERT(date,'5/01/2011 '),'Category1'),
(2,'lisa ',CONVERT(date,'3/01/2011'),CONVERT(date,'4/01/2011 '),'Category2'),
(3,'ron ',CONVERT(date,'5/01/2011'),CONVERT(date,'10/01/2011'),'Category1'),
(4,'howard',CONVERT(date,'6/01/2012'),CONVERT(date,'10/01/2012'),'Category3'),
(5,'john ',CONVERT(date,'6/05/2011'),CONVERT(date,'7/05/2011 '),'Category4'),
(6,'rammy ',CONVERT(date,'6/02/2011'),CONVERT(date,'7/03/2011 '),'Category4'))V(id,patient_name,admitDate,DischargeDate,RoomCategory)
GO
WITH N AS(
SELECT N
FROM (VALUES(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL),(NULL))N(N)),
Tally AS(
SELECT 0 AS I
UNION ALL
SELECT TOP (SELECT DATEDIFF(DAY, MIN(admitDate), MAX(DischargeDate)) FROM dbo.YourTable)
ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS I
FROM N N1, N N2, N N3), --UP to 1000 days. Add more cross joins for more days
Calendar AS(
SELECT DATEADD(DAY, T.I, YT.MinAdmitDate) AS D
FROM Tally T
CROSS APPLY (SELECT MIN(admitDate) AS MinAdmitDate FROM dbo.YourTable) YT)
SELECT C.D AS [Date],
COUNT(CASE YT.RoomCategory WHEN 'Category1' THEN 1 END) AS Category1,
COUNT(CASE YT.RoomCategory WHEN 'Category2' THEN 1 END) AS Category2,
COUNT(CASE YT.RoomCategory WHEN 'Category3' THEN 1 END) AS Category3,
COUNT(CASE YT.RoomCategory WHEN 'Category4' THEN 1 END) AS Category4
FROM Calendar C
LEFT JOIN dbo.YourTable YT ON C.D >= YT.admitDate
AND C.D <= DischargeDate
GROUP BY C.D;
GO
DROP TABLE dbo.YourTable;
Note that the results on db<>fiddle might not be what you expect: db<>fiddle defaults to American date settings, your date format is ambiguous, and I don't provide an explicit style in the CONVERT functions.
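The calendar-plus-conditional-count idea can also be sketched outside SQL. The following is a minimal Python illustration, not the answer's method itself: the function name `daily_census` is made up for the example, and the sample dates are read as day/month/year, which is an assumption given the ambiguous format.

```python
from collections import Counter
from datetime import date, timedelta

# First three sample rows from the question, read as day/month/year (an assumption).
stays = [
    ("john", date(2011, 1, 3), date(2011, 1, 5), "Category1"),
    ("lisa", date(2011, 1, 3), date(2011, 1, 4), "Category2"),
    ("ron", date(2011, 1, 5), date(2011, 1, 10), "Category1"),
]

def daily_census(stays):
    """Return {day: Counter(category -> patients present)}, admit and discharge days inclusive."""
    first = min(admit for _, admit, _, _ in stays)
    last = max(discharge for _, _, discharge, _ in stays)
    census = {}
    day = first
    while day <= last:  # the "calendar": every day from first admit to last discharge
        census[day] = Counter(
            category
            for _, admit, discharge, category in stays
            if admit <= day <= discharge
        )
        day += timedelta(days=1)
    return census

census = daily_census(stays)
for day in sorted(census)[:3]:
    # A Counter returns 0 for categories with no patients on that day.
    print(day, census[day]["Category1"], census[day]["Category2"])
```

For the three rows above, the first three days give 1/1, 1/1 and 2/0 for Category1/Category2, which matches the expected output in the question.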
My table looks like that:
ID | Start | End
1 | 2010-01-02 | 2010-01-04
1 | 2010-01-22 | 2010-01-24
1 | 2011-01-31 | 2011-02-02
2 | 2012-05-02 | 2012-05-08
3 | 2013-01-02 | 2013-01-03
4 | 2010-09-15 | 2010-09-20
4 | 2010-09-30 | 2010-10-05
I'm looking for a way to count the number of occurrences for each ID in a Year per Month.
Importantly, if a record's End date falls in the month after its Start date (within the same year), the occurrence should be counted for both months. For example, the 3rd row for ID 1 is such a case, so that ID should get +1 for January and +1 for February.
So I'd like to have it in this way:
Year | Month | Id | Occurrence
2010 | 01 | 1 | 2
2010 | 09 | 4 | 2
2010 | 10 | 4 | 1
2011 | 01 | 1 | 1
2011 | 02 | 1 | 1
2012 | 05 | 2 | 1
2013 | 01 | 3 | 1
I created only this for now...
CREATE TABLE IF NOT EXISTS counts AS
(SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source)
And I don't know how to move with that further. I'd appreciate your help.
I'm using Spark SQL.
Try the following strategy to achieve this:
Note:
I have created a few intermediate tables. If you wish, you can use subqueries or CTEs instead, depending on your permissions.
I have handled both scenarios you described (counting a record as 1 occurrence or as 2 occurrences).
Query:
Firstly, creating a table with flags to decide whether start and end date are falling on same year and month (1 means YES, 2 means NO):
/* Creating a table with flags whether to count the occurrences once or twice */
CREATE TABLE flagged as
(
SELECT *,
CASE
WHEN Year_st = Year_end and Month_st = Month_end then 1
WHEN Year_st = Year_end and Month_st <> Month_end then 2
Else 0
end as flag
FROM
(
SELECT
id,
YEAR (CAST(Start AS DATE)) AS Year_St,
MONTH (CAST(Start AS DATE)) AS Month_St,
YEAR (CAST(End AS DATE)) AS Year_End,
MONTH (CAST(End AS DATE)) AS Month_End
FROM source
) as calc
)
The flag in the above table is 1 if the year and month are the same for start and end, and 2 if the month differs. You can add more flag values if you have more scenarios.
Secondly, counting the occurrences for flag 1. As we know year and month are same for flag 1, we can take either of it. I have taken start:
/* Counting occurrences only for flag 1 */
CREATE TABLE flg1 as (
SELECT distinct id, year_st, month_st, count(*) as occurrence
FROM flagged
where flag=1
GROUP BY id, year_st, month_st
)
Similarly, counting the occurrences for flag 2. Since month differs for both the dates, we can UNION them before counting to get both the dates in same column:
/* Counting occurrences only for flag 2 */
CREATE TABLE flg2 as
(
SELECT distinct id, year_dt, month_dt, count(*) as occurrence
FROM
(
select ID, year_st as year_dt, month_st as month_dt FROM flagged where flag=2
UNION ALL
SELECT ID, year_end as year_dt, month_end as month_dt FROM flagged where flag=2
) as unioned
GROUP BY id, year_dt, month_dt
)
Finally, we just have to SUM the occurrences from both the flags. Note that we use UNION ALL here to combine both the tables. This is very important because we need to count duplicates as well:
/* UNIONING both the final tables and summing the occurrences */
SELECT distinct year, month, id, SUM(occurrence) as occurrence
FROM
(
SELECT distinct id, year_st as year, month_st as month, occurrence
FROM flg1
UNION ALL
SELECT distinct id, year_dt as year, month_dt as month, occurrence
FROM flg2
) as fin_unioned
GROUP BY id, year, month
ORDER BY year, month, id, occurrence desc
The output of the above query will be your expected output. I know this is not optimized, but it works. I will update if I come across a better strategy. Comment if you have questions.
Not sure if this works in Spark SQL.
But if the ranges aren't bigger than 1 month, then just add the extra to the count via a UNION ALL.
And the extra are those with the end in a higher month than the start.
SELECT YearOcc, MonthOcc, Id
, COUNT(*) as Occurrence
FROM
(
SELECT Id
, YEAR(CAST(Start AS DATE)) as YearOcc
, MONTH(CAST(Start AS DATE)) as MonthOcc
FROM source
UNION ALL
SELECT Id
, YEAR(CAST(End AS DATE)) as YearOcc
, MONTH(CAST(End AS DATE)) as MonthOcc
FROM source
WHERE MONTH(CAST(Start AS DATE)) < MONTH(CAST(End AS DATE))
) q
GROUP BY YearOcc, MonthOcc, Id
ORDER BY YearOcc, MonthOcc, Id
YearOcc | MonthOcc | Id | Occurrence
------: | -------: | -: | ---------:
2010 | 1 | 1 | 2
2010 | 9 | 4 | 2
2010 | 10 | 4 | 1
2011 | 1 | 1 | 1
2011 | 2 | 1 | 1
2012 | 5 | 2 | 1
2013 | 1 | 3 | 1
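For comparison, here is a small Python sketch of the same counting rule. It is only an illustration: `monthly_occurrences` is a hypothetical name, and unlike the query above it walks every month a range touches, so it would also cover ranges longer than one month or ranges crossing a year boundary (a generalization the question does not ask for).

```python
from collections import Counter
from datetime import date

# Sample rows from the question: (id, start, end).
rows = [
    (1, date(2010, 1, 2), date(2010, 1, 4)),
    (1, date(2010, 1, 22), date(2010, 1, 24)),
    (1, date(2011, 1, 31), date(2011, 2, 2)),
    (2, date(2012, 5, 2), date(2012, 5, 8)),
    (3, date(2013, 1, 2), date(2013, 1, 3)),
    (4, date(2010, 9, 15), date(2010, 9, 20)),
    (4, date(2010, 9, 30), date(2010, 10, 5)),
]

def monthly_occurrences(rows):
    """Count, per (year, month, id), how many ranges touch that month."""
    counts = Counter()
    for row_id, start, end in rows:
        year, month = start.year, start.month
        # Step month by month from the start month to the end month, inclusive.
        while (year, month) <= (end.year, end.month):
            counts[(year, month, row_id)] += 1
            month += 1
            if month > 12:
                year, month = year + 1, 1
    return counts

occurrences = monthly_occurrences(rows)
for (year, month, row_id), n in sorted(occurrences.items()):
    print(year, month, row_id, n)
```

On the sample rows this prints the same seven rows as the result table above.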
I know the question is poorly worded, I'm sorry, I can't really put this problem into words. Here is a representation:
I have two tables: product and availability. A product can have multiple dates when it's available. Example:
Table 1 (products):
id | name | ....
----------------------------------
1 | My product 1 | ....
2 | My product 2 | ....
Table 2 (availability):
id | productId | date
-----------------------------------------
1 | 1 | 2021-01-15
2 | 1 | 2021-01-16
3 | 1 | 2021-01-17
4 | 2 | 2021-01-15
5 | 2 | 2021-01-16
Is there an SQL statement that, given an interval, allows us to fetch a list of products having a row in the availability table for each element of the interval?
For example, given the interval [2021-01-15 -> 2021-01-17], the request should return product 1 because it's available during the entire period (it has a row for each element: the 15th, 16th and 17th). Product2 isn't returned because it's not available on 2021-01-17.
Is there a way to do this in SQL or do I have to use PL/SQL?
Any help is appreciated,
Thanks
You can use analytical function as follows:
select p.* from
(select p.*, count(distinct a.date) over (partition by a.productid) as cnt
from products p
join availability a on a.productid = p.id
where a.date >= date '2021-01-15'
and a.date < date '2021-01-17' + 1 ) p
where cnt = date '2021-01-17' - date '2021-01-15' + 1
Finally, I came up with this, thanks @Popeye for the inspiration.
select occurence.pid from
(
select a.product_id as pid, count(distinct a.date::date) as cnt
from availability a
where a.date >= '2021-01-15'
and a.date < '2021-01-17'::date + 1
group by a.product_id
) as occurence
where cnt = '2021-01-17'::date - '2021-01-15'::date + 1;
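The idea behind both queries (count the distinct available days inside the interval and compare that with the interval length) can be sketched in Python as follows. This is only an illustration of the logic; `fully_available` is a name invented for the example.

```python
from datetime import date

# Sample availability rows from the question: (productId, date).
availability = [
    (1, date(2021, 1, 15)),
    (1, date(2021, 1, 16)),
    (1, date(2021, 1, 17)),
    (2, date(2021, 1, 15)),
    (2, date(2021, 1, 16)),
]

def fully_available(availability, start, end):
    """Return ids of products that have a row for every day in [start, end]."""
    wanted = (end - start).days + 1  # number of days in the inclusive interval
    days_seen = {}
    for product_id, day in availability:
        if start <= day <= end:
            # A set keeps distinct days only, like COUNT(DISTINCT ...).
            days_seen.setdefault(product_id, set()).add(day)
    return sorted(p for p, days in days_seen.items() if len(days) == wanted)

result = fully_available(availability, date(2021, 1, 15), date(2021, 1, 17))
print(result)  # [1]
```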
I have a table which consists of dates and names. I want to group the result by names and dates with a condition that the resultant dates selected are at least 10 days apart. (starting from first date present in the table for that name)
This is an example:
________________________
Names | Dates
-----------------------
John | 2-2-2000
________________________
John | 5-2-2000
________________________
John | 16-2-2000
________________________
John | 17-2-2000
________________________
John | 20-2-2000
________________________
John | 31-2-2000
________________________
John | 5-3-2000
________________________
John | 14-3-2000
________________________
The output of the query should be the sum of count of these values (John,2-2-2000),(John,16-2-2000),(John,31-2-2000),(John,14-3-2000) That is, 4.
How do I write a query in SQL Server for this?
This is a bit tricky, because you need to keep track of the last row that was "picked" to select the next one. This means that you need some kind of iterative process, which in turn suggests a recursive query:
with
data as (
select t.*, row_number() over(partition by names order by dates) rn
from mytable t
),
rcte as (
select d.*, dates dates_base from data d where rn = 1
union all
select
d.*,
case when d.dates >= dateadd(day, 10, r.dates_base) then d.dates else r.dates_base end
from rcte r
inner join data d on d.rn = r.rn + 1 and d.names = r.names
)
select names, count(distinct dates_base) res from rcte group by names
Demo on DB Fiddle:
names | res
:---- | --:
John | 4
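The iterative rule that the recursive query encodes (keep a "base" date and only move it when a row is at least 10 days later) is easier to see as a loop. Below is a minimal Python sketch of that rule; `count_spaced` is a hypothetical name, and because 31-2-2000 in the question is not a real calendar date, 27-2-2000 is used in its place as an assumption.

```python
from datetime import date

# Rows modelled on the question. 31-2-2000 is not a valid date,
# so 27-2-2000 stands in for it here (an assumption).
rows = [
    ("John", date(2000, 2, 2)),
    ("John", date(2000, 2, 5)),
    ("John", date(2000, 2, 16)),
    ("John", date(2000, 2, 17)),
    ("John", date(2000, 2, 20)),
    ("John", date(2000, 2, 27)),
    ("John", date(2000, 3, 5)),
    ("John", date(2000, 3, 14)),
]

def count_spaced(rows, min_gap_days=10):
    """Per name, count dates picked greedily so each is >= min_gap_days after the last pick."""
    by_name = {}
    for name, day in rows:
        by_name.setdefault(name, []).append(day)
    result = {}
    for name, days in by_name.items():
        base = None  # last picked date, like dates_base in the recursive CTE
        picked = 0
        for day in sorted(days):
            if base is None or (day - base).days >= min_gap_days:
                base = day
                picked += 1
        result[name] = picked
    return result

spaced = count_spaced(rows)
print(spaced)  # {'John': 4}
```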
Your question is unclear. Also consistent with your desired results is that you want to count rows where the gap from the previous row is 10+ days. For that, simply use lag():
select count(*)
from (select t.*,
lag(date) over (partition by name order by date) as prev_date
from t
) t
where prev_date is null or prev_date < dateadd(day, -10, date);
Use select * to get the list of records.
In Teradata SQL, how can I assign the same row number to a group of records created within an 8-second time interval?
Example:-
Customerid Customername Itembought dateandtime
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01
100 ALex Circketball 2017-02-10 10:10:06
100 ALex Baseball 2017-02-10 10:10:08
100 ALex volleyball 2017-02-10 10:11:01
100 ALex footbball 2017-02-10 10:11:05
100 ALex ringball 2017-02-10 10:11:08
100 Alex football 2017-02-10 10:12:10
My expected result should have an additional Row_number column that assigns the same number to all of the customer's purchases made within 8 seconds. See the expected result below:
Customerid Customername Itembought dateandtime Row_number
(yyyy-mm-dd hh:mm:ss)
100 ALex Basketball 2017-02-10 10:10:01 1
100 ALex Circketball 2017-02-10 10:10:06 1
100 ALex Baseball 2017-02-10 10:10:08 1
100 ALex volleyball 2017-02-10 10:11:01 2
100 ALex footbball 2017-02-10 10:11:05 2
100 ALex ringball 2017-02-10 10:11:08 2
100 Alex football 2017-02-10 10:12:10 3
This is one way to do it with a recursive cte. Reset the running total of difference from the previous row's timestamp when it gets > 8 to 0 and start a new group.
WITH ROWNUMS AS
(SELECT T.*
,ROW_NUMBER() OVER(PARTITION BY ID ORDER BY TM) AS RNUM
/*Replace DATEDIFF with Teradata specific function*/
,DATEDIFF(SECOND,COALESCE(MIN(TM) OVER(PARTITION BY ID
ORDER BY TM ROWS BETWEEN 1 PRECEDING AND CURRENT ROW), TM),TM) AS DIFF
FROM T --replace this with your tablename and add columns as required
)
,RECURSIVE CTE(ID,TM,DIFF,SUM_DIFF,RNUM,GRP) AS
(SELECT ID,
TM,
DIFF,
DIFF,
RNUM,
CAST(1 AS int)
FROM ROWNUMS
WHERE RNUM=1
UNION ALL
SELECT T.ID,
T.TM,
T.DIFF,
CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN 0 ELSE C.SUM_DIFF+T.DIFF END,
T.RNUM,
CAST(CASE WHEN C.SUM_DIFF+T.DIFF > 8 THEN T.RNUM ELSE C.GRP END AS int)
FROM CTE C
JOIN ROWNUMS T ON T.RNUM=C.RNUM+1 AND T.ID=C.ID
)
SELECT ID,
TM,
DENSE_RANK() OVER(PARTITION BY ID ORDER BY GRP) AS row_num
FROM CTE
Demo in SQL Server
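The reset rule from the recursive CTE can be sketched as a plain loop. This is a rough Python illustration for a single customer, with timestamps given as seconds (offsets from 10:10:00 for the sample data, which is a simplification); `group_by_running_span` is a made-up name.

```python
def group_by_running_span(seconds, limit=8):
    """Group sorted timestamps: start a new group when the running sum of gaps exceeds limit."""
    groups = []
    group = 1
    running = 0  # running total of gaps within the current group (SUM_DIFF)
    prev = None
    for t in seconds:  # assumes seconds is already sorted ascending
        diff = 0 if prev is None else t - prev
        if running + diff > limit:
            group += 1
            running = 0  # reset, as the CASE expression in the CTE does
        else:
            running += diff
        groups.append(group)
        prev = t
    return groups

# Sample purchase times from the question as offsets from 10:10:00.
sample = [1, 6, 8, 61, 65, 68, 130]
print(group_by_running_span(sample))      # [1, 1, 1, 2, 2, 2, 3]
print(group_by_running_span([0, 5, 10]))  # [1, 1, 2]
```

The second call shows a property of this interpretation: rows that are each within 8 seconds of their neighbour still get split once the accumulated span passes 8 seconds.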
I am going to interpret the problem differently from vkp. Any row within 8 seconds of another row should be in the same group. Such values can chain together, so the overall span can be more than 8 seconds.
The advantage of this method is that recursive CTEs are not needed, so it should be faster. (Of course, this is not an advantage if the OP does not agree with the definition.)
The basic idea is to look at the previous date/time value; if it is more than 8 seconds away, then add a flag. The cumulative sum of the flag is the row number you are looking for.
select t.*,
sum(case when prev_dt >= dateandtime - interval '8' second
then 0 else 1
end) over (partition by customerid order by dateandtime
) as row_number
from (select t.*,
max(dateandtime) over (partition by customerid order by dateandtime rows between 1 preceding and 1 preceding) as prev_dt
from t
) t;
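The flag-and-cumulative-sum idea can be sketched in a few lines of Python. This is only an illustration for one customer with timestamps expressed in seconds (a simplification); `group_by_gap` is a hypothetical name.

```python
from itertools import accumulate

def group_by_gap(seconds, max_gap=8):
    """Group sorted timestamps: flag rows whose gap from the previous row exceeds max_gap."""
    if not seconds:
        return []
    # The first row always starts a group; later rows get flag 1 only after a large gap.
    flags = [1] + [
        0 if cur - prev <= max_gap else 1
        for prev, cur in zip(seconds, seconds[1:])
    ]
    # The cumulative sum of the flags is the group number.
    return list(accumulate(flags))

# Sample purchase times from the question as offsets from 10:10:00.
print(group_by_gap([1, 6, 8, 61, 65, 68, 130]))  # [1, 1, 1, 2, 2, 2, 3]
print(group_by_gap([0, 5, 10]))                  # [1, 1, 1]
```

The second call illustrates the chaining mentioned above: the three rows span 10 seconds in total, but each is within 8 seconds of the previous one, so they stay in one group.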
Using Teradata's PERIOD data type and the awesome td_normalize_overlap_meet:
Consider table test32:
SELECT * FROM test32
+----+----+------------------------+
| f1 | f2 | f3 |
+----+----+------------------------+
| 1 | 2 | 2017-05-11 03:59:00 PM |
| 1 | 3 | 2017-05-11 03:59:01 PM |
| 1 | 4 | 2017-05-11 03:58:58 PM |
| 1 | 5 | 2017-05-11 03:59:26 PM |
| 1 | 2 | 2017-05-11 03:59:28 PM |
| 1 | 2 | 2017-05-11 03:59:46 PM |
+----+----+------------------------+
The following will group your records:
WITH
normalizedCTE AS
(
SELECT *
FROM TABLE
(
td_normalize_overlap_meet(NEW VARIANT_TYPE(periodCTE.f1), periodCTE.fper)
RETURNS (f1 integer, fper PERIOD(TIMESTAMP(0)), recordCount integer)
HASH BY f1
LOCAL ORDER BY f1, fper
) as output(f1, fper, recordcount)
),
periodCTE AS
(
SELECT f1, f2, f3, PERIOD(f3, f3 + INTERVAL '9' SECOND) as fper FROM test32
)
SELECT t2.f1, t2.f2, t2.f3, t1.fper, DENSE_RANK() OVER (PARTITION BY t2.f1 ORDER BY t1.fper) as fgroup
FROM normalizedCTE t1
INNER JOIN periodCTE t2 ON
t1.fper P_INTERSECT t2.fper IS NOT NULL
Results:
+----+----+------------------------+-------------+
| f1 | f2 | f3 | fgroup |
+----+----+------------------------+-------------+
| 1 | 2 | 2017-05-11 03:59:00 PM | 1 |
| 1 | 3 | 2017-05-11 03:59:01 PM | 1 |
| 1 | 4 | 2017-05-11 03:58:58 PM | 1 |
| 1 | 5 | 2017-05-11 03:59:26 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:28 PM | 2 |
| 1 | 2 | 2017-05-11 03:59:46 PM | 3 |
+----+----+------------------------+-------------+
A Period in Teradata is a special data type that holds a date or datetime range. The first parameter is the start of the range and the second is the end (up to, but not including, which is why it's "+ 9 seconds"). The result is an 8-second "Period" for each record, which might "intersect" with another record's period.
We then use td_normalize_overlap_meet to merge records that intersect, sharing the f1 field's value as the key. In your case that would be customerid. The result is three records for this one customer since we have three groups that "overlap" or "meet" each other's time periods.
We then join the td_normalize_overlap_meet output with the output from when we determined the periods. We use the P_INTERSECT function to see which periods from the normalized CTE INTERSECT with the periods from the initial Period CTE. From the result of that P_INTERSECT join we grab the values we need from each CTE.
Lastly, Dense_Rank() gives us a rank based on the normalized period for each group.
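To make the merge step more concrete, here is a rough Python sketch of the same idea: give every row a half-open period of 9 seconds, merge periods that overlap or meet, and number the merged periods. It is not how td_normalize_overlap_meet is implemented, just an illustration of the logic for a single f1 value, with times expressed as seconds past 03:58:00; `group_by_merged_periods` is an invented name.

```python
def group_by_merged_periods(seconds, length=9):
    """Assign a group number per row by merging half-open periods [t, t + length)."""
    # Visit rows in time order but report groups in the original row order.
    order = sorted(range(len(seconds)), key=lambda i: seconds[i])
    groups = [0] * len(seconds)
    group = 0
    current_end = None  # exclusive end of the merged period built so far
    for i in order:
        start = seconds[i]
        if current_end is None or start > current_end:
            # Neither overlaps nor meets the running period: start a new one.
            group += 1
            current_end = start + length
        else:
            current_end = max(current_end, start + length)
        groups[i] = group
    return groups

# f3 values from test32, in table order, as seconds past 03:58:00.
f3 = [60, 61, 58, 86, 88, 106]
print(group_by_merged_periods(f3))  # [1, 1, 1, 2, 2, 3]
```

The printed groups line up with the fgroup column in the results above.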
I want to know how to use loops to fill in missing dates with the value zero, based on the start/end dates of each group, in SQL, so that I have a consecutive time series in each group. I have two questions.
How do I loop over each group?
How do I use the start/end dates of each group to dynamically fill in missing dates?
My input and expected output are listed as below.
Input: I have a table A like
date value grp_no
8/06/12 1 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/12/12 3 2
Also I have a table B which can be used to left join with A to fill in missing dates.
date
...
8/05/12
8/06/12
8/07/12
8/08/12
8/09/12
8/10/12
8/11/12
8/12/12
8/13/12
...
How can I use A and B to generate the following output in sql?
Output:
date value grp_no
8/06/12 1 1
8/07/12 0 1
8/08/12 1 1
8/09/12 0 1
8/07/12 2 2
8/08/12 1 2
8/09/12 0 2
8/10/12 0 2
8/11/12 0 2
8/12/12 3 2
Please send me your code and suggestion. Thank you so much in advance!!!
You can do it like this, without loops:
SELECT p.date, COALESCE(a.value, 0) value, p.grp_no
FROM
(
SELECT grp_no, date
FROM
(
SELECT grp_no, MIN(date) min_date, MAX(date) max_date
FROM tableA
GROUP BY grp_no
) q CROSS JOIN tableb b
WHERE b.date BETWEEN q.min_date AND q.max_date
) p LEFT JOIN TableA a
ON p.grp_no = a.grp_no
AND p.date = a.date
The innermost subquery grabs min and max dates per group. Then cross join with TableB produces all possible dates within the min-max range per group. And finally outer select uses outer join with TableA and fills value column with 0 for dates that are missing in TableA.
Output:
| DATE | VALUE | GRP_NO |
|------------|-------|--------|
| 2012-08-06 | 1 | 1 |
| 2012-08-07 | 0 | 1 |
| 2012-08-08 | 1 | 1 |
| 2012-08-09 | 0 | 1 |
| 2012-08-07 | 2 | 2 |
| 2012-08-08 | 1 | 2 |
| 2012-08-09 | 0 | 2 |
| 2012-08-10 | 0 | 2 |
| 2012-08-11 | 0 | 2 |
| 2012-08-12 | 3 | 2 |
Here is SQLFiddle demo
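The same three steps (min/max per group, all dates in that range, then fill gaps with 0) can be sketched in Python for anyone who wants to check the logic outside the database. This is only an illustration; `fill_missing_dates` is a hypothetical name and the sample dates are read as month/day/year.

```python
from datetime import date, timedelta

# Rows of table A from the question: (date, value, grp_no).
table_a = [
    (date(2012, 8, 6), 1, 1),
    (date(2012, 8, 8), 1, 1),
    (date(2012, 8, 9), 0, 1),
    (date(2012, 8, 7), 2, 2),
    (date(2012, 8, 8), 1, 2),
    (date(2012, 8, 12), 3, 2),
]

def fill_missing_dates(rows):
    """Return rows with every date between each group's min and max date, missing values as 0."""
    by_group = {}
    for day, value, grp in rows:
        by_group.setdefault(grp, {})[day] = value
    filled = []
    for grp in sorted(by_group):
        values = by_group[grp]
        day, last = min(values), max(values)  # per-group date range
        while day <= last:
            filled.append((day, values.get(day, 0), grp))  # 0 plays the role of COALESCE
            day += timedelta(days=1)
        # No calendar table is needed here because the dates are generated directly.
    return filled

filled = fill_missing_dates(table_a)
for day, value, grp in filled:
    print(day, value, grp)
```

On the sample rows this produces the same ten rows as the output table above.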
I just needed the query to return all the dates in the period I wanted. Without the joins. Thought I'd share for those wanting to put them in your query. Just change the 365 to whatever timeframe you are wanting.
DECLARE @s DATE = GETDATE()-365, @e DATE = GETDATE();
SELECT TOP (DATEDIFF(DAY, @s, @e)+1)
DATEADD(DAY, ROW_NUMBER() OVER (ORDER BY number)-1, @s)
FROM [master].dbo.spt_values
WHERE [type] = N'P' ORDER BY number
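For reference, the equivalent "give me every date in a period" step looks like this in Python. It is a small sketch with an invented helper name, `date_range`, and both endpoints are included, as in the query above.

```python
from datetime import date, timedelta

def date_range(start, end):
    """Return every date from start to end, both inclusive."""
    return [start + timedelta(days=i) for i in range((end - start).days + 1)]

days = date_range(date(2012, 8, 5), date(2012, 8, 13))
print(len(days), days[0], days[-1])  # 9 2012-08-05 2012-08-13
```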
The following query does a union with tableA and tableB. It then uses group by to merge the rows from tableA and tableB so that all of the dates from tableB are in the result. If a date is not in tableA, then the row has 0 for value and grp_no. Otherwise, the row has the actual values for value and grp_no.
select
dat,
sum(val),
sum(grp)
from
(
select
date as dat,
value as val,
grp_no as grp
from
tableA
union
select
date,
0,
0
from
tableB
where
date >= date '2012-08-06' and
date <= date '2012-08-13'
)
group by
dat
order by
dat
I find this query to be easier for me to understand. It also runs faster. It takes 16 seconds whereas a similar right join query takes 32 seconds.
This solution only works with numerical data.
This solution assumes a fixed date range. With some extra work this query can be adapted to limit the date range to what is found in tableA.