Recursive CTE - consolidate start and end dates - sql

I have the following table:
row_num customer_status effective_from_datetime
------- ------------------ -----------------------
1 Active 2011-01-01
2 Active 2011-01-02
3 Active 2011-01-03
4 Suspended 2011-01-04
5 Suspended 2011-01-05
6 Active 2011-01-06
And am trying to achieve the following result whereby consecutive rows with the same status are merged into one row with an effective from and to date range:
customer_status effective_from_datetime effective_to_datetime
--------------- ----------------------- ---------------------
Active 2011-01-01 2011-01-04
Suspended 2011-01-04 2011-01-06
Active 2011-01-06 NULL
I can get a recursive CTE to output the correct effective_to_datetime based on the next row, but am having trouble merging the ranges.
Code to generate sample data:
CREATE TABLE #temp
(
row_num INT IDENTITY(1,1),
customer_status VARCHAR(10),
effective_from_datetime DATE
)
INSERT INTO #temp
VALUES
('Active','2011-01-01')
,('Active','2011-01-02')
,('Active','2011-01-03')
,('Suspended','2011-01-04')
,('Suspended','2011-01-05')
,('Active','2011-01-06')

EDIT: SQL updated as per comment.
WITH
group_assigned_data AS
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY customer_status ORDER BY effective_from_datetime) AS status_sequence_id,
        ROW_NUMBER() OVER (                             ORDER BY effective_from_datetime) AS sequence_id,
        customer_status,
        effective_from_datetime
    FROM
        #temp
),
grouped_data AS
(
    SELECT
        customer_status,
        MIN(effective_from_datetime) AS min_effective_from_date,
        MAX(effective_from_datetime) AS max_effective_from_date
    FROM
        group_assigned_data
    GROUP BY
        customer_status,
        sequence_id - status_sequence_id
)
SELECT
    [current].customer_status,
    [current].min_effective_from_date AS effective_from,
    [next].min_effective_from_date    AS effective_to
FROM
    grouped_data AS [current]
LEFT JOIN
    grouped_data AS [next]
        ON [next].min_effective_from_date = DATEADD(DAY, 1, [current].max_effective_from_date)
ORDER BY
    [current].min_effective_from_date
This isn't recursive, but that's possibly a good thing.
It doesn't deal with gaps in your data. To deal with that you could create a calendar table with every relevant date, join on it to fill missing dates with an 'Unknown' status, and then run the query against that. (In fact you can do it in a CTE that is used by the CTE above; a minimal sketch follows at the end of this answer.)
At present...
- If row 2 was missing, it would not change the result
- If row 3 was missing, the end_date of the first row would change
Different behaviour can be achieved by preparing your data, or by other methods, but we'd need to know the business logic you need.
If any one date can have multiple status entries, you need to define what logic you want it to follow. At present the behaviour is undefined, but you could fix that as simply as adding customer_status to the ORDER BY portions of the ROW_NUMBER() calls.
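As a rough sketch of that calendar-CTE idea (assuming SQL Server, the #temp sample table, a hard-coded date range, and 'Unknown' as a placeholder status), the output of something like this could feed group_assigned_data in place of the raw table:
WITH calendar AS
(
    -- build one row per date in the relevant range
    SELECT CAST('2011-01-01' AS DATE) AS dt
    UNION ALL
    SELECT DATEADD(DAY, 1, dt)
    FROM calendar
    WHERE dt < '2011-01-06'
)
SELECT
    c.dt AS effective_from_datetime,
    -- missing days get a placeholder status
    ISNULL(t.customer_status, 'Unknown') AS customer_status
FROM calendar AS c
LEFT JOIN #temp AS t
    ON t.effective_from_datetime = c.dt;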

Turn several queries into one in SQL Server

I have a table in SQL Server called schedule that has the following columns (and others not listed):
scheduleId  roomId  dateRegistered  dateFreed
----------  ------  --------------  ----------
4564        2       2022-12-25      2022-12-26
4565        3       2022-12-25      2022-12-27
4566        15      2022-12-26      2022-12-27
4567        2       2022-12-28      2022-12-31
4568        3       2022-12-28      2022-12-30
In some part of my app I need to show all the rooms occupied at a certain date.
Currently I run a query like this:
SELECT TOP (1) *
FROM schedule
WHERE roomId = [theNeededRoom] AND dateFreed < [providedDate]
ORDER BY dateFreed DESC
The thing is that I have to run that query in a for loop so that I get the information for every room.
I'm sure there has to be a better way to do this in a single query that returns a row for each of the different roomIds possible. How can I go about this?
Also, when a room is first registered, its dateFreed column is NULL. If I wanted to take this into account, how could I make the query so that, when a room has a row with a NULL dateFreed, that is the row that gets chosen?
You can use TOP (1) WITH TIES while ordering on the last "dateFreed" date.
In order to have a "tied" value to match on, instead of ordering on "dateFreed DESC" directly we can order on the ROW_NUMBER window function over the same field, which assigns 1 to the most recent "dateFreed" row for each "roomId".
SELECT TOP (1) WITH TIES *
FROM schedule
WHERE dateFreed < [providedDate]
ORDER BY ROW_NUMBER() OVER(PARTITION BY roomId ORDER BY dateFreed DESC)
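If you also need the NULL dateFreed case from the question (a still-occupied room should win), one possible variation, offered only as a sketch, is to let the NULL rows sort first inside the same ranking:
SELECT TOP (1) WITH TIES *
FROM schedule
WHERE dateFreed < [providedDate] OR dateFreed IS NULL
ORDER BY ROW_NUMBER() OVER (PARTITION BY roomId
                            ORDER BY CASE WHEN dateFreed IS NULL THEN 1 ELSE 0 END DESC,
                                     dateFreed DESC)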
Alternatively, you can aggregate per room in a subquery:
SELECT
t.*
FROM
(
SELECT
roomId AS rId,
max(dateFreed) AS dateFreedMax
FROM
schedule s
GROUP BY
s.roomId
) AS t
WHERE
t.dateFreedMax < [providedDate]
OR t.dateFreedMax IS NULL
Or
SELECT roomId
FROM
schedule s
GROUP BY s.roomId
HAVING
max(dateFreed) < [providedDate] OR count(dateFreed) < count(*) -- at least one row with a NULL dateFreed

For each unique item in a redshift sql column, get the last rows based on a looking/scanning window

patient_id  alert_id  alert_timestamp
----------  --------  ---------------
3           xyz       2022-10-10
1           anp       2022-10-12
1           gfe       2022-10-10
2           fgy       2022-10-02
2           gpl       2022-10-03
1           gdf       2022-10-13
2           mkd       2022-10-23
1           liu       2022-10-01
I have a SQL table (see the simplified version above) where, for each patient_id, I want to keep only the latest alert (i.e. the last one) sent out in a given window period, e.g. window_size = 7.
Note that the window needs to cover consecutive days, i.e. from day 1 to day 1 + window_size. The range of alert_timestamp for each patient_id varies and is usually well beyond the window_size.
Note that the table above is a very simple example; the real table has many more patient_ids and is in mixed order in terms of alert_timestamp and alert_id.
The approach is to start from the last alert_timestamp for a given patient_id and work back, using the window_size to select the alert that was the last one in each window time frame.
Please note the idea is to have a scanning window, for example window_size = 7 days, that moves across the timestamps of each patient.
The end result I want is a table with the filtered alerts.
Expected output (for this example) with window_size = 7:
patient_id  alert_id  alert_timestamp
----------  --------  ---------------
1           liu       2022-10-01
1           gdf       2022-10-13
2           gpl       2022-10-03
2           mkd       2022-10-23
3           xyz       2022-10-10
What's the most efficient way to solve for this?
This can be done with the last_value window function but you need to prep your data a bit. Here's an example of what this could look like:
create table test (
patient_id int,
alert_id varchar(8),
alert_timestamp date);
insert into test values
(3, 'xyz', '2022-10-10'),
(1, 'anp', '2022-10-12'),
(1, 'gfe', '2022-10-10'),
(2, 'fgy', '2022-10-02'),
(2, 'gpl', '2022-10-03'),
(1, 'gdf', '2022-10-13'),
(2, 'mkd', '2022-10-23'),
(1, 'liu', '2022-10-01');
WITH RECURSIVE dates (dt) AS
(
SELECT '2022-09-30'::DATE AS dt UNION ALL SELECT dt + 1
FROM dates d
WHERE dt < '2022-10-31'::DATE
),
p_dates AS
(
SELECT pid,
dt
FROM dates d
CROSS JOIN (SELECT DISTINCT patient_id AS pid FROM test) p
),
combined AS
(
SELECT *
FROM p_dates d
LEFT JOIN test t
ON d.dt = t.alert_timestamp
AND d.pid = t.patient_id
),
latest AS
(
SELECT patient_id,
pid,
alert_id,
dt,
alert_timestamp,
LAST_VALUE(alert_id IGNORE NULLS) OVER (PARTITION BY pid ORDER BY dt ROWS BETWEEN CURRENT ROW AND 7 following) AS at
FROM combined
)
SELECT patient_id,
alert_id,
alert_timestamp
FROM latest
WHERE patient_id IS NOT NULL
AND alert_id = at
ORDER BY patient_id,
alert_timestamp;
This produces the results you are looking for with the test data, but there are a few assumptions. The big one is that there is at most one alert per patient per day. If this isn't true then some more data massaging will be needed (a sketch of one possible pre-step follows below). Either way this should give you an outline of how to do this.
The first need is to ensure that there is one row per patient per day, so that the window function can operate on rows as if they were days (for each patient). The date range is generated by a recursive CTE and joined to the test data to achieve the one row per day per patient.
The "ignore nulls" option is used in the last_value window function to skip the "extra" rows created by the above process. The last step is to prune out all the unneeded rows and ensure that only the latest alert of each window is produced.

Using SQL, how do I select which column to add a value to, based on the contents of the row?

I'm having a difficult time phrasing the question, so I think the best thing to do is to give some example tables. I have a table, Attribute_history, I'm trying to pull data from that looks like this:
ID Attribute_Name Attribute_Val Time Stamp
--- -------------- ------------- ----------
1 Color Red 2022/09/28 01:00
2 Color Blue 2022/09/28 01:30
1 Length 3 2022/09/28 01:00
2 Length 4 2022/09/28 01:30
1 Diameter 5 2022/09/28 01:00
2 Diameter 10 2022/09/28 01:30
2 Diameter 11 2022/09/28 01:32
I want to create a table that pulls the attributes of each ID, and if the same ID and attribute_name has been updated, pull the latest info based on Time Stamp.
ID Color Length Diameter
---- ------ ------- --------
1 Red 3 5
2 Blue 4 11
I've achieved this by nesting several SELECT statements, adding one column at a time. I achieved selecting the latest date using this Stack Overflow post. However, this code seems inefficient, since I'm selecting from the same table multiple times. It also only chooses the latest value for the one attribute I know is likely to have been updated multiple times, not for all the values I'm interested in.
SELECT
COLOR, DIAMETER, DATE_
FROM
(
SELECT
COLORS.COLOR, ATTR.ATTRIBUTE_NAME AS DIAMETER, ATTR.TIME_STAMP AS DATE_, RANK() OVER (PARTITION BY COLORS.COLOR ORDER BY ATTR.TIME_STAMP DESC) DATE_RANK -- https://stackoverflow.com/questions/3491329/group-by-with-maxdate
FROM
(
SELECT
ATTRIBUTE_HISTORY.ATTRIBUTE_VAL
FROM
ATTRIBUTE_HISTORY
WHERE
ATTRIBUTE_HISTORY.ATTRIBUTE_NAME = 'Color'
GROUP BY ATTRIBUTE_HISTORY.ID
) COLORS
INNER JOIN ATTRIBUTE_HISTORY ATTR ON COLORS.ID = ATTR.ID
WHERE
ATTR.ATTRIBUTE_NAME = 'DIAMETER'
)
WHERE
DATE_RANK = 1
(I copied my real query and renamed values with Find+Replace to obscure the data so this code might not be perfect, but it gets across the idea of how I'm achieving my goal now.)
How can I rewrite this query to be more concise, and pull the latest date entry for each attribute?
For MS SQL Server:
Your problem has two parts:
- Identify the latest attribute value based on the Time Stamp column.
- Convert the attribute names to columns (pivoting) in the final result.
Solution:
;with CTEx as
(
select
row_number() over(partition by id, Attr_name order by Time_Stamp desc) rnum,
id,Attr_name, Attr_value, time_stamp
from #temp
)
SELECT * FROM
(
SELECT id,Attr_name,Attr_value
FROM CTEx
where rnum = 1
) t
PIVOT(
max(Attr_value)
FOR Attr_name IN (Color,Diameter,[Length])
) AS pivot_table;
The first part of the problem is taken care of by the CTE with the help of the ROW_NUMBER() function. The second part is achieved by using the PIVOT operator.
Definition of #temp for reference
Create table #temp(id int, Attr_name varchar(200), Attr_value varchar(200), Time_Stamp datetime)
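For reference, the question's sample rows could be loaded into #temp like this (the datetime literals are an assumed reformatting of the question's Time Stamp column):
INSERT INTO #temp (id, Attr_name, Attr_value, Time_Stamp)
VALUES
(1, 'Color',    'Red',  '2022-09-28 01:00'),
(2, 'Color',    'Blue', '2022-09-28 01:30'),
(1, 'Length',   '3',    '2022-09-28 01:00'),
(2, 'Length',   '4',    '2022-09-28 01:30'),
(1, 'Diameter', '5',    '2022-09-28 01:00'),
(2, 'Diameter', '10',   '2022-09-28 01:30'),
(2, 'Diameter', '11',   '2022-09-28 01:32');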

Select most recent InstanceID base on max end date

I am trying to pull the MemberInstanceID from a table based on the max DateEnd. If DateEnd is NULL I want to pull that row, as it would still be ongoing. I am using SQL Server.
select memberinstanceid
from table
group by memberid
having MAX(ISNULL(date_end, '2099-12-31'))
The query above doesn't work for me. I have tried different ones and have gotten them to return the separate instances, but not just the one with the max date.
Below is what my table looks like.
MemberID MemberInstanceID DateStart DateEnd
2 abc12 2013-01-01 2013-12-31
4 abc21 2010-01-01 2013-12-31
2 abc10 2015-01-01 NULL
4 abc19 2014-01-01 2014-10-31
I would expect my results to look like this
MemberInstanceID
abc10
abc19
I have been trying to figure out how to do this but have not had much luck. Any help would be much appreciated. Thanks
I think you need something like the following:
select MemberID, MemberInstanceID
from table t
where (
-- DateEnd is null...
DateEnd is null
or (
-- ...or pick the latest DateEnd for this member...
DateEnd = (
select max(DateEnd)
from table
where MemberID = t.MemberID
)
-- ... and check there's not a NULL entry for DateEnd for this member
and not exists (
select 1
from table
where MemberID = t.MemberID
and DateEnd is null
)
)
)
The problem with this approach would be if there are multiple rows that match for each member, i.e. multiple NULL rows with the same MemberID, or multiple rows with the same DateEnd for the same MemberID.
SELECT TOP (1) WITH TIES memberinstanceid
FROM table
ORDER BY ROW_NUMBER() OVER (PARTITION BY MemberID
                            ORDER BY (CASE WHEN [DateEnd] IS NULL THEN 1 ELSE 0 END) DESC,
                                     [DateEnd] DESC)
The ORDER BY is essentially creating a "column" that sorts the NULL values to the top, then doing a secondary sort on the dates that are not null. Ranking with ROW_NUMBER() per MemberID and taking TOP (1) WITH TIES keeps that top row for every member, rather than a single row overall.
You have a good start, but you don't need to perform any explicit grouping. What you want is the row where DateEnd is NULL or is the largest value (latest date) of all the records with the same MemberID. You also realized that MAX on its own couldn't return the right date, because a NULL, if one exists, must be treated as the latest date.
select m.*
from Members m
where m.DateEnd is null
or m.DateEnd =(
select Max( IsNull( DateEnd, '9999-12-31' ))
from Members
where MemberID = m.MemberID );

Unclear on LAST_VALUE - Preceding

I have a table that looks like this,
Date Value
01/01/2010 03:59:00 324.44
01/02/2010 09:31:00 NULL
01/02/2010 09:32:00 NULL
.
.
.
01/02/2010 11:42:00 NULL
I want the first valid value to appear in all following rows. This is what I did,
select date,
nvl(value, LAST_VALUE(value IGNORE NULLS) over (order by value RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)) value
from
table
This shows no difference at all, but if I say RANGE BETWEEN 3 PRECEDING AND CURRENT ROW it copies the data to all the rows. I'm not clear why this is happening. Can anyone explain if I'm misunderstanding how to use preceding?
Analytic functions still work on sets of data. They do not process one row at a time, you would need PL/SQL or MODEL to do that. PRECEDING refers to the last X rows, but before the analytic function has been applied.
These problems can be confusing in SQL because you have to build the logic into defining the set, instead of trying to pass data from one row to another. That's why I used CASE with LAST_VALUE in my previous answer.
Edit:
I've added a simple data set so we can all run the exact same query. VALUE1 seems to work for me, am I missing something? Part of the problem with VALUE2 is that the analytic ORDER BY uses VALUE instead of the date.
select id, the_date, value
,last_value(value ignore nulls) over
(partition by id order by the_date) value1
,nvl(value, LAST_VALUE(value IGNORE NULLS) over
(order by value RANGE BETWEEN 1 PRECEDING AND CURRENT ROW)) value2
from
(
select 1 id, date '2011-01-01' the_date, 100 value from dual union all
select 1 id, date '2011-01-02' the_date, null value from dual union all
select 1 id, date '2011-01-03' the_date, null value from dual union all
select 1 id, date '2011-01-04' the_date, null value from dual union all
select 1 id, date '2011-01-05' the_date, 200 value from dual
)
order by the_date;
Results:
ID  THE_DATE  VALUE  VALUE1  VALUE2
1   1/1/2011    100     100     100
1   1/2/2011            100
1   1/3/2011            100
1   1/4/2011            100
1   1/5/2011    200     200     200
It is possible to copy one row at a time; I have done that using Java logic and an SQL query:
Statement sta;
ResultSet rs, rslast;
try {
    // connection creation code omitted; "con" is an existing java.sql.Connection
    sta = con.createStatement();
    rs = sta.executeQuery("SELECT * FROM TABLENAME");
    rslast = sta.executeQuery("SELECT * FROM TABLENAME WHERE ID = (SELECT MAX(ID) FROM TABLENAME)");
    rslast.next();
    String digit = rslast.getString("ID");
    System.out.print("ID " + digit); // gives the ID of the last record
} catch (SQLException e) {
    e.printStackTrace();
}
Instead of this method you can also use ORDER BY on the date in descending order.
I hope you can now build the logic so that only the last record is used.
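For reference, that ORDER BY suggestion might look like this in Oracle, with TABLENAME and the_date used as placeholder names like the ones above:
-- sort newest first, then keep only the first row
SELECT *
FROM (
    SELECT *
    FROM TABLENAME
    ORDER BY the_date DESC
)
WHERE ROWNUM = 1;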