Need to count certain rows based on date criteria in SQL Server - sql

I'm using SQL Server 2012 and have a table with these 2 columns. I need to count an ORG_ID once ONLY IF the EndDate for every row of that ORG_ID falls within the timeframe of '1-1-2018' to '1-31-2018' (or before it, but NOT after). This must hold for ALL rows for that org. An ORG with an EndDate of NULL would also NOT be in my results.
ORG_ID EndDate
99968042 1/31/2018
99968042 2/14/2018
99968042 2/14/2018
99900699 1/10/2018
99900699 1/10/2018
99900699 1/10/2018
99900699 1/10/2018
99899776 1/20/2018
99843366 12/17/2017
99843366 1/4/2018
99841000 2/1/2016
99651255 NULL
99651255 1/15/2018
The rows that should output are:
99900699
99899776
99843366
I haven't tried anything, because I can't think how to approach it.
So now I've tried this:
select distinct ORG_ID
from ##PLCMT p1
where not exists (
select *
from ##PLCMT p2
where p1.ORG_ID = p2.ORG_ID and
(p1.EndDate <= '2018-01-01' or p1.enddate >= '2018-01-31' or p1.EndDate is NULL)
)
and it still returns an org that has a NULL EndDate; I can't figure out why. ORG_ID 3098376 is in my results, but if I look at all the rows for that ORG_ID, it looks like this:
select *
from ##PLCMT
where org_id = '3098376'
results:
ORG_ID EndDate
3098376 2017-09-11
3098376 NULL
3098376 NULL

Use group by and having, with NOT IN to eliminate orgs that have any NULL value:
SELECT org_id
FROM t
WHERE org_id NOT IN (SELECT org_id FROM t WHERE enddate IS NULL)
GROUP BY org_id
HAVING MAX(enddate) <= '2018-01-31';
You are probably safer with this logic:
having max(enddate) < '2018-02-01'
This works even if enddate has a time component.

Another way is with NOT EXISTS()
SELECT DISTINCT org_id
from t
WHERE NOT EXISTS(
SELECT * FROM t t1
WHERE t1.org_id=t.org_id
AND (t1.enddate > '20180131' OR t1.enddate IS NULL)
)
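As a rough check, both approaches above can be run against the question's sample data. This is only a sketch: it uses an in-memory SQLite database as a stand-in for SQL Server, stores the dates as ISO strings (an assumption, so that text comparison matches date order), and reuses the table name t from the answers.

```python
import sqlite3

# In-memory SQLite stand-in for the SQL Server table (assumption for the demo).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (org_id INTEGER, enddate TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        (99968042, "2018-01-31"), (99968042, "2018-02-14"), (99968042, "2018-02-14"),
        (99900699, "2018-01-10"), (99900699, "2018-01-10"),
        (99900699, "2018-01-10"), (99900699, "2018-01-10"),
        (99899776, "2018-01-20"),
        (99843366, "2017-12-17"), (99843366, "2018-01-04"),
        (99841000, "2016-02-01"),
        (99651255, None), (99651255, "2018-01-15"),
    ],
)

# GROUP BY / HAVING, with orgs that have any NULL end date filtered out first.
group_by_sql = """
    SELECT org_id
    FROM t
    WHERE org_id NOT IN (SELECT org_id FROM t WHERE enddate IS NULL)
    GROUP BY org_id
    HAVING MAX(enddate) < '2018-02-01'
    ORDER BY org_id
"""

# NOT EXISTS: keep an org only if none of its rows is late or NULL.
not_exists_sql = """
    SELECT DISTINCT org_id
    FROM t
    WHERE NOT EXISTS (
        SELECT 1 FROM t AS t1
        WHERE t1.org_id = t.org_id
          AND (t1.enddate > '2018-01-31' OR t1.enddate IS NULL)
    )
    ORDER BY org_id
"""

group_by_result = [row[0] for row in conn.execute(group_by_sql)]
not_exists_result = [row[0] for row in conn.execute(not_exists_sql)]
print(group_by_result)    # [99841000, 99843366, 99899776, 99900699]
print(not_exists_result)  # same list
```

Both queries also return 99841000, whose only end date is in 2016, because neither has a lower bound. The question's expected output omits that org, so if the latest end date must fall inside January 2018, a condition such as MAX(enddate) >= '2018-01-01' could be added to the HAVING clause; that reading of the requirement is an assumption.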


SQL left join same column and table

I have customer order data and would like to analyze customer retention after price changes.
The order table is as follows:
customer_id order_number order_delivered_date
14156 R980193622 2/6/2020 14:51
1926396 R130222714 22/5/2020 11:02
1085123 R313065343 22/5/2020 14:50
699858 R693959049 8/6/2020 17:03
1609769 R195969327 3/6/2020 16:14
14156 R997103187 27/6/2020 14:01
1926396 R403942827 11/6/2020 14:42
1926396 R895013611 8/7/2020 17:04
So, I would like to pull the orders from the period before the new price. Assume the new price takes effect on 10/6/2020. I would like to left join these to the orders placed after the new price, on customer_id.
Before is a set of data dated 10/5/2020 00:00:00 to 9/6/2020 23:59:59 while After is a set of data dated 10/6/2020 00:00:00 to 9/7/2020 23:59:59.
The desired table:
Before After
14156 14156
1926396 1926396
1085123 Null
699858 Null
1609769 Null
If a customer_id appears on both sides, it means the customer was retained. It should be simple, but I have been stuck.
EDIT:
Here is some of the code I have tried:
First try:
select ol2.customer_id as before, ol.customer_id as after
from master.order_level ol,
left join master.order_level ol2
on ol2.customer_id = ol.customer_id
where order_delivered_date between '2020-05-10 00:00:00' and '2020-07-09 23:59:59' and country_id = 2
Second try:
SELECT ol.customer_id as before, ol2.customer_id as after
FROM master.order_level ol,master.order_level ol2
left join master.order_level
ON ol.customer_id = ol2.customer_id
WHERE ol.order_delivered_date between '2020-05-10 00:00:00' and '2020-06-09 23:59:59' and ol.country_id =2 and ol2.order_delivered_date between '2020-06-10 00:00:00' and '2020-07-09 23:59:59' and ol2.country_id =2
No need to do a join; you can do a simple group by and use case and aggregate functions. I also made a fiddle showing it in action.
SELECT customer_id,
CASE
WHEN MIN(order_delivered_date) < '3-15-2019' THEN customer_id
ELSE NULL END customer_before,
CASE
WHEN MAX(order_delivered_date) >= '3-15-2019' THEN customer_id
ELSE NULL END customer_after
FROM my_table
GROUP BY customer_id
This query will give you results like this:
customer_id customer_before customer_after
4 4 (null)
1 1 1
3 3 (null)
2 2 2
with before (customer_id) as
( select distinct customer_id from orders
where order_delivered_date >= '2020-05-10' and order_delivered_date < '2020-06-10'
),
after (customer_id) as
( select distinct customer_id from orders
where order_delivered_date >= '2020-06-10' and order_delivered_date < '2020-07-10'
)
select
before.customer_id as customer_before,
after.customer_id as customer_after
from before left outer join after on before.customer_id = after.customer_id
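A minimal sketch of the before/after left join, run on the question's sample orders. It assumes an in-memory SQLite database, ISO-formatted timestamps, half-open date windows, and the hypothetical names orders, before_orders and after_orders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders ("
    "customer_id INTEGER, order_number TEXT, order_delivered_date TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (14156, "R980193622", "2020-06-02 14:51:00"),
        (1926396, "R130222714", "2020-05-22 11:02:00"),
        (1085123, "R313065343", "2020-05-22 14:50:00"),
        (699858, "R693959049", "2020-06-08 17:03:00"),
        (1609769, "R195969327", "2020-06-03 16:14:00"),
        (14156, "R997103187", "2020-06-27 14:01:00"),
        (1926396, "R403942827", "2020-06-11 14:42:00"),
        (1926396, "R895013611", "2020-07-08 17:04:00"),
    ],
)

# Half-open windows (>= start, < next start) avoid the 23:59:59 edge case.
retention_sql = """
    WITH before_orders AS (
        SELECT DISTINCT customer_id FROM orders
        WHERE order_delivered_date >= '2020-05-10'
          AND order_delivered_date <  '2020-06-10'
    ),
    after_orders AS (
        SELECT DISTINCT customer_id FROM orders
        WHERE order_delivered_date >= '2020-06-10'
          AND order_delivered_date <  '2020-07-10'
    )
    SELECT b.customer_id AS before_id, a.customer_id AS after_id
    FROM before_orders AS b
    LEFT JOIN after_orders AS a ON a.customer_id = b.customer_id
    ORDER BY b.customer_id
"""

retention = conn.execute(retention_sql).fetchall()
for before_id, after_id in retention:
    print(before_id, after_id)
```

Customers 14156 and 1926396 appear on both sides, and the other three have None (NULL) in the second column, which matches the desired table in the question apart from row order.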
You can use union:
select customer_id as before, null as after
from #order
where order_delivered_date <'2020-06-10'
union
select null as before, customer_id as after
from #order
where order_delivered_date >='2020-06-10'

Select which has matching date or latest date record

Here are two tables.
ItemInfo
Id Description
1 First Item
2 Second Item
ItemInfoHistory
Id ItemInfoId Price StartDate EndDate
1 1 45 2020-09-01 2020-09-15
2 2 55 2020-09-26 null
3 1 50 2020-09-16 null
Here is SQL query.
SELECT i.Id, Price, StartDate, EndDate
FROM Itemsinfo i
LEFT JOIN ItemInfoHistory ih ON i.id= ih.ItemsMasterId AND CONVERT(DATE, GETDATE()) >= StartDate AND ( CONVERT(DATE, GETDATE()) <= EndDate OR EndDate IS NULL)
This gives the following results when the query is run on 9/20:
Id Price StartDate EndDate
1 50 2020-09-16 NULL
2 NULL NULL NULL
For the second item, I want to get latest record from history table, as shown below.
Id Price StartDate EndDate
1 50 2020-09-16 NULL
2 55 2020-09-26 NULL
Thanks in advance.
Probably the most efficient method is two joins. Assuming the "latest" record has a NULL value for EndDate, then:
SELECT i.Id,
COALESCE(ih.Price, ih_last.Price) as Price,
COALESCE(ih.StartDate, ih_last.StartDate) as StartDate,
COALESCE(ih.EndDate, ih_last.EndDate) as EndDate
FROM Itemsinfo i LEFT JOIN
ItemInfoHistory ih
ON i.id = ih.ItemsMasterId AND
CONVERT(DATE, GETDATE()) >= StartDate AND
(CONVERT(DATE, GETDATE()) <= EndDate OR EndDate IS NULL) LEFT JOIN
ItemInfoHistory ih_last
ON i.id = ih_last.ItemsMasterId AND
ih_last.EndDate IS NULL;
Actually, the middle join doesn't need to check for NULL, so that could be removed.
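The two-join idea can be tried on the question's data with a small sketch. It assumes SQLite in place of SQL Server, a fixed "today" of 2020-09-20 passed as a parameter instead of GETDATE(), and the ItemInfoId column name from the table listing rather than ItemsMasterId from the query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ItemInfo (Id INTEGER PRIMARY KEY, Description TEXT);
    CREATE TABLE ItemInfoHistory (
        Id INTEGER PRIMARY KEY, ItemInfoId INTEGER, Price INTEGER,
        StartDate TEXT, EndDate TEXT
    );
    INSERT INTO ItemInfo VALUES (1, 'First Item'), (2, 'Second Item');
    INSERT INTO ItemInfoHistory VALUES
        (1, 1, 45, '2020-09-01', '2020-09-15'),
        (2, 2, 55, '2020-09-26', NULL),
        (3, 1, 50, '2020-09-16', NULL);
""")

# First join: the history row in effect today.
# Second join: the open-ended (latest) row, used only as a fallback.
price_sql = """
    SELECT i.Id,
           COALESCE(ih.Price, ih_last.Price) AS Price,
           COALESCE(ih.StartDate, ih_last.StartDate) AS StartDate,
           COALESCE(ih.EndDate, ih_last.EndDate) AS EndDate
    FROM ItemInfo AS i
    LEFT JOIN ItemInfoHistory AS ih
           ON ih.ItemInfoId = i.Id
          AND ih.StartDate <= :today
          AND (ih.EndDate >= :today OR ih.EndDate IS NULL)
    LEFT JOIN ItemInfoHistory AS ih_last
           ON ih_last.ItemInfoId = i.Id
          AND ih_last.EndDate IS NULL
    ORDER BY i.Id
"""

prices = conn.execute(price_sql, {"today": "2020-09-20"}).fetchall()
print(prices)
```

Item 2 has no row in effect on 2020-09-20, so its values come from the open-ended row. If an item could have more than one row with a NULL EndDate, this join would return duplicates, so the sketch relies on there being at most one.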

Find From/To Dates across multiple rows - SQL Postgres

I want to be able to "book" within range of dates, but you can't book across gaps of days. So booking across multiple rates is fine as long as they are contiguous.
I am happy to change data structure/index, if there are better ways of storing start/end ranges.
So far I have a "rates" table which contains Start/End Periods of time with a daily rate.
e.g. Rates Table.
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-16 2016-04-17
3 50.00 2016-04-18 2016-04-30
For the above data I would want to return:
From To
2015-04-12 2016-04-30
For simplicity's sake, it is safe to assume that the dates are consecutive. For contiguous rows, To is always 1 day before the next row's From.
For the case there is only 1 row, I would want it to return the From/To of that single row.
Also to clarify if I had the following data:
ID Price From To
1 75.00 2015-04-12 2016-04-15
2 100.00 2016-04-17 2016-04-18
3 50.00 2016-04-19 2016-04-30
4 50.00 2016-05-01 2016-05-21
Meaning where there is a gap >= 1 day it would count as a separate range.
In which case I would expect the following:
From To
2015-04-12 2016-04-15
2016-04-17 2016-05-21
Edit 1
After playing around I have come up with the following SQL which seems to work. Although I'm not sure if there are better ways/issues with it?
WITH grouped_rates AS
(SELECT
from_date,
to_date,
SUM(grp_start) OVER (ORDER BY from_date, to_date) AS grp
FROM (SELECT
from_date,
to_date,
CASE WHEN (from_date - INTERVAL '1 DAY') = lag(to_date)
OVER (ORDER BY from_date, to_date)
THEN 0
ELSE 1
END grp_start
FROM rates
GROUP BY from_date, to_date) AS start_groups)
SELECT
min(from_date) from_date,
max(to_date) to_date
FROM grouped_rates
GROUP BY grp;
This is identifying contiguous overlapping groups in the data. One approach is to find where each group begins and then do a cumulative sum. The following query adds a flag indicating if a row starts a group:
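-- column names follow the question's query (from_date, to_date)
select r.*,
(case when not exists (select 1
from rates r2
-- an earlier row that reaches the day before this row's start continues the group
where (r2.from_date < r.from_date and r2.to_date >= r.from_date - interval '1 day') or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r;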
The or in the correlation condition is to handle the situation where intervals that define a group overlap on the start date for the interval.
You can then do a cumulative sum on this flag and aggregate by that sum:
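with r as (
select r.*,
(case when not exists (select 1
from rates r2
-- an earlier row that reaches the day before this row's start continues the group
where (r2.from_date < r.from_date and r2.to_date >= r.from_date - interval '1 day') or
(r2.from_date = r.from_date and r2.id < r.id)
)
then 1 else 0 end) as StartFlag
from rates r
)
select min(from_date) as from_date, max(to_date) as to_date
from (select r.*,
sum(r.StartFlag) over (order by r.from_date) as grp
from r
) r
group by grp;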
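The flag-then-running-sum idea can also be sketched outside the database. The helper below is a hypothetical illustration, not the query above: it sorts the ranges, flags a range as a group start when it begins more than one day after the latest end seen so far, and numbers the groups with a cumulative sum of the flags.

```python
from datetime import date, timedelta
from itertools import accumulate, groupby


def merge_contiguous(ranges):
    """Merge (start, end) date ranges that overlap or touch day-to-day."""
    ordered = sorted(ranges)
    flags = []
    latest_end = None
    for start, end in ordered:
        # A range starts a new group if it leaves a gap of at least one day.
        starts_group = latest_end is None or start > latest_end + timedelta(days=1)
        flags.append(1 if starts_group else 0)
        latest_end = end if latest_end is None else max(latest_end, end)

    # Running sum of the start flags gives each range its group number.
    group_ids = list(accumulate(flags))

    merged = []
    for _, members in groupby(zip(group_ids, ordered), key=lambda pair: pair[0]):
        spans = [span for _, span in members]
        merged.append((min(s for s, _ in spans), max(e for _, e in spans)))
    return merged


rates = [
    (date(2015, 4, 12), date(2016, 4, 15)),
    (date(2016, 4, 17), date(2016, 4, 18)),
    (date(2016, 4, 19), date(2016, 4, 30)),
    (date(2016, 5, 1), date(2016, 5, 21)),
]
merged = merge_contiguous(rates)
for start, end in merged:
    print(start, end)
```

With the second data set from the question, this yields two ranges: 2015-04-12 to 2016-04-15 and 2016-04-17 to 2016-05-21.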
CREATE TABLE prices( id INTEGER NOT NULL PRIMARY KEY
, price MONEY
, date_from DATE NOT NULL
, date_upto DATE NOT NULL
);
-- some data (upper limit is EXCLUSIVE)
INSERT INTO prices(id, price, date_from, date_upto) VALUES
( 1, 75.00, '2015-04-12', '2016-04-16' )
,( 2, 100.00, '2016-04-17', '2016-04-19' )
,( 3, 50.00, '2016-04-19', '2016-05-01' )
,( 4, 50.00, '2016-05-01', '2016-05-22' )
;
-- SELECT * FROM prices;
-- Recursive query to "connect the dots"
WITH RECURSIVE rrr AS (
SELECT date_from, date_upto
, 1 AS nperiod
FROM prices p0
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_upto = p0.date_from) -- no preceding segment
UNION ALL
SELECT r.date_from, p1.date_upto
, 1+r.nperiod AS nperiod
FROM prices p1
JOIN rrr r ON p1.date_from = r.date_upto
)
SELECT * FROM rrr r
WHERE NOT EXISTS (SELECT * FROM prices nx WHERE nx.date_from = r.date_upto) -- no following segment
;
Result:
date_from | date_upto | nperiod
------------+------------+---------
2015-04-12 | 2016-04-16 | 1
2016-04-17 | 2016-05-22 | 3
(2 rows)

Concatenation of adjacent dates in SQL

I would like to know how to make intersections or concatenations of adjacent date ranges in sql.
I have a list of customer start and end dates, for example (in dd/mm/yyyy format, where 31/12/9999 means the customer is still a current customer).
CustID | StartDate | Enddate |
1 | 01/08/2011|19/06/2012|
1 | 20/06/2012|07/03/2012|
1 | 03/05/2012|31/12/9999|
2 | 09/03/2009|16/08/2009|
2 | 16/01/2010|10/10/2010|
2 | 11/10/2010|31/12/9999|
3 | 01/08/2010|19/08/2010|
3 | 20/08/2010|26/12/2011|
Although the dates in different rows don't overlap, I would consider some of the ranges as a contiguous period of time, e.g. when the start date comes one day after an end date (for a given customer). Hence I would like a query that returns just the concatenated date ranges:
CustID | StartDate | Enddate |
1 | 01/08/2011|07/03/2012|
1 | 03/05/2012|31/12/9999|
2 | 09/03/2009|16/08/2009|
2 | 16/01/2010|31/12/9999|
3 | 01/08/2010|26/12/2011|
I've looked at CTE tables, but I can't figure out how to return just one row for one contiguous block of dates.
This should work in 2005 forward:
;WITH cte2 AS (SELECT 0 AS Number
UNION ALL
SELECT Number + 1
FROM cte2
WHERE Number < 10000)
SELECT CustID, Min(GroupStart) StartDate, MAX(EndDate) EndDate
FROM (SELECT *
, DATEADD(DAY,b.number,a.StartDate) GroupStart
, DATEADD(DAY,1- DENSE_RANK() OVER (PARTITION BY CustID ORDER BY DATEADD(DAY,b.number,a.StartDate)),DATEADD(DAY,b.number,a.StartDate)) GroupDate
FROM Table1 a
JOIN cte2 b
ON b.number <= DATEDIFF(d, startdate, EndDate)
) X
GROUP BY CustID, GroupDate
ORDER BY CustID, StartDate
OPTION (MAXRECURSION 0)
You can replace the cte with a permanent table of numbers, from 0 up to a value large enough to cover the spread of dates in your ranges, so it isn't rebuilt on each run. Indexed properly, it will run quickly.
You can do this with a recursive common table expression:
with cte as (
select t.CustID, t.StartDate, t.EndDate, t2.StartDate as NextStartDate
from Table1 as t
left outer join Table1 as t2 on t2.CustID = t.CustID and t2.StartDate = case when t.EndDate < '99991231' then dateadd(dd, 1, t.EndDate) end
), cte2 as (
select c.CustID, c.StartDate, c.EndDate, c.NextStartDate
from cte as c
where c.NextStartDate is null
union all
select c.CustID, c.StartDate, c2.EndDate, c2.NextStartDate
from cte2 as c2
inner join cte as c on c.CustID = c2.CustID and c.NextStartDate = c2.StartDate
)
select CustID, min(StartDate) as StartDate, EndDate
from cte2
group by CustID, EndDate
order by CustID, StartDate
option (maxrecursion 0);
Quick performance tests:
Results on 750 rows, small periods of 2 days length:
My query: 300 ms
Goat CO query with CTE: 10804 ms
Goat CO query with table of fixed numbers: 7 ms
Results on 5 rows, large periods:
My query: 1 ms
Goat CO query with CTE: 700 ms
Goat CO query with table of fixed numbers: 36 ms

Adding a Date column based on the next row date value

I'm using SQL Server 2005. From the tbl_temp table below, I would like to add an EndDate column based on the next row's StartDate minus 1 day, until there's a change in the AID and UID combination. This calculated EndDate goes on the row above it. The last row of each AID and UID group gets the system date as its EndDate. The table has to be ordered by AID, UID, StartDate. Thanks for the help.
-- tbl_temp
AID UID StartDate
1 1 2013-02-20
2 1 2013-02-06
1 1 2013-02-21
1 1 2013-02-27
1 2 2013-02-02
1 2 2013-02-04
-- Result needed
AID UID StartDate EndDate
1 1 2013-02-20 2013-02-20
1 1 2013-02-21 2013-02-26
1 1 2013-02-27 sysdate
1 2 2013-02-02 2013-02-03
1 2 2013-02-04 sysdate
2 1 2013-02-06 sysdate
The easiest way to do this is with a correlated subquery:
select t.*,
(select top 1 dateadd(day, -1, startDate )
from tbl_temp t2
where t2.aid = t.aid and
t2.uid = t.uid and
t2.startdate > t.startdate
order by t2.startdate -- without this, top 1 may pick any later row
) as endDate
from tbl_temp t
To get the current date, use isnull():
select t.*,
isnull((select top 1 dateadd(day, -1, startDate )
from tbl_temp t2
where t2.aid = t.aid and
t2.uid = t.uid and
t2.startdate > t.startdate
order by t2.startdate -- without this, top 1 may pick any later row
), getdate()
) as endDate
from tbl_temp t
Normally, I would recommend coalesce() over isnull(). However, there is a bug in some versions of SQL Server where it evaluates the first argument twice. Normally, this doesn't make a difference, but with a subquery it does.
And finally, the use of sysdate makes me think of Oracle. The same approach will work there too.
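A small sketch of the correlated-subquery approach, run on the question's rows. It assumes SQLite instead of SQL Server, so MIN() and date(..., '-1 day') stand in for top 1 and dateadd(), and a fixed placeholder date of 2013-03-01 is passed in where the system date would go.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_temp (AID INTEGER, UID INTEGER, StartDate TEXT)")
conn.executemany(
    "INSERT INTO tbl_temp VALUES (?, ?, ?)",
    [
        (1, 1, "2013-02-20"),
        (2, 1, "2013-02-06"),
        (1, 1, "2013-02-21"),
        (1, 1, "2013-02-27"),
        (1, 2, "2013-02-02"),
        (1, 2, "2013-02-04"),
    ],
)

# For each row, find the earliest later StartDate in the same AID/UID group
# and subtract one day; fall back to :today when there is no later row.
end_date_sql = """
    SELECT t.AID, t.UID, t.StartDate,
           COALESCE(
               (SELECT date(MIN(t2.StartDate), '-1 day')
                FROM tbl_temp AS t2
                WHERE t2.AID = t.AID
                  AND t2.UID = t.UID
                  AND t2.StartDate > t.StartDate),
               :today
           ) AS EndDate
    FROM tbl_temp AS t
    ORDER BY t.AID, t.UID, t.StartDate
"""

end_dates = conn.execute(end_date_sql, {"today": "2013-03-01"}).fetchall()
for row in end_dates:
    print(row)
```

The output matches the "Result needed" table in the question, with 2013-03-01 in place of sysdate on the last row of each group.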
;WITH x AS
(
SELECT AID, UID, StartDate,
ROW_NUMBER() OVER(PARTITION BY AID, UID ORDER BY StartDate) AS rn
FROM tbl_temp
)
SELECT x1.AID, x1.UID, x1.StartDate,
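-- SQL Server 2005 has no date type, so strip the time part with DATEADD/DATEDIFF
COALESCE(DATEADD(day,-1,x2.StartDate), DATEADD(day, DATEDIFF(day, 0, getdate()), 0)) AS EndDate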
FROM x x1
LEFT OUTER JOIN x x2 ON x2.AID = x1.AID AND x2.UID = x1.UID
AND x2.rn = x1.rn + 1
ORDER BY x1.AID, x1.UID, x1.StartDate