Excluding dates based upon a date value in a column - sql

I have the below table.
I want to exclude rows where the start_cycle date is later than the start_cycle date of the row where the 'source' column = END_DATE. So for this example, that means removing any rows where the start_cycle date is later than 2/11/2019.
END_DATE could be different for each ID
ID START_CYCLE END_CYCLE SOURCE
1 1/20/2019 2/1/2019 START
1 2/2/2019 2/2/2019 START_BRA
1 2/3/2019 2/5/2019 ASSGN
1 2/6/2019 2/10/2019 CUST_START
1 2/11/2019 2/12/2019 ASSGN
1 2/11/2019 12/31/2999 END_DATE
1 1/1/3000 2/12/2019 END_DATE_BRA
For this example, expected results would be: (Removing the last row)
ID START_CYCLE END_CYCLE SOURCE
1 1/20/2019 2/1/2019 START
1 2/2/2019 2/2/2019 START_BRA
1 2/3/2019 2/5/2019 ASSGN
1 2/6/2019 2/10/2019 CUST_START
1 2/11/2019 2/12/2019 ASSGN
1 2/11/2019 12/31/2999 END_DATE

You can do it without a join, assuming that there is only 1 row for each id with source = 'END_DATE':
select * from tablename t
where start_cycle <= (select start_cycle from tablename where id = t.id and source = 'END_DATE')
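To check the correlated subquery against the sample rows, here is a minimal sketch that runs it in SQLite through Python's sqlite3 module. SQLite is only a stand-in for whatever database the question uses, and the dates are stored as ISO strings (an assumption) so that text comparison sorts chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tablename (id INTEGER, start_cycle TEXT, end_cycle TEXT, source TEXT);
INSERT INTO tablename VALUES
  (1, '2019-01-20', '2019-02-01', 'START'),
  (1, '2019-02-02', '2019-02-02', 'START_BRA'),
  (1, '2019-02-03', '2019-02-05', 'ASSGN'),
  (1, '2019-02-06', '2019-02-10', 'CUST_START'),
  (1, '2019-02-11', '2019-02-12', 'ASSGN'),
  (1, '2019-02-11', '2999-12-31', 'END_DATE'),
  (1, '3000-01-01', '2019-02-12', 'END_DATE_BRA');
""")

# Keep rows that start on or before the END_DATE row of the same id.
kept = conn.execute("""
    SELECT * FROM tablename t
    WHERE start_cycle <= (SELECT start_cycle FROM tablename
                          WHERE id = t.id AND source = 'END_DATE')
    ORDER BY start_cycle
""").fetchall()

for row in kept:
    print(row)
```

Only the END_DATE_BRA row is dropped here. If an id could have more than one END_DATE row, the scalar subquery would no longer be well defined and would need a MIN or MAX.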

You can do that with a CTE. First query the START_CYCLE for every ID where SOURCE = 'END_DATE', then join the table to that result and keep only the rows that start on or before that date:
WITH id_end_date as
(
SELECT id, start_cycle
FROM table1
WHERE source = 'END_DATE'
)
-- "to" is a reserved word, so the table is aliased as t
SELECT t.*
FROM table1 t
INNER JOIN id_end_date
ON t.id = id_end_date.id
-- <= keeps the wanted rows; > would return the rows to be excluded
WHERE t.start_cycle <= id_end_date.start_cycle
;

Below is the query, assuming there are multiple IDs in the table.
select t1.* from <tableName> t1 inner join (select * from <tableName> where source='END_DATE') t2
on t1.id=t2.id and t1.start_cycle <= t2.start_cycle;

Related

How to get the next non-zero value in table partitioned by id?

Here is a subset of my table:
id date value
1 01/01/2022 5
1 02/02/2022 0
1 03/01/2022 0
1 04/02/2022 10
2 01/04/2022 5
2 02/04/2022 3
2 03/04/2022 0
2 04/04/2022 10
Where there are 0s in the value field, I would like to replace them with the first non-zero value that occurs after the run of 0s, partitioned by id.
I have tried to use LAG, but I'm really struggling because it takes the value from the row above the current one in the table.
Any help will be appreciated.
Transformed table to look like
id date value
1 01/01/2022 5
1 02/02/2022 10
1 03/01/2022 10
1 04/02/2022 10
2 01/04/2022 5
2 02/04/2022 3
2 03/04/2022 10
2 04/04/2022 10
You can use OUTER APPLY:
select T1.id, T1.date, CASE WHEN T1.value = 0 THEN X.value ELSE T1.value END value from TestTable T1
OUTER APPLY (SELECT TOP 1 * FROM TestTable T2
WHERE T1.id = T2.id AND T2.date > T1.date
AND T2.value > 0
ORDER BY T2.date) X
Assuming by replace them you mean to update the table, simplest way would be a correlated subquery:
update t set value = (
select top(1) value
from t t2
where t2.id = t.id
and t2.value > 0
and t2.date > t.date
order by t2.date
)
where t.value = 0;
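The same correlated-subquery update can be tried out in SQLite via Python's sqlite3 module. This is a sketch under assumptions: SQLite has no TOP, so LIMIT 1 stands in for top(1), and the dates are ISO strings taken from the formatted output shown in another answer.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, date TEXT, value INTEGER);
INSERT INTO t VALUES
  (1, '2022-01-01', 5), (1, '2022-02-02', 0), (1, '2022-03-01', 0), (1, '2022-04-02', 10),
  (2, '2022-01-04', 5), (2, '2022-02-04', 3), (2, '2022-03-04', 0), (2, '2022-04-04', 10);
""")

# For every zero row, look up the earliest later row of the same id
# whose value is positive and copy that value.
conn.execute("""
    UPDATE t SET value = (
        SELECT t2.value FROM t AS t2
        WHERE t2.id = t.id AND t2.value > 0 AND t2.date > t.date
        ORDER BY t2.date
        LIMIT 1
    )
    WHERE value = 0
""")

updated = conn.execute("SELECT id, date, value FROM t ORDER BY id, date").fetchall()
for row in updated:
    print(row)
```

A zero row with no later non-zero row in its id would be set to NULL by this statement, so trailing zeros may need an extra condition if they should stay 0.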
We group every 0 with the first value after it that's not 0 and then we use max() over() to replace the 0s in the group.
select id
,date
,max(value) over(partition by id, grp) as value
from
(
select *
,count(case when value != 0 then 1 end) over(partition by id order by date desc) as grp
from t
) t
order by id, date
id date value
1 2022-01-01 5
1 2022-02-02 10
1 2022-03-01 10
1 2022-04-02 10
2 2022-01-04 5
2 2022-02-04 3
2 2022-03-04 10
2 2022-04-04 10
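To see how the grouping works, the sketch below runs the same two window functions in SQLite through Python's sqlite3 module (an assumption; window functions need SQLite 3.25 or newer). Counting non-zero values while walking each id's dates in descending order gives every run of 0s the same counter as the non-zero row that follows it chronologically, so MAX over that group picks up the replacement value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t (id INTEGER, date TEXT, value INTEGER);
INSERT INTO t VALUES
  (1, '2022-01-01', 5), (1, '2022-02-02', 0), (1, '2022-03-01', 0), (1, '2022-04-02', 10),
  (2, '2022-01-04', 5), (2, '2022-02-04', 3), (2, '2022-03-04', 0), (2, '2022-04-04', 10);
""")

filled = conn.execute("""
    SELECT id, date, MAX(value) OVER (PARTITION BY id, grp) AS value
    FROM (
        SELECT *,
               -- running count of non-zero rows, newest date first
               COUNT(CASE WHEN value != 0 THEN 1 END)
                   OVER (PARTITION BY id ORDER BY date DESC) AS grp
        FROM t
    ) AS g
    ORDER BY id, date
""").fetchall()

for row in filled:
    print(row)
```

Taking MAX within the group relies on the values being non-negative, because each group holds exactly one non-zero value plus the 0s that precede it.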
You can do it using outer apply:
select
d.id, d.date_,
case when d.value != 0 then d.value else nz.value end as value
from data d
outer apply (
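-- take the next non-zero value by date; min(value) would return the smallest later value instead
select top 1 dd.value
from data dd
where dd.id = d.id
and dd.date_ > d.date_
and dd.value <> 0
order by dd.date_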
) nz

SQL Select Distinct Records From Two Tables

I am trying to write a SQL statement that returns a distinct set of CompanyNames from one table, based on the most recent SaleDate within a specified date range from another table.
T01 = Account
T02 = TransHeader
The fields of importance are:
T01.ID, T01.CompanyName
T02.AccountID, T02.SaleDate
T01.ID = T02.AccountID
What I want to return is the Max SaleDate for each CompanyName without any duplicate CompanyNames and only the Max(SaleDate) as LastSale. I will be using a Where Clause to limit the SaleDate range.
I tried the following but it returns all the records for all SalesDates in the range. This results in the same company being listed multiple times.
Current MS-SQL Query
SELECT T01.CompanyName, T02.LastSale
FROM
(SELECT DISTINCT ID, IsActive, ClassTypeID, CompanyName FROM Account) T01
FULL OUTER JOIN
(SELECT DISTINCT AccountID, TransactionType, MAX(SaleDate) LastSale FROM TransHeader group by AccountID, TransactionType, SaleDate) T02
ON T01.ID = T02.AccountID
WHERE ( ( T01.IsActive = 1 )AND
( (Select Max(SaleDate)From TransHeader Where AccountID = T01.ID AND TransactionType in (1,6) AND SaleDate is NOT NULL)
BETWEEN '01/01/2016' AND '12/31/2018 23:59:00' AND (Select Max(SaleDate)From TransHeader Where AccountID = T01.ID AND TransactionType in (1,6) AND SaleDate is NOT NULL) IS NOT NULL
)
)
ORDER BY T01.CompanyName
I thought the FULL OUTER JOIN was the ticket but it did not work and I am stuck.
Sample data Account Table (T01)
ID CompanyName IsActive ClassTypeID
1 ABC123 1 1
2 CDE456 1 1
3 EFG789 1 1
4 Test123 0 1
5 Test456 1 1
6 Test789 0 1
Sample data Transheader table (T02)
AccountID TransactionType SaleDate
1 1 02/03/2012
2 1 03/04/2013
3 1 04/05/2014
4 1 05/06/2014
5 1 06/07/2014
6 1 07/08/2015
1 1 08/09/2016
1 1 01/15/2016
2 1 03/20/2017
2 1 03/21/2017
3 1 03/04/2017
3 1 04/05/2018
3 1 05/27/2018
4 1 06/01/2018
5 1 07/08/2018
5 1 08/01/2018
5 1 10/11/2018
6 1 11/30/2018
Desired Results
CompanyName LastSale (notes are not returned in the result)
ABC123 01/15/2016 (Max(SaleDate) LastSale for ID=1)
CDE456 03/21/2017 (Max(SaleDate) LastSale for ID=2)
EFG789 05/27/2018 (Max(SaleDate) LastSale for ID=3)
Test456 10/11/2018 (Max(SaleDate) LastSale for ID=5)
ID=4 & ID=6 are not returned because IsActive = 0 for these records.
One option is to select the maximum date in the select clause.
select
a.*,
(
select max(th.saledate)
from transheader th
where th.accountid = a.id
and th.saledate >= '2016-01-01'
and th.saledate < '2019-01-01'
) as max_date
from account a
where a.isactive = 1
order by a.id;
If you only want to show transaction headers with sales dates in the given date range, then you can just inner join the maximum dates with the accounts. In order to do so, you must group your date aggregation per account:
select a.*, th.max_date
from account a
join
(
select accountid, max(saledate) as max_date
from transheader
where saledate >= '2016-01-01'
and saledate < '2019-01-01'
group by accountid
) th on th.accountid = a.id
where a.isactive = 1
order by a.id;
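As a quick check of the join against the question's sample data, here is a sketch using SQLite through Python's sqlite3 module. The engine, the lower-case column names and the ISO date strings are assumptions made for the example; the question itself targets MS SQL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE account (id INTEGER, companyname TEXT, isactive INTEGER, classtypeid INTEGER);
INSERT INTO account VALUES
  (1,'ABC123',1,1),(2,'CDE456',1,1),(3,'EFG789',1,1),
  (4,'Test123',0,1),(5,'Test456',1,1),(6,'Test789',0,1);
CREATE TABLE transheader (accountid INTEGER, transactiontype INTEGER, saledate TEXT);
INSERT INTO transheader VALUES
  (1,1,'2012-02-03'),(2,1,'2013-03-04'),(3,1,'2014-04-05'),(4,1,'2014-05-06'),
  (5,1,'2014-06-07'),(6,1,'2015-07-08'),(1,1,'2016-08-09'),(1,1,'2016-01-15'),
  (2,1,'2017-03-20'),(2,1,'2017-03-21'),(3,1,'2017-03-04'),(3,1,'2018-04-05'),
  (3,1,'2018-05-27'),(4,1,'2018-06-01'),(5,1,'2018-07-08'),(5,1,'2018-08-01'),
  (5,1,'2018-10-11'),(6,1,'2018-11-30');
""")

# One row per account: aggregate first, then join to the active accounts.
last_sales = conn.execute("""
    SELECT a.companyname, th.max_date
    FROM account a
    JOIN (SELECT accountid, MAX(saledate) AS max_date
          FROM transheader
          WHERE saledate >= '2016-01-01' AND saledate < '2019-01-01'
          GROUP BY accountid) th ON th.accountid = a.id
    WHERE a.isactive = 1
    ORDER BY a.companyname
""").fetchall()

for row in last_sales:
    print(row)
```

Each active company appears once. With this data the latest sale for ABC123 inside the range is 2016-08-09, since that row is later than the 01/15/2016 row listed in the question's desired results.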
select CompanyName,MAX(SaleDate) SaleDate from Account a
inner join Transheader b on a.id = b.accountid
group by CompanyName
order by 1

Reusing value for multiple dates in SQL

I have a table that looks like this
ID Type Change_Date
1 t1 2015-10-08
1 t2 2016-01-03
1 t3 2016-03-07
2 t1 2017-12-13
2 t2 2018-02-01
It shows if a customer has changed account type and when. However, I'd like a query that can give me the follow output
ID Type Change_Date
1 t1 2015-10
1 t1 2015-11
1 t1 2015-12
1 t2 2016-01
1 t2 2016-02
1 t3 2016-03
1 t3 2016-04
... ... ...
1 t3 2018-10
for each ID. The output shows what account type the customer had for each month until the current month. My problem is filling in the "empty" months. In some cases the interval between account changes can be more than a year.
I hope this makes sense.
Thanks in advance.
Based on Presto SQL (because your original question is about Presto/SQL)
Update in 2018-11-01: use lead() to simplify SQL
Prepare data
Table mytable same as yours
id type update_date
1 t1 2015-10-08
1 t2 2016-01-03
1 t3 2016-03-07
2 t1 2017-12-13
2 t2 2018-02-01
Table t_month is a dictionary table that holds every month from 2015-01 to 2019-12. Dictionary tables like this are useful.
ym
2015-01
2015-02
2015-03
2015-04
2015-05
2015-06
2015-07
2015-08
2015-09
...
2019-12
Add lifespan for mytable
Normally, you should manage data like this with an explicit lifespan (a start date and an end date). So mytable should look like
id type start_date end_date
1 t1 2015-10-08 2016-01-03
1 t2 2016-01-03 2016-03-07
1 t3 2016-03-07 null
2 t1 2017-12-13 2018-02-01
2 t2 2018-02-01 null
But in this case you don't have one, so the next step is to create it with the lead() window function.
select
id,
type,
date_format(update_date, '%Y-%m') as start_month,
lead(
date_format(update_date, '%Y-%m'),
1, -- next one
date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
) over(partition by id order by update_date) as end_month
from mytable
Output
id type start_month end_month
1 t1 2015-10 2016-01
1 t2 2016-01 2016-03
1 t3 2016-03 2018-11
2 t1 2017-12 2018-02
2 t2 2018-02 2018-11
Cross join id and month
It's simple
with id_month as (
select * from t_month
cross join (select distinct id from mytable)
)
select * from id_month
Output
ym id
2015-01 1
2015-02 1
2015-03 1
...
2019-12 1
2015-01 2
2015-02 2
2015-03 2
...
2019-12 2
Finally
Now you can use a subquery in the select clause
select
id,
type,
ym
from (
select
t1.id,
t1.ym,
(select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
from id_month t1
)
where type is not null
-- order by id, ym
Full sql
with mytable2 as (
select
id,
type,
date_format(update_date, '%Y-%m') as start_month,
lead(
date_format(update_date, '%Y-%m'),
1, -- next one
date_format(current_date+interval '1' month, '%Y-%m') -- if null return next month
) over(partition by id order by update_date) as end_month
from mytable
)
, id_month as (
select * from t_month
cross join (select distinct id from mytable)
)
select
id,
type,
ym
from (
select
t1.id,
t1.ym,
(select type from mytable2 where t1.id = id and t1.ym >= start_month and t1.ym < end_month) as type
from id_month t1
)
where type is not null
--order by id, ym
Output
id type ym
1 t1 2015-10
1 t1 2015-11
1 t1 2015-12
1 t2 2016-01
1 t2 2016-02
1 t3 2016-03
1 t3 2016-04
...
1 t3 2018-10
2 t1 2017-12
2 t1 2018-01
2 t2 2018-02
...
2 t2 2018-10
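The lifespan idea can also be tried outside Presto. The sketch below is an adaptation for SQLite through Python's sqlite3 module, not the answer's exact query: it builds the month calendar with a recursive CTE instead of a stored t_month table, joins on the month range instead of using a scalar subquery, and passes the open-ended end month in as a parameter (here '2018-11', matching the answer's output) rather than deriving it from current_date. All of those choices are assumptions made so the example is reproducible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, type TEXT, update_date TEXT);
INSERT INTO mytable VALUES
  (1,'t1','2015-10-08'),(1,'t2','2016-01-03'),(1,'t3','2016-03-07'),
  (2,'t1','2017-12-13'),(2,'t2','2018-02-01');
""")

query = """
WITH RECURSIVE t_month(ym_date) AS (
    -- first day of every month from 2015-01 to 2019-12
    SELECT '2015-01-01'
    UNION ALL
    SELECT date(ym_date, '+1 month') FROM t_month WHERE ym_date < '2019-12-01'
),
mytable2 AS (
    -- lifespan per row: [start_month, end_month)
    SELECT id, type,
           strftime('%Y-%m', update_date) AS start_month,
           coalesce(lead(strftime('%Y-%m', update_date))
                        OVER (PARTITION BY id ORDER BY update_date),
                    :end_month) AS end_month
    FROM mytable
)
SELECT m.id, m.type, strftime('%Y-%m', t.ym_date) AS ym
FROM mytable2 m
JOIN t_month t
  ON strftime('%Y-%m', t.ym_date) >= m.start_month
 AND strftime('%Y-%m', t.ym_date) <  m.end_month
ORDER BY m.id, ym
"""
rows = conn.execute(query, {"end_month": "2018-11"}).fetchall()

for row in rows[:5]:
    print(row)
print(len(rows), "rows in total")
```

Because year-month strings in the form YYYY-MM sort chronologically, plain string comparison is enough for the range test.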

TSQL returning the first record

I have a SQL Server 2008 table with the following data (Small sample)
id Date Value
20448D6F-4099-408D-85FE-11EC6690CDB8 2010-06-01 1
20448D6F-4099-408D-85FE-11EC6690CDB8 2010-06-02 2
20448D6F-4099-408D-85FE-11EC6690CDB8 2010-06-03 3
20448D6F-4099-408D-85FE-11EC6690CDB8 2010-06-04 4
20448D6F-4099-408D-85FE-11EC6690CDB8 2010-06-05 NULL
EF595DE6-FF57-4625-8254-287F49843445 2010-06-02 2
EF595DE6-FF57-4625-8254-287F49843445 2010-06-03 3
EF595DE6-FF57-4625-8254-287F49843445 2010-06-04 4
EF595DE6-FF57-4625-8254-287F49843445 2010-06-05 NULL
C6F459EF-1493-4864-81C2-E5B55283EF0C 2010-06-04 45
C6F459EF-1493-4864-81C2-E5B55283EF0C 2010-06-05 NULL
I am running the query
select *
from [test].[dbo].[testtable]
where id in
(
select id
from [test].[dbo].[testtable]
where Date='2010-06-05' and Value is null
)
and Date = DATEADD(D, -4, '2010-06-05')
which returns
id Date Value
20448D6F-4099-408D-85FE-11EC6690CDB8 2010-06-01 1
However, when no record exists for 2010-06-01, I would like to return the row with the next earliest date.
So the results I would see from the sample would be
id Date Value
20448D6F-4099-408D-85FE-11EC6690CDB8 2010-06-01 1
EF595DE6-FF57-4625-8254-287F49843445 2010-06-02 2
C6F459EF-1493-4864-81C2-E5B55283EF0C 2010-06-04 45
I have millions of records. How can I do this in a T-SQL query?
Thanks
You can use the MIN aggregate in a subquery:
SELECT t.Id, t.Date, t.Value
FROM [test].[dbo].[testtable] t
JOIN (
SELECT Min(Date) MinDate, Id
FROM [test].[dbo].[testtable]
WHERE Date >= '6/1/2010'
GROUP BY Id
) t2 ON t.Id = t2.Id AND t.Date = t2.MinDate
WHERE t.Id IN (
SELECT id
FROM [test].[dbo].[testtable]
WHERE Date='2010-06-05' and Value is null
)
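A minimal sketch of the same MIN-in-a-subquery join, run against the sample rows in SQLite through Python's sqlite3 module. SQLite and the unqualified table name testtable are assumptions; the question itself uses SQL Server 2008.

```python
import sqlite3

A = "20448D6F-4099-408D-85FE-11EC6690CDB8"
B = "EF595DE6-FF57-4625-8254-287F49843445"
C = "C6F459EF-1493-4864-81C2-E5B55283EF0C"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE testtable (id TEXT, Date TEXT, Value INTEGER)")
conn.executemany("INSERT INTO testtable VALUES (?, ?, ?)", [
    (A, "2010-06-01", 1), (A, "2010-06-02", 2), (A, "2010-06-03", 3),
    (A, "2010-06-04", 4), (A, "2010-06-05", None),
    (B, "2010-06-02", 2), (B, "2010-06-03", 3), (B, "2010-06-04", 4),
    (B, "2010-06-05", None),
    (C, "2010-06-04", 45), (C, "2010-06-05", None),
])

# Earliest row on or after 2010-06-01 for each id that has a NULL on 2010-06-05.
first_rows = conn.execute("""
    SELECT t.id, t.Date, t.Value
    FROM testtable t
    JOIN (SELECT id, MIN(Date) AS MinDate
          FROM testtable
          WHERE Date >= '2010-06-01'
          GROUP BY id) t2
      ON t.id = t2.id AND t.Date = t2.MinDate
    WHERE t.id IN (SELECT id FROM testtable
                   WHERE Date = '2010-06-05' AND Value IS NULL)
    ORDER BY t.Date
""").fetchall()

for row in first_rows:
    print(row)
```

On a table with millions of rows, an index covering (id, Date) would likely help both the grouped subquery and the join, though that depends on the actual schema.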

Find next date value in the column

I have a large table with the following columns and sample values:
ID Ser Reg Date
1 12345 001 1/3/2011
1 12345 001 2/2/2011
1 12345 002 1/3/2011
1 12345 002 2/2/2011
2 23456 001 1/3/2011
2 23456 001 2/7/2011
2 23456 001 3/5/2011
I tried this query from a previous post SQL - Select next date query - but did not get the desired results:
SELECT
mytable.id,
mytable.date,
(
SELECT
MIN(mytablemin.date)
FROM mytable AS mytablemin
WHERE mytablemin.date > mytable.date
) AS NextDate
FROM mytable
This is what I am trying to accomplish:
ID Ser Reg curr_Date prev_Date
1 12345 001 2/2/2011 1/3/2011
1 12345 002 2/2/2011 1/3/2011
2 23456 001 2/7/2011 1/3/2011
2 23456 001 3/5/2011 2/7/2011
I would appreciate any help with this task.
If you are using an Oracle database (you have not said which one you use), you can use the LEAD and LAG analytic functions for this:
select id, ser, reg, curr_date, prev_date
from
(
-- date is the question's column name; DATE is reserved in Oracle, so quote or rename it in a real schema
select id, ser, reg, date curr_date,
-- LEAD over descending dates returns the previous (earlier) date within the group
LEAD(date, 1) OVER (PARTITION BY id, ser, reg ORDER BY date DESC NULLS LAST) prev_date
from mytable
)
where prev_date is not null;
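The window-function approach can be checked against the sample rows with the sketch below. It runs in SQLite through Python's sqlite3 module (an assumption; it needs SQLite 3.25 or newer) and uses LAG over ascending dates, which gives the same previous date as LEAD over descending dates. The dates are stored as ISO strings so they sort chronologically.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE mytable (id INTEGER, ser INTEGER, reg TEXT, date TEXT);
INSERT INTO mytable VALUES
  (1, 12345, '001', '2011-01-03'), (1, 12345, '001', '2011-02-02'),
  (1, 12345, '002', '2011-01-03'), (1, 12345, '002', '2011-02-02'),
  (2, 23456, '001', '2011-01-03'), (2, 23456, '001', '2011-02-07'),
  (2, 23456, '001', '2011-03-05');
""")

# Pair every row with the previous date in its (id, ser, reg) group
# and drop the first row of each group, which has no predecessor.
pairs = conn.execute("""
    SELECT id, ser, reg, curr_date, prev_date
    FROM (SELECT id, ser, reg, date AS curr_date,
                 LAG(date) OVER (PARTITION BY id, ser, reg ORDER BY date) AS prev_date
          FROM mytable)
    WHERE prev_date IS NOT NULL
    ORDER BY id, reg, curr_date
""").fetchall()

for row in pairs:
    print(row)
```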
The correlated subquery was missing the conditions that tie the mytablemin copy of mytable back to the outer mytable row. The query below also eliminates records that have no NextDate. Note that this removes a group (Id, Ser, Reg) from the result set entirely when it contains only one record, which may not be what you want.
select * from
(
SELECT
mytable.id,
mytable.date,
(
SELECT
MIN(mytablemin.date)
FROM mytable AS mytablemin
WHERE mytablemin.date > mytable.date
and mytablemin.id = mytable.id
and mytablemin.Ser = mytable.Ser
and mytablemin.Reg = mytable.Reg
) AS NextDate
FROM mytable
) a
where a.NextDate is not null
And here is version using derived table with aggregation:
SELECT
mytable.id,
mytable.date,
mytablemin.minDate
FROM mytable
inner join
(
SELECT mytablemin.id,
mytablemin.Ser,
mytablemin.Reg,
MIN(mytablemin.date) minDate
FROM mytable AS mytablemin
group by mytablemin.id,
mytablemin.Ser,
mytablemin.Reg
having MIN(mytablemin.date) is not null
) AS mytablemin
on mytablemin.id = mytable.id
and mytablemin.Ser = mytable.Ser
and mytablemin.Reg = mytable.Reg