SQL window function to remove multiple values with different criteria - sql

I have a data set where I'm trying to remove records with the following conditions:
If a practid has multiple records with the same date and at least one of those records has a reason of "L&B", then I want all of that practid's records for that date to be removed.
DECLARE @t table(practid int, statusdate date, reason varchar(100));
INSERT INTO @t VALUES (1, '2018-03-01', 'L&B'),
(1, '2018-03-01', NULL),
(1, '2018-04-01', 'R&D'),
(2, '2018-05-01', 'R&D'),
(2, '2018-05-01', 'R&D'),
(2, '2018-03-15', NULL),
(2, '2018-03-15', 'R&D'),
(3, '2018-07-01', 'L&B');
With this data set I would want the following result:
PractId StatusDate Reason
1 2018-04-01 R&D
2 2018-05-01 R&D
2 2018-05-01 R&D
2 2018-03-15 NULL
2 2018-03-15 R&D
I tried solving this with a window function but am getting stuck:
SELECT *, ROW_NUMBER() OVER
(PARTITION BY practid, statusdate, CASE WHEN reason = 'L&B' THEN 0 ELSE 1 END ORDER BY (SELECT NULL)) AS rn
FROM @t
With my query I can't figure out how to keep practid = 2, since I want to keep all of its records.

To continue along your current approach, we can use COUNT as an analytic function. We can count the occurrences of the L&B reason over each practid/statusdate window, and then retain only groups where this reason never occurs.
SELECT practid, statusdate, reason
FROM
(
SELECT *,
COUNT(CASE WHEN reason = 'L&B' THEN 1 END) OVER
(PARTITION BY practid, statusdate) cnt
FROM yourTable
) t
WHERE cnt = 0;
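COUNT skips NULLs, so the CASE expression effectively counts only the L&B rows in each practid/statusdate window. If you prefer to stay closer to your original 0/1 flag idea, an equivalent sketch (assuming the same yourTable placeholder) uses MAX over the flag instead:
SELECT practid, statusdate, reason
FROM
(
    SELECT *,
           MAX(CASE WHEN reason = 'L&B' THEN 1 ELSE 0 END) OVER
               (PARTITION BY practid, statusdate) AS has_lb
    FROM yourTable
) t
WHERE has_lb = 0;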

You can try to use not exists with a subquery.
select *
from t t1
where not exists (
    select 1
    from t tt
    -- correlate on practid as well, so an L&B row from another practid on the same date is not counted
    where tt.practid = t1.practid
      and tt.statusdate = t1.statusdate
      and tt.reason = 'L&B'
)

Related

Need two columns in one row in SQL

Given the values below I need records with the same ID to be on the same line ONLY if either the Paid or Interest values are 0 and the other is not. In this case ID= 1 would be on one line, but the others would remain the same.
declare @t table(id int, paid varchar(20), interest varchar(20))
insert into @t values(1, '0.00', '3.51'),
(1, '1000', '0.00'),
(3, '2.50', '0.00'),
(4, '50.00', '2.20'),
(4, '75.00', '0.10')
select * from @t
I need a result like this:
ID Paid Interest
1 1000 3.51
3 2.50 0.00
4 50.00 2.20
4 75.00 0.10
I tried creating something using a window function but couldn't come close. Anyone have any ideas?
Hmmm . . . this approach splits the columns apart and then re-combines them.
Try this:
select coalesce(ti.id, tp.id) as id,
       coalesce(tp.paid, '0.00') as paid,
       coalesce(ti.interest, '0.00') as interest
from (select t.*,
             row_number() over (partition by id order by paid) as seqnum
      from @t t
      where paid <> '0.00'
     ) tp full join
     (select t.*,
             row_number() over (partition by id order by interest) as seqnum
      from @t t
      where interest <> '0.00'
     ) ti
     on tp.id = ti.id and tp.seqnum = ti.seqnum;

Count length of consecutive duplicate values for each id

I have a table as shown in the screenshot (first two columns) and I need to create a column like the last one. I'm trying to calculate the length of each sequence of consecutive values for each id.
For this, the last column is required. I played around with
row_number() over (partition by id, value)
but did not have much success, since the circled number was (quite predictably) computed as 2 instead of 1.
Please help!
First of all, we need a way to define how the rows are ordered. For example, in your sample data there is no way to be sure that the 'first' row (1, 1) will always appear before the 'second' row (1, 0).
That's why in my sample data I have added an identity column. In your real case, the rows could be ordered by a row ID, a date column or something else, but you need to ensure they can be sorted by a unique criterion.
So, the task is pretty simple:
calculate a switch flag - set when the value changes
calculate group numbers - a running sum of the switch flag
calculate row numbers within each group
That's it. I have used common table expressions and left all the columns in so the logic is easy to follow. You are free to break this into separate statements and remove some of the columns.
DECLARE @DataSource TABLE
(
    [RowID] INT IDENTITY(1, 1)
   ,[ID] INT
   ,[value] INT
);
INSERT INTO @DataSource ([ID], [value])
VALUES (1, 1)
,(1, 0)
,(1, 0)
,(1, 1)
,(1, 1)
,(1, 1)
--
,(2, 0)
,(2, 1)
,(2, 0)
,(2, 0);
WITH DataSourceWithSwitch AS
(
SELECT *
,IIF(LAG([value]) OVER (PARTITION BY [ID] ORDER BY [RowID]) = [value], 0, 1) AS [Switch]
FROM @DataSource
), DataSourceWithGroup AS
(
SELECT *
,SUM([Switch]) OVER (PARTITION BY [ID] ORDER BY [RowID]) AS [Group]
FROM DataSourceWithSwitch
)
SELECT *
,ROW_NUMBER() OVER (PARTITION BY [ID], [Group] ORDER BY [RowID]) AS [GroupRowID]
FROM DataSourceWithGroup
ORDER BY [RowID];
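An equivalent, more compact formulation (a sketch, not part of the original answer) is the classic difference-of-row-numbers trick; it must run in the same batch as the @DataSource declaration above:
SELECT [RowID], [ID], [value],
       ROW_NUMBER() OVER (PARTITION BY [ID], [value], [Grp] ORDER BY [RowID]) AS [GroupRowID]
FROM
(
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY [ID] ORDER BY [RowID])
         - ROW_NUMBER() OVER (PARTITION BY [ID], [value] ORDER BY [RowID]) AS [Grp]
    FROM @DataSource
) s
ORDER BY [RowID];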
You want results that are dependent on actual data ordering in the data source. In SQL you operate on relations, sometimes on ordered set of relations rows. Your desired end result is not well-defined in terms of SQL, unless you introduce an additional column in your source table, over which your data is ordered (e.g. auto-increment or some timestamp column).
Note: this answers the original question and doesn't take into account additional timestamp column mentioned in the comment. I'm not updating my answer since there is already an accepted answer.
One way to solve it could be through a recursive CTE:
create table #tmp (i int identity,id int, value int, rn int);
insert into #tmp (id,value) VALUES
(1,1),(1,0),(1,0),(1,1),(1,1),(1,1),
(2,0),(2,1),(2,0),(2,0);
WITH numbered AS (
SELECT i,id,value, 1 seq FROM #tmp WHERE i=1 UNION ALL
SELECT a.i,a.id,a.value, CASE WHEN a.id=b.id AND a.value=b.value THEN b.seq+1 ELSE 1 END
FROM #tmp a INNER JOIN numbered b ON a.i=b.i+1
)
SELECT * FROM numbered -- OPTION (MAXRECURSION 1000)
This will return the following:
i id value seq
1 1 1 1
2 1 0 1
3 1 0 2
4 1 1 1
5 1 1 2
6 1 1 3
7 2 0 1
8 2 1 1
9 2 0 1
10 2 0 2
See my little demo here: https://rextester.com/ZZEIU93657
A prerequisite for the CTE to work is a sequenced table (e.g. a table with an identity column in it) as a source. In my example I introduced the column i for this. As a starting point I need to find the first entry of the source table. In my case this was the entry with i=1.
For a longer source table you might run into a recursion-limit error as the default for MAXRECURSION is 100. In this case you should uncomment the OPTION setting behind my SELECT clause above. You can either set it to a higher value (like shown) or switch it off completely by setting it to 0.
IMHO, this is easier to do with a cursor and a loop.
Maybe there is a way to do the job with a self-join:
declare #t table (id int, val int)
insert into #t (id, val)
select 1 as id, 1 as val
union all select 1, 0
union all select 1, 0
union all select 1, 1
union all select 1, 1
union all select 1, 1
;with cte1 (id , val , num ) as
(
select id, val, row_number() over (ORDER BY (SELECT 1)) as num from #t
)
, cte2 (id, val, num, N) as
(
select id, val, num, 1 from cte1 where num = 1
union all
select t1.id, t1.val, t1.num,
case when t1.id=t2.id and t1.val=t2.val then t2.N + 1 else 1 end
from cte1 t1 inner join cte2 t2 on t1.num = t2.num + 1 where t1.num > 1
)
select * from cte2

Calculating a field's value according to the values of the previous and next fields

For clarity assume that I have a table with a carID, a mileage and a date. The dates are always months (e.g. 01/02/2015, 01/03/2015, ...). Each carID has a row for each month, but not every row has a value for the mileage field; some are NULL.
Example table:
carID mileage date
-----------------------------------------
1 400 01/01/2015
1 NULL 01/02/2015
1 NULL 01/03/2015
1 1050 01/04/2015
If such a field is NULL I need to calculate what value it should have by looking at the previous and next values (these aren't necessarily the next or previous month, they can be months apart).
I want to do this by taking the difference of the previous and next values, then calculating the time between them and scaling the value according to that time. However, I have no idea how to do this.
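In other words, what is being asked for is plain linear interpolation. A minimal sketch of the arithmetic, using the two known readings from the example table above (400 on 01/01/2015, 1050 on 01/04/2015) to estimate the 01/02/2015 value:
-- estimate = prev + (next - prev) * months_since_prev / months_between_prev_and_next
SELECT 400
     + (1050 - 400)
       * DATEDIFF(month, '20150101', '20150201')
       / CAST(DATEDIFF(month, '20150101', '20150401') AS decimal(10, 2)) AS estimated_mileage;
-- 400 + 650 * 1 / 3.00, roughly 616.67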
I have already used a bit of code to look at the next value before, it looks like this:
, carKMcombiDiffList as (
select ml.*,
(ml.KM - mlprev.KM) as diff
from carKMcombilist ml outer apply
(select top 1 ml2.*
from carKMcombilist ml2
where ml2.FK_CarID = ml.FK_CarID and
ml2.beginmonth < ml.beginmonth
order by ml2.beginmonth desc
) mlprev
)
What this does is check whether the current value is larger than the previous value. I assume I can use this as well to look at the previous value in my current problem; I just don't know how to add the next one to it AND all the logic needed to make the calculations.
Assumption: CarID and date are always a unique combination
This is what I came up with:
select with_dates.*,
prev_mileage.mileage as prev_mileage,
next_mileage.mileage as next_mileage,
next_mileage.mileage - prev_mileage.mileage as mileage_delta,
datediff(month,prev_d,next_d) as month_delta,
(next_mileage.mileage - prev_mileage.mileage)/datediff(month,prev_d,next_d)*datediff(month,prev_d,with_dates.d) + prev_mileage.mileage as estimated_mileage
from (select *,
(select top 1 d
from mileage as prev
where carid = c.carid
and prev.d < c.d
and prev.mileage is not null
order by d desc ) as prev_d,
(select top 1 d
from mileage as next_rec
where carid = c.carid
and next_rec.d > c.d
and next_rec.mileage is not null
order by d asc) as next_d
from mileage as c
where mileage is null) as with_dates
join mileage as prev_mileage
on prev_mileage.carid = with_dates.carid
and prev_mileage.d = with_dates.prev_d
join mileage as next_mileage
on next_mileage.carid = with_dates.carid
and next_mileage.d = with_dates.next_d
Logic:
First, for every 'mileage is null' record I select the previous and next date where mileage is not null. After this I just join the rows based on carid and date and do some simple math to approximate.
Hope this helps, it was quite fun.
The following query obtains the previous and next available mileages for a record.
with data as --test data
(
select * from (VALUES
(0, null, getdate()),
(1, 400, '20150101'),
(1, null, '20150201'),
(1, null, '20150301'),
(1, 1050, '20150401'),
(2, 300, '20150101'),
(2, null, '20150201'),
(2, null, '20150301'),
(2, 1235, '20150401'),
(2, null, '20150501'),
(2, 1450, '20150601'),
(3, 200, '20150101'),
(3, null, '20150201')
) as v(carId, mileage, [date])
where v.carId != 0
)
-- replace 'data' with your table name
select d.*,
(select top 1 mileage from data dprev where dprev.mileage is not null and dprev.carId = d.carId and dprev.[date] <= d.date order by dprev.[date] desc) as 'Prev available mileage',
(select top 1 mileage from data dnext where dnext.mileage is not null and dnext.carId = d.carId and dnext.[date] >= d.date order by dnext.[date] asc) as 'Next available mileage'
from data d
Note that these columns can still be null if there is no data available before/after a specific date.
From here it's up to you how you use these values. Probably you want to interpolate values for the records where mileage is missing.
Edit
In order to interpolate the values for missing mileages I had to compute three auxiliary columns:
ri - index of record in a continuous group where mileage is missing
gi - index of a continuous group where mileage is missing per car
gc - count of records per continuous group where mileage is missing
The limit columns from the query above were renamed to
pa (Previous Available) and
na (Next Available).
The query is not compact and I am sure it can be improved but the good part of the cascading CTEs is that you can easily check intermediary results and understand each step.
SQL Fiddle: SO 29363187
with data as --test data
(
select * from (VALUES
(0, null, getdate()),
(1, 400, '20150101'),
(1, null, '20150201'),
(1, null, '20150301'),
(1, 1050, '20150401'),
(2, 300, '20150101'),
(2, null, '20150201'),
(2, null, '20150301'),
(2, 1235, '20150401'),
(2, null, '20150501'),
(2, 1450, '20150601'),
(3, 200, '20150101'),
(3, null, '20150201')
) as v(carId, mileage, [date])
where v.carId != 0
),
-- replace 'data' with your table name
limits AS
(
select d.*,
(select top 1 mileage from data dprev where dprev.mileage is not null and dprev.carId = d.carId and dprev.[date] <= d.date order by dprev.[date] desc) as pa,
(select top 1 mileage from data dnext where dnext.mileage is not null and dnext.carId = d.carId and dnext.[date] >= d.date order by dnext.[date] asc) as na
from data d
),
t1 as
(
SELECT l.*,
case when mileage is not null
then null
else row_number() over (partition by l.carId, l.pa, l.na order by l.carId, l.[date])
end as ri, -- index of record in a continuous group where mileage is missing
case when mileage is not null
then null
else dense_rank() over (partition by carId order by l.carId, l.pa, l.na)
end as gi -- index of a continuous group where mileage is missing per car
from limits l
),
t2 as
(
select *,
(select count(*) from t1 tm where tm.carId = t.carId and tm.gi = t.gi) gc --count of records per continuous group where mileage is missing
FROM t1 t
)
select *,
case when mileage is NULL
then pa + (na - pa) / (gc + 1.0) * ri -- also converts from integer to decimal
else NULL
end as 'Interpolated value'
from t2
order by carId, [date]
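As a quick sanity check against the test data (not part of the original answer): for carId 2 the single missing row between 1235 (20150401) and 1450 (20150601) has pa = 1235, na = 1450, gc = 1, ri = 1, giving 1235 + (1450 - 1235) / 2.0 * 1 = 1342.5; the two missing rows between 300 and 1235 get 300 + 935 / 3.0 * 1 ≈ 611.67 and 300 + 935 / 3.0 * 2 ≈ 923.33.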

MSSQL ORDER BY Passed List

I am using Lucene to perform queries on a subset of SQL data which returns me a scored list of RecordIDs, e.g. 11,4,5,25,30 .
I want to use this list to retrieve a set of results from the full SQL Table by RecordIDs.
So SELECT * FROM MyFullRecord
where RecordID in (11,5,3,25,30)
I would like the retrieved list to maintain the scored order.
I can do it by using an Order by like so;
ORDER BY (CASE WHEN RecordID = 11 THEN 0
WHEN RecordID = 5 THEN 1
WHEN RecordID = 3 THEN 2
WHEN RecordID = 25 THEN 3
WHEN RecordID = 30 THEN 4
END)
I am concerned about the load on the server, especially if I am passing long lists of RecordIDs. Does anyone have experience with this, or how can I determine an optimum list length?
Are there any other ways to achieve this functionality in MSSQL?
Roger
You can load your list into a table or table variable with sort priorities, and then join your table with this sorting table.
CREATE TABLE #tSortOrder (RecordID INT, SortOrder INT)
INSERT INTO #tSortOrder (RecordID, SortOrder)
SELECT 11, 1 UNION ALL
SELECT 5, 2 UNION ALL
SELECT 3, 3 UNION ALL
SELECT 25, 4 UNION ALL
SELECT 30, 5
SELECT *
FROM yourTable T
LEFT JOIN #tSortOrder S ON T.RecordID = S.RecordID
ORDER BY S.SortOrder
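Note that the LEFT JOIN above keeps every row of yourTable. If the sort table should also act as the filter (replacing the IN list from the question), an INNER JOIN does both; a small variation on the query above:
SELECT T.*
FROM yourTable T
INNER JOIN #tSortOrder S ON T.RecordID = S.RecordID
ORDER BY S.SortOrder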
Instead of creating a searched CASE expression in the ORDER BY, you could create an in-memory table to join. It's easier on the eyes and definitely scales better.
SQL Statement
SELECT mfr.*
FROM MyFullRecord mfr
INNER JOIN (VALUES (1, 11),
                   (2, 5),
                   (3, 3),
                   (4, 25),
                   (5, 30)
           ) q(ID, RecordID)
    ON q.RecordID = mfr.RecordID
ORDER BY q.ID
Something like:
SELECT * FROM MyFullRecord where RecordID in (11,5,3,25,30)
ORDER BY
CHARINDEX(','+CAST(RecordID AS varchar)+',',
','+'11,5,3,25,30'+',')

Newbie: Cache event-change table with every date

I have a table of items which change status every few weeks. I want to look at an arbitrary day and figure out how many items were in each status.
For example:
tbl_ItemHistory
ItemID
StatusChangeDate
StatusID
Sample data:
1001, 1/1/2010, 1
1001, 4/5/2010, 2
1001, 6/15/2010, 4
1002, 4/1/2010, 1
1002, 6/1/2010, 3
...
So I need to figure out how many items were in each status for a given day. So on 5/1/2010, there was one item (1001) in status 2 and one item in status 1 (1002).
I want to create a cached table every night that has a row for every item and every day of the year so I can show status changes over time in a chart. I don't know much about SQL. I was thinking about using a for loop, but based on some of the creative answers I've seen on the forum, I doubt that's the right way.
I'm using SQL Server 2008R2
I looked around and I think this is similar to this question: https://stackoverflow.com/questions/11183164/show-data-change-over-time-in-a-chart but that one wasn't answered. Is there a way to do these things?
A coworker showed me a cool way to do it so I thought I would contribute it to the community:
declare @test table (ItemID int, StatusChangeDate datetime, StatusId tinyint);
insert @test values
(1001, '1/1/2010', 1),
(1001, '4/5/2010', 2),
(1001, '6/15/2010', 4),
(1002, '4/2/2010', 1),
(1002, '6/1/2010', 3);
with
itzik1(N) as (
select 1 union all select 1 union all
select 1 union all select 1), --4
itzik2(N) as (select 1 from itzik1 i cross join itzik1), --16
itzik3(N) as (select 1 from itzik2 i cross join itzik2), --256
itzik4(N) as (select 1 from itzik3 i cross join itzik3), --65536 (about 179 years)
tally(N) as (select row_number() over (order by (select null)) from itzik4)
select ItemID, StatusChangeDate, StatusId from(
select
test.ItemID,
dates.StatusChangeDate,
test.StatusId,
row_number() over (
partition by test.ItemId, dates.StatusChangeDate
order by test.StatusChangeDate desc) as rnbr
from @test test
join (
select dateadd(dd, N,
(select min(StatusChangeDate) from @test) --First possible date
) as StatusChangeDate
from tally) dates
on test.StatusChangeDate <= dates.StatusChangeDate
and dates.StatusChangeDate <= getdate()
) result
where rnbr = 1
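To get the per-day counts the question asks for, a possible follow-up (a sketch, assuming the result above has been inserted into a nightly cache table, here called ItemStatusByDay, with the same three columns) is a plain GROUP BY:
SELECT StatusChangeDate, StatusId, COUNT(*) AS ItemCount
FROM ItemStatusByDay
GROUP BY StatusChangeDate, StatusId
ORDER BY StatusChangeDate, StatusId;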