How to find duplicate sets of values in column SQL

I have a database table in SQL Server like this:
+----+--------+
| ID | Number |
+----+--------+
| 1 | 4 |
| 2 | 2 |
| 3 | 6 |
| 4 | 5 |
| 5 | 3 |
| 6 | 2 |
| 7 | 6 |
| 8 | 4 |
| 9 | 5 |
| 10 | 1 |
| 11 | 6 |
| 12 | 4 |
| 13 | 2 |
| 14 | 6 |
+----+--------+
I want to find every place in column Number where the values match the last row (or the last 2 rows, or the last 3 rows, and so on). For each match, I want to get the value that appears next and count how many times each next value appears.
The result should look like this:
Matching the last row (6):
The numbers that follow 6 in column Number are 4 and 5.
The pair 6,4 appears 2 times and the pair 6,5 appears 1 time.
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 6 | 5 | 1 |
| 6 | 4 | 2 |
+---------------------+-------------------------+--------------+
Matching the last two rows (2,6):
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 2,6 | 5 | 1 |
| 2,6 | 4 | 1 |
+---------------------+-------------------------+--------------+
Matching the last 3 rows (4,2,6):
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 4,2,6 | 5 | 1 |
+---------------------+-------------------------+--------------+
And so on for the last 4, 5, 6, ... rows, until Times appear returns 0:
+---------------------+-------------------------+--------------+
| "Condition to find" | "Next Number in column" | Times appear |
+---------------------+-------------------------+--------------+
| 6,4,2,6 | | |
+---------------------+-------------------------+--------------+
Any idea how to do this? Thanks so much!

Here's an answer that uses the LEAD function, which (given an ordering) returns a value from a set number of rows ahead.
It extends each row of your table, which holds one number, to also include the next 3 numbers.
You can then join and group on those columns to count how often each combination occurs.
CREATE TABLE #Src (Id int PRIMARY KEY, Num int)
INSERT INTO #Src (Id, Num) VALUES
( 1, 4),
( 2, 2),
( 3, 6),
( 4, 5),
( 5, 3),
( 6, 2),
( 7, 6),
( 8, 4),
( 9, 5),
(10, 1),
(11, 6),
(12, 4),
(13, 2),
(14, 6)
CREATE TABLE #SrcWithNext (Id int PRIMARY KEY, Num int, Next1 int, Next2 int, Next3 int)
-- First step - use LEAD to get the next 1, 2, 3 values
INSERT INTO #SrcWithNext (Id, Num, Next1, Next2, Next3)
SELECT ID, Num,
LEAD(Num, 1, NULL) OVER (ORDER BY Id) AS Next1,
LEAD(Num, 2, NULL) OVER (ORDER BY Id) AS Next2,
LEAD(Num, 3, NULL) OVER (ORDER BY Id) AS Next3
FROM #Src
SELECT * FROM #SrcWithNext
/* Find number with each combination */
-- 2 chars
SELECT A.Num, A.Next1, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B ON A.Num = B.Num AND A.Next1 = B.Next1
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1
ORDER BY A.Num, A.Next1
-- 3 chars
SELECT A.Num, A.Next1, A.Next2, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1, Next2 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B
ON A.Num = B.Num
AND A.Next1 = B.Next1
AND A.Next2 = B.Next2
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1, A.Next2
ORDER BY A.Num, A.Next1, A.Next2
-- 4 chars
SELECT A.Num, A.Next1, A.Next2, A.Next3, COUNT(*) AS Num_Instances
FROM (SELECT DISTINCT Num, Next1, Next2, Next3 FROM #SrcWithNext) AS A
INNER JOIN #SrcWithNext AS B
ON A.Num = B.Num
AND A.Next1 = B.Next1
AND A.Next2 = B.Next2
AND A.Next3 = B.Next3
WHERE A.Num <= B.Num
GROUP BY A.Num, A.Next1, A.Next2, A.Next3
ORDER BY A.Num, A.Next1, A.Next2, A.Next3
Here's a db<>fiddle to check.
Notes
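The join already requires A.Num = B.Num, so the A.Num <= B.Num filter is always true and can be removed. Combinations that contain a NULL (the last rows, which have no next value) are not counted, because NULL = NULL is not true in the join condition.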
This answer finds all combinations. To filter, it currently would need to filter as separate columns e.g., instead of 2,6, you'd filter on Num = 2 AND Next1 = 6. Feel free to then do various text/string concatenation functions to create references for your preferred search/filter approach.
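To make the filtering step concrete, here is a minimal sketch that keeps only the combination ending the table and counts what follows it, for a pattern of length 2. It is an illustration under assumptions, not the answer's own code: it runs on SQLite through Python's sqlite3 module as a stand-in for SQL Server (it needs SQLite 3.25 or later for window functions), the table is named Src rather than #Src, and LIMIT 1 and || would be TOP (1) and CONCAT in T-SQL.

```python
import sqlite3

# Sample data from the question; "Src" stands in for the #Src temp table.
rows = [(1, 4), (2, 2), (3, 6), (4, 5), (5, 3), (6, 2), (7, 6),
        (8, 4), (9, 5), (10, 1), (11, 6), (12, 4), (13, 2), (14, 6)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Src (Id INTEGER PRIMARY KEY, Num INTEGER)")
conn.executemany("INSERT INTO Src (Id, Num) VALUES (?, ?)", rows)

# w pairs each row with its previous value (LAG) and its next value (LEAD).
# last_row is the (Prev1, Num) pair that ends the table, i.e. the pattern to find.
query = """
WITH w AS (
    SELECT Id,
           Num,
           LAG(Num, 1)  OVER (ORDER BY Id) AS Prev1,
           LEAD(Num, 1) OVER (ORDER BY Id) AS Next1
    FROM Src
),
last_row AS (
    SELECT Prev1, Num FROM w ORDER BY Id DESC LIMIT 1
)
SELECT w.Prev1 || ',' || w.Num AS pattern,
       w.Next1                 AS next_number,
       COUNT(*)                AS times_appear
FROM w
JOIN last_row AS l ON w.Prev1 = l.Prev1 AND w.Num = l.Num
WHERE w.Next1 IS NOT NULL          -- the final occurrence has nothing after it
GROUP BY w.Prev1, w.Num, w.Next1
ORDER BY w.Next1
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

For longer patterns you would add more LAG columns (LAG(Num, 2), LAG(Num, 3), ...) and extend the join condition in the same way.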

Hmmm . . . I am thinking that you want to create the "pattern to find" as a string. Unfortunately, string_agg() is not a windowing function, but you can use apply:
select t.*, p.*
from t cross apply
(select string_agg(number, ',') within group (order by id) as pattern
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
) p;
You would change the "3" to whatever number of rows that you want.
Then you can use this to identify the rows where the patterns are matched and aggregate:
with tp as (
select t.*, p.*
from t cross apply
(select string_agg(number, ',') within group (order by id) as pattern
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
) p
)
select pattern_to_find, next_number, count(*)
from (select tp.*,
first_value(pattern) over (order by id desc) as pattern_to_find,
lead(number) over (order by id) as next_number
from tp
) tp
where pattern = pattern_to_find
group by pattern_to_find, next_number;
Here is a db<>fiddle.
If you are using an older version of SQL Server -- one that doesn't support string_agg() -- you can calculate the pattern using lag():
with tp as (
select t.*,
concat(lag(number, 2) over (order by id), ',',
lag(number, 1) over (order by id), ',',
number
) as pattern
from t
)
Actually, if you have a large amount of data, it would be interesting to know which is faster -- the apply version or the lag() version. I suspect that lag() might be faster.
EDIT:
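In versions of SQL Server that don't support string_agg(), you can build the pattern using FOR XML PATH. Note that this version lists the values newest-first and leaves a trailing comma: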
select t.*, p.*
from t cross apply
(select (select cast(number as varchar(255)) + ','
from (select top (3) t2.*
from t t2
where t2.id <= t.id
order by t2.id desc
) t2
order by t2.id desc
for xml path ('')
) as pattern
) p
You can use similar logic for lead().

I tried to solve this problem by converting the "Number" column to a string.
Here is my code, using a function that takes the number of last rows to match as input.
(Note that the main table is named "test".)
create function duplicate(@nlast int)
returns @temp table (RowNumbers varchar(20), Number varchar(1))
as
begin
    -- Build one string from all Number values (assumes single-digit values and IDs 1..n with no gaps)
    declare @num varchar(20)
    set @num=''
    declare @count int=1
    while @count <= (select count(id) from test)
    begin
        set @num = @num + cast((select Number from test where @count=ID) as varchar(20))
        set @count=@count+1
    end
    -- The pattern to find is the last @nlast characters of that string
    declare @lastnum varchar(20)
    set @lastnum= (select RIGHT(@num,@nlast))
    declare @count2 int=1
    -- Scan every earlier position and record the character that follows each match
    while @count2 <= len(@num)-@nlast
    begin
        if (SUBSTRING(@num,@count2,@nlast) = @lastnum)
        begin
            insert into @temp
            select @lastnum ,SUBSTRING(@num,@count2+@nlast,1)
        end
        set @count2=@count2+1
    end
    return
end
go
select RowNumbers AS "Condition to find", Number AS "Next Number in column" , COUNT(Number) AS "Times appear" from dbo.duplicate(2)
group by Number, RowNumbers

Related

SQL Pivot 2nd step -- Combine rows / Remove nulls

I am trying to pivot data. We start with two columns: ListNum and Value
The row "integrity" doesn't matter; I just want all the values to collapse upwards and remove the nulls.
In this case ListNum is like an enum, the values are limited to List1, List2, or List3. Notice that they are not in order (1,3,3,1,2 rather than 1,2,3,1,2,3 etc.).
It would be nice to have a solution that uses standard sql so it would work across many databases.
Starting Point:
+---------+------------+
| ListNum | Value |
+---------+------------+
| List1 | A |
| List3 | 123 |
| List3 | CDE |
| List1 | Somestring |
| List2 | randString |
+---------+------------+
I was able to separate the Lists into columns with:
select
case when ListNum = 'List1' then Value end as List1,
case when ListNum = 'List2' then Value end as List2,
case when ListNum = 'List3' then Value end as List3
from Table;
Midpoint:
+------------+------------+-------+
| List1 | List2 | List3 |
+------------+------------+-------+
| A | NULL | NULL |
| NULL | NULL | 123 |
| NULL | NULL | CDE |
| Somestring | NULL | NULL |
| NULL | randString | NULL |
+------------+------------+-------+
but now I need to collapse upwards/remove the nulls to get -
Desired Output:
+------------+------------+-------+
| List1 | List2 | List3 |
+------------+------------+-------+
| A | randString | 123 |
| Somestring | NULL | CDE |
+------------+------------+-------+

Aren't you missing some kind of grouping criterion? How do you determine, that A belongs to 123 and not to CDE? And why is randString in the first line and not in the second?
This is easy, with such a grouping key:
DECLARE @tbl TABLE(GroupingKey INT, ListNum VARCHAR(100),[Value] VARCHAR(100));
INSERT INTO @tbl VALUES
(1,'List1','A')
,(1,'List3','123')
,(2,'List3','CDE')
,(2,'List1','Somestring')
,(1,'List2','randString');
SELECT p.*
FROM @tbl
PIVOT
(
MAX([Value]) FOR ListNum IN(List1,List2,List3)
) p;
But with your data this seems rather random...
UPDATE: A random approach...
The following approach will sort the values into their columns rather randomly:
DECLARE @tbl TABLE(ListNum VARCHAR(100),[Value] VARCHAR(100));
INSERT INTO @tbl VALUES
('List1','A')
,('List3','123')
,('List3','CDE')
,('List1','Somestring')
,('List2','randString');
--This will use three independent, but numbered sets and join them:
WITH All1 AS (SELECT [Value],ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RandomNumber FROM @tbl WHERE ListNum='List1')
,All2 AS (SELECT [Value],ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RandomNumber FROM @tbl WHERE ListNum='List2')
,All3 AS (SELECT [Value],ROW_NUMBER() OVER(ORDER BY (SELECT NULL)) AS RandomNumber FROM @tbl WHERE ListNum='List3')
SELECT All1.[Value] AS List1
,All2.[Value] AS List2
,All3.[Value] AS List3
FROM All1
FULL OUTER JOIN All2 ON All1.RandomNumber=All2.RandomNumber
--COALESCE keeps the rows aligned when List1 has fewer rows than the other lists
FULL OUTER JOIN All3 ON COALESCE(All1.RandomNumber,All2.RandomNumber)=All3.RandomNumber ;
Hint: There is no implicit sort order in your table!
From your comment:
It’s simply the index / instance number. randString is the first non-null row.
Without a specific ORDER BY the same SELECT may return your data in any random order. So there is no first non-null row, at least not in the meaning of first comes before second...

Something with recursive CTE may work:
DECLARE @tbl TABLE( ListNum VARCHAR(100),[Value] VARCHAR(100));
INSERT INTO @tbl VALUES
( 'List1','A')
,( 'List3','123')
,( 'List3','CDE')
,( 'List1','Somestring')
,( 'List2','randString');
-- @mmax is the row count of the largest list, i.e. the number of output rows
DECLARE @mmax int;
SELECT @mmax = cnt from (SELECT TOP 1 count(*) cnt from @tbl group by ListNum ORDER BY count(*) DESC) t;
With rec AS (
SELECT 1 AS num
UNION ALL
SELECT num+1 FROM rec WHERE num+1<=@mmax
)
SELECT t1.List1, t2.List2, t3.List3 FROM rec
FULL JOIN (
select Value as List1, row_number() over(order by ListNum) rn from @tbl where ListNum = 'List1'
) t1
ON rec.num = t1.rn
FULL JOIN
(
select Value as List2, row_number() over(order by ListNum) rn from @tbl where ListNum = 'List2'
) t2
ON rec.num = t2.rn
FULL JOIN
(
select Value as List3, row_number() over(order by ListNum) rn from @tbl where ListNum = 'List3'
) t3
ON rec.num = t3.rn;

Ordinarily you'd use an aggregate such as MAX because it ignores NULLs (a NULL can never be the max unless the group has no non-NULL value). But your query is a bit odd because you don't seem to have a solid pivot anchor, so any value can end up associated with any other. In the real world this probably wouldn't happen because it isn't particularly useful.
Better example data:
Person, Attribute, Value
1, Name, John
1, Age, 10
2, Name, Sarah
3, Age, 39
Pivoting query:
SELECT
Person,
MAX(case when attribute = 'name' then value end) as name,
MAX(case when attribute = 'age' then value end) as age
FROM
data
GROUP BY person
Result:
Person, Name, Age
1, John, 10
2, Sarah, NULL
3, NULL, 39
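For the question's own data, the same MAX(CASE ...) idea can be combined with a per-list row number that acts as the missing pivot anchor. The sketch below is one possible way to do that, not a definitive answer: it assumes a hypothetical seq column that records the intended row order (the question's table has no such column), the table name lists is made up, and it runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later) rather than on any particular production database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# seq is a hypothetical column that records the intended row order.
conn.execute("CREATE TABLE lists (seq INTEGER PRIMARY KEY, ListNum TEXT, Value TEXT)")
conn.executemany(
    "INSERT INTO lists (seq, ListNum, Value) VALUES (?, ?, ?)",
    [(1, "List1", "A"), (2, "List3", "123"), (3, "List3", "CDE"),
     (4, "List1", "Somestring"), (5, "List2", "randString")],
)

# Number the rows inside each list, then use that number as the grouping key.
# MAX() skips the NULLs produced by the CASE expressions, which collapses the rows upwards.
collapse_sql = """
WITH numbered AS (
    SELECT ListNum, Value,
           ROW_NUMBER() OVER (PARTITION BY ListNum ORDER BY seq) AS rn
    FROM lists
)
SELECT MAX(CASE WHEN ListNum = 'List1' THEN Value END) AS List1,
       MAX(CASE WHEN ListNum = 'List2' THEN Value END) AS List2,
       MAX(CASE WHEN ListNum = 'List3' THEN Value END) AS List3
FROM numbered
GROUP BY rn
ORDER BY rn
"""
collapsed = conn.execute(collapse_sql).fetchall()
for row in collapsed:
    print(row)
```

Because the row number is derived from an explicit ORDER BY, the output is deterministic, which is the point the other answers make about tables having no implicit order.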

SQL Server retrieving multiple columns with rank 1

I will try to describe my issue as clearly as possible.
I have a dataset of 1000 unique clients, say ##temp1.
I have another dataset which holds the information related to the 1000 clients from ##temp1 across the past 7 years. Let's call this dataset ##temp2. There are 6 specific columns in this second dataset (##temp2) that I am interested in; let's call them columns A, B, C, D, E, F. Columns A, C, E hold a year of some form as data type float (2012, 2013, 2014, ...) and columns B, D, F hold ratings of some form (1, 2, 3, ... up to 5), also as data type float. Both the year and rating columns have NULL values, which I have converted to 0 for now.
My eventual goal is to create a report which holds the information for the 1000 clients in ##temp1, such that each row should hold the information in the following form,
ClientID | ClientName | ColA_Latest_Year1 | ColB_Corresponding_Rating_Year_1 | ColC_Latest_Year2 | ColD_Corresponding_Rating_Year_2 | ColE_Latest_Year3 | ColF_Corresponding_Rating_Year3.
ColA_Latest_Year1 should hold the latest year for that particular client from dataset ##temp2 and ColB_Corresponding_Rating_Year_1 should hold the rating from Column B corresponding to the year pulled in from Column A. Same goes for the other columns.
The approach which I have taken so far, was,
Create ##temp1 as needed
Create ##temp2 as needed
##temp1 LEFT JOIN ##temp2 on client ids to retrieve the year and rating information for all the clients in ##temp1 and put all that information in ##temp3. There will be multiple rows for every client in ##temp3 because the data in ##temp3 is for multiple years.
Ranked the year columns (A, C, E), partitioned by client id, and put the result in ##temp4.
What I have now is something like this,
Rnk_A | Rnk_C | Rnk_F | ColA | ColB | ColC | ColD | ColE | ColF | Client_id | Client_name
2 | 1 | 1 | 0 | 0 | 0 | 0 | 2014 | 1 | 111 | 'ABC'
1 | 2 | 1 | 2012 | 1 | 0 | 0 | 0 | 0 | 111 | 'ABC'
My goal is
Rnk_A | Rnk_C | Rnk_F | ColA | ColB | ColC | ColD | ColE | ColF | Client_id | Client_name
1 | 1 | 1 | 2012| 1 | 0 | 0 | 2014| 1 | 111 | 'ABC'
Any help is appreciated.

This answer assumes you don't have any duplicates per client in columns A, C, E. If you do have duplicates you'd need to find a way to differentiate them and make the necessary changes.
The hurdle that you've failed to overcome in your attempt (as described) is that you're trying to join from temp1 to temp2 only once for lookup information that could come from 3 distinct rows of temp2. This cannot work as you hope. You must perform separate joins for each pair [A,B] [C,D] and [E,F]. The following demonstrates a solution using CTEs to derive the lookup data for each pair.
/********* Prepare sample tables and data ***********/
declare @t1 table (
    ClientId int,
    ClientName varchar(50)
)
declare @t2 table (
    ClientId int,
    ColA datetime,
    ColB float,
    ColC datetime,
    ColD float,
    ColE datetime,
    ColF float
)
insert into @t1
select 1, 'Client 1' union all
select 2, 'Client 2' union all
select 3, 'Client 3' union all
select 4, 'Client 4'
insert into @t2
select 1, '20001011', 1, '20010101', 7, '20130101', 14 union all
select 1, '20040101', 4, '20170101', 1, '20120101', 1 union all
select 1, '20051231', 0, '20020101', 15, '20110101', 1 union all
select 2, '20060101', 2, NULL, 15, '20110101', NULL union all
select 2, '20030101', 3, NULL, NULL, '20100101', 17 union all
select 3, NULL, NULL, '20170101', 42, NULL, NULL
--select * from @t1
--select * from @t2
/********* Solution ***********/
-- One CTE per [date, rating] pair; rn = 1 marks the latest date for each client
;with MaxA as (
    select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColA DESC) rn,
        t2.ClientId, t2.ColA, t2.ColB
    from @t2 t2
    --where t2.ColA is not null and t2.ColB is not null
), MaxC as (
    select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColC DESC) rn,
        t2.ClientId, t2.ColC, t2.ColD
    from @t2 t2
    --where t2.ColC is not null and t2.ColD is not null
), MaxE as (
    select ROW_NUMBER() OVER (PARTITION BY t2.ClientId ORDER BY t2.ColE DESC) rn,
        t2.ClientId, t2.ColE, t2.ColF
    from @t2 t2
    --where t2.ColE is not null and t2.ColF is not null
)
select t1.ClientId, t1.ClientName, a.ColA, a.ColB, c.ColC, c.ColD, e.ColE, e.ColF
from @t1 t1
left join MaxA a on
    a.ClientId = t1.ClientId
    and a.rn = 1
left join MaxC c on
    c.ClientId = t1.ClientId
    and c.rn = 1
left join MaxE e on
    e.ClientId = t1.ClientId
    and e.rn = 1
If you run this you may notice some peculiar results for Client 2 in columns C and F. This is because (as per your question) there may be some NULL values. ColC date is "unknown" and ColF rating is "unknown".
My solution preserves NULL values instead of converting them to zeroes. This allows you to handle them explicitly if you so choose. I commented out lines in the above query that could be used to ignore NULL dates and ratings if necessary.

How to order an already ordered subquery

Creating this table:
CREATE TABLE #Test (id int, name char(10), list int, priority int)
INSERT INTO #Test VALUES (1, 'One', 1, 1)
INSERT INTO #Test VALUES (2, 'Two', 2, 1)
INSERT INTO #Test VALUES (3, 'Three', 3, 2)
INSERT INTO #Test VALUES (4, 'Four', 4, 1)
INSERT INTO #Test VALUES (5, 'THREE', 3, 1)
and ordering it by list and priority:
SELECT * FROM #Test ORDER BY list, priority
1 | One | 1 | 1
2 | Two | 2 | 1
5 | THREE | 3 | 1
3 | Three | 3 | 2
4 | Four | 4 | 1
However I want to step through rows one by one selecting the top one for each list ordered by priority, and start over when I get to the end.
For example with this simpler table:
1 | One | 1 | 1
2 | Two | 2 | 1
3 | Three | 3 | 1
4 | Four | 4 | 1
and this query:
SELECT TOP 1 * FROM #Test ORDER BY (CASE WHEN list>@PreviousList THEN 1 ELSE 2 END)
If @PreviousList is the list value of the previous row I got, then this will select the next row and gracefully jump back to the top once I have selected the last row.
But there are rows that will have the same list only ordered by priority - like my first example:
1 | One | 1 | 1
2 | Two | 2 | 1
5 | THREE | 3 | 1
3 | Three | 3 | 2
4 | Four | 4 | 1
Here id=3 should be skipped because id=5 has the same list value and a better priority. The only way I can think of doing this is to first order the entire table by list and priority, and then apply the ORDER BY that steps through the rows one by one, like this:
SELECT TOP 1 * FROM (
SELECT * FROM #Test ORDER BY list, priority
) ORDER BY (CASE WHEN list>@PreviousList THEN 1 ELSE 2 END)
But of course I cannot order by an already ordered subquery and get the error:
The ORDER BY clause is invalid in views, inline functions, derived tables,
subqueries, and common table expressions, unless TOP or FOR XML is also
specified.
Is there any way to get past this problem, or to get this down to a single query and ORDER BY?

Another possible solution is to use a subquery to select the min priority grouped by list and join it back to the table for the rest of the details
SELECT T2.*
FROM (SELECT MIN(priority) as priority, list
FROM #Test
GROUP BY list) AS T1
INNER JOIN #Test T2 ON T1.list = T2.list AND T1.priority = T2.priority
ORDER BY T1.list, T1.priority

I want to step through rows one by one selecting the top one for each
list ordered by priority, and start over when I get to the end.
You can use the built in ROW_NUMBER function that is designed for these scenarios with OVER(PARTITION BY name ORDER BY priority) to do this directly:
WITH CTE
AS
(
SELECT *, ROW_NUMBER() OVER(PARTITION BY name ORDER BY priority) AS RN
FROM #Test
)
SELECT *
FROM CTE
WHERE RN = 1;
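The ranking number RN generated by ROW_NUMBER() OVER(PARTITION BY name ORDER BY priority) numbers the rows within each group that shares the same name, ordered by priority. Filtering with WHERE RN = 1 then removes the duplicates and leaves only the row with the best priority for each name. Note that 'Three' and 'THREE' only fall into the same partition under a case-insensitive collation; partitioning by list avoids that dependency.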
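The ROW_NUMBER() filter can also be combined with the question's wrap-around ORDER BY in a single statement. The following is a rough sketch under a few assumptions rather than a drop-in solution: it partitions by list (the column the question actually steps through), it adds list as a tie-breaker to the CASE ordering, the helper next_row is a made-up name, and it runs on SQLite via Python's sqlite3 module (SQLite 3.25 or later), so LIMIT 1 stands in for TOP 1 and ? stands in for @PreviousList.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "Test" stands in for the #Test temp table from the question.
conn.execute("CREATE TABLE Test (id INTEGER, name TEXT, list INTEGER, priority INTEGER)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?, ?, ?)",
    [(1, "One", 1, 1), (2, "Two", 2, 1), (3, "Three", 3, 2),
     (4, "Four", 4, 1), (5, "THREE", 3, 1)],
)

def next_row(previous_list):
    """Return the best-priority row of the next list after previous_list,
    wrapping around to the lowest list when previous_list was the last one."""
    sql = """
    WITH best AS (
        SELECT id, name, list, priority,
               ROW_NUMBER() OVER (PARTITION BY list ORDER BY priority) AS rn
        FROM Test
    )
    SELECT id, name, list, priority
    FROM best
    WHERE rn = 1                                   -- one row per list: its best priority
    ORDER BY CASE WHEN list > ? THEN 1 ELSE 2 END, -- lists after the previous one come first
             list                                  -- then the nearest list wins
    LIMIT 1
    """
    return conn.execute(sql, (previous_list,)).fetchone()

print(next_row(2))  # row for list 3 with the better priority
print(next_row(4))  # wraps around to list 1
```

The ordering happens once, in the outer query, so no ordered subquery is needed.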

SELECT TOP 1 * FROM (
SELECT * FROM #Test
) AS t ORDER BY (CASE WHEN list>@PreviousList THEN 1 ELSE 2 END)
Try this, because ORDER BY is not allowed inside a derived table or CTE unless TOP is also specified.

Perhaps I am missing the requirement that makes this harder than I realize, but what about a nice simple join to select highest priority for the list. To scale, performance would require an index on list.
select t.*
, ttop.id as firstid
from #test t
JOIN #test ttop on ttop.id = (SELECT TOP 1 ID
FROM #TEST tbest
WHERE t.list = tbest.list order by priority)
and ttop.id = t.id -- this does the trick!

SELECT First Group

Problem Definition
I have an SQL query that looks like:
SELECT *
FROM table
WHERE criteria = 1
ORDER BY group;
Result
I get:
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
B | 2 | 1
B | 3 | 1
Expected Result
However, I would like to limit the results to only the first group (in this instance, A), i.e.,
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1
What I've tried
Group By
SELECT *
FROM table
WHERE criteria = 1
GROUP BY group;
I can aggregate the groups using a GROUP BY clause, but that would give me:
group | value
-------------
A | 0
B | 2
or some aggregate function of EACH group. However, I don't want to aggregate the rows!
Subquery
I can also specify the group by subquery:
SELECT *
FROM table
WHERE criteria = 1 AND
group = (
SELECT group
FROM table
WHERE criteria = 1
ORDER BY group ASC
LIMIT 1
);
This works, but as always, subqueries are messy. Particularly, this one requires specifying my WHERE clause for criteria twice. Surely there must be a cleaner way to do this.

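You can try the following query. The subquery needs the same criteria filter; otherwise it may pick a group that has no matching rows:
SELECT *
FROM table
WHERE criteria = 1
AND group = (SELECT MIN(group) FROM table WHERE criteria = 1)
ORDER BY value;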

If your database supports the WITH clause, try this. It's similar to using a subquery, but you only need to specify the criteria input once. It's also easier to understand what's going on.
with main_query as (
select *
from table
where criteria = 1
),
min_group as (
select min(group) as first_group from main_query
)
select *
from main_query
where group in (select first_group from min_group)
order by group, value;
-- this where clause should be fast since there will only be 1 record in min_group
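A runnable version of the WITH approach might look like the sketch below. It is illustrative only and makes a few assumptions: the names items, grp and first_grp are stand-ins (TABLE and GROUP are reserved words), the sample rows add a group A that fails the criteria so the effect of filtering once is visible, and it runs on SQLite through Python's sqlite3 module rather than on whichever database the question targets.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# "items" and "grp" are stand-in names; TABLE and GROUP are reserved words.
conn.execute("CREATE TABLE items (grp TEXT, value INTEGER, criteria INTEGER)")
conn.executemany(
    "INSERT INTO items VALUES (?, ?, ?)",
    [("A", 9, 0),              # lowest group overall, but it fails the criteria
     ("B", 0, 1), ("B", 1, 1),
     ("C", 2, 1), ("C", 3, 1)],
)

# The criteria filter appears once, in main_query; min_group reads from that
# filtered set, so it can only return a group that has qualifying rows.
first_group_sql = """
WITH main_query AS (
    SELECT grp, value, criteria
    FROM items
    WHERE criteria = 1
),
min_group AS (
    SELECT MIN(grp) AS first_grp FROM main_query
)
SELECT grp, value, criteria
FROM main_query
WHERE grp = (SELECT first_grp FROM min_group)
ORDER BY value
"""
first_group = conn.execute(first_group_sql).fetchall()
for row in first_group:
    print(row)
```

Group A is skipped because none of its rows satisfy the criteria, so the first qualifying group is B.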

Use DENSE_RANK()
DECLARE @yourTbl AS TABLE (
    [group] NVARCHAR(50),
    value INT,
    criteria INT
)
INSERT INTO @yourTbl VALUES ( 'A', 0, 1 )
INSERT INTO @yourTbl VALUES ( 'A', 1, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 2, 1 )
INSERT INTO @yourTbl VALUES ( 'B', 3, 1 )
-- DENSE_RANK gives every row of the same group the same number, so gn = 1 keeps the whole first group
;WITH cte AS
(
    SELECT i.* ,
        DENSE_RANK() OVER (ORDER BY i.[group]) AS gn
    FROM @yourTbl AS i
    WHERE i.criteria = 1
)
SELECT *
FROM cte
WHERE gn = 1
group | value | criteria
------------------------
A | 0 | 1
A | 1 | 1

MSSQL: Only last entry in GROUP BY (with id)

Following / copying computhomas's question, but adding some twists...
I have the following table in MSSQL2008
id | business_key | result | date
1 | 1 | 0 | 9
2 | 1 | 1 | 8
3 | 2 | 1 | 7
4 | 3 | n | 6
5 | 4 | 1 | 5
6 | 4 | 0 | 4
And now I want to group by business_key and return the complete row with the newest date for each key.
So my expected result is:
id | business_key | result | date
1 | 1 | 0 | 9
3 | 2 | 1 | 7
4 | 3 | n | 6
5 | 4 | 1 | 5
I bet there is a way to achieve that, I just can't find / see / think of it at the moment.
Edit: sorry about this, I actually meant something different from the question I originally asked. I felt that editing this one would be better than accepting a solution and posting another question. My original problem was that I am not filtering by id.

SELECT t.*
FROM
(
SELECT *, ROW_NUMBER() OVER
(
PARTITION BY [business_key]
ORDER BY [date] DESC
) AS [RowNum]
FROM yourTable
) AS t
WHERE t.[RowNum] = 1
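To check the ROW_NUMBER() approach against the question's sample data, here is a small sketch. It is an approximation under stated assumptions: it runs on SQLite through Python's sqlite3 module (SQLite 3.25 or later) instead of MSSQL, result is stored as text so the 'n' value from the question fits, and the extra id DESC tie-breaker is an addition for the case where two rows of one business_key share a date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (id INTEGER, business_key INTEGER, result TEXT, date INTEGER)"
)
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?, ?)",
    [(1, 1, "0", 9), (2, 1, "1", 8), (3, 2, "1", 7),
     (4, 3, "n", 6), (5, 4, "1", 5), (6, 4, "0", 4)],
)

# Number the rows of each business_key from newest to oldest date and keep number 1.
# The id DESC tie-breaker is an assumption for rows that share the same date.
latest_sql = """
SELECT id, business_key, result, date
FROM (
    SELECT *,
           ROW_NUMBER() OVER (
               PARTITION BY business_key
               ORDER BY date DESC, id DESC
           ) AS RowNum
    FROM yourTable
) AS t
WHERE RowNum = 1
ORDER BY id
"""
latest = conn.execute(latest_sql).fetchall()
for row in latest:
    print(row)
```

Unlike the MAX(id) variants, this does not depend on ids increasing with the date; in the sample data the ids run in the opposite direction.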

SELECT
*
FROM
mytable
WHERE
ID IN (SELECT MAX(ID) FROM mytable GROUP BY business_key)

SELECT
MAX(T1.id) AS [id],
T1.business_key,
T1.result
FROM
dbo.My_Table T1
LEFT OUTER JOIN dbo.My_Table T2 ON
T2.business_key = T1.business_key AND
T2.id > T1.id
WHERE
T2.id IS NULL
GROUP BY T1.business_key,
T1.result
ORDER BY MAX(T1.id)
Edited based on clarifications
SELECT M1.*
FROM My_Table M1
INNER JOIN
(
SELECT [business_key], MAX([date]) as MaxDate
FROM My_Table
GROUP BY [business_key]
) M2 ON M1.business_key = M2.business_key AND M1.[date] = M2.MaxDate
ORDER BY M1.[id]

Assuming the combination of business_key & date is unique then....
Working example (3rd time is a charm):
declare @src as table(id int, business_key int,result int,[date] int)
insert into @src
SELECT 1,1,0,9
UNION SELECT 2,1,1,8
UNION SELECT 3,2,1,7
UNION SELECT 4,3,1,6
UNION SELECT 5,4,1,5
UNION SELECT 6,4,0,4
-- bkdate holds the newest date per business_key
;with bkdate(business_key,[date])
AS
(
    select business_key,MAX([date])
    from @src
    group by business_key
)
select src.* from @src src
inner join bkdate
    ON src.[date] = bkdate.[date]
    and src.business_key = bkdate.business_key
order by id

How about (edited after question change):
with latestdate as (
select business_key, maxdate=max(date)
from the_table
group by business_key
), latest as (
select ID = max(id)
from the_table
inner join latestdate
on the_table.business_key=latestdate.business_key
and the_table.date=latestdate.maxdate
group by the_table.business_key
)
select the_table.*
from the_table
inner join latest
on latest.id=the_table.id
