SQL: Order by date and distinct Number without losing fields - sql

My knowledge about SQL is not the best but I need a quick solution for this. I have a table
number | date | text
1 | 2018-01-13 | A
2 | 2018-01-15 | B
1 | 2018-02-15 | C
Now I need to remove the duplicate value "number" in the output(select) based on the date. It should look like this:
number | date | text
2 | 2018-01-15 | B
1 | 2018-02-15 | C
I tried
SELECT DISTINCT number, date ORDER BY date DESC FROM table
The problem is that I now miss the field "text" in the output. I also tried
SELECT * DISTINCT(SELECT * number ORDER BY(date) DESC) FROM table
Any ideas?

One option uses ROW_NUMBER:
SELECT number, date, text
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY number ORDER BY date DESC) rn
FROM yourTable
) t
WHERE rn = 1;

You can achieve it using below query:
SELECT t1.number, t1.date, t1.text
FROM yourTable t1
INNER JOIN
(
SELECT number, MAX(date) AS max_date
FROM yourTable
GROUP BY number
) t2
ON t1.number = t2.number AND
t1.date = t2.max_date;

use below query this will work for ur requirement
SELECT number, date, text
FROM
(
SELECT *, ROW_NUMBER() OVER (PARTITION BY number ORDER BY date DESC) rn
FROM test
) t
WHERE rn = 1;

It can be done just using window functions:
create table #TEST ([NUMBER] int, [DATE] date, [TEXT] char(1))
insert into #TEST values (1, '2018-01-13', 'A'), (2, '2018-01-15', 'B'), (1, '2018-02-15', 'C')
SELECT DISTINCT [NUMBER]
, FIRST_VALUE([DATE]) OVER (PARTITION BY [NUMBER] ORDER BY [DATE] DESC) AS [DATE]
, FIRST_VALUE([TEXT]) OVER (PARTITION BY [NUMBER] ORDER BY [DATE] DESC) AS [TEXT]
FROM #TEST

Related

U-SQL Is it accepted to use more than one FIRST_VALUE to remove duplicates in specific columns?

I have a table where multiple rows are duplicates because of two date columns which their value are different.
I want to know if it is accepted to use FIRST_VALUE in both columns like this, to remove duplicate on specified columns:
SELECT
EmployeeName,
FIRST_VALUE(StartDateTime) OVER(ORDER BY UpdatedDateTime DESC) AS StartDateTime,
FIRST_VALUE(UpdatedDateTime) OVER(ORDER BY UpdatedDateTime DESC) AS UpdatedDateTime
FROM #Employees;
If you need to remove duplicates over some field, you need to use ROW_NUMBER() and CTE:
-- Sample data: dates duplicates
declare #t table (id int, dt date);
insert into #t values
(1, '2018-01-14'),
(1, '2018-01-14'),
(1, '2018-01-15'),
(1, '2018-01-15');
with cte as (
select *,
-- assign row number for each partition consisting of same date
row_number() over (partition by dt order by dt) as cnt
from #t
)
-- we're interested only in one row (i.e. first)
select id, dt from cte where cnt = 1;
/*
Output:
+-------+---------+
| id | dt |
+----+------------+
| 1 | 2018-01-14 |
|----|------------|
| 1 | 2018-01-15 |
+----+------------+
*/

Is it possible to find (in an ordered table) multiple rows in sequence? [duplicate]

This question already has an answer here:
Compare Current Row with Previous/Next row in SQL Server
(1 answer)
Closed 4 years ago.
If I have a table ordered by ID like so:
|---------------------|------------------|
| ID | Key |
|---------------------|------------------|
| 1 | Foo |
|---------------------|------------------|
| 2 | Bar |
|---------------------|------------------|
| 3 | Test |
|---------------------|------------------|
| 4 | Test |
|---------------------|------------------|
Is there a way to detect two rows that match a where clause in sequence?
For example, in the table above, I would like to see if any two rows in succession have a Key of 'test'.
Is this possible in SQL?
Another option is a variation of Gaps-and-Islands
Example
Declare #YourTable Table ([ID] int,[Key] varchar(50))
Insert Into #YourTable Values
(1,'Foo')
,(2,'Bar')
,(3,'Test')
,(4,'Test')
Select ID_R1 = min(ID)
,ID_R2 = max(ID)
,[Key]
From (
Select *
,Grp = ID-Row_Number() over(Partition By [Key] Order by ID)
From #YourTable
) A
Group By [Key],Grp
Having count(*)>1
Returns
ID_R1 ID_R2 Key
3 4 Test
EDIT - Just in case the IDs are NOT Sequential
Select ID_R1 = min(ID)
,ID_R2 = max(ID)
,[Key]
From (
Select *
,Grp = Row_Number() over(Order by ID)
-Row_Number() over(Partition By [Key] Order by ID)
From #YourTable
) A
Group By [key],Grp
Having count(*)>1
You can try to use ROW_NUMBER window function check the gap.
SELECT [Key]
FROM (
SELECT *,ROW_NUMBER() OVER(ORDER BY ID) -
ROW_NUMBER() OVER(PARTITION BY [Key] ORDER BY ID) grp
FROM T
)t1
GROUP BY [Key]
HAVING COUNT(grp) = 2
You can do a self join as
CREATE TABLE T(
ID INT,
[Key] VARCHAR(45)
);
INSERT INTO T VALUES
(1, 'Foo'),
(2, 'Bar'),
(3, 'Test'),
(4, 'Test');
SELECT MIN(T1.ID) One,
MAX(T2.ID) Two,
T1.[Key] OnKey
FROM T T1 JOIN T T2
ON T1.[Key] = T2.[Key]
AND
T1.ID <> T2.ID
GROUP BY T1.[Key];
Or a CROSS JOIN as
SELECT MIN(T1.ID) One,
MAX(T2.ID) Two,
T1.[Key] OnKey
FROM T T1 CROSS JOIN T T2
WHERE T1.[Key] = T2.[Key]
AND
T1.ID <> T2.ID
GROUP BY T1.[Key]
Demo
You can use the LEAD() window function, as in:
with
x as (
select
id, [key],
lead(id) over(order by id) as next_id,
lead([key]) over(order by id) as next_key
from my_table
)
select id, next_id from x where [key] = 'test' and next_key = 'test'

How to get rows from two tables on maximum value of particular field

I have two tables that has date_updated column.
TableA is like below
con_id date_updated type
--------------------------------------------
123 19/06/2018 2
123 15/06/2018 1
123 01/05/2018 3
101 06/04/2018 1
101 05/03/2018 2
And I have TableB that also has the same structure
con_id date_updated type
--------------------------------------------
123 15/05/2018 2
123 01/05/2018 1
101 07/06/2018 1
The resultant table should have the data with the recent date
con_id date_updated type
--------------------------------------------
123 19/06/2018 2
101 07/06/2018 1
Here the date_updated column is datetime datatype of sql server. I tried this by using group by and selecting the maximum date_updated. But i am not able to include column type in select statement. When i used type in group by ,the result is not correct as the type is also grouped. How can i query this. Please help
SELECT *
FROM
(SELECT *, ROW_NUMBER() OVER(Partition By con_id ORDER BY date_updated DESC) as seq
FROM
(SELECT * FROM TableA
UNION ALL
SELECT * FROM TableB) as tblMain) as tbl2
WHERE seq = 1
One method:
WITH A AS(
SELECT TOP 1 con_id,
date_updated,
type
FROM TableA
ORDER BY date_updated DESC),
B AS(
SELECT TOP 1 con_id,
date_updated,
type
FROM TableB
ORDER BY date_updated DESC),
U AS(
SELECT *
FROM A
UNION ALL
SELECT *
FROM B)
SELECT *
FROM U;
The 2 CTE's at the top get your most recent rows from the tables, and then the end statement unions them together.
For the benefit of the person who says this doesn't work:
USE Sandbox;
GO
CREATE TABLE tablea (con_id int, date_updated date, [type] tinyint);
CREATE TABLE tableb (con_id int, date_updated date, [type] tinyint);
GO
INSERT INTO tablea
VALUES
(123,'19/06/2018',2),
(123,'15/06/2018',1),
(123,'01/05/2018',3),
(101,'06/04/2018',1),
(101,'05/03/2018',2);
INSERT INTO tableb
VALUES
(123,'15/05/2018',2),
(123,'01/05/2018',1),
(101,'07/06/2018',1);
GO
WITH A AS(
SELECT TOP 1 con_id,
date_updated,
[type]
FROM TableA
ORDER BY date_updated DESC),
B AS(
SELECT TOP 1 con_id,
date_updated,
[type]
FROM TableB
ORDER BY date_updated DESC),
U AS(
SELECT *
FROM A
UNION ALL
SELECT *
FROM B)
SELECT *
FROM U;
GO
DROP TABLE tablea;
DROP TABLE tableb;
This returns the dataset:
con_id date_updated type
----------- ------------ ----
123 2018-06-19 2
101 2018-06-07 1
Which is identical to the OP's data:
con_id date_updated type
--------------------------------------------
123 19/06/2018 2
101 07/06/2018 1
Hope this helps:
WITH combined
AS(
select * FROM tableA
UNION
select * FROM tableB)
SELECT t1.con_id,
t1.date_updated,
t1.type
FROM (
SELECT con_id,
date_updated,
type,
row_number() OVER(partition BY con_id ORDER BY date_updated DESC) AS rownumber
FROM combined) t1
WHERE rownumber = 1;
Can be done using window functions:
declare #TableA table (con_id int, date_updated date, [type] int)
declare #TableB table (con_id int, date_updated date, [type] int)
insert into #TableA values
(123, '2018-06-19', 2)
, (123, '2018-06-15', 1)
, (123, '2018-05-01', 3)
, (101, '2018-04-06', 1)
, (101, '2018-03-05', 2)
insert into #TableB values
(123, '2018-05-15', 2)
, (123, '2018-05-01', 1)
, (101, '2018-06-07', 1)
select distinct con_id
, first_value(date_updated) over (partition by con_id order by con_id, date_updated desc) as con_id
, first_value([type]) over (partition by con_id order by con_id, date_updated desc) as [type]
from
(Select * from #TableA UNION Select * from #TableB) x

How to do an inner join to get rid of dupes

How can I use an inner join to get rid of the dupes that I dont want?
The table I'm working on looks like this:
ID Edited_date Status
------------------------
1 1/1/2015 A
1 1/1/2016 B
1 2/1/2016 C
2 1/1/2017 D
2 3/1/2017 B
3 1/1/2016 C
3 4/1/2017 B
3 1/1/2014 D
However, I only want the status of each loan from the most recent edited_date
ID Edited_date Status
------------------------
1 2/1/2016 C
2 3/1/2017 B
3 4/1/2017 B
select * from [table] t1
inner join
(
select ID, max(Edited_date) maxDt
from [Table]
group by ID
) t2
on t1.ID = t2.ID
and t1.Edited_date = t2.maxDt;
For select only:
SELECT *
FROM
(
SELECT *, ROW_NUMBER()OVER(PARTITION BY ID ORDER BY Edited_date desc) as Indicator
FROM TABLE_NAME
) as ABC
WHERE ABC.Indicator = 1
For delete:
WITH ABC
AS
(
SELECT *, ROW_NUMBER()OVER(PARTITION BY ID ORDER BY Edited_date desc) as Indicator
FROM TABLE_NAME
)
DELETE FROM ABC
WHERE ABC.Indicator != 1
using row_number() partitioned by id to get the latest edited_date
select id, edited_date, status
from (
select *
, rn = row_number() over (partition by id order by edited_date desc)
from t
) as s
where rn = 1
top with ties version:
select top 1 with ties
id
, edited_date
, status
from t
order by row_number() over (partition by id order by edited_date desc)
Begin Transaction
Create table #temp (Id int, Edited_date date, Status char(2))
Insert into #temp
Values
('1','1/1/2015','A'),
('1','1/1/2016','B'),
('1','2/1/2016','C'),
('2','1/1/2017','D'),
('2','3/1/2017','B'),
('3','1/1/2016','C'),
('3','4/1/2017','B'),
('3','1/1/2014','D')
Create table #temp2 (Id int, Edited_date date, Status char(2))
Insert into #temp2
Values
('1','2/1/2016','C'),
('2','3/1/2017','B'),
('3','4/1/2017','B')
/** emphasis on the below **/
;with cte as (
Select max(Edited_date) as Edited_date, Status From #temp Group By Status
union all
Select max(Edited_date) as Edited_date, Status From #temp2 Group By Status
)
Select Status, max(Edited_date) as Recent_Edited_date From cte Group By Status
/** End of Emphasis **/
Drop table #temp
Drop table #temp2
Rollback
--- Result ---
Status| Recent_Edited_date
A| 2015-01-01
B| 2017-04-01
C| 2016-02-01
D| 2017-01-01

select distinct list of ids from table with earliest value in same table

I have the following table,
SDate Id Balance
2016-01-01 ABC 3
2016-01-01 DEF 7
2016-01-01 GHI 2
2016-02-01 ABC 6
2016-02-01 DEF 4
2016-02-01 GHI 8
2016-02-01 XYZ 12
I need to write a query that gives me a distinct list of Id's over a date range (so in this example SDate >= '2016-01-01' and SDate <= '2016-02-01') but also give me the earliest balance so the result from the table above I would like to see is,
Id Balance
ABC 3
DEF 7
GHI 2
XYZ 12
Is this possible?
UPDATE
Sorry I should have specified that for each date the Id is unique.
You can do this with a derived table that first works out the minimum SDate value for each Id value. Using this you then join back to your original table to find the Balance for the row that matches those values:
declare #t table(SDate date,Id nvarchar(3),Balance int);
insert into #t values ('2016-01-01','ABC',3),('2016-01-01','DEF',7),('2016-01-01','GHI',2),('2016-02-01','ABC',6),('2016-02-01','DEF',4),('2016-02-01','GHI',8),('2016-02-01','XYZ',12);
declare #StartDate date = '20160101';
declare #EndDate date = '20160201';
with d as
(
select Id
,min(SDate) as MinSDate
from #t
where SDate between #StartDate and #EndDate
group by id
)
select d.Id
,t.Balance
from d
inner join #t t
on(d.Id = t.Id
and d.MinSDate = t.SDate
);
Output:
Id | Balance
----+--------
ABC | 3
DEF | 7
GHI | 2
XYZ | 12
This should be possible with a window function - all you have to do is
partition by id
assign a row number, and
select the top row for each id
Example:
select id,
balance
from (
select id,
balance,
row_number() over( partition by id order by SDate ) as row_num
from table1
where SDate between '2016-01-01' and '2016-02-01'
) as a
where row_num = 1
Note: the advantage of this method is it is a lot more flexible. Say you wanted the 2 oldest records, you could just change to where row_num <= 2.
Analytic row_number() should be the fastest
select *
from (
select
t.*,
row_number() over (partition by Id order by SDate) rn
from your_table t
) t where rn = 1;
You can achieve this with a self join, which may not be the fastest or most elegant solution:
CREATE TABLE #SOPostSample
(
SDate DATE ,
Id NVARCHAR(5) ,
Balance INT
);
INSERT INTO #SOPostSample
( SDate, Id, Balance )
VALUES ( '2016-01-01', 'ABC', 3 ),
( '2016-01-01', 'DEF', 7 ),
( '2016-01-01', 'GHI', 2 ),
( '2016-02-01', 'ABC', 6 ),
( '2016-02-01', 'DEF', 4 ),
( '2016-02-01', 'GHI', 8 ),
( '2016-02-01', 'XYZ', 12 );
SELECT t1.Id ,
MIN(t2.Balance) Balance
FROM #SOPostSample t1
INNER JOIN #SOPostSample t2 ON t1.Id = t2.Id
GROUP BY t1.Id ,
t2.SDate
HAVING t2.SDate = MIN(t1.SDate);
DROP TABLE #SOPostSample;
Produces:
id Balance
============
ABC 3
DEF 7
GHI 2
XYZ 12
This works for the sample data, but please test with more data as I just wrote it quickly.
This should work, Top 1 just inserted for safety, should not be needed if SDate and Id are unique in combination
SELECT o.Id ,
( SELECT TOP 1
Balance
FROM tbl
WHERE Id = o.Id
AND SDate = MIN(o.SDate)
) Balance
FROM tbl o
GROUP BY Id
HAVING sDate BETWEEN '20160101' AND '20160201';
You can use sub-query
SELECT Id ,
( SELECT TOP 1
Balance
FROM [TableName] AS T1
WHERE T1.Id = [TableName].Id
ORDER BY SDate
) AS Balance
FROM [TableName]
GROUP BY Id;