How can I SELECT distinct data based on a date field? - sql

I have a table that stores a log of changes to objects in another table. Here are my table contents:
ObjID Color Date User
------- ------- ------------------------ --------
1 Red 2010-01-01 12:22:00.000 Joe
1 Blue 2010-01-02 15:22:00.000 Jill
1 Green 2010-01-03 16:22:00.000 Joe
1 White 2010-01-10 09:22:00.000 Mike
2 Red 2010-01-09 10:22:00.000 Mike
2 Blue 2010-01-12 09:22:00.000 Jill
2 Orange 2010-01-12 15:22:00.000 Joe
I want to select the most recent date for each Object, as well as the Color and User on the date of that record.
Basically, I want this result set:
ObjID Color Date User
------- ------- ------------------------ --------
1 White 2010-01-10 09:22:00.000 Mike
2 Orange 2010-01-12 15:22:00.000 Joe
I'm having trouble wrapping my head around the SQL query I need to write to get this data...
I am retrieving data via ODBC from an iSeries DB2 database (AS/400).

Hey there, I think you want the following (where ColorTable is your table name):
SELECT Color.*
FROM ColorTable as Color
INNER JOIN
(
SELECT ObjID, MAX(Date) as Date
FROM ColorTable
GROUP BY ObjID
) as MaxDateByColor
ON Color.ObjID = MaxDateByColor.ObjID
AND Color.Date = MaxDateByColor.Date

Assuming at least SQL Server 2005
CREATE TABLE #T (ObjID INT,Color VARCHAR(10),[Date] DATETIME,[User] VARCHAR(50))
INSERT INTO #T
SELECT 1,'Red',' 2010-01-01 12:22:00.000','Joe' UNION ALL
SELECT 1,'Blue','2010-01-02 15:22:00.000','Jill' UNION ALL
SELECT 1,'Green',' 2010-01-03 16:22:00.000','Joe' UNION ALL
SELECT 1,'White',' 2010-01-10 09:22:00.000','Mike' UNION ALL
SELECT 2,'Red',' 2010-01-09 10:22:00.000','Mike' UNION ALL
SELECT 2,'Blue','2010-01-12 09:22:00.000','Jill' UNION ALL
SELECT 2,'Orange','2010-01-12 15:22:00.000','Joe'
;WITH T AS
(
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY Date DESC) AS RN
FROM #T
)
SELECT ObjID,
Color,
[Date],
[User]
FROM T
WHERE RN=1
Or a SQL Server 2000 method from the article linked to in the comments
SELECT ObjID,
CAST(SUBSTRING(string, 24, 10) AS VARCHAR(10)) AS Color, -- 10 chars after the 23-char date
CAST(SUBSTRING(string, 1, 23) AS DATETIME ) AS [Date],
CAST(SUBSTRING(string, 34, 50) AS VARCHAR(50)) AS [User] -- 50 chars after date + color
FROM
(
SELECT ObjID,
MAX((CONVERT(CHAR(23), [Date], 126)
+ CAST(Color AS CHAR(10))
+ CAST([User] AS CHAR(50))) COLLATE Latin1_General_BIN) AS string
FROM #T
GROUP BY ObjID) T;

If you have an Objects table and your ObjectHistory table has an index on ObjID and date, then this could perform better than other queries given so far:
SELECT
X.*
FROM
Objects O
CROSS APPLY (
SELECT TOP 1 *
FROM ObjectHistory H
WHERE H.ObjID = O.ObjID
ORDER BY H.[Date] DESC
) X
The performance improvement may only come if you're pulling columns from the Objects table, too, but it's worth a shot.
If you want all Objects regardless of whether they have a history entry, switch to OUTER APPLY (and of course use O.ObjID instead of H.ObjID).
The neat things about this query are that it handles situations where the Date value can have duplicates, and that it can support an arbitrary number of items per group (say, the top 5 instead of the top 1).
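The two variations mentioned above (OUTER APPLY, and more than one row per group) can be sketched together as follows. This is only an illustration in T-SQL: the temp tables #Objects and #ObjectHistory stand in for the Objects and ObjectHistory tables from the answer, their columns and sample rows are assumptions, and object 3 is a made-up object with no history.

```sql
-- Assumed schema: table names follow the answer, columns are guesses.
CREATE TABLE #Objects (ObjID INT PRIMARY KEY);
CREATE TABLE #ObjectHistory (
    ObjID  INT,
    Color  VARCHAR(10),
    [Date] DATETIME,
    [User] VARCHAR(50)
);

INSERT INTO #Objects VALUES (1), (2), (3);   -- object 3 has no history rows

INSERT INTO #ObjectHistory VALUES
    (1, 'Red',    '2010-01-01T12:22:00', 'Joe'),
    (1, 'Blue',   '2010-01-02T15:22:00', 'Jill'),
    (1, 'Green',  '2010-01-03T16:22:00', 'Joe'),
    (1, 'White',  '2010-01-10T09:22:00', 'Mike'),
    (2, 'Red',    '2010-01-09T10:22:00', 'Mike'),
    (2, 'Blue',   '2010-01-12T09:22:00', 'Jill'),
    (2, 'Orange', '2010-01-12T15:22:00', 'Joe');

-- OUTER APPLY keeps objects without history; TOP (2) returns up to two rows per object.
SELECT O.ObjID, X.Color, X.[Date], X.[User]
FROM #Objects O
OUTER APPLY (
    SELECT TOP (2) H.Color, H.[Date], H.[User]
    FROM #ObjectHistory H
    WHERE H.ObjID = O.ObjID
    ORDER BY H.[Date] DESC
) X
ORDER BY O.ObjID, X.[Date] DESC;
```

With this sample data, objects 1 and 2 each return their two most recent rows, and object 3 still appears once with NULLs in the history columns. Changing TOP (2) to TOP (1) gives the result set asked for in the question.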

See these two related questions:
SQL/mysql - Select distinct/UNIQUE but return all columns?
And:
How to efficiently determine changes between rows using SQL

SELECT t1.* FROM Table_name as t1
INNER JOIN (
SELECT MAX(Date) as MaxDate, ObjID FROM Table_name
GROUP BY ObjID
) as t2
ON t1.ObjID = t2.ObjID AND t1.Date = t2.MaxDate

You can find out, per object, its most recent change like this:
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
You can then get the color and user columns by linking the set returned above, instantiated as an inline view that has been given an alias, to the same table again:
select color, user, FOO.objectid, FOO.LatestChange
from LOG
inner join
(
select objectid, max(changedate) as LatestChange
from LOG
group by objectid
) as FOO
on LOG.objectid = FOO.objectid and LOG.changedate = FOO.LatestChange

Like Martin Smith's answer above, simply use ROW_NUMBER() over a partition and pick the most recent row for each object, like this:
SELECT ObjID, Color, [Date], [User]
FROM (
SELECT *,
ROW_NUMBER() OVER (PARTITION BY ObjID ORDER BY [Date] DESC) AS RN
FROM [tablename]
) AS T
WHERE
RN = 1

Related

deleting specific duplicate and original entries in a table based on date

I have a table called "main" which has 4 columns: ID, name, DateID and Sign.
I want to create a query that deletes entries from this table if the same ID appears twice within a certain DateID range.
My WHERE clause searches the previous 3 weeks:
where DateID =((SELECT MAX( DateID)
WHERE DateID < ( SELECT MAX( DateID )-3))
An example of the dataset I'm working with:
id      name    DateID  sign
------- ------- ------- -------
12345   Paul    1915    Up
23658   Danny   1915    Down
37868   Jake    1916    Up
37542   Elle    1917    Up
12345   Paul    1917    Down
87456   John    1918    Up
78563   Luke    1919    Up
23658   Danny   1920    Up
In the case above, both entries for ID 12345 would need to be removed.
However, the entries for ID 23658 would need to be kept, as their DateIDs are more than 3 apart.
How would this be possible?
You can use window functions for this.
It's not quite clear, but it seems LAG and conditional COUNT should fit what you need.
DELETE t
FROM (
SELECT *,
CountWithinDate = COUNT(CASE WHEN t.PrevDate >= t.DateId - 3 THEN 1 END) OVER (PARTITION BY t.id)
FROM (
SELECT *,
PrevDate = LAG(t.DateID) OVER (PARTITION BY t.id ORDER BY t.DateID)
FROM YourTable t
) t
) t
WHERE CountWithinDate > 0;
db<>fiddle
Note that you do not need to re-join the table, you can delete directly from the t derived table.
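Before running a DELETE like this, it can help to run the same window logic as a plain SELECT and check which rows would be removed. A minimal sketch, assuming a temp table #main loaded with the sample rows from the question (the column types are guesses, and LAG needs SQL Server 2012 or later):

```sql
-- Hypothetical copy of the question's data; column types are assumptions.
CREATE TABLE #main (id INT, name VARCHAR(20), DateID INT, [sign] VARCHAR(4));

INSERT INTO #main VALUES
    (12345, 'Paul',  1915, 'Up'),
    (23658, 'Danny', 1915, 'Down'),
    (37868, 'Jake',  1916, 'Up'),
    (37542, 'Elle',  1917, 'Up'),
    (12345, 'Paul',  1917, 'Down'),
    (87456, 'John',  1918, 'Up'),
    (78563, 'Luke',  1919, 'Up'),
    (23658, 'Danny', 1920, 'Up');

-- Same logic as the DELETE, but only listing the rows it would remove.
SELECT t.id, t.name, t.DateID, t.[sign]
FROM (
    SELECT p.*,
           -- counts, per id, the rows whose previous DateID is at most 3 earlier
           CountWithinDate = COUNT(CASE WHEN p.PrevDate >= p.DateID - 3 THEN 1 END)
                             OVER (PARTITION BY p.id)
    FROM (
        SELECT m.*,
               PrevDate = LAG(m.DateID) OVER (PARTITION BY m.id ORDER BY m.DateID)
        FROM #main m
    ) p
) t
WHERE t.CountWithinDate > 0
ORDER BY t.id, t.DateID;
```

On this data the query lists the two rows for ID 12345 and nothing else, since the two rows for ID 23658 are 5 apart.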
Hope this works:
DELETE FROM test_tbl
WHERE id IN (
SELECT T1.id
FROM test_tbl T1
WHERE EXISTS (SELECT 1 FROM test_tbl T2 WHERE T1.id = T2.id AND ABS(T2.dateid - T1.dateid) < 3 AND T1.dateid <> T2.dateid)
)
In case you need more logic for data processing, I would suggest using a stored procedure.

Mapping All Terminal IDs to Previous IDs

I have a table in SQL Server that contains a list of all ID migrations over time. An individual's ID can change over time, and this table helps us understand when the change occurs, and what the ID changes from/to. What I'd ultimately like is a way to list all of the previous IDs for the most recent ID (which I'm referring to as the terminal ID). I'm assuming this will require some sort of CTE, but my brain is in a bit of a fog as to how I should set this up.
CREATE TABLE #ExampleIdCrosswalk
(
CurrentId VARCHAR(3)
,PreviousId VARCHAR(3)
,PreviousIdObsoleteDate DATE
)
INSERT INTO #ExampleIdCrosswalk
VALUES
('DEF','ABC','2021-01-01')
,('WVU','ZYX','2021-01-01')
,('MNO','ONM','2021-02-01')
,('PPP','EEE','2021-02-01')
,('GHI','DEF','2021-03-01')
,('TSR','WVU','2021-03-01')
,('NRP','QRS','2021-03-01')
,('JKL','GHI','2021-04-01')
SELECT * FROM #ExampleIdCrosswalk
Ultimately, what I'd like to show is a table with all the terminal IDs along with each of their corresponding previous IDs.
Any help would be appreciated!
You can use a recursive CTE for this:
with cte as (
select currentid, previousid
from ExampleIdCrosswalk ec
where not exists (select 1 from ExampleIdCrosswalk ec2 where ec2.previousId = ec.currentid)
union all
select cte.currentid, ec.previousid
from cte join
ExampleIdCrosswalk ec
on ec.currentId = cte.previousId
)
select *
from cte;
Here is a db<>fiddle.
You can use a recursive CTE, as in:
with
n (last, curr, prev) as (
select currentid, currentid, previousid
from ExampleIdCrosswalk where currentid not in (
select previousid from ExampleIdCrosswalk
)
union all
select n.last, c.currentid, c.previousid
from n
join ExampleIdCrosswalk c on c.currentid = n.prev
)
select last, prev
from n
order by last, prev
Result:
last prev
----- ----
JKL ABC
JKL DEF
JKL GHI
MNO ONM
NRP QRS
PPP EEE
TSR WVU
TSR ZYX
See running example at db<>fiddle.
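If the order of the previous IDs matters, either recursive query can carry a hop counter through the recursion. The following is a sketch only: #IdCrosswalkDemo is a made-up temp table holding a few of the question's rows, the chain CTE name and the Hops column are additions for illustration, and MAXRECURSION 100 simply spells out SQL Server's default limit.

```sql
-- Hypothetical demo table with a subset of the question's rows.
CREATE TABLE #IdCrosswalkDemo (CurrentId VARCHAR(3), PreviousId VARCHAR(3));

INSERT INTO #IdCrosswalkDemo VALUES
    ('DEF', 'ABC'),
    ('GHI', 'DEF'),
    ('JKL', 'GHI'),
    ('MNO', 'ONM');

WITH chain (TerminalId, PreviousId, Hops) AS (
    -- anchor: rows whose CurrentId is never replaced, i.e. terminal IDs
    SELECT ec.CurrentId, ec.PreviousId, 1
    FROM #IdCrosswalkDemo ec
    WHERE NOT EXISTS (SELECT 1
                      FROM #IdCrosswalkDemo nxt
                      WHERE nxt.PreviousId = ec.CurrentId)
    UNION ALL
    -- recursive step: walk one migration further back, keeping the terminal ID
    SELECT chain.TerminalId, ec.PreviousId, chain.Hops + 1
    FROM chain
    JOIN #IdCrosswalkDemo ec
      ON ec.CurrentId = chain.PreviousId
)
SELECT TerminalId, PreviousId, Hops
FROM chain
ORDER BY TerminalId, Hops
OPTION (MAXRECURSION 100);
```

For the demo rows, JKL is listed with GHI, DEF and ABC at 1, 2 and 3 hops, and MNO with ONM at 1 hop.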

datediff for row that meets my condition only once per row

I want to do a DATEDIFF between 2 dates on different rows, but only if the rows meet a condition.
My table looks like the following, with additional columns (like guid):
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | with this
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
With this example I would like to have 2 rows in my selection, which represent the difference between the dates from id 5-3 and from id 2-1.
As of now I have come up with a query that gives me the difference between dates from id 5-3, id 5-1 and id 2-1:
with t as (
SELECT TOP (100000)
*
FROM mydatatable
order by CreateDateAndTime desc)
select
DATEDIFF(SECOND, f.CreateDateAndTime, s.CreateDateAndTime) time
from t f
join t s on (f.[guid] = s.[guid] )
where f.condition like '%I need to compare this state%'
and s.condition like '%with this%'
and (f.id - s.id) < 0
My problem is that I cannot set f.id - s.id to a fixed value, since other rows can be between the ones I want to diff.
How can I make the datediff only on the first rows that meet my conditions?
EDIT: To make it more clear:
My condition is an event name, and I want to calculate the time between the occurrence of my event 1 and my event 2 and fill a column named time, for example.
@Salman A's answer is really close to what I want, except it will not work when my event 2 does not happen (which was not in my initial example),
i.e. in a table like the following, it will make the datediff between row id 5 and row id 2:
Id | CreateDateAndTime | condition
---------------------------------------------------------------
1 | 2018-12-11 12:07:55.273 | with this
2 | 2018-12-11 12:07:53.550 | I need to compare this state
3 | 2018-12-11 12:07:53.550 | state 3
4 | 2018-12-11 12:06:40.780 | state 3
5 | 2018-12-11 12:06:39.317 | I need to compare this state
The code I modified:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id desc ) AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this ')
)
SELECT *
,DATEDIFF(second, currdate, prevdate) time
FROM cte
WHERE condition = 'I need to compare this state '
and DATEDIFF(second, currdate, prevdate) != 0
order by id desc
Perhaps you want to match ids with the nearest smaller id. You can use window functions for this:
WITH cte AS (
SELECT id
, CreateDateAndTime AS currdate
, CASE WHEN LAG(condition) OVER (PARTITION BY guid ORDER BY id) = 'with this'
THEN LAG(CreateDateAndTime) OVER (PARTITION BY guid ORDER BY id) END AS prevdate
, condition
FROM t
WHERE condition IN ('I need to compare this state', 'with this')
)
SELECT *
, DATEDIFF(second, currdate, prevdate)
FROM cte
WHERE condition = 'I need to compare this state'
The CASE expression will match 'I need to compare this state' with 'with this'. If you have mismatching pairs then it'll return NULL.
Try using the analytic function LEAD():
with cte as
(
select 1 as id, '2018-12-11 12:07:55.273' as CreateDateAndTime,'with this' as condition union all
select 2,'2018-12-11 12:07:53.550','I need to compare this state' union all
select 3,'2018-12-11 12:07:53.550','with this' union all
select 4,'2018-12-11 12:06:40.780','state 3' union all
select 5,'2018-12-11 12:06:39.317','I need to compare this state'
) select *,
DATEDIFF(SECOND,CreateDateAndTime,lead(CreateDateAndTime) over(order by Id))
from cte
where condition in ('with this','I need to compare this state')
You ideally want LEADIF/LAGIF functions, because you are looking for the previous row where condition = 'with this'. Since there are no LEADIF/LAGIF functions, I think the best option is to use OUTER/CROSS APPLY with TOP 1, e.g.
CREATE TABLE #T (Id INT, CreateDateAndTime DATETIME, condition VARCHAR(28));
INSERT INTO #T (Id, CreateDateAndTime, condition)
VALUES
(1, '2018-12-11 12:07:55', 'with this'),
(2, '2018-12-11 12:07:53', 'I need to compare this state'),
(3, '2018-12-11 12:07:53', 'with this'),
(4, '2018-12-11 12:06:40', 'state 3'),
(5, '2018-12-11 12:06:39', 'I need to compare this state');
SELECT ID1 = t1.ID,
Date1 = t1.CreateDateAndTime,
ID2 = t2.ID,
Date2 = t2.CreateDateAndTime,
Difference = DATEDIFF(SECOND, t1.CreateDateAndTime, t2.CreateDateAndTime)
FROM #T AS t1
CROSS APPLY
( SELECT TOP 1 t2.CreateDateAndTime, t2.ID
FROM #T AS t2
WHERE t2.Condition = 'with this'
AND t2.CreateDateAndTime > t1.CreateDateAndTime
--AND t2.GUID = t1.GUID
ORDER BY CreateDateAndTime
) AS t2
WHERE t1.Condition = 'I need to compare this state';
Which gives:
ID1 Date1 ID2 Date2 Difference
-------------------------------------------------------------------------------
2 2018-12-11 12:07:53.000 1 2018-12-11 12:07:55.000 2
5 2018-12-11 12:06:39.000 3 2018-12-11 12:07:53.000 74
I would enumerate the values and then use window functions for the difference.
select min(id), max(id),
datediff(second, min(CreateDateAndTime), max(CreateDateAndTime)) as seconds
from (select t.*,
row_number() over (partition by condition order by CreateDateAndTime) as seqnum
from t
where condition in ('I need to compare this state', 'with this')
) t
group by seqnum;
I cannot tell what you want the results to look like. This version only outputs the differences, with the ids of the rows you care about. The difference can also be applied to the original rows, rather than put into summary rows.

Extract non-existent values based on previous months?

I have tb1:
code Name sal_month
==== ===== ========
101 john 02/2017
102 mathe 02/2017
103 yara 02/2017
104 sara 02/2017
101 john 03/2017
102 mathe 03/2017
103 yara 03/2017
104 sara 03/2017
101 john 04/2017
103 yara 04/2017
In February all of them received salaries, as well as in March.
How do I extract non-existent values based on previous months?
The result should be:
code sal_month
==== =======
102 04/2017
104 04/2017
Thanks in advance.
First I created this table:
create table #T(code int, sal_month varchar(10))
insert into #T values(101,'2/2017'),(102,'2/2017'),(103,'2/2017'),(104,'2/2017'),
(101,'3/2017'),(102,'3/2017'),(104,'3/2017'),(101,'4/2017'),(103,'4/2017')
Second, I executed this query:
SELECT code, Max(sal_Month)
From #T
Where code not in (select code from #T where sal_Month = (select Max(sal_Month) from #T))
Group by code
Then I got the following results:
Note: I am using SQL SERVER 2012
I think you can count sal_month grouped by code, something like this, and select the codes that show up fewer than 3 times.
select code, count (sal_month) from tb1
group by code
having count (sal_month) < 3
After that, join with the initial table on code (just to filter the full rows you need).
So the final query will look like this:
select a.code, a.sal_month
from tb1 a
join (select code, count (sal_month) as month_count from tb1
group by code
having count (sal_month) < 3) X on a.code = X.code
Something like this:
CREATE TABLE #DataSource
(
[code] INT
,[sal_month] VARCHAR(12)
);
INSERT #DataSource ([code], [sal_month])
VALUES (101, '2/2017')
,(102, '2/2017')
,(103, '2/2017')
,(104, '2/2017')
,(101, '3/2017')
,(102, '3/2017')
,(104, '3/2017')
,(101, '4/2017')
,(103, '4/2017');
WITH DataSource AS
(
SELECT *
,DENSE_RANK() OVER (ORDER BY [sal_month]) AS [MonthID]
,MAX([sal_month]) OVER () AS [MaxMonth]
FROM #DataSource DS1
)
SELECT DS1.[code]
,DS1.[sal_month]
FROM DataSource DS1
LEFT JOIN DataSource DS2
ON DS1.[code] = DS2.[code]
AND DS1.[MonthID] = DS2.[MonthID] - 1
LEFT JOIN DataSource DS3
ON DS1.[code] = DS3.[code]
AND DS1.[MonthID] = DS3.[MonthID] + 1
WHERE DS2.[code] IS NULL
AND DS3.[code] IS NOT NULL
AND DS1.[sal_month] <> DS1.[MaxMonth];
Some notes:
we need a way to sort the months, and that is not easy because they are stored in a very impractical way: the column is not a date/datetime and the string is not a valid date. The format is also fragile, because once [sal_month] spans different years the strings no longer sort chronologically. You should think about this; one alternative is to use this format:
201512
201701
201702
201711
In this way we can sort by string.
at the core I am using DENSE_RANK and sorting months as strings;
the idea is to look for all records that do not exist in the next month but have a record in the previous one; at the same time, records from the last month are excluded, as it's not possible for them to have a record in the next month.
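As a sketch of the sortable format suggested in the notes, a month stored as 'M/YYYY' or 'MM/YYYY' can be turned into a 'YYYYMM' key with string functions. The #Salaries table and the month_key alias are made up for the example, and the expression assumes every value contains exactly one slash followed by a four-digit year.

```sql
-- Hypothetical sample table; values span two years on purpose.
CREATE TABLE #Salaries (code INT, sal_month VARCHAR(10));

INSERT INTO #Salaries VALUES
    (101, '12/2016'),
    (101, '2/2017'),
    (101, '11/2017'),
    (102, '02/2017');

SELECT code,
       sal_month,
       -- year part (last 4 chars) + zero-padded month part (text before the slash)
       -- e.g. '12/2016' -> '201612', '2/2017' -> '201702'
       RIGHT(sal_month, 4)
       + RIGHT('0' + LEFT(sal_month, CHARINDEX('/', sal_month) - 1), 2) AS month_key
FROM #Salaries
ORDER BY month_key, code;
```

Sorting or ranking by month_key instead of the raw string keeps December 2016 ahead of February 2017, which the original 'M/YYYY' text would not.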
Try this:
select tb2.code, tb2.sal_month from tb
right join (
select code, sal_month, datepart(month, sal_month) + 1 as next_sal_month from tb) as tb2
on (tb.code = tb2.code and datepart(month, tb.sal_month) = tb2.next_sal_month)
where tb2.next_sal_month < 5 and tb.sal_month is null
In the result set there's one additional record: code 103 didn't receive salary in March, but did so in February, so it is included as well.
Here's SQL fiddle, to try :)
In the absence of more facts about your tables, create a cartesian product of the 2 axes of month & code, then left join the stored data. Then it is easy to identify missing items when no stored data exists when compared to every possible combination.
You might already have master tables of sal_month and/or code to use; if you do, use those, but if not you can dynamically create them using select distinct as seen below.
create table tbl1 (code int, sal_month varchar(10))
insert into tbl1 values(101,'2/2017'),(102,'2/2017'),(103,'2/2017'),(104,'2/2017'),
(101,'3/2017'),(102,'3/2017'),(104,'3/2017'),(101,'4/2017'),(103,'4/2017')
select c.code, m.sal_month
from ( select distinct sal_month from tbl1 ) m
cross join ( select distinct code from tbl1 ) c
Left join tbl1 t on m.sal_month = t.sal_month and c.code = t.code
Where t.sal_month IS NULL
code | sal_month
---: | :--------
103 | 3/2017
102 | 4/2017
104 | 4/2017
dbfiddle here

Concatenate multiple rows from inside a correlated 'group by' subquery into a single text string

Similar questions have been asked before but I am specifically looking for an answer to do much the same with a correlated subquery.
I am doing this on SQL Server, and I cannot utilize stored procedure or temp table creation approach.
For those familiar with Client Matter billing; I have formulated a 'group by' query using row_number technique to return me back the top 3 performers for each unique clientmatter, summing their amounts over a period of time.
This gives me something like this:
clientmatterno attorneyname amount seq_num
111111.00001 John Doe $30,000 1
111111.00001 Mark Tim $23,000 2
111111.00001 Jane Sue $15,000 3
111111.00001 Mary Ann $5,000 4
222221.00501 John Doe $35,000 1
222221.00501 David Hu $30,000 2
444444.00003 Shelly Y $50,000 1
I think, I would have to first do a group by clause to sum up the amounts for each attorney in order to find the totals and hence get the correct seq_num to appear across.
I am now trying to use this subquery results to do the string concatenation such that I get the following results:
111111.00001 John Doe|Mark Tim|Jane Sue
222221.00501 John Doe|David Hu
444444.00003 Shelly Y
The Query that I think will work, seeing past questions on this topic:
select subq.clientmatterno as [Id],
,
STUFF(
(SELECT DISTINCT ',' + subq.attorneyname
FROM ????
WHERE ????
FOR XML PATH (''))
, 1, 1, '') AS TopPerformers
from (
SELECT clientmatterno, attorneyname, sum(amount),
row_number() over (partition by clientmatterno order by sum(amount) desc) as seq_num
FROM ...
WHERE ...
GROUP BY clientmatterno, attorneyname
) as subq
where seq_num <= 3
group by clientmatterno
My problem is on how to connect and build up the STUFF function. The error is very simple: I cannot seem to use the subquery set 'subq' in the FROM clause inside the STUFF function.
I have not tried out XML FOR Auto approach.
Try using a common table expression instead of a derived table:
with cte as (
SELECT
clientmatterno,
attorneyname,
sum(amount) amount,
seq = row_number() over (partition by clientmatterno order by sum(amount) desc)
FROM ...
WHERE ...
GROUP BY clientmatterno, attorneyname
)
SELECT
clientmatterno,
STUFF(
(
SELECT '|' + attorneyname
FROM cte
WHERE clientmatterno = a.clientmatterno
AND seq <= 3
FOR XML PATH ('')
), 1, 1, ''
) AS Attorneynames
FROM cte AS a
GROUP BY clientmatterno
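On SQL Server 2017 or later, STRING_AGG may be a simpler alternative to the STUFF / FOR XML PATH pattern. A sketch under that assumption, using a temp table #Ranked that stands in for the output of the ranked subquery (the table name and column types are invented for the example; the rows are the ones shown in the question):

```sql
-- Hypothetical stand-in for the ranked subquery's result set.
CREATE TABLE #Ranked (
    clientmatterno VARCHAR(20),
    attorneyname   VARCHAR(50),
    amount         INT,
    seq_num        INT
);

INSERT INTO #Ranked VALUES
    ('111111.00001', 'John Doe', 30000, 1),
    ('111111.00001', 'Mark Tim', 23000, 2),
    ('111111.00001', 'Jane Sue', 15000, 3),
    ('111111.00001', 'Mary Ann',  5000, 4),
    ('222221.00501', 'John Doe', 35000, 1),
    ('222221.00501', 'David Hu', 30000, 2),
    ('444444.00003', 'Shelly Y', 50000, 1);

-- Keep the top 3 per client matter and join the names in rank order.
SELECT clientmatterno,
       STRING_AGG(attorneyname, '|') WITHIN GROUP (ORDER BY seq_num) AS TopPerformers
FROM #Ranked
WHERE seq_num <= 3
GROUP BY clientmatterno
ORDER BY clientmatterno;
```

The WITHIN GROUP clause fixes the order of names inside each string, which the FOR XML PATH version only gets if an ORDER BY is added to its inner SELECT.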