SQL Server: Select duplicate rows - sql

I have a table:
personId  Date        location
abc123    15-09-2022  London
abc123    15-09-2022  Nottingham
efg321    12-09-2022  Leeds
abc123    13-09-2022  Birmingham
I want to return the rows where the same personId appears more than once on the same Date. For example, in the table above, personId 'abc123' is present at both 'London' and 'Nottingham' on the same date, so I would like to return these two rows.
I have tried this query:
SELECT personId, Date FROM sampleTable GROUP BY personId, Date HAVING COUNT(*) > 1
But it returns only personId and Date for each group. I want the rows with all three columns. Expected result:
personId  Date        location
abc123    15-09-2022  London
abc123    15-09-2022  Nottingham
Can anyone please help me with this? Thanks

Try something like this:
SELECT
    sampleTable.*
FROM
    sampleTable
    INNER JOIN -- acts as a filter here
    (
        SELECT
            personId,
            Date
        FROM
            sampleTable
        GROUP BY
            personId,
            Date
        HAVING
            COUNT(*) > 1
    ) problemTable
        ON sampleTable.personId = problemTable.personId
        AND sampleTable.Date = problemTable.Date
ORDER BY
    sampleTable.personId,
    sampleTable.Date,
    sampleTable.location
;
The derived table problemTable finds the personId/Date combinations that have more than one row in sampleTable. Because an INNER JOIN keeps only rows with a match on both sides, joining sampleTable to problemTable returns just the sampleTable rows whose combination also appears in problemTable, and those are the ones you care about!
Using INNER JOIN as a filter mechanism is a common theme in SQL, so keep it in the back of your mind.
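The same filtering idea can also be written as a semi-join with EXISTS. This is only a sketch and not part of the answer above: it assumes SQL Server, guesses the column types, and stores Date as a DATE column with ISO literals. Unlike the GROUP BY ... HAVING COUNT(*) > 1 version, it matches only rows whose location differs, so exact duplicate rows would not be returned.

```sql
-- Sample data from the question (column types are assumptions)
CREATE TABLE sampleTable (
    personId VARCHAR(10),
    [Date]   DATE,
    location VARCHAR(50)
);

INSERT INTO sampleTable (personId, [Date], location) VALUES
    ('abc123', '2022-09-15', 'London'),
    ('abc123', '2022-09-15', 'Nottingham'),
    ('efg321', '2022-09-12', 'Leeds'),
    ('abc123', '2022-09-13', 'Birmingham');

-- Keep a row only if another row has the same person and date
-- but a different location
SELECT s.personId, s.[Date], s.location
FROM sampleTable AS s
WHERE EXISTS (
    SELECT 1
    FROM sampleTable AS other
    WHERE other.personId = s.personId
      AND other.[Date]   = s.[Date]
      AND other.location <> s.location
)
ORDER BY s.personId, s.[Date], s.location;
```

With the four sample rows, this returns the London and Nottingham rows for abc123 on 2022-09-15.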

It's pretty easy using window functions.
The inner query returns the same table with an extra column that counts the rows sharing each personid/date pair. The outer query then keeps only the rows where that count is greater than 1.
Inner query result:
personid  date        location    check
abc123    13-09-2022  Birmingham  1
abc123    15-09-2022  London      2
abc123    15-09-2022  Nottingham  2
efg321    12-09-2022  Leeds       1
Final result:
personid  date        location    check
abc123    15-09-2022  London      2
abc123    15-09-2022  Nottingham  2
SQL:
WITH temp AS (
    SELECT
        personid,
        datecol,
        location,
        -- CHECK is a reserved word in SQL Server, so the alias is bracketed
        COUNT(personid) OVER (PARTITION BY personid, datecol) AS [check]
    FROM sampletable
)
SELECT *
FROM temp
WHERE [check] > 1

Related

SQL: Take maximum value, but if a field is missing for a particular ID, ignore all values

This is somewhat difficult to explain...(this is using SQL Assistant for Teradata, which I'm not overly familiar with).
ID creation_date completion_date Difference
123 5/9/2016 5/16/2016 7
123 5/14/2016 5/16/2016 2
456 4/26/2016 4/30/2016 4
456 (null) 4/30/2016 (null)
789 3/25/2016 3/31/2016 6
789 3/1/2016 3/31/2016 30
An ID may have more than one creation_date, but it will always have the same completion_date. If the creation_date is populated for all records for an ID, I want to return the record with the most recent creation_date. However, if ANY creation_date for a given ID is missing, I want to ignore all records associated with this ID.
Given the data above, I would want to return:
ID creation_date completion_date Difference
123 5/14/2016 5/16/2016 2
789 3/25/2016 3/31/2016 6
No records are returned for 456 because the second record has a missing creation_date. The record with the most recent creation_date is returned for 123 and 789.
Any help would be greatly appreciated. Thanks!
Depending on your database, here's one option using row_number to get the max date per group. You can then filter those results with not exists to check against null values:
select *
from (
    select y.*,
           row_number() over (partition by id order by creation_date desc) rn
    from yourtable y
) t
where rn = 1 and not exists (
    select 1
    from yourtable t2
    where t2.creation_date is null and t.id = t2.id
)
row_number is a window function that is supported in many databases. MySQL versions before 8.0 don't support it, but there you can achieve the same result using user-defined variables.
Here is a more generic version using conditional aggregation:
select t.*
from yourtable t
join (select id, max(creation_date) max_creation_date
      from yourtable
      group by id
      having count(case when creation_date is null then 1 end) = 0
     ) t2 on t.id = t2.id and t.creation_date = t2.max_creation_date
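Since the question mentions Teradata, both conditions can also be sketched with QUALIFY, Teradata's clause for filtering on window functions. This is an untested sketch that assumes the table is called yourtable, as in the answers above. It keeps a row only when it is the newest for its id and the id has no NULL creation_date; COUNT(creation_date) skips NULLs, so the two counts differ whenever a date is missing.

```sql
-- Teradata-specific sketch (QUALIFY is not standard SQL)
select id, creation_date, completion_date, difference
from yourtable
qualify row_number() over (partition by id order by creation_date desc) = 1
    -- every row of this id must have a non-NULL creation_date
    and count(*) over (partition by id) = count(creation_date) over (partition by id)
```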

fill in a null cell with cell from previous record

Hi I am using DB2 sql to fill in some missing data in the following table:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL NULL NULL
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL NULL NULL
Person 2 has lived in three houses, but for the middle address neither the house nor the dates are known. I can't do anything about which house they were in, but I would like to replace the NULL From date with the To date of the previous address, and the NULL To date with the From date of the next address, i.e.:
Person House From To
------ ----- ---- --
1 586 2000-04-16 2010-12-03
2 123 2001-01-01 2012-09-27
2 NULL 2012-09-27 2004-01-01
2 104 2004-01-01 2012-11-24
3 987 1999-12-31 2009-08-01
3 NULL 2009-08-01 9999-01-01
I understand that if there is no previous address before a NULL address, the From date will have to stay NULL, but if a NULL address is the last known address I would like to change the To date to 9999-01-01, as for person 3.
This seems to me like the type of problem where a set-based approach is no longer a good fit; however, I am required to find a DB2 solution because that's what my boss uses!
Any pointers/suggestions welcome.
Thanks.
It might look something like this:
select
    person,
    house,
    coalesce(from_date, prev_to_date) from_date,
    case when rn = 1 then coalesce(to_date, '9999-01-01')
         else coalesce(to_date, next_from_date) end to_date
from
    (select person, house, from_date, to_date,
            lag(to_date) over (partition by person order by from_date nulls last) prev_to_date,
            lead(from_date) over (partition by person order by from_date nulls last) next_from_date,
            -- nulls first here, so rn = 1 marks the row that the ascending "nulls last" order puts last
            row_number() over (partition by person order by from_date desc nulls first) rn
     from temp
    ) t
The above is not tested but it might give you an idea.
I hope in your actual table you have a column other than to_date and from_date that allows you to order rows for each person, otherwise you'll have trouble sorting NULL dates, as you have no way of knowing the actual sequence.
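If such an ordering column exists, the lookups become straightforward. The following is a minimal, untested sketch that assumes a hypothetical column seq giving the order of each person's addresses; seq is not in the question's table. It also assumes a DB2 version that supports LAG and LEAD.

```sql
-- Sketch only: seq is a hypothetical column giving each person's address order
select
    person,
    house,
    -- a missing From date takes the previous row's To date
    -- (stays NULL when there is no previous row)
    coalesce(from_date,
             lag(to_date) over (partition by person order by seq)) as from_date,
    -- a missing To date takes the next row's From date,
    -- or the sentinel when there is no next row
    coalesce(to_date,
             lead(from_date) over (partition by person order by seq),
             date('9999-01-01')) as to_date
from temp
```

One caveat: if a person has two unknown addresses in a row, the first of them also gets the 9999-01-01 sentinel, because the next row's From date is NULL as well.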
create table Temp
(
    person varchar(2),
    house int,
    from_date date,
    to_date date
)
insert into temp values
(1,586,'2000-04-16','2010-12-03'),
(2,123,'2001-01-01','2012-09-27'),
(2,NULL,NULL,NULL),
(2,104,'2004-01-01','2012-11-24'),
(3,987,'1999-12-31','2009-08-01'),
(3,NULL,NULL,NULL)
-- Note: ISNULL and ORDER BY (SELECT 0) are SQL Server syntax, not DB2.
-- ORDER BY (SELECT 0) numbers the rows in an arbitrary order that is not guaranteed.
select A.person,
       A.house,
       isnull(A.from_date,BF.to_date) From_date,
       isnull(A.to_date,isnull(CT.From_date,'9999-01-01')) To_date
from
    ((select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) A left join
     (select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) BF
        on A.person = BF.person and
           A.rownum = BF.rownum + 1) left join
     (select *,ROW_NUMBER() over (order by (select 0)) rownum from Temp) CT
        on A.person = CT.person and
           A.rownum = CT.rownum - 1

Get the latest record with distinct values for a column

I have the following data:
SalesID Source Name Modified On
S12345 ABC John 5/8/2013 5:44
S12345 ABC Tom 5/8/2013 5:45
S11111 EFG Sam 5/8/2013 5:46
S11111 EFG Don 5/8/2013 5:47
I want to write an SP or a query that returns the 2nd and 4th rows, i.e. the most recently modified record for each distinct SalesID.
Try the following:
select * from tableName t where ModifiedOn=(select max(ModifiedOn) from tableName where SalesID=t.SalesID)
I used the following query and it worked just fine for me:
SELECT *
FROM (
    SELECT Asu_OrderId, ModifiedOn,
           ROW_NUMBER() OVER (PARTITION BY Asu_OrderId ORDER BY ModifiedOn DESC) AS R
    FROM Asu_callreason
    WHERE <condition>
) AS A
WHERE R = 1
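Mapped onto the table from the question, the ROW_NUMBER approach might look like the sketch below. The table name tableName and the column name ModifiedOn (written "Modified On" in the question's sample) are assumptions.

```sql
-- Sketch: number each SalesID's rows from newest to oldest, keep the newest
SELECT SalesID, Source, Name, ModifiedOn
FROM (
    SELECT SalesID, Source, Name, ModifiedOn,
           ROW_NUMBER() OVER (PARTITION BY SalesID ORDER BY ModifiedOn DESC) AS rn
    FROM tableName
) AS ranked
WHERE rn = 1;
```

If two rows of the same SalesID share the latest ModifiedOn value, ROW_NUMBER picks one of them arbitrarily.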

Create table with distinct values based on date

I have a table which fills up with lots of transactions monthly, like below.
Name ID Date OtherColumn
_________________________________________________
John Smith 11111 2012-11-29 Somevalue
John Smith 11111 2012-11-30 Somevalue
Adam Gray 22222 2012-12-11 Somevalue
Tim Blue 33333 2012-12-15 Somevalue
John NewName 11111 2013-01-01 Somevalue
Adam Gray 22222 2013-01-02 Somevalue
From this table I want to create a dimension table with the unique names and IDs. The problem is that a person can change his/her name, like "John" in the example above. The IDs are otherwise always unique. In those cases I want to use only the newest name (the one with the latest date).
So that I end up with a table like this:
Name ID
______________________
John NewName 11111
Adam Gray 22222
Tim Blue 33333
How do I go about achieving this?
Can I do it in a single query?
Use a CTE for this. It simplifies ranking and window functions.
;WITH CTE as
(SELECT
     RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
     ID,
     Name
 FROM
     YourTable)
SELECT
    Name,
    ID
FROM
    CTE
WHERE
    RN = 1
I think creating a table is a bad idea, but this is how you get the most recent name.
select yt.name, yt.id
from yourtable yt join
     (select id, max(date) maxdate
      from yourtable
      group by id) temp on temp.id = yt.id and yt.date = temp.maxdate
JNK's CTE solution is an equivalent of the following.
SELECT
    Name,
    ID
FROM (
    SELECT
        RN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY [Date] DESC),
        Name,
        ID
    FROM theTable
) AS ranked -- SQL Server requires an alias on a derived table
WHERE RN = 1
I'm trying to think of a way to get rid of the partition function without introducing possible duplicates.
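One possible direction, sketched here as an untested assumption rather than a finished answer, is an anti-join with NOT EXISTS: keep a row only if no other row for the same ID has a later date. The table name theTable follows the query above.

```sql
-- Sketch: keep a row only if no other row for the same ID has a later date
SELECT DISTINCT t.Name, t.ID
FROM theTable AS t
WHERE NOT EXISTS (
    SELECT 1
    FROM theTable AS newer
    WHERE newer.ID = t.ID
      AND newer.[Date] > t.[Date]
);
```

DISTINCT only collapses identical Name/ID pairs, so if an ID has two different names on its latest date, both rows still come back. ROW_NUMBER avoids that by picking exactly one.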

Count two Columns with two Where Clauses

I know it's just late in the day and my brain is just fried....
Using Teradata, I need to COUNT DISTINCT MEMBERS that haven't had a TRANS in the past six months and also COUNT the number of TRANS they had historically (prior to the six months). We can just assume the cutoff date to be 01/01/2012. All data is contained in a single table.
For example:
Member | Tran Date
123    | 01/01/2011
789    | 06/01/2011
123    | 10/31/2011
678    | 04/03/2011
789    | 06/01/2012
So 2 members had a total of 3 transactions dated prior to 1/1/2012 with no transactions later than 1/1/2012.
In this example, my result would be:
MEMBERS | TRANS
2 | 3
Try this solution:
SELECT
    COUNT(DISTINCT member_id) AS MEMBERS,
    COUNT(*) AS TRANS
FROM
    tbl
WHERE
    member_id NOT IN
    (
        SELECT DISTINCT member_id
        FROM tbl
        WHERE trans_date > '2012-01-01'
    )
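An alternative sketch, under the same assumed names (tbl, member_id, trans_date), groups by member first and keeps only members whose latest transaction is on or before the cutoff. This avoids NOT IN, which returns no rows at all if the subquery ever yields a NULL member_id.

```sql
-- Sketch: one row per inactive member, then count members and sum their transactions
SELECT
    COUNT(*)         AS MEMBERS,
    SUM(trans_count) AS TRANS
FROM (
    SELECT member_id, COUNT(*) AS trans_count
    FROM tbl
    GROUP BY member_id
    -- the member's most recent transaction is not after the cutoff
    HAVING MAX(trans_date) <= DATE '2012-01-01'
) AS inactive
```

With the five sample rows from the question, members 123 and 678 qualify, giving MEMBERS = 2 and TRANS = 3.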
You'll need subqueries for this. This is T-SQL because I am unfamiliar with Teradata, and it assumes separate MEMBER and TRANSDATA tables.
DECLARE @CUTOFF DATETIME = DATEADD(MONTH, -6, GETDATE()) -- 6 months ago
SELECT COUNT(MEMBERID) AS MEMBERS, SUM(TRANSCOUNT) AS TRANS FROM (
    SELECT DISTINCT
        MEMBERID,
        (SELECT COUNT(*) FROM TRANSDATA WHERE TRANSDATA.MEMBERID = MEMBER.MEMBERID) AS TRANSCOUNT
    FROM MEMBER WHERE NOT EXISTS
        (SELECT * FROM TRANSDATA WHERE
            TRANSDATA.MEMBERID = MEMBER.MEMBERID
            AND TRANDATE > @CUTOFF)
) AS INACTIVE -- a derived table needs an alias in T-SQL