How can I remove partial duplicates in my MS SQL Database? - sql

I have a database table which is automatically filled from different sources. Now I have the problem that there are some duplicate entries.
For example:
EID | TID | StartDate | EndDate
--------------------------------------------
1 | 1 | 20.01.2012 | 23.01.2012
1 | 2 | 25.01.2012 | 26.01.2012
1 | 3 | 27.01.2012 | 30.01.2012
2 | 2 | 20.02.2012 | 23.02.2012
2 | 2 | 25.01.2012 | 26.01.2012
3 | 1 | 20.01.2012 | 23.01.2012
As you can see, there are two rows in which EID and TID are the same. What I am trying to achieve is, that one the row, where the date is higher is deleted.
The only workaround I found, is a query where only the lower ones are selected.
SELECT EID, TID, Min(StartDate), Min(EndDate) FROM Table1 GROUP BY EID, TID

You can use a CTE and the ROW_NUMBER function:
WITH CTE AS
(
SELECT EID, TID, StartDate, EndDate,
RN = ROW_NUMBER() OVER (PARTITION BY EID, TID ORDER BY StartDate, EndDate)
FROM Table1
)
DELETE FROM CTE WHERE RN > 1
DEMO

Related

Select the highest value of column 2 per column 1

Given the following table P_PROV
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 1 |19/06/2019 | 1 |
| 2 |18/07/2010 | 2 |
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
I want this output
+----+-----------+-----------+
| id | date | person_id |
+----+-----------+-----------+
| 3 |19/06/2020 | 1 |
| 4 |17/06/2020 | 2 |
| 5 |28/06/2020 | 3 |
+----+-----------+-----------+
Putting this in words, I want to return per person the maximum date. I tried something like this
SELECT DISTINCT pp.date, pp.id FROM P_PROV pp
WHERE (SELECT MAX(aa.date)
FROM P_PROV aa) = pp.date;
This one is only returning one row (of course, because the MAX will return the maximum date only), but I really don't know how to approach this issue, any kind of help would be appreciated
ROW_NUMBER provides one way to handle this:
SELECT id, date, person_id
FROM
(
SELECT t.*, ROW_NUMBER() OVER (PARTITION BY person_id ORDER BY date DESC) rn
FROM yourTable t
) t
WHERE rn = 1;
Oracle has a fun way to do this using aggregation:
select max(id) keep (dense_rank first order by date desc) as id,
max(date) as date, person_id
from P_PROV
group by person_id;
Given that your ids are increasing, this probably also does what you want:
select max(id) as id, max(date) as date, person_id
from P_PROV
group by person_id;

SQL query for selecting multiple records for one product for a single id

My table looks like this, what I'm trying to achieve is to pull out all the records for one user for the product that have the earliest date
product |type_id| user | Date |Desired ROW_NUMBER as output |
-------+--------+------+-------+---------------------
1 | 1 | A | 0101 | 1
1 | 1 | A | 0102 | 1
2 | 3 | A | 0105 | 2
2 | 5 | A | 0105 | 2
3 | 7 | B | 0101 | 1
3 | 8 | B | 0104 | 1
So I want to pull all the records with "1" in the desired row_num column, but I haven't figured out hot to get this without doing another group by. Any helps would be appreciated.
You can use window functions:
select t.*
from (select t.*,
rank() over (partition by user order by min_date) as seqnum
from (select t.*,
min(date) over (partition by user, product) as min_date
from t
) t
) t
where seqnum = 1;
Or, with only one subquery:
select t.*
from (select t.*,
min(date) over (partition by user, product) as min_date_up,
min(date) over (partition by user) as min_date_u
from t
) t
where min_date_u = min_date_up;
You can interpret this as "return all rows where the product has the minimum date for the user".
Here is a db<>fiddle.
SELECT * FROM [tableName] WHERE Desired ROW_NUMBER = 1 ORDER BY Date[DESC, ASC]
Pass the Desired ROW_NUMBER value dynamically as a parameter.

Select entire partition where max row in partition is greater than 1

I'm partitioning by some non unique identifier, but I'm only concerned in the partitions with at least two results. What would be the way to get out all the instances where there's exactly one of the specified identifier?
Query I'm using:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
What I'm getting:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1235 | 2014-10-08...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
What I want:
row | nonUniqueId | aTimeStamp
---------------------------------
1 | 1234 | 2014-10-08...
2 | 1234 | 2014-10-09...
1 | 1236 | 2014-10-08...
2 | 1236 | 2014-10-09...
Thanks for any direction :)
Based on syntax, I'm assuming this is SQL Server 2005 or higher. My answer will be meant for that.
You have a couple options.
One, use a CTE:
;WITH CTE AS (
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable
)
SELECT *
FROM CTE t
WHERE EXISTS (SELECT 1 FROM CTE WHERE row = 2 and nonUniqueId = t.nonUniqueId);
Or, you can use subqueries:
SELECT ROW_NUMBER() OVER
(PARTITION BY nonUniqueId ORDER BY nonUniqueId, aTimeStamp) as row
,nonUniqueId
,aTimeStamp
FROM myTable t
WHERE EXISTS (SELECT 1 FROM myTable
WHERE nonUniqueId = t.nonUniqueId GROUP BY nonUniqueId, aTimeStamp HAVING COUNT(*) >= 2);

Oracle ranking columns on multiple fields

I am having some issues with ranking some columns in Oracle. I have two columns I need to rank--a group id and a date.
I want to group the table two ways:
Rank the records in each GROUP_ID by DATETIME (RANK_1)
Rank the GROUP_IDs by their DATETIME, GROUP_ID (RANK_2)
It should look like this:
GROUP_ID | DATE | RANK_1 | RANK_2
----------|------------|-----------|----------
2 | 1/1/2012 | 1 | 1
2 | 1/2/2012 | 2 | 1
2 | 1/4/2012 | 3 | 1
3 | 1/1/2012 | 1 | 2
1 | 1/3/2012 | 1 | 3
I have been able to do the former, but have been unable to figure out the latter.
SELECT group_id,
datetime,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY datetime) AS rn,
DENSE_RANK() OVER (ORDER BY group_id) AS rn2
FROM table_1
ORDER BY group_id;
This incorrectly orders the RANK_2 field:
GROUP_ID | DATE | RANK_1 | RANK_2
----------|------------|-----------|----------
1 | 1/3/2012 | 1 | 1
2 | 1/1/2012 | 1 | 2
2 | 1/2/2012 | 2 | 2
2 | 1/4/2012 | 3 | 2
3 | 1/1/2012 | 1 | 3
Assuming you don't have an actual id column in the table, it appears that you want to do the second rank by the earliest date in each group. This will require a nested subquery:
select group_id, datetime, rn,
dense_rank() over (order by EarliestDate, group_id) as rn2
from (SELECT group_id, datetime,
ROW_NUMBER() OVER (PARTITION BY group_id ORDER BY datetime) AS rn,
min(datetime) OVER (partition by group_id) as EarliestDate
FROM table_1
) t
ORDER BY group_id;

Make a field monotonic across all rows

I have table in my sql server database which I want to convert to PK column
To do that I want to change value of each row in this column to 1,2,3 ...
Could You write T-Sql query for that task ?
Thanks for help
begin state:
Id | Name |
----------
1 | One |
2 | Two |
2 | Three|
x | xxx |
result:
Id | Name |
----------
1 | One |
2 | Two |
3 | Three|
4 | xxx |
;with cte as
(
SELECT Id, ROW_NUMBER() over (order by Id) as rn
from YourTable
)
UPDATE cte SET Id = rn
you can also do it with name if you dont have the id!
;with cte as
(
SELECT Id, ROW_NUMBER() over (order by name) as rn
from YourTable
)
UPDATE cte SET Id = rn