Delete Occurrence of Unique ID from SQL Server Table [duplicate]

I have a SQL Server table with a column that contains an ID. I also have another column called Level: every time a new occurrence of an ID enters the table, the Level increases.
ID               Level  DateTime                 Symbol   Exchange
XRP/USD_FTXSPOT  1      2022-01-04 17:03:24.027  XRP/USD  FTX
XRP/USD_FTXSPOT  2      2022-01-04 17:03:31.147  XRP/USD  FTX
So the table looks something like this: the more recently a row was entered, the higher its Level.
What I am trying to do is this: once a new row is entered for an ID, remove all previous occurrences of that ID based on Level. In other words, remove all rows where the Level is lower than the greatest Level for that ID.
SELECT *
FROM Thursday_crypto
JOIN (
    SELECT ID, MAX(Level) Level
    FROM Thursday_crypto
    GROUP BY ID
) max_date
    ON Thursday_crypto.ID = max_date.ID
    AND Thursday_crypto.Level = max_date.Level
This query returns, for each ID, the row with its highest Level. How can I alter it to remove all rows that are not in this selection? My main goal is to reduce the size of the table.

You can calculate a row number per ID, ordered by Level descending, so that the row with the highest Level gets 1.
Then delete the duplicates, i.e. every row whose row number is greater than 1.
WITH CTE_DATA AS (
    SELECT [RowNum] = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Level DESC)
    FROM Thursday_crypto
)
DELETE
FROM CTE_DATA
WHERE RowNum > 1
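The same idea can be tried out locally. Below is a minimal sketch, assuming Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later) as a stand-in for SQL Server. SQLite cannot delete through a CTE, so the row numbers are matched back to the table via rowid; the BTC row is made up for illustration.

```python
import sqlite3

# Hypothetical in-memory stand-in for the Thursday_crypto table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Thursday_crypto (ID TEXT, Level INTEGER, DateTime TEXT)")
conn.executemany(
    "INSERT INTO Thursday_crypto VALUES (?, ?, ?)",
    [
        ("XRP/USD_FTXSPOT", 1, "2022-01-04 17:03:24.027"),
        ("XRP/USD_FTXSPOT", 2, "2022-01-04 17:03:31.147"),
        ("BTC/USD_FTXSPOT", 1, "2022-01-04 17:03:25.000"),  # made-up row
    ],
)

# Number the rows per ID (highest Level first) and delete everything
# that is not row number 1, identifying rows by SQLite's implicit rowid.
conn.execute(
    """
    DELETE FROM Thursday_crypto
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY ID ORDER BY Level DESC) AS rn
            FROM Thursday_crypto
        )
        WHERE rn > 1
    )
    """
)

remaining = conn.execute(
    "SELECT ID, Level FROM Thursday_crypto ORDER BY ID"
).fetchall()
print(remaining)
```

Only the row with the highest Level per ID should remain after the delete.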


How to delete rows in SQL that contain duplicate values in just one column? [duplicate]

I have the following data in a table:
id  name   symbol
1   Two    Three
2   Two    Three
3   Three  Three
4   Three  Three
5   Three  Three
and want to delete rows such that the column name only contains unique values. It doesn't matter which row gets deleted when there are duplicate values in column name.
So the desired output would be, e.g.:
id  name   symbol
1   Two    Three
3   Three  Three
I have a Postgres database, and this is what I tried based on a tutorial:
;with cte as
(
    select
        *,
        row_num = row_number() over (partition by name order by ID)
    from public.tentacle_ticker
)
delete
from cte
where row_num > 1;
which returns the error: column "row_num" does not exist
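Not sure if this is what you require; this is an expanded comment more than an answer. Keep the row with the smallest id in each group and delete the rest:
delete from public.tentacle_ticker
where id not in
(
    select min(id)
    from public.tentacle_ticker
    group by name, symbol  -- group by name alone if only name has to be unique
);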
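For context, the tutorial query fails in Postgres for two reasons: `row_num = row_number() ...` is a T-SQL aliasing form that Postgres reads as a comparison against a column named row_num, and Postgres does not allow deleting from a CTE. The NOT IN approach avoids both. Here is a minimal sketch of it, assuming Python's built-in sqlite3 module as a stand-in for Postgres, the sample rows from the question, and grouping by name only.

```python
import sqlite3

# In-memory stand-in for public.tentacle_ticker with the question's sample rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tentacle_ticker (id INTEGER PRIMARY KEY, name TEXT, symbol TEXT)"
)
conn.executemany(
    "INSERT INTO tentacle_ticker VALUES (?, ?, ?)",
    [
        (1, "Two", "Three"),
        (2, "Two", "Three"),
        (3, "Three", "Three"),
        (4, "Three", "Three"),
        (5, "Three", "Three"),
    ],
)

# Keep the smallest id per name and delete every other row.
conn.execute(
    """
    DELETE FROM tentacle_ticker
    WHERE id NOT IN (SELECT MIN(id) FROM tentacle_ticker GROUP BY name)
    """
)

kept = conn.execute(
    "SELECT id, name, symbol FROM tentacle_ticker ORDER BY id"
).fetchall()
print(kept)
```

With the sample data this leaves ids 1 and 3, which matches the desired output in the question.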

group and return rows with the minimum value

There is a tasks table.
id | name | project_id | created | ...
Tasks can be in different projects. I need to return one task from each project: the one with the minimum creation date. Here is my solution:
SELECT *
FROM tasks a
JOIN (
    SELECT project_id, min(created) as created
    FROM tasks
    GROUP BY project_id
) b
    ON a.project_id = b.project_id AND a.created = b.created;
But if a project has several tasks with the same creation date, two records are returned for that project.
To ensure that one, and only one, row is returned per project_id, a better method is to use row_number() over(). The partition by inside the over() clause is similar to what you would have grouped by, and the order by controls which row within each partition is given the value 1. In this case the value 1 goes to the row with the earliest created date, and further columns can be added as tie-breakers (e.g. id). Every other row within the partition gets the next integer value, so only one row in each partition can be equal to 1. To limit the final result, use a derived table (subquery) followed by a where clause that restricts the result to the first row per partition, i.e. where rn = 1.
SELECT *
FROM (
    SELECT *
         , row_number() over(partition by project_id order by created, id) as rn
    FROM tasks
) AS derived
WHERE rn = 1
Note: to get the most recent row instead, reverse the direction of ordering on the date column (order by created desc).
This technique not only ensures that a single row per partition is returned, it also typically requires fewer passes through the data than your original approach, so it tends to be efficient as well.
Tip: if you do want more than one row per partition returned in case of ties, use rank() or dense_rank() instead of row_number(). These functions give rows of equal rank the same value, so more than one row can get a rank value of 1.
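The difference between row_number() with a tie-breaker and rank() can be seen on a small example. This is a minimal sketch, assuming Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later); the task rows are made up, with tasks 1 and 2 sharing the earliest date in project 10.

```python
import sqlite3

# Hypothetical tasks table; tasks 1 and 2 tie on the earliest created date.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tasks (id INTEGER PRIMARY KEY, name TEXT, project_id INTEGER, created TEXT)"
)
conn.executemany(
    "INSERT INTO tasks VALUES (?, ?, ?, ?)",
    [
        (1, "a", 10, "2024-01-01"),
        (2, "b", 10, "2024-01-01"),
        (3, "c", 10, "2024-02-01"),
        (4, "d", 20, "2024-03-01"),
    ],
)

# row_number() with id as tie-breaker: exactly one row per project.
with_row_number = conn.execute(
    """
    SELECT id, project_id
    FROM (
        SELECT id, project_id,
               ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY created, id) AS rn
        FROM tasks
    ) AS derived
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

# rank() on created alone: tied rows share rank 1, so both come back.
with_rank = conn.execute(
    """
    SELECT id, project_id
    FROM (
        SELECT id, project_id,
               RANK() OVER (PARTITION BY project_id ORDER BY created) AS rn
        FROM tasks
    ) AS derived
    WHERE rn = 1
    ORDER BY id
    """
).fetchall()

print(with_row_number)
print(with_rank)
```

The first query returns one task per project, while the second returns both tied tasks for project 10.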

Find the max value in a row and hide repeated values in a different row Oracle SQL [duplicate]

Running this query produces a result with 3 columns: the place name, the vaccine id, and the count of how many times that vaccine was applied to different persons.
select vaccines.placename, vaccinetype.idvaccine, count(*)
from vaccines, request, vaccinetype
where request.idvaccine = vaccines.idvaccine
  and vaccinetype.idvaccine = request.idvaccine
group by vaccines.placename, vaccinetype.idvaccine
order by vaccines.placename, vaccinetype.idvaccine
In the query result, the same vaccine id shows up for different places, and this is what I want to filter: for each vaccine id, I only want to show the place where it was applied the most. For example, row 6 would have to be eliminated because row 1 already exists with the same vaccine code and a higher value in the count(*) column.
I have tried to do a subquery, but it didn't filter correctly.
Here's one option: rank rows per idvaccine, sorted by the count result, and then fetch the rows that rank as "highest". Read the comments within the code.
WITH
   your_query
   AS
      -- this is your query, slightly rewritten so that it uses
      -- * JOINs, so that you actually join tables there and leave WHERE clause for
      --   conditions (if any; there are none in your query)
      -- * table aliases, which make it easier to read
      -- * column alias applied to COUNT function
      (  SELECT v.placename, t.idvaccine, COUNT (*) cnt
           FROM vaccines v
                JOIN request r ON r.idvaccine = v.idvaccine
                JOIN vaccinetype t ON t.idvaccine = r.idvaccine
       GROUP BY v.placename, t.idvaccine),
   temp
   AS
      -- it fetches values from your query, with addition of the RANK analytic function
      -- which ranks rows per each IDVACCINE, sorted by COUNT result in descending order
      (SELECT placename,
              idvaccine,
              cnt,
              RANK () OVER (PARTITION BY idvaccine ORDER BY cnt DESC) rnk
         FROM your_query)
-- finally, retrieve rows that rank as "highest" in TEMP CTE
  SELECT placename, idvaccine, cnt
    FROM temp
   WHERE rnk = 1
ORDER BY placename, idvaccine;

SQL query to remove duplicate records without primary key, keeping the most recent [duplicate]

I have a transaction table (SQL Express 2014) that holds sales transactions. I need to remove duplicate records, retaining only the most recent record for each ACC_PART_MAT.
Example of current data:
ACC_PART_MAT  TX_DATE
A1025-A552    2021-09-02
A1025-B1994   2121-04-28
A1025-B1994   2121-09-02
A1025-B1994   2121-03-21
A1025-B1960   2121-05-20
End result required:
ACC_PART_MAT  TX_DATE
A1025-A552    2021-09-02
A1025-B1994   2121-09-02
A1025-B1960   2121-05-20
There are many examples addressing duplicate records, but I cannot get them to work with no primary key and with dates.
Many thanks in advance.
For your example, you can just use aggregation, taking the latest date per ACC_PART_MAT:
select ACC_PART_MAT, max(TX_DATE) as TX_DATE
from t
group by ACC_PART_MAT;
If you actually want to delete rows from the table, you can use an updatable CTE, but be careful because this changes the table. Ordering by TX_DATE descending gives the most recent row seqnum 1, so it is the one that is kept:
with todelete as (
    select t.*,
           row_number() over (partition by ACC_PART_MAT order by TX_DATE desc) as seqnum
    from t
)
delete from todelete
where seqnum > 1;
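Both steps can be checked against the sample data from the question. This is a minimal sketch, assuming Python's built-in sqlite3 module as a stand-in for SQL Server. SQLite has no updatable CTEs, so the delete is written with a correlated subquery instead, which is a swapped-in technique rather than the one above; the dates are copied from the question as given.

```python
import sqlite3

# In-memory stand-in for table t, without a primary key, using the question's rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (ACC_PART_MAT TEXT, TX_DATE TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("A1025-A552", "2021-09-02"),
        ("A1025-B1994", "2121-04-28"),
        ("A1025-B1994", "2121-09-02"),
        ("A1025-B1994", "2121-03-21"),
        ("A1025-B1960", "2121-05-20"),
    ],
)

# Read-only variant: latest date per ACC_PART_MAT via aggregation.
latest = conn.execute(
    "SELECT ACC_PART_MAT, MAX(TX_DATE) FROM t GROUP BY ACC_PART_MAT ORDER BY ACC_PART_MAT"
).fetchall()

# Destructive variant: delete every row older than the latest one for its key.
conn.execute(
    """
    DELETE FROM t
    WHERE TX_DATE < (
        SELECT MAX(i.TX_DATE) FROM t AS i WHERE i.ACC_PART_MAT = t.ACC_PART_MAT
    )
    """
)
after_delete = conn.execute(
    "SELECT ACC_PART_MAT, TX_DATE FROM t ORDER BY ACC_PART_MAT"
).fetchall()

print(latest)
print(after_delete)
```

Note that this delete keeps every row that carries the latest date, so exact duplicates of the most recent row would survive; the row_number() approach removes those as well.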

How to remove duplicate data from a Microsoft SQL database (on the result only)

The column code contains duplicate values, and I want to remove those duplicates from the result.
For example, I want to remove the duplicates in column code together with the rows they belong to. It doesn't matter if other columns contain duplicates; I want to base this on the code column only. What SQL query can I use? Thank you.
This is the table I am working with.
As you can see, there is an isdeleted column that has a value of 1 on some rows. I only want the records with a value of 0.
Here is a sample record: row 1 has an isdeleted value of 1, which means that this record is deleted, and I only need row 2 for this code.
You could use the window function ROW_NUMBER() to single out the last entry per code, as in:
SELECT code, shortdesc, longdesc, isobsolete, effectivefromdate
FROM (
    SELECT ROW_NUMBER() OVER (PARTITION BY code ORDER BY effectivefromdate DESC) AS rn, *
    FROM CodingSuite_STG
    WHERE isobsolete=1 AND isdeleted=0
) AS cs
WHERE rn=1
ORDER BY effectivefromdate
Explanation:
The core of the operation is a "sub-query". That is a "table-like" expression made of a SELECT statement surrounded by parentheses and followed by a table alias, like:
( SELECT * FROM CodingSuite_STG WHERE isobsolete=1 ) AS cs
To the outer SELECT it appears like a table with the name "cs".
Within this sub-query I placed a special function (a "window function") consisting of two parts:
ROW_NUMBER() OVER ( PARTITION BY code ORDER BY effectivefromdate DESC) AS rn
The ROW_NUMBER() function returns a sequential number for a certain "window" of records defined by the immediately following OVER ( ... ) clause. The PARTITION BY inside it defines a group division scheme (similar to GROUP BY), so the row numbers start from 1 for each partitioned group. ORDER BY determines the numbering order within each group. So, with entries having the same code value ROW_NUMBER() will supply the number sequence 1, 2, 3... for each record, with 1 being assigned to the record with the highest value of effectivefromdate because of ORDER BY effectivefromdate DESC.
All we need to do in the outer SELECT clause is to pick up those records from the sub-query cs that have an rn-value of 1 and we're done!
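To see the numbering itself, it can help to run the inner query without the rn = 1 filter. This is a minimal sketch, assuming Python's built-in sqlite3 module with an SQLite build that supports window functions (3.25 or later); the table is reduced to three columns and the rows are made up.

```python
import sqlite3

# Hypothetical, reduced version of CodingSuite_STG with made-up rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE CodingSuite_STG (code TEXT, effectivefromdate TEXT, isdeleted INTEGER)"
)
conn.executemany(
    "INSERT INTO CodingSuite_STG VALUES (?, ?, ?)",
    [
        ("A01", "2020-01-01", 0),
        ("A01", "2021-06-01", 0),
        ("B02", "2019-03-15", 0),
        ("B02", "2022-02-02", 1),  # deleted record, filtered out before numbering
    ],
)

# Numbering restarts at 1 for each code; the newest date within a code gets 1.
numbered = conn.execute(
    """
    SELECT code, effectivefromdate,
           ROW_NUMBER() OVER (PARTITION BY code ORDER BY effectivefromdate DESC) AS rn
    FROM CodingSuite_STG
    WHERE isdeleted = 0
    ORDER BY code, rn
    """
).fetchall()

for row in numbered:
    print(row)
```

Filtering this result on rn = 1 in an outer SELECT then leaves the latest non-deleted record per code.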