Create table of unique values from table of duplicates - sql

I have a table with 500,000+ rows and the following columns:
Symbol, ExternalCode, ExternalCodeType, StartDate
Symbol should be unique but it's not.
There are a handful of rows (~60) that have the same value for Symbol but have a different ExternalCode+StartDate pair.
I want to create a table of uniques so that, when there are multiple entries for the same Symbol, I only take the one with the most recent StartDate.
Is there a simple/elegant way to do this?

In SQL-Server this can be solved without JOINing.
Try this:
The ROW_NUMBER function starts a new series of 'ID's ordered by date (so that the latest always equals 1) and partitioned by Symbol, so that each symbol has its own set of IDs.
Hope the answer is clear.
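As a rough sketch of the ROW_NUMBER approach described above, the following Python script runs an equivalent query against an in-memory SQLite database. SQLite stands in for SQL Server here (window functions need SQLite 3.25 or later). The table name SymbolMap and the sample rows are made up for illustration; only the column names come from the question.

```python
import sqlite3

# Hypothetical table name and sample data; column names follow the question.
def latest_per_symbol():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE SymbolMap (Symbol TEXT, ExternalCode TEXT, "
        "ExternalCodeType TEXT, StartDate TEXT)"
    )
    conn.executemany(
        "INSERT INTO SymbolMap VALUES (?, ?, ?, ?)",
        [
            ("AAA", "X1", "T", "2010-01-01"),
            ("AAA", "X2", "T", "2012-06-01"),  # same Symbol, newer StartDate
            ("BBB", "Y1", "T", "2011-03-15"),
        ],
    )
    # Number the rows of each Symbol from newest to oldest, then keep number 1.
    rows = conn.execute(
        """
        SELECT Symbol, ExternalCode, ExternalCodeType, StartDate
        FROM (
            SELECT *,
                   ROW_NUMBER() OVER (PARTITION BY Symbol
                                      ORDER BY StartDate DESC) AS rn
            FROM SymbolMap
        ) AS numbered
        WHERE rn = 1
        ORDER BY Symbol
        """
    ).fetchall()
    conn.close()
    return rows

print(latest_per_symbol())
```

In SQL Server the same SELECT could presumably feed a SELECT ... INTO statement to materialise the table of uniques.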


How to remove duplicate data from microsoft sql database(on the result only)

The column code contains duplicate values, and I want to remove the duplicates of that row.
For example, I want to remove the duplicates in column code, along with the rows that carry them. It doesn't matter if the other columns have duplicates; I want to base it only on the code column. What SQL query can I use? Thank you.
This is the table I am working with.
As you can see, there is an isdeleted column, and some rows have a value of 1 in it. I only want the records with a value of 0.
Here is a sample record. You can see that row 1 has an isdeleted value of 1, which means that this record is deleted, and I only need row 2 for this code.
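You could use the window function ROW_NUMBER() to single out the last entry per code, like this:
SELECT code, shortdesc, longdesc, isobsolete, effectivefromdate
FROM (
  SELECT code, shortdesc, longdesc, isobsolete, effectivefromdate,
         ROW_NUMBER() OVER (PARTITION BY code ORDER BY effectivefromdate DESC) AS rn
  FROM CodingSuite_STG
  WHERE isobsolete=1 AND isdeleted=0
) AS cs
WHERE rn=1
ORDER BY effectivefromdate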
The core of the operation is a "sub-query". That is a "table-like" expression generated by surrounding a SELECT statement with parentheses and following it with an alias, like:
( SELECT * FROM CodingSuite_STG WHERE isobsolete=1 ) AS cs
For the outer SELECT it will appear like a table with the name "cs".
Within this sub-query I placed a special function (a "window function") consisting of two parts:
ROW_NUMBER() OVER ( PARTITION BY code ORDER BY effectivefromdate DESC) AS rn
The ROW_NUMBER() function returns a sequential number for a certain "window" of records defined by the immediately following OVER ( ... ) clause. The PARTITION BY inside it defines a group division scheme (similar to GROUP BY), so the row numbers start from 1 for each partitioned group. ORDER BY determines the numbering order within each group. So, with entries having the same code value ROW_NUMBER() will supply the number sequence 1, 2, 3... for each record, with 1 being assigned to the record with the highest value of effectivefromdate because of ORDER BY effectivefromdate DESC.
All we need to do in the outer SELECT clause is to pick up those records from the sub-query cs that have an rn-value of 1 and we're done!
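To make the numbering visible, here is a small sketch that prints the rn value assigned to every row before the outer filter is applied. It uses Python's sqlite3 module as a stand-in for SQL Server (SQLite 3.25+ is assumed for window functions), a cut-down version of CodingSuite_STG with only three columns, and invented sample rows.

```python
import sqlite3

# Reduced, hypothetical version of CodingSuite_STG with made-up rows.
def numbered_rows():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE CodingSuite_STG "
        "(code TEXT, effectivefromdate TEXT, isdeleted INTEGER)"
    )
    conn.executemany(
        "INSERT INTO CodingSuite_STG VALUES (?, ?, ?)",
        [
            ("A01", "2015-01-01", 0),
            ("A01", "2016-01-01", 0),
            ("A01", "2017-01-01", 1),  # deleted: removed by WHERE before numbering
            ("B02", "2014-05-01", 0),
        ],
    )
    # The WHERE clause is applied first; the surviving rows are then
    # numbered per code, newest effectivefromdate first.
    rows = conn.execute(
        """
        SELECT code, effectivefromdate,
               ROW_NUMBER() OVER (PARTITION BY code
                                  ORDER BY effectivefromdate DESC) AS rn
        FROM CodingSuite_STG
        WHERE isdeleted = 0
        ORDER BY code, rn
        """
    ).fetchall()
    conn.close()
    return rows

for row in numbered_rows():
    print(row)
```

Each code restarts at 1, so keeping only rn = 1 leaves exactly one row per code.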

how to create an order column in sql server

I want ordering information for every row in my table. The following is my table. The ID column is an identity column and the primary key. The Order column is computed from the ID column.
ID -- Name -- Order
1 kiwi 1
2 banana 2
3 apple 3
Everything is fine and I have an order column. But I can't switch the order of rows.
For example, I can't say that from now on kiwi's order becomes 2 and banana's order becomes 1.
In other words, if a computed column could be updated, my problem would be solved.
If I don't create the order column as a computed column, then for every new entry I have to find the largest order so that I can write (largest order) + 1 as the new entry's order. But I don't want to calculate the largest order for every entry, since that is costly.
So what should I do now?
I've searched, and the solutions I found involve creating a trigger function etc. I don't want to do that either.
I might not have understood the question; I don't think it's very clear.
But why use a counter to order the set? Couldn't you just use a timestamp for each order and use that to dictate which order is more recent?
CREATE TABLE dbo.TEST (
Name varchar(50),
OrderTime Datetime
)
INSERT INTO dbo.TEST (Name,OrderTime)
VALUES ('kiwi',Getdate())
SELECT * FROM dbo.TEST
ORDER BY OrderTime
If you needed an integer based on the order time, you could use the ROW_NUMBER() function to return one:
ROW_NUMBER() OVER (ORDER BY OrderTime Desc) as OrderInt
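A minimal sketch of this idea, run through Python's sqlite3 module rather than SQL Server (SQLite 3.25+ assumed): the integer position is derived from OrderTime at query time instead of being stored. The table name TEST follows the snippet above; the fixed timestamps are invented so that the result is reproducible.

```python
import sqlite3

def order_ints():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE TEST (Name TEXT, OrderTime TEXT)")
    # Fixed, made-up timestamps instead of the current time.
    conn.executemany(
        "INSERT INTO TEST VALUES (?, ?)",
        [
            ("kiwi", "2020-01-01 10:00:00"),
            ("banana", "2020-01-01 10:05:00"),
            ("apple", "2020-01-01 10:10:00"),
        ],
    )
    # With DESC, the most recent OrderTime receives OrderInt 1.
    rows = conn.execute(
        """
        SELECT Name,
               ROW_NUMBER() OVER (ORDER BY OrderTime DESC) AS OrderInt
        FROM TEST
        ORDER BY OrderInt
        """
    ).fetchall()
    conn.close()
    return rows

print(order_ints())
```

Under this scheme, changing a row's position means updating its OrderTime; no maximum has to be looked up when a new row is inserted.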

Selecting the last entry in sql database for each id field

Hi all I am using SQL server.
I have one table that has a whole list of details on cars and events that have happened with those cars.
What I need is to be able to pick out the last entry for each vehicle based on their (Reg_No) registration number.
I have the following to work with
Table name = UnitHistory
Columns = indx (this is just the primary key, auto-incrementing), Transdate (my datetime column) and Reg_No (unique to each vehicle).
There are about 45 vehicles with registration numbers if that helps?
I have looked at different examples but they all seem to have another table to work with.
Please help me. Thanks in advance for the help
WITH cte AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY Reg_No ORDER BY Transdate DESC) AS RowNumber
    FROM unithistory
)
SELECT *
FROM cte
WHERE RowNumber = 1
If you only need the index and the Transdate, and they are both incremental (I am assuming that a later date corresponds to a higher index number), then the simplest query would be:
SELECT Reg_No, MAX(indx), MAX(Transdate)
FROM UnitHistory
GROUP BY Reg_No
If you want all data for a known Reg_No, you can use Dd2's answer
If you want a list of all Reg_Nos with their data, you will need a subquery.
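One way the subquery variant might look is sketched below: the table is joined to a grouped subquery that holds the latest Transdate per Reg_No, so every other column of the matching row comes along. The script uses Python's sqlite3 module as a stand-in for SQL Server; the Event column and the sample rows are hypothetical.

```python
import sqlite3

def last_entry_per_vehicle():
    conn = sqlite3.connect(":memory:")
    # Event is an invented column standing in for "all the other data".
    conn.execute(
        "CREATE TABLE UnitHistory (indx INTEGER PRIMARY KEY, "
        "Transdate TEXT, Reg_No TEXT, Event TEXT)"
    )
    conn.executemany(
        "INSERT INTO UnitHistory VALUES (?, ?, ?, ?)",
        [
            (1, "2020-01-01 08:00", "ABC123", "service"),
            (2, "2020-02-01 09:00", "XYZ789", "fuel"),
            (3, "2020-03-01 10:00", "ABC123", "repair"),
        ],
    )
    # The subquery finds the latest Transdate per Reg_No; the join then
    # pulls the complete row that carries that date.
    rows = conn.execute(
        """
        SELECT u.indx, u.Transdate, u.Reg_No, u.Event
        FROM UnitHistory AS u
        INNER JOIN (
            SELECT Reg_No, MAX(Transdate) AS LastDate
            FROM UnitHistory
            GROUP BY Reg_No
        ) AS latest
            ON u.Reg_No = latest.Reg_No AND u.Transdate = latest.LastDate
        ORDER BY u.Reg_No
        """
    ).fetchall()
    conn.close()
    return rows

print(last_entry_per_vehicle())
```

If two rows of the same vehicle share the latest Transdate, this join returns both of them.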

Interview - Detect/remove duplicate entries

how to detect/remove duplicate entries from a database in a table where there is no primary key ?
[If we use 'DISTINCT' how do we know which record is the correct one and duplicate one ? ]
delete f
from (
    select ROW_NUMBER()
        over (partition by YourFirstPossibleDuplicateField, YourSecondPossibleDuplicateField
              order by WhateverFieldYouWantSortedBy) as DelId
    from YourTable
) as f
where DelId > 1
I created a view where DISTINCT actually was not a part of the query, but PARTITION. I needed the most recent entry to records with the same Ordernum and RecordType fields, discarding the others. The partitions are ordered by date, and then the top row is selected, like this:
WITH q AS (
    SELECT *, ROW_NUMBER() OVER (PARTITION BY OrderNum, RecordType ORDER BY DateChanged DESC) rn
    FROM HistoryTable )
SELECT * FROM q WHERE rn = 1
If we use 'DISTINCT' how do we know which record is the correct one
and duplicate one?
If you have duplicate rows, then it doesn't matter which duplicate is picked, because they are all the same!
I guess when you say "there is no primary key" you actually mean that there is no simple single-column 'surrogate' candidate key, such as an incrementing sequence of integers (preferably with no gaps), but that there is a multi-column compound 'natural' candidate key (though it does not comprise all the columns).
If this is the case, you'd look for something to break ties, e.g. a column named DateChanged as per @Dave's answer. Otherwise, you need to pick an arbitrary row; e.g. the answer by @Surfer513 does this using the ROW_NUMBER() windowed function partitioned over (YourFirstPossibleDuplicateField, YourSecondPossibleDuplicateField) (i.e. your natural key), then keeping the duplicate that was arbitrarily assigned row number 1.
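The delete-by-row-number idea can be tried out with the sketch below, which uses Python's sqlite3 module (SQLite 3.25+ assumed). SQLite cannot delete through a derived table the way SQL Server can, so this version swaps in SQLite's implicit rowid to identify the rows to remove; that substitution is an assumption of the sketch, not part of the answers above. Column names are borrowed from the view example and the data is invented.

```python
import sqlite3

def remove_duplicates():
    conn = sqlite3.connect(":memory:")
    # No primary key is declared, as in the interview question.
    conn.execute(
        "CREATE TABLE YourTable (OrderNum INTEGER, RecordType TEXT, DateChanged TEXT)"
    )
    conn.executemany(
        "INSERT INTO YourTable VALUES (?, ?, ?)",
        [
            (1, "A", "2020-01-01"),
            (1, "A", "2020-03-01"),  # newer entry for the same natural key
            (2, "B", "2020-02-01"),
            (2, "B", "2020-02-01"),  # exact duplicate: either copy may survive
        ],
    )
    # Number rows per natural key, newest first, and delete every row
    # numbered above 1. rowid is SQLite-specific.
    conn.execute(
        """
        DELETE FROM YourTable
        WHERE rowid IN (
            SELECT rid FROM (
                SELECT rowid AS rid,
                       ROW_NUMBER() OVER (PARTITION BY OrderNum, RecordType
                                          ORDER BY DateChanged DESC) AS DelId
                FROM YourTable
            )
            WHERE DelId > 1
        )
        """
    )
    rows = conn.execute(
        "SELECT OrderNum, RecordType, DateChanged FROM YourTable ORDER BY OrderNum"
    ).fetchall()
    conn.close()
    return rows

print(remove_duplicates())
```

For the exact duplicates the surviving copy is arbitrary, which is harmless because the copies are identical.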

SQL Max Function per group

I have a complex query which may return more than one record per group. There is a field that holds a sequential number. If more than one record is returned for a group, I just want the record with the highest sequential number.
I’ve tried using the SQL MAX function, but if I try to add more than one field it returns all records, instead of the one with the highest sequential field in that group.
I am trying to accomplish this in MS Access.
Edit: 4/5/11
Trying to create a table as an example of what I am trying to do
I have the following table:
Eventseq is a number field that increments for each ItemID. (Don’t ask me why, that’s how the table was created.) Each ItemID can have one or many Eventseq values. I only need the last record (max(Eventseq)) per ItemTypeID.
Hope this helps any.
SELECT A.*
FROM YourTable A
INNER JOIN (SELECT GroupColumn, MAX(SequentialColumn) MaxSeq
FROM YourTable
GROUP BY GroupColumn) B
ON A.GroupColumn = B.GroupColumn AND A.SequentialColumn = B.MaxSeq
If your SequentialNumber is an ID (unique across the table), then you could use
select *
from tbl
where seqnum in (
select max(seqnum) from tbl
group by groupcolumn)
If it is not, an alternative to Lamak's query is the Access domain function DMAX
select *
from tbl
where seqnum = DMAX("seqnum", "tbl", "groupcolumn='" & groupcolumn & "'")
Note: if the groupcolumn is a date, use # instead of the single quotes ' in the above; if it is numeric, remove the single quotes.
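The condition "unique across the table" matters for the IN version. The sketch below runs both the join query and the IN query on made-up data in which seqnum restarts for each group, using Python's sqlite3 module in place of Access. The table and column names tbl, groupcolumn and seqnum follow the snippets above; DMAX is Access-specific and is not reproduced here.

```python
import sqlite3

def setup():
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE tbl (groupcolumn TEXT, seqnum INTEGER)")
    # seqnum restarts per group, so it is NOT unique across the table.
    conn.executemany(
        "INSERT INTO tbl VALUES (?, ?)",
        [("g1", 1), ("g1", 2), ("g1", 3), ("g2", 1), ("g2", 2)],
    )
    return conn

def max_by_join(conn):
    # Matches on both the group and its maximum: one row per group.
    return conn.execute(
        """
        SELECT A.groupcolumn, A.seqnum
        FROM tbl A
        INNER JOIN (SELECT groupcolumn, MAX(seqnum) AS MaxSeq
                    FROM tbl
                    GROUP BY groupcolumn) B
            ON A.groupcolumn = B.groupcolumn AND A.seqnum = B.MaxSeq
        ORDER BY A.groupcolumn, A.seqnum
        """
    ).fetchall()

def max_by_in(conn):
    # Matches on the value only: g1's row 2 slips through because
    # 2 happens to be the maximum of g2.
    return conn.execute(
        """
        SELECT groupcolumn, seqnum
        FROM tbl
        WHERE seqnum IN (SELECT MAX(seqnum) FROM tbl GROUP BY groupcolumn)
        ORDER BY groupcolumn, seqnum
        """
    ).fetchall()

conn = setup()
print(max_by_join(conn))
print(max_by_in(conn))
```

When seqnum is a table-wide unique ID the two queries agree; otherwise the join (or a per-group lookup such as DMAX) is the safer choice.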