SQL Max returns duplicates if values are equal - sql

I've got a view that contains a document ID column and a date column as well as a dozen other columns that aren't relevant to this problem. There can be multiple rows with the same document ID, but the dates are usually different. This signifies that it's the same document, just a revision of it. The problem is if I have two rows where the document ID and the date are the same, I get both. I just want to get one. It doesn't matter which one, as long as I only get one.
The following has duplicates where the document ID and date are the same.
SELECT FSD.*
FROM vFSD FSD
INNER JOIN
(
SELECT InternalID, MAX(FileLastUploadedDate) AS FileLastUploadedDate
FROM vFSD
GROUP BY InternalID
) gFSD ON FSD.InternalID = gFSD.InternalID AND FSD.FileLastUploadedDate = gFSD.FileLastUploadedDate
I've also tried it with DISTINCT, but it didn't fix the problem.
SELECT DISTINCT FSD.*
FROM vFSD FSD
INNER JOIN
(
SELECT DISTINCT InternalID, MAX(FileLastUploadedDate) AS FileLastUploadedDate
FROM vFSD
GROUP BY InternalID
) gFSD ON FSD.InternalID = gFSD.InternalID AND FSD.FileLastUploadedDate = gFSD.FileLastUploadedDate

You can use ROW_NUMBER to bring back only one arbitrary row when two or more rows are tied for the greatest FileLastUploadedDate of an InternalID:
WITH CTE
AS (SELECT *,
ROW_NUMBER() OVER (PARTITION BY InternalID
ORDER BY FileLastUploadedDate DESC) AS RN
FROM vFSD)
SELECT InternalID,
FileLastUploadedDate
/*Other desired columns*/
FROM CTE
WHERE RN = 1
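To see the difference between the two approaches, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the original database (it assumes an SQLite build with window function support, 3.25 or later). The table stands in for the vFSD view, and the Title column and the sample rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the vFSD view; Title is an invented column.
conn.execute(
    "CREATE TABLE vFSD (InternalID INTEGER, FileLastUploadedDate TEXT, Title TEXT)"
)
conn.executemany(
    "INSERT INTO vFSD VALUES (?, ?, ?)",
    [
        (1, "2020-01-01", "doc 1, old revision"),
        (1, "2020-02-01", "doc 1, tied copy A"),
        (1, "2020-02-01", "doc 1, tied copy B"),
        (2, "2020-03-01", "doc 2, only revision"),
    ],
)

# Joining back on MAX(date) keeps every row that shares the maximum date.
join_on_max = conn.execute("""
    SELECT FSD.InternalID, FSD.Title
    FROM vFSD FSD
    INNER JOIN (
        SELECT InternalID, MAX(FileLastUploadedDate) AS FileLastUploadedDate
        FROM vFSD
        GROUP BY InternalID
    ) gFSD ON FSD.InternalID = gFSD.InternalID
          AND FSD.FileLastUploadedDate = gFSD.FileLastUploadedDate
""").fetchall()

# ROW_NUMBER gives tied rows different numbers, so RN = 1 matches one row only.
row_number_pick = conn.execute("""
    WITH CTE AS (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY InternalID
                                  ORDER BY FileLastUploadedDate DESC) AS RN
        FROM vFSD
    )
    SELECT InternalID, FileLastUploadedDate, Title
    FROM CTE
    WHERE RN = 1
    ORDER BY InternalID
""").fetchall()

print(len(join_on_max))      # 3: both tied rows for document 1 survive
print(len(row_number_pick))  # 2: exactly one row per InternalID
```

Which of the two tied rows wins is arbitrary here. If that matters, adding a second column to the ORDER BY inside OVER makes the choice deterministic.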

Related

How to create a unique ID based on conditions in SQL?

I would like to generate a new ID, in any format (in the example below: 11, 12, 13, ...), based on the following condition:
Every time the days column value is greater than 1 and not null, the current row and all following rows get the same ID until another value meets the condition.
This applies within the same email.
Below you can see the expected result (in the format XX).
I thought about combining two conditions, in this order:
1. Every time the days column value is greater than 1, all following rows get the same ID until another value meets the condition.
2. AND the lag (previous) value is equal to 0, 1, or null.
Assuming you have an EmailDate column over which you're ordering (a DATETIME field, really), try something like this:
WITH
TableNameWithEmailDateIDs AS (
SELECT
*,
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
) AS EmailDateID
FROM
TableName
),
IDs AS (
SELECT
*,
LEAD(EmailDateID, 1) OVER (
ORDER BY
Email,
EmailDate
) AS LeadEmailDateID
FROM
(
SELECT
*,
-- REMOVE +10 if you don't want 11 to be starting ID
ROW_NUMBER() OVER (
ORDER BY
Email DESC,
EmailDate
)+10 AS ID
FROM
TableNameWithEmailDateIDs
WHERE
Days > 1
OR Days IS NULL
) X
)
SELECT
COALESCE(TableName.EmailDate, IDs.EmailDate) AS EmailDate,
IDs.Email,
COALESCE(TableName.Days, IDs.Days) AS Days,
IDs.ID
FROM
IDs
LEFT JOIN TableNameWithEmailDateIDs TableName
ON IDs.Email = TableName.Email
AND TableName.EmailDateID BETWEEN
IDs.EmailDateID
AND IDs.LeadEmailDateID-1
ORDER BY
ID DESC,
TableName.EmailDate DESC
;
First, create a CTE that generates IDs for each distinct Email/Date combo (helpful for LEFT JOIN condition later). Then, create a CTE that generates IDs for rows that meet your condition (i.e. the important rows). Finally, LEFT JOIN your main table onto that CTE to fill in the "gaps", so to speak.
I suggest running each of the components of this query independently to fully understand what's going on.
Hope it helps!
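For comparison, a different and commonly used way to number such groups is a running sum over a "start of group" flag. This is not the query above, only a sketch of an alternative, run through Python's sqlite3 module as a stand-in database. It reuses the TableName, Email, EmailDate and Days names from the answer, the sample rows are invented, and it assumes (as the answer's WHERE clause does) that a NULL Days value also starts a new group, which is what the first row of each email is expected to have.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE TableName (Email TEXT, EmailDate TEXT, Days INTEGER)")
# Invented sample data: the first row of each email has NULL days.
conn.executemany(
    "INSERT INTO TableName VALUES (?, ?, ?)",
    [
        ("a@x.com", "2020-01-01", None),
        ("a@x.com", "2020-01-02", 1),
        ("a@x.com", "2020-01-05", 3),
        ("a@x.com", "2020-01-06", 1),
        ("b@x.com", "2020-02-01", None),
        ("b@x.com", "2020-02-02", 0),
        ("b@x.com", "2020-02-10", 8),
    ],
)

# Flag each row that starts a new group, then take a running total of the
# flags. Every row inherits the count of group starts seen so far.
# The +10 only makes the first ID 11, as in the question.
rows = conn.execute("""
    SELECT Email, EmailDate, Days,
           10 + SUM(CASE WHEN Days > 1 OR Days IS NULL THEN 1 ELSE 0 END)
                OVER (ORDER BY Email, EmailDate
                      ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS ID
    FROM TableName
    ORDER BY Email, EmailDate
""").fetchall()

ids = [r[3] for r in rows]
print(ids)  # [11, 11, 12, 12, 13, 13, 14]
```

Because the first row of every email starts a group in this data, a single running total over (Email, EmailDate) never lets an ID spill from one email into the next.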

SQL query for filtering duplicate rows of a column by the minimum DateTime of those corresponding rows

I have a SQL database table, "Helium_Test_Data", that has multiple entries based on the KeyID column (the KeyID represents a single tested part ). I need to query the entries and only show one entry per KeyID (part) based on the earliest creation date-time (format example is 2018-12-29 08:22:11.123). This is because the same part was tested several times but the first reading is the one I need to use. Here is the query currently tried:
SELECT mt.*
FROM Helium_Test_Data mt
INNER JOIN
(
SELECT
KeyID,
MIN(DateTime) AS DateTime
FROM Helium_Test_Data
WHERE PSNo='11166565'
GROUP BY KeyID
) t ON mt.KeyID = t.KeyID AND mt.DateTime = t.DateTime
WHERE PSNo='11167197'
AND (mt.DateTime > '2018-12-29 07:00')
AND (mt.DateTime < '2018-12-29 18:00') AND OK=1
ORDER BY KeyId,DateTime
It returns only the rows that have no duplicate KeyID present in the table whereas I need one row per every single KeyID (duplicate or not). And for the duplicate ones, I need the earliest date.
Thanks in advance for the help.
Use the row_number() window function, which most DBMSs support:
select * from
(
select *,row_number() over(partition by KeyID order by DateTime) rn
from Helium_Test_Data
) t where t.rn=1
Or you could use a correlated subquery:
select t1.* from Helium_Test_Data t1
where t1.DateTime= (select min(DateTime)
from Helium_Test_Data t2
where t2.KeyID=t1.KeyID
)
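Both queries can be tried side by side with a small sketch like the one below, which uses Python's sqlite3 module as a stand-in database. The Reading column and the sample rows are invented, and the PSNo and time-window filters from the question are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Reading is a hypothetical measurement column.
conn.execute("CREATE TABLE Helium_Test_Data (KeyID TEXT, DateTime TEXT, Reading REAL)")
conn.executemany(
    "INSERT INTO Helium_Test_Data VALUES (?, ?, ?)",
    [
        ("A", "2018-12-29 08:22:11.123", 1.5),  # first test of part A
        ("A", "2018-12-29 09:10:00.000", 1.7),  # retest of part A
        ("B", "2018-12-29 10:05:42.500", 2.1),  # part B, tested once
    ],
)

# Option 1: number the rows per KeyID by time and keep number 1.
first_by_row_number = conn.execute("""
    SELECT KeyID, DateTime, Reading
    FROM (
        SELECT *, ROW_NUMBER() OVER (PARTITION BY KeyID ORDER BY DateTime) AS rn
        FROM Helium_Test_Data
    ) t
    WHERE t.rn = 1
    ORDER BY KeyID
""").fetchall()

# Option 2: keep rows whose DateTime equals the minimum for their KeyID.
first_by_subquery = conn.execute("""
    SELECT t1.KeyID, t1.DateTime, t1.Reading
    FROM Helium_Test_Data t1
    WHERE t1.DateTime = (SELECT MIN(t2.DateTime)
                         FROM Helium_Test_Data t2
                         WHERE t2.KeyID = t1.KeyID)
    ORDER BY t1.KeyID
""").fetchall()

print(first_by_row_number == first_by_subquery)  # True for this data
```

The two only differ when a KeyID has two rows with exactly the same earliest DateTime: the correlated subquery returns both, while row_number() returns one of them.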

Remove duplicate records except the first record in SQL

I want to remove all duplicate records except the first one.
Like :
NAME
R
R
rajesh
YOGESH
YOGESH
Now in the above I want to remove the second "R" and the second "YOGESH".
I have only one column whose name is "NAME".
Use a CTE (I have several of these in production).
;WITH duplicateRemoval as (
SELECT
[name]
,ROW_NUMBER() OVER(PARTITION BY [name] ORDER BY [name]) ranked
from #myTable
)
DELETE
FROM duplicateRemoval
WHERE ranked > 1;
Explanation: The CTE reads all of your records and numbers the rows within each group of identical names. The first row in a group gets 1 and each additional duplicate gets the next number, so deleting the rows where ranked > 1 keeps one row per name. Replace the DELETE with a SELECT * to see what it does.
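The numbering can be inspected with a small sketch using Python's sqlite3 module as a stand-in database. One difference is an assumption of this sketch, not part of the answer: SQLite cannot delete through a CTE, so the delete step below goes through SQLite's implicit rowid instead. The table is named myTable here rather than the temp table #myTable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE myTable (name TEXT)")
conn.executemany(
    "INSERT INTO myTable VALUES (?)",
    [("R",), ("R",), ("rajesh",), ("YOGESH",), ("YOGESH",)],
)

# The SELECT version of the CTE: each duplicate gets the next number.
ranked = conn.execute("""
    SELECT name,
           ROW_NUMBER() OVER (PARTITION BY name ORDER BY name) AS ranked
    FROM myTable
    ORDER BY name, ranked
""").fetchall()
print(ranked)
# [('R', 1), ('R', 2), ('YOGESH', 1), ('YOGESH', 2), ('rajesh', 1)]

# SQLite stand-in for the DELETE: remove every row numbered above 1,
# identified by its rowid.
conn.execute("""
    DELETE FROM myTable
    WHERE rowid IN (
        SELECT rid
        FROM (
            SELECT rowid AS rid,
                   ROW_NUMBER() OVER (PARTITION BY name ORDER BY rowid) AS rn
            FROM myTable
        )
        WHERE rn > 1
    )
""")

remaining = [r[0] for r in conn.execute("SELECT name FROM myTable ORDER BY name")]
print(remaining)  # ['R', 'YOGESH', 'rajesh']
```

The uppercase names sort before "rajesh" only because SQLite's default text comparison is case-sensitive; the numbering itself does not depend on that.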
Seems like a simple distinct modifier would do the trick:
SELECT DISTINCT name
FROM mytable
This is longer, but it works when you don't want the original row and only want to find the duplicate rows:
select majorTable.RowID, majorTable.Name, majorTable.Value
from (
select outerTable.Name, outerTable.Value, RowID,
ROW_NUMBER() over (partition by outerTable.Name, outerTable.Value order by RowID) as RowNo
from #Your_Table outerTable
inner join (
select Name, Value, COUNT(*) as duplicateRows
from #Your_Table
group by Name, Value
having COUNT(*) > 1
) innerTable on innerTable.Name = outerTable.Name and innerTable.Value = outerTable.Value
) majorTable
where majorTable.RowNo <> 1

Filter SQL data by repetition on a column

Very simple basic SQL question here.
I have this table:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
2    1409346767  23    13      Albacete
3    1409345729  23    7       Balears (Illes)
4    1409345729  23    3       Balears (Illes)
5    1409345729  22    56      Balears (Illes)
What I want to get is only one distinct row by ID and select the last City_Search made by the same Id.
So, in this case, the result would be:
Row  Id          Hour  Minute  City_Search
1    1409346767  23    24      Balears (Illes)
3    1409345729  23    7       Balears (Illes)
What's the easiest way to do it?
Obviously I don't want to delete any data just query it.
Thanks for your time.
SELECT T.Row,
T.Id,
T.Hour,
T.Minute,
T.City_Search
FROM Table T
JOIN
(
SELECT MIN(Row) AS Row,
ID
FROM Table
GROUP BY ID
) AS M
ON M.Row = T.Row
AND M.ID = T.ID
Can you change hour/minute to a timestamp?
What you want in this case is to first select what uniquely identifies your row:
Select id, max(time) from [table] group by id
Then use that query to add the data to it.
SELECT tdata.id, tdata.City_Search, tdata.time
FROM (SELECT id, max(time) as lasttime FROM [table] GROUP BY id) as Tkey
INNER JOIN [table] as tdata
ON tkey.id = tdata.id AND tkey.lasttime = tdata.time
That should do it.
Two options to do it without a join:
1. Use the ROW_NUMBER function to find the last one:
Select * FROM
(Select *,
row_number() over(Partition BY ID Order BY Hour desc, Minute Desc) as RNB
from table) t
Where RNB=1
2. Build a sortable string and use a simple MAX function:
-- zero-pad Hour and Minute so that the string comparison follows time order
Select ID,Right(MAX(Concat(LPAD(Hour,2,'0'),LPAD(Minute,2,'0'),RPAD(City_Search,20,' '))),20)
From Table
Group by ID
Avoiding the join can be faster, although that depends on the database engine and the available indexes.
Hope this helps
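The ROW_NUMBER option can be checked against the sample data from the question with a short sketch, here using Python's sqlite3 module as a stand-in database. The table name Searches and the column name RowNo are substitutions made for this sketch, since Table and Row are awkward identifiers in most SQL dialects.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Searches and RowNo are hypothetical names standing in for Table and Row.
conn.execute(
    "CREATE TABLE Searches (RowNo INTEGER, Id INTEGER, Hour INTEGER, "
    "Minute INTEGER, City_Search TEXT)"
)
conn.executemany(
    "INSERT INTO Searches VALUES (?, ?, ?, ?, ?)",
    [
        (1, 1409346767, 23, 24, "Balears (Illes)"),
        (2, 1409346767, 23, 13, "Albacete"),
        (3, 1409345729, 23, 7, "Balears (Illes)"),
        (4, 1409345729, 23, 3, "Balears (Illes)"),
        (5, 1409345729, 22, 56, "Balears (Illes)"),
    ],
)

# Number each Id's searches from latest to earliest and keep the latest.
last_searches = conn.execute("""
    SELECT RowNo, Id, Hour, Minute, City_Search
    FROM (
        SELECT *,
               ROW_NUMBER() OVER (PARTITION BY Id
                                  ORDER BY Hour DESC, Minute DESC) AS RNB
        FROM Searches
    ) AS numbered
    WHERE RNB = 1
    ORDER BY RowNo
""").fetchall()

for row in last_searches:
    print(row)
# (1, 1409346767, 23, 24, 'Balears (Illes)')
# (3, 1409345729, 23, 7, 'Balears (Illes)')
```

This matches the expected result in the question: rows 1 and 3.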

SQL Server query. JOIN by latest date

I have 3 tables:
UnitInfo(UnitID, ...),
UnitList(UnitID, ...)
UnitMonitoring(RecordID,UnitID, EventDate, ...)
UnitList is a subset of UnitInfo (in terms of data and in terms of columns). UnitMonitoring receives records time to time pertaining to UnitList (for every UnitID in UnitMonitoring we will have many records) filling EventDate. (UnitInfo has extended info).
I can't figure how to build a query so that for every UnitID in UnitList I get UnitMonitoring record such that EventDate is the latest one.
So far I have
SELECT a.UnitID, a.Name, b.EventDate
FROM UnitInfo a INNER JOIN UnitMonitoring b
WHERE a.UnitID IN (SELECT UnitID FROM UnitList)
which yields all records from UnitMonitoring
SELECT ul.unitId, um.*
FROM UnitList ul
OUTER APPLY
(
SELECT TOP 1 *
FROM UnitMonitoring umi
WHERE umi.UnitID = ul.unitID
ORDER BY
EventDate DESC
) um
This handles duplicates correctly and returns all units; those with no records in UnitMonitoring will have NULL values in the corresponding fields.
I chose to go with a Common Table Expression (CTE) to apply a ranking function (ROW_NUMBER):
;WITH NumberedMonitoring as (
SELECT RecordID,UnitID, EventDate, ...
ROW_NUMBER() over (PARTITION BY UnitID ORDER BY EventDate desc) rn
FROM UnitMonitoring
)
SELECT * FROM
UnitList ul
inner join
NumberedMonitoring nm
on
ul.UnitID = nm.UnitID and nm.rn = 1
But there are many different solutions (the above could also be done using a subselect).
Common Table Expressions (quoting from above link):
A common table expression (CTE) can be thought of as a temporary result set
That is, it lets you write a bit of the query first. In this case, I'm using it because I want to number the rows (using the ROW_NUMBER function). I'm telling it to restart the numbering for each UnitID (PARTITION BY UnitID), and within each unit ID, I want the rows numbered based on the EventDate descending (ORDER BY EventDate desc). This means that the row that receives row number 1 (within each UnitID partition) is the most recent row.
In the following select, I'm able to treat my CTE (NumberedMonitoring) as if it's any other table. So I'm just joining it to the UnitList table, and ensuring as part of the join conditions that I'm only selecting row number 1 (rn = 1)
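The numbering and the join can be tried out with a small sketch using Python's sqlite3 module as a stand-in for SQL Server. The sample rows are invented and only the three UnitMonitoring columns named in the question are used. The second query swaps the inner join for a LEFT JOIN, which is an addition of this sketch, to show how units without any monitoring records can be kept.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE UnitList (UnitID INTEGER)")
conn.execute(
    "CREATE TABLE UnitMonitoring (RecordID INTEGER, UnitID INTEGER, EventDate TEXT)"
)
conn.executemany("INSERT INTO UnitList VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO UnitMonitoring VALUES (?, ?, ?)",
    [
        (10, 1, "2021-01-01"),
        (11, 1, "2021-03-01"),  # latest record for unit 1
        (12, 2, "2021-02-01"),  # only record for unit 2; unit 3 has none
    ],
)

cte = """
    WITH NumberedMonitoring AS (
        SELECT RecordID, UnitID, EventDate,
               ROW_NUMBER() OVER (PARTITION BY UnitID
                                  ORDER BY EventDate DESC) AS rn
        FROM UnitMonitoring
    )
"""

# Inner join: only units that have at least one monitoring record.
latest_inner = conn.execute(cte + """
    SELECT ul.UnitID, nm.RecordID, nm.EventDate
    FROM UnitList ul
    INNER JOIN NumberedMonitoring nm
        ON ul.UnitID = nm.UnitID AND nm.rn = 1
    ORDER BY ul.UnitID
""").fetchall()

# Left join: every unit, with NULLs where no monitoring record exists.
latest_left = conn.execute(cte + """
    SELECT ul.UnitID, nm.RecordID, nm.EventDate
    FROM UnitList ul
    LEFT JOIN NumberedMonitoring nm
        ON ul.UnitID = nm.UnitID AND nm.rn = 1
    ORDER BY ul.UnitID
""").fetchall()

print(latest_inner)  # [(1, 11, '2021-03-01'), (2, 12, '2021-02-01')]
print(latest_left)   # same two rows plus (3, None, None)
```

Keeping rn = 1 in the join condition rather than in a WHERE clause is what lets the LEFT JOIN version still return unit 3.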
Try:
Select M.*
From UnitList L
Join UnitMonitoring M
On M.UnitId = L.UnitId
Where M.EventDate =
(Select Max(EventDate) From UnitMonitoring
Where UnitId = M.UnitId)
If there are multiple records with the same UnitId and EventDate, you can still use this technique, but you need to filter on a unique field. Here, assume the PK field in UnitMonitoring is named PkId:
Select M.*
From UnitList L
Join UnitMonitoring M
On M.UnitId = L.UnitId
Where M.PkId =
(Select Max(PkId) From UnitMonitoring iM
Where UnitId = M.UnitId
And EventDate =
(Select Max(EventDate) From UnitMonitoring
Where UnitId = M.UnitId))
SELECT a.UnitID, a.Name, MAX(b.EventDate)
FROM UnitInfo a
INNER JOIN UnitMonitoring b ON a.UnitID = b.UnitID
WHERE a.UnitID IN (SELECT UnitID FROM UnitList)
GROUP BY a.UnitID, a.Name