SQL newbie here, tearing my hair out trying to work this one out! I have a problem that is similar to this.
I have the following data, and all fields are defined as CHARACTER. There are no DATE or TIME columns, unfortunately, thanks to poor design by the original DBA.
Surname Name LoginDate LoginTime
Smith John 2014-06-25 13.00
Smith John 2014-06-24 14.00
Smith Susan 2014-06-26 09.00
Smith Susan 2014-06-26 11.30
Jones Bill 2014-06-25 09.30
Jones Bill 2014-06-25 12.30
Jones Bill 2014-06-26 07.00
What I'm trying to get in my output is the most recent login for each person, so I would expect to see:
Smith John 2014-06-25 13.00
Smith Susan 2014-06-26 11.30
Jones Bill 2014-06-26 07.00
I've tried different combinations of temporary tables, using CONCAT on the date and time, and the MAX function, but I just keep drawing a blank. I think I know the tools and commands I need to use; I just can't seem to string them all together properly.
I know I have to group them by name/surname and then somehow combine the date and time in a way that lets me use the MAX function, but when I output them I can never get LoginDate and LoginTime to appear as separate fields, because they're not included in any GROUP BY that I use.
Is anyone able to show me how to do this as I haven't got a lot of hair to start with :)
Try this Query -
With MaxTimeStamp as
(
SELECT Surname, Name, Max(TIMESTAMP(LoginDate, LoginTime)) as LoginDateTime
FROM YourTable
group by Surname, Name
)
select c.Surname, c.Name, d.LoginDate, d.LoginTime
from MaxTimeStamp c
Join YourTable d
on c.Surname = d.Surname
and c.Name = d.Name
and Date(c.LoginDateTime) = d.LoginDate
and Time(c.LoginDateTime) = d.LoginTime
You don't do this using group by. Here is a method using not exists:
select t.*
from table t
where not exists (select 1
from table t2
where t2.name = t.name and t2.surname = t.surname and
(t2.logindate > t.logindate or
t2.logindate = t.logindate and
t2.logintime > t.logintime
)
);
This transforms the query to: "Get me all rows from the table where there is no row with the same name and a later login." It would be simpler if the login information were stored as a single datetime.
Also, the above will work in just about all SQL databases. Many databases support window functions which can also be used for this problem.
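To try the NOT EXISTS idea quickly, here is a minimal sketch that runs it against the question's sample data. It uses Python's built-in sqlite3 module with an in-memory database purely so it can be run anywhere; the table name `logins` is made up for the illustration, and the columns are stored as text, as in the question.

```python
import sqlite3

# Sample rows from the question; every column is plain text.
ROWS = [
    ("Smith", "John", "2014-06-25", "13.00"),
    ("Smith", "John", "2014-06-24", "14.00"),
    ("Smith", "Susan", "2014-06-26", "09.00"),
    ("Smith", "Susan", "2014-06-26", "11.30"),
    ("Jones", "Bill", "2014-06-25", "09.30"),
    ("Jones", "Bill", "2014-06-25", "12.30"),
    ("Jones", "Bill", "2014-06-26", "07.00"),
]


def latest_logins_not_exists():
    """Return each person's latest login using NOT EXISTS."""
    conn = sqlite3.connect(":memory:")
    # "logins" is a hypothetical table name used only for this sketch.
    conn.execute(
        "CREATE TABLE logins (Surname TEXT, Name TEXT, LoginDate TEXT, LoginTime TEXT)"
    )
    conn.executemany("INSERT INTO logins VALUES (?, ?, ?, ?)", ROWS)
    query = """
        SELECT t.Surname, t.Name, t.LoginDate, t.LoginTime
        FROM logins t
        WHERE NOT EXISTS (
            SELECT 1
            FROM logins t2
            WHERE t2.Name = t.Name AND t2.Surname = t.Surname
              AND (t2.LoginDate > t.LoginDate
                   OR (t2.LoginDate = t.LoginDate AND t2.LoginTime > t.LoginTime))
        )
        ORDER BY t.Surname, t.Name
    """
    result = conn.execute(query).fetchall()
    conn.close()
    return result


for row in latest_logins_not_exists():
    print(row)
```

The text comparison only works here because the dates are in YYYY-MM-DD form and the times are zero-padded, so alphabetical order matches chronological order.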
In addition to Gordon Linoff's answer, if your SQL database supports window functions, you can also use this to get the desired result:
;With Cte As
(
Select *,
Row_Number() Over (Partition By Surname, Name Order By LoginDate Desc, LoginTime Desc) RN
From Table
)
Select Surname, Name, LoginDate, LoginTime
From Cte
Where RN = 1
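If you want to check the ROW_NUMBER() version without touching your real database, something like the following sketch should do. It assumes Python's built-in sqlite3 module with an SQLite build that has window functions (3.25 or later); the table name `logins` is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table; all columns are text, as in the question.
conn.execute(
    "CREATE TABLE logins (Surname TEXT, Name TEXT, LoginDate TEXT, LoginTime TEXT)"
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?, ?)",
    [
        ("Smith", "John", "2014-06-25", "13.00"),
        ("Smith", "John", "2014-06-24", "14.00"),
        ("Smith", "Susan", "2014-06-26", "09.00"),
        ("Smith", "Susan", "2014-06-26", "11.30"),
        ("Jones", "Bill", "2014-06-25", "09.30"),
        ("Jones", "Bill", "2014-06-25", "12.30"),
        ("Jones", "Bill", "2014-06-26", "07.00"),
    ],
)

# Number each person's logins from newest to oldest, then keep number 1.
ranked_query = """
    WITH ranked AS (
        SELECT Surname, Name, LoginDate, LoginTime,
               ROW_NUMBER() OVER (
                   PARTITION BY Surname, Name
                   ORDER BY LoginDate DESC, LoginTime DESC
               ) AS rn
        FROM logins
    )
    SELECT Surname, Name, LoginDate, LoginTime
    FROM ranked
    WHERE rn = 1
    ORDER BY Surname, Name
"""
latest_by_row_number = conn.execute(ranked_query).fetchall()
conn.close()

for row in latest_by_row_number:
    print(row)
```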
Why not a GROUP BY? Am I missing anything?
Select name, surname, LoginDate, max(LoginTime)
from table
where (name, surname, LoginDate) in (select name, surname, max(LoginDate) from table group by name, surname)
group by name, surname, LoginDate
Even if LoginDate is not a date or similar data type, it should still run, I guess...
Br.
I have split it into two parts, to find the newest login per user and do the group by. I then join to the data again to find the record.
select
dt.Surname
, dt.Name
, dt.LoginDate
, dt.LoginTime
from
dataTable dt
join
(select Surname, Name, MAX(LoginDate+LoginTime) ts from #temp group by surname, name) sub
on sub.Name = dt.Name
and sub.Surname = dt.Surname
and dt.LoginDate+dt.LoginTime = ts
WHERE
not sub.Name is null
You can also do it without the need to join, by splitting the timestamp out again.
select
sub.Surname
, sub.Name
, LEFT(ts,10) LoginDate
, RIGHT(ts,5) LoginTime
from
(select Surname, Name, MAX(LoginDate+LoginTime) ts from #temp group by surname, name) sub
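Here is a rough, runnable sketch of the "concatenate, take MAX, split again" idea. It is written for SQLite through Python's sqlite3 module, so string concatenation is `||` and the split uses `substr` instead of `+`, LEFT and RIGHT; the table name `logins` is made up. It relies on LoginDate always being 10 characters and LoginTime always being 5.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table standing in for #temp / dataTable.
conn.execute(
    "CREATE TABLE logins (Surname TEXT, Name TEXT, LoginDate TEXT, LoginTime TEXT)"
)
conn.executemany(
    "INSERT INTO logins VALUES (?, ?, ?, ?)",
    [
        ("Smith", "John", "2014-06-25", "13.00"),
        ("Smith", "John", "2014-06-24", "14.00"),
        ("Smith", "Susan", "2014-06-26", "09.00"),
        ("Smith", "Susan", "2014-06-26", "11.30"),
        ("Jones", "Bill", "2014-06-25", "09.30"),
        ("Jones", "Bill", "2014-06-25", "12.30"),
        ("Jones", "Bill", "2014-06-26", "07.00"),
    ],
)

# MAX over the concatenated text picks the latest date+time per person;
# substr then cuts the 15-character string back into its two parts.
split_query = """
    SELECT sub.Surname,
           sub.Name,
           substr(sub.ts, 1, 10) AS LoginDate,
           substr(sub.ts, 11, 5) AS LoginTime
    FROM (
        SELECT Surname, Name, MAX(LoginDate || LoginTime) AS ts
        FROM logins
        GROUP BY Surname, Name
    ) sub
    ORDER BY sub.Surname, sub.Name
"""
latest_by_split = conn.execute(split_query).fetchall()
conn.close()

for row in latest_by_split:
    print(row)
```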
Related
I have a lot of columns in my select statement, many of which are derived calculations.
I am trying to group multiple rows into one using listagg() in select statement, but without having to group by rest of columns in select statement. Along the lines of listagg() within group() over (partition by id).
Right now I have something along the lines of:
select id, listagg(distinct annual_bill_rate, ', ') within group (order by bill_rate) as annual_bill_rate, email, state
from table
group by 1,3,4
It doesn't seem to be possible to avoid this group by based on the documentation, but are there alternatives? I have 30+ columns, and I can't group by all of them. Thank you!
Sample data:
id  bill_rate  email        state
1   0.0035     a@gmail.com  NJ
1   0.0045     a@gmail.com  NJ
1   0.0055     a@gmail.com  NJ
2   0.0065     b@gmail.com  NY
2   0.0075     b@gmail.com  NY
3   0.0085     c@gmail.com  PA
Desired result- WITHOUT GROUP BY:
id  bill_rate               email        state
1   0.0035, 0.0045, 0.0055  a@gmail.com  NJ
2   0.0065, 0.0075          b@gmail.com  NY
3   0.0085                  c@gmail.com  PA
Here's a not so great idea to avoid typing the GROUP BY. It will almost definitely be slower and it's much more difficult to read and understand. I would be an unhappy fella if I ran into this in production code:
WITH table_distinct AS
(
SELECT DISTINCT id, email, state
FROM table
)
,table_group_by AS
(
SELECT id, listagg(distinct annual_bill_rate, ', ') within group (order by bill_rate) as annual_bill_rate
FROM table
GROUP BY id
)
SELECT
td.*,
tgb.annual_bill_rate
FROM table_distinct td
INNER JOIN table_group_by tgb
ON td.id = tgb.id;
Now you really only need to monkey with that table_distinct CTE to add more columns to your result set.
There is a solution to your problem that uses neither a DISTINCT nor a GROUP BY clause.
You can use LISTAGG as an analytic function as well, and then remove the duplicates using row_number. Please see below:
select * from
(select
id, listagg(annual_bill_rate, ', ') within group (order by bill_rate) over (partition by id) as annual_bill_rate,
email, state, row_number() over (partition by id order by id) RN
from table) Tab where RN=1;
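For anyone who wants to see the "aggregate as a window function, then keep one row per id" pattern run end to end, here is a small sketch. SQLite has no LISTAGG, so this swaps in `group_concat` used as a window function (needs SQLite 3.25+, via Python's sqlite3); the table name `bills` is invented for the demo. Unlike LISTAGG ... WITHIN GROUP, `group_concat` here does not promise any particular order of the concatenated values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table built from the question's sample data.
conn.execute(
    "CREATE TABLE bills (id INTEGER, bill_rate REAL, email TEXT, state TEXT)"
)
conn.executemany(
    "INSERT INTO bills VALUES (?, ?, ?, ?)",
    [
        (1, 0.0035, "a@gmail.com", "NJ"),
        (1, 0.0045, "a@gmail.com", "NJ"),
        (1, 0.0055, "a@gmail.com", "NJ"),
        (2, 0.0065, "b@gmail.com", "NY"),
        (2, 0.0075, "b@gmail.com", "NY"),
        (3, 0.0085, "c@gmail.com", "PA"),
    ],
)

# Every row gets the full concatenated list for its id, plus a row number;
# keeping only rn = 1 leaves one row per id without any GROUP BY.
window_query = """
    SELECT id, rates, email, state
    FROM (
        SELECT id,
               group_concat(bill_rate, ', ') OVER (PARTITION BY id) AS rates,
               email,
               state,
               ROW_NUMBER() OVER (PARTITION BY id ORDER BY id) AS rn
        FROM bills
    ) tab
    WHERE rn = 1
    ORDER BY id
"""
one_row_per_id = conn.execute(window_query).fetchall()
conn.close()

for row in one_row_per_id:
    print(row)
```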
I have a question which looks easy but I can't figure it out.
I have the following:
Name Zipcode
ER 5354
OL 1234
AS 1234
BH 3453
BH 3453
HZ 1234
I want to find those rows where the ID does not define clearly one row.
So here I want to see:
OL 1234
AS 1234
HZ 1234
Or simply the zipcode would be enough.
I am sorry, I forgot to mention an important part. If the name is the same it's not a problem; it only matters if there are different names for the same zipcode.
So this means BH 3453 should not be returned.
I think this is what you want
select zipcode
from yourTable
group by zipcode
having count(*) > 1
It selects the zipcodes associated with more than one record.
To answer your updated question:
select zipcode
from
(
select name, zipcode
from yourTable
group by name, zipcode
) t
group by zipcode
having count(*) > 1
should do it. It might not be optimal in terms of performance, in which case you could use window functions as suggested by @a1ex07.
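As a side note, the same "more than one different name per zipcode" check can also be written with COUNT(DISTINCT ...) instead of the nested GROUP BY. This is a different formulation from the query above, shown here as a sketch on the question's data using Python's sqlite3 module; the table name `places` is made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table holding the question's sample rows.
conn.execute("CREATE TABLE places (name TEXT, zipcode TEXT)")
conn.executemany(
    "INSERT INTO places VALUES (?, ?)",
    [
        ("ER", "5354"),
        ("OL", "1234"),
        ("AS", "1234"),
        ("BH", "3453"),
        ("BH", "3453"),
        ("HZ", "1234"),
    ],
)

# A zipcode is ambiguous only if it maps to more than one distinct name,
# so the repeated BH / 3453 pair does not count.
distinct_query = """
    SELECT zipcode
    FROM places
    GROUP BY zipcode
    HAVING COUNT(DISTINCT name) > 1
    ORDER BY zipcode
"""
ambiguous_zipcodes = [row[0] for row in conn.execute(distinct_query)]
conn.close()

print(ambiguous_zipcodes)
```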
Try this:
select yt.*
from YOUR_TABLE yt
, (select zipcode
from YOUR_TABLE
group by zipcode
having count(*) > 1
) m
where yt.zipcode = m.zipcode
If you need just the zipcode, use vc 74's solution. For all columns, a solution based on window functions supposedly outperforms the self-join approach:
SELECT a.zipcode, a.name
FROM
(
SELECT zipcode, name, count(1) over(partition by zipcode) as cnt
FROM your_table
)a
WHERE a.cnt >1
I have been hacking at this for a while, but cannot seem to get it to work. I think some other SQL Server function or criteria that might be beyond me is needed to get it to work.
I have this sample data set:
Test1@gmail.com FirstName LastName
Test1@gmail.com DiffFirstName DiffLastName
MyOtherEmail@gmail.com Jane Doe
MyOtherEmail@gmail.com John Doe
MyOtherEmail@gmail.com Jack Doe
What I need is that data returned so that, if the email is duplicated, we take only the first row and discard the others, as we do not need them. So this would be the returned set:
Test1@gmail.com FirstName LastName
MyOtherEmail@gmail.com Jane Doe
I was trying group by, Over, Partition By, and Temp Tables, but I just could not seem to get around having all the names returned.
Any help would be greatly appreciated!
Thank you.
Dennis
Here you go. You don't even need to modify your table structure (although you still may want to).
SELECT Email, FirstName, LastName FROM (
SELECT Email, FirstName, LastName,
ROW_NUMBER() OVER(PARTITION BY Email ORDER BY LastName) AS RowNum
FROM Table1
) a
WHERE a.RowNum = 1;
The ORDER BY inside the OVER clause determines which record floats to the top of each partition. I used LastName to sort by. Change it to whatever you want.
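To see how the ORDER BY decides the winner, here is a small sketch on the question's data, run through Python's sqlite3 module (SQLite 3.25+ for window functions). The table name `Table1` comes from the query above; sorting by LastName and then FirstName is my own choice to break ties, since all three "Doe" rows share a last name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Table1 (Email TEXT, FirstName TEXT, LastName TEXT)")
conn.executemany(
    "INSERT INTO Table1 VALUES (?, ?, ?)",
    [
        ("Test1@gmail.com", "FirstName", "LastName"),
        ("Test1@gmail.com", "DiffFirstName", "DiffLastName"),
        ("MyOtherEmail@gmail.com", "Jane", "Doe"),
        ("MyOtherEmail@gmail.com", "John", "Doe"),
        ("MyOtherEmail@gmail.com", "Jack", "Doe"),
    ],
)

# One row per email: the row that sorts first by LastName, then FirstName.
dedup_query = """
    SELECT Email, FirstName, LastName
    FROM (
        SELECT Email, FirstName, LastName,
               ROW_NUMBER() OVER (
                   PARTITION BY Email
                   ORDER BY LastName, FirstName
               ) AS RowNum
        FROM Table1
    ) a
    WHERE a.RowNum = 1
    ORDER BY Email
"""
one_per_email = conn.execute(dedup_query).fetchall()
conn.close()

for row in one_per_email:
    print(row)
```

With this ordering the surviving rows are the alphabetically first ones, not the rows that were entered first. To keep the first-entered row you need a column that records insertion order to sort by.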
Use ROW_NUMBER() with PARTITION BY and ORDER BY. Modify the ORDER BY clause to suit your needs.
SQLFiddle
WITH contacts as (
SELECT ROW_NUMBER() OVER(PARTITION BY email ORDER BY first_name) AS row,
email, first_name, last_name
FROM contact
)
SELECT * FROM contacts where row = 1;
Similar to Ellesedil's answer except using a CTE instead of subquery. Note differences mentioned in the answer here.
Add an int ID to your table.
Make it the PK of your table.
Any table needs a PK anyway and you
don't have a suitable column for PK.
Then do something like this.
select t1.* from
TableName t1
inner join
(
select t0.email, min(t0.id) as id
from TableName t0
group by t0.email
) t2 on t1.id = t2.id
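A quick sketch of how that plays out, assuming an auto-assigned integer key (here SQLite's INTEGER PRIMARY KEY through Python's sqlite3 module) and the column names `email`, `first_name` and `last_name`, which are my guesses at a schema:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# INTEGER PRIMARY KEY is filled in automatically, in insertion order.
conn.execute(
    "CREATE TABLE TableName ("
    "id INTEGER PRIMARY KEY, email TEXT, first_name TEXT, last_name TEXT)"
)
conn.executemany(
    "INSERT INTO TableName (email, first_name, last_name) VALUES (?, ?, ?)",
    [
        ("Test1@gmail.com", "FirstName", "LastName"),
        ("Test1@gmail.com", "DiffFirstName", "DiffLastName"),
        ("MyOtherEmail@gmail.com", "Jane", "Doe"),
        ("MyOtherEmail@gmail.com", "John", "Doe"),
        ("MyOtherEmail@gmail.com", "Jack", "Doe"),
    ],
)

# The smallest id per email marks the first row that was inserted for it.
first_row_query = """
    SELECT t1.email, t1.first_name, t1.last_name
    FROM TableName t1
    INNER JOIN (
        SELECT t0.email, MIN(t0.id) AS id
        FROM TableName t0
        GROUP BY t0.email
    ) t2 ON t1.id = t2.id
    ORDER BY t1.id
"""
first_rows = conn.execute(first_row_query).fetchall()
conn.close()

for row in first_rows:
    print(row)
```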
I think the DISTINCT keyword might be what you are looking for.
I need to remove duplicate fields from a temp table where the fields in question are not exactly identical.
For example, I have the following data:
First Last DOB
John Johnson 10.01.02
Steve Stephens 23.03.02
John Johnson 2.02.99
Dave Davies 3.03.03
Here, there are two John Johnsons. I only want to have one John Johnson, and I don't care which one. So the resulting table will look something like:
First Last DOB
John Johnson 10.01.02
Steve Stephens 23.03.02
Dave Davies 3.03.03
I'm using TSQL, but I would prefer to use SQL that is non-proprietary.
Thanks
SQL Server supports common table expressions and window functions. ROW_NUMBER() assigns a sequential number to each row within its group, so you can filter out the records whose number is greater than one (these are the duplicates).
WITH records
AS
(
SELECT [First], [Last], DOB,
ROW_NUMBER() OVER (PARTITION BY [First], [Last] ORDER BY DOB) rn
FROM TableName
)
DELETE FROM records WHERE rn > 1
SQLFiddle Demo
TSQL Ranking Functions
You can use a CTE with ROW_NUMBER:
WITH CTE AS
(
SELECT RN = ROW_NUMBER() OVER (PARTITION BY First, Last ORDER BY First, Last)
FROM TempTable
)
DELETE CTE
WHERE RN > 1;
DEMO
Well, I'm late to the party, but here is a database agnostic solution:
SELECT A.*
FROM YourTable A
INNER JOIN (SELECT [First], [Last], MAX(DOB) MaxDob
FROM YourTable
GROUP BY [First], [Last]) B
ON A.[First] = B.[First]
AND A.[Last] = B.[Last]
AND A.DOB = B.MaxDob
And here is a sqlfiddle with a demo for it. (Thanks @JW for the schema of the fiddle)
You can use CTE with ROW_NUMBER() to accomplish this:
WITH CTE
AS
(
SELECT
First,
Last,
DOB,
ROW_NUMBER() OVER (PARTITION BY First, Last ORDER BY DOB) RN
FROM
Table1
)
DELETE FROM CTE WHERE RN > 1
SQL FIDDLE DEMO
I am validating a table that holds transaction-level data from an eCommerce site, and I need to find the exact errors.
I would like your help finding duplicate records in a 50-column table on SQL Server.
Suppose my data is:
OrderNo  shoppername  amountpayed  city  Item
1        Sam          10           A     Iphone
1        Sam          10           A     Iphone      <-- duplicate to be detected
1        Sam          5            A     Ipod
2        John         20           B     Macbook
3        John         25           B     Macbookair
4        Jack         5            A     Ipod
Suppose I use the below query:
Select shoppername,count(*) as cnt
from dbo.sales
group by shoppername
having count(*) > 1
will return me
Sam 2
John 2
But I don't want to find duplicate just over 1 or 2 columns. I want to find the duplicate over all the columns together in my data. I want the result as:
1 Sam 10 A Iphone
with x as (select *,rn = row_number()
over(PARTITION BY OrderNo,item order by OrderNo)
from #temp1)
select * from x
where rn > 1
You can remove the duplicates by replacing the select statement with:
delete x where rn > 1
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as cnt
FROM dbo.sales
GROUP BY OrderNo, shoppername, amountPayed, city, item
HAVING COUNT(*) > 1
SQL> SELECT JOB,COUNT(JOB) FROM EMP GROUP BY JOB;
JOB COUNT(JOB)
--------- ----------
ANALYST 2
CLERK 4
MANAGER 3
PRESIDENT 1
SALESMAN 4
Just add all fields to the query and remember to add them to Group By as well.
Select shoppername, a, b, amountpayed, item, count(*) as cnt
from dbo.sales
group by shoppername, a, b, amountpayed, item
having count(*) > 1
To get the list of duplicated records, use the following command:
select field1,field2,field3, count(*)
from table_name
group by field1,field2,field3
having count(*) > 1
Try this instead
SELECT MAX(shoppername), COUNT(*) AS cnt
FROM dbo.sales
GROUP BY CHECKSUM(*)
HAVING COUNT(*) > 1
Read about the CHECKSUM function first, as different rows can produce the same checksum, so this can report false duplicates.
Try this
with T1 AS
(
SELECT LASTNAME, COUNT(1) AS 'COUNT' FROM Employees GROUP BY LastName HAVING COUNT(1) > 1
)
SELECT E.*,T1.[COUNT] FROM Employees E INNER JOIN T1 ON T1.LastName = E.LastName
with x as (
select shoppername, count(shoppername) as cnt
from sales
group by shoppername
having count(shoppername) > 1)
select t.* from x,win_gp_pin1510 t
where x.shoppername=t.shoppername
order by t.shoppername
First of all, I doubt that the result is accurate: it seems there are three 'Sam' rows in the original table. But that is not critical to the question.
Now for the question itself. Based on your table, the best way to show duplicate values is to use count(*) with a GROUP BY clause. The query would look like this:
SELECT OrderNo, shoppername, amountPayed, city, item, count(*) as RepeatTimes FROM dbo.sales GROUP BY OrderNo, shoppername, amountPayed, city, item HAVING COUNT(*) > 1
The reason is that all columns together uniquely identify each record in your table, so two records count as duplicates only when the values in every column are exactly the same. Because you also want to show all fields of the duplicate records, the GROUP BY must list every column; you can only select columns that take part in the GROUP BY clause (or aggregates of the others).
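Here is a small runnable sketch of that GROUP BY over every column, using the question's six sample rows. It uses Python's sqlite3 module with an in-memory table called `sales` (no `dbo.` schema in SQLite), so treat it as an illustration rather than SQL Server code.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "OrderNo INTEGER, shoppername TEXT, amountpayed INTEGER, city TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sam", 10, "A", "Iphone"),
        (1, "Sam", 10, "A", "Iphone"),  # exact duplicate of the row above
        (1, "Sam", 5, "A", "Ipod"),
        (2, "John", 20, "B", "Macbook"),
        (3, "John", 25, "B", "Macbookair"),
        (4, "Jack", 5, "A", "Ipod"),
    ],
)

# Grouping by every column means a group has more than one row
# only when all values match exactly.
duplicates_query = """
    SELECT OrderNo, shoppername, amountpayed, city, item, COUNT(*) AS RepeatTimes
    FROM sales
    GROUP BY OrderNo, shoppername, amountpayed, city, item
    HAVING COUNT(*) > 1
"""
full_row_duplicates = conn.execute(duplicates_query).fetchall()
conn.close()

print(full_row_duplicates)
```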
Now I would like to give you an example of WITH...ROW_NUMBER() OVER(...), which uses a common table expression together with the ROW_NUMBER function.
Suppose you have nearly the same table, but with one extra column called Shipping Date, whose value may differ even when the rest are the same. Here it is:
OrderNo shoppername amountpayed city Item Shipping Date
1 Sam 10 A Iphone 2016-01-01
1 Sam 10 A Iphone 2016-02-02
1 Sam 5 A Ipod 2016-03-03
2 John 20 B Macbook 2016-04-04
3 John 25 B Macbookair 2016-05-05
4 Jack 5 A Ipod 2016-06-06
Notice that row 2 is not a duplicate if you still take all columns as a unit. But what if you want to treat such rows as duplicates as well? You should use WITH...ROW_NUMBER() OVER(...), and the query would look like this:
WITH TABLEEXPRESSION
AS
(SELECT *, ROW_NUMBER() OVER (PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]) AS Identifier --if you consider the one with the later shipping date as the duplicate
FROM dbo.sales)
SELECT * FROM TABLEEXPRESSION
WHERE Identifier !=1 --or use '>1'
The above query will give result together with Shipping Date, for example:
OrderNo shoppername amountpayed city Item Shipping Date Identifier
1 Sam 10 A Iphone 2016-02-02 2
Note that this row is different from the one with 2016-01-01. The 2016-02-02 row is the one flagged because of PARTITION BY OrderNo, shoppername, amountPayed, city, item ORDER BY [Shipping Date]: Shipping Date is not one of the columns that define a duplicate, so within each partition the rows are numbered by shipping date and the later one gets Identifier 2. Either row would still be a valid answer to your question.
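The Shipping Date scenario can be reproduced with a sketch like the one below. It runs the same idea through Python's sqlite3 module (SQLite 3.25+ for window functions) on a table named `sales`, with the column name quoted as "Shipping Date"; the rows are the ones from the table above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    'CREATE TABLE sales (OrderNo INTEGER, shoppername TEXT, amountpayed INTEGER, '
    'city TEXT, item TEXT, "Shipping Date" TEXT)'
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?, ?)",
    [
        (1, "Sam", 10, "A", "Iphone", "2016-01-01"),
        (1, "Sam", 10, "A", "Iphone", "2016-02-02"),
        (1, "Sam", 5, "A", "Ipod", "2016-03-03"),
        (2, "John", 20, "B", "Macbook", "2016-04-04"),
        (3, "John", 25, "B", "Macbookair", "2016-05-05"),
        (4, "Jack", 5, "A", "Ipod", "2016-06-06"),
    ],
)

# Rows that agree on everything except the shipping date share a partition;
# the earliest date gets 1 and any later ones are reported as duplicates.
numbered_query = """
    WITH numbered AS (
        SELECT *,
               ROW_NUMBER() OVER (
                   PARTITION BY OrderNo, shoppername, amountpayed, city, item
                   ORDER BY "Shipping Date"
               ) AS Identifier
        FROM sales
    )
    SELECT * FROM numbered WHERE Identifier > 1
"""
later_duplicates = conn.execute(numbered_query).fetchall()
conn.close()

print(later_duplicates)
```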
To summarize: using count(*) together with a GROUP BY clause is the best choice when you only want to show the columns from the GROUP BY clause in the result; otherwise you will miss the columns that do not participate in the GROUP BY.
WITH...ROW_NUMBER() OVER(...), on the other hand, works in every scenario where you want to find duplicate records, but the query is a little more complicated to write and somewhat over-engineered compared with the former.
If your purpose is to delete duplicate records from the table, you have to use the latter, WITH...ROW_NUMBER() OVER(...)...DELETE FROM...WHERE.
Hope this helps!
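On the deletion point: deleting through a CTE as described above is SQL Server syntax. If you want to experiment elsewhere, the sketch below shows a different technique that gives the same effect in SQLite (via Python's sqlite3 module): keep the row with the smallest implicit `rowid` in each group and delete the rest. The `sales` table and its data are the question's sample, and `rowid` is specific to SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales ("
    "OrderNo INTEGER, shoppername TEXT, amountpayed INTEGER, city TEXT, item TEXT)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Sam", 10, "A", "Iphone"),
        (1, "Sam", 10, "A", "Iphone"),  # exact duplicate, should be removed
        (1, "Sam", 5, "A", "Ipod"),
        (2, "John", 20, "B", "Macbook"),
        (3, "John", 25, "B", "Macbookair"),
        (4, "Jack", 5, "A", "Ipod"),
    ],
)

# Keep the lowest rowid of every fully identical group, delete the others.
cursor = conn.execute(
    """
    DELETE FROM sales
    WHERE rowid NOT IN (
        SELECT MIN(rowid)
        FROM sales
        GROUP BY OrderNo, shoppername, amountpayed, city, item
    )
    """
)
deleted_count = cursor.rowcount
remaining_rows = conn.execute(
    "SELECT OrderNo, shoppername, amountpayed, city, item FROM sales ORDER BY rowid"
).fetchall()
conn.close()

print(deleted_count, remaining_rows)
```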
You can use below methods to find the output
with Ctec AS
(
select *,Row_number() over(partition by name order by Name)Rnk
from Table_A
)
select Name from ctec
where rnk>1
select name from Table_A
group by name
having count(*)>1
Select shoppername, count(Item) as cnt
from dbo.sales
group by shoppername
having count(Item) > 1
Select EventID,count(*) as cnt
from dbo.EventInstances
group by EventID
having count(*) > 1
The following is working code:
SELECT abnno, COUNT(abnno)
FROM tbl_Name
GROUP BY abnno
HAVING ( COUNT(abnno) > 1 )