I have a table with duplicate member names but these duplicates also have more than one date and a specific ID. I want the row of the member name with the most recent date (because a member could have been called more than one time a day) and biggest CallID number.
MemberID FirstName LastName CallDate CallID
0123 Carl Jones 2019-03-01 123456
0123 Carl Jones 2020-10-12 215789
0123 Carl Jones 2020-10-12 312546
2045 Sarah Marty 2021-05-09 387945
2045 Sarah Marty 2021-08-11 398712
4025 Jane Smith 2021-10-18 754662
4025 Jane Smith 2021-11-03 761063
8282 Suzy Aaron 2019-12-12 443355
8282 Suzy Aaron 2019-12-12 443386
So the desired output from this table would be
MemberID FirstName LastName CallDate CallID
0123 Carl Jones 2020-10-12 312546
2045 Sarah Marty 2021-08-11 398712
4025 Jane Smith 2021-11-03 761063
8282 Suzy Aaron 2019-12-12 443386
The query I've tried is
SELECT DISTINCT MemberID, FirstName, LastName, MAX(CallDate) as CallDate, MAX(CallID) as CallID
FROM dbo.table
GROUP BY MemberID, FirstName, LastName, CallDate, CallID
ORDER BY LastName asc;
But I'm still getting duplicate names with all their calldates and CallID
try removing CallDate, CallID from the group by clause.
So :
SELECT MemberID, FirstName, LastName, MAX(CallDate) as CallDate, MAX(CallID) as CallID
FROM dbo.table
GROUP BY MemberID, FirstName, LastName
ORDER BY LastName asc;
Hopefully that should do it.
you can use window function:
select * from (
select * , row_number() over (partition by MemberID order by CallID desc) rn
from tablename
) t where rn = 1
Related
I have a table that contains a series of related records (batches). Each batch has a unique id and can contain customer payments. I want to find if a batch is duplicate even if it is submitted on different days.
A batch can have 1 or more records. Here is sample data set:
BatchId InputAmount CustomerName BatchDate
------- ----------- ------------ ----------
182944 $475.00 Barry Smith 16-Mar-2019
182944 $260.00 John Smith 16-Mar-2019
182944 $265.00 Jane Smith 16-Mar-2019
182944 $400.00 Sara Smith 16-Mar-2019
182944 $175.00 Andy Smith 16-Mar-2019
182945 $475.00 Barry Smith 16-Mar-2019
182945 $260.00 John Smith 16-Mar-2019
182945 $265.00 Jane Smith 16-Mar-2019
182945 $400.00 Sara Smith 16-Mar-2019
182945 $175.00 Andy Smith 16-Mar-2019
183194 $100.00 Paul Green 21-Mar-2019
183195 $100.00 Nancy Green 21-Mar-2019
183197 $150.00 John Brown 20-Mar-2019
183197 $210.00 Sarah Brown 20-Mar-2019
183198 $150.00 John Brown 21-Mar-2019
183198 $210.00 Sarah Brown 21-Mar-2019
183200 $125.00 John Doe 20-Mar-2019
183200 $110.00 Sarah Doe 20-Mar-2019
183202 $125.00 John Doe 21-Mar-2019
183202 $110.00 Sarah Doe 21-Mar-2019
183202 $115.00 Paul Rudd 21-Mar-2019
Batches (182944, 182945) and (183197,183198) are duplicate while the other batches are not.
I thought maybe I could create a summary table with counts and sums and get close but I'm having trouble finding the true duplicates by including the names as well.
DECLARE #Summaries TABLE(
BatchId INT,
BatchDate DATETIME,
BatchCount INT,
BatchAmount MONEY)
-- Summarize the Data so we can look for duplicates
INSERT INTO #Summaries
SELECT a.BatchId, a.BatchDate, COUNT(*) AS RecordCount, SUM(a.InputAmount) AS BatchAmount
FROM Batches a
WHERE a.BatchDate BETWEEN '20190316' and '20190321'
GROUP BY a.BatchId, a.BatchDate
ORDER BY a.BatchId DESC
-- find the potential duplicate batches based on the Counts and Sums
SELECT A.* FROM #Summaries A
INNER JOIN (SELECT BatchCount, BatchAmount, BatchDate FROM #Summaries
GROUP BY BatchCount, BatchAmount, BatchDate
HAVING COUNT(*) > 1) B
ON A.BatchCount = B.BatchCount
AND A.BatchAmount = B.BatchAmount
WHERE DATEDIFF(DAY, a.BatchDate, b.BatchDate) BETWEEN -1 AND 1
Thank you for the help. I'm using a SQL Server 2012 database.
you can try like below
with cte as
(select BatchId from table_name
group by BatchId
having count(*)>1
) select * from table_name a where a.BatchId in (select BatchId from cte)
I have a table as follows:
Name Customer Date Amount
Joe Aaron 2012-01-03 14:12:00.0 150
Joe Aaron 2012-02-03 14:12:00.0 150
Joe Danny 2012-03-03 14:12:00.0 150
Joe Karen 2012-07-03 14:12:00.0 150
Ronald Blake 2012-05-03 14:12:00.0 1501
I would like to query to retrieve data by specifying the Name and if there are duplicates for Customer column, the records for the latest Date is
For example, if I want to query Joe, I will get the following result:
Name Customer Date Amount
Joe Aaron 2012-02-03 14:12:00.0 150
Joe Danny 2012-03-03 14:12:00.0 150
Joe Karen 2012-07-03 14:12:00.0 150
How should I do this? Tried distinct but it doesnt work that way.
EDIT
I'm using SQL Server. Sorry I re-edit my question and this should be the correct question that I am asking.
Did you Try this?
SELECT Name, Customer, MAX(Date) as CurrentDate, Amount
FROM data
Group By Name, Customer, Amount
HAVING Name = 'Joe'
Another option is using the WITH TIES clause in concert with Row_Number()
Example
Select top 1 with ties *
From YourTable
Where Name='Joe'
Order By Row_Number() over (Partition By Customer Order By Date Desc)
Returns
Name Customer Date Amount
Joe Aaron 2012-02-03 14:12:00.000 150
Joe Danny 2012-03-03 14:12:00.000 150
Joe Karen 2012-07-03 14:12:00.000 150
One option:
SELECT * FROM Table WHERE (NAME, CUSTOMER, DATE) IN (SELECT NAME, CUSTOMER, MAX(DATE) WHERE NAME = 'Joe' GROUP BY NAME, CUSTOMER)
I have the following records. It is broken based on username, date and testscore.
Username date testscore
mike 2016-11-30 23:41:10.143 1
mike 2016-11-27 23:41:11.143 12
mike 2016-11-24 23:41:11.143 16
john 2016-11-28 23:41:11.143 7
john 2016-11-25 23:42:11.143 12
john 2016-11-25 23:42:11.143 7
mike 2016-10-30 23:41:10.143 1
mike 2016-10-27 23:41:11.143 5
mike 2016-10-24 23:41:11.143 16
john 2016-10-28 23:41:11.143 12
john 2016-10-25 23:42:11.143 8
john 2016-10-24 23:42:11.143 2
For each one of the users I like to get the latest test score (month wise) for the year broken down by month with their score. In other words, I like to get the last score per user per month for a given year.
so for the above it would be
username date testscore
mike 2016-11-30 23:41:10.143 1
john 2016-11-28 23:41:11.143 7
mike 2016-10-30 23:41:10.143 1
john 2016-10-28 23:41:11.143 12
Perhaps using the WITH TIES clause in concert with Row_Number()
Select top 1 with ties *
from YourTable
Order by Row_Number() over (partition by UserName,year(date),month(date) order by date desc)
Returns
Username date testscore
john 2016-10-28 23:41:11.143 12
john 2016-11-28 23:41:11.143 7
mike 2016-10-30 23:41:10.143 1
mike 2016-11-30 23:41:10.143 1
You can use ROW_NUMBER():
WITH CTE AS
(
SELECT *,
RN = ROW_NUMBER() OVER(PARTITION BY username, CONVERT(VARCHAR(6),[date],112)
ORDER BY [date] DESC)
FROM dbo.YourTable
)
SELECT *
FROM CTE
WHERE RN = 1;
What about this :
select top 1 from <mytable> group by date.year(),date.month(),username order by date;
It seems you need a primary key, but you can still get the job done.
Essentially, you want the row for each username that corresponds to the latest date for that user.
SELECT username, date, testscore
FROM MyTable m1
WHERE m1.date = (SELECT MAX(m2.DATE))
FROM MyTable m2
WHERE m2.username = m1.username)
I'm trying to get the SUM(Values) for each Acct, but my issue is trying to get at least one entire row for a DISTINCT Acct with the SUM(Values).
I have some sample data for example:
Acct Values Name Street
123456789 100.20 John 66 Main Street
123456789 200.80 John 22 Main Avenue
222222222 50.25 Jane 1 Blvd
333333333 25.00 Joe 55 Test Ave
333333333 50.00 Joe 8 Douglas Road
555555555 75.00 Tim 12 Clark Ave
666666666 500.00 Tim 12 Clark Street
666666666 500.00 Tim 3 Main Rd.
My query consisted of:
SELECT DISTINCT Acct, SUM(Value) AS [TOTAL]
FROM TABLE_NAME
GROUP BY Acct
The above query gets me close to what I need, but I need the entire row.
Example below of what I am looking for:
Acct Total Name Addr1
123456789 301.00 John 66 Main Street
222222222 50.25 Jane 1 Blvd
333333333 75.00 Joe 55 Test Ave
555555555 75.00 Tim 12 Clark Ave
666666666 1000.00 Tim 12 Clark Street
Thanks.
If it does not matter what address you return, then you can apply and aggregate to the other columns:
SELECT Acct,
SUM(Value) AS [TOTAL],
max(name) name,
max(Street) addr1
FROM TABLE_NAME
GROUP BY Acct;
See SQL Fiddle with Demo
You can do this using window functions such as row_number() in most databases:
select acct, total, name, addr1
from (select t.*, row_number() over (partition by acct order by acct) as seqnum,
sum(value) over (partition by acct) as Total
from table_name
) t
where seqnum = 1;
I would use Windowing Functions (the OVER clause) to solve this.
SELECT DISTINCT
Acct
,SUM([Values]) OVER (PARTITION BY Acct) AS 'Total'
,Name
,FIRST_VALUE(Street) OVER (PARTITION BY Acct ORDER BY Street DESC) AS 'Addr1'
FROM TABLE_NAME
;
The nice thing about Windowing Functions is that you do not add things to a grouping that you do not need in your functions (e.g. SUM), instead you can focus on describing what you are looking for.
In the SQL above, we are saying we want the SUM of Values grouped by (or PARTITION BY as it is called in the OVER clause) Acct. The FIRST_VALUE allows use to return the first value of the street address. The same did not have a DATETIME column so it is hard to say what the order should be for the first value. There is also a LAST_VALUE windowing function. Assuming you do have a DATETIME column you would want to ORDER BY that column value, if not you can just pick some value like I did with Street (MAX might also be a good option then too, but having some type of DATETIME value would be the best way to do it).
Check out this SQL Fiddle: http://sqlfiddle.com/#!6/a474c/8
Here is the BOL about SUM using the OVER clause: http://msdn.microsoft.com/en-us/library/ms187810.aspx
Here is more info on FIRST_VALUE: http://blog.sqlauthority.com/2011/11/09/sql-server-introduction-to-first-_value-and-last_value-analytic-functions-introduced-in-sql-server-2012/
Here is a blog post I've done on the Windowing Functions: http://comp-phil.blogspot.com/2013/03/higher-order-functions.html
For each distinct Name, I want to select the first three rows with the earliest time_stamp (or smallest number in UNIXTIME). What is the correct query?
Start Table:
Name Log-in Time
-------- -----------------
Don 05:30:00
Don 05:35:32
Don 07:12:43
Don 09:52:23
Don 05:32:43
James 03:30:00
James 03:54:23
James 09:51:54
James 14:43:34
James 43:22:11
James 59:43:33
James 20:12:11
Mindy 05:32:22
Mindy 15:14:44
Caroline 10:02:22
Rebecca 20:43:32
End Table:
Name Log-in Time
-------- -----------------
Don 05:30:00
Don 05:35:32
Don 07:12:43
James 03:30:00
James 03:54:23
James 09:51:54
Mindy 05:32:22
Mindy 15:14:44
Caroline 10:02:22
Rebecca 20:43:32
WITH Table (Name, LoginTime, Row) AS
(
SELECT
Name,
LoginTime,
ROW_NUMBER() OVER (PARTITION BY Name ORDER BY LoginTime)
FROM SomeTable
)
SELECT
Name,
LoginTime
FROM Table
WHERE
Row <= 3
An ansi standard approach actually looks to work with the following:
http://www.sqlfiddle.com/#!2/b814d/15
SELECT
NAME
, LOGIN
FROM (
SELECT
test_first.NAME,
test_first.LOGIN,
COUNT(*) CNT
FROM
TABLE_NAME test_first
LEFT OUTER JOIN
TABLE_NAME test_second
ON (test_first.NAME = test_second.NAME)
WHERE
test_first.LOGIN <= test_second.LOGIN
GROUP BY
test_first.NAME, test_first.LOGIN) test_order
WHERE
test_order.CNT <= 3
ORDER BY
NAME ASC, LOGIN ASC