SQL Server 2005 query: remove duplicates via date

I searched and searched and can't seem to figure out this issue:
We have three tables which have data I need to collect and show in a view.
SELECT
C.FirstName, C.LastName,
aspnet_Membership.LoweredEmail,
MAX(Bill.Code) AS BCodes,
MAX(Bill.BillDate)
FROM
dbo.Client C
INNER JOIN
dbo.Bill ON C.Id = Bill.BId
INNER JOIN
dbo.aspnet_Membership ON aspnet_Membership.UserId = C.UserGUID
WHERE
((Bill.Code='ASDF'
OR Bill.Code='XYZ'
OR Bill.Code='QWE'
OR Bill.Code='JKL')
AND C.LastName!='Unassigned')
GROUP BY
LastName, FirstName, LoweredEmail, Code, BDate
Client table has: FirstName LastName and UserGuid
Bill table has: BCode, BillDate
aspnet_Membership table has: E-mail, UserId
RESULTS:
FirstName  LastName  E-mail           BCode  BillDate
FName1     Lname1    fname@isp.com    XYZ    2010-05-13 00:00:00.000
Fname2     Lname2    fname2@isp2.com  XYZ    2010-06-05 00:00:00.000
Fname2     Lname2    fname2@isp2.com  ASD    2008-09-17 12:01:45.407
As you can see, Fname2 shows up twice; the only difference is in the BCode and BillDate.
How can I keep only the row with the latest date, so that I get the Fname2 record with a BCode of XYZ and a date of 2010-06-05?
Any help would be appreciated, thank you in advance.

Seeing that you're using SQL Server 2005, I would probably use a CTE (Common Table Expression) to do this - something like:
;WITH MyData AS
(
    SELECT
        c.FirstName, c.LastName,
        asp.LoweredEmail,
        b.Code AS BCodes, b.BillDate,
        ROW_NUMBER() OVER (PARTITION BY c.LastName, c.FirstName
                           ORDER BY b.BillDate DESC) AS RowNum
    FROM
        dbo.Client c
    INNER JOIN
        dbo.Bill b ON c.Id = b.BId
    INNER JOIN
        dbo.aspnet_Membership asp ON asp.UserId = c.UserGUID
    WHERE
        b.Code IN ('ASDF', 'JKL', 'QWE', 'XYZ')
        AND c.LastName != 'Unassigned'
)
SELECT
    FirstName, LastName, LoweredEmail, BCodes, BillDate
FROM
    MyData
WHERE
    RowNum = 1
This CTE with the ROW_NUMBER() clause will:
- "partition" your data by (FirstName, LastName): each pair of those values gets its own sequence of row numbers, starting at 1
- order the rows within each partition by descending BillDate
So the resulting set of data has each newest entry for any (FirstName,LastName) group with RowNum = 1 - and that's the data I'm selecting from that CTE.
Does that work for you??
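To see what ROW_NUMBER() assigns before the RowNum = 1 filter is applied, here is a minimal sketch against a table variable. The @Bills variable and its three rows are made up for illustration (they mirror the sample results in the question, collapsed into one table) and are not the real schema.

```sql
-- Hypothetical stand-in for the joined Client/Bill data; not the real schema.
DECLARE @Bills TABLE (
    FirstName varchar(50),
    LastName  varchar(50),
    Code      varchar(10),
    BillDate  datetime
);

-- INSERT ... SELECT ... UNION ALL also works on SQL Server 2005
-- (multi-row VALUES lists need 2008 or later).
INSERT INTO @Bills (FirstName, LastName, Code, BillDate)
SELECT 'FName1', 'Lname1', 'XYZ', '20100513' UNION ALL
SELECT 'Fname2', 'Lname2', 'XYZ', '20100605' UNION ALL
SELECT 'Fname2', 'Lname2', 'ASD', '20080917';

-- Numbering restarts at 1 for every (LastName, FirstName) pair,
-- and the newest BillDate in each pair gets RowNum = 1.
SELECT
    FirstName, LastName, Code, BillDate,
    ROW_NUMBER() OVER (PARTITION BY LastName, FirstName
                       ORDER BY BillDate DESC) AS RowNum
FROM @Bills
ORDER BY LastName, FirstName, RowNum;
```

Here the Fname2 row dated 2010-06-05 gets RowNum 1 and the 2008 row gets RowNum 2, so filtering on RowNum = 1 keeps only the newer one. If two bills for the same person can share a BillDate, ROW_NUMBER() picks one of them arbitrarily; adding a second ORDER BY column as a tiebreaker makes the choice deterministic.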

Perform a second join (a LEFT JOIN) to look for a later row in the Bill table for the same client, and then keep only the rows where that join finds nothing:
SELECT
    C.FirstName, C.LastName,
    aspnet_Membership.LoweredEmail,
    MAX(Bill.Code) AS BCodes,
    MAX(Bill.BillDate)
FROM dbo.Client C
INNER JOIN dbo.Bill
    ON C.Id = Bill.BId
INNER JOIN dbo.aspnet_Membership
    ON aspnet_Membership.UserId = C.UserGUID
LEFT JOIN dbo.Bill b2
    ON Bill.BId = b2.BId and
       b2.Code in ('ASDF','XYZ','QWE','JKL') and
       b2.BillDate > Bill.BillDate
WHERE
    b2.BId is null and
    Bill.Code in ('ASDF','XYZ','QWE','JKL') and
    C.LastName != 'Unassigned'
GROUP BY C.LastName, C.FirstName, aspnet_Membership.LoweredEmail, Bill.Code, Bill.BillDate
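The LEFT JOIN ... IS NULL pattern is easier to follow in isolation. The sketch below applies it to a single made-up table variable; @Bill and its rows are hypothetical and only loosely modelled on the question's Bill table.

```sql
-- Hypothetical data: client 2 has two bills, client 1 has one.
DECLARE @Bill TABLE (BId int, Code varchar(10), BillDate datetime);

INSERT INTO @Bill (BId, Code, BillDate)
SELECT 1, 'XYZ', '20100513' UNION ALL
SELECT 2, 'XYZ', '20100605' UNION ALL
SELECT 2, 'ASD', '20080917';

SELECT b.BId, b.Code, b.BillDate
FROM @Bill b
LEFT JOIN @Bill b2
    ON  b2.BId = b.BId
    AND b2.BillDate > b.BillDate   -- a strictly later bill for the same client
WHERE b2.BId IS NULL               -- keep rows for which no later bill exists
ORDER BY b.BId;
```

This returns the 2010-05-13 row for client 1 and the 2010-06-05 row for client 2. Because the comparison is strict, two bills that share the latest date for a client would both be returned.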

Related

Using SQL Group By while keeping same varchar values

I have a query that is returning two values. I want to have the largest value so I do a group by, then MAX. However, I have three other columns(varchar) that I would like to remain consistent with the id that is brought in with max.
Example.
OId CId FName LName BName
18477 110 Hubba Bubba whoa
158 110 Test2 Person2 leee
What I want is
OId CId FName LName BName
18477 110 Hubba Bubba whoa
So I want to group them by CId, and for OId I want to keep the largest number. I can't use MIN or MAX for the FName, LName, or BName because I want them to come from the row whose OId is selected. I don't want or need the FName, LName and BName from the other row.
I tried using SELECT TOP, but that only pulls in literally one row and I need multiple.
SQL
INSERT INTO #CustomerInfoAll(FName, LName, BName, OwnerId, CustomerId)
SELECT
-- what goes here --(o.FirstName) AS FName,
-- what goes here --(o.LastName) AS LName,
-- what goes here --(o.BusinessName) AS BName,
MAX(o.OId) AS OId,
(r.CId) AS CId
FROM Owner o
INNER JOIN Report r
ON o.ReportId = r.ReportId
WHERE r.CId IN (SELECT CId FROM #ThisReportAll)
AND r.Completed IS NOT NULL
GROUP BY r.CId
ORDER BY OId DESC;
Assuming you have SQL Server 2005 or higher:
INSERT INTO #CustomerInfoAll (FName, LName, BName, OwnerId, CustomerId)
SELECT
    FirstName,
    LastName,
    BusinessName,
    OId,
    CId
FROM
(
    SELECT
        Seq = ROW_NUMBER() OVER (PARTITION BY r.CId ORDER BY o.OId DESC),
        o.OId,
        r.CId,
        o.FirstName,
        o.LastName,
        o.BusinessName
    FROM
        dbo.Owner o
        INNER JOIN dbo.Report r
            ON o.ReportId = r.ReportId
    WHERE
        EXISTS ( -- can be INNER JOIN instead if `CId` is unique in temp table
            SELECT *
            FROM #ThisReportAll tra
            WHERE r.CId = tra.CId
        )
        AND r.Completed IS NOT NULL
    GROUP BY
        o.OId,
        r.CId,
        o.FirstName,
        o.LastName,
        o.BusinessName
) x
WHERE
    x.Seq = 1;
DO use full schema names on all your objects (dbo.Owner and dbo.Report).
DO use a semi-join (an EXISTS clause) or INNER JOIN instead of IN when possible.

Getting oldest Date SQL Complexity

I have a problem that I cannot solve in a SQL script alone; so far I can only do it in application code.
I have 2 tables
Person
ID Name Type
1 A A1
2 B A2
3 C A3
4 D A4
5 E A6
PersonHomes
HOMEID Location PurchaseDate PersonID
1 CA 20160101 1
2 CT 20160202 1
3 DT 20160101 2
4 BT 20170102 3
5 CT 20160303 1
6 CA 20160101 2
PersonID is a foreign key to the Person table.
There are no other rows in the tables.
We have to show the details of EACH person WITH a home.
The rules for the output are:
- IF a person has a SINGLE entry in PersonHomes, use it.
- IF a person has MORE than ONE entry in PersonHomes and the purchase dates are different, USE the PersonHomes ROW with the OLDEST date AND DELETE that person's OTHER ROWS.
- IF a person has MORE than ONE entry in PersonHomes and the purchase DATES are the SAME, USE the ROW with the LOWER ID AND DELETE that person's OTHER ROWS.
This is very easy to do in code, but in SQL it is complex.
What I tried was to
WITH PERSON (
SELECT * FROM Person)
SELECT * FROM PERSON
INNER JOIN PersonHomes ON Person.ID = PersonHomes.PersonID
WHERE PersonHomes.PersonID = CASE WHEN (COUNT (*) FROM PersonHomes...)
Then I think I can write SQL function ?
I am stuck, Please help!
SAMPLE OUTPUT for PERSON A
ID NAME Type HOMEID Location PurchaseDate
1 A A1 5 CT 20160303
For PERSON B
ID NAME Type HOMEID Location PurchaseDate
2 B A2 3 DT 20160101
Aiden
It is not so easy to get the desired output with SQL; we need to write more than one query.
First I created a temp table that summarizes each person's homes:
select PersonID,
       count(*) as HomeCount,
       count(distinct PurchaseDate) as PurchaseDateCount,
       min(PurchaseDate) as oldestPurchaseDate,
       min(HOMEID) as LowerHomeID
into #PersonHomesAbstractTable
from PersonHomes
group by PersonID
Then for the output of your first rule:
select p.ID, p.NAME, p.Type, ph.HOMEID, ph.Location, ph.PurchaseDate from Person p
inner join #PersonHomesAbstractTable a on p.ID = a.PersonID
inner join PersonHomes ph on p.ID = ph.PersonID
where a.HomeCount = 1
For the output of your second rule:
select p.ID, p.NAME, p.Type, ph.HOMEID, ph.Location, ph.PurchaseDate
from Person p inner join #PersonHomesAbstractTable a on p.ID = a.PersonID
inner join PersonHomes ph on p.ID = ph.PersonID and
ph.PurchaseDate = a.oldestPurchaseDate
where a.HomeCount > 1 and a.PurchaseDateCount <> 1
And finally for the output of your third rule:
select p.ID, p.NAME, p.Type, ph.HOMEID, ph.Location, ph.PurchaseDate
from Person p inner join #PersonHomesAbstractTable a on p.ID = a.PersonID
inner join PersonHomes ph on p.ID = ph.PersonID and
ph.HOMEID = a.LowerHomeID
where a.HomeCount > 1 and a.PurchaseDateCount = 1
Of course there are other ways, but this is the one that came to mind.
If you want to delete the undesired rows, you can use the script below:
delete from PersonHomes where HOMEID in
(
    select ph.HOMEID from #PersonHomesAbstractTable a
    inner join PersonHomes ph on a.PersonID = ph.PersonID and
        ph.PurchaseDate <> a.oldestPurchaseDate
    where a.HomeCount > 1 and a.PurchaseDateCount <> 1
    union
    select ph.HOMEID from #PersonHomesAbstractTable a
    inner join PersonHomes ph on a.PersonID = ph.PersonID and
        ph.HOMEID <> a.LowerHomeID
    where a.HomeCount > 1 and a.PurchaseDateCount = 1
)
You seem to have a prioritization query. I would solve this using row_number():
select ph.*
from (select ph.*,
row_number() over (partition by personid
order by purchasedate asc, homeid asc
) as seqnum
from personhomes ph
) ph
where seqnum = 1;
This doesn't actually change the data in the table. Although you say delete, it seems like you just want a result set with one home per person.
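This is the shortest approach, taken from a linked answer (the Address table and AddressMoveDate column come from that answer's own example, not from this question):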
;WITH cte AS
(
SELECT *, RowN = ROW_NUMBER() OVER (PARTITION BY ID ORDER BY AddressMoveDate DESC) FROM Address
)
DELETE FROM cte WHERE RowN > 1
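Adapted to this question, the same delete-through-a-CTE idea might look like the sketch below. It runs against a scratch temp table loaded with the question's sample rows, so nothing real is deleted. The #PersonHomes name and the column types are assumptions; the ordering (oldest PurchaseDate first, lowest HOMEID on ties) follows the rules stated in the question.

```sql
-- Scratch copy of the question's sample data (column types are assumed).
CREATE TABLE #PersonHomes (
    HOMEID       int PRIMARY KEY,
    Location     varchar(10),
    PurchaseDate char(8),      -- yyyymmdd strings, which sort chronologically
    PersonID     int
);

INSERT INTO #PersonHomes (HOMEID, Location, PurchaseDate, PersonID)
SELECT 1, 'CA', '20160101', 1 UNION ALL
SELECT 2, 'CT', '20160202', 1 UNION ALL
SELECT 3, 'DT', '20160101', 2 UNION ALL
SELECT 4, 'BT', '20170102', 3 UNION ALL
SELECT 5, 'CT', '20160303', 1 UNION ALL
SELECT 6, 'CA', '20160101', 2;

-- Rank each person's homes: oldest purchase first, lowest HOMEID on ties.
WITH ranked AS
(
    SELECT *,
           RowN = ROW_NUMBER() OVER (PARTITION BY PersonID
                                     ORDER BY PurchaseDate ASC, HOMEID ASC)
    FROM #PersonHomes
)
DELETE FROM ranked      -- deleting from the CTE deletes from #PersonHomes
WHERE RowN > 1;

-- One home per person is left.
SELECT HOMEID, Location, PurchaseDate, PersonID
FROM #PersonHomes
ORDER BY PersonID;

DROP TABLE #PersonHomes;
```

With this sample data the surviving rows are HOMEID 1 for person 1, HOMEID 3 for person 2 and HOMEID 4 for person 3. A single ORDER BY covers all three rules from the question, so no separate case for single-home persons is needed.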

SQL - Way to find duplicate fields within multiple rows?

I'm wondering if there's a way to return duplicates of parts of rows.
IDTable setup:
ID# | Customer | EventID#
1 | Steve | 123
2 | Steve | 123
3 | John | 987
4 | John | 924
Since Steve and 123 appear twice together, I want to treat that as a 'duplicate' even though they have two different ID#'s. And if there's a 'duplicate', ideally I'd like to only return columns: ID#, Customer & EventID#. So for the above IDTable example, only return:
1 | Steve | 123
2 | Steve | 123
By running the following, it counts each ID + Customer + EventID# separately and returns all Count values as 1 (I'm using SQL Server 2008):
SELECT ID#, Customer, EventID#, COUNT({fn CONCAT(Customer,EventID#)})
FROM IDTable
GROUP BY ID#, Customer, EventID#
HAVING COUNT({fn CONCAT(Customer,EventID#)}) > 1
If I take out the ID# from the SELECT, it'll work, but then we won't know what the ID#'s are.
EDIT:
I'm joining in the select columns from other tables. I initially left those out for simplicity's sake, but when trying to apply the solutions below I'm getting confused. Apologies! Here's what is more in line with what I'm using:
SELECT A.ID#, C.Customer, E.EventID#
FROM IDTable A
INNER JOIN CustomerTable C
ON C.AccountID = A.AccountID
INNER JOIN EventTable E
ON E.AccountType = C.AccountType
WHERE C.StatusID = 'Active'
A self-join should do the trick:
SELECT A.ID#, A.Customer, A.EventID#
FROM IDTable A
INNER JOIN IDTable A2 ON A.Customer = A2.Customer
    AND A.EventID# = A2.EventID#
    AND A.ID# <> A2.ID#
Edit for your joins:
You can still use a self-join, just with derived tables like so:
SELECT A.ID#, A.Customer, A.EventID#
FROM (SELECT ID#, Customer, EventID#
FROM IDTable A
INNER JOIN CustomerTable C ON C.AccountID = A.AccountID
INNER JOIN EventTable E ON E.AccountType = C.AccountType
WHERE C.StatusID = 'Active') A
INNER JOIN (SELECT ID#, Customer, EventID#
FROM IDTable A
INNER JOIN CustomerTable C ON C.AccountID = A.AccountID
INNER JOIN EventTable E ON E.AccountType = C.AccountType
WHERE C.StatusID = 'Active') A2 ON A.Customer = A2.Customer
AND A.EventID# = A2.EventID#
AND A.ID# <> A2.ID#
And cleaner with #TEMP:
SELECT A.ID#, C.Customer, E.EventID#
INTO #TEMP
FROM IDTable A
INNER JOIN CustomerTable C
ON C.AccountID = A.AccountID
INNER JOIN EventTable E
ON E.AccountType = C.AccountType
WHERE C.StatusID = 'Active'
;
SELECT A.ID#, A.Customer, A.EventID#
FROM #TEMP A
INNER JOIN #TEMP A2 ON A.Customer = A2.Customer
AND A.EventID# = A2.EventID#
AND A.ID# <> A2.ID#
Most versions of SQL support window functions. The easiest way to solve this is:
select id#, customer, eventid#
from (select i.*, count(*) over (partition by customer, eventid#) as cnt
      from idtable i
     ) i
where cnt > 1;
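As a quick check, here is the windowed count run against the four sample rows from the question. The @IDTable table variable is a hypothetical stand-in for the real IDTable, and the column types are assumed.

```sql
-- Hypothetical copy of the question's sample rows.
DECLARE @IDTable TABLE (ID# int, Customer varchar(50), EventID# int);

INSERT INTO @IDTable (ID#, Customer, EventID#)
SELECT 1, 'Steve', 123 UNION ALL
SELECT 2, 'Steve', 123 UNION ALL
SELECT 3, 'John',  987 UNION ALL
SELECT 4, 'John',  924;

-- cnt is the number of rows sharing the same (Customer, EventID#) pair;
-- every row keeps its own ID#, so nothing is lost by counting.
SELECT ID#, Customer, EventID#
FROM (SELECT i.*,
             COUNT(*) OVER (PARTITION BY Customer, EventID#) AS cnt
      FROM @IDTable i
     ) i
WHERE cnt > 1
ORDER BY ID#;
```

Only the two Steve/123 rows come back, each with its own ID#. The two John rows have different EventID# values, so each sits alone in its partition with cnt = 1.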
SELECT i.*
FROM IDTable i
INNER JOIN
    (SELECT Customer, EventID#
     FROM IDTable
     GROUP BY Customer, EventID#
     HAVING COUNT(*) > 1) t
    ON i.Customer = t.Customer
    AND i.EventID# = t.EventID#
There may be other ways of doing this, and if you tag your specific RDBMS (sql-server, oracle, mysql, etc.) I am sure you will get additional answers. Here is one way: use your query (without the ID# column) to identify the duplicates, then match it back to your original table via an inner join.
Building on @Aaron_D's answer, I would not join subqueries; instead you can do:
SELECT A.ID#, C.Customer, E.EventID#
FROM IDTable A
INNER JOIN IDTable B ON A.ID# = B.ID#
AND A.AccountID <> B.AccountID
INNER JOIN CustomerTable C
ON C.AccountID = A.AccountID
INNER JOIN EventTable E
ON E.AccountType = C.AccountType
WHERE C.StatusID = 'Active'
Since both CustomerTable and EventTable are ultimately derived from AccountID, it will work fine and will be faster.

Select Most Recent Date with Inner Join

Running into a wall when trying to pull info from tables similar to those below. Not sure how to approach this.
The results should have the most recent TRANSAMT for each ACCNUM along with NAME and address.
Select A.ACCNUM, MAX(B.TRANSAMT) as BAMT, B.ADDRESS from
From TableA A inner join TableB on A.ACCNUM = B.ACCNUM
This is what I have so far. Any help would be appreciated.
TableA
ACCNUM NAME ADDRESS
00001 R. GRANT Miami, FL
00002 B. PAUL Dallas, TX
TableB
ACCNUM TRANSAMT TRANSDATE
00001 150 1/1/2015
00001 200 13/2/2015
00002 100 2/1/2015
00003 50 18/2/2015
You can use the ANSI standard row_number() function in most databases. It lets you pick out the most recent transaction for each account:
select a.accnum, a.name, b.transamt, a.address
from tableA a left join
     (select b.*, row_number() over (partition by accnum order by transdate desc) as seqnum
      from tableB b
     ) b
     on a.accnum = b.accnum and b.seqnum = 1;
Note: I changed the join to a left join. This will keep all records in tableA, even those with no matches. I am not sure if that is the intention of your query.
You can use row_number to number the rows for each account, most recent first.
select accnum, bamt, name, address
from (
    select A.ACCNUM, B.TRANSAMT as BAMT, A.ADDRESS, A.NAME,
           row_number() over (partition by A.ACCNUM order by B.TRANSDATE desc) as rn
    from TableA A
    inner join TableB B on A.ACCNUM = B.ACCNUM
) t
where rn = 1;
Please note this will not work on MySQL versions before 8.0, which do not support window functions.
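For a concrete run, here is a sketch of the row_number approach on the question's sample rows, written in T-SQL. The table variables, the column types, and the reading of the dates as day/month/year (so 2/1/2015 is 2 January 2015) are assumptions.

```sql
-- Hypothetical copies of the question's tables.
DECLARE @TableA TABLE (ACCNUM char(5), NAME varchar(50), ADDRESS varchar(50));
DECLARE @TableB TABLE (ACCNUM char(5), TRANSAMT int, TRANSDATE datetime);

INSERT INTO @TableA (ACCNUM, NAME, ADDRESS)
SELECT '00001', 'R. GRANT', 'Miami, FL' UNION ALL
SELECT '00002', 'B. PAUL',  'Dallas, TX';

INSERT INTO @TableB (ACCNUM, TRANSAMT, TRANSDATE)
SELECT '00001', 150, '20150101' UNION ALL
SELECT '00001', 200, '20150213' UNION ALL
SELECT '00002', 100, '20150102' UNION ALL
SELECT '00003',  50, '20150218';

-- seqnum = 1 marks the latest transaction of each account; the inner join
-- then attaches the name and address from @TableA.
SELECT a.ACCNUM, a.NAME, a.ADDRESS, b.TRANSAMT, b.TRANSDATE
FROM @TableA a
INNER JOIN (
    SELECT ACCNUM, TRANSAMT, TRANSDATE,
           ROW_NUMBER() OVER (PARTITION BY ACCNUM ORDER BY TRANSDATE DESC) AS seqnum
    FROM @TableB
) b
    ON b.ACCNUM = a.ACCNUM AND b.seqnum = 1
ORDER BY a.ACCNUM;
```

Account 00001 comes back with the 200 transaction from 13 February and account 00002 with its single 100 transaction. Account 00003 has no row in @TableA, so the inner join drops it.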
This one uses no ROW_NUMBER():
with find_max as (
    select ACCNUM, max(TRANSDATE) as TRANSDATE
    from TableB
    group by ACCNUM)
select find_max.ACCNUM, B.TRANSAMT,
       find_max.TRANSDATE, A.ADDRESS, A.NAME
from TableA as A
join find_max on find_max.ACCNUM = A.ACCNUM
join TableB B on B.ACCNUM = A.ACCNUM and B.TRANSDATE = find_max.TRANSDATE
First find the max date for each ACCNUM, then join both tables to it.
This will work on most databases.

SQL to find the most recent account transaction for each customer

EDIT: I'm using SQL Server
I looked around for an example of this but couldn't find anything so I'll start a new thread...
I have 3 tables.
Account
AccountID
FirstName
LastName
AccountEnroll
AccountEnrollID
AccountID
AccountTypeID
EnrollDate
AccountType
AccountTypeID
AccountType
The AccountEnroll table is a bridge table to track each customer's enrollment history. I want to use the "EnrollDate" column to determine the current account type for each customer. I need to write a SELECT statement that can display AccountID, FirstName, LastName, (current)AccountType.
I am having trouble getting my resultset to display only the MAX(EnrollDate) record for each customer.
You can use common table expressions to do this pretty simply.
with cte as (
    select A.AccountID, A.FirstName, A.LastName, AT.AccountType, AE.EnrollDate,
           row_number() over (partition by AE.AccountID order by AE.EnrollDate desc) as [rn]
    from Account as A
    inner join AccountEnroll as AE
        on A.AccountId = AE.AccountId
    inner join AccountType as AT
        on AE.AccountTypeId = AT.AccountTypeId
)
select AccountID, FirstName, LastName, AccountType, EnrollDate
from cte
where rn = 1
Try this:
SELECT
    a.AccountID, a.FirstName, a.LastName,
    at.AccountType AS 'Current AccountType'
FROM Account a
INNER JOIN
(
    SELECT AccountID, MAX(EnrollDate) MaxDate
    FROM AccountEnroll
    GROUP BY AccountID
) t ON a.AccountID = t.AccountID
INNER JOIN AccountEnroll ae ON ae.AccountID = t.AccountID
    AND ae.EnrollDate = t.MaxDate
INNER JOIN AccountType at ON ae.AccountTypeID = at.AccountTypeID
You could use a correlated sub-query:
SELECT A.AccountId, A.FirstName, A.LastName, AT.AccountType
FROM Account A
JOIN AccountEnroll AE
ON A.AccountId = AE.AccountId
JOIN AccountType AT
ON AE.AccountTypeId = AT.AccountTypeId
WHERE NOT EXISTS (
SELECT 1
FROM AccountEnroll
WHERE AccountId = AE.AccountId
AND EnrollDate > AE.EnrollDate
)
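One difference between these approaches shows up when an account has two enrollments on the same latest date. The sketch below uses a made-up @AccountEnroll table variable (the rows are invented for illustration) to compare the NOT EXISTS filter with ROW_NUMBER plus a tiebreaker.

```sql
-- Hypothetical enrollment history; account 200 has a tie on its latest date.
DECLARE @AccountEnroll TABLE (
    AccountEnrollID int,
    AccountID       int,
    AccountTypeID   int,
    EnrollDate      datetime
);

INSERT INTO @AccountEnroll (AccountEnrollID, AccountID, AccountTypeID, EnrollDate)
SELECT 1, 100, 1, '20200101' UNION ALL
SELECT 2, 100, 2, '20210101' UNION ALL
SELECT 3, 200, 1, '20210601' UNION ALL
SELECT 4, 200, 3, '20210601';

-- NOT EXISTS keeps every row that has no strictly later row, so both
-- tied rows for account 200 come back (3 rows in total).
SELECT AE.AccountID, AE.AccountTypeID, AE.EnrollDate
FROM @AccountEnroll AE
WHERE NOT EXISTS (
    SELECT 1
    FROM @AccountEnroll newer
    WHERE newer.AccountID = AE.AccountID
      AND newer.EnrollDate > AE.EnrollDate
);

-- ROW_NUMBER with a tiebreaker returns exactly one row per account
-- (2 rows in total); here the higher AccountEnrollID wins the tie.
SELECT AccountID, AccountTypeID, EnrollDate
FROM (
    SELECT AccountID, AccountTypeID, EnrollDate,
           ROW_NUMBER() OVER (PARTITION BY AccountID
                              ORDER BY EnrollDate DESC, AccountEnrollID DESC) AS rn
    FROM @AccountEnroll
) x
WHERE rn = 1;
```

The MAX(EnrollDate) join behaves like NOT EXISTS here and also returns both tied rows. Which behaviour is right depends on whether same-day enrollments can occur in the real data and, if so, which one should count as current.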