SQL group by query - sql

I have a UNION ALL on two result sets. This results in something like the below.
TOTALSTABLE
FAMILYNAME  FIRSTNAME  NUMBER  TOTAL
Brown       Dave       1234    500.00
Brown       Dave       1234    300.00
Smith       Frank      4321    123.00
Smith       Frank      4321    456.00
I run the following query...
SELECT TOTALSTABLE.FAMILYNAME,
TOTALSTABLE.FIRSTNAME,
TOTALSTABLE.NUMBER,
SUM(TOTALSTABLE.TOTAL) COMBINEDTOTAL
FROM TOTALSTABLE
GROUP BY TOTALSTABLE.FAMILYNAME,
TOTALSTABLE.FIRSTNAME,
TOTALSTABLE.NUMBER
Which gives me something like...
FAMILYNAME  FIRSTNAME  NUMBER  COMBINEDTOTAL
Brown       Dave       1234    800.00
Smith       Frank      4321    579.00
This is what I need. However, I need to add an additional column which is NULL in one of the results I am attempting to do a UNION ALL on.
Example:
T1
FAMILYNAME  FIRSTNAME  NUMBER  TOTAL   DATE
Brown       Dave       1234    500.00  01/01/2001
Smith       Frank      4321    123.00  01/01/2001
T2
FAMILYNAME  FIRSTNAME  NUMBER  TOTAL   DATE
Brown       Dave       1234    300.00  NULL
Smith       Frank      4321    456.00  NULL
COMBINED (UNION ALL)
FAMILYNAME  FIRSTNAME  NUMBER  TOTAL   DATE
Brown       Dave       1234    500.00  01/01/2001
Brown       Dave       1234    300.00  NULL
Smith       Frank      4321    123.00  01/01/2001
Smith       Frank      4321    456.00  NULL
I need to get the combined total like in the first example, using the date that isn't null.
Example of desired results.
FAMILYNAME  FIRSTNAME  NUMBER  COMBINEDTOTAL  DATE
Brown       Dave       1234    800.00         01/01/2001
Smith       Frank      4321    579.00         01/01/2001
Can anyone tell me what I need to do here? Cheers.

It sounds like you just want
SELECT familyName,
firstName,
number,
SUM(total) combinedTotal,
MAX(date) date
FROM (<<union all query>>)
GROUP BY familyName,
firstName,
number
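This works because aggregates such as MAX() skip NULL values, so the non-null date survives the grouping. Here is a minimal sketch you can run to check the behaviour. It assumes SQLite (through Python's sqlite3 module) as a stand-in for your database, and the table names t1 and t2 and the ISO date format are my assumptions, not something from your schema.

```python
import sqlite3

# In-memory SQLite database standing in for the real one (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE t1 (familyname TEXT, firstname TEXT, number INTEGER, total REAL, date TEXT);
CREATE TABLE t2 (familyname TEXT, firstname TEXT, number INTEGER, total REAL, date TEXT);
-- Dates are stored as ISO text here so that MAX() compares them sensibly.
INSERT INTO t1 VALUES ('Brown', 'Dave', 1234, 500.00, '2001-01-01'),
                      ('Smith', 'Frank', 4321, 123.00, '2001-01-01');
INSERT INTO t2 VALUES ('Brown', 'Dave', 1234, 300.00, NULL),
                      ('Smith', 'Frank', 4321, 456.00, NULL);
""")

# SUM() adds the totals from both sets; MAX() ignores the NULL dates.
rows = conn.execute("""
SELECT familyname, firstname, number,
       SUM(total) AS combinedtotal,
       MAX(date)  AS date
FROM (SELECT * FROM t1 UNION ALL SELECT * FROM t2) AS combined
GROUP BY familyname, firstname, number
ORDER BY familyname
""").fetchall()

for row in rows:
    print(row)
```

If a person could have several different non-null dates, MAX() would pick the latest one, so check that this is what you want.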

As you state, you can try the following code:
SELECT * FROM
(SELECT c1, c2, c3 FROM T1
UNION ALL
SELECT c1, c2, c3 FROM T2
) T
WHERE T.c3 IS NOT NULL
GROUP BY T.c1, T.c2, T.c3

Related

SQL - Joining Two Tables and Sum of Column

I have a situation where table A has member info and table B has a list of transactions for the members. I need to retrieve fields from table A with a total of the transaction column for each member.
I have struggled with the proper SQL syntax and it keeps giving me errors. We are using MS Reporting Services to develop this, if that helps.
Table A:
Member ID LName FName Phone
----------------------------------------------
1234 Doe John 555-555-5555
5678 Doe Jane 555-555-5550
Table B:
Member ID Transaction Date Transaction Total
----------------------------------------------------
1234 01-01-2020 120.00
1234 01-05-2020 25.00
5678 01-01-2020 50.00
5678 01-10-2020 50.00
5678 01-11-2020 25.00
1234 01-15-2020 25.00
Desired output:
Member ID: Last Name: First Name: Total:
----------------------------------------------------
1234 Doe John 170.00
5678 Doe Jane 125.00
You are looking for aggregation with GROUP BY and SUM():
select a.memberid, a.lname, a.fname, sum(b.transactiontotal) as total
from a inner join
b
on b.memberid = a.memberid
group by a.memberid, a.lname, a.fname;
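To see the query above run against the sample data, here is a small sketch. It assumes SQLite via Python's sqlite3 module rather than the SQL Server behind Reporting Services, and the column names (memberid, transactiondate, transactiontotal) are guesses based on the table headers in the question.

```python
import sqlite3

# In-memory SQLite database used only for illustration (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE a (memberid INTEGER, lname TEXT, fname TEXT, phone TEXT);
CREATE TABLE b (memberid INTEGER, transactiondate TEXT, transactiontotal REAL);
INSERT INTO a VALUES (1234, 'Doe', 'John', '555-555-5555'),
                     (5678, 'Doe', 'Jane', '555-555-5550');
INSERT INTO b VALUES (1234, '2020-01-01', 120.00),
                     (1234, '2020-01-05', 25.00),
                     (5678, '2020-01-01', 50.00),
                     (5678, '2020-01-10', 50.00),
                     (5678, '2020-01-11', 25.00),
                     (1234, '2020-01-15', 25.00);
""")

# One output row per member, with the transactions summed.
member_totals = conn.execute("""
SELECT a.memberid, a.lname, a.fname, SUM(b.transactiontotal) AS total
FROM a
INNER JOIN b ON b.memberid = a.memberid
GROUP BY a.memberid, a.lname, a.fname
ORDER BY a.memberid
""").fetchall()

for row in member_totals:
    print(row)
```

Note that the inner join drops members who have no transactions at all. If those should appear with a zero total, a LEFT JOIN with COALESCE(SUM(...), 0) is the usual variation.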

Join multiple tables to return only one result for each record from main table

Currently I have three tables I am joining. I have data that was migrated from one system (old) to another system (new). I need to compare this data to find matches as well as mismatches. The first table has the list of accounts being moved. The two systems have different ID types, so this first table lists the IDs from both systems for each account that was moved. This is my base population.
ID1 ID2
ABC 123
ABC 123
ABC 123
DEF 456
DEF 456
DEF 456
I then have table 2 which is all the data from the old system.
ID Fname Lname
ABC John Smith
ABC Tom Smith
ABC Kate Smith
DEF Jason Thomas
DEF Ruby Thomas
DEF Alex Johnson
Then table 3 is all the data found in the new system.
ID Fname Lname
123 John Smith
123 Tom Smith
123 Kate Smith
456 Jason Thomas
456 Ruby Thomas
Right now when I join these tables on the ID I get a lot more rows than I need.
When I do my join I receive this:
ID Fname_old Lname_old ID2 Fname_new Lname_new
ABC John Smith 123 John Smith
ABC John Smith 123 Tom Smith
ABC John Smith 123 Kate Smith
I am trying to join them where it only returns the row that matches, and if it can't find a match I should still get the ID from the ID file and the data from table 2(old data) as this is the data that was sent to the new system.
ID1 ID2 Fname_old Lname_old Fname_new Lname_new
ABC 123 John Smith John Smith
ABC 123 Tom Smith Tom Smith
ABC 123 Kate Smith Kate Smith
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas
DEF 456 Alex Johnson
The code I am using is:
Select a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old,
c.fname as fname_new, c.lname as lname_new
from table1 a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID
If it's just duplicate rows in your first table, you could try de-duplicating them with DISTINCT in a derived table like below:
Select a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old,
c.fname as fname_new, c.lname as lname_new
from (SELECT DISTINCT ID1, ID2 FROM table1) a
left join table2 b
on a.ID1 = b.ID
left join table3 c
on a.ID2 = c.ID
You are joining them on ID columns.
ID columns are usually unique, but you have multiple identical IDs and join on those IDs.
Since you need to compare data, I suggest you look up MATCH and how it works, as that seems closer to what you are looking for here.
You can get a match using row_number():
Select a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old,
c.fname as fname_new, c.lname as lname_new
from (select a.*,
row_number() over (partition by id1 order by id1) as seqnum
from table1 a
) a left join
(select b.*,
row_number() over (partition by id order by id) as seqnum
from table2 b
) b
on a.ID1 = b.ID and a.seqnum = b.seqnum left join
(select c.*,
row_number() over (partition by id order by id) as seqnum
from table3 c
) c
on a.ID2 = c.ID and a.seqnum = c.seqnum;
Note: This does not preserve the "ordering" of the original values, so any rows can be matched with any other. Why? SQL tables represent unordered sets.
If there is an ordering in the tables, you can use that in the order by clauses to get a match consistent with the ordering.
If you can compare on first name and last name, this code will work.
select DISTINCT a.ID1, a.ID2, b.fname as fname_old, b.lname as lname_old, c.fname as
fname_new, c.lname as lname_new from table2 b
left join table1 a on a.ID1=b.ID
left join table3 c on a.ID2=c.ID and b.Fname=c.Fname and b.Lname=c.Lname
My Result :
ID1 ID2 fname_old lname_old fname_new lname_new
ABC 123 John Smith John Smith
ABC 123 Kate Smith Kate Smith
ABC 123 Tom Smith Tom Smith
DEF 456 Alex Johnson NULL NULL
DEF 456 Jason Thomas Jason Thomas
DEF 456 Ruby Thomas Ruby Thomas
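If you want to reproduce that result yourself, here is a rough sketch of the same name-matching join against the sample data from the question. It assumes SQLite through Python's sqlite3 module instead of Hive, and it stores the IDs as text, which is my assumption.

```python
import sqlite3

# In-memory SQLite database standing in for the real tables (assumption).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE table1 (ID1 TEXT, ID2 TEXT);
CREATE TABLE table2 (ID TEXT, Fname TEXT, Lname TEXT);
CREATE TABLE table3 (ID TEXT, Fname TEXT, Lname TEXT);
INSERT INTO table1 VALUES ('ABC','123'),('ABC','123'),('ABC','123'),
                          ('DEF','456'),('DEF','456'),('DEF','456');
INSERT INTO table2 VALUES ('ABC','John','Smith'),('ABC','Tom','Smith'),
                          ('ABC','Kate','Smith'),('DEF','Jason','Thomas'),
                          ('DEF','Ruby','Thomas'),('DEF','Alex','Johnson');
INSERT INTO table3 VALUES ('123','John','Smith'),('123','Tom','Smith'),
                          ('123','Kate','Smith'),('456','Jason','Thomas'),
                          ('456','Ruby','Thomas');
""")

# Start from the old data, look up the ID pair, then match new rows by name.
# DISTINCT collapses the repeats caused by duplicate rows in table1.
matched = conn.execute("""
SELECT DISTINCT a.ID1, a.ID2,
       b.Fname AS fname_old, b.Lname AS lname_old,
       c.Fname AS fname_new, c.Lname AS lname_new
FROM table2 b
LEFT JOIN table1 a ON a.ID1 = b.ID
LEFT JOIN table3 c ON c.ID = a.ID2
                  AND c.Fname = b.Fname
                  AND c.Lname = b.Lname
ORDER BY ID1, fname_old
""").fetchall()

# Old rows with no counterpart in the new system have NULL (None) new names.
unmatched = [row for row in matched if row[4] is None]

for row in matched:
    print(row)
```

Matching on names only works when the names were migrated unchanged. A renamed or misspelled person would show up as unmatched.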
You say that this data was migrated from one system to another, so you expect all of it to match. You could hence reduce the query to only find data that doesn't match, if any.
Here is a SQL standard compliant query. You tagged your request with hive. I don't know about hive, so you may have to adjust the query.
select
t2.id as id1,
t3.id as id2,
t2.fname as fname_old,
t2.lname as lname_old,
t3.fname as fname_new,
t3.lname as lname_new
from table2 t2
full outer join table3 t3
on t3.fname = t2.fname
and t3.lname = t2.lname
and exists (select null from table1 t1 where t1.id1 = t2.id and t1.id2 = t3.id)
where t2.id is null or t3.id is null;
This is a full anti join. It returns all rows that have no exact match in the other table. It doesn't, however, guess which deviating rows may be pairs. You will get a result like this:
ID1 | ID2 | Fname_old | Lname_old | Fname_new | Lname_new
----+-----+-----------+-----------+-----------+----------
DEF | | Alex | Johnson | |
GHI | | Jone | Miller | |
GHI | | Maxx | Miller | |
GHI | | Fritz | Miller | |
| 789 | | | Joan | Miller
| 789 | | | Max | Miller
| 799 | | | Fritz | Miller
As you see, you would have to examine this result manually. But ideally the query shouldn't return any row at all, which would just prove that everything went as expected and nobody (system or person) messed with the data :-)

Find duplicate batches based on multiple columns

I have a table that contains a series of related records (batches). Each batch has a unique id and can contain customer payments. I want to find if a batch is duplicate even if it is submitted on different days.
A batch can have 1 or more records. Here is sample data set:
BatchId InputAmount CustomerName BatchDate
------- ----------- ------------ ----------
182944 $475.00 Barry Smith 16-Mar-2019
182944 $260.00 John Smith 16-Mar-2019
182944 $265.00 Jane Smith 16-Mar-2019
182944 $400.00 Sara Smith 16-Mar-2019
182944 $175.00 Andy Smith 16-Mar-2019
182945 $475.00 Barry Smith 16-Mar-2019
182945 $260.00 John Smith 16-Mar-2019
182945 $265.00 Jane Smith 16-Mar-2019
182945 $400.00 Sara Smith 16-Mar-2019
182945 $175.00 Andy Smith 16-Mar-2019
183194 $100.00 Paul Green 21-Mar-2019
183195 $100.00 Nancy Green 21-Mar-2019
183197 $150.00 John Brown 20-Mar-2019
183197 $210.00 Sarah Brown 20-Mar-2019
183198 $150.00 John Brown 21-Mar-2019
183198 $210.00 Sarah Brown 21-Mar-2019
183200 $125.00 John Doe 20-Mar-2019
183200 $110.00 Sarah Doe 20-Mar-2019
183202 $125.00 John Doe 21-Mar-2019
183202 $110.00 Sarah Doe 21-Mar-2019
183202 $115.00 Paul Rudd 21-Mar-2019
Batches (182944, 182945) and (183197,183198) are duplicate while the other batches are not.
I thought maybe I could create a summary table with counts and sums and get close but I'm having trouble finding the true duplicates by including the names as well.
DECLARE @Summaries TABLE(
BatchId INT,
BatchDate DATETIME,
BatchCount INT,
BatchAmount MONEY)
-- Summarize the Data so we can look for duplicates
INSERT INTO @Summaries
SELECT a.BatchId, a.BatchDate, COUNT(*) AS RecordCount, SUM(a.InputAmount) AS BatchAmount
FROM Batches a
WHERE a.BatchDate BETWEEN '20190316' and '20190321'
GROUP BY a.BatchId, a.BatchDate
ORDER BY a.BatchId DESC
-- find the potential duplicate batches based on the Counts and Sums
SELECT A.* FROM @Summaries A
INNER JOIN (SELECT BatchCount, BatchAmount, BatchDate FROM @Summaries
GROUP BY BatchCount, BatchAmount, BatchDate
HAVING COUNT(*) > 1) B
ON A.BatchCount = B.BatchCount
AND A.BatchAmount = B.BatchAmount
WHERE DATEDIFF(DAY, A.BatchDate, B.BatchDate) BETWEEN -1 AND 1
Thank you for the help. I'm using a SQL Server 2012 database.
You can try something like the below:
with cte as
(select BatchId from table_name
group by BatchId
having count(*)>1
) select * from table_name a where a.BatchId in (select BatchId from cte)
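Note that the query above only lists batches that have more than one record. It does not compare the contents of one batch with another. One possible way to do that comparison is to pair up rows with the same amount and customer across two batches and keep the pairs where every row matched. The sketch below is only an illustration of that idea: it assumes SQLite via Python's sqlite3 module rather than SQL Server 2012, and it assumes no batch contains the same amount and customer twice.

```python
import sqlite3

# In-memory SQLite database with the sample data from the question (assumption:
# SQLite stands in for SQL Server here).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Batches (BatchId INTEGER, InputAmount REAL, "
    "CustomerName TEXT, BatchDate TEXT)"
)
sample = [
    (182944, 475.0, "Barry Smith", "2019-03-16"), (182944, 260.0, "John Smith", "2019-03-16"),
    (182944, 265.0, "Jane Smith", "2019-03-16"), (182944, 400.0, "Sara Smith", "2019-03-16"),
    (182944, 175.0, "Andy Smith", "2019-03-16"),
    (182945, 475.0, "Barry Smith", "2019-03-16"), (182945, 260.0, "John Smith", "2019-03-16"),
    (182945, 265.0, "Jane Smith", "2019-03-16"), (182945, 400.0, "Sara Smith", "2019-03-16"),
    (182945, 175.0, "Andy Smith", "2019-03-16"),
    (183194, 100.0, "Paul Green", "2019-03-21"), (183195, 100.0, "Nancy Green", "2019-03-21"),
    (183197, 150.0, "John Brown", "2019-03-20"), (183197, 210.0, "Sarah Brown", "2019-03-20"),
    (183198, 150.0, "John Brown", "2019-03-21"), (183198, 210.0, "Sarah Brown", "2019-03-21"),
    (183200, 125.0, "John Doe", "2019-03-20"), (183200, 110.0, "Sarah Doe", "2019-03-20"),
    (183202, 125.0, "John Doe", "2019-03-21"), (183202, 110.0, "Sarah Doe", "2019-03-21"),
    (183202, 115.0, "Paul Rudd", "2019-03-21"),
]
conn.executemany("INSERT INTO Batches VALUES (?, ?, ?, ?)", sample)

# Two batches are treated as duplicates when they have the same number of rows
# and every row in one has a row with the same amount and customer in the other.
# BatchDate is deliberately not compared, so duplicates on different days count.
duplicate_pairs = conn.execute("""
WITH counts AS (
    SELECT BatchId, COUNT(*) AS n FROM Batches GROUP BY BatchId
)
SELECT x.BatchId, y.BatchId
FROM Batches x
JOIN Batches y ON x.BatchId < y.BatchId
              AND x.InputAmount = y.InputAmount
              AND x.CustomerName = y.CustomerName
JOIN counts cx ON cx.BatchId = x.BatchId
JOIN counts cy ON cy.BatchId = y.BatchId
WHERE cx.n = cy.n
GROUP BY x.BatchId, y.BatchId, cx.n
HAVING COUNT(*) = cx.n
ORDER BY x.BatchId
""").fetchall()

print(duplicate_pairs)
```

The row-count check is what keeps 183200 and 183202 apart: they share two rows, but 183202 has a third.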

SQL Query: How to select multiple instances of a single item without collapsing into a group?

I'm trying to do the following with an SQL query in Impala. I've got a single data table that has (among other things) two columns with values that intersect multiple times. For example, let's say we have a table with two columns for related names and phone numbers:
Names Phone Numbers
John Smith (123) 456-7890
Rob Johnson (123) 456-7890
Greg Jackson (123) 456-7890
Tom Green (123) 456-7890
Jack Mathis (123) 456-7890
John Smith (234) 567-8901
Rob Johnson (234) 567-8901
Joe Wolf (234) 567-8901
Mike Thomas (234) 567-8901
Jim Moore (234) 567-8901
John Smith (345) 678-9012
Rob Johnson (345) 678-9012
Toby Ellis (345) 678-9012
Sam Wharton (345) 678-9012
Bob Thompson (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (456) 789-0123
Kelly Howe (456) 789-0123
Hank Rehms (456) 789-0123
Jim Fellows (456) 789-0123
What I need to get from this table is a selection of each item from the Name column that has multiple entries from the Phone Numbers column associated with it, like this:
Names Phone Numbers
John Smith (123) 456-7890
John Smith (234) 567-8901
John Smith (345) 678-9012
John Smith (456) 789-0123
Rob Johnson (123) 456-7890
Rob Johnson (234) 567-8901
Rob Johnson (345) 678-9012
Rob Johnson (456) 789-0123
This is the query I've got so far, but it's not quite giving me the results I'm looking for:
SELECT a.name, a.phone_number, b.phone_number, b.count1
FROM databasename a
INNER JOIN (
SELECT phone_number, COUNT(phone_number) as count1
FROM databasename
GROUP BY phone_number
) b
ON a.phone_number = b.phone_number;
Any ideas on how to improve my query to get the results I'm looking for?
Thank you.
Working with your query...
This generates a subset, by name, of users having more than one phone number. It then joins back to the entire set on name, returning all phone numbers for those users. However, a user who has the same phone number listed more than once would also be returned. To eliminate those if needed, add DISTINCT to the count in the inline view.
SELECT a.name, a.phone_number
FROM databasename a
INNER JOIN (
SELECT name, COUNT(phone_number) as count1
FROM databasename
GROUP BY name
having COUNT(phone_number) > 1
) b
on a.name = b.name
Order by a.name, a.phone_Number
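Here is a small sketch of the DISTINCT variant mentioned above, so you can see the difference it makes. It assumes SQLite via Python's sqlite3 module rather than Impala, uses only a few rows of the sample data, and adds a repeated Greg Jackson row that is not in the question, purely to show why DISTINCT matters.

```python
import sqlite3

# In-memory SQLite database standing in for Impala (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE databasename (name TEXT, phone_number TEXT)")
conn.executemany(
    "INSERT INTO databasename VALUES (?, ?)",
    [
        ("John Smith", "(123) 456-7890"),
        ("Rob Johnson", "(123) 456-7890"),
        ("Greg Jackson", "(123) 456-7890"),
        ("Greg Jackson", "(123) 456-7890"),  # made-up repeat, for illustration only
        ("John Smith", "(234) 567-8901"),
        ("Rob Johnson", "(234) 567-8901"),
        ("Joe Wolf", "(234) 567-8901"),
    ],
)

# COUNT(DISTINCT ...) counts different numbers per name, so a name that merely
# repeats one number (Greg Jackson here) is not selected.
multi_number_rows = conn.execute("""
SELECT a.name, a.phone_number
FROM databasename a
INNER JOIN (
    SELECT name
    FROM databasename
    GROUP BY name
    HAVING COUNT(DISTINCT phone_number) > 1
) b ON a.name = b.name
ORDER BY a.name, a.phone_number
""").fetchall()

for row in multi_number_rows:
    print(row)
```

With a plain COUNT(phone_number) the repeated Greg Jackson rows would have been returned as well.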
One method is to use exists:
select t.*
from tablename t
where exists (select 1 from tablename t2 where t2.name = t.name and t2.phonenumber <> t.phonenumber)
SELECT DISTINCT x.*
FROM my_table x
JOIN my_table y
ON y.name = x.name
AND y.phone <> x.phone;

sql "group by" same PersonID, different PersonNames. Eliminate duplicates

I have a (rather dirty) datasource (excel) that looks like this:
ID | Name | Subject | Grade
123 | Smith, Joe R. | MATH | 2.0
123 | Smith, Joe Rodriguez | FRENCH | 3.0
234 | Doe, Mary Jane D.| BIOLOGY | 2.5
234 | Doe, Mary Jane Dawson| CHEMISTRY | 2.5
234 | Doe, Mary Jane | FRENCH | 3.5
My application's output should look like this:
Smith, Joe R.
123
MATH | 2.0
FRENCH | 3.0
So basically I want to do query (just for the ID/Person parent 'container') something like:
SELECT DISTINCT ID, Name FROM MyTable
or
SELECT ID, Name FROM MyTable GROUP BY ID
Of course both of the above are invalid and won't work.
I would like to 'combine' the same ID's and ignore/truncate the other records with the same ID/different Name (because we all know they're the same person since ID is our identifier and clearly it's just a typo/dirty data).
Can this be done by a single SELECT query?
If you don't really care which value shows up in the name field, use MAX() or MIN():
SELECT ID,
MAX(Name) AS Name
FROM [YourTable]
GROUP BY ID
Here's a working example to play with: https://data.stackexchange.com/stackoverflow/q/116699/
You can find the MIN or MAX Value of Name
SELECT ID, Max(Name)
FROM MyTable
GROUP BY ID
SELECT A.ID, A.NAME, T.Subject, T.Grade
FROM (SELECT ID, MIN(NAME) AS NAME
FROM MyTable
GROUP BY ID) A
LEFT JOIN MyTable T on A.ID = T.ID
Will give you something like
123 Smith, Joe R. MATH 2.0
123 Smith, Joe R. FRENCH 3.0
234 Doe, Mary Jane BIOLOGY 2.5
234 Doe, Mary Jane CHEMISTRY 2.5
234 Doe, Mary Jane FRENCH 3.5
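For anyone who wants to try it, here is a minimal sketch of that MIN() plus join approach on the sample rows. It assumes SQLite via Python's sqlite3 module as a stand-in for the Excel source, and which spelling MIN() picks depends on the collation of whatever engine you actually use.

```python
import sqlite3

# In-memory SQLite database standing in for the Excel data source (assumption).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE MyTable (ID INTEGER, Name TEXT, Subject TEXT, Grade REAL)")
conn.executemany(
    "INSERT INTO MyTable VALUES (?, ?, ?, ?)",
    [
        (123, "Smith, Joe R.", "MATH", 2.0),
        (123, "Smith, Joe Rodriguez", "FRENCH", 3.0),
        (234, "Doe, Mary Jane D.", "BIOLOGY", 2.5),
        (234, "Doe, Mary Jane Dawson", "CHEMISTRY", 2.5),
        (234, "Doe, Mary Jane", "FRENCH", 3.5),
    ],
)

# The derived table picks one name per ID; joining back brings in every
# subject and grade for that ID under the single chosen name.
rows = conn.execute("""
SELECT A.ID, A.NAME, T.Subject, T.Grade
FROM (SELECT ID, MIN(Name) AS NAME
      FROM MyTable
      GROUP BY ID) A
LEFT JOIN MyTable T ON A.ID = T.ID
ORDER BY A.ID, T.Subject
""").fetchall()

# Map each ID to the one name it ended up with.
names = {row[0]: row[1] for row in rows}

for row in rows:
    print(row)
```

MIN() and MAX() are just convenient tie-breakers here. Neither knows which spelling is the correct one.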
If you don't care which name you keep, you can use a MAX() or MIN() aggregate to pick just one name:
SELECT ID, MAX(Name) as Name
FROM MyTable GROUP BY ID