How to have SQL INNER JOIN accept null results - sql

I have the following query:
SELECT TOP 25 CLIENT_ID_MD5, COUNT(CLIENT_ID_MD5) TOTAL
FROM dbo.amazonlogs
GROUP BY CLIENT_ID_MD5
ORDER BY COUNT(*) DESC;
Which returns:
283fe255cbc25c804eb0c05f84ee5d52 864458
879100cf8aa8b993a8c53f0137a3a176 126122
06c181de7f35ee039fec84579e82883d 88719
69ffb6c6fd5f52de0d5535ce56286671 68863
703441aa63c0ac1f39fe9e4a4cc8239a 47434
3fd023e7b2047e78c6742e2fc5b66fce 45350
a8b72ca65ba2440e8e4028a832ec2160 39524
...
I want to retrieve the corresponding client name (FIRM) using the returned MD5 from this query, so a row might look like:
879100cf8aa8b993a8c53f0137a3a176 126122 Burger King
So I made this query:
SELECT a.CLIENT_ID_MD5, COUNT(a.CLIENT_ID_MD5) TOTAL, c.FIRM
FROM dbo.amazonlogs a
INNER JOIN dbo.customers c
ON c.CLIENT_ID_MD5 = a.CLIENT_ID_MD5
GROUP BY a.CLIENT_ID_MD5, c.FIRM
ORDER BY COUNT(*) DESC;
This returns something like:
879100cf8aa8b993a8c53f0137a3a176 126122 Burger King
06c181de7f35ee039fec84579e82883d 88719 McDonalds
703441aa63c0ac1f39fe9e4a4cc8239a 47434 Wendy's
3fd023e7b2047e78c6742e2fc5b66fce 45350 Tim Horton's
Which works, except I need to return an empty value for c.FIRM if there is no corresponding FIRM for a given MD5. For example:
879100cf8aa8b993a8c53f0137a3a176 126122 Burger King
06c181de7f35ee039fec84579e82883d 88719 McDonalds
69ffb6c6fd5f52de0d5535ce56286671 68863
703441aa63c0ac1f39fe9e4a4cc8239a 47434 Wendy's
3fd023e7b2047e78c6742e2fc5b66fce 45350 Tim Horton's
How should I modify the query to still return a row even if there is no corresponding c.FIRM?

Replace INNER JOIN with LEFT JOIN

Use LEFT JOIN instead of INNER JOIN.

Instead of doing an INNER join, you should do a LEFT OUTER join:
SELECT
a.CLIENT_ID_MD5,
COUNT(a.CLIENT_ID_MD5) TOTAL,
ISNULL(c.FIRM,'')
FROM
dbo.amazonlogs a LEFT OUTER JOIN
dbo.customers c ON c.CLIENT_ID_MD5 = a.CLIENT_ID_MD5
GROUP BY
a.CLIENT_ID_MD5,
c.FIRM
ORDER BY COUNT(0) DESC
http://www.w3schools.com/sql/sql_join.asp

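An INNER JOIN drops every row from the left table that has no match in the right table; you want a LEFT OUTER JOIN, which keeps those rows and returns NULL for the right-hand columns.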

SELECT a.CLIENT_ID_MD5, COUNT(a.CLIENT_ID_MD5) TOTAL, IsNull(c.FIRM, 'Unknown') as Firm
FROM dbo.amazonlogs a
LEFT JOIN dbo.customers c ON c.CLIENT_ID_MD5 = a.CLIENT_ID_MD5
GROUP BY a.CLIENT_ID_MD5, c.FIRM ORDER BY COUNT(*) DESC;
This will give you a value of "Unknown" when records in the customers table don't exist. You could obviously drop that part and just return c.FIRM if you want to have actual nulls instead.
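To see the difference on a small data set, here is a minimal sketch using Python's built-in sqlite3 module. The table and column names mirror the question, but the rows are invented, and SQLite's COALESCE stands in for SQL Server's ISNULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amazonlogs (CLIENT_ID_MD5 TEXT);
    CREATE TABLE customers  (CLIENT_ID_MD5 TEXT, FIRM TEXT);
    -- invented sample rows: ccc has log entries but no customer record
    INSERT INTO amazonlogs VALUES ('aaa'), ('aaa'), ('aaa'), ('ccc'), ('ccc'), ('bbb');
    INSERT INTO customers  VALUES ('aaa', 'Burger King'), ('bbb', 'McDonalds');
""")

# Same query text for both runs; only the join type changes.
QUERY = """
    SELECT a.CLIENT_ID_MD5, COUNT(a.CLIENT_ID_MD5) AS TOTAL, COALESCE(c.FIRM, '') AS FIRM
    FROM amazonlogs a
    {join} customers c ON c.CLIENT_ID_MD5 = a.CLIENT_ID_MD5
    GROUP BY a.CLIENT_ID_MD5, c.FIRM
    ORDER BY COUNT(*) DESC
"""

inner_rows = conn.execute(QUERY.format(join="INNER JOIN")).fetchall()
left_rows = conn.execute(QUERY.format(join="LEFT JOIN")).fetchall()

print(inner_rows)  # the unmatched id 'ccc' is missing
print(left_rows)   # 'ccc' is kept, with an empty FIRM
```

With the LEFT JOIN the unmatched id still appears, with its count and an empty string for FIRM.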

Change your INNER JOIN to an OUTER JOIN...
SELECT a.CLIENT_ID_MD5, COUNT(a.CLIENT_ID_MD5) TOTAL, c.FIRM
FROM dbo.amazonlogs a
LEFT OUTER JOIN dbo.customers c
ON c.CLIENT_ID_MD5 = a.CLIENT_ID_MD5
GROUP BY a.CLIENT_ID_MD5, c.FIRM
ORDER BY COUNT(*) DESC;

WITH amazonlogs_Tallies
AS
(
SELECT a.CLIENT_ID_MD5, COUNT(a.CLIENT_ID_MD5) TOTAL
FROM dbo.amazonlogs a
GROUP
BY a.CLIENT_ID_MD5
),
amazonlogs_Tallies_Firms
AS
(
SELECT a.CLIENT_ID_MD5, a.TOTAL, c.FIRM
FROM amazonlogs_Tallies a
INNER JOIN dbo.customers c
ON c.CLIENT_ID_MD5 = a.CLIENT_ID_MD5
)
SELECT CLIENT_ID_MD5, TOTAL, FIRM
FROM amazonlogs_Tallies_Firms
UNION
SELECT CLIENT_ID_MD5, TOTAL, '{{NOT_KNOWN}}'
FROM amazonlogs_Tallies
EXCEPT
SELECT CLIENT_ID_MD5, TOTAL, '{{NOT_KNOWN}}'
FROM amazonlogs_Tallies_Firms;
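The set-operator version above relies on UNION and EXCEPT being evaluated left to right, so the '{{NOT_KNOWN}}' rows for matched ids are removed after the union. A minimal sketch of the same idea in Python's sqlite3 module, with invented rows and shortened (hypothetical) CTE names; SQLite also groups compound SELECTs left to right:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE amazonlogs (CLIENT_ID_MD5 TEXT);
    CREATE TABLE customers  (CLIENT_ID_MD5 TEXT, FIRM TEXT);
    -- invented rows: ccc has no customer record
    INSERT INTO amazonlogs VALUES ('aaa'), ('aaa'), ('ccc');
    INSERT INTO customers  VALUES ('aaa', 'Burger King');
""")

rows = conn.execute("""
    WITH tallies AS (
        SELECT CLIENT_ID_MD5, COUNT(*) AS TOTAL
        FROM amazonlogs
        GROUP BY CLIENT_ID_MD5
    ),
    tallies_firms AS (
        SELECT t.CLIENT_ID_MD5, t.TOTAL, c.FIRM
        FROM tallies t
        INNER JOIN customers c ON c.CLIENT_ID_MD5 = t.CLIENT_ID_MD5
    )
    SELECT CLIENT_ID_MD5, TOTAL, FIRM FROM tallies_firms
    UNION
    SELECT CLIENT_ID_MD5, TOTAL, '{{NOT_KNOWN}}' FROM tallies
    EXCEPT
    SELECT CLIENT_ID_MD5, TOTAL, '{{NOT_KNOWN}}' FROM tallies_firms
""").fetchall()

# Sort in Python so the output order does not depend on the engine.
result = sorted(rows)
print(result)
```

The matched id keeps its real firm name, and only the unmatched id ends up with the placeholder.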

Related

WHERE clause with JOIN SQL

SELECT AVG(score) AS avg_score, st.name
FROM firstTable AS ft
LEFT JOIN secondTable AS st
ON ft.dog_id = st.dog_id
WHERE (SELECT COUNT(ft.dog_id) FROM firstTable) > 1
GROUP BY dog_id
The WHERE clause doesn't seem to do anything. Why is that? I'm essentially trying to output the average score only for the dogs that appear more than once in the first table.
You should use an INNER join since you want only dogs that match in both tables and add the condition in the HAVING clause:
SELECT AVG(ft.score) AS avg_score, st.name
FROM secondTable AS st INNER JOIN firstTable AS ft
ON ft.dog_id = st.dog_id
GROUP BY st.dog_id, st.name
HAVING COUNT(*) > 1;
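A minimal sketch of why the two filters behave differently, using Python's sqlite3 module with invented rows. The WHERE query below is a simplified version of the one in the question: its subquery is not tied to the current dog, so it counts the whole table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE firstTable  (dog_id INTEGER, score REAL);
    CREATE TABLE secondTable (dog_id INTEGER, name TEXT);
    -- invented rows: Rex has two scores, Fido has one
    INSERT INTO firstTable  VALUES (1, 8.0), (1, 6.0), (2, 9.0);
    INSERT INTO secondTable VALUES (1, 'Rex'), (2, 'Fido');
""")

# The uncorrelated subquery counts every row in firstTable (3 here),
# so the condition is true for all rows and filters nothing.
where_rows = conn.execute("""
    SELECT AVG(ft.score) AS avg_score, st.name
    FROM firstTable AS ft
    INNER JOIN secondTable AS st ON ft.dog_id = st.dog_id
    WHERE (SELECT COUNT(dog_id) FROM firstTable) > 1
    GROUP BY st.dog_id, st.name
    ORDER BY st.name
""").fetchall()

# HAVING is evaluated once per group, after aggregation.
having_rows = conn.execute("""
    SELECT AVG(ft.score) AS avg_score, st.name
    FROM firstTable AS ft
    INNER JOIN secondTable AS st ON ft.dog_id = st.dog_id
    GROUP BY st.dog_id, st.name
    HAVING COUNT(*) > 1
    ORDER BY st.name
""").fetchall()

print(where_rows)   # both dogs
print(having_rows)  # only the dog with more than one score
```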

how to inner join the third column that need to use the function of count

select distinct CatalogNo_No.branch_id, dvd.DVD_name from
CatalogNo_No inner join DVD on CatalogNo_No.DVD_catalogno = DVD.DVD_catalogno
inner join count (catalogno_no.branch_id='MA0001',catalogno_no.branch_id='MA0002',catalogno_no.branch_id='MA0003',catalogno_no.branch_id='MA0004',catalogno_no.branch_id='MA0005')
where catalogno_no.DVD_catalogno in (select DVD_catalogno from DVD where DVD_name='Final Destination')
The output I want is the distinct CatalogNo_No.branch_id and dvd.DVD_name, plus a count of the rows for each branch_id (counted without DISTINCT).
I think you can use GROUP BY and COUNT:
select
CatalogNo_No.branch_id,
dvd.DVD_name,
count(*) branch_count
from CatalogNo_No
join DVD on CatalogNo_No.DVD_catalogno = DVD.DVD_catalogno
where catalogno_no.DVD_catalogno in (select DVD_catalogno from DVD where DVD_name='Final Destination')
and catalogno_no.branch_id in ('MA0001','MA0002','MA0003','MA0004','MA0005')
group by CatalogNo_No.branch_id, dvd.DVD_name

SQL Query Refactoring Possible?

Assume this is part of the result set
AND
Assume Dob,Name,Adress,Postcode,Telephone,EmailAddress are the same for each ID - and these columns are used in the group by clause
Sample data:
ID date Amount
---------------------------
12345 1/1/2017 100
12345 1/2/2017 200
12345 1/3/2017 300
With the outer query included I get the following which is what I want to achieve
ID date Amount
--------------------------
12345 1/1/2017 600
I want to confirm if there's a better way in terms of performance for this code. I feel like I could do a join, or a shorter version of the query but I can't get the logic right.
When I remove the outer query and apply the MIN and SUM aggregate functions inside, the results don't group correctly: I get more than one row for each id.
Also is it possible for a shorter group by?
Here's the partial version of the final code
SELECT
a.id, a.dob, a.claim_id,
a.name, a.Address, a.postcode,
a.Telephone, a.EmailAdress,
MIN(a.date), SUM(a.amount) as Amount
FROM
(SELECT DISTINCT
i.date, i.id, cl.name, cl.address,
cl.postcode, cl.telephone, cl.dob,
cl.EmailAdress, i.amount, cm.claim_id
FROM
testdb.dbo.invoice i
JOIN
testdb.dbo.claim cm with (nolock) ON i.id = cm.id
JOIN
testdb.dbo.clients cl with (nolock) ON cm.clientid = cl.id
JOIN
(....) c ON i.id = c.id
WHERE
.....) AS a
GROUP BY
a.id, a.dob, a.claim_id, a.name, a.Address,
a.postcode, a.Telephone, a.EmailAdress
ORDER BY
1
SELECT
i.id ,cl.dob ,cm.claim_id
,cl.name ,cl.address ,cl.postcode
,cl.telephone ,cl.EmailAdress
,MIN(i.date) ,SUM(i.amount) as Amount
FROM
testdb.dbo.invoice i
JOIN
testdb.dbo.claim cm with (nolock) on i.id = cm.id
JOIN
testdb.dbo.clients cl with (nolock) on cm.clientid = cl.id
JOIN
( .... ) c on i.id = c.id
WHERE
.....
GROUP BY
i.id,cl.dob,cm.claim_id,cl.name,cl.Address,cl.postcode,
cl.Telephone,cl.EmailAdress
ORDER BY 1
This is pretty much the previous code with the outer query removed. I'm not sure why it gave me multiple records per id before (I can't tell what differs between then and now), but it no longer does that with this code.
Why not do the calculation inline and then join the detail tables afterwards,
something like:
SELECT
a.id, a.dob, claimDetails.claim_id,
a.name, a.Address, a.postcode,
a.Telephone, a.EmailAdress,
claimDetails.FirstDate, claimDetails.Amount
FROM a
LEFT JOIN
(
SELECT i.id, cm.claim_id, MIN(i.date) as FirstDate, SUM(i.amount) as Amount
FROM testdb.dbo.invoice i
JOIN testdb.dbo.claim cm ON i.id = cm.id
GROUP BY i.id, cm.claim_id
) claimDetails
ON claimDetails.id = a.id
LEFT JOIN Clients....
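A minimal sketch of the "aggregate first, then join the details" idea, using Python's sqlite3 module. The schema is a simplified, invented stand-in for the tables in the question (two tables instead of four), with the sample amounts from the question; the point is that the GROUP BY only needs the key column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- simplified, invented schema: one client row per id, several invoices per id
    CREATE TABLE clients (id INTEGER, name TEXT, postcode TEXT);
    CREATE TABLE invoice (id INTEGER, date TEXT, amount INTEGER);
    INSERT INTO clients VALUES (12345, 'Alice', 'AB1 2CD');
    INSERT INTO invoice VALUES (12345, '2017-01-01', 100),
                               (12345, '2017-01-02', 200),
                               (12345, '2017-01-03', 300);
""")

rows = conn.execute("""
    SELECT cl.id, cl.name, cl.postcode, t.first_date, t.amount
    FROM clients cl
    JOIN (
        -- aggregate first, grouping only by the key
        SELECT id, MIN(date) AS first_date, SUM(amount) AS amount
        FROM invoice
        GROUP BY id
    ) t ON t.id = cl.id
    ORDER BY cl.id
""").fetchall()

print(rows)
```

Because the detail columns are joined after the aggregation, they never have to appear in the GROUP BY list.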

SQL Query or Table Error?

I'm trying to find the total amount spent on cuts and products by each customer.
I don't know whether my query is wrong or my entire database schema is. Any ideas?
My Query
Select First_Name, SUM(B.Cost), SUM(C.Cost)
FROM bookings A, cuts B, products C, customers D
Where A.Customer_ID= D.Customer_ID
AND A.Cut_ID = B.Cut_ID
AND A.Product_ID= C.Product_ID;
My Database
Table: bookings
Booking_N0, Customer_ID, Cut_ID, Product_ID, TimeTaken
Table: customers
Customer_ID, First_Name, Sex
Table: products
Product_ID, Products, Cost
Table: cuts
Cut_ID, Cut, Cost
You need a GROUP BY to get the SUM for each customer:
Select D.First_Name
, SUM(B.Cost)
, SUM(C.Cost)
FROM bookings A LEFT JOIN cuts B ON A.Cut_ID = B.Cut_ID
JOIN products C ON A.Product_ID = C.Product_ID
JOIN customers D ON A.Customer_ID = D.Customer_ID
GROUP BY D.First_Name;
Also, consider using explicit join notation (FROM table1 t1 JOIN table2 t2 ON t1.field1 = t2.field2) instead of implicit join notation (FROM table1 t1, table2 t2 WHERE t1.field1 = t2.field2). It is easier to read, because each table is listed next to the condition it is joined on.
Start using the recommended JOIN / ON syntax for joins instead of putting the join conditions in the WHERE clause. You also need a GROUP BY clause:
Select First_Name, SUM(B.Cost), SUM(C.Cost)
FROM bookings A
INNER JOIN cuts B
ON A.Cut_ID = B.Cut_ID
INNER JOIN products C
ON A.Product_ID= C.Product_ID
INNER JOIN customers D
ON A.Customer_ID= D.Customer_ID
GROUP BY First_Name
If you use an aggregate function like SUM together with a non-aggregated column, you have to add a GROUP BY clause.
In your case:
...
AND A.Product_ID= C.Product_ID
GROUP BY First_Name
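One thing to watch with the queries above: grouping by First_Name alone merges different customers who share a first name. A minimal sketch using Python's sqlite3 module, with invented rows and hypothetical column aliases (Cut_Total, Product_Total), that groups by Customer_ID as well:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (Customer_ID INTEGER, First_Name TEXT);
    CREATE TABLE cuts      (Cut_ID INTEGER, Cost INTEGER);
    CREATE TABLE products  (Product_ID INTEGER, Cost INTEGER);
    CREATE TABLE bookings  (Customer_ID INTEGER, Cut_ID INTEGER, Product_ID INTEGER);
    -- invented rows: two different customers share the first name Sam
    INSERT INTO customers VALUES (1, 'Sam'), (2, 'Sam'), (3, 'Alex');
    INSERT INTO cuts      VALUES (1, 10), (2, 20);
    INSERT INTO products  VALUES (1, 5), (2, 7);
    INSERT INTO bookings  VALUES (1, 1, 1), (1, 2, 2), (2, 1, 2), (3, 2, 1);
""")

rows = conn.execute("""
    SELECT D.Customer_ID, D.First_Name,
           SUM(B.Cost) AS Cut_Total, SUM(C.Cost) AS Product_Total
    FROM bookings A
    INNER JOIN cuts      B ON A.Cut_ID = B.Cut_ID
    INNER JOIN products  C ON A.Product_ID = C.Product_ID
    INNER JOIN customers D ON A.Customer_ID = D.Customer_ID
    GROUP BY D.Customer_ID, D.First_Name
    ORDER BY D.Customer_ID
""").fetchall()

print(rows)  # one row per customer, even when first names repeat
```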

SQL Return only where more than one join

Not sure how to ask this as I'm a bit of a database noob,
What I want to do is the following.
table tb_Company
table tb_Division
I want to return companies that have more than one division and I don't know how to do the where clause.
SELECT dbo.tb_Company.CompanyID, dbo.tb_Company.CompanyName,
dbo.tb_Division.DivisionName FROM dbo.tb_Company INNER JOIN dbo.tb_Division ON
dbo.tb_Company.CompanyID = dbo.tb_Division.DivisionCompanyID
Any help or links much appreciated.
You'll need another JOIN that returns only the companies having more than one division, using a GROUP BY and a HAVING clause.
You can read up on grouping here
Groups a selected set of rows into a set of summary rows by the values of one or more columns or expressions. One row is returned for each group. Aggregate functions in the SELECT clause list provide information about each group instead of individual rows.
SELECT dbo.tb_Company.CompanyID
, dbo.tb_Company.CompanyName
, dbo.tb_Division.DivisionName
FROM dbo.tb_Company
INNER JOIN dbo.tb_Division ON dbo.tb_Company.CompanyID = dbo.tb_Division.DivisionCompanyID
INNER JOIN (
SELECT DivisionCompanyID
FROM dbo.tb_Division
GROUP BY
DivisionCompanyID
HAVING COUNT(*) > 1
) d ON d.DivisionCompanyID = dbo.tb_Company.CompanyID
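Another alternative, if you only need the companies and not the division names (grouping by DivisionName as well would count each division separately, so the HAVING would rarely match):
SELECT c.CompanyID, c.CompanyName
FROM dbo.tb_Company c
INNER JOIN dbo.tb_Division d ON c.CompanyID = d.DivisionCompanyID
GROUP BY c.CompanyID, c.CompanyName
HAVING COUNT(*) > 1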
How about?
WITH COUNTED AS
(
SELECT C.CompanyID, C.CompanyName, D.DivisionName,
COUNT(*) OVER(PARTITION BY C.CompanyID) AS Cnt
FROM dbo.tb_Company C
INNER JOIN dbo.tb_Division D ON C.CompanyID = D.DivisionCompanyID
)
SELECT *
FROM COUNTED
WHERE Cnt > 1
With the other solutions (the ones that join onto the Division table twice), a company can come back with only a single division row under a heavy insert load.
If a row is inserted into the Division table between the time the first join is evaluated and the time the second join (with the GROUP BY/HAVING) is evaluated, the first Division join returns a single row while the second one sees a count of 2.
How about...
SELECT dbo.tb_Company.CompanyID,
dbo.tb_Company.CompanyName
FROM dbo.tb_Company
WHERE (SELECT COUNT(*)
FROM dbo.tb_Division
WHERE dbo.tb_Company.CompanyID =
dbo.tb_Division.DivisionCompanyID) > 1;
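A minimal sketch comparing two of the approaches above on invented rows, using Python's sqlite3 module (the dbo. schema prefix is dropped because SQLite has no such schema, and the aliases c, d and m are hypothetical). The correlated subquery returns just the companies; the derived-table join also returns each division name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tb_Company  (CompanyID INTEGER, CompanyName TEXT);
    CREATE TABLE tb_Division (DivisionCompanyID INTEGER, DivisionName TEXT);
    -- invented rows: Acme has two divisions, Globex has one, Initech has none
    INSERT INTO tb_Company  VALUES (1, 'Acme'), (2, 'Globex'), (3, 'Initech');
    INSERT INTO tb_Division VALUES (1, 'Sales'), (1, 'R&D'), (2, 'Sales');
""")

# Companies only: correlated subquery counting each company's divisions.
companies = conn.execute("""
    SELECT c.CompanyID, c.CompanyName
    FROM tb_Company c
    WHERE (SELECT COUNT(*) FROM tb_Division d
           WHERE d.DivisionCompanyID = c.CompanyID) > 1
""").fetchall()

# Companies with their division names: join to a derived table of qualifying ids.
with_divisions = conn.execute("""
    SELECT c.CompanyID, c.CompanyName, d.DivisionName
    FROM tb_Company c
    INNER JOIN tb_Division d ON d.DivisionCompanyID = c.CompanyID
    INNER JOIN (
        SELECT DivisionCompanyID
        FROM tb_Division
        GROUP BY DivisionCompanyID
        HAVING COUNT(*) > 1
    ) m ON m.DivisionCompanyID = c.CompanyID
    ORDER BY d.DivisionName
""").fetchall()

print(companies)
print(with_divisions)
```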