Finding multiple results in SQL Server 2012 (not duplicates) - sql

I need some help. I've got a list of customers and services that they've used, but I need to narrow that list to customers that have used more than one service (excluding those who've only used one service). They have sometimes used the same service more than once, but I need a list of unique services.
The below brings back the main list of customers.
SELECT
DISTINCT M.CustID
,S.ServiceID
,R.ReceivedDate
,S.ServiceRequestID
FROM Customers AS M
LEFT OUTER JOIN CustomerDates AS R ON M.CustID = R.CustID
LEFT OUTER JOIN Service1 AS S ON R.ServiceRequestID = S.ServiceRequestID
WHERE S.CloseDate IS NULL
What I need is a list that excludes the first three lines as they have only used one service, whereas the next seven I need as they've used more than one service.

It is quite likely that you can just use EXISTS. This is likely to do what you want:
SELECT M.CustID, S.ServiceID, R.ReceivedDate, S.ServiceRequestID
FROM Customers M JOIN
CustomerDates R
ON M.CustID = R.CustID JOIN
Service1 S
ON R.ServiceRequestID = S.ServiceRequestID
WHERE S.CloseDate IS NULL AND
EXISTS (SELECT 1
FROM CustomerDates cd2
WHERE cd2.CustId = m.custId AND cd2.ServiceRequestID <> r.ServiceRequestID
);
I'm not 100% sure this is equivalent to your query. The SELECT DISTINCT should not be needed, unless you have duplicates in your tables. Also, you seem to require matches between the tables, so LEFT JOIN is not appropriate. It is not clear if the WHERE condition is relevant for finding duplicates.

Try the following, it should work
SELECT
CustID
,ServiceID
,ReceivedDate
,ServiceRequestID
FROM
(SELECT
DISTINCT M.CustID
,S.ServiceID
,R.ReceivedDate
,S.ServiceRequestID
,count(*) over (partition by M.CustID) as total
FROM Customers AS M
LEFT OUTER JOIN CustomerDates AS R ON M.CustID = R.CustID
LEFT OUTER JOIN Service1 AS S ON R.ServiceRequestID = S.ServiceRequestID
WHERE S.CloseDate IS NULL
) vals
where total > 1
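
Since the question asks for unique services per customer, a grouped COUNT(DISTINCT ...) is another way to express "more than one service". The following is only a minimal sketch: it runs the SQL through Python's sqlite3 module so it can be executed anywhere, and it uses a hypothetical, flattened table named service_usage in place of the asker's joined Customers/CustomerDates/Service1 rows. The COUNT(DISTINCT ...) / HAVING pattern itself is also available in SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical stand-in for the already-joined customer/service rows.
conn.execute("CREATE TABLE service_usage (CustID INTEGER, ServiceID INTEGER)")
conn.executemany(
    "INSERT INTO service_usage VALUES (?, ?)",
    [
        (1, 10), (1, 10),           # customer 1: the same service twice
        (2, 10), (2, 20), (2, 20),  # customer 2: two distinct services
        (3, 30),                    # customer 3: a single service
    ],
)

# Keep only customers whose number of DISTINCT services exceeds one,
# then list each (customer, service) pair once.
rows = conn.execute(
    """
    SELECT DISTINCT u.CustID, u.ServiceID
    FROM service_usage AS u
    WHERE u.CustID IN (
        SELECT CustID
        FROM service_usage
        GROUP BY CustID
        HAVING COUNT(DISTINCT ServiceID) > 1
    )
    ORDER BY u.CustID, u.ServiceID
    """
).fetchall()
print(rows)
```

Customer 1 is excluded here even though it has two rows, because both rows refer to the same service.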


Join optimization PostgreSQL

I have 2 tables : Calls (10,000 rows) , CRM (25 million rows)
I want to do Calls left join CRM.
select *
from calls a
left join crm b
on (
(a.customerID = b.customerID)
OR
(a.Number1 in (b.Number_A,b.Number_B))
OR
(a.Number2 in (b.Number_A,b.Number_B))
);
When I do just the customerID join, it runs fine. But the above code causes timeout and it crashes.
I would suggest multiple left joins:
select c.*,
coalesce(cc.col1, c1a.col1, c1b.col1, c2a.col1, c2b.col1)
from calls c left join
crm cc
on c.customerID = cc.customerID left join
crm c1a
on c.Number1 = c1a.Number_A left join
crm c1b
on c.Number1 = c1b.Number_B left join
crm c2a
on c.Number2 = c2a.Number_A left join
crm c2b
on c.Number2 = c2b.Number_B;
This can then take advantage of indexes on crm(customerID), crm(Number_A), and crm(Number_B).
Sometimes, replacing one query that contains two conditions joined by OR with two queries glued together with UNION results in a better execution plan. I have never understood why DBMS optimizers don't take this into consideration themselves. And I don't know whether this is true for PostgreSQL or not. But it may be worth a try.
In your case there is an outer join in the query. That complicates the matter. With the separate queries we may get both outer joined and matching crm rows for a call and must get rid of the former in that case.
select *
from
(
select * from calls left join crm on crm.customerID = calls.customerID
union
select * from calls left join crm on crm.number_a = calls.number1
union
select * from calls left join crm on crm.number_a = calls.number2
union
select * from calls left join crm on crm.number_b = calls.number1
union
select * from calls left join crm on crm.number_b = calls.number2
) data
order by rank() over (partition by calls.id order by case when crm.id is null then 2 else 1 end)
fetch first row with ties;
For this to work fast you should have one index per column in the query, i.e. six single-column indexes.
Whether this is faster than your original query depends on a lot of things. Mainly: the fewer matches the better.
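
To see that the OR condition and the UNION of single-condition joins select the same matches, here is a small sketch. It is an illustration under assumptions, not a benchmark: it uses SQLite via Python's sqlite3 module rather than PostgreSQL, the id columns and the sample rows are made up, and it compares only the matching pairs (inner joins), so calls without any match would still have to be added back to get the left-join result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE calls (id INTEGER, customerID INTEGER, Number1 TEXT, Number2 TEXT);
CREATE TABLE crm   (id INTEGER, customerID INTEGER, Number_A TEXT, Number_B TEXT);
INSERT INTO calls VALUES (1, 100, '111', '222'), (2, 200, '333', '444'), (3, 300, '555', '666');
INSERT INTO crm   VALUES (10, 100, '000', '000'), (11, 999, '444', '000'), (12, 998, '000', '111');
""")

# One join with three OR-ed conditions, as in the question.
or_rows = conn.execute("""
    SELECT c.id, m.id
    FROM calls c JOIN crm m
      ON c.customerID = m.customerID
      OR c.Number1 IN (m.Number_A, m.Number_B)
      OR c.Number2 IN (m.Number_A, m.Number_B)
    ORDER BY c.id, m.id
""").fetchall()

# Five single-condition joins glued together; UNION removes duplicate pairs.
union_rows = conn.execute("""
    SELECT c.id AS call_id, m.id AS crm_id FROM calls c JOIN crm m ON c.customerID = m.customerID
    UNION
    SELECT c.id, m.id FROM calls c JOIN crm m ON c.Number1 = m.Number_A
    UNION
    SELECT c.id, m.id FROM calls c JOIN crm m ON c.Number1 = m.Number_B
    UNION
    SELECT c.id, m.id FROM calls c JOIN crm m ON c.Number2 = m.Number_A
    UNION
    SELECT c.id, m.id FROM calls c JOIN crm m ON c.Number2 = m.Number_B
    ORDER BY 1, 2
""").fetchall()

print(or_rows)
print(union_rows)
```

Each branch of the UNION has a single equality condition, which is the shape an index on that one column can serve.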

Is it true that all joins following a left join in a SQL query must also be left joins? Why or why not?

I remember this rule of thumb from back in college that if you put a left join in a SQL query, then all subsequent joins in that query must also be left joins instead of inner joins, or else you'll get unexpected results. But I don't remember what those results are, so I'm wondering if maybe I'm misremembering something. Anyone able to back me up on this or refute it? Thanks! :)
For instance:
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id -- this inner join might be bad mojo
Not that they have to be. They should be (or perhaps a full join at the end). It is a safer way to write queries and express logic.
Your query is:
select *
from customer c left join
ledger l
on c.id = l.customerid inner join
order o
on l.orderid = o.id
The left join says, "keep all customers, even if there is no matching record in ledger". The second says, "I have to have a matching ledger record". So, the inner join converts the first to an inner join.
Because you presumably want all customers, regardless of whether there is a match in the other two tables, you would use a left join:
select *
from customer c left join
ledger l
on c.id = l.customerid left join
order o
on l.orderid = o.id
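
The effect is easy to reproduce on a tiny data set. The sketch below is an assumption-based illustration: it uses SQLite through Python's sqlite3 module, the sample rows are invented, and the third table is called orders (hypothetical name) because order is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE customer (id INTEGER);
CREATE TABLE ledger   (customerid INTEGER, orderid INTEGER);
CREATE TABLE orders   (id INTEGER);
INSERT INTO customer VALUES (1), (2);   -- customer 2 has no ledger row
INSERT INTO ledger   VALUES (1, 500);
INSERT INTO orders   VALUES (500);
""")

# LEFT JOIN followed by INNER JOIN: customer 2 survives the first join
# with NULLs, but NULL never equals orders.id, so the row is dropped.
left_then_inner = conn.execute("""
    SELECT c.id, o.id
    FROM customer c
    LEFT JOIN ledger l ON c.id = l.customerid
    INNER JOIN orders o ON l.orderid = o.id
    ORDER BY c.id
""").fetchall()

# LEFT JOIN all the way: customer 2 is kept with a NULL order.
left_then_left = conn.execute("""
    SELECT c.id, o.id
    FROM customer c
    LEFT JOIN ledger l ON c.id = l.customerid
    LEFT JOIN orders o ON l.orderid = o.id
    ORDER BY c.id
""").fetchall()

print(left_then_inner)
print(left_then_left)
```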
You remember correctly some parts of it!
The thing is, when you chain join tables like this
select * from customer
left join ledger on customer.id= ledger.customerid
inner join order on ledger.orderid = order.id
The joins are evaluated sequentially, so when customer left join ledger happens, you are making sure every row from customer is returned (because it's a left join and you placed customer on the left).
Next,
the result of that first join is joined with order (using an inner join), which forces ledger.orderid to match a key in order, so you end up only with rows that were also matched in the order table.
Bad mojo? It really depends on what you are trying to accomplish.
If you want to guarantee all records from customers return, you should keep "left joining" to it.
You can, however, make this a little more intuitive to understand (not necessarily a better way of writing SQL!) by writing:
SELECT * FROM
(
(SELECT * from customer) c
LEFT JOIN
(SELECT * from ledger) l
ON
c.id= l.customerid
) c_and_l
INNER JOIN -- or perhaps LEFT JOIN
(SELECT * FROM order) as o
ON c_and_l.orderid = o.id -- orderid comes from ledger here; use c_and_l.id when you need the customer id from the customer table
So now you understand that c_and_l is created first, and then joined to order (you can imagine it as 2 tables are joining again)

SQL query joining four tables

Original query is joining customer table and contract table and Extra Service History, this all works.
However I'm having trouble adding 4th table which should apply some further criteria.
Current working query (no changes needed) :
select b.*
from SubscribersFIN b
inner join (select Id, Account_Number, ContractNumber, BackendId
from Contract) e on b.c_id='FI_' + e.Account_Number
left join (select Contract
from Extra_Service_History
where Service_Name='debit_plan') d on e.Id=d.Contract
where COUNTRY='fi'
and NO_SMS = 0
and d.Contract is null
The goal is to filter the result of the big query so that only records with a PAID status in Invoice are shown.
right join (select Contract
from Invoice
where Status = 'PAID') i on e.Id=i.Contract
This one does not seem to do the trick, so I'm not able to figure out what sort of a join-type or logic is required here.
You have a few options:
INNER JOIN
Outer joins return rows even where no match is found (on the left side, the right side, or both, depending on the type of outer join). Based on your description this is not what you want. Simply use:
inner join (select Contract
from Invoice
where Status = 'PAID') i on e.Id=i.Contract
It shouldn't matter where this occurs in the FROM clause, provided the join between these 2 tables is INNER. The query engine is free to rearrange joins for performance provided it doesn't change semantics. (But personally I find it tidier to put INNER JOINs at the top.)
IN filter
What you've described is a filter.
The goal is to filter the result of the big query so that only records with a PAID status in Invoice are shown.
So it's clearer to implement this as a filter in the WHERE clause. E.g.
where e.Id in (select Contract
from Invoice
where Status = 'PAID')
and ...
EXISTS filter
Similar to the above, but using an EXISTS subquery instead.
where exists (select *
from Invoice i
where Status = 'PAID'
and i.Contract = e.Id)
and ...
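
One practical difference between these options is worth checking against your data: if a contract can have more than one PAID invoice, the INNER JOIN repeats the contract once per invoice, while the IN and EXISTS filters return it once. The sketch below illustrates this under assumptions: it uses SQLite via Python's sqlite3 module, and the stripped-down Contract and Invoice tables and their rows are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Contract (Id INTEGER);
CREATE TABLE Invoice  (Contract INTEGER, Status TEXT);
INSERT INTO Contract VALUES (1), (2), (3);
-- Contract 1 has two PAID invoices, contract 2 only an OPEN one, contract 3 none.
INSERT INTO Invoice VALUES (1, 'PAID'), (1, 'PAID'), (2, 'OPEN');
""")

join_rows = conn.execute("""
    SELECT e.Id
    FROM Contract e
    INNER JOIN (SELECT Contract FROM Invoice WHERE Status = 'PAID') i
        ON e.Id = i.Contract
    ORDER BY e.Id
""").fetchall()

in_rows = conn.execute("""
    SELECT e.Id
    FROM Contract e
    WHERE e.Id IN (SELECT Contract FROM Invoice WHERE Status = 'PAID')
    ORDER BY e.Id
""").fetchall()

exists_rows = conn.execute("""
    SELECT e.Id
    FROM Contract e
    WHERE EXISTS (SELECT 1 FROM Invoice i
                  WHERE i.Status = 'PAID' AND i.Contract = e.Id)
    ORDER BY e.Id
""").fetchall()

print(join_rows, in_rows, exists_rows)
```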
Rather than mixing LEFT and RIGHT joins, just place it as an INNER join higher up in your query:
select b.*
from SubscribersFIN b
inner join (select Id, Account_Number, ContractNumber, BackendId
from Contract) e on b.c_id='FI_' + e.Account_Number
inner join (select Contract
from Invoice
where Status = 'PAID') i on e.Id=i.Contract
left join (select Contract
from Extra_Service_History
where Service_Name='debit_plan') d on e.Id=d.Contract
where COUNTRY='fi'
and NO_SMS = 0
and d.Contract is null
Based on my understanding, I just rearranged the query. Try this. If a filter condition refers to a column from one of the LEFT JOIN tables, put it in the ON clause. The IS NULL test that excludes matched rows has to stay in the WHERE clause, and the Invoice join has to be an INNER JOIN for it to act as a filter.
select b.* from SubscribersFIN b
inner join Contract e on b.c_id='FI_' + e.Account_Number
inner join Invoice i on e.Id=i.Contract and i.Status = 'PAID'
left join Extra_Service_History d on e.Id=d.Contract and d.Service_Name='debit_plan'
where COUNTRY='fi' and NO_SMS = 0 and d.Contract is null

SQL Get aggregate as 0 for non existing row using inner joins

I am using SQL Server to query these three tables that look like (there are some extra columns but not that relevant):
Customers -> Id, Name
Addresses -> Id, Street, StreetNo, CustomerId
Sales -> AddressId, Week, Total
And I would like to get the total sales per week and customer (showing at the same time the address details). I have come up with this query
SELECT a.Name, b.Street, b.StreetNo, c.Week, SUM (c.Total) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
INNER JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name, c.Week, b.Street, b.StreetNo
and even if my SQL skills are close to none, it looks like it's doing its job. But now I would like to be able to show 0 whenever a customer doesn't have sales for a particular week (weeks are just integers). And I wonder if somehow I should get the distinct values of the weeks in the Sales table and then loop through them (not sure how).
Any help?
Thanks
Use CROSS JOIN to generate the rows for all customers and weeks. Then use LEFT JOIN to bring in the data that is available:
SELECT c.Name, a.Street, a.StreetNo, w.Week,
COALESCE(SUM(s.Total), 0) as Total
FROM Customers c CROSS JOIN
(SELECT DISTINCT s.Week FROM sales s) w LEFT JOIN
Addresses a
ON c.Id = a.CustomerId LEFT JOIN
Sales s
ON s.week = w.week AND s.AddressId = a.Id
GROUP BY c.Name, a.Street, a.StreetNo, w.Week;
Using table aliases is good, but the aliases should be abbreviations for the table names. So, a for Addresses not Customers.
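
The CROSS JOIN plus LEFT JOIN pattern can be tried out on a small data set. This is a minimal sketch under assumptions: it uses SQLite via Python's sqlite3 module instead of SQL Server, the sample rows are invented, and the column names follow the table layout given in the question (Customers.Id, Addresses.Id, Addresses.CustomerId, Sales.AddressId).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customers (Id INTEGER, Name TEXT);
CREATE TABLE Addresses (Id INTEGER, Street TEXT, StreetNo TEXT, CustomerId INTEGER);
CREATE TABLE Sales     (AddressId INTEGER, Week INTEGER, Total INTEGER);
INSERT INTO Customers VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Addresses VALUES (1, 'Main', '1', 1), (2, 'Oak', '2', 2);
-- Ann sold twice in week 1, Bob once in week 2.
INSERT INTO Sales VALUES (1, 1, 10), (1, 1, 5), (2, 2, 7);
""")

rows = conn.execute("""
    SELECT c.Name, w.Week, COALESCE(SUM(s.Total), 0) AS Total
    FROM Customers c
    JOIN Addresses a ON a.CustomerId = c.Id
    CROSS JOIN (SELECT DISTINCT Week FROM Sales) w   -- every customer x every week
    LEFT JOIN Sales s ON s.AddressId = a.Id AND s.Week = w.Week
    GROUP BY c.Name, w.Week
    ORDER BY c.Name, w.Week
""").fetchall()
print(rows)
```

Weeks in which nobody sold anything do not appear in the DISTINCT list, so they are still missing from this result.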
You should generate the week numbers, rather than using DISTINCT. This is better in terms of performance and reliability. Then use a LEFT JOIN on the Sales table instead of an INNER JOIN:
SELECT a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
,COALESCE(SUM(c.Total),0) as Total
FROM Customers a
INNER JOIN Addresses b ON a.Id = b.CustomerId
CROSS JOIN (
-- Generate a sequence of 52 integers (13 x 4)
SELECT ROW_NUMBER() OVER (ORDER BY a.x) AS [Week]
FROM (VALUES(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1),(1)) a(x)
CROSS JOIN (SELECT x FROM (VALUES(1),(1),(1),(1)) b(x)) b
) weeks
LEFT JOIN Sales c ON b.Id = c.AddressId AND c.[Week] = weeks.[Week]
GROUP BY a.Name
,b.Street
,b.StreetNo
,weeks.[Week]
Please try the following...
SELECT Name,
Street,
StreetNo,
Week,
SUM( CASE
WHEN Total IS NULL THEN
0
ELSE
Total
END ) AS Total
FROM Customers a
JOIN Addresses b ON a.Id = b.CustomerId
RIGHT JOIN Sales c ON b.Id = c.AddressId
GROUP BY a.Name,
c.Week,
b.Street,
b.StreetNo;
I have modified your statement in three places. The first is I changed your join to Sales to a RIGHT JOIN. This will join as it would with an INNER JOIN, but it will also keep the records from the table on the right side of the JOIN that do not have a matching record or group of records on the left, placing NULL values in the resulting dataset's fields that would have come from the left of the JOIN. A LEFT JOIN works in the same way, but with any extra records in the table on the left being retained.
I have removed the word INNER from your surviving INNER JOIN. Where JOIN is not preceded by a join type, an INNER JOIN is performed. Both JOIN and INNER JOIN are considered correct, but the prevailing protocol seems to be to leave the INNER out, where the RDBMS allows it to be left out (which SQL-Server does). Which you go with is still entirely up to you - I have left it out here for illustrative purposes.
The third change is that I have added a CASE statement that tests to see if the Total field contains a NULL value, which it will if there were no sales for that Customer for that Week. If it does then SUM() would return a NULL, so the CASE statement returns a 0 instead. If Total does not contain a NULL value, then the SUM() of all values of Total for that grouping is performed.
Please note that I am assuming that Total will not have any NULL values other than from the RIGHT JOIN. Please advise me if this assumption is incorrect.
Please also note that I have assumed that either there will be no missing Weeks for a Customer in the Sales table or that you are not interested in listing them if there are. Again, please advise me if this assumption is incorrect.
If you have any questions or comments, then please feel free to post a Comment accordingly.

Filtering query with join statement

I am using ColdFusion 8 to develop my company's website and would like to return a list of records (just the clientname field) from a table (dbo.clients) that has no match in a different table (dbo.fees) for the purpose of prompting the end-user to add a fee schedule for those companies. An example:
dbo.clients
CLIENT_ID CLIENT_NAME
1 Joe's Diner
2 Save-a-Lot
3 Family Meds
4 DiFazio's
dbo.fees
CID CLIENT_NAME FEE
1 Joe's Diner 25.000
2 Save-a-Lot 35.000
4 DiFazio's 30.000
What I desire is a resultset that, in the case of the above tables/data, would return only clientid/clientname 3/Family Meds because they do not have a fee listed/record in the table dbo.fees. My DB is MSSQL 2005. My query is:
SELECT clientid
FROM clients
INNER JOIN fees
ON clients.clientid <> fees.cid;
Which returns a Cartesian product of 50,000+ results. Using LEFT/RIGHT OUTER JOIN still gives me a Cartesian product and DISTINCT simply returns every record from dbo.clients regardless of whether or not they have a dbo.fees entry or not. What am I doing wrong?
p.s. Also of note: The admin before me apparently did not set up a PK/FK relationship between the clients/fees tables and so any query syntax that might be reliant on that may not work in this situation. It would probably have to work based solely on the values of the relevant fields.
You can use a LEFT JOIN with a WHERE clause that will return only those records that do not appear in the fees table:
select c.CLIENT_ID, c.CLIENT_NAME
from clients c
left join fees f
on c.CLIENT_ID = f.CID
where f.CID is null
If you need help learning JOIN syntax, here is a great reference:
A Visual Explanation of SQL Joins
This can also be written using a NOT EXISTS:
select *
from clients c
where not exists (select 1
from fees f
where c.CLIENT_ID = f.CID)
See SQL Fiddle Demo with both queries
Simplest, you could just use a NOT IN;
SELECT CLIENT_ID FROM clients WHERE CLIENT_ID NOT IN
(SELECT CID FROM fees)
...or you can use a LEFT JOIN to do the same thing a bit more verbosely; f.CID will be NULL if a fee does not exist for the client.
SELECT c.CLIENT_ID
FROM clients c
LEFT JOIN fees f
ON c.CLIENT_ID = f.CID
WHERE f.CID IS NULL
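
The three forms agree on the sample data from the question, but NOT IN behaves differently once the subquery can return a NULL, which may matter here because no PK/FK relationship is enforced between the tables. The following sketch is an illustration under assumptions: it runs on SQLite via Python's sqlite3 module rather than MSSQL 2005, it uses the column names from the sample tables (CLIENT_ID and CID), and the extra fee row with a NULL CID is invented to show the effect.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE clients (CLIENT_ID INTEGER, CLIENT_NAME TEXT);
CREATE TABLE fees    (CID INTEGER, CLIENT_NAME TEXT, FEE REAL);
INSERT INTO clients VALUES (1, 'Joe''s Diner'), (2, 'Save-a-Lot'),
                           (3, 'Family Meds'), (4, 'DiFazio''s');
INSERT INTO fees VALUES (1, 'Joe''s Diner', 25.0), (2, 'Save-a-Lot', 35.0),
                        (4, 'DiFazio''s', 30.0);
""")

queries = {
    "not_in": "SELECT CLIENT_ID FROM clients "
              "WHERE CLIENT_ID NOT IN (SELECT CID FROM fees)",
    "not_exists": "SELECT c.CLIENT_ID FROM clients c "
                  "WHERE NOT EXISTS (SELECT 1 FROM fees f WHERE f.CID = c.CLIENT_ID)",
    "left_join": "SELECT c.CLIENT_ID FROM clients c "
                 "LEFT JOIN fees f ON c.CLIENT_ID = f.CID WHERE f.CID IS NULL",
}

# With clean data all three return client 3 only.
before = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

# Hypothetical bad row: a fee whose CID is NULL.
conn.execute("INSERT INTO fees VALUES (NULL, 'orphan row', 0.0)")
after = {name: conn.execute(sql).fetchall() for name, sql in queries.items()}

print(before)
print(after)
```

After the NULL row is added, NOT IN returns no rows at all, because the comparison with NULL is unknown for every client, whereas NOT EXISTS and the LEFT JOIN form still return client 3.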