Why did the 'NOT IN' work but not the 'NOT EXISTS'? - sql

I've been trying to improve my SQL and was playing around with a 'NOT EXISTS' function. I needed to find the names of salespeople who did not have any sales to company 'RED'.
I tried this and it did not work:
SELECT DISTINCT
sp.name
FROM salesperson sp
WHERE NOT EXISTS (
SELECT
ord.sales_id
FROM
company cmp
LEFT JOIN orders ord
on cmp.com_id=ord.com_id
WHERE cmp.name = 'RED')
This query ran but returned a NULL. Then I changed it to this and it worked fine:
SELECT DISTINCT
sp.name
FROM salesperson sp
WHERE sp.sales_id NOT IN (
SELECT
ord.sales_id as sales_id
FROM
company cmp
left join orders ord
on cmp.com_id=ord.com_id
WHERE cmp.name = 'RED')
Can someone explain why 'NOT EXISTS' did not work in this instance?
Just in case, here is the exercise in full:
Given three tables: salesperson, company, orders
Output all the names in the table salesperson, who didn’t have sales to company 'RED'.
Table: salesperson
sales_id  name  salary  commission_rate  hire_date
1         John  100000  6                4/1/2006
2         Amy   120000  5                5/1/2010
3         Mark  65000   12               12/25/2008
4         Pam   25000   25               1/1/2005
5         Alex  50000   10               2/3/2007
The table salesperson holds the salesperson information. Every salesperson has a sales_id and a name.
Table: company
com_id  name    city
1       RED     Boston
2       ORANGE  New York
3       YELLOW  Boston
4       GREEN   Austin
The table company holds the company information. Every company has a com_id and a name.
Table: orders
order_id  order_date  com_id  sales_id  amount
1         1/1/2014    3       4         100000
2         2/1/2014    4       5         5000
3         3/1/2014    1       1         50000
4         4/1/2014    1       4         25000
The table orders holds the sales record information, salesperson and customer company are represented by sales_id and com_id.
expected output
name
Amy
Mark
Alex
Explanation:
According to order '3' and '4' in table orders, it is easy to tell only salesperson 'John' and 'Pam' have sales to company 'RED', so we need to output all the other names in the table salesperson.

I think your two queries are totally different.
NOT EXISTS - this returns rows only when the subquery returns no rows. Your subquery is not tied to the outer query, and it always returns rows (because company 'RED' exists), so NOT EXISTS is false for every salesperson and the outer query returns nothing. You need to correlate the subquery with the main query using WHERE sp.sales_id = ord.sales_id AND cmp.name = 'RED'
NOT IN - this compares each sp.sales_id against the list of sales_id values the subquery returns, which is what you need for your purpose.

The equivalent NOT EXISTS requires a correlation clause:
SELECT sp.name
FROM salesperson sp
WHERE NOT EXISTS (SELECT ord.sales_id
FROM company cmp JOIN
orders ord
ON cmp.com_id = ord.com_id
WHERE sp.sales_id = ord.sales_id AND
cmp.name = 'RED'
);
Neither the NOT IN nor NOT EXISTS versions requires a LEFT JOIN in the subquery. In fact, the LEFT JOIN somewhat defeats the purpose of the logic.
Without the correlation clause, the subquery runs and it will return rows if any cmp.name is 'RED'. That appears to be the case and so NOT EXISTS always returns false.
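The difference can be checked on the exercise data. The following is a minimal sketch that loads the three tables into an in-memory SQLite database (an assumption; the question does not say which database is used) and runs the uncorrelated and the correlated NOT EXISTS side by side. Only the columns the queries touch are created, and the variable names are made up for the illustration.

```python
import sqlite3

# In-memory SQLite database holding the exercise data (only the columns used below).
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE salesperson (sales_id INTEGER, name TEXT);
CREATE TABLE company (com_id INTEGER, name TEXT);
CREATE TABLE orders (order_id INTEGER, com_id INTEGER, sales_id INTEGER);
INSERT INTO salesperson VALUES (1,'John'),(2,'Amy'),(3,'Mark'),(4,'Pam'),(5,'Alex');
INSERT INTO company VALUES (1,'RED'),(2,'ORANGE'),(3,'YELLOW'),(4,'GREEN');
INSERT INTO orders VALUES (1,3,4),(2,4,5),(3,1,1),(4,1,4);
""")

# The subquery never mentions sp, so it returns the same rows for every salesperson.
uncorrelated = """
SELECT sp.name FROM salesperson sp
WHERE NOT EXISTS (SELECT ord.sales_id
                  FROM company cmp LEFT JOIN orders ord ON cmp.com_id = ord.com_id
                  WHERE cmp.name = 'RED')
ORDER BY sp.sales_id
"""

# The subquery is tied to the current salesperson through ord.sales_id = sp.sales_id.
correlated = """
SELECT sp.name FROM salesperson sp
WHERE NOT EXISTS (SELECT 1
                  FROM company cmp JOIN orders ord ON cmp.com_id = ord.com_id
                  WHERE ord.sales_id = sp.sales_id AND cmp.name = 'RED')
ORDER BY sp.sales_id
"""

uncorrelated_names = [row[0] for row in conn.execute(uncorrelated)]
correlated_names = [row[0] for row in conn.execute(correlated)]
print(uncorrelated_names)  # []
print(correlated_names)    # ['Amy', 'Mark', 'Alex']
```

The uncorrelated version yields an empty result rather than a NULL: the subquery finds RED's orders regardless of which salesperson is being tested, so the NOT EXISTS test fails for all five rows.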

Related

Where statement for exact match on Many to Many SQL tables

I am trying to construct a SQL statement to search in two tables that are in a many to many relation.
Problem : SQL statement to search for products with exact stones.
For example, in the below tables, I need a statement that will search for product with Ruby and Emerald stone ONLY. In all my attempts I get both Ring and Necklace because they both have Ruby and Emerald even though Necklace has one additional stone. It should only give Ring product.
I need a way to implement the AND operator on the stone table so that the result contains products that have the exact stones. Please help.
Table stone
s_id  s_name
1     Ruby
2     Emerald
3     Onyx
Table product
p_id  p_name
1     Ring
2     Necklace
3     Pendent
Relation table - product_stone
p_s_id  p_id  s_id
1       1     1
1       1     2
1       2     1
1       2     2
1       2     3
1       3     3
This is a relational division question. We need the products whose set of stones, "divided" by our list, leaves no remainder, i.e. the product has every stone in the list and no other stone.
We will assume that each (p_id, s_id) pair appears at most once in product_stone:
;WITH StonesToFind AS ( -- we could also use a table variable etc here
SELECT *
FROM stone
WHERE s_name IN ('Ruby','Emerald')
)
SELECT p.p_name
FROM product AS p -- let's get all products...
JOIN product_stone AS ps ON ps.p_id = p.p_id -- ...joined to all their stones
LEFT JOIN StonesToFind AS s ON s.s_id = ps.s_id -- they may have stones in the list
GROUP BY p.p_id, p_name
HAVING COUNT(CASE WHEN s.s_id IS NULL THEN 1 END) = 0
-- the number of non matching stones in product must be zero
AND COUNT(*) = (SELECT COUNT(*) FROM StonesToFind);
-- the total number of stones must be the same as the list
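As a rough check of the query above, here is a sketch that runs it against the sample data in an in-memory SQLite database (an assumption; the answer itself targets SQL Server, and the leading semicolon before WITH is dropped). The p_s_id column is left out because the query does not use it, and the UNIQUE constraint encodes the uniqueness assumption stated above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stone (s_id INTEGER PRIMARY KEY, s_name TEXT);
CREATE TABLE product (p_id INTEGER PRIMARY KEY, p_name TEXT);
-- p_s_id omitted; each (p_id, s_id) pair is assumed to be unique
CREATE TABLE product_stone (p_id INTEGER, s_id INTEGER, UNIQUE (p_id, s_id));
INSERT INTO stone VALUES (1,'Ruby'),(2,'Emerald'),(3,'Onyx');
INSERT INTO product VALUES (1,'Ring'),(2,'Necklace'),(3,'Pendent');
INSERT INTO product_stone VALUES (1,1),(1,2),(2,1),(2,2),(2,3),(3,3);
""")

query = """
WITH StonesToFind AS (
    SELECT s_id FROM stone WHERE s_name IN ('Ruby', 'Emerald')
)
SELECT p.p_name
FROM product AS p
JOIN product_stone AS ps ON ps.p_id = p.p_id
LEFT JOIN StonesToFind AS s ON s.s_id = ps.s_id
GROUP BY p.p_id, p.p_name
HAVING COUNT(CASE WHEN s.s_id IS NULL THEN 1 END) = 0      -- no stone outside the list
   AND COUNT(*) = (SELECT COUNT(*) FROM StonesToFind)      -- every stone in the list
"""

exact_matches = [row[0] for row in conn.execute(query)]
print(exact_matches)  # ['Ring']
```

Necklace is rejected by the first HAVING condition (Onyx has no match in the list), and Pendent fails both conditions.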

Only joining rows where the date is less than the max date in another field

Let's say I have two tables. One table containing employee information and the days that employee was given a promotion:
Emp_ID Promo_Date
1 07/01/2012
1 07/01/2013
2 07/19/2012
2 07/19/2013
3 08/21/2012
3 08/21/2013
And another table with every day employees closed a sale:
Emp_ID Sale_Date
1 06/12/2013
1 06/30/2013
1 07/15/2013
2 06/15/2013
2 06/17/2013
2 08/01/2013
3 07/31/2013
3 09/01/2013
I want to join the two tables so that I only include sales dates that are less than the maximum promotion date. So the result would look something like this
Emp_ID Sale_Date Promo_Date
1 06/12/2013 07/01/2012
1 06/30/2013 07/01/2012
1 06/12/2013 07/01/2013
1 06/30/2013 07/01/2013
And so on for the rest of the Emp_IDs. I tried doing this using a left join, something to the effect of
left join SalesTable on PromoTable.EmpID = SalesTable.EmpID and Sale_Date
< max(Promo_Date) over (partition by Emp_ID)
But apparently I can't use aggregates in joins, and I already know that I can't use them in the where statement either. I don't know how else to proceed with this.
The maximum promotion date is:
select emp_id, max(promo_date)
from promotions
group by emp_id;
There are various ways to get the sales before that date, but here is one way:
select s.*
from sales s
where s.sales_date < (select max(promo_date)
from promotions p
where p.emp_id = s.emp_id
);
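A small sketch of the correlated-subquery filter, run on the sample rows from the question in an in-memory SQLite database. The table names promotions and sales follow the answer; storing the dates as ISO-8601 text is an assumption made here so that the string comparison orders them correctly in SQLite.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE promotions (emp_id INTEGER, promo_date TEXT);
CREATE TABLE sales (emp_id INTEGER, sales_date TEXT);
INSERT INTO promotions VALUES
    (1,'2012-07-01'),(1,'2013-07-01'),
    (2,'2012-07-19'),(2,'2013-07-19'),
    (3,'2012-08-21'),(3,'2013-08-21');
INSERT INTO sales VALUES
    (1,'2013-06-12'),(1,'2013-06-30'),(1,'2013-07-15'),
    (2,'2013-06-15'),(2,'2013-06-17'),(2,'2013-08-01'),
    (3,'2013-07-31'),(3,'2013-09-01');
""")

# Keep only the sales made before the employee's latest promotion date.
rows = conn.execute("""
SELECT s.emp_id, s.sales_date
FROM sales s
WHERE s.sales_date < (SELECT MAX(p.promo_date)
                      FROM promotions p
                      WHERE p.emp_id = s.emp_id)
ORDER BY s.emp_id, s.sales_date
""").fetchall()

for row in rows:
    print(row)
```

With this data, five sales survive the filter: two each for employees 1 and 2, and one for employee 3.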
Gordon's answer is right on! Alternatively, you could also do an inner join to a subquery to achieve your desired output like this:
SELECT s.emp_id
,s.sales_date
,t.promo_date
FROM sales s
INNER JOIN (
SELECT emp_id
,max(promo_date) AS promo_date
FROM promotions
GROUP BY emp_id
) t ON s.emp_id = t.emp_id
AND s.sales_date < t.promo_date;

Best solution for SQL without looping

I'm relatively new to SQL, and am trying to find the best way to attack this problem.
I am trying to take data from 2 tables and start merging them together to perform analysis on it, but I don't know the best way to go about this without looping or many nested subqueries.
What I've done so far:
I have 2 tables. Table1 has user information and Table2 has information on orders(prices and dates, as well as user)
What I need to do:
I want to have a single row for each user that has a summary of information about all of their orders. I'm looking to find the sum of prices of all orders by each user, the max price paid by that user, and the number of orders. I'm not sure how to best manipulate my data in SQL.
Currently, my code looks as follows:
Select alias1.*, Table2.order_id, Table2.price, Table2.order_date
From (Select * from Table1 where country='United States') as alias1
LEFT JOIN Table2
on alias1.user_id = Table2.user_id
This filters the users by country and then joins them with the orders, creating one record per order that includes the user information. I don't know if this is a helpful step, but this is part of my first attempt at playing around with the data. I was thinking of looping over this, but I know that is against the spirit of SQL.
Edit: Here is an example of what I have and what I want:
Table 1(user info):
user_id user_country
1 United States
2 United Kingdom
(etc)
Table 2(order info):
order_id price user_id
100 5.00 1
101 3.50 2
102 2.50 1
103 1.00 1
104 8.00 2
What I would like output:
user_id user_country total_price max_price number_of_orders
1 United States 8.50 5.00 3
2 United Kingdom 11.50 8.00 2
Here's one way to do this:
SELECT alias1.user_id,
MAX(alias1.user_name) As user_name, -- assumes Table1 has a user_name column
SUM(Table2.price) As UsersTotalPrice,
MAX(Table2.price) As UsersHighestPrice,
COUNT(Table2.order_id) As UsersOrderCount
FROM Table1 As alias1
LEFT JOIN Table2 ON alias1.user_id = Table2.user_id
WHERE alias1.country = 'United States'
GROUP BY alias1.user_id
If you can give us the actual table definitions, then we can show you some actual working queries.
Something like this? Aggregate the rows in Table2 and then join to Table1 for the detail info you want?
SELECT Table1.*,agg.thesum FROM
(SELECT UserID, SUM(aggregatedata) as thesum FROM Table2 GROUP BY UserID) agg
INNER JOIN Table1 on table1.userid = agg.userid
This should work:
select t1.*, t2.total_price, t2.max_price, t2.order_count
from table1 as t1
join (select user_id, sum(price) as total_price, max(price) as max_price, count(order_id) as order_count from table2 group by user_id) as t2
on t1.user_id = t2.user_id
where t1.country = 'United States'
EDIT: (removed "don't use explicit join"; that was wrong. What I meant was:)
Try the following syntax to better understand what is going on:
1st step:
select
u.user_id, -- you must tell the DB which table's user_id you mean
user_country,
price,
price
from -- now just the two tables:
Table1 as u, -- Table1 is a bad name; the alias u stands for "user"
Table2 as o -- "user" and "order" are reserved words in many databases, so short aliases are used
where u.user_id = o.user_id
so you will get something like:
user_id user_country price price
1 alabama 5 5
2 nebrasca 1 1
2 alabama 7 7
1 alabama 7 7
2 alabama 3 7
and so on ..
The next step is to add another condition, user_country = 'alabama', so the 'nebrasca' row drops out:
user_id user_country price price
1 alabama 5 5
2 alabama 7 7
1 alabama 7 7
2 alabama 3 7
Now you are ready to aggregate: just select the MAX and SUM of price, but you have to tell the SQL engine which columns are 'fixed', using GROUP BY:
select
u.user_id, user_country, MAX(price), SUM(price)
from
Table1 as u,
Table2 as o
where u.user_id = o.user_id
and user_country = 'alabama'
group by u.user_id, user_country
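To tie the answers above back to the sample rows in the question, here is a sketch that builds Table1 and Table2 in an in-memory SQLite database (an assumption; the question does not name a database) and computes the three requested figures per user with a single GROUP BY. The aliases u and o are made up for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Table1 (user_id INTEGER, user_country TEXT);
CREATE TABLE Table2 (order_id INTEGER, price REAL, user_id INTEGER);
INSERT INTO Table1 VALUES (1,'United States'),(2,'United Kingdom');
INSERT INTO Table2 VALUES (100,5.00,1),(101,3.50,2),(102,2.50,1),(103,1.00,1),(104,8.00,2);
""")

# One output row per user: total spent, highest single price, number of orders.
summary = conn.execute("""
SELECT u.user_id, u.user_country,
       SUM(o.price)      AS total_price,
       MAX(o.price)      AS max_price,
       COUNT(o.order_id) AS number_of_orders
FROM Table1 AS u
LEFT JOIN Table2 AS o ON o.user_id = u.user_id
GROUP BY u.user_id, u.user_country
ORDER BY u.user_id
""").fetchall()

for row in summary:
    print(row)
# (1, 'United States', 8.5, 5.0, 3)
# (2, 'United Kingdom', 11.5, 8.0, 2)
```

The LEFT JOIN keeps users who have no orders; for them the sum and max would be NULL and the count 0. Adding WHERE u.user_country = 'United States' before the GROUP BY restricts the summary to one country.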

Fill Users table with data using percentages from another table

I have a Table Users (it has millions of rows)
Id Name Country Product
+----+---------------+---------------+--------------+
1 John Canada
2 Kate Argentina
3 Mark China
4 Max Canada
5 Sam Argentina
6 Stacy China
...
1000 Ken Canada
I want to fill the Product column with A, B or C based on percentages.
I have another table called CountriesStats like the following
Id Country A B C
+-----+---------------+--------------+-------------+----------+
1 Canada 60 20 20
2 Argentina 35 45 20
3 China 40 10 50
This table holds the percentage of people with each product. For example in Canada 60% of people have product A, 20% have product B and 20% have product C.
I would like to fill the Users table with data based on the percentages in the second table. So, for example, if there are 1 million users in Canada, I would like to fill 600000 rows of the Product column in the Users table with A, 200000 with B and 200000 with C.
Thanks for any help on how to do that. I do not mind doing it in multiple steps; I just need hints on how I can achieve that in SQL.
The logic behind this is not too difficult. Assign a sequential counter to each person in each country. Then, using this value, assign the correct product based on this value. For instance, in your example, when the number is less than or equal to 600,000 then 'A' gets assigned. For 600,001 to 800,000 then 'B', and finally 'C' to the rest.
The following SQL accomplishes this:
with toupdate as (
select u.*,
row_number() over (partition by country order by newid()) as seqnum,
count(*) over (partition by country) as tot
from users u
)
update u
set product = (case when seqnum <= tot * A / 100 then 'A'
when seqnum <= tot * (A + B) / 100 then 'B'
else 'C'
end)
from toupdate u join
CountriesStats cs
on u.country = cs.country;
The WITH clause defines an updatable CTE that carries the sequence number and the total for each country on every row. Updating through a CTE is a nice feature of SQL Server, but is not supported in all databases.
The FROM clause joins back to the CountriesStats table to get the needed values for each country, and the CASE expression does the necessary logic.
Note that the sequential number is assigned randomly, using newid(), so the products should be assigned randomly through the initial table.
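The threshold logic in the CASE expression can be illustrated outside the database. The sketch below is plain Python, not the SQL Server statement itself; the function name assign_products and its parameters are hypothetical. It shuffles the users of one country, numbers them from 1, and applies the same two cut-offs.

```python
import random
from collections import Counter

def assign_products(user_ids, pct_a, pct_b, seed=0):
    """Return {user_id: product} for one country, mirroring the CASE thresholds."""
    shuffled = list(user_ids)
    random.Random(seed).shuffle(shuffled)   # plays the role of ORDER BY newid()
    tot = len(shuffled)                     # count(*) over (partition by country)
    assignment = {}
    for seqnum, user_id in enumerate(shuffled, start=1):  # row_number() starts at 1
        # seqnum * 100 <= tot * pct is the integer-safe form of seqnum <= tot * pct / 100
        if seqnum * 100 <= tot * pct_a:
            assignment[user_id] = "A"
        elif seqnum * 100 <= tot * (pct_a + pct_b):
            assignment[user_id] = "B"
        else:
            assignment[user_id] = "C"
    return assignment

# Ten hypothetical Canadian users with the 60/20/20 split from CountriesStats.
canada = assign_products(range(1, 11), pct_a=60, pct_b=20)
counts = Counter(canada.values())
print(counts["A"], counts["B"], counts["C"])  # 6 2 2
```

Which users land in which bucket depends on the shuffle, but the bucket sizes are fixed by the percentages, which is the property the SQL relies on.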

SQL Inner Join query

I have following table structures,
cust_info (cust_id, cust_name)
bill_info (bill_id, cust_id, bill_amount, bill_date)
paid_info (paid_id, bill_id, paid_amount, paid_date)
Now my output should display the records whose bill_date falls between two dates (1 Jan 2013 to 1 Feb 2013), with a single row per bill, as follows:
cust_name | bill_id | bill_amount | tpaid_amount | bill_date | balance
where tpaid_amount is total paid for particular bill_id
For example,
for bill id abcd, bill_amount is 10000 and the user pays 2000 the first time and 3000 the second time,
which means the paid_info table contains two entries for the same bill_id:
bill_id | paid_amount
abcd 2000
abcd 3000
so, tpaid_amount = 2000 + 3000 = 5000 and balance = 10000 - tpaid_amount = 10000 - 5000 = 5000
Is there any way to do this with single query (inner joins)?
You'd want to join the 3 tables, then group them by bill ids and other relevant data, like so.
-- the select line, as well as getting your columns to display, is where you'll work
-- out your computed columns, or what are called aggregate functions, such as tpaid and balance
SELECT c.cust_name, p.bill_id, b.bill_amount, SUM(p.paid_amount) AS tpaid, b.bill_date, b.bill_amount - SUM(p.paid_amount) AS balance
-- joining up the 3 tables here on the id columns that point to the other tables
FROM cust_info c INNER JOIN bill_info b ON c.cust_id = b.cust_id
INNER JOIN paid_info p ON p.bill_id = b.bill_id
-- between pretty much does what it says
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
-- in group by, we not only need to join rows together based on which bill they're for
-- (bill_id), but also any column we want to select in SELECT.
GROUP BY c.cust_name, p.bill_id, b.bill_amount, b.bill_date
A quick overview of group by: It will take your result set and smoosh rows together, based on where they have the same data in the columns you give it. Since each bill will have the same customer name, amount, date, etc, we are fine to group by those as well as the bill id, and we'll get a record for each bill. If we wanted to group it by p.paid_amount, though, since each payment would have a different one of those (possibly), you'd get a record for each payment as opposed to for each bill, which isn't what you'd want. Once group by has smooshed these rows together, you can run aggregate functions such as SUM(column). In this example, SUM(p.paid_amount) totals up all the payments that have that bill_id to work out how much has been paid. For more information, please look at W3Schools chapter on group by in their SQL tutorials.
Hope I've understood this correctly and that this helps you.
This will do the trick:
select
cust_name,
bill_info.bill_id,
bill_amount,
coalesce(sum(paid_amount), 0),
bill_date,
bill_amount - coalesce(sum(paid_amount), 0)
from
cust_info
left outer join bill_info
left outer join paid_info
on bill_info.bill_id=paid_info.bill_id
on cust_info.cust_id=bill_info.cust_id
where
bill_info.bill_date between X and Y
group by
cust_name,
bill_info.bill_id,
bill_amount,
bill_date
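For a quick check of the totals in the question, the sketch below runs the join-and-group approach in an in-memory SQLite database (an assumption; the question does not name a database). The customer name, the second bill 'efgh', and all dates are invented for the illustration; only bill abcd and its two payments come from the question. It uses a LEFT JOIN to paid_info with COALESCE so that a bill with no payments still shows up with a paid total of 0.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE cust_info (cust_id INTEGER, cust_name TEXT);
CREATE TABLE bill_info (bill_id TEXT, cust_id INTEGER, bill_amount INTEGER, bill_date TEXT);
CREATE TABLE paid_info (paid_id INTEGER, bill_id TEXT, paid_amount INTEGER, paid_date TEXT);
INSERT INTO cust_info VALUES (1, 'Example Customer');              -- hypothetical customer
INSERT INTO bill_info VALUES ('abcd', 1, 10000, '2013-01-15'),     -- bill from the question
                             ('efgh', 1, 700,   '2013-01-20');     -- hypothetical unpaid bill
INSERT INTO paid_info VALUES (1, 'abcd', 2000, '2013-01-16'),
                             (2, 'abcd', 3000, '2013-01-18');
""")

rows = conn.execute("""
SELECT c.cust_name, b.bill_id, b.bill_amount,
       COALESCE(SUM(p.paid_amount), 0)                 AS tpaid_amount,
       b.bill_date,
       b.bill_amount - COALESCE(SUM(p.paid_amount), 0) AS balance
FROM cust_info c
JOIN bill_info b ON b.cust_id = c.cust_id
LEFT JOIN paid_info p ON p.bill_id = b.bill_id
WHERE b.bill_date BETWEEN '2013-01-01' AND '2013-02-01'
GROUP BY c.cust_name, b.bill_id, b.bill_amount, b.bill_date
ORDER BY b.bill_id
""").fetchall()

for row in rows:
    print(row)
# ('Example Customer', 'abcd', 10000, 5000, '2013-01-15', 5000)
# ('Example Customer', 'efgh', 700, 0, '2013-01-20', 700)
```

With an INNER JOIN to paid_info instead, the unpaid bill 'efgh' would disappear from the result, which may or may not be what the report needs.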