In the following problem
Filtering based on Joining Multiple Tables in SQL
I managed to determine that the poster's problem was happening because he was accessing derived tables from the outer query.
What I don't understand is why this happened.
So, if you run the following:
create table salesperson (
id int, name varchar(40)
)
create table customer (
id int, name varchar(40)
)
create table orders (
number int, cust_id int, salesperson_id int
)
insert into salesperson values (1, 'abe'); insert into salesperson values (2, 'bob');
insert into salesperson values (5, 'chris'); insert into salesperson values (7, 'dan');
insert into salesperson values (8, 'ken'); insert into salesperson values (11, 'joe');
insert into customer values (4, 'Samsonic'); insert into customer values (6, 'panasung');
insert into customer values (7, 'samony'); insert into customer values (9, 'orange');
insert into orders values (10, 4, 2); insert into orders values (20, 4, 8);
insert into orders values (30, 9, 1); insert into orders values (40, 7, 2);
insert into orders values (50, 6, 7); insert into orders values (60, 6, 7);
insert into orders values (70, 9, 7);
SELECT *
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
WHERE s.name NOT IN (
select s.name where c.name='Samsonic'
)
SELECT *
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
WHERE s.name NOT IN (
SELECT s.name
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
WHERE c.name = 'Samsonic'
)
The first select statement accesses the derived tables in the outer query, while the other creates its own joins and derives its own tables.
Why does the first select contain bob while the other one does not?
In your first query you are only removing the rows whose customer name is Samsonic; since Bob also has an order for samony, that row comes through in the output.
In the second one you first get the salespeople who have sold to Samsonic, which gives both Bob and Ken, and then NOT IN removes all of their records. Both of Bob's orders are removed, so you don't get any.
The difference is that your first query only removes orders that involve Samsonic, because the exclusion only looks at data in the current row. By the sounds of it, however, you want to remove any salesperson who has ever sold a Samsonic. You can see the difference in the results of the following query:
SELECT *, s.name, c.name
, case when s.name NOT IN (
select s.name where c.name='Samsonic'
) then 1 else 0 end /* Order not Samsonic */
, case when not exists (
select 1
from Orders O1
inner join Customer C1 on o1.cust_id = c1.id
where C1.Name = 'Samsonic' and o1.salesperson_id = O.salesperson_id
) then 1 else 0 end /* Salesperson never sold a Samsonic */
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
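To reproduce those two flags without a SQL Server instance, here is a minimal sketch that loads the question's sample data into an in-memory SQLite database through Python's standard sqlite3 module. SQLite is an assumption here (it is not the engine the question used), but it accepts both the FROM-less subquery and the NOT EXISTS form; the aliases order_not_samsonic and never_sold_samsonic are illustrative names for the two commented columns.

```python
import sqlite3

# The question's sample data in a throwaway in-memory database.
SETUP = """
CREATE TABLE salesperson (id INT, name VARCHAR(40));
CREATE TABLE customer (id INT, name VARCHAR(40));
CREATE TABLE orders (number INT, cust_id INT, salesperson_id INT);
INSERT INTO salesperson VALUES (1, 'abe'), (2, 'bob'), (5, 'chris'),
                               (7, 'dan'), (8, 'ken'), (11, 'joe');
INSERT INTO customer VALUES (4, 'Samsonic'), (6, 'panasung'),
                            (7, 'samony'), (9, 'orange');
INSERT INTO orders VALUES (10, 4, 2), (20, 4, 8), (30, 9, 1), (40, 7, 2),
                          (50, 6, 7), (60, 6, 7), (70, 9, 7);
"""

# First flag looks only at the current row; the second looks at the
# salesperson's whole order history.
FLAGS = """
SELECT o.number, s.name, c.name,
       CASE WHEN s.name NOT IN (SELECT s.name WHERE c.name = 'Samsonic')
            THEN 1 ELSE 0 END AS order_not_samsonic,
       CASE WHEN NOT EXISTS (
                SELECT 1
                FROM orders o1
                INNER JOIN customer c1 ON o1.cust_id = c1.id
                WHERE c1.name = 'Samsonic'
                  AND o1.salesperson_id = o.salesperson_id)
            THEN 1 ELSE 0 END AS never_sold_samsonic
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
ORDER BY o.number
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(FLAGS).fetchall()
for row in rows:
    print(row)
```

Order 40 (bob selling to samony) gets 1 in the first flag but 0 in the second; that single row is what separates the two original queries.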
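Your first query's subquery has no FROM clause. It returns one row (s.name) when c.name = 'Samsonic' and no rows otherwise, and NOT IN against an empty set is always true. So, as long as c.name is not NULL, the WHERE clause is simply equivalent to:
WHERE c.name <> 'Samsonic'
Note that it is not the same as WHERE s.name NOT IN (CASE WHEN c.name = 'Samsonic' THEN s.name END): for every other customer the CASE yields NULL, and s.name NOT IN (NULL) is never true, so that version filters out every row.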
Bob has an order that is not with 'Samsonic', so Bob is in the result set. In other words, the logic is looking at each row individually.
The second version first collects all names that have made an order with 'Samsonic'. Bob is one of those names, so the exclusion applies to all orders made by Bob.
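The same sample data can be used to check these equivalences directly. This sketch (again SQLite through Python's sqlite3, an assumption; the order_numbers helper is hypothetical) runs the first query's predicate, its c.name <> 'Samsonic' equivalent, a CASE-based rewrite, and the second query's predicate, and collects the surviving order numbers for each.

```python
import sqlite3

SETUP = """
CREATE TABLE salesperson (id INT, name VARCHAR(40));
CREATE TABLE customer (id INT, name VARCHAR(40));
CREATE TABLE orders (number INT, cust_id INT, salesperson_id INT);
INSERT INTO salesperson VALUES (1, 'abe'), (2, 'bob'), (5, 'chris'),
                               (7, 'dan'), (8, 'ken'), (11, 'joe');
INSERT INTO customer VALUES (4, 'Samsonic'), (6, 'panasung'),
                            (7, 'samony'), (9, 'orange');
INSERT INTO orders VALUES (10, 4, 2), (20, 4, 8), (30, 9, 1), (40, 7, 2),
                          (50, 6, 7), (60, 6, 7), (70, 9, 7);
"""

JOINS = """
FROM salesperson s
INNER JOIN orders o ON s.id = o.salesperson_id
INNER JOIN customer c ON o.cust_id = c.id
"""


def order_numbers(conn, where_clause):
    """Hypothetical helper: order numbers that survive the given WHERE clause."""
    sql = f"SELECT o.number {JOINS} WHERE {where_clause} ORDER BY o.number"
    return [number for (number,) in conn.execute(sql)]


conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)

# First query: FROM-less subquery, evaluated row by row.
first = order_numbers(conn, "s.name NOT IN (SELECT s.name WHERE c.name = 'Samsonic')")
# Its row-level equivalent.
simple = order_numbers(conn, "c.name <> 'Samsonic'")
# CASE rewrite: NULL for non-Samsonic rows, and NOT IN (NULL) is never true.
case_form = order_numbers(
    conn, "s.name NOT IN (CASE WHEN c.name = 'Samsonic' THEN s.name END)"
)
# Second query: the subquery lists every salesperson who ever sold a Samsonic.
second = order_numbers(
    conn, f"s.name NOT IN (SELECT s.name {JOINS} WHERE c.name = 'Samsonic')"
)
print(first, simple, case_form, second)
```

first and simple both keep order 40 (bob/samony), second drops it, and the CASE rewrite keeps nothing at all.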
If you want to exclude all salespersons who have ever made an order to 'Samsonic', then I would recommend using window functions instead of complicated logic:
SELECT *
FROM (SELECT s.id as salesperson_id, s.name as salesperson_name, c.id as customer_id, c.name as customer_name, o.number,
SUM(CASE WHEN c.name = 'Samsonic' THEN 1 ELSE 0 END) OVER (PARTITION BY s.id) as num_samsonic
FROM salesperson s INNER JOIN
orders o
ON s.id = o.salesperson_id INNER JOIN
customer c
ON o.cust_id = c.id
) soc
WHERE num_samsonic = 0
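A runnable sketch of the window-function idea, under the same SQLite/sqlite3 assumption (SQLite 3.25 or later is needed for OVER). The Samsonic rows have to stay inside the derived table so the windowed SUM can count them; only the outer WHERE drops salespeople, which is why the inner query has no c.name filter.

```python
import sqlite3

SETUP = """
CREATE TABLE salesperson (id INT, name VARCHAR(40));
CREATE TABLE customer (id INT, name VARCHAR(40));
CREATE TABLE orders (number INT, cust_id INT, salesperson_id INT);
INSERT INTO salesperson VALUES (1, 'abe'), (2, 'bob'), (5, 'chris'),
                               (7, 'dan'), (8, 'ken'), (11, 'joe');
INSERT INTO customer VALUES (4, 'Samsonic'), (6, 'panasung'),
                            (7, 'samony'), (9, 'orange');
INSERT INTO orders VALUES (10, 4, 2), (20, 4, 8), (30, 9, 1), (40, 7, 2),
                          (50, 6, 7), (60, 6, 7), (70, 9, 7);
"""

# num_samsonic is computed over all of a salesperson's orders, then filtered.
QUERY = """
SELECT salesperson_name, number
FROM (SELECT s.id AS salesperson_id, s.name AS salesperson_name,
             c.id AS customer_id, c.name AS customer_name, o.number,
             SUM(CASE WHEN c.name = 'Samsonic' THEN 1 ELSE 0 END)
                 OVER (PARTITION BY s.id) AS num_samsonic
      FROM salesperson s
      INNER JOIN orders o ON s.id = o.salesperson_id
      INNER JOIN customer c ON o.cust_id = c.id) soc
WHERE num_samsonic = 0
ORDER BY number
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Bob and Ken disappear entirely, including Bob's samony order.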
I will send the Database Description in an Image.
I tried this SELECT, but I'm afraid it isn't right:
SELECT t.type , a.ICAOId , a.name , ci.id , c.ISOAlpha2ID , p.docReference , ti.docReference , ti.number , p.name , p.surname
FROM dbo.AirportType t
INNER JOIN dbo.Airport a ON t.type = a.type
INNER JOIN dbo.City ci ON a.city = ci.id
INNER JOIN dbo.Country c ON ci.ISOalpha2Id = c.ISOalpha2Id
INNER JOIN dbo.Passenger p ON c.ISOalpha2Id = p.nationality
INNER JOIN dbo.Ticket ti ON p.docReference = ti.docReference
WHERE NOT ci.id = 'Tokyo'
Can you please help to get this right?
You could make a list of the passengers that HAVE flown to the city, then use that as a subquery to select the ones not in the list.
Here is an example of how it could be done.
Subquery:
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
WHERE c.name = 'tokyo'
Now you just put that into another query that selects the elements not in it:
SELECT * FROM passengers
WHERE id NOT IN (
SELECT p.id FROM passengers p
JOIN tickets t ON p.id = t.passengerID
JOIN city c ON c.id = t.cityID
WHERE c.name = 'tokyo'
)
Note that I didn't use your attribute names; you will have to change those.
This is a simplified version of what you will have to do, because the city is not directly in your tickets table. You will also have to join tickets with coupons and flights to get the people that have flown to a city, but from there it is the same.
Overall, I believe this should help you get what you need.
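A small sketch of the NOT IN pattern against the simplified, hypothetical passengers/tickets/city tables from this answer, run on SQLite via Python's sqlite3 (an assumption; the real schema would need the extra coupon and flight joins). The made-up passenger Carl has no tickets at all and still appears, because he is not in the "flew to Tokyo" list.

```python
import sqlite3

# Hypothetical simplified schema: each ticket points straight at a city.
SETUP = """
CREATE TABLE passengers (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE city (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE tickets (passengerID INT, cityID INT);
INSERT INTO passengers VALUES (1, 'Anna'), (2, 'Paul'), (3, 'Carl');
INSERT INTO city VALUES (1, 'miami'), (2, 'tokyo');
INSERT INTO tickets VALUES (1, 1), (2, 1), (2, 2);
"""

# Everyone whose id is not in the list of passengers who flew to Tokyo.
QUERY = """
SELECT name FROM passengers
WHERE id NOT IN (
    SELECT p.id FROM passengers p
    JOIN tickets t ON p.id = t.passengerID
    JOIN city c ON c.id = t.cityID
    WHERE c.name = 'tokyo'
)
ORDER BY name
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
never_tokyo = [name for (name,) in conn.execute(QUERY)]
print(never_tokyo)
```

NOT IN only behaves like this when the subquery cannot return NULL; here p.id is a primary key, so it cannot.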
A minimal reproducible example was not provided.
Here is a conceptual example that could easily be extended to a real scenario.
-- DDL and sample data population, start
DECLARE @passenger TABLE (passengerID INT PRIMARY KEY, passenger_name VARCHAR(20));
INSERT @passenger (passengerID, passenger_name) VALUES
(1, 'Anna'),
(2, 'Paul');
DECLARE @city TABLE (cityID INT PRIMARY KEY, city_name VARCHAR(20));
INSERT @city (cityID, city_name) VALUES
(1, 'Miami'),
(2, 'Orlando'),
(3, 'Tokyo');
-- Already visited cities
DECLARE @passenger_city TABLE (passengerID INT, cityID INT);
INSERT @passenger_city (passengerID, cityID) VALUES
(1, 1),
(2, 3);
-- DDL and sample data population, end
SELECT * FROM @passenger;
SELECT * FROM @city;
SELECT * FROM @passenger_city;
;WITH rs AS
(
SELECT c.passengerID, b.cityID
FROM @passenger AS c
CROSS JOIN @city AS b -- get all possible combinations of passengers and cities
EXCEPT -- filter out already visited cities
SELECT passengerID, cityID FROM @passenger_city
)
SELECT c.*, b.city_name
FROM rs
INNER JOIN @passenger AS c ON c.passengerID = rs.passengerID
INNER JOIN @city AS b ON b.cityID = rs.cityID
ORDER BY c.passenger_name, b.city_name;
Output
passengerID  passenger_name  city_name
--------------------------------------
1            Anna            Orlando
1            Anna            Tokyo
2            Paul            Miami
2            Paul            Orlando
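The same CROSS JOIN ... EXCEPT idea, ported to SQLite through Python's sqlite3 as a sketch (an assumption: SQLite has no table variables, so ordinary tables stand in for @passenger, @city and @passenger_city). The second query adds a WHERE on b.city_name, which is one way the conceptual example could be narrowed to a "has not flown to Tokyo" question.

```python
import sqlite3

SETUP = """
CREATE TABLE passenger (passengerID INTEGER PRIMARY KEY, passenger_name TEXT);
CREATE TABLE city (cityID INTEGER PRIMARY KEY, city_name TEXT);
CREATE TABLE passenger_city (passengerID INT, cityID INT);  -- visited cities
INSERT INTO passenger VALUES (1, 'Anna'), (2, 'Paul');
INSERT INTO city VALUES (1, 'Miami'), (2, 'Orlando'), (3, 'Tokyo');
INSERT INTO passenger_city VALUES (1, 1), (2, 3);
"""

# Every (passenger, city) pair minus the pairs already visited;
# {extra} is an optional extra filter on the final SELECT.
UNVISITED = """
WITH rs AS (
    SELECT c.passengerID, b.cityID
    FROM passenger AS c CROSS JOIN city AS b
    EXCEPT
    SELECT passengerID, cityID FROM passenger_city
)
SELECT c.passengerID, c.passenger_name, b.city_name
FROM rs
INNER JOIN passenger AS c ON c.passengerID = rs.passengerID
INNER JOIN city AS b ON b.cityID = rs.cityID
{extra}
ORDER BY c.passenger_name, b.city_name
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
unvisited = conn.execute(UNVISITED.format(extra="")).fetchall()
not_tokyo = conn.execute(
    UNVISITED.format(extra="WHERE b.city_name = 'Tokyo'")
).fetchall()
print(unvisited)
print(not_tokyo)
```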
If I have a CUSTOMER table with the attribute customer_id and an ORDER table with the attributes order_id and customer_id, how do I find the total number of orders submitted by each customer, returning zero for a customer who has none?
I have tried the following:
SELECT c.customer_id, COUNT(*)
FROM Customer c, Orders o
WHERE c.customer_id= o.customer_id
GROUP BY c.customer_id;
With the above, I am able to display the number of orders made by each customer, only if they made an order.
How do I also display count 0 for those customers who did not make any order?
Use an outer join and count a column from the "outer" (optional) table; COUNT(column) ignores NULLs, so customers without orders get 0:
SELECT c.customer_id, COUNT(o.customer_id)
FROM Customer c
LEFT JOIN Orders o ON c.customer_id= o.customer_id
GROUP BY c.customer_id;
You can use a LEFT JOIN and pass o.customer_id to COUNT():
SELECT c.customer_id, COUNT(o.customer_id) AS OrderCount
FROM Customer c
LEFT JOIN Orders o ON c.customer_id = o.customer_id
GROUP BY c.customer_id;
Demo with sample data. Here customer IDs 2 and 4 don't have any rows in the Orders table, so they get zero in the output.
DECLARE @Customer TABLE (CustomerId INT);
INSERT INTO @Customer (CustomerId) VALUES (1), (2), (3), (4), (5);
DECLARE @Orders TABLE (CustomerId INT, OrderId INT);
INSERT INTO @Orders (CustomerId, OrderId) VALUES (1, 1), (1, 2), (3, 2), (3, 4), (5, 1);
SELECT c.CustomerId, COUNT(o.CustomerId) AS OrderCount
FROM @Customer c
LEFT JOIN @Orders o ON c.CustomerId = o.CustomerId
GROUP BY c.CustomerId;
Output:
CustomerId OrderCount
----------------------
1 2
2 0
3 2
4 0
5 1
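To see why the argument to COUNT matters, this sketch (SQLite via Python's sqlite3, an assumption) runs the demo data with both COUNT(o.CustomerId) and COUNT(*) over the LEFT JOIN. The JoinedRows alias is illustrative.

```python
import sqlite3

SETUP = """
CREATE TABLE Customer (CustomerId INT);
CREATE TABLE Orders (CustomerId INT, OrderId INT);
INSERT INTO Customer VALUES (1), (2), (3), (4), (5);
INSERT INTO Orders VALUES (1, 1), (1, 2), (3, 2), (3, 4), (5, 1);
"""

QUERY = """
SELECT c.CustomerId,
       COUNT(o.CustomerId) AS OrderCount,  -- skips the NULLs from unmatched rows
       COUNT(*) AS JoinedRows              -- counts the NULL-extended row as 1
FROM Customer c
LEFT JOIN Orders o ON c.CustomerId = o.CustomerId
GROUP BY c.CustomerId
ORDER BY c.CustomerId
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

COUNT(*) reports 1 for customers 2 and 4, because the LEFT JOIN still produces one NULL-extended row for each of them; counting a column from Orders skips those NULLs and gives 0.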
First aggregate the orders by customer ID and calculate the count in an inner SELECT. Left join this to your customers table so that you don't lose any customers who have not placed an order. Finally, use a CASE expression to check whether the order count for a customer is NULL, meaning they have made no orders, and in that case return zero.
SELECT
c.customer_id,
CASE
WHEN o.num_orders IS NULL THEN 0
ELSE o.num_orders
END AS num_orders
FROM Customer c
LEFT JOIN (
SELECT customer_id, COUNT(*) AS num_orders
FROM Orders
GROUP BY customer_id
) AS o ON c.customer_id= o.customer_id;
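The pre-aggregated version can be checked the same way. This sketch uses SQLite via Python's sqlite3 (an assumption) with made-up rows in the answer's Customer(customer_id) / Orders(order_id, customer_id) shape, and computes the zero default both with the answer's CASE and with COALESCE, which is shorter and returns the same values.

```python
import sqlite3

# Made-up rows: customers 2 and 4 have no orders.
SETUP = """
CREATE TABLE Customer (customer_id INT);
CREATE TABLE Orders (order_id INT, customer_id INT);
INSERT INTO Customer VALUES (1), (2), (3), (4), (5);
INSERT INTO Orders VALUES (1, 1), (2, 1), (3, 3), (4, 3), (5, 5);
"""

# Aggregate first, then LEFT JOIN; unmatched customers get NULL num_orders.
QUERY = """
SELECT c.customer_id,
       CASE WHEN o.num_orders IS NULL THEN 0 ELSE o.num_orders END AS num_orders,
       COALESCE(o.num_orders, 0) AS num_orders_coalesce
FROM Customer c
LEFT JOIN (SELECT customer_id, COUNT(*) AS num_orders
           FROM Orders
           GROUP BY customer_id) AS o
       ON c.customer_id = o.customer_id
ORDER BY c.customer_id
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)
```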
Try the IFNULL function (MySQL and SQLite; the SQL Server equivalent is ISNULL, and COALESCE is standard SQL):
https://www.w3schools.com/sql/sql_isnull.asp
Hope it helps!
UPDATE: Initially, I had the order date in the line item table, realized that was a mistake, and moved it to the Order table. I have updated my example query as well. Sorry!
I am trying to write a query to load all orders whose order date is after a certain date, along with all other orders that contain the same products as those returned by the first part of the query. An example may help:
CREATE TABLE DemandOrder
(OrderId INT, OrderDate date, Customer VARCHAR(25))
CREATE TABLE LineItem
(OrderId INT, LineItemId INT, ProductId VARCHAR(10))
INSERT INTO DemandOrder VALUES(1, '01/23/2014', 'ABC');
INSERT INTO DemandOrder VALUES(2, '01/24/2014', 'DEF');
INSERT INTO DemandOrder VALUES(3, '01/24/2014', 'XYZ');
INSERT INTO DemandOrder VALUES(4, '01/23/2014', 'ABC');
INSERT INTO LineItem VALUES(1, 1, 'A');
INSERT INTO LineItem VALUES(1, 2, 'C');
INSERT INTO LineItem VALUES(2, 1, 'B');
INSERT INTO LineItem VALUES(3, 1, 'A');
INSERT INTO LineItem VALUES(4, 1, 'C');
In the above example, I need to query for all orders where the order date is on or after 01/24, along with all other orders that contain any of the products returned by the first part of the query. The result should have orders 1, 2 & 3.
Here is the updated SQL code (using ErikE's suggestions from a post below):
SELECT
DISTINCT O.*
FROM
dbo.[DemandOrder] O
INNER JOIN dbo.LineItem LI
ON O.OrderID = LI.OrderID
WHERE
EXISTS (
SELECT *
FROM
dbo.DemandOrder O2 INNER JOIN
dbo.LineItem L2 ON O2.OrderId = L2.OrderId
WHERE
O2.OrderDate >= '01/24/2014'
AND LI.ProductID = L2.ProductID -- not clear if correct
);
Thanks for your help and suggestions
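To check the correlation marked "not clear if correct", here is a sketch that runs the updated query on SQLite via Python's sqlite3 (an assumption) with the question's rows. The dates are stored as ISO-8601 text, since SQLite has no native date type and text in the question's MM/DD/YYYY format would not compare chronologically.

```python
import sqlite3

# The question's rows; OrderDate as ISO-8601 text so >= compares correctly.
SETUP = """
CREATE TABLE DemandOrder (OrderId INT, OrderDate TEXT, Customer VARCHAR(25));
CREATE TABLE LineItem (OrderId INT, LineItemId INT, ProductId VARCHAR(10));
INSERT INTO DemandOrder VALUES (1, '2014-01-23', 'ABC'), (2, '2014-01-24', 'DEF'),
                               (3, '2014-01-24', 'XYZ'), (4, '2014-01-23', 'ABC');
INSERT INTO LineItem VALUES (1, 1, 'A'), (1, 2, 'C'), (2, 1, 'B'),
                            (3, 1, 'A'), (4, 1, 'C');
"""

QUERY = """
SELECT DISTINCT O.OrderId
FROM DemandOrder O
INNER JOIN LineItem LI ON O.OrderId = LI.OrderId
WHERE EXISTS (
    SELECT *
    FROM DemandOrder O2
    INNER JOIN LineItem L2 ON O2.OrderId = L2.OrderId
    WHERE O2.OrderDate >= '2014-01-24'
      AND LI.ProductId = L2.ProductId  -- correlates on the outer line item's product
)
ORDER BY O.OrderId
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
order_ids = [order_id for (order_id,) in conn.execute(QUERY)]
print(order_ids)
```

It returns orders 1, 2 and 3, the expected result, so correlating on the product ID behaves as intended for this data.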
You can also do this with window functions:
select o.*
from (Select o.*,
max(li.OrderDate) over (partition by li.ProductId) as maxProductOrderDate
from [Order] o INNER JOIN
LineItem li
ON o.OrderId = li.OrderId
) o
where o.maxProductOrderDate >= '2014-01-24';
You might actually want select distinct in the outer query, to prevent duplicates if one order has multiple products shipped after the given date.
As for your query, you can simplify it because you do not need the order table in the subquery, unless you need it for filtering purposes:
SELECT o.*
FROM [Order] o INNER JOIN
LineItem li
ON o.OrderId = li.OrderId
WHERE li.ProductId IN (SELECT li.ProductId
FROM LineItem li
WHERE li.OrderDate >= '2014-01-24'
);
You probably want select distinct o.* in the outer query, to avoid duplicates when an order has two or more products that match the condition.
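A sketch of both queries from this answer on SQLite via Python's sqlite3 (an assumption; window functions need SQLite 3.25 or later, and SQLite accepts the [Order] bracket quoting). Because the answer targets the pre-update schema, the example assumes OrderDate sits on LineItem, with each line item carrying its order's date from the question; DISTINCT is added to both outer queries as suggested above.

```python
import sqlite3

# Assumed pre-update schema: OrderDate lives on LineItem (ISO-8601 text).
SETUP = """
CREATE TABLE [Order] (OrderId INT, Customer TEXT);
CREATE TABLE LineItem (OrderId INT, LineItemId INT, ProductId TEXT, OrderDate TEXT);
INSERT INTO [Order] VALUES (1, 'ABC'), (2, 'DEF'), (3, 'XYZ'), (4, 'ABC');
INSERT INTO LineItem VALUES (1, 1, 'A', '2014-01-23'), (1, 2, 'C', '2014-01-23'),
                            (2, 1, 'B', '2014-01-24'), (3, 1, 'A', '2014-01-24'),
                            (4, 1, 'C', '2014-01-23');
"""

# Window version: latest order date per product, repeated on every row.
WINDOW = """
SELECT DISTINCT OrderId
FROM (SELECT o.OrderId,
             MAX(li.OrderDate) OVER (PARTITION BY li.ProductId) AS maxProductOrderDate
      FROM [Order] o INNER JOIN LineItem li ON o.OrderId = li.OrderId) x
WHERE x.maxProductOrderDate >= '2014-01-24'
ORDER BY OrderId
"""

# IN version: products seen on or after the date, then every order holding them.
IN_LIST = """
SELECT DISTINCT o.OrderId
FROM [Order] o INNER JOIN LineItem li ON o.OrderId = li.OrderId
WHERE li.ProductId IN (SELECT ProductId FROM LineItem WHERE OrderDate >= '2014-01-24')
ORDER BY o.OrderId
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
window_ids = [i for (i,) in conn.execute(WINDOW)]
in_ids = [i for (i,) in conn.execute(IN_LIST)]
print(window_ids, in_ids)
```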
To get a result set with one row per order (meaning you're not interested in line item data, just the order summary), something like this should do:
select o.*
from ( select distinct OrderId
from dbo.LineItem t1
where exists ( select *
from dbo.LineItem t2
where t2.ProductId = t1.ProductId
and t2.OrderDate >= @SomeLowerBoundDateTimeValue
)
) t
join dbo.[Order] o on o.OrderId = t.OrderId
The first item in the FROM clause is a derived table consisting of the set of order IDs associated with a product that was part of an order dated on or after the specified date. Having done that, the rest is trivial: just join against the order table.
Generally, for performance, you may prefer correlated subqueries with [not] exists (...) over uncorrelated subqueries with [not] in (...).
exists can stop as soon as it finds a match, whereas a naive in must build the subquery's entire result set. Modern optimizers (SQL Server's included) often compile both forms into the same semi-join plan, though, so check the execution plan.
I believe this is going to be close to what you're looking for.
All orders that have at least one ProductID matching any product ID in an order dated 1/24/2014 or later.
SELECT
O.*
FROM
dbo.[Order] O
WHERE
EXISTS (
SELECT *
FROM
dbo.LineItem L2
INNER JOIN dbo.LineItem L3
ON L2.ProductID = L3.ProductID
INNER JOIN dbo.[Order] O2
ON L3.OrderID = O2.OrderID
WHERE
O2.OrderDate >= '20140124'
AND O.OrderID = L2.OrderID
)
;
First, I guess your result should be OrderId 2 and 3, because their OrderDate is 01/24...
If you want to get that result, you could try this:
Select o1.OrderId, o1.CustomerName, l1.OrderDate, l1.ProductId
from [Order] o1 INNER JOIN
LineItem l1
ON o1.OrderId = l1.OrderId
where l1.OrderDate >= '01/24/2014'
Hope this works and solves your question.
Regards!!!
This is what you're looking for, I believe.
Here's what's happening:
JOIN LineItem liBase: grab the initial records from LineItem based on the MinDate specification
JOIN LineItem liMatches: self JOIN to the LineItem table using the ProductIDs collected in the initial JOIN
JOIN LineItem projection: using the OrderIDs collected in the previous JOIN, grab the records from the LineItem table (in an additional self JOIN)
SELECT projection.*: projection is the set of results that we are after, so SELECT it
Here's the query:
;WITH parms (
MinDate
) AS (
SELECT CONVERT(DATETIME, '01/24/2014')
)
SELECT projection.*
FROM parms p
JOIN LineItem liBase
ON liBase.OrderDate >= p.MinDate
JOIN LineItem liMatches
ON liMatches.ProductId = liBase.ProductId
JOIN LineItem projection
ON projection.OrderId = liMatches.OrderId
ORDER BY projection.OrderId
;
Same query, but with data generation (generates the LineItem and Order data sets that you presented in your question).
;WITH parms (
MinDate
) AS (
SELECT CONVERT(DATETIME, '01/24/2014')
)
, LineItem (
OrderId
, LineItemID
, OrderDate
, ProductId
) AS (
SELECT 1, 1, CONVERT(DATETIME, '01/23/2014'), 'B' UNION
SELECT 4, 1, CONVERT(DATETIME, '01/23/2014'), 'C' UNION
SELECT 2, 1, CONVERT(DATETIME, '01/24/2014'), 'A' UNION
SELECT 3, 1, CONVERT(DATETIME, '01/24/2014'), 'B'
)
, [Order] (
OrderId
, CustomerName
) AS (
SELECT 1, 'ABC' UNION
SELECT 2, 'XYZ' UNION
SELECT 3, 'DEF'
)
SELECT projection.*
FROM parms p
JOIN LineItem liBase
ON liBase.OrderDate >= p.MinDate
JOIN LineItem liMatches
ON liMatches.ProductId = liBase.ProductId
JOIN LineItem projection
ON projection.OrderId = liMatches.OrderId
ORDER BY projection.OrderId
;
I've looked through all the documentation and I'm having an issue putting together this query in Sequel.
select a.*, IFNULL(b.cnt, 0) as cnt FROM a LEFT OUTER JOIN (select a_id, count(*) as cnt from b group by a_id) as b ON b.a_id = a.id ORDER BY cnt
Think of table A as products and table B as records indicating that A was purchased.
So far I have:
A.left_outer_join(B.group_and_count(:a_id), a_id: :id).order(:count)
Essentially I just want to group and count table B, join it with A, but since B does not necessarily have any records for A and I'm ordering it by the number in B, I need to default a value.
DB[:a].
left_outer_join(DB[:b].group_and_count(:a_id).as(:b), :a_id=>:id).
order(:cnt).
select_all(:a).
select_more{IFNULL(Sequel[:b][:count], 0).as(:cnt)}
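The Ruby code can't be exercised here, but the SQL it is meant to produce can. This sketch runs the question's raw SQL on SQLite via Python's sqlite3 (an assumption; SQLite also has IFNULL) against made-up rows, where a plays the product role and b the purchase records.

```python
import sqlite3

# Hypothetical data: a = products, b = purchase records pointing at a.id.
SETUP = """
CREATE TABLE a (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE b (id INTEGER PRIMARY KEY, a_id INT);
INSERT INTO a VALUES (1, 'Tea'), (2, 'Coffee'), (3, 'Cocoa');
INSERT INTO b (a_id) VALUES (1), (1), (2);
"""

# The question's SQL, unchanged apart from line breaks.
QUERY = """
SELECT a.*, IFNULL(b.cnt, 0) AS cnt
FROM a
LEFT OUTER JOIN (SELECT a_id, count(*) AS cnt FROM b GROUP BY a_id) AS b
  ON b.a_id = a.id
ORDER BY cnt
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)
```

Cocoa has no rows in b, so the LEFT JOIN leaves its count NULL and IFNULL turns it into 0, which sorts first.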
I can help you in MS SQL syntax.
Let's say your tables are Product and Order.
CREATE TABLE Product (
Id INT NOT NULL,
NAME VARCHAR(100) NOT NULL)
CREATE TABLE [Order] (
Id INT NOT NULL,
ProductId INT)
INSERT INTO Product (Id, Name) VALUES
(1, 'Tea'), (2, 'Coffee'), (3, 'Hot Chocolate')
INSERT INTO [Order] (Id, ProductId) VALUES
(1, 1), (2, 1), (3, 1), (4, 2)
This query will give the number of orders each product has, including ones without any orders.
SELECT p.Id AS ProductId,
p.Name AS ProductName,
COUNT(o.Id) AS Orders
FROM Product p
LEFT OUTER JOIN [Order] o
ON p.Id = o.ProductId
GROUP BY
p.Id,
p.Name
ORDER BY
COUNT(o.Id) DESC
In SQL Server, I have a result set from a joined many-to-many relationship, with Products linked to Orders via a link table:
Table Products: ID, ProductName
Table Orders: ID, OrderCountry
Link table OrderLines (columns not shown)
I'd like to filter these results so that, for each entity in one table, rows are returned only when all of its linked values in the other table have a given value in a particular column. In terms of my example: for each product, I want to return the joined rows only when all the orders it is linked to are for country 'uk'.
So if my linked result set is:
productid, product, orderid, ordercountry
1, Chocolate, 1, uk
2, Banana, 2, uk
2, Banana, 3, usa
3, Strawberry, 4, usa
I want to filter so that only those products that have only been ordered in the UK are shown (i.e. Chocolate). I'm sure this should be straightforward, but it's Friday afternoon and the SQL part of my brain has given up for the day...
You could do something like this: first get all products sold in only one country, then get all orders for those products.
with distinctProducts as
(
select LinkTable.ProductID
from Orders
inner join LinkTable on LinkTable.OrderID = Orders.ID
group by LinkTable.ProductID
having count(distinct Orders.OrderCountry) = 1
)
select pr.ID as ProductID
,pr.ProductName
,o.ID as OrderID
,o.OrderCountry
from Products pr
inner join LinkTable lt on lt.ProductID = pr.ID
inner join Orders o on o.ID = lt.OrderID
inner join distinctProducts dp on dp.ProductID = pr.ID
where o.OrderCountry = 'UK'
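A sketch of this answer's query on SQLite via Python's sqlite3 (an assumption), using the question's four rows in the answer's Products/Orders/LinkTable naming. SQLite's = is case-sensitive by default, unlike SQL Server's usual collation, so the filter is written as 'uk' to match the data.

```python
import sqlite3

# The question's rows, with the table names used in the answer.
SETUP = """
CREATE TABLE Products (ID INT, ProductName TEXT);
CREATE TABLE Orders (ID INT, OrderCountry TEXT);
CREATE TABLE LinkTable (ProductID INT, OrderID INT);
INSERT INTO Products VALUES (1, 'Chocolate'), (2, 'Banana'), (3, 'Strawberry');
INSERT INTO Orders VALUES (1, 'uk'), (2, 'uk'), (3, 'usa'), (4, 'usa');
INSERT INTO LinkTable VALUES (1, 1), (2, 2), (2, 3), (3, 4);
"""

QUERY = """
WITH distinctProducts AS (
    -- products whose orders all share a single country
    SELECT LinkTable.ProductID
    FROM Orders
    INNER JOIN LinkTable ON LinkTable.OrderID = Orders.ID
    GROUP BY LinkTable.ProductID
    HAVING COUNT(DISTINCT Orders.OrderCountry) = 1
)
SELECT pr.ID AS ProductID, pr.ProductName, o.ID AS OrderID, o.OrderCountry
FROM Products pr
INNER JOIN LinkTable lt ON lt.ProductID = pr.ID
INNER JOIN Orders o ON o.ID = lt.OrderID
INNER JOIN distinctProducts dp ON dp.ProductID = pr.ID
-- lowercase 'uk' because SQLite compares text case-sensitively here
WHERE o.OrderCountry = 'uk'
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
rows = conn.execute(QUERY).fetchall()
print(rows)
```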
In the hope that some of this may be generally reusable:
;with startingRS (productid, product, orderid, ordercountry) as (
select 1, 'Chocolate', 1, 'uk' union all
select 2, 'Banana', 2, 'uk' union all
select 2, 'Banana', 3, 'usa' union all
select 3, 'Strawberry', 4, 'usa'
), countryRankings as (
select productid,product,orderid,ordercountry,
RANK() over (PARTITION by productid ORDER by ordercountry) as FirstCountry,
RANK() over (PARTITION by productid ORDER by ordercountry desc) as LastCountry
from
startingRS
), singleCountry as (
select productid,product,orderid,ordercountry
from countryRankings
where FirstCountry = 1 and LastCountry = 1
)
select * from singleCountry where ordercountry='uk'
In startingRS, you put whatever query you currently have to generate the intermediate results you've shown. The countryRankings CTE adds two new columns that rank the countries within each productid.
The singleCountry CTE reduces the result set back down to those results where country ranks as both the first and last country within the productid (i.e. there's only a single country for this productid). Finally, we query for those results which are just from the uk.
If you want, for example, all productid rows with a single country of origin, you just skip this last where clause (and you'd get 3,strawberry,4,usa in your results also)
So if you've got a current query that looks like:
select p.productid,p.product,o.orderid,o.ordercountry
from product p inner join [order] o on p.productid = o.productid --(or however these joins work for your tables)
Then you'd rewrite the first CTE as:
;with startingRS (productid, product, orderid, ordercountry) as (
select p.productid,p.product,o.orderid,o.ordercountry
from product p inner join [order] o on p.productid = o.productid
), /* rest of query */
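The RANK-based CTE chain can be run as-is with the answer's inline startingRS. This sketch uses SQLite via Python's sqlite3 (an assumption; RANK() needs SQLite 3.25 or later) and runs it twice: once without the final filter and once with WHERE ordercountry = 'uk'.

```python
import sqlite3

# startingRS inlined exactly as in the answer; {where} is the optional last filter.
QUERY = """
WITH startingRS (productid, product, orderid, ordercountry) AS (
    SELECT 1, 'Chocolate', 1, 'uk' UNION ALL
    SELECT 2, 'Banana', 2, 'uk' UNION ALL
    SELECT 2, 'Banana', 3, 'usa' UNION ALL
    SELECT 3, 'Strawberry', 4, 'usa'
), countryRankings AS (
    SELECT productid, product, orderid, ordercountry,
           RANK() OVER (PARTITION BY productid ORDER BY ordercountry) AS FirstCountry,
           RANK() OVER (PARTITION BY productid ORDER BY ordercountry DESC) AS LastCountry
    FROM startingRS
), singleCountry AS (
    -- rank 1 from both ends means every row of the product has the same country
    SELECT productid, product, orderid, ordercountry
    FROM countryRankings
    WHERE FirstCountry = 1 AND LastCountry = 1
)
SELECT * FROM singleCountry {where} ORDER BY productid
"""

conn = sqlite3.connect(":memory:")
single_country = conn.execute(QUERY.format(where="")).fetchall()
uk_only = conn.execute(QUERY.format(where="WHERE ordercountry = 'uk'")).fetchall()
print(single_country)
print(uk_only)
```

Without the filter, Strawberry (usa only) is also returned, as the answer notes; Banana drops out because neither of its rows ranks first from both ends.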
Hmm. Based on Philip's earlier approach, try adding something like this to exclude rows where there's been the same product ordered in another country:
SELECT pr.Id, pr.ProductName, od.Id, od.OrderCountry
from Products pr
inner join LinkTable lt
on lt.ProductId = pr.ID
inner join Orders od
on od.ID = lt.OrderId
where
od.OrderCountry = 'UK'
AND NOT EXISTS
(
SELECT
*
FROM
Products MatchingProducts
inner join LinkTable lt
on lt.ProductId = MatchingProducts.ID
inner join Orders OrdersFromOtherCountries
on OrdersFromOtherCountries.ID = lt.OrderId
WHERE
MatchingProducts.ID = Pr.ID AND
OrdersFromOtherCountries.OrderCountry != od.OrderCountry
)
;WITH mytable (productid,ordercountry)
AS
(SELECT productid, ordercountry
FROM Orders od INNER JOIN LinkTable lt ON od.orderid = lt.OrderId)
SELECT * FROM mytable
INNER JOIN dbo.Products pr ON pr.productid = mytable.productid
WHERE pr.productid NOT IN (SELECT productid FROM mytable
GROUP BY productid
HAVING COUNT(DISTINCT ordercountry) > 1)
AND ordercountry = 'uk'
SELECT pr.Id, pr.ProductName, od.Id, od.OrderCountry
from Products pr
inner join LinkTable lt
on lt.ProductId = pr.ID
inner join Orders od
on od.ID = lt.OrderId
where od.OrderCountry = 'UK'
This probably isn't the most efficient way to do this, but ...
SELECT p.ProductName
FROM Product p
WHERE p.ProductId IN
(
SELECT DISTINCT ol.ProductId
FROM OrderLines ol
INNER JOIN [Order] o
ON ol.OrderId = o.OrderId
WHERE o.OrderCountry = 'uk'
)
AND p.ProductId NOT IN
(
SELECT DISTINCT ol.ProductId
FROM OrderLines ol
INNER JOIN [Order] o
ON ol.OrderId = o.OrderId
WHERE o.OrderCountry != 'uk'
)
Test data:
create table product
(
ProductId int,
ProductName nvarchar(50)
)
go
create table [order]
(
OrderId int,
OrderCountry nvarchar(50)
)
go
create table OrderLines
(
OrderId int,
ProductId int
)
go
insert into Product VALUES (1, 'Chocolate')
insert into Product VALUES (2, 'Banana')
insert into Product VALUES (3, 'Strawberry')
insert into [order] values (1, 'uk')
insert into [order] values (2, 'uk')
insert into [order] values (3, 'usa')
insert into [order] values (4, 'usa')
insert into [orderlines] values (1, 1)
insert into [orderlines] values (2, 2)
insert into [orderlines] values (3, 2)
insert into [orderlines] values (4, 3)
insert into [orderlines] values (3, 2)
insert into [orderlines] values (3, 3)
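The IN / NOT IN query from this answer can be run against the test data with SQLite via Python's sqlite3 (an assumption; SQLite accepts the [order] bracket quoting for compatibility with SQL Server). The duplicate OrderLines row is kept to show it does no harm here.

```python
import sqlite3

# The answer's test data, including its duplicate (3, 2) OrderLines row.
SETUP = """
CREATE TABLE product (ProductId INT, ProductName TEXT);
CREATE TABLE [order] (OrderId INT, OrderCountry TEXT);
CREATE TABLE OrderLines (OrderId INT, ProductId INT);
INSERT INTO product VALUES (1, 'Chocolate'), (2, 'Banana'), (3, 'Strawberry');
INSERT INTO [order] VALUES (1, 'uk'), (2, 'uk'), (3, 'usa'), (4, 'usa');
INSERT INTO OrderLines VALUES (1, 1), (2, 2), (3, 2), (4, 3), (3, 2), (3, 3);
"""

QUERY = """
SELECT p.ProductName
FROM product p
WHERE p.ProductId IN (      -- ordered at least once in the uk
    SELECT DISTINCT ol.ProductId
    FROM OrderLines ol INNER JOIN [order] o ON ol.OrderId = o.OrderId
    WHERE o.OrderCountry = 'uk')
  AND p.ProductId NOT IN (  -- and never ordered anywhere else
    SELECT DISTINCT ol.ProductId
    FROM OrderLines ol INNER JOIN [order] o ON ol.OrderId = o.OrderId
    WHERE o.OrderCountry != 'uk')
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SETUP)
names = [name for (name,) in conn.execute(QUERY)]
print(names)
```

Banana is ordered in the uk but also in order 3 (usa), and Strawberry appears only in usa orders, so only Chocolate remains.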