Join query for multiple tables with condition - sql

I'm using MS SQL Server and I'm having a difficult time getting the results I want from a SELECT query. I have 3 tables.
First table Product
second table Seller
third table Customer
(data about customers - buyers and sellers).
select * from Product;
id(PK) | name_product
----------------------
1 | apple
2 | orange
3 | juice
select * from Seller;
id_seller(PK) | id_product | product_placement_date
---------------------------------------------------
45 | 3 | 2020-01-09
46 | 3 | 2020-01-05
58 | 2 | 2020-02-08
49 | 2 | 2020-01-04
43 | 1 | 2020-01-06
select * from Customer;
id_customer(PK) | name_customer
---------------------------
43 | Alice
45 | Sam
46 | Katy
49 | Soul
58 | Fab
I want to select the name of each product and the first seller who placed that product (the one with the earliest placement date).
I've tried this:
SELECT C.name_product,
P.mindate,
P.name_customer
FROM Product AS C
CROSS APPLY(SELECT MIN(S.product_placement_date) as mindate,
T.name_customer
FROM Seller AS S
JOIN Customer AS T ON T.id_customer = S.id_seller
WHERE S.id_product = C.id) AS P
But I am not getting the correct result. I want results as shown below:
name_product | product_placement_date | name_customer
-----------------------------------------------------
apple | 2020-01-06 | Alice
orange | 2020-01-04 | Soul
juice | 2020-01-05 | Katy
Please advise

It looks like you may have an issue with the Seller table. It APPEARS that the seller ID is both the primary key and the foreign key to the Customer table. That means a seller could never sell another item, or the same item on another date, unless the primary key were the combination of all 3 columns: seller ID, product and date. I would expect the "Seller" table to really be a "SellING" table, more along the lines of
SellingID (PK) | id_seller | id_product | product_placement_date
---------------------------------------------------
1 | 45 | 3 | 2020-01-09
2 | 46 | 3 | 2020-01-05
3 | 58 | 2 | 2020-02-08
4 | 49 | 2 | 2020-01-04
5 | 43 | 1 | 2020-01-06
The next consideration is what happens if two or more people list oranges on the same day. In your original table there is no way to tell who listed theirs first. Or would you want ALL the people who listed the product on the earliest date, in which case both names could be shown? With a "selling" table whose "sellingid" column is an auto-increment, you can KNOW who was first from the lowest SELLINGID for a given product, because somebody has to commit their record first, even on the same day. Then you might end up with something like
select
p.name_product,
S2.product_placement_date,
c.name_customer
from
( select id_product,
min( sellingid ) as FirstListedID
from
selling
group by
id_product ) First
join selling S2
on First.FirstListedID = s2.sellingID
join customer c
on S2.id_seller = c.id_customer
join product p
on S2.id_product = p.id
Here, the pre-query aliased as "First" produces one row per product holding the lowest selling ID for that product, regardless of date. As explained above, the auto-increment decides the order when several people list on the same date.
Once that is done, re-join to the selling table on that first ID. Then you can join out to product and customer for the final details.
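For anyone who wants to try the idea, here is a minimal sketch of the "lowest sellingid per product" approach. It is not SQL Server: it uses Python's built-in sqlite3 module as a stand-in, and the rows in the selling table are hypothetical (two sellers listing oranges on the same day) so that the tie-break is visible. The alias first_listing is my own name for the pre-query.

```python
import sqlite3


def first_listing_per_product():
    # In-memory SQLite stands in for SQL Server; the join logic is the same.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE product  (id INTEGER PRIMARY KEY, name_product TEXT);
        CREATE TABLE customer (id_customer INTEGER PRIMARY KEY, name_customer TEXT);
        CREATE TABLE selling  (sellingid INTEGER PRIMARY KEY AUTOINCREMENT,
                               id_seller INTEGER,
                               id_product INTEGER,
                               product_placement_date TEXT);
        INSERT INTO product  VALUES (1, 'apple'), (2, 'orange');
        INSERT INTO customer VALUES (43, 'Alice'), (49, 'Soul'), (58, 'Fab');
        -- Hypothetical rows: Soul and Fab both list oranges on the same day.
        -- Soul's row is inserted first, so it gets the lower sellingid.
        INSERT INTO selling (id_seller, id_product, product_placement_date) VALUES
            (49, 2, '2020-01-04'),
            (58, 2, '2020-01-04'),
            (43, 1, '2020-01-06');
    """)
    rows = conn.execute("""
        SELECT p.name_product, s2.product_placement_date, c.name_customer
        FROM (SELECT id_product, MIN(sellingid) AS first_listed_id
              FROM selling
              GROUP BY id_product) AS first_listing
        JOIN selling  AS s2 ON s2.sellingid = first_listing.first_listed_id
        JOIN customer AS c  ON c.id_customer = s2.id_seller
        JOIN product  AS p  ON p.id = s2.id_product
        ORDER BY p.id
    """).fetchall()
    conn.close()
    return rows


print(first_listing_per_product())
```

Note that this only finds the earliest listing if rows are inserted in chronological order. If listings can be back-dated or loaded out of order, it is probably safer to order by the date first and use sellingid only as the tie-breaker.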

SELECT P.name_product,
S.product_placement_date,
S.name_customer
FROM Product AS P
CROSS APPLY(SELECT TOP 1 S.product_placement_date,
C.name_customer
FROM Seller AS S
INNER JOIN Customer AS C ON C.id_customer = S.id_seller
WHERE S.id_product = P.id
ORDER BY S.product_placement_date) AS S
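As a quick cross-check against the sample data in the question, here is a sketch that gets the same rows with a correlated MIN() subquery instead of CROSS APPLY. It runs on Python's built-in sqlite3 module (SQLite has no CROSS APPLY), so treat it as a portable variant and not as the query above. One difference to be aware of: if two sellers share the earliest date, this version returns both, while TOP 1 returns only one of them.

```python
import sqlite3


def first_seller_per_product():
    conn = sqlite3.connect(":memory:")
    # Sample data copied from the question.
    conn.executescript("""
        CREATE TABLE Product  (id INTEGER PRIMARY KEY, name_product TEXT);
        CREATE TABLE Seller   (id_seller INTEGER PRIMARY KEY,
                               id_product INTEGER,
                               product_placement_date TEXT);
        CREATE TABLE Customer (id_customer INTEGER PRIMARY KEY, name_customer TEXT);
        INSERT INTO Product  VALUES (1, 'apple'), (2, 'orange'), (3, 'juice');
        INSERT INTO Seller   VALUES (45, 3, '2020-01-09'), (46, 3, '2020-01-05'),
                                    (58, 2, '2020-02-08'), (49, 2, '2020-01-04'),
                                    (43, 1, '2020-01-06');
        INSERT INTO Customer VALUES (43, 'Alice'), (45, 'Sam'), (46, 'Katy'),
                                    (49, 'Soul'), (58, 'Fab');
    """)
    rows = conn.execute("""
        SELECT P.name_product, S.product_placement_date, C.name_customer
        FROM Product AS P
        JOIN Seller   AS S ON S.id_product = P.id
        JOIN Customer AS C ON C.id_customer = S.id_seller
        -- keep only the listing(s) on the earliest date for this product
        WHERE S.product_placement_date = (SELECT MIN(S2.product_placement_date)
                                          FROM Seller AS S2
                                          WHERE S2.id_product = P.id)
        ORDER BY P.id
    """).fetchall()
    conn.close()
    return rows


for row in first_seller_per_product():
    print(row)
```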

Related

How to write this query to avoid cartesian product?

I want to create a CSV export for orders showing the warehouse_id where each order_item had shipped from, if available.
For brevity, here is the pertinent schema:
create table o (id integer);
orders have many order_items:
create table oi (id integer, o_id integer, sku text, quantity integer);
For each order_item in the CSV we want to show the warehouse_id it shipped from. But that is not stored in order_items; it is stored in the shipment.
An order can be split up into many shipments, potentially from different warehouses.
create table s (id integer, o_id integer, warehouse_id integer);
shipments have many shipment items too:
create table si (id integer, s_id integer, oi_id integer, quantity_shipped integer);
How do I extract the warehouse_id for each order_item, given that warehouse_id is on the shipment and not every order has shipped yet (it may not have a shipment record or shipment_items)?
We are doing something like this (simplified):
select oi.sku, s.warehouse_id from oi
left join s on s.o_id = oi.o_id;
However, suppose an order has 2 order items, let's call them sku A and B, and that order was split into two shipments: A was shipped from warehouse '50', and then a second shipment shipped B from '200'.
What we want would be a CSV output like:
sku | warehouse_id
-----|--------------
A | 50
B | 200
But what we get is some kind of cartesian product:
=================================
Here is the sample data:
select * from o;
id
----
1
(1 row)
select * from oi;
id | o_id | sku | quantity
----+------+-----+----------
1 | 1 | A | 1
2 | 1 | B | 1
(2 rows)
select * from s;
id | o_id | warehouse_id
----+------+--------------
1 | 1 | 50
2 | 1 | 200
(2 rows)
select * from si;
id | s_id | oi_id
----+------+------
1 | 1 | 1
2 | 2 | 2
(2 rows)
select oi.sku, s.warehouse_id from oi left join s on s.o_id = oi.o_id;
sku | warehouse_id
-----+--------------
A | 50
A | 200
B | 50
B | 200
(4 rows)
UPDATE ========
Per spencer, I'm adding a different example with different pk ids for more clarity. The following is 2 example orders. Order 2 has items A,B,C. A,B are shipped from shipment 200, C is shipped from shipment 201. Order 3 has 2 items E and A. E is not yet shipped and A is shipped twice out of the same warehouse '700', (like it was on back order).
# select * from o;
id
----
2
3
(2 rows)
# select * from oi;
id | o_id | sku | quantity
-----+------+-----+----------
100 | 2 | A | 1
101 | 2 | B | 1
102 | 2 | C | 1
103 | 3 | E | 1
104 | 3 | A | 2
(5 rows)
# select * from s;
id | o_id | warehouse_id
-----+------+--------------
200 | 2 | 700
201 | 2 | 800
202 | 3 | 700
203 | 3 | 700
(4 rows)
# select * from si;
id | s_id | oi_id
-----+------+-------
300 | 200 | 100
301 | 200 | 101
302 | 201 | 102
303 | 202 | 104
304 | 203 | 104
(5 rows)
I think this works. I use left join to keep the order_items in the report whether or not the order has shipped, and I use group by to squash multiple shipments from the same warehouse. I believe this is what I need.
# select oi.o_id, oi.id, oi.sku, s.warehouse_id from oi left join si on si.oi_id = oi.id left join s on s.id = si.s_id group by oi.o_id, oi.id, oi.sku, s.warehouse_id order by oi.o_id;
o_id | id | sku | warehouse_id
------+-----+-----+--------------
2 | 102 | C | 800
2 | 101 | B | 700
2 | 100 | A | 700
3 | 104 | A | 700
3 | 103 | E |
(5 rows)
Order items that have shipped ...
SELECT oi.id
, oi.sku
, s.warehouse_id
FROM oi
JOIN si ON si.oi_id = oi.id
JOIN s ON s.id = si.s_id
Order items that haven't yet shipped, using anti-join to exclude rows where there is a matching row in si
SELECT oi.id
, oi.sku
, s.warehouse_id
FROM oi
JOIN s ON s.o_id = oi.o_id -- fk to fk shortcut join
-- anti-join
LEFT
JOIN si ON si.oi_id = oi.id
WHERE si.oi_id IS NULL
But this will still produce a (partial) Cartesian product. We can add a GROUP BY clause to collapse the rows...
GROUP BY oi.id
This doesn't avoid producing an intermediate Cartesian product; the GROUP BY clause only collapses the set afterwards. And it's indeterminate which of the matching rows from s the returned column values will come from.
The two queries could be combined with a UNION ALL operation. If I did that, I'd likely add a discriminator column (an additional column in each query with different values, which would tell which query returned a row.)
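A rough sketch of what that UNION ALL with a discriminator column could look like, run against the sample data from the UPDATE in the question. Two assumptions of mine: it uses Python's built-in sqlite3 module as a stand-in database, and the unshipped branch returns NULL for warehouse_id instead of borrowing a warehouse from another shipment of the same order. The column name source is hypothetical.

```python
import sqlite3


def items_with_shipping_status():
    conn = sqlite3.connect(":memory:")
    # Sample data from the UPDATE section of the question.
    conn.executescript("""
        CREATE TABLE oi (id INTEGER, o_id INTEGER, sku TEXT, quantity INTEGER);
        CREATE TABLE s  (id INTEGER, o_id INTEGER, warehouse_id INTEGER);
        CREATE TABLE si (id INTEGER, s_id INTEGER, oi_id INTEGER);
        INSERT INTO oi VALUES (100, 2, 'A', 1), (101, 2, 'B', 1), (102, 2, 'C', 1),
                              (103, 3, 'E', 1), (104, 3, 'A', 2);
        INSERT INTO s  VALUES (200, 2, 700), (201, 2, 800), (202, 3, 700), (203, 3, 700);
        INSERT INTO si VALUES (300, 200, 100), (301, 200, 101), (302, 201, 102),
                              (303, 202, 104), (304, 203, 104);
    """)
    rows = conn.execute("""
        -- items that have shipped: go through the shipment items
        SELECT oi.id, oi.sku, s.warehouse_id, 'shipped' AS source
        FROM oi
        JOIN si ON si.oi_id = oi.id
        JOIN s  ON s.id = si.s_id
        UNION ALL
        -- items with no shipment item yet (anti-join), warehouse unknown
        SELECT oi.id, oi.sku, NULL, 'not shipped'
        FROM oi
        LEFT JOIN si ON si.oi_id = oi.id
        WHERE si.oi_id IS NULL
        ORDER BY 1
    """).fetchall()
    conn.close()
    return rows


for row in items_with_shipping_status():
    print(row)
```

Item 104 shows up twice because it went out in two shipments. Whether that should be squashed (DISTINCT or GROUP BY) depends on what the CSV is supposed to mean.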
This set might meet the specification outlined in the OP question. But I don't think this is really the set that needs to be returned. Figuring out which warehouse an item should ship from may involve several factors... total quantity ordered, quantity available in each warehouse, can order be fulfilled from one warehouse, which warehouse is closer to delivery destination, etc.
I don't want to leave anyone with the impression that this query is really a "fix" for the cartesian product problem... this query just hides a bigger problem.
I think you need the si table:
select oi.sku, s.warehouse_id
from si join
oi
on si.oi_id = oi.id join
s
on s.id = si.s_id;
si seems to be the proper junction table between the tables. I'm not sure why there is another join key that doesn't use it.
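To see the difference on the first sample data set from the question, here is a small sketch comparing the order-level join with the join through si. It uses Python's built-in sqlite3 module as a stand-in database, and the join columns are taken from the posted schema (si.oi_id points at oi.id, si.s_id points at s.id).

```python
import sqlite3


def compare_joins():
    conn = sqlite3.connect(":memory:")
    # First sample data set from the question: one order, two items, two shipments.
    conn.executescript("""
        CREATE TABLE oi (id INTEGER, o_id INTEGER, sku TEXT, quantity INTEGER);
        CREATE TABLE s  (id INTEGER, o_id INTEGER, warehouse_id INTEGER);
        CREATE TABLE si (id INTEGER, s_id INTEGER, oi_id INTEGER);
        INSERT INTO oi VALUES (1, 1, 'A', 1), (2, 1, 'B', 1);
        INSERT INTO s  VALUES (1, 1, 50), (2, 1, 200);
        INSERT INTO si VALUES (1, 1, 1), (2, 2, 2);
    """)
    # Joining on the order id pairs every item with every shipment of the order.
    by_order = conn.execute("""
        SELECT oi.sku, s.warehouse_id
        FROM oi
        LEFT JOIN s ON s.o_id = oi.o_id
        ORDER BY oi.sku, s.warehouse_id
    """).fetchall()
    # Going through si pairs each item only with the shipment it was on.
    via_si = conn.execute("""
        SELECT oi.sku, s.warehouse_id
        FROM si
        JOIN oi ON si.oi_id = oi.id
        JOIN s  ON s.id = si.s_id
        ORDER BY oi.sku
    """).fetchall()
    conn.close()
    return by_order, via_si


by_order, via_si = compare_joins()
print(by_order)
print(via_si)
```

With inner joins, items that have not shipped drop out of the result; starting from oi and using LEFT JOIN to si and s would keep them with an empty warehouse_id.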

HQL: DISTINCT Issue

Say I have a table of Customer and the Vendor the customer visited, with each row being a distinct visit by a certain customer to a vendor.
Row | Customer | Vendor
1 | 1 | 001
2 | 1 | 001
3 | 1 | 002
4 | 2 | 001
My question is: how can I write a query to show every distinct customer/vendor combination? For the above table, I'd like to see output of:
Row | Customer | Vendor
1 | 1 | 001
2 | 1 | 002
3 | 2 | 001
You can simply use the DISTINCT keyword, assuming that the Row column is just for illustration here and not part of the actual table:
SELECT DISTINCT customer, vendor
FROM table
You can use group by:
select min(row) as row, Customer, Vendor
from table t
group by Customer, Vendor;
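Here is a small sketch of both suggestions side by side on the sample rows. It is plain SQL run through Python's built-in sqlite3 module, not HQL, and the table name visits and the column name row_id are my own placeholders (row and table are awkward as real identifiers).

```python
import sqlite3


def distinct_visits():
    conn = sqlite3.connect(":memory:")
    # vendor is stored as text so that the leading zeros survive
    conn.executescript("""
        CREATE TABLE visits (row_id INTEGER, customer INTEGER, vendor TEXT);
        INSERT INTO visits VALUES (1, 1, '001'), (2, 1, '001'),
                                  (3, 1, '002'), (4, 2, '001');
    """)
    with_distinct = conn.execute("""
        SELECT DISTINCT customer, vendor
        FROM visits
        ORDER BY customer, vendor
    """).fetchall()
    with_group_by = conn.execute("""
        SELECT MIN(row_id) AS row_id, customer, vendor
        FROM visits
        GROUP BY customer, vendor
        ORDER BY customer, vendor
    """).fetchall()
    conn.close()
    return with_distinct, with_group_by


with_distinct, with_group_by = distinct_visits()
print(with_distinct)
print(with_group_by)
```

The GROUP BY version keeps the lowest original row number of each pair (1, 3, 4 here), not a fresh 1, 2, 3 numbering as in the desired output.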

Find supplier which supplies a product that others don't

I'm trying to write an SQL query which selects the supplier based on the fact that it can supply a product other suppliers cannot.
I have 2 columns:
Supplier and Product
How would I select all the suppliers which supply at least 1 product which other suppliers do not supply?
I currently have:
SELECT incart.product, incart.supplier
FROM incart
WHERE incart.product
HAVING count(incart.supplier)=1
;
Try this:
SELECT
i1.supplier
FROM incart i1
WHERE i1.product NOT IN(SELECT product
FROM incart i2
WHERE i1.supplier <> i2.supplier);
For example, for the following sample data:
| PRODUCT | SUPPLIER |
|---------|----------|
| 1 | a |
| 2 | b |
| 3 | b |
| 2 | c |
| 3 | c |
| 4 | c |
It will select suppliers a and c, because supplier a supplies product 1 which others don't, and supplier c supplies product 4 which others don't.
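If you want to check this locally, here is a sketch that runs the NOT IN query on the sample data, next to a GROUP BY / HAVING variant that is closer to what the question was attempting. It uses Python's built-in sqlite3 module as a stand-in database. I added DISTINCT to both so that a supplier with several exclusive products is listed once; that is my addition, not part of the query above.

```python
import sqlite3


def exclusive_suppliers():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE incart (product INTEGER, supplier TEXT);
        INSERT INTO incart VALUES (1, 'a'), (2, 'b'), (3, 'b'),
                                  (2, 'c'), (3, 'c'), (4, 'c');
    """)
    # A product qualifies if no *other* supplier also offers it.
    not_in = conn.execute("""
        SELECT DISTINCT i1.supplier
        FROM incart i1
        WHERE i1.product NOT IN (SELECT i2.product
                                 FROM incart i2
                                 WHERE i2.supplier <> i1.supplier)
        ORDER BY i1.supplier
    """).fetchall()
    # Same idea via aggregation: products that have exactly one supplier.
    having = conn.execute("""
        SELECT DISTINCT supplier
        FROM incart
        WHERE product IN (SELECT product
                          FROM incart
                          GROUP BY product
                          HAVING COUNT(DISTINCT supplier) = 1)
        ORDER BY supplier
    """).fetchall()
    conn.close()
    return [r[0] for r in not_in], [r[0] for r in having]


print(exclusive_suppliers())
```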

How to get a MAX and a COUNT from a three table join?

I got an interview question where there's a Car sale modeled in a DB. Each Car represents a physical car in a Car sale which refers to a Make and a Model table. A Sale table keeps track of each Car that is sold. A Sale only consists of one Car, so there's a record in Sale per every unique Car that had been sold.
The question was to find-out the name of the most sold Model in the car sale. I answered with a 3-level nested query. The interviewer specifically asked for a solution using joins where I only succeeded in just joining the tables without the aggregates.
How would you join 3 tables as below (Car, Make, Sale) while using two other aggregates?
Here's a rough sketch of the schema. The most sold Model here should return 'Corolla'
Car
| carid| modid | etc...
_________________
| 1 | 1 |
| 2 | 1 |
| 3 | 1 |
| 4 | 2 |
| 5 | 2 |
Make
| mkid | name |
_________________
| 1 | Toyota |
| 2 | Nissan |
| 3 | Chevy |
| 4 | Merc |
| 5 | Ford |
Model
| modid| name | mkid |
________________________
| 1 | Corolla| 1
| 2 | Sunny | 2
| 3 | Carina | 1
| 4 | Skyline| 2
| 5 | Focus | 5
Sale
| sid | carid | etc...
_________________
| 1 | 1 |
| 2 | 2 |
| 3 | 3 |
| 4 | 4 |
| 5 | 5 |
Edit:
Using MS SQL Server 2008
Output needed:
Model Name | Count
_____________________
Corolla | 3
i.e. The model of the Car that has been sold the most.
Notice that only 3 Corollas and 2 Sunnys are in the Car table, while the Sale table has a row for each of those with other sale details. The 5 Sale records are actually Corolla, Corolla, Corolla, Sunny and Sunny.
Since you are using SQL Server 2008, make use of Common Table Expression and Window Function.
WITH recordList
AS
(
SELECT c.name, COUNT(*) [Count],
DENSE_RANK() OVER (ORDER BY COUNT(*) DESC) rn
FROM Sale a
INNER JOIN Car b
ON a.carid = b.carID
INNER JOIN Model c
ON b.modID = c.modID
GROUP BY c.Name
)
SELECT name, [Count]
FROM recordList
WHERE rn = 1
When interviewers ask for this they usually want you to say that you'd use windowed functions. You could give each sale a unique ascending number partitioned by model and the highest sale number you'd get would be the max count.
http://www.postgresql.org/docs/9.1/static/tutorial-window.html
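A minimal sketch of that numbering idea, using the sample tables from the question. Assumptions: it runs on Python's built-in sqlite3 module, which needs SQLite 3.25 or newer for window functions, and the names numbered and sale_no are hypothetical.

```python
import sqlite3


def most_sold_model():
    conn = sqlite3.connect(":memory:")
    # Sample data from the question (Make is not needed for the model name).
    conn.executescript("""
        CREATE TABLE Car   (carid INTEGER, modid INTEGER);
        CREATE TABLE Model (modid INTEGER, name TEXT, mkid INTEGER);
        CREATE TABLE Sale  (sid INTEGER, carid INTEGER);
        INSERT INTO Car   VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
        INSERT INTO Model VALUES (1, 'Corolla', 1), (2, 'Sunny', 2), (3, 'Carina', 1),
                                 (4, 'Skyline', 2), (5, 'Focus', 5);
        INSERT INTO Sale  VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5);
    """)
    row = conn.execute("""
        WITH numbered AS (
            -- number the sales 1, 2, 3, ... separately within each model
            SELECT m.name AS name,
                   ROW_NUMBER() OVER (PARTITION BY m.modid ORDER BY s.sid) AS sale_no
            FROM Sale s
            JOIN Car   c ON c.carid = s.carid
            JOIN Model m ON m.modid = c.modid
        )
        -- the highest number handed out is the count for the best seller
        SELECT name, sale_no
        FROM numbered
        ORDER BY sale_no DESC
        LIMIT 1
    """).fetchone()
    conn.close()
    return row


print(most_sold_model())
```

If two models tie for the top spot, this returns only one of them, and which one is not defined.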
The following query works on Oracle 11g:
SELECT name FROM (
SELECT model.name AS name FROM car , sale , model
WHERE car.carid=sale.carid
AND car.modid=model.modid
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
Or
SELECT name FROM (
SELECT model.name AS name FROM car natural join sale natural join model
GROUP BY model.name
ORDER BY count(*) DESC )
WHERE rownum = 1;
OUTPUT
| NAME |
-----------
| Corolla |
This is based on your newly added SQL Server 2008 tag. If you are using a different RDBMS you'll probably need to use limit instead of top and place it at the end of the top_sold_car subquery.
select Make.name as Make, Model.name as Model
from (
-- modid has to be selected here so the outer join can use it
select top 1 modid, count(*) as num_sold
from Car
group by modid
order by num_sold desc) as top_sold_car
join Model
on (top_sold_car.modid = Model.modid)
join Make
on (Model.mkid = Make.mkid)
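For reference, a sketch of the limit variant mentioned above, run on the sample data through Python's built-in sqlite3 module. Two things differ from the query as posted and are my own choices: the subquery counts rows from Sale joined to Car (so unsold cars would not be counted), and it selects modid so the outer join has a column to join on.

```python
import sqlite3


def top_sold_make_and_model():
    conn = sqlite3.connect(":memory:")
    # Sample data from the question.
    conn.executescript("""
        CREATE TABLE Car   (carid INTEGER, modid INTEGER);
        CREATE TABLE Make  (mkid INTEGER, name TEXT);
        CREATE TABLE Model (modid INTEGER, name TEXT, mkid INTEGER);
        CREATE TABLE Sale  (sid INTEGER, carid INTEGER);
        INSERT INTO Car   VALUES (1, 1), (2, 1), (3, 1), (4, 2), (5, 2);
        INSERT INTO Make  VALUES (1, 'Toyota'), (2, 'Nissan'), (3, 'Chevy'),
                                 (4, 'Merc'), (5, 'Ford');
        INSERT INTO Model VALUES (1, 'Corolla', 1), (2, 'Sunny', 2), (3, 'Carina', 1),
                                 (4, 'Skyline', 2), (5, 'Focus', 5);
        INSERT INTO Sale  VALUES (1, 1), (2, 2), (3, 3), (4, 4), (5, 5);
    """)
    row = conn.execute("""
        SELECT Make.name AS make, Model.name AS model, top_sold_car.num_sold
        FROM (SELECT Car.modid AS modid, COUNT(*) AS num_sold
              FROM Sale
              JOIN Car ON Car.carid = Sale.carid
              GROUP BY Car.modid
              ORDER BY num_sold DESC
              LIMIT 1) AS top_sold_car
        JOIN Model ON top_sold_car.modid = Model.modid
        JOIN Make  ON Model.mkid = Make.mkid
    """).fetchone()
    conn.close()
    return row


print(top_sold_make_and_model())
```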

Get SUM in GROUP BY with JOIN using MySQL

I have two tables in MySQL 5.1.38.
products
+----+------------+-------+------------+
| id | name | price | department |
+----+------------+-------+------------+
| 1 | Fire Truck | 15.00 | Toys |
| 2 | Bike | 75.00 | Toys |
| 3 | T-Shirt | 18.00 | Clothes |
| 4 | Skirt | 18.00 | Clothes |
| 5 | Pants | 22.00 | Clothes |
+----+------------+-------+------------+
ratings
+------------+--------+
| product_id | rating |
+------------+--------+
| 1 | 5 |
| 2 | 5 |
| 2 | 3 |
| 2 | 5 |
| 3 | 5 |
| 4 | 5 |
| 5 | 4 |
+------------+--------+
My goal is to get the total price of all products which have a 5 star rating in each department. Something like this.
+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes | 36.00 | /* T-Shirt and Skirt */
| Toys | 90.00 | /* Fire Truck and Bike */
+------------+-------------+
I would like to do this without a subquery if I can. At first I tried a join with a sum().
select department, sum(price) from products
join ratings on product_id=products.id
where rating=5 group by department;
+------------+------------+
| department | sum(price) |
+------------+------------+
| Clothes | 36.00 |
| Toys | 165.00 |
+------------+------------+
As you can see, the total for the Toys department is incorrect: the Bike has two 5 star ratings, so its price is counted twice by the join.
I then tried adding distinct to the sum.
select department, sum(distinct price) from products
join ratings on product_id=products.id where rating=5
group by department;
+------------+---------------------+
| department | sum(distinct price) |
+------------+---------------------+
| Clothes | 18.00 |
| Toys | 90.00 |
+------------+---------------------+
But then the clothes department is off because two products share the same price.
Currently my work-around involves taking something unique about the product (the id) and using that to make the price unique.
select department, sum(distinct price + id * 100000) - sum(id * 100000) as total_price
from products join ratings on product_id=products.id
where rating=5 group by department;
+------------+-------------+
| department | total_price |
+------------+-------------+
| Clothes | 36.00 |
| Toys | 90.00 |
+------------+-------------+
But this feels like such a silly hack. Is there a better way to do this without a subquery? Thanks!
Use:
SELECT p.department,
SUM(p.price) AS total_price
FROM PRODUCTS p
JOIN (SELECT DISTINCT
r.product_id,
r.rating
FROM RATINGS r) x ON x.product_id = p.id
AND x.rating = 5
GROUP BY p.department
Technically, this does not use a subquery - it uses a derived table/inline view.
The primary reason you are having trouble finding a solution is that the schema as presented is fundamentally flawed. You shouldn't allow a table to have two rows that are complete duplicates of each other. Every table should have a means to uniquely identify each row even if it is the combination of all columns. Now, if we change the ratings table so that it has an AUTO_INCREMENT column called Id, the problem is easier:
Select products.department, Sum(price) As total_price
From products
Join ratings As R1
On R1.product_id = products.id
And R1.rating = 5
Left Join ratings As R2
On R2.product_id = R1.product_id
And R2.rating = R1.rating
And R2.Id > R1.Id
Where R2.Id Is Null
Group By products.department
You can do two queries. First query:
SELECT DISTINCT product_id FROM ratings WHERE rating = 5;
Then, take each of those ID's and manually put them in the second query:
SELECT department, Sum(price) AS total_price
FROM products
WHERE id In (1,2,3,4)
GROUP BY department;
This is the work-around for not being able to use subqueries. Without them, there is no way to eliminate the duplicate records caused by the join.
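If the two queries are issued from application code, the "manual" step can be automated. Below is a sketch of that in Python with the built-in sqlite3 module standing in for MySQL; the function names are hypothetical, and with a MySQL driver the placeholder style would likely be %s instead of ?.

```python
import sqlite3


def make_sample_db():
    # Sample data from the question, loaded into an in-memory database.
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER, name TEXT, price REAL, department TEXT);
        CREATE TABLE ratings  (product_id INTEGER, rating INTEGER);
        INSERT INTO products VALUES (1, 'Fire Truck', 15.00, 'Toys'),
                                    (2, 'Bike',       75.00, 'Toys'),
                                    (3, 'T-Shirt',    18.00, 'Clothes'),
                                    (4, 'Skirt',      18.00, 'Clothes'),
                                    (5, 'Pants',      22.00, 'Clothes');
        INSERT INTO ratings VALUES (1, 5), (2, 5), (2, 3), (2, 5), (3, 5), (4, 5), (5, 4);
    """)
    return conn


def total_price_of_five_star_products(conn):
    # First query: the distinct ids of products with at least one 5 star rating.
    ids = [row[0] for row in conn.execute(
        "SELECT DISTINCT product_id FROM ratings WHERE rating = 5 ORDER BY product_id")]
    if not ids:
        return []  # avoid generating an empty IN () list
    # Second query: one bound parameter per id, no subquery involved.
    placeholders = ", ".join("?" for _ in ids)
    query = (
        "SELECT department, SUM(price) AS total_price "
        "FROM products "
        f"WHERE id IN ({placeholders}) "
        "GROUP BY department ORDER BY department"
    )
    return conn.execute(query, ids).fetchall()


print(total_price_of_five_star_products(make_sample_db()))
```

This costs an extra round trip and the id list can get long, so it is probably only reasonable when the number of matching products is small.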
I can't think of any way to do it without a subquery somewhere in the query. You could perhaps use a View to mask the use of a subquery.
Barring that, your best bet is probably to find the minimum data set needed to make the calculation and do that in the front end. Whether or not that's possible depends on your specific data - how many rows, etc.
The other option (actually, maybe this is the best one...) would be to get a new ORM or do without it altogether ;)
This view would allow you to bypass the subquery:
CREATE VIEW Distinct_Product_Ratings
AS
SELECT DISTINCT
product_id,
rating
FROM
Ratings
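A sketch of how the view might then be used, so that the main query is a plain join with no subquery in its text. It runs on Python's built-in sqlite3 module as a stand-in for MySQL, with the sample data from the question.

```python
import sqlite3


def totals_via_view():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE products (id INTEGER, name TEXT, price REAL, department TEXT);
        CREATE TABLE ratings  (product_id INTEGER, rating INTEGER);
        INSERT INTO products VALUES (1, 'Fire Truck', 15.00, 'Toys'),
                                    (2, 'Bike',       75.00, 'Toys'),
                                    (3, 'T-Shirt',    18.00, 'Clothes'),
                                    (4, 'Skirt',      18.00, 'Clothes'),
                                    (5, 'Pants',      22.00, 'Clothes');
        INSERT INTO ratings VALUES (1, 5), (2, 5), (2, 3), (2, 5), (3, 5), (4, 5), (5, 4);

        -- the view collapses duplicate (product_id, rating) pairs
        CREATE VIEW Distinct_Product_Ratings AS
        SELECT DISTINCT product_id, rating
        FROM ratings;
    """)
    rows = conn.execute("""
        SELECT p.department, SUM(p.price) AS total_price
        FROM products p
        JOIN Distinct_Product_Ratings r ON r.product_id = p.id
        WHERE r.rating = 5
        GROUP BY p.department
        ORDER BY p.department
    """).fetchall()
    conn.close()
    return rows


print(totals_via_view())
```

The database still evaluates the DISTINCT behind the scenes, so this hides the subquery from the query text (and from an ORM) without removing the work.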