Using sql to sum with multiple table calls - sql

I'll get down to the point. So basically I have 3 tables structured as follows:
orders:
i_id | o_id | quantity
-----+--------+----------
1 | 1 | 5
2 | 2 | 2
1 | 3 | 3
1 | 4 | 3
2 | 5 | 4
orderinfos:
o_id | c_id
------+------------
1 | 1
2 | 2
3 | 2
4 | 1
5 | 2
customers:
c_id | name_id
----------+----------
1 | 100001
2 | 100002
then the resulting chart would be:
name_id | i_id | quantity
-----------+----------+----------
100001 | 1 | 8
100002 | 2 | 6
100002 | 1 | 3
So basically, you have a list of orders with their quantities, where each order has an associated customer id and item id. The result should give the total quantity per customer, per item, ordered by customer and then by quantity in descending order. My first implementation was this:
select quantCust.custIdName, quantCust.itemId, quantCust.quant
from
(select O.i_id as itemId,
C.name_id as custIdName,
sum(O.quantity) as quant
from orders as O, orderinfos as I, customers as C
where O.o_id = I.o_id and I.c_id = C.c_id
group by O.i_id, I.c_id) as quantCust
order by quantCust.custId, quantCust.quant desc;
which does not print the correct values.

I think you're close with your approach, but I recommend using explicit JOIN syntax, and using the aggregate SUM (along with GROUP BY) to get your totals:
SELECT c.name_id, o.i_id, SUM(o.quantity) AS quant
FROM customers c
INNER JOIN orderinfos oi
ON c.c_id = oi.c_id
INNER JOIN orders o
ON oi.o_id = o.o_id
GROUP BY c.name_id, o.i_id
ORDER BY c.name_id, quant DESC
This works for me with your sample data, giving the desired output that you indicate.
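To try the join-and-aggregate approach against the sample data, here is a minimal sketch using Python's built-in sqlite3 module. The in-memory database and the variable name totals are assumptions made for illustration; the table and column names come from the question.

```python
import sqlite3

# In-memory database holding the three sample tables from the question.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders     (i_id INTEGER, o_id INTEGER, quantity INTEGER);
    CREATE TABLE orderinfos (o_id INTEGER, c_id INTEGER);
    CREATE TABLE customers  (c_id INTEGER, name_id INTEGER);
    INSERT INTO orders     VALUES (1,1,5),(2,2,2),(1,3,3),(1,4,3),(2,5,4);
    INSERT INTO orderinfos VALUES (1,1),(2,2),(3,2),(4,1),(5,2);
    INSERT INTO customers  VALUES (1,100001),(2,100002);
""")

# Explicit joins, then one SUM per (customer, item) group.
totals = conn.execute("""
    SELECT c.name_id, o.i_id, SUM(o.quantity) AS quant
    FROM customers c
    INNER JOIN orderinfos oi ON c.c_id = oi.c_id
    INNER JOIN orders o      ON oi.o_id = o.o_id
    GROUP BY c.name_id, o.i_id
    ORDER BY c.name_id, quant DESC
""").fetchall()

for row in totals:
    print(row)
```

On the sample data this should reproduce the three rows of the desired chart.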

Related

How to join tables only with the latest record in SQL SERVER [duplicate]

I want to list all customers with their latest phone number and most recent customer type.
The phone number and type of a customer change periodically, so I want only the latest record, based on the latestUpdate column, without getting old values.
Customer:
+------------+--------+-------+--------+
|latestUpdate| CustID | AddID | TypeID |
+------------+--------+-------+--------+
| 2020-03-01 | 1      | 1     | 1      |
| 2020-04-07 | 2      | 2     | 2      |
| 2020-06-13 | 3      | 3     | 3      |
| 2020-03-29 | 4      | 4     | 4      |
| 2020-02-06 | 5      | 5     | 5      |
+------------+--------+-------+--------+
CustomerAddress:
+------------+--------+-----------+
|latestUpdate| AddID | Mobile |
+------------+--------+-----------+
| 2020-03-01 | 1 | 66666 |
| 2020-04-07 | 1 | 55555 |
| 2020-06-13 | 2 | 99999 |
| 2020-03-29 | 3 | 11111 |
| 2020-02-06 | 3 | 22222 |
+------------+--------+-----------+
CustomerType:
+------------+--------+-----------+
|latestUpdate| TypeId | TypeName |
+------------+--------+-----------+
| 2020-03-01 | 1 | First |
| 2020-04-07 | 1 | Second |
| 2020-06-13 | 3 | Third |
| 2020-03-29 | 4 | Fourth |
| 2020-02-06 | 5 | Fifth |
+------------+--------+-----------+
When I try to join, I always get duplicated CustID values, not only the latest record.
I want to display Customer.CustID, CustomerType.TypeName and CustomerAddress.Mobile.
You need to make sub-queries for most recent customer type and latest phone number like this:
SELECT *
FROM (
SELECT latestUpdate, CustID, AddID, TypeID,
ROW_NUMBER() OVER (PARTITION BY CustID ORDER BY latestUpdate DESC) AS RowNumber
FROM Customer
) AS c
INNER JOIN (
SELECT latestUpdate, AddID, Mobile,
ROW_NUMBER() OVER (PARTITION BY AddId ORDER BY latestUpdate DESC) AS RowNumber
FROM CustomerAddress
) AS t
ON c.AddId = t.AddId
INNER JOIN CustomerType ct
ON ct.TypeId = c.TypeId
WHERE c.RowNumber = 1
AND t.RowNumber = 1
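As a rough way to check the "number the rows per key, keep row 1" idea against the sample data, here is a sketch using Python's sqlite3 module. It assumes an SQLite build with window-function support (3.25 or later) and, as a choice of this sketch rather than of the answer, ranks both CustomerAddress and CustomerType and uses LEFT JOIN so customers without a matching address or type are still listed.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Customer        (latestUpdate TEXT, CustID INTEGER, AddID INTEGER, TypeID INTEGER);
    CREATE TABLE CustomerAddress (latestUpdate TEXT, AddID INTEGER, Mobile TEXT);
    CREATE TABLE CustomerType    (latestUpdate TEXT, TypeId INTEGER, TypeName TEXT);
    INSERT INTO Customer VALUES
        ('2020-03-01',1,1,1),('2020-04-07',2,2,2),('2020-06-13',3,3,3),
        ('2020-03-29',4,4,4),('2020-02-06',5,5,5);
    INSERT INTO CustomerAddress VALUES
        ('2020-03-01',1,'66666'),('2020-04-07',1,'55555'),('2020-06-13',2,'99999'),
        ('2020-03-29',3,'11111'),('2020-02-06',3,'22222');
    INSERT INTO CustomerType VALUES
        ('2020-03-01',1,'First'),('2020-04-07',1,'Second'),('2020-06-13',3,'Third'),
        ('2020-03-29',4,'Fourth'),('2020-02-06',5,'Fifth');
""")

# rn = 1 marks the most recent row per AddID / TypeId (ISO dates sort as text).
latest_rows = conn.execute("""
    WITH addr AS (
        SELECT AddID, Mobile,
               ROW_NUMBER() OVER (PARTITION BY AddID ORDER BY latestUpdate DESC) AS rn
        FROM CustomerAddress
    ), typ AS (
        SELECT TypeId, TypeName,
               ROW_NUMBER() OVER (PARTITION BY TypeId ORDER BY latestUpdate DESC) AS rn
        FROM CustomerType
    )
    SELECT c.CustID, t.TypeName, a.Mobile
    FROM Customer c
    LEFT JOIN addr a ON a.AddID  = c.AddID  AND a.rn = 1
    LEFT JOIN typ  t ON t.TypeId = c.TypeID AND t.rn = 1
    ORDER BY c.CustID
""").fetchall()

for row in latest_rows:
    print(row)
```

Each customer should appear exactly once; customers whose AddID or TypeID has no match in the sample data come back with None in that column.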
A simpler way than using row_number would be using cross apply together with top 1 in an ordered subquery:
select c.CustId, p.Mobile
from Customer c
cross apply (
select top 1 Mobile
from CustomerAddress a
where c.AddID = a.AddID
order by a.latestUpdate desc
) p
You need to use some subqueries:
SELECT *
FROM Customer AS C
LEFT OUTER JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY AddID ORDER BY latestUpdate DESC) AS N
FROM CustomerAddress) AS A
ON C.AddID = A.AddID AND A.N = 1
LEFT OUTER JOIN (SELECT *, ROW_NUMBER() OVER(PARTITION BY TypeId ORDER BY latestUpdate DESC) AS N
FROM CustomerType) AS T
ON C.TypeID = T.TypeId AND T.N = 1
If you had used a temporal table, which is an ISO SQL standard feature for keeping the history of a table's data, you would always have the latest rows in the main table; old rows stay in a history table and can be queried for a point in time or a date interval.
This is it:
select CustID, Mobile, TypeName from (select a.CustID, b.Mobile, c.TypeName, RANK() OVER (
PARTITION BY a.CustID
ORDER BY b.latestUpdate DESC, c.latestUpdate DESC
) as rank1
from
Customer a
left join
CustomerAddress b
on
a.AddID=b.AddID
left join
CustomerType c
on
a.TypeId =c.TypeId
) t where rank1=1;
You should join the tables using the "APPLY" operator.
See: Link

Getting an empty result with 'AND' operation and wrong result with 'OR' operation in SQL

I have two tables and I want to find out the customer_id and customer_name of all customers who bought product A and B both.
Customers table:
+-------------+---------------+
| customer_id | customer_name |
+-------------+---------------+
| 1 | Daniel |
| 2 | Diana |
| 3 | Elizabeth |
| 4 | Jhon |
+-------------+---------------+
Orders table:
+------------+--------------+---------------+
| order_id | customer_id | product_name |
+------------+--------------+---------------+
| 10 | 1 | A |
| 20 | 1 | B |
| 30 | 1 | D |
| 40 | 1 | C |
| 50 | 2 | A |
| 60 | 3 | A |
| 70 | 3 | B |
| 80 | 3 | D |
| 90 | 4 | C |
+------------+--------------+---------------+
In this example, only the customers with id 1 and 3 have bought both products A and B.
To find them, I wrote this code:
SELECT distinct c.customer_id,
c.customer_name
from customers c inner join orders o
on c.customer_id = o.customer_id
where o.product_name = 'A' and o.product_name = 'B'
When I run this, I get an empty result.
So I tried to use OR:
SELECT distinct c.customer_id,
c.customer_name
from customers c inner join orders o
on c.customer_id = o.customer_id
where o.product_name = 'A' or o.product_name = 'B'
output -
customer_name customer_id
Daniel 1
Diana 2
Elizabeth 3
With OR the query returns rows, but it is still not the result I am looking for, because the customer with id 2 bought only A and not B. And using AND gives me an empty result.
I always get confused by AND and OR operations. Can someone help?
If you want both use aggregation:
select c.customer_id, c.customer_name
from customers c inner join
orders o
on c.customer_id = o.customer_id
where o.product_name in ('A', 'B')
group by c.customer_id, c.customer_name
having count(distinct product_name) = 2;
Note: This assumes that the data could have multiple rows for a customer and product. If that is not possible, just use count(*) = 2 for performance reasons.
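The AND version comes back empty because WHERE is evaluated one row at a time, and no single orders row has product_name equal to both 'A' and 'B'. The aggregation instead filters to the two products and then counts, per customer, how many distinct ones were found. A small sketch with Python's sqlite3 module (the in-memory database and the variable name both_buyers are illustrative assumptions; the ORDER BY is added only to make the output order predictable):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (customer_id INTEGER, customer_name TEXT);
    CREATE TABLE orders    (order_id INTEGER, customer_id INTEGER, product_name TEXT);
    INSERT INTO customers VALUES (1,'Daniel'),(2,'Diana'),(3,'Elizabeth'),(4,'Jhon');
    INSERT INTO orders VALUES
        (10,1,'A'),(20,1,'B'),(30,1,'D'),(40,1,'C'),(50,2,'A'),
        (60,3,'A'),(70,3,'B'),(80,3,'D'),(90,4,'C');
""")

# Keep only A/B rows, then require that both products show up for a customer.
both_buyers = conn.execute("""
    SELECT c.customer_id, c.customer_name
    FROM customers c
    INNER JOIN orders o ON c.customer_id = o.customer_id
    WHERE o.product_name IN ('A', 'B')
    GROUP BY c.customer_id, c.customer_name
    HAVING COUNT(DISTINCT o.product_name) = 2
    ORDER BY c.customer_id
""").fetchall()

print(both_buyers)
```

Diana (id 2) passes the IN filter with her single 'A' row, but her distinct count is 1, so the HAVING clause drops her.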

Join tables - show every row of the left table only once and add a row with data that is not connected to the table

I have tables that look like this:
products:
order_id prices
_______ _____
2 20
3 11
null 40
Orders:
id number
1 30
2 50
3 10
4 10
I want to get the following table:
id number price
-- ------ -----
1 30 null
2 50 20
3 10 11
4 10 null
null(0) null(0) 40
The foreign key is obviously order_id -> orders, and it can be null.
As you can probably see, I want to include all the rows from the orders table and, where there is a link to products, combine them.
If there is no link, just show null and the '40' (the sum of the 'disconnected' products).
Can anyone help me, please?
I think you want a full join:
select o.id, o.number, p.prices
from orders o full join
products p
on p.order_id = o.id;
You need a left join from orders to products and union with the products that have null as order_id:
select o.id, o.number, p.prices
from orders o left join products p
on p.order_id = o.id
union all
select null, null, p.prices
from products p
where p.order_id is null
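For engines without FULL JOIN support, the left join plus union all form is a convenient fallback. The sketch below runs it with Python's sqlite3 module on the sample data. Two things are assumptions of this sketch rather than part of the answer: the second branch uses SUM so that all 'disconnected' products collapse into one row, as the question asks, and the outer ORDER BY is only there to put that row last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE products (order_id INTEGER, prices INTEGER);
    CREATE TABLE orders   (id INTEGER, number INTEGER);
    INSERT INTO products VALUES (2,20),(3,11),(NULL,40);
    INSERT INTO orders   VALUES (1,30),(2,50),(3,10),(4,10);
""")

report = conn.execute("""
    SELECT id, number, prices
    FROM (
        -- every order, with its price when a product links to it
        SELECT o.id AS id, o.number AS number, p.prices AS prices
        FROM orders o
        LEFT JOIN products p ON p.order_id = o.id
        UNION ALL
        -- one extra row: total price of products not linked to any order
        -- (this aggregate always yields one row, NULL if there are none)
        SELECT NULL, NULL, SUM(p.prices)
        FROM products p
        WHERE p.order_id IS NULL
    ) AS combined
    ORDER BY id IS NULL, id
""").fetchall()

for row in report:
    print(row)
```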
Retrieve the orders
SELECT orders.id, orders.number
FROM orders;
id | number
----+--------
1 | 30
2 | 50
3 | 10
4 | 10
Retrieve the prices associated to orders
SELECT orders.id, orders.number, products.prices
FROM orders
LEFT JOIN products ON orders.id = products.order_id;
id | number | prices
----+--------+--------
1 | 30 |
2 | 50 | 20
3 | 10 | 11
4 | 10 |
Retrieve the prices associated to orders as well as the products without order associated
SELECT orders.id, orders.number, products.prices
FROM orders
FULL JOIN products ON orders.id = products.order_id;
id | number | prices
----+--------+--------
1 | 30 |
2 | 50 | 20
3 | 10 | 11
| | 40
4 | 10 |
Sum the prices with no order associated. We see no difference here since there is only one product with no order associated (the one where order_id is null), but you asked for the sum of these prices so here you go :-)
SELECT orders.id, orders.number, SUM(products.prices) AS prices
FROM orders
FULL JOIN products ON orders.id = products.order_id
GROUP BY orders.id, orders.number;
id | number | prices
----+--------+--------
1 | 30 |
2 | 50 | 20
3 | 10 | 11
| | 40
4 | 10 |
Use your null(0) label and order by id
SELECT
coalesce(orders.id::varchar(255), 'null(0)') AS id,
coalesce(orders.number::varchar(255), 'null(0)') AS number,
SUM(products.prices) AS prices
FROM orders
FULL JOIN products ON orders.id = products.order_id
GROUP BY orders.id, orders.number
ORDER BY id;
id | number | prices
---------+---------+--------
1 | 30 |
2 | 50 | 20
3 | 10 | 11
4 | 10 |
null(0) | null(0) | 40

How to write this query to avoid cartesian product?

I want to create a CSV export for orders showing the warehouse_id that each order_item shipped from, if available.
For brevity, here is the pertinent schema:
create table o (id integer);
orders have many order_items:
create table oi (id integer, o_id integer, sku text, quantity integer);
For each order_item in the CSV we want to show the warehouse_id it shipped from. But that is not stored in order_items. It is stored in the shipment.
An order can be split up into many shipments, potentially from different warehouses.
create table s (id integer, o_id integer, warehouse_id integer);
shipments have many shipment items too:
create table si (id integer, s_id integer, oi_id integer, quantity_shipped integer);
How do I extract the warehouse_id for each order_item, given that warehouse_id is on the shipment and not every order has shipped yet (it may not have a shipment record or shipment_items)?
We are doing something like this (simplified):
select oi.sku, s.warehouse_id from oi
left join s on s.o_id = oi.o_id;
However, suppose an order has 2 order items, let's call them sku A and B, and that order was split into two shipments, where A was shipped from warehouse '50' and then a second shipment shipped B from '200'.
What we want would be a CSV output like:
sku | warehouse_id
-----|--------------
A | 50
B | 200
But what we get is some kind of cartesian product:
=================================
Here is the sample data:
select * from o;
id
----
1
(1 row)
select * from oi;
id | o_id | sku | quantity
----+------+-----+----------
1 | 1 | A | 1
2 | 1 | B | 1
(2 rows)
select * from s;
id | o_id | warehouse_id
----+------+--------------
1 | 1 | 50
2 | 1 | 200
(2 rows)
select * from si;
id | s_id | oi_id
----+------+------
1 | 1 | 1
2 | 2 | 2
(2 rows)
select oi.sku, s.warehouse_id from oi left join s on s.o_id = oi.o_id;
sku | warehouse_id
-----+--------------
A | 50
A | 200
B | 50
B | 200
(4 rows)
UPDATE ========
Per spencer, I'm adding a different example with different pk ids for more clarity. The following is 2 example orders. Order 2 has items A,B,C. A,B are shipped from shipment 200, C is shipped from shipment 201. Order 3 has 2 items E and A. E is not yet shipped and A is shipped twice out of the same warehouse '700', (like it was on back order).
# select * from o;
id
----
2
3
(2 rows)
# select * from oi;
id | o_id | sku | quantity
-----+------+-----+----------
100 | 2 | A | 1
101 | 2 | B | 1
102 | 2 | C | 1
103 | 3 | E | 1
104 | 3 | A | 2
(5 rows)
# select * from s;
id | o_id | warehouse_id
-----+------+--------------
200 | 2 | 700
201 | 2 | 800
202 | 3 | 700
203 | 3 | 700
(4 rows)
# select * from si;
id | s_id | oi_id
-----+------+-------
300 | 200 | 100
301 | 200 | 101
302 | 201 | 102
303 | 202 | 104
304 | 203 | 104
(5 rows)
I think this works: I use left join to keep the order_items in the report whether or not the order has shipped, and I use group by to squash multiple shipments from the same warehouse. I believe this is what I need.
# select oi.o_id, oi.id, oi.sku, s.warehouse_id from oi left join si on si.oi_id = oi.id left join s on s.id = si.s_id group by oi.o_id, oi.id, oi.sku, s.warehouse_id order by oi.o_id;
o_id | id | sku | warehouse_id
------+-----+-----+--------------
2 | 102 | C | 800
2 | 101 | B | 700
2 | 100 | A | 700
3 | 104 | A | 700
3 | 103 | E |
(5 rows)
Order items that have shipped ...
SELECT oi.id
, oi.sku
, s.warehouse_id
FROM oi
JOIN si ON si.oi_id = oi.id
JOIN s ON s.id = si.s_id
Order items that haven't yet shipped, using anti-join to exclude rows where there is a matching row in si
SELECT oi.id
, oi.sku
, s.warehouse_id
FROM oi
JOIN s ON s.o_id = oi.o_id -- fk to fk shortcut join
-- anti-join
LEFT
JOIN si ON si.oi_id = oi.id
WHERE si.oi_id IS NULL
But this will still produce a (partial) Cartesian product. We can add a GROUP BY clause to collapse the rows...
GROUP BY oi.id
This doesn't avoid producing an intermediate cartesian product; the addition of the GROUP BY clause just collapses the set. And it's indeterminate which of the matching rows from s the returned column values will come from.
The two queries could be combined with a UNION ALL operation. If I did that, I'd likely add a discriminator column (an additional column in each query with different values, which would tell which query returned a row.)
This set might meet the specification outlined in the OP question. But I don't think this is really the set that needs to be returned. Figuring out which warehouse an item should ship from may involve several factors... total quantity ordered, quantity available in each warehouse, can order be fulfilled from one warehouse, which warehouse is closer to delivery destination, etc.
I don't want to leave anyone with the impression that this query is really a "fix" for the cartesian product problem... this query just hides a bigger problem.
I think you need the si table:
select oi.sku, s.warehouse_id
from si join
oi
on si.oi_id = oi.id join
s
on s.id = si.s_id;
si seems to be the proper junction table between the tables. I'm not sure why there is another join key that doesn't use it.
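To see the effect of going through the junction table, here is a sketch with Python's sqlite3 module that loads the sample data from the question's update and joins oi to s via si (si.oi_id = oi.id, s.id = si.s_id). The LEFT JOINs, the GROUP BY and the ORDER BY are choices of this sketch: they keep unshipped items, squash repeat shipments from the same warehouse, and fix the row order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE oi (id INTEGER, o_id INTEGER, sku TEXT, quantity INTEGER);
    CREATE TABLE s  (id INTEGER, o_id INTEGER, warehouse_id INTEGER);
    CREATE TABLE si (id INTEGER, s_id INTEGER, oi_id INTEGER);
    INSERT INTO oi VALUES (100,2,'A',1),(101,2,'B',1),(102,2,'C',1),(103,3,'E',1),(104,3,'A',2);
    INSERT INTO s  VALUES (200,2,700),(201,2,800),(202,3,700),(203,3,700);
    INSERT INTO si VALUES (300,200,100),(301,200,101),(302,201,102),(303,202,104),(304,203,104);
""")

# oi -> si -> s: each order item only meets the shipments it was actually in,
# so there is no order-level fan-out between items and shipments.
item_warehouses = conn.execute("""
    SELECT oi.o_id, oi.id, oi.sku, s.warehouse_id
    FROM oi
    LEFT JOIN si ON si.oi_id = oi.id
    LEFT JOIN s  ON s.id = si.s_id
    GROUP BY oi.o_id, oi.id, oi.sku, s.warehouse_id
    ORDER BY oi.o_id, oi.id
""").fetchall()

for row in item_warehouses:
    print(row)
```

Item 104 was shipped twice from warehouse 700 and still appears once; item 103 has not shipped and comes back with None. An item shipped from two different warehouses would still produce two rows, which may or may not be what the CSV needs.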

SQL Server : query grouping

I have a question about SQL Server. I have two tables:
keyword_text
Keyword_relate
Columns in keyword_text:
key_id
keywords
Columns in keyword_relate:
key_id
product_id
score
status
Sample data for keyword_text:
key_id | keywords
-------|----------
1 | Pencil
2 | Pen
3 | Books
Sample data for keyword_relate:
----------------------------
Sno| Product | SCore|status
---------------------------
1 | 124 | 2 | 1
1 | 125 | 3 | 1
2 | 124 | 3 | 1
2 | 125 | 2 | 1
From this I want to get, for each keyword, the product_id that has the maximum score.
Presuming that key_id in the first table is Sno in the second table, you can use ROW_NUMBER:
WITH CTE AS
(
SELECT Product AS ProductID, Score As MaxScore,
RN = ROW_NUMBER() OVER (PARTITION BY kt.key_id ORDER BY Score DESC)
FROM keyword_text kt INNER JOIN keyword_relate kr
ON kt.key_id = kr.Sno
)
SELECT ProductID, MaxScore
FROM CTE
WHERE RN = 1
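The same top-row-per-keyword pattern can be tried outside SQL Server. This sketch uses Python's sqlite3 module and assumes an SQLite build with window functions (3.25 or later). Because SQLite does not accept the T-SQL `RN = ...` alias form, it uses `AS rn`; the keyword column is also selected here so the output shows which keyword each product belongs to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE keyword_text   (key_id INTEGER, keywords TEXT);
    CREATE TABLE keyword_relate (Sno INTEGER, Product INTEGER, Score INTEGER, status INTEGER);
    INSERT INTO keyword_text   VALUES (1,'Pencil'),(2,'Pen'),(3,'Books');
    INSERT INTO keyword_relate VALUES (1,124,2,1),(1,125,3,1),(2,124,3,1),(2,125,2,1);
""")

# Number the products of each keyword by descending score, keep the first.
top_products = conn.execute("""
    WITH ranked AS (
        SELECT kt.key_id   AS key_id,
               kt.keywords AS keyword,
               kr.Product  AS product_id,
               kr.Score    AS max_score,
               ROW_NUMBER() OVER (PARTITION BY kt.key_id ORDER BY kr.Score DESC) AS rn
        FROM keyword_text kt
        INNER JOIN keyword_relate kr ON kt.key_id = kr.Sno
    )
    SELECT keyword, product_id, max_score
    FROM ranked
    WHERE rn = 1
    ORDER BY key_id
""").fetchall()

print(top_products)
```

'Books' has no rows in keyword_relate, so the inner join leaves it out. If two products tie for the top score, ROW_NUMBER picks one of them arbitrarily; RANK would keep both.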