SQL: Bug in joining two tables

I have an item table from which I want to get the sum of item quantities.
Query:
Select item_id, Sum(qty) from item_tbl group by item_id
Result:
==================
| ID | Quantity |
===================
| 1 | 10 |
| 2 | 20 |
| 3 | 5 |
| 4 | 20 |
The second table is an invoice table, from which I get the quantity of each item sold. I am joining the two tables like this:
Query:
Select item_tbl.item_id, Sum(item_tbl.qty) as [item_qty],
-isnull(Sum(invoice.qty),0) as [invoice_qty]
from item_tbl
left join invoice on item_tbl.item_id = invoice.item_id group by item_tbl.item_id
Result:
=================================
| ID | item_qty | invoice_qty |
=================================
| 1 | 10 | -5 |
| 2 | 20 | -20 |
| 3 | 10 | -25 | <------ item_qty rose from 5 to 10?
| 4 | 20 | -20 |
I don't know if I am joining these tables the right way. I want to get everything from the item table, plus whatever matches from the invoice table, to maintain the inventory, so I use a left join. Help, please.
Modification
When I added qty to the grouping (group by item_id, qty), I got this:
=================================
| ID | item_qty | invoice_qty |
=================================
| 1 | 10 | -5 |
| 2 | 20 | -20 |
| 3 | 5 | -5 |
| 3 | 5 | -20 |
| 4 | 20 | -20 |
As it's a view, the ID is repeated. What should I do to avoid this?

To clear things up, here is my answer from the comments, explained:
When you use a left join (A left join B), a row is created for every B record that matches an A record; in addition, a row is created for every A record that has no matching B record, with NULL values filling in the fields from B. So if an item has two invoice rows, that item appears twice in the join result and Sum(item_tbl.qty) adds its quantity twice, which is why item 3 went from 5 to 10.
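To see the duplication directly, here is a minimal sketch using Python's built-in sqlite3 module. The invoice rows are an assumption: item 3 gets two invoices (5 and 20), as the "Modification" result suggests, and every other item gets one. SQLite has no ISNULL, so IFNULL stands in for it.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE item_tbl (item_id INTEGER, qty INTEGER);
    CREATE TABLE invoice (item_id INTEGER, qty INTEGER);
    INSERT INTO item_tbl VALUES (1, 10), (2, 20), (3, 5), (4, 20);
    -- Assumed invoice rows: item 3 has two invoices (5 and 20).
    INSERT INTO invoice VALUES (1, 5), (2, 20), (3, 5), (3, 20), (4, 20);
""")

# The joined rows before aggregation: item 3 appears once per invoice.
raw_rows = con.execute("""
    SELECT item_tbl.item_id, item_tbl.qty, invoice.qty
    FROM item_tbl LEFT JOIN invoice ON item_tbl.item_id = invoice.item_id
    WHERE item_tbl.item_id = 3
    ORDER BY invoice.qty
""").fetchall()

# The query from the question, with IFNULL in place of ISNULL.
naive = con.execute("""
    SELECT item_tbl.item_id, SUM(item_tbl.qty), -IFNULL(SUM(invoice.qty), 0)
    FROM item_tbl LEFT JOIN invoice ON item_tbl.item_id = invoice.item_id
    GROUP BY item_tbl.item_id
    ORDER BY item_tbl.item_id
""").fetchall()

print(raw_rows)  # [(3, 5, 5), (3, 5, 20)]
print(naive)     # [(1, 10, -5), (2, 20, -20), (3, 10, -25), (4, 20, -20)]
```

The second result reproduces the 10 for item 3 from the question: its single item row of 5 is summed once per matching invoice.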
I would advise reading up on Using Joins in SQL when approaching such problems.
Below are 2 possible solutions, using different assumptions.
Solution A
Without any assumption about the primary key:
We have to sum the item quantity column to get the total quantity, so two sums need to be performed. I would advise summing the invoices in a subquery first, for readability and simplicity, so that the join adds at most one invoice row per item.
select item_tbl.item_id, Sum(item_tbl.qty) as [item_qty], -isnull(Sum(invoice_grouped.qty),0) as [invoice_qty]
from item_tbl left join
(select invoice.item_id as item_id, Sum(invoice.qty) as qty from invoice group by item_id) invoice_grouped
on (invoice_grouped.item_id = item_tbl.item_id)
group by item_tbl.item_id
Solution B
Assuming item_id is the primary key of item_tbl:
Now we can rely on there being only one quantity for each item_id, so we can do without the subquery and select any one of the item quantities in the join result (here, the max), which can result in a quicker execution plan.
select item_tbl.item_id, Max(item_tbl.qty) as [item_qty], -isnull(Sum(invoice.qty),0) as [invoice_qty]
from item_tbl left join invoice on (invoice.item_id = item_tbl.item_id)
group by item_tbl.item_id
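As a quick check that both solutions agree when item_id is unique, here is a sketch in Python's sqlite3, again with IFNULL for ISNULL. The data is assumed: the invoices from the question plus a hypothetical item 5 with no invoices, to show the left join keeping it with an invoice quantity of 0.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE item_tbl (item_id INTEGER PRIMARY KEY, qty INTEGER);
    CREATE TABLE invoice (item_id INTEGER, qty INTEGER);
    -- Assumed data: item 3 has two invoices, hypothetical item 5 has none.
    INSERT INTO item_tbl VALUES (1, 10), (2, 20), (3, 5), (4, 20), (5, 7);
    INSERT INTO invoice VALUES (1, 5), (2, 20), (3, 5), (3, 20), (4, 20);
""")

# Solution A: sum invoices per item first, so the join adds at most one row.
solution_a = con.execute("""
    SELECT item_tbl.item_id, SUM(item_tbl.qty), -IFNULL(SUM(invoice_grouped.qty), 0)
    FROM item_tbl LEFT JOIN
         (SELECT item_id, SUM(qty) AS qty FROM invoice GROUP BY item_id) invoice_grouped
         ON invoice_grouped.item_id = item_tbl.item_id
    GROUP BY item_tbl.item_id
    ORDER BY item_tbl.item_id
""").fetchall()

# Solution B: item_id is unique, so MAX just returns the single quantity.
solution_b = con.execute("""
    SELECT item_tbl.item_id, MAX(item_tbl.qty), -IFNULL(SUM(invoice.qty), 0)
    FROM item_tbl LEFT JOIN invoice ON invoice.item_id = item_tbl.item_id
    GROUP BY item_tbl.item_id
    ORDER BY item_tbl.item_id
""").fetchall()

print(solution_a)  # [(1, 10, -5), (2, 20, -20), (3, 5, -25), (4, 20, -20), (5, 7, 0)]
print(solution_a == solution_b)  # True
```

If item_tbl could hold several rows per item_id, only Solution A would still total them correctly; Solution B would report the largest one.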

If your database design follows the common rules, item_tbl.item_id must be unique.
In that case, just change your query to:
Select item_tbl.item_id, item_tbl.qty as [item_qty],
-isnull(Sum(invoice.qty),0) as [invoice_qty]
from item_tbl
left join invoice on item_tbl.item_id = invoice.item_id group by item_tbl.item_id, item_tbl.qty

Related

Linking 2 columns, same table to a different table

First-time poster. A little background: I am not the most experienced SQL user, and most of my knowledge is self-taught, but I am really struggling to get the results I am looking for, so I am hoping someone can point me in the right direction.
In the simplest form:
I have a table that holds all of our Item_IDs. Each item number has a Universal_ID associated with it, stored in the same table. Most of the time these numbers match, but in the example below, Item_ID 2 has a Universal_ID of 1:
Item_ID | Universal_ID
1 | 1
2 | 1
We then have an inventory table, which can be linked on Item_ID to show the quantity:
Item_ID | Item_Qty | Item_Code
1 | 10 | 2/2/2021
1 | 20 | 2/3/2021
2 | 30 | 2/2/2021
If the Item_ID and Universal_ID are the same, it is quite easy to obtain the inventory.
However, I am struggling to get the inventory for both items when they do not match.
For example, if I wanted to find the QTY of Item_ID 1, I would get 2 results:
Item_ID | Item_Qty | Item_Code
1 | 10 | 2/2/2021
1 | 20 | 2/3/2021
Problem: if I am specifically interested in Item_ID 2, how can I link it to the inventory table so that I see not only Item_ID 2's available quantity but also Item_ID 1's, given that Item_ID 2's Universal_ID does not match its Item_ID?
So I would like the results to look just like the second table I posted:
Item_ID | Item_Qty | Item_Code
1 | 10 | 2/2/2021
1 | 20 | 2/3/2021
2 | 30 | 2/2/2021
What is the best way to set up views or my select query to make this happen? If I need to add any more info I can!
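You can use a left join and filtering, where 1 is the Universal_ID you are interested in and universal is the table that maps each Item_ID to its Universal_ID: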
select i.*
from inventory i left join
universal u
on i.item_id = u.item_id
where 1 in (u.universal_id, i.item_id);
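Here is a small sketch of the answer's query in Python's sqlite3, using the sample rows from the question; the table names universal and inventory are assumptions. It also shows a hypothetical helper, inventory_for_item, that starts from the Item_ID you care about (2) and looks up its Universal_ID instead of hard-coding 1.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- Assumed table names: universal (Item_ID -> Universal_ID) and inventory.
    CREATE TABLE universal (item_id INTEGER, universal_id INTEGER);
    CREATE TABLE inventory (item_id INTEGER, item_qty INTEGER, item_code TEXT);
    INSERT INTO universal VALUES (1, 1), (2, 1);
    INSERT INTO inventory VALUES
        (1, 10, '2/2/2021'), (1, 20, '2/3/2021'), (2, 30, '2/2/2021');
""")

# The answer's query, with the Universal_ID of interest (1) hard-coded.
answer_rows = con.execute("""
    SELECT i.* FROM inventory i LEFT JOIN universal u ON i.item_id = u.item_id
    WHERE 1 IN (u.universal_id, i.item_id)
    ORDER BY i.item_id, i.item_qty
""").fetchall()

def inventory_for_item(con, item_id):
    """Hypothetical helper: inventory rows of every item sharing item_id's Universal_ID."""
    return con.execute("""
        SELECT i.* FROM inventory i JOIN universal u ON i.item_id = u.item_id
        WHERE u.universal_id IN (SELECT universal_id FROM universal WHERE item_id = ?)
        ORDER BY i.item_id, i.item_qty
    """, (item_id,)).fetchall()

print(answer_rows)
print(inventory_for_item(con, 2))
```

Both calls return the three inventory rows from the desired output.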

Make a 1 to 1 multi-field SQL join where only some of the values match

I am trying to build a table that will be used as a conversion chart. I aim to make a simple join with this conversion table on multiple fields (8 in my case), and get a result. I will try to simplify the examples as much as I can because the original chart is a 40x10 matrix.
Let's say that I have these two (I know they don't make much sense and have bad design but they are just examples):
supply_conversion_chart
---
supply (integer)
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
purchases
---
customer_id (integer)
product_id (integer)
size (varchar)
purchase_type (varchar)
and the conversion chart would look something like this:
| supply | customer_id | product_id | size | purchase_type |
|--------|--------------|------------|----------|---------------|
| 100 | 1 | anything | anything | online |
| 101 | 1 | anything | anything | offline |
| 102 | other than 1 | anything | anything | online |
| 103 | 1 | 5 | XXL | online |
The main goal is to get an exact supply value with a simple join, something like:
SELECT supply
FROM purchases p
JOIN supply_conversion_chart scc ON
p.customer_id = scc.customer_id AND
p.product_id = scc.product_id AND
p.size = scc.size AND
p.purchase_type = scc.purchase_type;
Let's say that these are the records on purchases table:
| customer_id | product_id | size | purchase_type |
|-------------|------------|------|---------------|
| 1 | 3 | M | online |
| 1 | 5 | S | offline |
| 12345 | 4 | XL | online |
| 1 | 5 | XXL | online |
| 4353 | null | M | online |
I would expect the first record's supply value to be 100, the second's to be 101, the third's 102, the fourth's 103, and the fifth's 102. However, as far as I know, SQL won't be able to do a proper join on any of these records except the fourth one, which fully matches supply 103 in the supply_conversion_chart table. I don't know if it is possible in the first place to join on multiple fields when some of those fields don't match exactly.
My approach is probably faulty and there are better ways to get the results I am trying to achieve but I don't even know where to start. What should I do?
The original chart is much bigger than the provided example, and I will be joining on 8 different fields.
One approach is a lateral join (PostgreSQL syntax):
select p.*, scc.*
from purchases p left join lateral
(select scc.*
from supply_conversion_chart scc
where (scc.customer_id = p.customer_id or scc.customer_id is null) and
(scc.product_id = p.product_id or scc.product_id is null) and
(scc.size = p.size or scc.size is null) and
(scc.purchase_type = p.purchase_type or scc.purchase_type is null)
-- every non-null chart field has passed the filter above, so counting
-- non-null fields counts exact matches; prefer the most specific row
order by ( (scc.customer_id is not null)::int +
(scc.product_id is not null)::int +
(scc.size is not null)::int +
(scc.purchase_type is not null)::int
) desc
limit 1
) scc on true;
Note: this represents "anything" as NULL in the chart. It has no special logic for "customer other than 1". However, it does show how to implement essentially what you are trying to do.
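SQLite has no LATERAL, so this sketch (Python's sqlite3) swaps in a correlated scalar subquery that applies the same idea per purchase: keep the chart rows whose fields either match or are NULL, then take the row with the most non-NULL fields. As in the note, NULL stands for "anything", and "other than 1" for supply 102 is simplified to NULL. Ties between equally specific rows would be broken arbitrarily.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE supply_conversion_chart (
        supply INTEGER, customer_id INTEGER, product_id INTEGER,
        size TEXT, purchase_type TEXT);
    CREATE TABLE purchases (
        customer_id INTEGER, product_id INTEGER, size TEXT, purchase_type TEXT);
    -- NULL means "anything"; "other than 1" (supply 102) is simplified to NULL.
    INSERT INTO supply_conversion_chart VALUES
        (100, 1, NULL, NULL, 'online'),
        (101, 1, NULL, NULL, 'offline'),
        (102, NULL, NULL, NULL, 'online'),
        (103, 1, 5, 'XXL', 'online');
    INSERT INTO purchases VALUES
        (1, 3, 'M', 'online'),
        (1, 5, 'S', 'offline'),
        (12345, 4, 'XL', 'online'),
        (1, 5, 'XXL', 'online'),
        (4353, NULL, 'M', 'online');
""")

# In SQLite a comparison yields 0 or 1, so the sum counts non-NULL fields.
supplies = [row[0] for row in con.execute("""
    SELECT (SELECT scc.supply
            FROM supply_conversion_chart scc
            WHERE (scc.customer_id = p.customer_id OR scc.customer_id IS NULL)
              AND (scc.product_id = p.product_id OR scc.product_id IS NULL)
              AND (scc.size = p.size OR scc.size IS NULL)
              AND (scc.purchase_type = p.purchase_type OR scc.purchase_type IS NULL)
            ORDER BY (scc.customer_id IS NOT NULL) + (scc.product_id IS NOT NULL)
                   + (scc.size IS NOT NULL) + (scc.purchase_type IS NOT NULL) DESC
            LIMIT 1)
    FROM purchases p
    ORDER BY p.rowid
""")]

print(supplies)  # [100, 101, 102, 103, 102]
```

For the sample purchases this picks 100, 101, 102, 103 and 102, which is what the chart implies row by row.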

List and Count items with a JOIN with SQL

I'm trying to create a basic report from these 2 tables:
Table Products
|--------|----------------|----------|
| PRO_Id | PRO_CategoryId | PRO_Name |
|--------|----------------|----------|
| 1 | 98 | Banana |
| 2 | 98 | Apple |
|--------|----------------|----------|
Table Categories
|--------|----------|
| CAT_Id | CAT_Name |
|--------|----------|
| 98 | Fruits |
| 99 | Other |
|--------|----------|
What I needed is this output:
|------------|
| Categories |
|------------|
| Fruits (2) |
|------------|
I would like a report listing the categories from Categories, but only those that have at least one linked product in Products (which is the case for Fruits but not for Other).
This is where I am currently:
SELECT CAT_Name, COUNT(PRO_Name IN sum)
FROM Categories
JOIN Products
ON Products.PRO_CategoryId = Categories.CAT_Id as sum
ORDER BY CAT_Name ASC
Can anyone help me with this, please?
Thanks.
You are pretty close. You need to remove the invalid parts of the query and use a group by:
SELECT c.cat_name, COUNT(*)
FROM Categories c JOIN
Products p
ON p.PRO_CategoryId = c.CAT_Id
GROUP BY c.CAT_Name ;
Notes:
SELECT * is not appropriate for an aggregation query. What you want to select is the grouping column and the aggregate.
This puts the count in a separate column, which seems to be your intention, despite the sample results.
COUNT(pro_name in sum) doesn't make sense.
as sum doesn't make sense.
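If you do want the single "Fruits (2)" column from the sample output, you can concatenate the name and the count. This sketch runs in Python's sqlite3 using the sample tables; || is the standard SQL concatenation operator (SQLite, PostgreSQL), while SQL Server and MySQL would use CONCAT() instead.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE Products (PRO_Id INTEGER, PRO_CategoryId INTEGER, PRO_Name TEXT);
    CREATE TABLE Categories (CAT_Id INTEGER, CAT_Name TEXT);
    INSERT INTO Products VALUES (1, 98, 'Banana'), (2, 98, 'Apple');
    INSERT INTO Categories VALUES (98, 'Fruits'), (99, 'Other');
""")

# The inner join drops categories without products (Other);
# || builds "name (count)" in one column.
report = con.execute("""
    SELECT c.CAT_Name || ' (' || COUNT(*) || ')' AS Categories
    FROM Categories c
    JOIN Products p ON p.PRO_CategoryId = c.CAT_Id
    GROUP BY c.CAT_Name
    ORDER BY c.CAT_Name
""").fetchall()

print(report)  # [('Fruits (2)',)]
```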

Select rows from a filtered portion of Table A where a column matches a relationship with a column from the row in Table B that matches by ID

I want to get all rows in one table where a column stands in a given relationship to a column of the row in a different table that has the same value in another column (the ID).
Concretely, I have two tables, orders and product_info, that I'm accessing through Amazon Redshift:
Orders
| ID | Date | Amount | Region |
=====================================
| 1 | 2019/4/1 | $120 | A |
| 1 | 2019/4/4 | $100 | A |
| 2 | 2019/4/2 | $50 | A |
| 3 | 2019/4/6 | $70 | B |
The partition keys of orders are region and date.
Product Information
| ID | Release Date | Region |
| ---- | ------------ | ------ |
| 1 | 2019/4/2 | A |
| 2 | 2019/4/3 | A |
| 3 | 2019/4/5 | B |
The primary key of product information is id, and the partition key is region.
I want to get all rows from Orders in region A where the date of the row is greater than the release date value in product information for that ID.
So in this case it should return just one row:
| 1 | 2019/4/4 | $100 | A |
I tried this:
select *
from orders
INNER JOIN product_info ON orders.date>product_info.release_date
AND orders.id=product_info.id
AND orders.region=A
AND product_info.region=A
limit 10
The problem is that this query was absurdly slow (cancelled it after 10 minutes). The tables are extremely large, and I have a feeling it was scanning the entire table without restricting it to region first (in reality I have other filters in addition to region that I want to apply to the list of IDs before I do the inner join, but I've limited it to only region for the sake of simplifying the question).
How can I efficiently write this type of query?
The best way to make an SQL query faster is to exclude rows as soon as possible.
So, rather than putting conditions like orders.region = 'A' in the JOIN condition, you should move them to the WHERE clause. This will eliminate rows before they are joined.
Also, make the JOIN condition as simple as possible so that the database can optimize the comparison.
Try something like this:
SELECT *
FROM orders
INNER JOIN product_info ON orders.id = product_info.id
WHERE orders.region = 'A'
AND product_info.region = 'A'
AND orders.date > product_info.release_date
Any further optimization would require consideration of the DISTKEY and SORTKEY on the Redshift tables. (Preferably a DISTKEY of id and a SORTKEY of date).
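SQLite cannot show anything about Redshift performance, but it can confirm that the rewritten query returns the expected row. This sketch uses Python's sqlite3 with the sample data. The adaptations are assumptions: dates are stored as ISO text so that string comparison follows date order, amounts are whole dollars, and orders.* is selected so that only the order columns come back.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE orders (id INTEGER, "date" TEXT, amount INTEGER, region TEXT);
    CREATE TABLE product_info (id INTEGER, release_date TEXT, region TEXT);
    -- ISO dates, so that text comparison matches chronological order.
    INSERT INTO orders VALUES
        (1, '2019-04-01', 120, 'A'), (1, '2019-04-04', 100, 'A'),
        (2, '2019-04-02', 50, 'A'), (3, '2019-04-06', 70, 'B');
    INSERT INTO product_info VALUES
        (1, '2019-04-02', 'A'), (2, '2019-04-03', 'A'), (3, '2019-04-05', 'B');
""")

# Equality-only join condition; region and date filters in WHERE.
rows = con.execute("""
    SELECT orders.*
    FROM orders
    INNER JOIN product_info ON orders.id = product_info.id
    WHERE orders.region = 'A'
      AND product_info.region = 'A'
      AND orders."date" > product_info.release_date
""").fetchall()

print(rows)  # [(1, '2019-04-04', 100, 'A')]
```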

Joining three tables with primary in the middle

I am racking my brain over joining three tables. I have recreated a simple test case that shows the same problem, so it looks like I am making a fundamental mistake in my join query:
I have three tables:
case:
id (PK)| date_closed
155 | '2018-04-17 10:08'
156 | '2018-03-17 10:08'
pizza | '2018-02-17 10:08'
registration:
id (FK) | source | quantity
155 | market | 300
155 | sawdust| 200
bagged:
id | case_id (FK) | kg_bagged
X | 155 | 123
Y | 155 | 90
I want to join these tables to compare, per case, the total of the quantity column with the total of kg_bagged. The case table has a one-to-many relationship with each of the other two, so I wrote a join query like this:
SELECT case.id,
date_closed,
SUM(quantity),
SUM(kg_bagged),
SUM(kg_bagged)/SUM(quantity) AS reduction_factor
FROM case
JOIN bagged ON case.id = bagged.case_id
JOIN registration ON case.id = registration.id
I would think this is a correct query, but Postgres tells me I have to add case.id and date_closed to the GROUP BY clause, so I add this:
GROUP BY case.id, date_closed;
This code runs without errors, but it shows 1000 for the quantity of case 155, not the expected 500 (200 + 300). This behaviour only appears when there is more than 1 record. When I join only one of the tables to the case table, it works fine. Can someone see the mistake in my JOIN query?
I also tried using a subquery to join two of the tables and then joining the remaining table, but it gave me similar results.
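When a row is joined to 2 rows in one table and 2 rows in another, every row from the first table is matched with every row from the second, so you get a multiplied result. In your example that is 2*2 = 4 rows for case 155, so each quantity is counted twice: 2 * (300 + 200) = 1000.
To make this easier to see, in your case, execute this query: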
SELECT case.id, date_closed, source, quantity, kg_bagged
FROM case
JOIN registration ON registration.id = case.id
JOIN bagged ON bagged.case_id = case.id
You will get data like this:
| id | date_closed | source | quantity | kg_bagged |
| :-: | :----------------: | :----: | :------: | :-------: |
| 155 | '2018-04-17 10:08' | market | 300 | 123 |
| 155 | '2018-04-17 10:08' | sawdust| 200 | 123 |
| 155 | '2018-04-17 10:08' | market | 300 | 90 |
| 155 | '2018-04-17 10:08' | sawdust| 200 | 90 |
In cases like this, from my experience, I write subqueries that compute the sums first and then join the results together.
For example:
WITH r AS (SELECT id, sum(quantity) as quantity FROM registration GROUP BY id),
b as (SELECT case_id, SUM(kg_bagged) as kg_bagged FROM bagged GROUP BY case_id)
SELECT case.id,
date_closed,
quantity,
kg_bagged,
kg_bagged::numeric / quantity AS reduction_factor -- cast avoids integer division when both columns are integers
FROM case
JOIN b ON case.id = b.case_id
JOIN r ON case.id = r.id
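Here is a sketch in Python's sqlite3 that runs both versions on the sample rows (the 'pizza' case is left out). CASE is a reserved word in SQLite as in PostgreSQL, so the table name is quoted here. SQLite has no ::numeric, so CAST(... AS REAL) plays the same role of avoiding integer division.

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    -- "case" is a reserved word, so it is quoted.
    CREATE TABLE "case" (id INTEGER, date_closed TEXT);
    CREATE TABLE registration (id INTEGER, source TEXT, quantity INTEGER);
    CREATE TABLE bagged (id TEXT, case_id INTEGER, kg_bagged INTEGER);
    INSERT INTO "case" VALUES (155, '2018-04-17 10:08'), (156, '2018-03-17 10:08');
    INSERT INTO registration VALUES (155, 'market', 300), (155, 'sawdust', 200);
    INSERT INTO bagged VALUES ('X', 155, 123), ('Y', 155, 90);
""")

# Joining both child tables directly: 2 x 2 = 4 rows for case 155, sums double.
naive = con.execute("""
    SELECT "case".id, SUM(quantity), SUM(kg_bagged)
    FROM "case"
    JOIN bagged ON "case".id = bagged.case_id
    JOIN registration ON "case".id = registration.id
    GROUP BY "case".id
""").fetchall()

# Summing each child table first, then joining one row per case.
fixed = con.execute("""
    WITH r AS (SELECT id, SUM(quantity) AS quantity FROM registration GROUP BY id),
         b AS (SELECT case_id, SUM(kg_bagged) AS kg_bagged FROM bagged GROUP BY case_id)
    SELECT "case".id, quantity, kg_bagged,
           CAST(kg_bagged AS REAL) / quantity AS reduction_factor
    FROM "case"
    JOIN b ON "case".id = b.case_id
    JOIN r ON "case".id = r.id
""").fetchall()

print(naive)  # [(155, 1000, 426)]
print(fixed)  # [(155, 500, 213, 0.426)]
```

Because both CTEs are inner-joined, case 156 (which has no registrations or bags) drops out; LEFT JOIN would keep it with NULL totals.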
Hopefully, this answer will help you.