Semi-hierarchical SQL query with multiple tables and possible outer joins - sql

I have products. Each product is made up of items and assemblies. Assemblies themselves can be made up of items too. So it's a hierarchy, but limited in depth. What I would like to do is list products with the items and assemblies they contain, plus any items in each product's assemblies.
This is the output I would like to see. It doesn't have to look exactly like this, but the aim is to show the items in the product, then the assemblies, and within each assembly the items it contains. The number of columns isn't fixed; if more are necessary to show the items in the assemblies, that is no problem.
ProductID  ProductName  AssemblyID  AssemblyName  ItemID  ItemName
---------  -----------  ----------  ------------  ------  --------
P0001001   Product One
                                                  I0045   Item A
                                                  I0082   Item B
                        A00023      Assembly 1
                                                  I0320   Item 1
                                                  I0900   Item 2
                        A00024      Assembly 2
                                                  I0877   Item 3
                                                  I0900   Item 2
                                                  I0042   Item 4
This I can then use to build a report grouped on the Product ID to list the contents of each product.
This is the table structure I have at the moment.
+ProductList-+            +ProductItems-+
|ProductID   | ---------> |ProductID    |                                  +ItemList-+
|ProductName | \          |ItemID       | -------------------------------> |ItemID   |
|Price       |  \         +-------------+                                > |ItemName |
+------------+   \                                                      /  |Cost     |
                  \    +ProductAssemblies-+                            /   +---------+
                   \-> |ProductID         |     +AssemblyItems-+      /
                   +-- |AssemblyID        | --> |AssemblyID    |     /
                   |   |BuildTime         |     |ItemID        | ---/
                   |   +------------------+     +--------------+
                   |   +AssemblyList-+
                   +-> |AssemblyID   |
                       |AssemblyName |
                       +-------------+
What kind of SELECT statement would I need to do this?
I think I need some sort of outer join, but I'm not familiar enough with SQL syntax to know how to structure the select statement. All my efforts have led to the product being listed multiple times, once for each combination of item and assembly. So if a product has 3 items and 2 assemblies, the product appears 6 times.
Searching for this kind of problem is not easy as I don't know what I need to search on. Is it a three table problem, an outer join issue, or just a simple syntactical answer?
Or would it be better to switch to a pure hierarchical table structure without the use of assemblies? It would then be easier to search on hierarchical tables to solve any problems I might have.
I'm using LibreOffice Base. It has wizards and other helpful things but they don't extend to the complexity of the situation that I find myself in. The aim is that the database contains prices and it can be used to properly price out the products from the cost of the items and time to build the assemblies.
Be gentle, I'm a newbie to SO.

The normal SQL approach to this would put all the data on one line, rather than split among several lines. So, your data would look like:
ProductID  ProductName  AssemblyID  AssemblyName  ItemID  ItemName
---------  -----------  ----------  ------------  ------  --------
P0001001   Product One                            I0045   Item A
P0001001   Product One                            I0082   Item B
P0001001   Product One  A00023      Assembly 1    I0320   Item 1
P0001001   Product One  A00023      Assembly 1    I0900   Item 2
. . .
The product and assembly information, for instance, would not be blank for a given item. All would be on the same line.
This information comes from two sources, the product items and the assembly items. The following query gets each component, then unions them together, finally ordering the results by product:
select *
from ((select p.ProductID, p.ProductName, NULL as AssemblyID, NULL as AssemblyName, il.ItemID, il.ItemName
       from ProductList p join
            ProductItems pi
            on p.ProductID = pi.ProductID join
            ItemList il
            on pi.ItemID = il.ItemID
      ) union all
      (select p.ProductID, p.ProductName, al.AssemblyID, al.AssemblyName, il.ItemID, il.ItemName
       from ProductList p join
            ProductAssemblies pa
            on pa.ProductID = p.ProductID join
            AssemblyList al
            on pa.AssemblyID = al.AssemblyID join
            AssemblyItems ai
            on ai.AssemblyID = al.AssemblyID join
            ItemList il
            on ai.ItemID = il.ItemID
      )
     ) t
order by 1, 2, 3, 4, 5, 6
Often, restructuring into the format you want would be done at the app level. You can do it in SQL, but the best approach depends on the database you are using.
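As a rough way to try the union approach and the app-level restructuring together, the sketch below loads the question's sample data into an in-memory SQLite database through Python's sqlite3 module. Several things here are assumptions for illustration only: SQLite stands in for LibreOffice Base, the prices, costs and build times are made up, the parentheses around each SELECT are dropped because SQLite does not accept them in a compound query, and the helper name blank_repeats is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductList (ProductID TEXT, ProductName TEXT, Price REAL);
CREATE TABLE ItemList (ItemID TEXT, ItemName TEXT, Cost REAL);
CREATE TABLE AssemblyList (AssemblyID TEXT, AssemblyName TEXT);
CREATE TABLE ProductItems (ProductID TEXT, ItemID TEXT);
CREATE TABLE ProductAssemblies (ProductID TEXT, AssemblyID TEXT, BuildTime REAL);
CREATE TABLE AssemblyItems (AssemblyID TEXT, ItemID TEXT);

-- Prices, costs and build times below are placeholder values.
INSERT INTO ProductList VALUES ('P0001001', 'Product One', 100.0);
INSERT INTO ItemList VALUES
  ('I0045', 'Item A', 1.0), ('I0082', 'Item B', 1.0), ('I0320', 'Item 1', 1.0),
  ('I0900', 'Item 2', 1.0), ('I0877', 'Item 3', 1.0), ('I0042', 'Item 4', 1.0);
INSERT INTO AssemblyList VALUES ('A00023', 'Assembly 1'), ('A00024', 'Assembly 2');
INSERT INTO ProductItems VALUES ('P0001001', 'I0045'), ('P0001001', 'I0082');
INSERT INTO ProductAssemblies VALUES
  ('P0001001', 'A00023', 1.0), ('P0001001', 'A00024', 1.0);
INSERT INTO AssemblyItems VALUES
  ('A00023', 'I0320'), ('A00023', 'I0900'),
  ('A00024', 'I0877'), ('A00024', 'I0900'), ('A00024', 'I0042');
""")

# Direct items first (AssemblyID is NULL, which SQLite sorts first), then assembly items.
QUERY = """
SELECT p.ProductID, p.ProductName,
       NULL AS AssemblyID, NULL AS AssemblyName,
       il.ItemID, il.ItemName
FROM ProductList p
JOIN ProductItems pi ON pi.ProductID = p.ProductID
JOIN ItemList il ON il.ItemID = pi.ItemID
UNION ALL
SELECT p.ProductID, p.ProductName,
       al.AssemblyID, al.AssemblyName,
       il.ItemID, il.ItemName
FROM ProductList p
JOIN ProductAssemblies pa ON pa.ProductID = p.ProductID
JOIN AssemblyList al ON al.AssemblyID = pa.AssemblyID
JOIN AssemblyItems ai ON ai.AssemblyID = al.AssemblyID
JOIN ItemList il ON il.ItemID = ai.ItemID
ORDER BY 1, 3, 5
"""
rows = conn.execute(QUERY).fetchall()


def blank_repeats(rows):
    """Blank product/assembly columns that repeat the previous row (report layout)."""
    out, prev_product, prev_assembly = [], None, None
    for pid, pname, aid, aname, iid, iname in rows:
        new_product = pid != prev_product
        new_assembly = new_product or aid != prev_assembly
        out.append((
            pid if new_product else "",
            pname if new_product else "",
            (aid or "") if new_assembly else "",
            (aname or "") if new_assembly else "",
            iid, iname,
        ))
        prev_product, prev_assembly = pid, aid
    return out


for line in blank_repeats(rows):
    print("{:<10}{:<13}{:<12}{:<14}{:<8}{}".format(*line))
```

Unlike the layout in the question, this keeps the product name on the same line as its first item; a report tool grouping on ProductID would typically handle that part itself.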


knowing which item is owned by each customer - spark SQL

Let's say I have a table of customers that contains 2 columns: id and a preferences array.
preferences array - an array of strings of length 3, which might contain nulls. Preferences are different for each customer, so one might care about color and another will not.
as an example:
id | preferences array
1 | {'color:red','shape:triangle','speed:high'}
2 | {'age:14','color:blue',null}
I also have a table of items, again with 2 columns, id and preferences array, but this time the array can be of any length:
id | preferences array
1 | {'color:red','shape:triangle','speed:high','hand:third'}
2 | {'shape:circle'}
An item is matched to a customer if all of the strings in the customer's preferences array appear in the item's preferences array. Not all the strings in the item's preferences array have to appear in the customer's preferences array, though.
I need to create a new table in which one column is the customer id and the other is an array of the ids of all items matched to that customer.
customer_id | items
1 | {3,4,7,300,4190..., 6000}
2 | {3,5617}
19,456 | {1551, 1456,3000}
Please note that I need a solution that will work even for a lot of items and customers (around 10,000).
How can I do this using SQL (Spark SQL, specifically)?
Hmmm . . . One method is to explode the arrays and join. The following gets the customer/item pairs:
select c.customer_id, i.item_id
from (select id as customer_id, preferences_array, c_preference
      from customers lateral view
           explode(preferences_array) cp as c_preference
     ) c join
     (select id as item_id, i_preference
      from items lateral view
           explode(preferences_array) ip as i_preference
     ) i
     on c.c_preference = i.i_preference
group by c.customer_id, i.item_id, size(c.preferences_array)
having count(*) = size(c.preferences_array);
You can reaggregate to get the list of items for each customer.
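Note: This does not return customers with no preferences. Although they technically meet the requirements of your question, I suspect they don't meet the spirit of what you want to do. Also, size() counts NULL elements while the join never matches them, so for a customer whose array contains a NULL, the NULLs need to be excluded from that count before the customer can match anything.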
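The explode, join and count idea can be tried outside Spark. The sketch below is only an approximation under stated assumptions: it uses SQLite through Python's sqlite3 module, the tables customer_prefs and item_prefs are hypothetical and already hold one row per array element (what explode() would produce, with NULLs dropped), item 3 is an invented extra row, and group_concat stands in for the reaggregation step.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One row per (id, preference), i.e. the arrays after explode(), NULLs removed.
CREATE TABLE customer_prefs (customer_id INTEGER, pref TEXT);
CREATE TABLE item_prefs (item_id INTEGER, pref TEXT);
INSERT INTO customer_prefs VALUES
  (1, 'color:red'), (1, 'shape:triangle'), (1, 'speed:high'),
  (2, 'age:14'), (2, 'color:blue');
INSERT INTO item_prefs VALUES
  (1, 'color:red'), (1, 'shape:triangle'), (1, 'speed:high'), (1, 'hand:third'),
  (2, 'shape:circle'),
  (3, 'age:14'), (3, 'color:blue'), (3, 'shape:circle');  -- item 3 is invented
""")

QUERY = """
WITH pref_counts AS (          -- non-NULL preferences per customer
  SELECT customer_id, COUNT(*) AS n_prefs
  FROM customer_prefs
  GROUP BY customer_id
),
matches AS (                   -- pairs where every customer preference was found
  SELECT c.customer_id, i.item_id
  FROM customer_prefs c
  JOIN item_prefs i ON i.pref = c.pref
  JOIN pref_counts n ON n.customer_id = c.customer_id
  GROUP BY c.customer_id, i.item_id, n.n_prefs
  HAVING COUNT(*) = n.n_prefs
)
SELECT customer_id, group_concat(item_id) AS items
FROM matches
GROUP BY customer_id
ORDER BY customer_id
"""

# Turn the comma-separated text from group_concat back into lists of ints.
result = {
    customer_id: sorted(int(x) for x in items.split(","))
    for customer_id, items in conn.execute(QUERY)
}
print(result)
```

Counting preferences after the NULLs are dropped is what lets customer 2, whose array holds a NULL, still match an item.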

How to create two JOIN-tables so that I can compare attributes within?

I am taking a database course in which we have listings of AirBnBs and need to write some SQL queries against the relational model we built from the data, but I am struggling with one in particular:
I have two tables that we are interested in, Billing and Amenities. The first one has the id and price of listings, the second has id and wifi (let's say, to simplify, that it equals 1 if there is wifi, 0 otherwise). Both have other attributes that we don't really care about here.
So the query is, "What is the difference in the average price of listings with and without wifi?"
My idea was to build two JOIN-tables, one with listings that have wifi, the other without, and compare them easily:
SELECT avg(B.price - A.price) as averagePrice
FROM (
    SELECT Billing.price, Billing.id
    FROM Billing
    INNER JOIN Amenities
    ON Billing.id = Amenities.id
    WHERE Amenities.wifi = 0
) A, (
    SELECT Billing.price, Billing.id
    FROM Billing
    INNER JOIN Amenities
    ON Billing.id = Amenities.id
    WHERE Amenities.wifi = 1) B
Obviously this doesn't work... I am pretty sure that there is a far easier solution to it though. What am I missing?
(And by the way, is there a way to compute the absolute value of the price difference?)
I hope that I was clear enough, thank you for your time!
Edit: As mentioned in the comments, I forgot to say that both tables have id as their primary key, so that there is one row per listing.
Just use conditional aggregation:
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) as avg_no_wifi,
       AVG(CASE WHEN a.wifi = 1 THEN b.price END) as avg_wifi
FROM Billing b JOIN
     Amenities a
     ON b.id = a.id
WHERE a.wifi IN (0, 1);
Subtract one expression from the other if you want the difference instead of the two separate values, and wrap that subtraction in ABS() if you want the absolute difference.
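To see conditional aggregation at work, here is a minimal sketch that runs the same query shape against an in-memory SQLite database from Python. The sample prices are made up, SQLite is an assumption (the question does not name its database), and the abs_diff column is an added illustration of the ABS() idea rather than part of the query above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Billing (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE Amenities (id INTEGER PRIMARY KEY, wifi INTEGER);
-- Made-up sample rows: listings 1 and 2 have wifi, listing 3 does not.
INSERT INTO Billing VALUES (1, 1500), (2, 1700), (3, 1800);
INSERT INTO Amenities VALUES (1, 1), (2, 1), (3, 0);
""")

# Each AVG only sees the rows its CASE lets through; the others become NULL
# and are ignored by AVG.
avg_no_wifi, avg_wifi, abs_diff = conn.execute("""
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) AS avg_no_wifi,
       AVG(CASE WHEN a.wifi = 1 THEN b.price END) AS avg_wifi,
       ABS(AVG(CASE WHEN a.wifi = 1 THEN b.price END) -
           AVG(CASE WHEN a.wifi = 0 THEN b.price END)) AS abs_diff
FROM Billing b
JOIN Amenities a ON b.id = a.id
WHERE a.wifi IN (0, 1)
""").fetchone()

print(avg_no_wifi, avg_wifi, abs_diff)
```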
Let's assume we're working with data like the following (problems with your data model are noted below):
Billing:
| listing_id | price   |
| 1          | 1500.00 |
| 2          | 1700.00 |
| 3          | 1800.00 |
| 4          | 1900.00 |
Amenities:
| listing_id | wifi |
| 1          | 1    |
| 2          | 1    |
| 3          | 0    |
Notice that I changed "id" to "listing_id" to make it clear what it was (using "id" as an attribute name is problematic anyways). Also, note that one listing doesn't have an entry in the Amenities table. Depending on your data, that may or may not be a concern (again, refer to the bottom for a discussion of your data model).
Based on this data, your averages should be as follows:
Listings with wifi average $1600 (listings 1 and 2).
Listings without wifi (just listing 3) average $1800.
So the difference would be $200.
To achieve this result in SQL, it may be helpful to first get the average cost per amenity (whether wifi is offered). This would be obtained with the following query:
SELECT
    Amenities.wifi AS has_wifi,
    AVG(Billing.price) AS avg_cost
FROM Billing
INNER JOIN Amenities ON
    Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
which gives you the following results:
| has_wifi | avg_cost |
| 0 | 1800.0000000000000000 |
| 1 | 1600.0000000000000000 |
So far so good. So now we need to calculate the difference between these 2 rows. There are a number of different ways to do this, but one is to use a CASE expression to make one of the values negative, and then simply take the SUM of the result (note that I'm using a CTE, but you can also use a sub-query):
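WITH avg_by_wifi(has_wifi, avg_cost) AS (
    SELECT Amenities.wifi, AVG(Billing.price)
    FROM Billing
    INNER JOIN Amenities ON
        Amenities.listing_id = Billing.listing_id
    GROUP BY Amenities.wifi
)
SELECT ABS(SUM(CASE
        WHEN has_wifi = 1 THEN avg_cost
        ELSE -1 * avg_cost
    END)) AS avg_cost_diff
FROM avg_by_wifi
which gives us the expected value of 200 (the raw sum here is 1600 - 1800 = -200, and ABS() drops the sign).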
Now regarding your data model:
If both your Billing and Amenities table only have 1 row for each listing, it makes sense to combine them into 1 table. For example: Listings(listing_id, price, wifi)
However, this is still problematic, because you probably have a bunch of other amenities you want to model (pool, sauna, etc.) So you might want to model a many-to-many relationship between listings and amenities using an intermediate table:
Listings(listing_id, price)
Amenities(amenity_id, amenity_name)
ListingsAmenities(listing_id, amenity_id)
This way, you could list multiple amenities for a given listing without having to add additional columns. It also becomes easy to store additional information about an amenity: What's the wifi password? How deep is the pool? etc.
Of course, using this model makes your original query (difference in average cost of listings by wifi) a bit trickier, but definitely still doable.
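One possible shape for the wifi query under the many-to-many model is sketched below, using SQLite from Python purely for illustration. The amenity names, the sample link rows and the has_wifi flag are assumptions, and note the change in meaning: a listing with no wifi row in ListingsAmenities is now treated as having no wifi, so listing 4 joins the no-wifi group.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Listings (listing_id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE Amenities (amenity_id INTEGER PRIMARY KEY, amenity_name TEXT);
CREATE TABLE ListingsAmenities (listing_id INTEGER, amenity_id INTEGER);
INSERT INTO Listings VALUES (1, 1500), (2, 1700), (3, 1800), (4, 1900);
INSERT INTO Amenities VALUES (1, 'wifi'), (2, 'pool');
-- Listings 1 and 2 have wifi, listing 3 has a pool, listing 4 has nothing.
INSERT INTO ListingsAmenities VALUES (1, 1), (2, 1), (3, 2);
""")

avg_wifi, avg_no_wifi = conn.execute("""
WITH flagged AS (
  -- has_wifi is 1 when a 'wifi' link row exists for the listing, else 0.
  SELECT l.price,
         EXISTS (SELECT 1
                 FROM ListingsAmenities la
                 JOIN Amenities a ON a.amenity_id = la.amenity_id
                 WHERE la.listing_id = l.listing_id
                   AND a.amenity_name = 'wifi') AS has_wifi
  FROM Listings l
)
SELECT AVG(CASE WHEN has_wifi = 1 THEN price END),
       AVG(CASE WHEN has_wifi = 0 THEN price END)
FROM flagged
""").fetchone()

print(avg_wifi, avg_no_wifi, abs(avg_wifi - avg_no_wifi))
```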

SQL treat two entries as one

I have a table with stock codes and quantity sold, but I would like to treat 2 different stock codes as one. The reason is that one is imported and the other is locally produced, but they are the same product.
Let's say:
Product A - Imported, Stock code is abc123
Product A - Local, Stock code is aimp563
I want to sum over the quantity sold but treat the same product with an imported stock code and a local stock code as one. Is this possible?
Okay, this is what I have.
The table looks like:
Product | StockCode | QtySold
Product A - Local | prdA001loc | 100
Product A - Imported | prdAImp7Z4 | 150
SELECT Product, SUM(QtySold) FROM tblA GROUP BY StockCode, Product
But this will just return the table as is. I would like this output:
Product | QtySold
Product A | 250
I believe you need to update your DB schema to reflect this information. However, if you need a naive solution, you can use the following statement:
SELECT RTRIM(SUBSTRING(Product, 1, CHARINDEX('-', Product) - 1)) AS Product, SUM(QtySold) AS QtySold
FROM tblA GROUP BY RTRIM(SUBSTRING(Product, 1, CHARINDEX('-', Product) - 1))
Note that the above statement assumes that all your product names follow the "Name - Origin" pattern shown in your question; the "- 1" and RTRIM() drop the hyphen and the space before it so that the result reads "Product A".
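The prefix-grouping idea can be checked quickly with the sketch below. It is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module, so substr() and instr() stand in for SUBSTRING() and CHARINDEX(), and the "Product B" row is invented to show that other products stay separate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tblA (Product TEXT, StockCode TEXT, QtySold INTEGER);
INSERT INTO tblA VALUES
  ('Product A - Local', 'prdA001loc', 100),
  ('Product A - Imported', 'prdAImp7Z4', 150),
  ('Product B - Local', 'prdB001loc', 40);  -- invented extra row
""")

# Keep everything before the first hyphen, trim the trailing space,
# and group on that base name instead of on StockCode.
rows = conn.execute("""
SELECT rtrim(substr(Product, 1, instr(Product, '-') - 1)) AS BaseProduct,
       SUM(QtySold) AS QtySold
FROM tblA
GROUP BY BaseProduct
ORDER BY BaseProduct
""").fetchall()

print(rows)
```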

SQL group related rows in a list

I'm a bit stuck with this...
I have items table:
id | name
1 | item 1
2 | item 2
3 | item 3
4 | item 4
and related items table:
id | item_id | related_item_id
2 | 1 | 2
3 | 1 | 4
so this means that item 1 is related to items 2 and 4.
Now I'm trying to display these in a list where related items follow always the main item they are related to:
item 1
item 2
item 4
item 3
Then I can visually show that these items 2 and 4 are related to item one and draw something like:
item 1
-- item 2
-- item 4
item 3
To be honest, I haven't got any ideas myself. I guess I could query for items which are not related to any other item to get a list of "parent items" and then query relations separately in a script loop. This is definitely not the sexiest solution...
I am assuming that this question is about ordering the items list, without duplicates. That is, a given item does not have more than one parent (which I ask in a comment).
If so, you can do this with a left outer join and cleverness in the order by.
select coalesce(r.related_item_id, i.id) as item_id
from items i left join
     related r
     on i.id = r.related_item_id
order by coalesce(r.item_id, i.id),
         (r.related_item_id is null) desc;
The left outer join identifies parents because they will not have any matching rows in the related table. For those rows, the coalesce() falls back to the item's own id, so each parent sorts together with its children and, thanks to the second sort key, ahead of them.
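The ordering trick can be tried with the question's data in the sketch below. It assumes SQLite via Python's sqlite3 module (where a boolean expression such as "x IS NULL" evaluates to 0 or 1 and can be sorted on), and it adds two things that are not in the query above: an is_child flag for drawing the "--" prefix and a final i.id sort key to make the order of siblings deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE related (id INTEGER PRIMARY KEY, item_id INTEGER, related_item_id INTEGER);
INSERT INTO items VALUES (1, 'item 1'), (2, 'item 2'), (3, 'item 3'), (4, 'item 4');
INSERT INTO related VALUES (2, 1, 2), (3, 1, 4);
""")

rows = conn.execute("""
SELECT i.name, r.item_id IS NOT NULL AS is_child
FROM items i
LEFT JOIN related r ON i.id = r.related_item_id
ORDER BY COALESCE(r.item_id, i.id),        -- group each child with its parent
         (r.related_item_id IS NULL) DESC, -- parent (1) before children (0)
         i.id                              -- stable order among siblings
""").fetchall()

lines = [("-- " if is_child else "") + name for name, is_child in rows]
print("\n".join(lines))
```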
In my opinion, rather than implementing this logic in a query, you should move it to your application code.
Assuming that the item_ids are sequential, you can find the largest item_id, then in a loop
you can find the related_item_ids for each item_id and build a convenient data structure out of them.
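A minimal sketch of that application-side approach might look like the following. The in-memory items and related structures are hypothetical stand-ins for the two query results, and iterating over the sorted ids avoids having to assume that they are sequential.

```python
from collections import defaultdict

# Hypothetical results of "select id, name from items" and
# "select item_id, related_item_id from related".
items = {1: "item 1", 2: "item 2", 3: "item 3", 4: "item 4"}
related = [(1, 2), (1, 4)]

# Map each parent to its children and remember which ids are children.
children = defaultdict(list)
for parent_id, child_id in related:
    children[parent_id].append(child_id)
child_ids = {child_id for _, child_id in related}

listing = []
for item_id in sorted(items):
    if item_id in child_ids:
        continue  # children are emitted right after their parent
    listing.append(items[item_id])
    for child_id in sorted(children[item_id]):
        listing.append("-- " + items[child_id])

print("\n".join(listing))
```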
This functionality comes under the category of hierarchical queries. In Oracle it's handled by the CONNECT BY clause; I'm not sure about MySQL, but you can search for "hierarchical queries mysql" to get the answer.

SQL - calculate consumption of materials based on recipe

I'm not sure how to ask my question, so I'll explain what I'm trying to do: I'm building an app in Delphi XE, which should calculate the consumption of raw materials, based on products recipes and orders.
I have 5 tables: Orders, OrdersContent, Products, Raw Materials and Recipes. Each order is composed of a few products, and each product has its own recipe of raw materials.
I already summed up all products from all orders using sql in Query1.
This is the command for Query1:
select Products.Price,
OrdersContent.ID_Product, sum(OrdersContent.QNT) as QNT_Sum,
(Products.Price * sum(OrdersContent.QNT)) as Value
from Orders, OrdersContent, Products
where Orders.ID = OrdersContent.ID_Order
and Products.ID = OrdersContent.ID_Product
group by
OrdersContent.ID_Product, Products.Price
This returns:
|Price | ID_Product | QNT_Sum | Value |
| 2 | 122521 | 150 | 300 |
| 10 | 366547 | 10 | 100 |
| xxx | xxxxxx | xxx | xxxxx|
It's exactly what I want.
So now I'm wondering if there's a way to calculate the raw materials consumption also using sql, as the only other way I know how to do this is to iterate through the whole Query1 and calculate raw materials consumption for each record(product) individually, add it to a new table and then sum up the results, which is very time consuming.
I'm pretty sure there must be a more efficient way to do this, but have no clue as to how or where to search how to do it. I'm not asking for the code, but some pointers or links to tutorials or examples.
I hope I'm clear enough, if not please do ask for more info.
Add Raw Materials and Recipe to your FROM clause with the appropriate joins. Group by raw materials. Remove id_product and price from your group by statement. Change the aggregation in your select to sum(products.price*orderscontent.qnt).
I'm guessing at your column names in Recipes and Raw Materials but here's the general idea.
select Recipes.ID_RAW_MATERIAL,
sum(OrdersContent.QNT) as QNT_Sum,
sum(Products.Price * OrdersContent.QNT) as Value
from Orders, OrdersContent, Products, Recipes, RawMaterials
where Orders.ID = OrdersContent.ID_Order
and Products.ID = OrdersContent.ID_Product
and Recipes.ID_PRODUCT = Products.ID
and Recipes.ID_RAW_MATERIAL = RawMaterials.ID
group by Recipes.ID_RAW_MATERIAL
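To turn ordered quantities into actual raw material consumption, the recipe has to say how much of each material one unit of a product needs. The sketch below assumes a hypothetical Recipes.QNT_PER_PRODUCT column for that, along with invented material names and sample rows, and runs on SQLite from Python only for illustration; the real column names in Recipes and RawMaterials are not known from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Orders (ID INTEGER PRIMARY KEY);
CREATE TABLE Products (ID INTEGER PRIMARY KEY, Price REAL);
CREATE TABLE OrdersContent (ID_Order INTEGER, ID_Product INTEGER, QNT INTEGER);
CREATE TABLE RawMaterials (ID INTEGER PRIMARY KEY, Name TEXT);
-- QNT_PER_PRODUCT is a hypothetical column: material needed per unit of product.
CREATE TABLE Recipes (ID_Product INTEGER, ID_Raw_Material INTEGER, QNT_PER_PRODUCT REAL);

INSERT INTO Orders VALUES (1), (2);
INSERT INTO Products VALUES (122521, 2), (366547, 10);
INSERT INTO OrdersContent VALUES (1, 122521, 100), (2, 122521, 50), (2, 366547, 10);
INSERT INTO RawMaterials VALUES (1, 'flour'), (2, 'sugar');
INSERT INTO Recipes VALUES (122521, 1, 2.0), (122521, 2, 0.5), (366547, 1, 3.0);
""")

# One row per raw material: ordered quantity times quantity per product, summed.
rows = conn.execute("""
SELECT RawMaterials.ID, RawMaterials.Name,
       SUM(OrdersContent.QNT * Recipes.QNT_PER_PRODUCT) AS Consumption
FROM Orders
JOIN OrdersContent ON Orders.ID = OrdersContent.ID_Order
JOIN Products ON Products.ID = OrdersContent.ID_Product
JOIN Recipes ON Recipes.ID_Product = Products.ID
JOIN RawMaterials ON RawMaterials.ID = Recipes.ID_Raw_Material
GROUP BY RawMaterials.ID, RawMaterials.Name
ORDER BY RawMaterials.ID
""").fetchall()

print(rows)
```

Because the join to Recipes repeats each order line once per raw material, sums of product-level values such as price times quantity should be read per material, not added up across materials.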