SQL group related rows in a list - sql

I'm a bit stuck with this...
I have items table:
id | name
1 | item 1
2 | item 2
3 | item 3
4 | item 4
and related items table:
id | item_id | related_item_id
2 | 1 | 2
3 | 1 | 4
So this means that item 1 is related to items 2 and 4.
Now I'm trying to display these in a list where related items always follow the main item they are related to:
item 1
item 2
item 4
item 3
Then I can visually show that these items 2 and 4 are related to item one and draw something like:
item 1
-- item 2
-- item 4
item 3
To be honest, I haven't got any ideas myself. I guess I could query for items which are not related to any other item to get a list of "parent items" and then query the relations separately in a script loop. This is definitely not the sexiest solution...

I am assuming that this question is about ordering the items list, without duplicates. That is, a given item does not have more than one parent (which I ask in a comment).
If so, you can do this with a left outer join and cleverness in the order by.
select coalesce(r.related_item_id, i.id) as item_id
from items i left join
related r
on i.id = r.related_item_id
order by coalesce(r.item_id, i.id),
(r.related_item_id is null) desc;
The left outer join identifies parents because they have no matching row in related. For those rows r.item_id is NULL, so coalesce() falls back to the item's own id.
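To check the ordering trick against the sample data, here is a minimal sketch that runs the same query in SQLite through Python's sqlite3 module. The choice of SQLite, the function name ordered_item_ids, and the extra i.id tie-breaker (so that siblings come out in a stable order) are assumptions made for illustration, not part of the original answer.

```python
import sqlite3


def ordered_item_ids(conn):
    """Return item ids with each parent immediately followed by its related items."""
    query = """
        SELECT i.id
        FROM items i
        LEFT JOIN related r ON i.id = r.related_item_id
        ORDER BY COALESCE(r.item_id, i.id),        -- group key: the parent's id
                 (r.related_item_id IS NULL) DESC, -- parent row first in its group
                 i.id                              -- assumed tie-breaker for siblings
    """
    return [row[0] for row in conn.execute(query)]


conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE related (id INTEGER PRIMARY KEY, item_id INTEGER, related_item_id INTEGER);
    INSERT INTO items VALUES (1, 'item 1'), (2, 'item 2'), (3, 'item 3'), (4, 'item 4');
    INSERT INTO related VALUES (2, 1, 2), (3, 1, 4);
""")

print(ordered_item_ids(conn))
```

As in the answer, this only holds if an item has at most one parent; an item with two parents would appear twice.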

In my opinion, rather than implementing this logic in a query, you should move it to your application code.
Assuming that the item ids are sequential, you can find the largest item_id, then loop over the ids,
look up the related_item_id values for each item_id, and build a convenient data structure out of them.
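One way the application-side approach could look, as a rough sketch in Python. It assumes the two tables have already been fetched into a dict and a list of pairs; the function name build_display_list is hypothetical. It iterates over the ids that actually exist rather than counting up to the largest id, so it does not depend on the ids being sequential.

```python
from collections import defaultdict


def build_display_list(items, relations):
    """items: {id: name}; relations: iterable of (item_id, related_item_id) pairs."""
    children = defaultdict(list)
    for parent_id, child_id in relations:
        children[parent_id].append(child_id)
    child_ids = {c for kids in children.values() for c in kids}

    lines = []
    for item_id in sorted(items):
        if item_id in child_ids:
            continue  # shown under its parent instead of at the top level
        lines.append(items[item_id])
        for child_id in sorted(children.get(item_id, [])):
            lines.append("-- " + items[child_id])
    return lines


items = {1: "item 1", 2: "item 2", 3: "item 3", 4: "item 4"}
relations = [(1, 2), (1, 4)]
print("\n".join(build_display_list(items, relations)))
```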

This functionality falls under the category of hierarchical queries. In Oracle it's handled by the CONNECT BY clause; I'm not sure about MySQL, but you can search for "hierarchical queries mysql" to find the answer.


knowing which item is owned by each customer - spark SQL

Let's say I have a table of customers that contains 2 columns: id and a preferences array.
The preferences array is an array of strings of length 3 that might contain nulls. Preferences are different for each customer, so one might care about color and another might not.
as an example:
id | preferences array
1 | {'color:red','shape:triangle','speed:high'}
2 | {'age:14','color:blue',null}
I also have a table of items, again with 2 columns: id and preferences array. This time, though, the array can be of any length:
id | preferences array
1 | {'color:red','shape:triangle','speed:high','hand:third'}
2 | {'shape:circle'}
An item is matched to a customer if all of the strings in the customer's preferences array appear in the item's preferences array. Not all of the strings in the item's preferences array have to appear in the customer's preferences array, though.
I need to create a new table in which one column is the customer id and the other is an array of the ids of all items matched to that customer.
customer_id | items
1 | {3,4,7,300,4190..., 6000}
2 | {3,5617}
19,456 | {1551, 1456,3000}
Please note that I need a solution that will work even for a lot of items and customers (around 10,000).
How can I do this using SQL (Spark SQL, specifically)?
Hmmm . . . One method is to explode the arrays and join. The following gets the customer/item pairs:
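select c.id as customer_id, i.id as item_id
from (select id, preferences_array, c_preference
      from customers lateral view
           explode(preferences_array) cp as c_preference
     ) c join
     (select id, i_preference
      from items lateral view
           explode(preferences_array) ip as i_preference
     ) i
     on c.c_preference = i.i_preference
group by c.id, i.id, size(c.preferences_array)
-- caveat: size() also counts null entries, so null-padded arrays need their nulls removed first
having count(*) = size(c.preferences_array);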
You can reaggregate to get the list of items for each customer.
Note: This does not return customers with no preferences. Although they technically meet the requirements of your question, I suspect they don't meet the spirit of what you want to do.
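The explode, join, count, and reaggregate steps can be tried out without a Spark cluster. The sketch below is not Spark: it emulates explode() by storing one row per non-null array element in SQLite tables, and does the final reaggregation in Python. The table names customer_prefs and item_prefs, the helper names, and item 3 are made up for illustration, and it assumes a preference string does not repeat within one array.

```python
import sqlite3
from collections import defaultdict


def explode(conn, table, rows):
    """Emulate explode(): store one (owner_id, pref) row per non-null element."""
    conn.execute(f"CREATE TABLE {table} (owner_id INTEGER, pref TEXT)")
    conn.executemany(
        f"INSERT INTO {table} VALUES (?, ?)",
        [(owner_id, p) for owner_id, prefs in rows for p in prefs if p is not None],
    )


def match_items(conn):
    """Return {customer_id: [item_id, ...]} for items covering all customer prefs."""
    query = """
        WITH pref_counts AS (
            SELECT owner_id, COUNT(*) AS n FROM customer_prefs GROUP BY owner_id
        )
        SELECT c.owner_id AS customer_id, i.owner_id AS item_id
        FROM customer_prefs c
        JOIN item_prefs i ON c.pref = i.pref
        JOIN pref_counts pc ON pc.owner_id = c.owner_id
        GROUP BY c.owner_id, i.owner_id, pc.n
        HAVING COUNT(*) = pc.n          -- every non-null preference was matched
        ORDER BY customer_id, item_id
    """
    matches = defaultdict(list)
    for customer_id, item_id in conn.execute(query):
        matches[customer_id].append(item_id)  # the "reaggregate" step
    return dict(matches)


customers = [
    (1, ["color:red", "shape:triangle", "speed:high"]),
    (2, ["age:14", "color:blue", None]),
]
items = [
    (1, ["color:red", "shape:triangle", "speed:high", "hand:third"]),
    (2, ["shape:circle"]),
    (3, ["age:14", "color:blue", "shape:circle"]),  # hypothetical extra item
]

conn = sqlite3.connect(":memory:")
explode(conn, "customer_prefs", customers)
explode(conn, "item_prefs", items)
print(match_items(conn))
```

Dropping nulls while exploding is what lets customer 2 match; comparing against the raw array length of 3 would never succeed for that customer.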

Hibernate criteria left join with query

I have two classes Apartment and AdditionalSpace representing tables as below.
Apartment table
ID   AREA   SOLD
---- ------ ----
1    100    1
2    200    0
AdditionalSpace table
ID   AREA   APARTMENTID
---- ------ -----------
10   10     1
11   10     1
12   10     1
20   20     2
21   20     2
As you can see Apartment's table has a one-to-many relation with AdditionalSpace table, i.e. Apartment.ID=AdditionalSpace.APARTMENTID.
Question:- How to retrieve total area of a sold apartment including its additional space area.
The SQL which I have used so far to retrieve similar result is :-
select sum(apt.area + ads.adsarea) from apartment apt left outer join (select sum(area) as adsarea, apartmentid from additionalspace group by apartmentid) ads on ads.apartmentid=apt.id where apt.sold=1
I am struggling to find a way in order to implement the above scenario via criteria instead of SQL/HQL. Please suggest. Thanks.
I don't think this is possible in criteria. The closest I can see is to simply get the size of the apartment and the sum of the additional areas as two columns in your result, like this:
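Criteria criteria = session.createCriteria(Apartment.class, "a");
criteria.createAlias("additionalSpaces", "ads");
// Assumed completion: apartment id, apartment area, and summed additional area per apartment
criteria.setProjection(Projections.projectionList().add(Projections.groupProperty("a.id")).add(Projections.max("a.area")).add(Projections.sum("ads.area")));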
Alternatively, if you still want to use Hibernate but are happy to write it in HQL, you can do the following:
select ads.apartment.id,max(a.area)+sum(ads.area)
from Apartment a
join a.additionalSpaces ads
group by ads.apartment.id
This works because HQL allows you to write the + to add together the two projections, but I don't know that an analogous method exists on the projections api.
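To see what the two queries return on the sample data, here is a small sketch that runs their plain-SQL equivalents in SQLite via Python. This is not Hibernate code. The column order (ID, AREA, SOLD and ID, AREA, APARTMENTID) is inferred from the queries in the question and is an assumption, as are the function names. The COALESCE in the second query is an addition so that sold apartments without additional space still count.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE apartment (id INTEGER PRIMARY KEY, area INTEGER, sold INTEGER);
    CREATE TABLE additionalspace (id INTEGER PRIMARY KEY, area INTEGER, apartmentid INTEGER);
    INSERT INTO apartment VALUES (1, 100, 1), (2, 200, 0);
    INSERT INTO additionalspace VALUES
        (10, 10, 1), (11, 10, 1), (12, 10, 1), (20, 20, 2), (21, 20, 2);
""")


def area_per_apartment(conn):
    """SQL equivalent of the HQL: max(a.area) + sum(ads.area) per apartment."""
    query = """
        SELECT a.id, MAX(a.area) + SUM(ads.area)
        FROM apartment a
        JOIN additionalspace ads ON ads.apartmentid = a.id
        GROUP BY a.id
        ORDER BY a.id
    """
    return conn.execute(query).fetchall()


def total_sold_area(conn):
    """The asker's query, with COALESCE added for apartments without extra space."""
    query = """
        SELECT SUM(apt.area + COALESCE(ads.adsarea, 0))
        FROM apartment apt
        LEFT JOIN (SELECT SUM(area) AS adsarea, apartmentid
                   FROM additionalspace
                   GROUP BY apartmentid) ads
               ON ads.apartmentid = apt.id
        WHERE apt.sold = 1
    """
    return conn.execute(query).fetchone()[0]


print(area_per_apartment(conn))
print(total_sold_area(conn))
```

Note that the inner join in the first query drops apartments that have no additional space at all.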

SQL Two SELECT vs. JOIN best performance?

I wonder which has better performance in this case. First of all, I want to show to the user his medical information. I have two tables
user:
id_user | type_blood | number | ...
1       | O          | 123
2       | A+         | 442
user_allergies:
id_user | name
1       | name1
1       | name2
I want to return:
JSON {id_user=1, type_blood=O, allergies=(name1,name2)}
So, is it better to do a JOIN of user and user_allergies and iterate, or maybe two SELECTs?
But what if I then have another table like user_allergies, so that the result can be:
id_user | name
1       | namet1
1       | namet2
1       | namet3
JSON {id_user=1, type_blood=O, allergies=(name1,name2), table=(namet1,namet2,namet3)}
Is it better to do three SELECTs or a JOIN? With a JOIN I have to iterate over the results, and I can't imagine an easy way. A JOIN can give me a result like:
id_user | type_blood | allergy_name | another_table_name
1       | O          | name1        | namet1
1       | O          | name1        | namet2
1       | O          | name1        | namet3
1       | O          | name2        | namet1
1       | O          | name2        | namet2
1       | O          | name2        | namet3
Is there any way to extract:
id_user | type_blood | allergy_name | another_table_name
1       | O          | name1        | namet1
1       | O          | name2        | namet2
1       | O          |              | namet3
Thanks community, I'm a newbie in SQL.
Depending on the data, there is no way to get the 2nd set of results you've shown if the 1st set of results shows the values. The 2nd one is throwing data away: in this case, allergy 'name2' for another_table_name 'namet3'. This is why you get many rows back with repeated data.
You can use the GROUP BY clause to restrict this in some cases, but again, it won't let you throw away data like that.
You could try using the COALESCE function, if your DB supports it.
If not, I think you're going to have to construct your JSON in some business logic, in which case it's fine to read the data in a 3-way join. You order by the user id and either create the JSON document or append the row data to it, depending on whether a record for that user already exists (if you order by user id, you only need to keep track of when the user id value changes).
Alternatively, you can read a list of users and single-item data in one query, and then hit the DB again for the repeating data.
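The last option (one query for the single-valued columns, then one query per repeating list) might look roughly like this. It is a sketch in Python with SQLite; the table names users and another_table and the helper names are placeholders, since the question does not name every table.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id_user INTEGER PRIMARY KEY, type_blood TEXT, number INTEGER);
    CREATE TABLE user_allergies (id_user INTEGER, name TEXT);
    CREATE TABLE another_table (id_user INTEGER, name TEXT);   -- placeholder name
    INSERT INTO users VALUES (1, 'O', 123), (2, 'A+', 442);
    INSERT INTO user_allergies VALUES (1, 'name1'), (1, 'name2');
    INSERT INTO another_table VALUES (1, 'namet1'), (1, 'namet2'), (1, 'namet3');
""")


def names(conn, table, id_user):
    """Fetch one repeating list; `table` must come from trusted code, not user input."""
    rows = conn.execute(
        f"SELECT name FROM {table} WHERE id_user = ? ORDER BY name", (id_user,)
    )
    return [row[0] for row in rows]


def medical_record(conn, id_user):
    """Build the nested document from one single-row query plus one query per list."""
    row = conn.execute(
        "SELECT id_user, type_blood FROM users WHERE id_user = ?", (id_user,)
    ).fetchone()
    if row is None:
        return None
    return {
        "id_user": row[0],
        "type_blood": row[1],
        "allergies": names(conn, "user_allergies", id_user),
        "table": names(conn, "another_table", id_user),
    }


print(json.dumps(medical_record(conn, 1)))
```

This avoids the cross product entirely: each list is read once, at the cost of one extra round trip per list.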

Semi-hierarchical SQL query with multiple tables and possible outer joins

I have products. Each product is made up of items and assemblies. Assemblies themselves can be made up of items too. So it's a hierarchy but limited in depth. What I would like to do is list products with the items and assemblies it contains, plus any items in the product's assemblies.
This is the output I would like to see. It doesn't have to look exactly like this, but the aim is to show the items in the product, then the assemblies, and within each assembly the items it contains. The number of columns isn't fixed; if more are necessary to show the items in the assemblies, that's no problem.
ProductID ProductName AssemblyID AssemblyName ItemID ItemName
--------- ----------- ---------- ------------ ------ --------
P0001001  Product One
                                              I0045  Item A
                                              I0082  Item B
                      A00023     Assembly 1
                                              I0320  Item 1
                                              I0900  Item 2
                      A00024     Assembly 2
                                              I0877  Item 3
                                              I0900  Item 2
                                              I0042  Item 4
This I can then use to build a report grouped on the Product ID to list the contents of each product.
This is the table structure I have at the moment.
(-> means "links to")
ProductList       (ProductID, ProductName, Price)
ProductItems      (ProductID -> ProductList.ProductID, ItemID -> ItemList.ItemID)
ProductAssemblies (ProductID -> ProductList.ProductID, AssemblyID -> AssemblyList.AssemblyID, BuildTime)
AssemblyItems     (AssemblyID -> ProductAssemblies.AssemblyID, ItemID -> ItemList.ItemID)
AssemblyList      (AssemblyID, AssemblyName)
ItemList          (ItemID, ItemName, Cost)
What kind of SELECT statement would I need to do this?
I think I need some sort of outer join, but I don't know SQL syntax well enough to structure the SELECT statement. All my efforts have led to the product being listed multiple times for each item and assembly. So if a product has 3 items and 2 assemblies, the product appears 6 times.
Searching for this kind of problem is not easy as I don't know what I need to search for. Is it a three-table problem, an outer join issue, or just a simple syntactical answer?
Or would it be better to switch to a pure hierarchical table structure without the use of assemblies? It would then be easier to search on hierarchical tables to solve any problems I might have.
I'm using LibreOffice Base. It has wizards and other helpful things but they don't extend to the complexity of the situation that I find myself in. The aim is that the database contains prices and it can be used to properly price out the products from the cost of the items and time to build the assemblies.
Be gentle, I'm a newbie to SO.
The normal SQL approach to this would put all the data on one line, rather than split among several lines. So, your data would look like:
ProductID ProductName AssemblyID AssemblyName ItemID ItemName
--------- ----------- ---------- ------------ ------ --------
P0001001  Product One                         I0045  Item A
P0001001  Product One                         I0082  Item B
P0001001  Product One A00023     Assembly 1   I0320  Item 1
P0001001  Product One A00023     Assembly 1   I0900  Item 2
. . .
The product and assembly information, for instance, would not be blank for a given item. All would be on the same line.
This information comes from two sources, the product items and the assembly items. The following query gets each component, then unions them together, finally ordering the results by product:
select *
from ((select p.ProductId, p.ProductName, NULL as AssemblyId, NULL as AssemblyName, il.ItemId, il.ItemName
       from ProductList p join
            ProductItems pi
            on p.ProductId = pi.ProductId join
            ItemList il
            on pi.ItemId = il.ItemId
      ) union all
      (select p.ProductId, p.ProductName, al.AssemblyId, al.AssemblyName, il.ItemId, il.ItemName
       from ProductList p join
            ProductAssemblies pa
            on pa.ProductId = p.ProductId join
            AssemblyList al
            on pa.AssemblyId = al.AssemblyId join
            AssemblyItems ai
            on ai.AssemblyId = al.AssemblyId join
            ItemList il
            on ai.ItemId = il.ItemId
      )
     ) t
order by 1, 2, 3, 4, 5, 6
Often, restructuring into the format you want would be done at the app level. You can do it in SQL, but the best approach depends on the database you are using.
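As an illustration of doing the restructuring at the app level, the sketch below runs a UNION ALL query of this shape in SQLite and then turns the flat rows into an indented report in Python. SQLite is used only as a stand-in (the question is about LibreOffice Base), the helper report_lines is hypothetical, and prices, costs, and build times are left NULL because the question gives no values. SQLite does not accept parentheses around the members of a compound SELECT, so they are omitted here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE ProductList (ProductID TEXT, ProductName TEXT, Price REAL);
    CREATE TABLE ItemList (ItemID TEXT, ItemName TEXT, Cost REAL);
    CREATE TABLE AssemblyList (AssemblyID TEXT, AssemblyName TEXT);
    CREATE TABLE ProductItems (ProductID TEXT, ItemID TEXT);
    CREATE TABLE ProductAssemblies (ProductID TEXT, AssemblyID TEXT, BuildTime REAL);
    CREATE TABLE AssemblyItems (AssemblyID TEXT, ItemID TEXT);
    INSERT INTO ProductList VALUES ('P0001001', 'Product One', NULL);
    INSERT INTO ItemList VALUES ('I0045', 'Item A', NULL), ('I0082', 'Item B', NULL),
                                ('I0320', 'Item 1', NULL), ('I0900', 'Item 2', NULL);
    INSERT INTO AssemblyList VALUES ('A00023', 'Assembly 1');
    INSERT INTO ProductItems VALUES ('P0001001', 'I0045'), ('P0001001', 'I0082');
    INSERT INTO ProductAssemblies VALUES ('P0001001', 'A00023', NULL);
    INSERT INTO AssemblyItems VALUES ('A00023', 'I0320'), ('A00023', 'I0900');
""")

QUERY = """
    SELECT p.ProductID, p.ProductName, NULL AS AssemblyID, NULL AS AssemblyName,
           il.ItemID, il.ItemName
    FROM ProductList p
    JOIN ProductItems pri ON pri.ProductID = p.ProductID
    JOIN ItemList il ON il.ItemID = pri.ItemID
    UNION ALL
    SELECT p.ProductID, p.ProductName, al.AssemblyID, al.AssemblyName,
           il.ItemID, il.ItemName
    FROM ProductList p
    JOIN ProductAssemblies pa ON pa.ProductID = p.ProductID
    JOIN AssemblyList al ON al.AssemblyID = pa.AssemblyID
    JOIN AssemblyItems ai ON ai.AssemblyID = al.AssemblyID
    JOIN ItemList il ON il.ItemID = ai.ItemID
    ORDER BY 1, 2, 3, 4, 5, 6   -- SQLite sorts NULLs first, so direct items lead
"""


def report_lines(rows):
    """Print product and assembly headers only when they change between rows."""
    lines, last_product, last_assembly = [], None, None
    for product_id, product_name, assembly_id, assembly_name, item_id, item_name in rows:
        if product_id != last_product:
            lines.append(f"{product_id} {product_name}")
            last_product, last_assembly = product_id, None
        if assembly_id is not None and assembly_id != last_assembly:
            lines.append(f"  {assembly_id} {assembly_name}")
            last_assembly = assembly_id
        indent = "    " if assembly_id is not None else "  "
        lines.append(f"{indent}{item_id} {item_name}")
    return lines


rows = conn.execute(QUERY).fetchall()
print("\n".join(report_lines(rows)))
```

Where NULLs sort relative to other values differs between databases, so the position of the product-level items in the flat result may vary elsewhere.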

Access join on first record

I have two tables in an Access database, tblProducts and tblProductGroups.
I am trying to run a query that joins both of these tables and brings back a single record for each product. The problem is that the current design allows a product to be listed in the tblProductGroups table more than once, i.e. a product can be a member of more than one group (I didn't design this!).
The query is this:
select tblProducts.intID, tblProducts.strTitle, tblProductGroups.intGroup
from tblProducts
inner join tblProductGroups on tblProducts.intID = tblProductGroups.intProduct
where tblProductGroups.intGroup = 56
and tblProducts.blnActive
order by tblProducts.intSort asc, tblProducts.curPrice asc
At the moment this returns results such as:
intID | strTitle | intGroup
1 | Product 1 | 1
1 | Product 1 | 2
2 | Product 2 | 1
2 | Product 2 | 2
Whereas I only want the join to be based on the first matching record, so that would return:
intID | strTitle | intGroup
1 | Product 1 | 1
2 | Product 2 | 1
Is this possible in Access?
Thanks in advance
This option runs a subquery to find the minimum intGroup for each tblProducts.intID.
SELECT tblProducts.intID
, tblProducts.strTitle
, (SELECT TOP 1 intGroup
FROM tblProductGroups
WHERE intProduct=tblProducts.intID
ORDER BY intGroup ASC) AS intGroup
FROM tblProducts
WHERE tblProducts.blnActive
ORDER BY tblProducts.intSort ASC, tblProducts.curPrice ASC
This works for me. Maybe this helps someone:
SELECT FIRST(a.Regal) AS frstRegal,
FIRST(a.Fachboden) AS frstFachboden,
FIRST(a.xOffset) AS frstxOffset,
FIRST(a.yOffset) AS frstyOffset,
FIRST(a.xSize) AS frstxSize,
FIRST(a.ySize) AS frstySize,
FIRST(a.Platzgr) AS frstyPlatzgr,
FIRST(b.Artikel_ID) AS frstArtikel_ID,
FIRST(b.Menge) AS frstMenge,
FIRST(c.Breite) AS frstBreite,
FIRST(c.Tiefe) AS frstTiefe,
FIRST(a.Fachboden_ID) AS frstFachboden_ID,
FIRST(b.BewegungsDatum) AS frstBewegungsDatum,
FIRST(b.ErzeugungsDatum) AS frstErzeugungsDatum
FROM ((Lagerort AS a)
LEFT JOIN LO_zu_ART AS b ON a.Lagerort_ID = b.Lagerort_ID)
LEFT JOIN Regal AS c ON a.Regal = c.Regal
GROUP BY a.Lagerort_ID
ORDER BY FIRST(a.Regal), FIRST(a.Fachboden), FIRST(a.xOffset), FIRST(a.yOffset);
I have non-unique entries for Lagerort_ID in the table LO_zu_ART. My goal was to use only the first entry found in LO_zu_ART to match into Lagerort.
The trick is to use FIRST() on any column but the grouped one. This may also work with MIN() or MAX(), but I have not tested it.
Also make sure the alias given with "AS" differs from the original field name. I used frstFIELDNAME. This is important; otherwise I got errors.
Create a new query, qryFirstGroupPerProduct:
SELECT intProduct, Min(intGroup) AS lowest_group
FROM tblProductGroups
GROUP BY intProduct;
Then JOIN qryFirstGroupPerProduct (instead of tblProductGroups) to tblProducts.
Or you could do it as a subquery instead of a separate saved query, if you prefer.
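The subquery variant could be sketched like this. The example runs in SQLite through Python rather than in Access, so it only illustrates the shape of the join; the blnActive, intSort, and curPrice values are invented sample data, and the function name is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE tblProducts (intID INTEGER PRIMARY KEY, strTitle TEXT,
                              blnActive INTEGER, intSort INTEGER, curPrice REAL);
    CREATE TABLE tblProductGroups (intProduct INTEGER, intGroup INTEGER);
    -- blnActive, intSort and curPrice values below are invented sample data
    INSERT INTO tblProducts VALUES (1, 'Product 1', 1, 1, 10.0), (2, 'Product 2', 1, 2, 20.0);
    INSERT INTO tblProductGroups VALUES (1, 1), (1, 2), (2, 1), (2, 2);
""")


def first_group_per_product(conn):
    """Join each active product to its lowest group only, one row per product."""
    query = """
        SELECT p.intID, p.strTitle, g.lowest_group
        FROM tblProducts p
        JOIN (SELECT intProduct, MIN(intGroup) AS lowest_group
              FROM tblProductGroups
              GROUP BY intProduct) g
          ON g.intProduct = p.intID
        WHERE p.blnActive = 1
        ORDER BY p.intSort, p.curPrice
    """
    return conn.execute(query).fetchall()


print(first_group_per_product(conn))
```

Because the derived table has exactly one row per intProduct, the join cannot multiply product rows.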
It's not optimal, but if you're bringing in a few thousand records this will work:
Create a query that gets the max of tblProducts.intID from one table and call it qry_Temp.
Create another query and join qry_Temp to the table you are trying to join against, and you should get your results.