I have two tables:
Product
------------------------------------
id group_id name quick_select
------------------------------------
1 1 product1 1
2 3 product2 0
3 5 product3 1
Product_group
-----------------------
id name parent_id
-----------------------
1 group1 0
2 group2 0
3 group3 1
4 group4 1
5 group5 3
I am building a navigation system for quick-select products. I show categories to the user, who navigates by clicking a category button to go down one level to its subcategories, and so on until there is no deeper level; at that point I show the products. At the top I show only the root categories that have products somewhere beneath them, whether directly or in their subcategories, sub-subcategories and so on.
In my query I want to select all root categories (where parent_id = 0) that have products in them or anywhere in their subcategories, sub-subcategories and so on, counting only products where quick_select is 1 in the product table. I don't know how deep the category tree is, i.e. how many levels there are.
Is it possible with one query? Or do I need to do two queries?
This is what I have so far:
SELECT pg.id, pg.name, pg.parent_id AS parent_id
FROM product_group AS pg
LEFT JOIN product AS p ON pg.id = p.group_id
WHERE pg.parent_id = 0 AND p.id IS NOT NULL AND p.quick_select = 1
GROUP BY pg.id
But this does not return a root category whose immediate subcategory is empty when a deeper subcategory under it contains products with quick_select = 1.
Sorry for my bad English.
I want to receive all categories that contain products with quick_select = 1, not the products themselves:
-- Category
| |
| product
|
-- Category
|
Category
|
Category
|
multiple products
The bad news is that you can't do this in a single query in older versions of SQLite, at least with this data structure, since SQLite only gained recursive SQL (WITH RECURSIVE) in version 3.8.3 and window functions in version 3.25.
If select performance is important, you can try to organize the data like this:
http://articles.sitepoint.com/article/hierarchical-data-database/2
Another option is to add the root id to each row at input time.
Basically, at some point you will have to use multiple selects and determine the root id at the application level.
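For what it's worth, on SQLite 3.8.3 or later a recursive common table expression can climb from the groups that hold quick-select products up to their roots in one query. Below is a minimal sketch, assuming the two tables from the question and Python's bundled sqlite3 module; the CTE name up and the variable names are my own, not part of the original schema.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_group (id INTEGER PRIMARY KEY, name TEXT, parent_id INTEGER);
CREATE TABLE product (id INTEGER PRIMARY KEY, group_id INTEGER, name TEXT, quick_select INTEGER);
INSERT INTO product_group VALUES
  (1,'group1',0),(2,'group2',0),(3,'group3',1),(4,'group4',1),(5,'group5',3);
INSERT INTO product VALUES
  (1,1,'product1',1),(2,3,'product2',0),(3,5,'product3',1);
""")

ROOTS_SQL = """
WITH RECURSIVE up(id, parent_id) AS (
    -- start from every group that directly holds a quick-select product
    SELECT pg.id, pg.parent_id
    FROM product_group AS pg
    JOIN product AS p ON p.group_id = pg.id
    WHERE p.quick_select = 1
    UNION
    -- then climb one level towards the root on each step
    SELECT pg.id, pg.parent_id
    FROM product_group AS pg
    JOIN up ON up.parent_id = pg.id
)
SELECT id FROM up WHERE parent_id = 0 ORDER BY id
"""

root_ids = [row[0] for row in conn.execute(ROOTS_SQL)]
print(root_ids)  # with the sample data only group 1 qualifies
```

UNION (rather than UNION ALL) removes duplicate rows, so a group reached from several children is only expanded once.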
Update:
Ok, this is very much pseudo-code, but it should get you there.
You need a language that has some sort of hashmap or named array datatype.
hashmap results, parent, nodes, nodes_new; # variables
foreach (res in sql_execute("SELECT id, parent_id FROM product_group;") ) {
parent[res.id] = res.parent_id;
}
# get groups with products
foreach (res in sql_execute("SELECT pg.id FROM product_group AS pg INNER JOIN
product AS p ON pg.id = p.group_id
WHERE p.quick_select = 1 GROUP BY pg.id ") ) {
nodes[res.id] = res.id;
}
while (length(nodes) > 0) {
nodes_new = empty hashmap; # start each round with an empty set of parents
foreach (i in nodes) {
if (parent[i] == 0) { results[i] = i; } # if it's a root node (parent_id = 0), add to results
else { nodes_new[parent[i]] = parent[i]; } # otherwise, add parent to the next round
}
nodes = nodes_new; # prepare for next round
}
print results;
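In case a runnable version helps, here is one way the pseudo-code above could look in Python. It is only a sketch: the function name and the seen set (a guard against cyclic parent links) are my additions, and the parent map is typed in by hand from the question's sample data instead of being read from the database.

```python
def root_groups_with_products(parent, groups_with_products):
    """Walk upwards from the groups that hold products and collect root ids.

    parent maps group id -> parent id, where 0 means "no parent" (a root).
    """
    results = set()
    nodes = set(groups_with_products)
    seen = set()
    while nodes:
        seen |= nodes
        next_nodes = set()
        for group_id in nodes:
            if parent[group_id] == 0:
                results.add(group_id)             # root node: keep it
            else:
                next_nodes.add(parent[group_id])  # otherwise climb one level
        nodes = next_nodes - seen                 # never revisit a group
    return results


# parent map from the Product_group sample; groups 1 and 5 hold quick-select products
parent = {1: 0, 2: 0, 3: 1, 4: 1, 5: 3}
roots = root_groups_with_products(parent, [1, 5])
print(sorted(roots))
```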
Related
Let's say I have a table of customers that contains 2 columns:
id
preferences array - an array of strings of length 3, which might contain nulls. Preferences differ between customers, so one might care about color and another might not.
As an example:
id | preferences array
-------------------------------------------
1 | {'color:red','shape:triangle','speed:high'}
2 | {'age:14','color:blue',null}
I also have a table of items, again with 2 columns, id and preferences array, but this time the array can be of any length:
id | preferences array
----------------------------------
1 | {'color:red','shape:triangle','speed:high','hand:third'}
2 | {'shape:circle'}
An item is matched to a customer if all of the strings in the customer's preferences appear in the item's preferences array. Not all the strings in the item's preferences array have to appear in the customer's preferences array, though.
I need to create a new table in which one column is the customer id and the other is an array of the ids of all items that matched that customer:
customer_id | items
----------------------------------
1 | {3,4,7,300,4190..., 6000}
2 | {3,5617}
.
.
.
19,456 | {1551, 1456,3000}
Please note that I need a solution that works even for a lot of items and customers (around 10,000).
How can I do this using SQL (Spark SQL, specifically)?
Hmmm . . . One method is to explode the arrays and join. The following gets the customer/item pairs:
select c.id as customer_id, i.id as item_id
from (customers c lateral view
explode(c.preferences_array) as c_preference
) join
(items i lateral view
explode(i.preferences_array) as i_preference
)
on c_preference = i_preference
group by c.id, i.id, size(c.preferences_array)
having count(*) = size(c.preferences_array);
You can reaggregate to get the list of items for each customer.
Note: This does not return customers with no preferences. Although they technically meet the requirements of your question, I suspect they don't meet the spirit of what you want to do.
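To pin down the matching rule itself, here is a small plain-Python sketch (not Spark) of what the explode-and-join query is meant to compute: an item matches when the customer's non-null preferences are a subset of the item's. The function name and dict layout are assumptions for illustration. Note that it drops null slots before comparing; as far as I can tell the SQL above would need a similar filter, because size() counts null elements while the join never matches them.

```python
def match_items(customers, items):
    """Return {customer_id: sorted ids of items covering all customer preferences}."""
    item_sets = {item_id: set(prefs) for item_id, prefs in items.items()}
    matches = {}
    for customer_id, prefs in customers.items():
        wanted = {p for p in prefs if p is not None}  # ignore null slots
        if not wanted:
            continue  # assumption: customers with no preferences are skipped
        matches[customer_id] = sorted(
            item_id for item_id, have in item_sets.items() if wanted <= have
        )
    return matches


# sample rows from the question
customers = {
    1: ["color:red", "shape:triangle", "speed:high"],
    2: ["age:14", "color:blue", None],
}
items = {
    1: ["color:red", "shape:triangle", "speed:high", "hand:third"],
    2: ["shape:circle"],
}
result = match_items(customers, items)
print(result)
```

This compares every customer with every item, so it is only meant to make the semantics concrete, not to replace the distributed join.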
I have a table stories and a table blockings which has the columns story_id (referencing a story), and a blocked_story_id (also referencing a story, which is blocked by the story_id)
I'm trying to construct a query to return all the stories in order of precedence based on their blockers - so blockers first, traversing down the tree.
One story can be blocked by many stories, and can itself be a blocker for many stories.
I've been reading and re-reading the PostgreSQL docs on WITH RECURSIVE but I'm a little lost on where I should be going with this, and how to construct the relevant query.
I have got as far as:
select s.id, b.story_id as blocker_id
from stories s
left outer join blockings b on s.id = b.blocked_story_id
where s.deleted_at is null
as for getting a list of stories and their blockers, but some pointers as to what I need to join/union to get the desired result would be helpful.
Context
I want to know which stories I can work on first. So I want an output that contains all stories in an order that allows me to work top down and never hit a blocked story.
The content of the blockings table gives me a simple join table between stories that block one another. The story_id being the blocker, the blocked_story_id being the one being blocked.
Sample Data
Stories
id | title
------------------
1 | Story title 1
2 | Story title 2
3 | Story title 3
4 | Story title 4
5 | Story title 5
Blockings
story_id | blocked_story_id
---------------------------
4 | 2
4 | 3
3 | 1
3 | 5
I would expect to see the following result:
id | title
------------------
4 | Story title 4
2 | Story title 2
3 | Story title 3
1 | Story title 1
5 | Story title 5
Disclaimer: It is not clear to me why you need recursion to find the blocked stories (which can be achieved easily with SELECT blocked_story_id FROM blockings), so I would ask you for further information. A real recursion case could be: "All blockings that are reachable from story 4" or something like that.
Here's what I've done so far as I understood your problem:
Your blockings table says: story 4 blocks stories 2 and 3. Story 3 blocks stories 1 and 5. So the blocked stories are 1, 2, 3 and 5. Because of the recursion, story 4 can also block 1 and 5 via 3. So there are two ways of blocking them (directly from starting point 3, and from starting point 4 via 3). I listed all possible paths with this query:
WITH RECURSIVE blocks AS (
SELECT blocked_story_id, ARRAY[story_id]::int[] as path FROM blockings
UNION
SELECT bk.blocked_story_id, b.path || bk.story_id
FROM blockings bk INNER JOIN blocks b ON b.blocked_story_id = bk.story_id
)
SELECT b.blocked_story_id, s.title, b.path
FROM blocks b INNER JOIN stories s ON s.id = b.blocked_story_id;
Result:
blocked_story_id | title | path
---------------------------------
2 | Title 2 | {4}
3 | Title 3 | {4}
1 | Title 1 | {3}
5 | Title 5 | {3}
1 | Title 1 | {4,3}
5 | Title 5 | {4,3}
@S-Man, I figured it out thanks to your help pointing me in the right direction:
WITH recursive blockings_tree(id, title, path) AS (
SELECT stories.id, title, ARRAY[blockings.blocked_story_id, blockings.story_id]
FROM stories
LEFT OUTER JOIN blockings ON blockings.story_id = stories.id
UNION ALL
SELECT stories.id, stories.title, path || stories.id
FROM blockings_tree
JOIN blockings ON blockings.story_id = blockings_tree.id
JOIN stories ON blockings.blocked_story_id = stories.id
WHERE NOT blockings.blocked_story_id = any(path)
)
SELECT stories.*
FROM stories
JOIN (SELECT id, MAX(path) AS path FROM blockings_tree GROUP BY id) bt ON bt.id = stories.id
ORDER BY path
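For comparison, a different way to get a "blockers first" order is to compute each story's depth, meaning the length of the longest chain of blockers above it, and sort by that. The sketch below is not the query above; it is a swapped-in technique, run through Python's sqlite3 module on the sample data, and it assumes the blockings contain no cycles (a cycle would make the recursion run forever). The CTE name depth and the column lvl are my own, and the deleted_at filter is left out.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE stories (id INTEGER PRIMARY KEY, title TEXT);
CREATE TABLE blockings (story_id INTEGER, blocked_story_id INTEGER);
INSERT INTO stories VALUES
  (1,'Story title 1'),(2,'Story title 2'),(3,'Story title 3'),
  (4,'Story title 4'),(5,'Story title 5');
INSERT INTO blockings VALUES (4,2),(4,3),(3,1),(3,5);
""")

ORDER_SQL = """
WITH RECURSIVE depth(id, lvl) AS (
    -- stories that nothing blocks start at level 0
    SELECT s.id, 0
    FROM stories AS s
    WHERE NOT EXISTS (SELECT 1 FROM blockings AS b WHERE b.blocked_story_id = s.id)
    UNION ALL
    -- a blocked story sits one level below each of its blockers
    SELECT b.blocked_story_id, d.lvl + 1
    FROM depth AS d
    JOIN blockings AS b ON b.story_id = d.id
)
SELECT s.id
FROM stories AS s
JOIN (SELECT id, MAX(lvl) AS lvl FROM depth GROUP BY id) AS d ON d.id = s.id
ORDER BY d.lvl, s.id
"""

ordered_ids = [row[0] for row in conn.execute(ORDER_SQL)]
print(ordered_ids)  # [4, 2, 3, 1, 5] for the sample data
```

Taking MAX(lvl) per story matters when a story has several blockers at different depths: it is placed after the deepest one.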
Greetings Benevolent Gods of Stackoverflow,
I am presently struggling to get a spatially enabled query to work for a SQL assignment I am working on. The wording is as follows:
SELECT PURCHASES.TotalPrice, STORES.GeoLocation, STORES.StoreName
FROM MuffinShop
join (SELECT SUM(PURCHASES.TotalPrice) AS StoreProfit, STORES.StoreName
FROM PURCHASES INNER JOIN STORES ON PURCHASES.StoreID = STORES.StoreID
GROUP BY STORES.StoreName
HAVING (SUM(PURCHASES.TotalPrice) > 600))
What I am trying to do with this query is run an aggregate function (AVG, SUM, etc.) and get the spatial information back as well. Another example of this would be:
SELECT STORES.StoreName, AVG(REVIEWS.Rating),Stores.Shape
FROM REVIEWS CROSS JOIN
STORES
GROUP BY STORES.StoreName;
This returns the following error message: Column 'STORES.Shape' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause.
I know I need a subquery to perform this task, but I am having endless trouble getting it to work. Any help at all would be wildly appreciated.
There are two parts to this question. I would tackle the first problem with the following logic:
List all the store names and their respective geolocations
Get the profit for each store
With that in mind, you need to use the STORES table as your base, then bolt the profit onto it through a sub query or an apply:
SELECT s.StoreName
,s.GeoLocation
,p.StoreProfit
FROM STORES s
INNER JOIN (
SELECT pu.StoreId
,StoreProfit = SUM(pu.TotalPrice)
FROM PURCHASES pu
GROUP BY pu.StoreID
) p
ON p.StoreID = s.StoreID;
This one may be a little more efficient, depending on the plan the optimizer chooses:
SELECT s.StoreName
,s.GeoLocation
,profit.StoreProfit
FROM STORES s
CROSS APPLY (
SELECT StoreProfit = SUM(p.TotalPrice)
FROM PURCHASES p
WHERE p.StoreID = s.StoreID
GROUP BY p.StoreID
) profit;
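If you want to try the pattern without a SQL Server instance, here is a rough sketch of the first (derived table) variant using Python's sqlite3 module, with the HAVING filter from the question added. The rows, the PurchaseID column and the text stand-in for the spatial GeoLocation column are hypothetical; SQLite has no geography type and no CROSS APPLY, so only the join form is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE STORES (StoreID INTEGER PRIMARY KEY, StoreName TEXT, GeoLocation TEXT);
CREATE TABLE PURCHASES (PurchaseID INTEGER PRIMARY KEY, StoreID INTEGER, TotalPrice REAL);
-- hypothetical rows, for illustration only
INSERT INTO STORES VALUES (1,'North','POINT(0 1)'),(2,'South','POINT(0 -1)');
INSERT INTO PURCHASES (StoreID, TotalPrice) VALUES (1,400),(1,300),(2,100);
""")

PROFIT_SQL = """
SELECT s.StoreName, s.GeoLocation, p.StoreProfit
FROM STORES AS s
JOIN (
    -- aggregate first, keyed by StoreID only
    SELECT StoreID, SUM(TotalPrice) AS StoreProfit
    FROM PURCHASES
    GROUP BY StoreID
    HAVING SUM(TotalPrice) > 600
) AS p ON p.StoreID = s.StoreID
ORDER BY s.StoreName
"""

rows = conn.execute(PROFIT_SQL).fetchall()
print(rows)
```

Because the aggregation happens inside the derived table, the outer query can return GeoLocation without it appearing in any GROUP BY.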
Now for the second part, the error that you are receiving tells you that you need to GROUP BY all columns in your select statement with the exception of your aggregate function(s).
In your second example, you are asking SQL to take an average rating for each store based on an ID, but you are also trying to return another column without including that inside the grouping. I will try to show you what you are asking SQL to do and where the issue lies with the following examples:
-- Data
Id | Rating | Shape
1 | 1 | Triangle
1 | 4 | Triangle
1 | 1 | Square
2 | 1 | Triangle
2 | 5 | Triangle
2 | 3 | Square
SQL Server, please give me the average rating for each store:
SELECT Id, AVG(Rating)
FROM Store
GROUP BY Id;
-- Result
Id | Avg(Rating)
1 | 2
2 | 3
SQL Server, please give me the average rating for each store and show its shape in the result (but don't group by it):
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY Id;
-- Result
Id | Avg(Rating) | Shape
1 | 2 | Do I show Triangle or Square ...... ERROR!!!!
2 | 3 |
It needs to be told to get the average for each store and shape:
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY Id, Shape;
-- Result
Id | Avg(Rating) | Shape
1 | 2.5 | Triangle
1 | 1 | Square
2 | 3 | Triangle
2 | 3 | Square
As in any spatial query you need an idea of what your final geometry will be. It looks like you are attempting to group by individual stores but delivering an average rating from the subquery. So if I'm reading it right you are just looking to get the stores shape info associated with the average ratings?
Query the stores table for the shape field and join the query you use to get the average rating
select a.shape,
b.*
from stores a inner join (your Average rating query with group by here) b
on a.StoreID = b.Storeid
I'm a bit stuck with this...
I have items table:
id | name
1 | item 1
2 | item 2
3 | item 3
4 | item 4
and related items table:
id | item_id | related_item_id
2 | 1 | 2
3 | 1 | 4
so this means that item 1 is related to items 2 and 4.
Now I'm trying to display these in a list where related items always follow the main item they are related to:
item 1
item 2
item 4
item 3
Then I can visually show that these items 2 and 4 are related to item one and draw something like:
item 1
-- item 2
-- item 4
item 3
To be honest, I haven't got any ideas myself. I guess I could query for the items that are not related to any other item to get a list of "parent items", and then query the relations separately in a script loop. This is definitely not the sexiest solution...
I am assuming that this question is about ordering the items list, without duplicates. That is, a given item does not have more than one parent (which I ask in a comment).
If so, you can do this with a left outer join and cleverness in the order by.
select coalesce(r.related_item_id, i.id) as item_id
from items i left join
related r
on i.id = r.related_item_id
order by coalesce(r.item_id, i.id),
(r.related_item_id is null) desc;
The left outer join identifies parents because they will not have any rows that match. If so, the coalesce() finds them and uses the item id.
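Here is a quick way to check that ordering on the sample data, sketched with Python's sqlite3 module. It assumes the relation table is called related, as in the query above; the is_child column and the extra i.id tie-breaker (so siblings come out in a stable order) are my additions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE items (id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE related (id INTEGER PRIMARY KEY, item_id INTEGER, related_item_id INTEGER);
INSERT INTO items VALUES (1,'item 1'),(2,'item 2'),(3,'item 3'),(4,'item 4');
INSERT INTO related VALUES (2,1,2),(3,1,4);
""")

LIST_SQL = """
SELECT i.name, r.item_id IS NOT NULL AS is_child
FROM items AS i
LEFT JOIN related AS r ON i.id = r.related_item_id
ORDER BY COALESCE(r.item_id, i.id),   -- group each child under its parent
         (r.item_id IS NULL) DESC,    -- parent row first within the group
         i.id                         -- stable order among siblings
"""

lines = [("-- " if is_child else "") + name
         for name, is_child in conn.execute(LIST_SQL)]
print("\n".join(lines))
```

Like the query it illustrates, this only handles one level of nesting and assumes each item has at most one parent.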
In my opinion, rather than implementing this logic in a query, you should move it to your application code.
Assuming that the item_ids are sequential, you can find the largest item_id and then, in a loop,
find the related_item_id values for each item_id and build a convenient data structure out of them.
This functionality falls under the category of hierarchical queries. In Oracle it is handled by the CONNECT BY clause; I'm not sure about MySQL, but you can search for "hierarchical queries mysql" to find the answer.
I've been trying to figure out how to query a shopping cart database to find all Orders that contain tangible items (items can be downloadable, therefore, not shipped) that have not been assigned a UPS tracking label. I haven't been able to do it.
The involved tables are as follows:
// dbo.Inventory - details about the individual product being sold
- ProductID int primary - Name nvarchar - IsDownloadable bit -
| 5 | Awesome Shirt | 0 |
| 7 | An Audio Track | 1 |
// dbo.ShoppingCart --("ShopID" groups the items in the cart)
- CartID int primary - ProductID int - ShopID char (guid) - Quantity int -
| 2 | 5 | e854a982c9264a72 | 4 |
| 3 | 7 | e854a982c9264a72 | 1 |
// dbo.Orders - Order information (shipping address, etc)
- OrderID int primary - ShopID char(x) - BillingInfoColumns -
| 13 | e854a982c9264a72 | Name,Address,etc |
// dbo.Tracking - Shipments' (note: a shipment can contain several items) tracking numbers
- TrackingID int primary - OrderID int - TrackingNumber char(x) -
| 5 | 13 | Ze5Whatever... |
// dbo.ShippedItems - Maps a ShoppingCart's shipped items to tracking numbers
- ShippingID int primary - TrackingID int - CartID int - QuantityInShipment int
| 6 | 5 | 2 | 3 |
Hopefully the above provides a reasonable approximation of how the DB is designed.
So, to clarify what I think I need:
SELECT all OrderIDs that have NOT had ALL their tangible items Shipped.
Non-tangible items are IsDownloadable = 1
Must take into account the ShoppingCart.Quantity column. If we order 4 t-shirts we may put them in one box (with one UPS tracking label). Then again, we may put 2 per box. Or we may put one pair of jeans with one shirt in one same box (again, with one tracking label)...etc.
I have been concocting crap with endless JOINs and nested WHERE NOT IN (SELECT * FROM)s to no avail. Sadly, I can't seem to wrap my head around it...I'm still waiting for my eureka moment.
I'm relatively new to SQL and database design so any information or (constructive) criticism will be greatly appreciated. Feel free to poke holes in the design of the database itself if you think that will help. :-)
// I wish I could run this on my brain right now...
// (Neurons, apparently, are "excitable")
UPDATE Brain SET Neuron = 'Excited' WHERE Cortex = 'SQL'
UPDATE
Here is what I came up with thanks to Benoit Vidis. This is the actual query I'm using on my real tables/data:
SELECT
d.OrderID
FROM
Person.ShoppingCart c
JOIN
Inventory.Item i
ON
i.ItemID = c.ItemID
JOIN
Orders.Details d
ON
d.ShopID = c.ShopID
LEFT JOIN
Orders.Shipping s
ON
d.OrderID = s.OrderID
LEFT JOIN
Orders.ShippedItems si
ON
s.ShippingID = si.ShippingID
WHERE
i.DownloadableMedia = 0 AND
d.Billed = 1 AND
d.Ordered = 1
GROUP BY
d.OrderID
HAVING
SUM(c.Quantity) > CASE WHEN SUM(si.Quantity) IS NULL THEN 0 ELSE SUM(si.Quantity) END
You might be able to do it using the HAVING clause. In MySQL, it would give something like:
SELECT
c.OrderID,
SUM(c.Quantity) AS tangible_products_number,
SUM(s.QuantityInShipment) as shipped_items_number
FROM
(
Inventory i,
ShoppingCart c
)
LEFT JOIN
ShippedItems s
ON
c.OrderID = s.OrderID
WHERE
i.ProductID = c.ItemID AND
i.IsDownloadable = 0 AND
c.OrderID = t.OrderID AND
s.CartID = c.ID
GROUP BY
c.OrderID
HAVING
SUM(c.Quantity) > SUM(s.QuantityInShipment)
The group by syntax will probably need to be adapted for SQL-Server
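To make the HAVING idea concrete against the schema in the question, here is a sketch run through Python's sqlite3 module. It is my reading of the tables, not the query above: orders are linked to cart lines through ShopID, shipped quantities are summed per CartID first (so several shipments of one cart line do not multiply its Quantity), and COALESCE keeps orders that have nothing shipped yet. The Tracking table is left out because ShippedItems already carries CartID.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (ProductID INTEGER PRIMARY KEY, Name TEXT, IsDownloadable INTEGER);
CREATE TABLE ShoppingCart (CartID INTEGER PRIMARY KEY, ProductID INTEGER, ShopID TEXT, Quantity INTEGER);
CREATE TABLE Orders (OrderID INTEGER PRIMARY KEY, ShopID TEXT);
CREATE TABLE ShippedItems (ShippingID INTEGER PRIMARY KEY, TrackingID INTEGER,
                           CartID INTEGER, QuantityInShipment INTEGER);
-- sample rows from the question
INSERT INTO Inventory VALUES (5,'Awesome Shirt',0),(7,'An Audio Track',1);
INSERT INTO ShoppingCart VALUES (2,5,'e854a982c9264a72',4),(3,7,'e854a982c9264a72',1);
INSERT INTO Orders VALUES (13,'e854a982c9264a72');
INSERT INTO ShippedItems VALUES (6,5,2,3);
""")

PENDING_SQL = """
SELECT o.OrderID,
       SUM(c.Quantity)             AS tangible_ordered,
       SUM(COALESCE(s.shipped, 0)) AS tangible_shipped
FROM Orders AS o
JOIN ShoppingCart AS c ON c.ShopID = o.ShopID
JOIN Inventory AS i ON i.ProductID = c.ProductID
LEFT JOIN (
    -- total shipped so far for each cart line
    SELECT CartID, SUM(QuantityInShipment) AS shipped
    FROM ShippedItems
    GROUP BY CartID
) AS s ON s.CartID = c.CartID
WHERE i.IsDownloadable = 0
GROUP BY o.OrderID
HAVING SUM(c.Quantity) > SUM(COALESCE(s.shipped, 0))
"""

pending = conn.execute(PENDING_SQL).fetchall()
print(pending)  # order 13: 4 shirts ordered, 3 shipped so far
```

The same statement should carry over to SQL Server largely unchanged, though I have only tried it on SQLite.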
Can you query dbo.Tracking where TrackingNumber is Null? Would that give you the required information?