I take a Database course in which we have listings of AirBnBs and need to be able to do some SQL queries in the Relationship-Model we made from the data, but I struggle with one in particular :
I have two tables that we are interested in, Billing and Amenities. The first one have the id and price of listings, the second have id and wifi (let's say, to simplify, that it equals 1 if there is Wifi, 0 otherwise). Both have other attributes that we don't really care about here.
So the query is, "What is the difference in the average price of listings with and without Wifi ?"
My idea was to build to JOIN-tables, one with listings that have wifi, the other without, and compare them easily :
SELECT avg(B.price - A.price) as averagePrice
FROM (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 0
) A, (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 1) B
WHERE A.id = B.id;
Obviously this doesn't work... I am pretty sure that there is a far easier solution to it tho, what do I miss ?
(And by the way, is there a way to compute the absolute between the difference of price ?)
I hope that I was clear enough, thank you for your time !
Edit : As mentionned in the comments, forgot to say that, but both tables have idas their primary key, so that there is one row per listing.
Just use conditional aggregation:
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) as avg_no_wifi,
AVG(CASE WHEN a.wifi = 1 THEN b.price END) as avg_wifi
FROM Billing b JOIN
Amenities a
ON b.id = a.id
WHERE a.wifi IN (0, 1);
You can use a - if you want the difference instead of the specific values.
Let's assume we're working with data like the following (problems with your data model are noted below):
Billing
+------------+---------+
| listing_id | price |
+------------+---------+
| 1 | 1500.00 |
| 2 | 1700.00 |
| 3 | 1800.00 |
| 4 | 1900.00 |
+------------+---------+
Amenities
+------------+------+
| listing_id | wifi |
+------------+------+
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
+------------+------+
Notice that I changed "id" to "listing_id" to make it clear what it was (using "id" as an attribute name is problematic anyways). Also, note that one listing doesn't have an entry in the Amenities table. Depending on your data, that may or may not be a concern (again, refer to the bottom for a discussion of your data model).
Based on this data, your averages should be as follows:
Listings with wifi average $1600 (Listings 1 and 2)
Listings without wifi (just 3) average 1800).
So the difference would be $200.
To achieve this result in SQL, it may be helpful to first get the average cost per amenity (whether wifi is offered). This would be obtained with the following query:
SELECT
Amenities.wifi AS has_wifi,
AVG(Billing.price) AS avg_cost
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
which gives you the following results:
+----------+-----------------------+
| has_wifi | avg_cost |
+----------+-----------------------+
| 0 | 1800.0000000000000000 |
| 1 | 1600.0000000000000000 |
+----------+-----------------------+
So far so good. So now we need to calculate the difference between these 2 rows. There are a number of different ways to do this, but one is to use a CASE expression to make one of the values negative, and then simply take the SUM of the result (note that I'm using a CTE, but you can also use a sub-query):
WITH
avg_by_wifi(has_wifi, avg_cost) AS
(
SELECT Amenities.wifi, AVG(Billing.price)
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
)
SELECT
ABS(SUM
(
CASE
WHEN has_wifi = 1 THEN avg_cost
ELSE -1 * avg_cost
END
))
FROM avg_by_wifi
which gives us the expected value of 200.
Now regarding your data model:
If both your Billing and Amenities table only have 1 row for each listing, it makes sense to combine them into 1 table. For example: Listings(listing_id, price, wifi)
However, this is still problematic, because you probably have a bunch of other amenities you want to model (pool, sauna, etc.) So you might want to model a many-to-many relationship between listings and amenities using an intermediate table:
Listings(listing_id, price)
Amenities(amenity_id, amenity_name)
ListingsAmenities(listing_id, amenity_id)
This way, you could list multiple amenities for a given listing without having to add additional columns. It also becomes easy to store additional information about an amenity: What's the wifi password? How deep is the pool? etc.
Of course, using this model makes your original query (difference in average cost of listings by wifi) a bit tricker, but definitely still doable.
Greetings Benevolent Gods of Stackoverflow,
I am presently struggling to get a spatially enabled query to work for a SQL assignment I am working on. The wording is as follows:
SELECT PURCHASES.TotalPrice, STORES.GeoLocation, STORES.StoreName
FROM MuffinShop
join (SELECT SUM(PURCHASES.TotalPrice) AS StoreProfit, STORES.StoreName
FROM PURCHASES INNER JOIN STORES ON PURCHASES.StoreID = STORES.StoreID
GROUP BY STORES.StoreName
HAVING (SUM(PURCHASES.TotalPrice) > 600))
What I am trying to do with this query is perform a function query (like avg, sum etc) and get the spatial information back as well. Another example of this would be:
SELECT STORES.StoreName, AVG(REVIEWS.Rating),Stores.Shape
FROM REVIEWS CROSS JOIN
STORES
GROUP BY STORES.StoreName;
This returns a Column 'STORES.Shape' is invalid in the select list because it is not contained in either an aggregate function or the GROUP BY clause. error message.
I know I require a sub query to perform this task, I am just having endless trouble getting it to work. Any help at all would be wildly appreciated.
There are two parts to this question, I would tackle the first problem with the following logic:
List all the store names and their respective geolocations
Get the profit for each store
With that in mind, you need to use the STORES table as your base, then bolt the profit onto it through a sub query or an apply:
SELECT s.StoreName
,s.GeoLocation
,p.StoreProfit
FROM STORES s
INNER JOIN (
SELECT pu.StoreId
,StoreProfit = SUM(pu.TotalPrice)
FROM PURCHASES pu
GROUP BY pu.StoreID
) p
ON p.StoreID = s.StoreID;
This one is a little more efficient:
SELECT s.StoreName
,s.GeoLocation
,profit.StoreProfit
FROM STORES s
CROSS APPLY (
SELECT StoreProfit = SUM(p.TotalPrice)
FROM PURCHASES p
WHERE p.StoreID = s.StoreID
GROUP BY p.StoreID
) profit;
Now for the second part, the error that you are receiving tells you that you need to GROUP BY all columns in your select statement with the exception of your aggregate function(s).
In your second example, you are asking SQL to take an average rating for each store based on an ID, but you are also trying to return another column without including that inside the grouping. I will try to show you what you are asking SQL to do and where the issue lies with the following examples:
-- Data
Id | Rating | Shape
1 | 1 | Triangle
1 | 4 | Triangle
1 | 1 | Square
2 | 1 | Triangle
2 | 5 | Triangle
2 | 3 | Square
SQL Server, please give me the average rating for each store:
SELECT Id, AVG(Rating)
FROM Store
GROUP BY StoreId;
-- Result
Id | Avg(Rating)
1 | 2
2 | 3
SQL Server, please give me the average rating for each store and show its shape in the result (but don't group by it):
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY StoreId;
-- Result
Id | Avg(Rating) | Shape
1 | 2 | Do I show Triangle or Square ...... ERROR!!!!
2 | 3 |
It needs to be told to get the average for each store and shape:
SELECT Id, AVG(Rating), Shape
FROM Store
GROUP BY StoreId, Shape;
-- Result
Id | Avg(Rating) | Shape
1 | 2.5 | Triangle
1 | 1 | Square
2 | 3 | Triangle
2 | 3 | Square
As in any spatial query you need an idea of what your final geometry will be. It looks like you are attempting to group by individual stores but delivering an average rating from the subquery. So if I'm reading it right you are just looking to get the stores shape info associated with the average ratings?
Query the stores table for the shape field and join the query you use to get the average rating
select a.shape
b.*
from stores a inner join (your Average rating query with group by here) b
on a.StoreID = b.Storeid
I need your help about the visit of a directed graph stored in a database.
Consider the following directed graph
1->2
2->1,3
3->1
A table stores those relations:
create database test;
\c test;
create table ownership (
parent bigint,
child bigint,
primary key (parent, child)
);
insert into ownership (parent, child) values (1, 2);
insert into ownership (parent, child) values (2, 1);
insert into ownership (parent, child) values (2, 3);
insert into ownership (parent, child) values (3, 1);
I'd like to extract all the semi-connected edges (i.e. the connected edges ignoring the direction) of the graph reachable from a node. I.e., if I start from parent=1, I'd like to have the following output
1,2
2,1
2,3
3,1
I'm using postgresql.
I've modified the example on Postgres' manual which explains recursive queries, and I've adapted the join condition to go "up" and "down" (doing so I ignore the directions). My query is the following one:
\c test
WITH RECURSIVE graph(parent, child, path, depth, cycle) AS (
SELECT o.parent, o.child, ARRAY[ROW(o.parent, o.child)], 0, false
from ownership o
where o.parent = 1
UNION ALL
SELECT
o.parent, o.child,
path||ROW(o.parent, o.child),
depth+1,
ROW(o.parent, o.child) = ANY(path)
from
ownership o, graph g
where
(g.parent = o.child or g.child = o.parent)
and not cycle
)
select g.parent, g.child, g.path, g.cycle
from
graph g
its output follows:
parent | child | path | cycle
--------+-------+-----------------------------------+-------
1 | 2 | {"(1,2)"} | f
2 | 1 | {"(1,2)","(2,1)"} | f
2 | 3 | {"(1,2)","(2,3)"} | f
3 | 1 | {"(1,2)","(3,1)"} | f
1 | 2 | {"(1,2)","(2,1)","(1,2)"} | t
1 | 2 | {"(1,2)","(2,3)","(1,2)"} | t
3 | 1 | {"(1,2)","(2,3)","(3,1)"} | f
1 | 2 | {"(1,2)","(3,1)","(1,2)"} | t
2 | 3 | {"(1,2)","(3,1)","(2,3)"} | f
1 | 2 | {"(1,2)","(2,3)","(3,1)","(1,2)"} | t
2 | 3 | {"(1,2)","(2,3)","(3,1)","(2,3)"} | t
1 | 2 | {"(1,2)","(3,1)","(2,3)","(1,2)"} | t
3 | 1 | {"(1,2)","(3,1)","(2,3)","(3,1)"} | t
(13 rows)
I have a problem: the query extracts the same edges many times, as they are reached through different paths, and I'd like to avoid this. If I modify the outer query into
select distinct g.parent, g.child from graph
I have the desired result, but inefficiencies remain in the WITH query, as unneeded joins are done.
So, is there a solution to extract the reachable edges of a graph in a db, starting from a given one, without using distinct?
I also have another problem (this one is solved, look at the bottom): as you can see from the output, cycles stop only when a node is reached for the second time. I.e. I have (1,2) (2,3) (1,2).
I'd like to stop the cycle before cycling over that last node again, i.e. having (1,2) (2,3).
I've tried to modify the where condition as follows
where
(g.parent = o.child or g.child = o.parent)
and (ROW(o.parent, o.child) <> any(path))
and not cycle
to avoid visiting already visited edges, but it doesn't work and I cannot understand why ((ROW(o.parent, o.child) <> any(path)) should avoid cycling before going on the cycled edge again but doesn't work). How can I do to stop cycles one step before the node that closes the cycle?
Edit: as danihp suggested, to solve the second problem I used
where
(g.parent = o.child or g.child = o.parent)
and not (ROW(o.parent, o.child) = any(path))
and not cycle
and now the output contains no cycles. Rows went from 13 to 6, but I still have duplicates, so the main (the first) problem of extracting all the edges without duplicates and without distinct is still alive. Current output with and not ROW
parent | child | path | cycle
--------+-------+---------------------------+-------
1 | 2 | {"(1,2)"} | f
2 | 1 | {"(1,2)","(2,1)"} | f
2 | 3 | {"(1,2)","(2,3)"} | f
3 | 1 | {"(1,2)","(3,1)"} | f
3 | 1 | {"(1,2)","(2,3)","(3,1)"} | f
2 | 3 | {"(1,2)","(3,1)","(2,3)"} | f
(6 rows)
Edit #2:: following what Erwin Brandstetter suggested, I modified my query, but if I'm not wrong, the proposed query gives MORE rows than mine (ROW comparison is still there as it seems more clear to me, even I understood that string comparison will be more efficient).
Using the new query, I obtain 20 rows, while mine gives 6 rows
WITH RECURSIVE graph(parent, child, path, depth) AS (
SELECT o.parent, o.child, ARRAY[ROW(o.parent, o.child)], 0
from ownership o
where 1 in (o.child, o.parent)
UNION ALL
SELECT
o.parent, o.child,
path||ROW(o.parent, o.child),
depth+1
from
ownership o, graph g
where
g.child in (o.parent, o.child)
and ROW(o.parent, o.child) <> ALL(path)
)
select g.parent, g.child from graph g
Edit 3: so, as Erwin Brandstetter pointed out, the last query was still wrong, while the right one can be found in his answer.
When I posted my first query, I hadn't understood that I was missing some joins, as it happens in the following case: if I start with the node 3, the db selects the rows (2,3) and (3,1). Then, the first inductive step of the query would select, joining from these rows, the rows (1,2), (2,3) and (3,1), missing the row (2,1) that should be included in the result as conceptually the algorithm would imply ( (2,1) is "near" (3,1) )
When I tried to adapt the example in Postgresql manual, I was right trying to join ownership's parent and child, but I was wrong not saving the value of graph that had to be joined in each step.
These type of queries seem to generate a different set of rows depending on the starting node (i.e. depending on the set of rows selected in the base step). So, I think it could be useful to select just one row containing the starting node in the base step, as you'll get any other "adjacent" node anyway.
Could work like this:
WITH RECURSIVE graph AS (
SELECT parent
,child
,',' || parent::text || ',' || child::text || ',' AS path
,0 AS depth
FROM ownership
WHERE parent = 1
UNION ALL
SELECT o.parent
,o.child
,g.path || o.child || ','
,g.depth + 1
FROM graph g
JOIN ownership o ON o.parent = g.child
WHERE g.path !~~ ('%,' || o.parent::text || ',' || o.child::text || ',%')
)
SELECT *
FROM graph
You mentioned performance, so I optimized in that direction.
Major points:
Traverse the graph only in the defined direction.
No need for a column cycle, make it an exclusion condition instead. One less step to go. That is also the direct answer to:
How can I do to stop cycles one step before the node that closes the
cycle?
Use a string to record the path. Smaller and faster than an array of rows. Still contains all necessary information. Might change with very big bigint numbers, though.
Check for cycles with the LIKE operator (~~), should be much faster.
If you don't expect more that 2147483647 rows over the course of time, use plain integer columns instead of bigint. Smaller and faster.
Be sure to have an index on parent. Index on child is irrelevant for my query. (Other than in your original where you traverse edges in both directions.)
For huge graphs I would switch to a plpgsql procedure, where you can maintain the path as temp table with one row per step and a matching index. A bit of an overhead, that will pay off with huge graphs, though.
Problems in your original query:
WHERE (g.parent = o.child or g.child = o.parent)
There is only one endpoint of your traversal at any point in the process. As you wlak the directed graph in both directions, the endpoint can be either parent or child - but not both of them. You have to save the endpoint of every step, and then:
WHERE g.child IN (o.parent, o.child)
The violation of the direction also makes your starting condition questionable:
WHERE parent = 1
Would have to be
WHERE 1 IN (parent, child)
And the two rows (1,2) and (2,1) are effectively duplicates this way ...
Additional solution after comment
Ignore direction
Still walk any edge only once per path.
Use ARRAY for path
Save original direction in path, not actual direction.
Note, that this way (2,1) and (1,2) are effective duplicates, but both can be used in the same path.
I introduce the column leaf which saves the actual endpoint of every step.
WITH RECURSIVE graph AS (
SELECT CASE WHEN parent = 1 THEN child ELSE parent END AS leaf
,ARRAY[ROW(parent, child)] AS path
,0 AS depth
FROM ownership
WHERE 1 in (child, parent)
UNION ALL
SELECT CASE WHEN o.parent = g.leaf THEN o.child ELSE o.parent END -- AS leaf
,path || ROW(o.parent, o.child) -- AS path
,depth + 1 -- AS depth
FROM graph g
JOIN ownership o ON g.leaf in (o.parent, o.child)
AND ROW(o.parent, o.child) <> ALL(path)
)
SELECT *
FROM graph
I have a query I need to perform to show search results for a project. What needs to happen, I need to sort the results by the "horsesActiveDate" and this applies to all of them except for any ad with the adtypesID=7. Those results are sorted by date but they must always result after all other ads.
So I will have all my ads in the result set be ordered by the Active Date AND adtypesID != 7. After that, I need all adtypesID=7 to be sorted by Active Date and appended at the bottom of all the results.
I'm hoping to put this in one query instead of two and appending them together in PHP. The way the code is written, I have to find a way to get it all in one query.
So here is my original query which has worked great until I had to ad the adtypesID=7 which has different sorting requirements.
This is the query that exists now that doesn't take into account the adtypesID for sorting.
SELECT
horses.horsesID,
horsesDescription,
horsesActiveDate,
adtypesID,
states.statesName,
horses_images.himagesPath
FROM horses
LEFT JOIN states ON horses.statesID = states.statesID
LEFT JOIN horses_images ON horses_images.himagesDefault = 1 AND horses_images.horsesID = horses.horsesID AND horses_images.himagesPath != ''
WHERE
horses.horsesStud = 0
AND horses.horsesSold = 0
AND horses.horsesID IN
(
SELECT DISTINCT horses.horsesID
FROM horses
LEFT JOIN horses_featured ON horses_featured.horsesID = horses.horsesID
WHERE horses.horsesActive = 1
)
ORDER BY adtypesID, horses.horsesActiveDate DESC
My first thought was to do two queries where one looked for all the ads that did not contain adtypesID=7 and sort those as the query does, then run a second query to find only those ads with adtypesID=7 and sort those by date. Then take those two results and append them to each other. Since I need to get this all into one query, I can't use a php function to do that.
Is there a way to merge the two query results one after the other in mysql? Is there a better way to run this query that will accomplish this sorting?
The Ideal Results would be as below (I modified the column names so they would be shorter):
ID | Description | ActiveDate | adtypesID | statesName | himagesPath
___________________________________________________________________________
3 | Ad Text | 06-01-2010 | 3 | OK | image.jpg
2 | Ad Text | 05-31-2010 | 2 | LA | image1.jpg
9 | Ad Text | 03-01-2010 | 4 | OK | image3.jpg
6 | Ad Text | 06-01-2010 | 7 | OK | image5.jpg
6 | Ad Text | 05-01-2010 | 7 | OK | image5.jpg
6 | Ad Text | 04-01-2010 | 7 | OK | image5.jpg
Any help that can be provided will be greatly appreciated!
I am not sure about the exact syntax in MySQL, but something like
ORDER BY case when adtypesID = 7 then 2 else 1 end ASC, horses.horsesActiveDate DESC
would work in many other SQL dielects.
Note that most SQL dialects allow the order by to not only be a column, but an expression.
This should work:
ORDER BY (adtypesID = 7) ASC, horses.horsesActiveDate DESC
Use a Union to append two queries together, like this:
SELECT whatever FROM wherever ORDER BY something AND adtypesID!=7
UNION
SELECT another FROM somewhere ORDER BY whocares AND adtypesID=7
http://dev.mysql.com/doc/refman/5.0/en/union.html
I re-wrote your query as:
SELECT h.horsesID,
h.horsesDescription,
h.horsesActiveDate,
adtypesID,
s.statesName,
hi.himagesPath
FROM HORSES h
LEFT JOIN STATES s ON s.stateid = h.statesID
LEFT JOIN HORSES_IMAGES hi ON hi.horsesID = h.horsesID
AND hi.himagesDefault = 1
AND hi.himagesPath != ''
LEFT JOIN HORSES_FEATURED hf ON hf.horsesID = h.horsesID
WHERE h.horsesStud = 0
AND h.horsesSold = 0
AND h.horsesActive = 1
ORDER BY (adtypesID = 7) ASC, h.horsesActiveDate DESC
The IN subquery, using a LEFT JOIN and such, will mean that any horse record whose horsesActive value is 1 will be returned - regardless if they have an associated HORSES_FEATURED record. I leave it to you for checking your data to decide if it should really be an INNER JOIN. Likewise for the STATES table relationship...