SQL query for fetching users and common friends - sql

I know similar questions have been asked and answered before. I have reviewed them but still can't quite wrap my head around how to do this in my case.
I would like to create a query (I use postgreSQL) that would return users from my database filtered by name, sorted by the number of friends in common with a given user (the user sending the request).
The data structure is as follows:
I have a users table, that has a column called search_full_name which stores name + surname in the format of "ADAM SMITH". This is what I filter with.
I have a user_friends table that stores information about who is friends with whom. So I have two columns in there: user_id and friend_id . The data is symmetric, i.e. for every (1,3) there is a (3,1) entry.
So far in the friend search I was just using a query like
select * from users where users.search_full_name like '%query%'
But now, I would like to additionally order the result by the amount of friends in common with the user asking, so my query would have two inputs: query and userId.
Turns out I am not as good with sql as I thought, and I would really appreciate your help, it would be great to see some explanations too.
I imagine the desired output as:
+---------+------------------+----------------------+
| user_id | search_full_name | common_friends_count |
+---------+------------------+----------------------+
| 45      | Adam Smith       | 14                   |
| 123     | Adam Cole        | 11                   |
| 12      | Adamic Kapi      | 0                    |
+---------+------------------+----------------------+
for a query like 'Adam'
I have been trying this for a whole day now and I feel my brain has exploded.
Please help, thanks

The basic idea is a self-join. The following gets a match on users who share friends with the specified user:
select uf2.user_id, count(*) as num_friends
from user_friends uf join
user_friends uf2
on uf2.friend_id = uf.friend_id and
uf2.user_id <> uf.user_id
where uf.user_id = ? -- the user you care about
group by uf2.user_id
order by count(*) desc;

Ok, so after a few hours I came up with a query that works :) Here it is for future reference:
select u.id, u.search_full_name, count(uf.friend_id) as common_friend_count
from users u left join user_friends uf on (u.id = uf.user_id and uf.friend_id in (select friend_id from user_friends where user_id = ?))
where u.search_full_name like ?
group by u.search_full_name, u.id
order by common_friend_count desc;
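For anyone who wants to try the query above without a PostgreSQL instance, here is a minimal sketch that runs it against an in-memory SQLite database from Python. The sample users and friendships are invented for illustration, and SQLite stands in for PostgreSQL, so details such as the case sensitivity of LIKE may differ.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, search_full_name TEXT);
    CREATE TABLE user_friends (user_id INTEGER, friend_id INTEGER);
    -- Invented sample data; user 1 is the user sending the request.
    INSERT INTO users VALUES
        (1, 'ASKING USER'), (2, 'ADAM SMITH'), (3, 'ADAM COLE'),
        (4, 'ADAMIC KAPI'), (5, 'BOB STONE'), (6, 'EVE MOSS');
""")

# Symmetric friendships: store both (a, b) and (b, a), as in the question.
pairs = [(1, 5), (1, 6), (2, 5), (2, 6), (3, 5)]
conn.executemany(
    "INSERT INTO user_friends VALUES (?, ?)",
    pairs + [(b, a) for a, b in pairs],
)

QUERY = """
    SELECT u.id, u.search_full_name, COUNT(uf.friend_id) AS common_friend_count
    FROM users u
    LEFT JOIN user_friends uf
           ON u.id = uf.user_id
          AND uf.friend_id IN (SELECT friend_id FROM user_friends WHERE user_id = ?)
    WHERE u.search_full_name LIKE ?
    GROUP BY u.id, u.search_full_name
    ORDER BY common_friend_count DESC, u.id
"""

def search(user_id, name_part):
    """Return (id, name, common friend count) rows for names containing name_part."""
    return conn.execute(QUERY, (user_id, f"%{name_part}%")).fetchall()

rows = search(1, "ADAM")
for row in rows:
    print(row)
```

The LEFT JOIN keeps users with no common friends in the result (COUNT over the NULL friend_id gives 0), which is why 'ADAMIC KAPI' still shows up.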

Related

How to create two JOIN-tables so that I can compare attributes within?

I take a Database course in which we have listings of AirBnBs and need to be able to do some SQL queries in the Relationship-Model we made from the data, but I struggle with one in particular :
I have two tables that we are interested in, Billing and Amenities. The first one has the id and price of listings, the second has id and wifi (let's say, to simplify, that it equals 1 if there is Wifi, 0 otherwise). Both have other attributes that we don't really care about here.
So the query is, "What is the difference in the average price of listings with and without Wifi ?"
My idea was to build two JOIN-tables, one with listings that have wifi, the other without, and compare them easily:
SELECT avg(B.price - A.price) as averagePrice
FROM (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 0
) A, (
SELECT Billing.price, Billing.id
FROM Billing
INNER JOIN Amenities
ON Billing.id = Amenities.id
WHERE Amenities.wifi = 1) B
WHERE A.id = B.id;
Obviously this doesn't work... I am pretty sure that there is a far easier solution to it though, what am I missing?
(And by the way, is there a way to compute the absolute between the difference of price ?)
I hope that I was clear enough, thank you for your time !
Edit: As mentioned in the comments, I forgot to say that both tables have id as their primary key, so that there is one row per listing.
Just use conditional aggregation:
SELECT AVG(CASE WHEN a.wifi = 0 THEN b.price END) as avg_no_wifi,
AVG(CASE WHEN a.wifi = 1 THEN b.price END) as avg_wifi
FROM Billing b JOIN
Amenities a
ON b.id = a.id
WHERE a.wifi IN (0, 1);
You can subtract one AVG() expression from the other if you want the difference instead of the two separate values.
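As a rough sketch of that subtraction, the snippet below wraps the two conditional averages in ABS() so the sign does not matter. It runs on an in-memory SQLite database from Python; the four listings are invented sample rows, and the table and column names follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Billing (id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE Amenities (id INTEGER PRIMARY KEY, wifi INTEGER);
    -- Invented sample rows; listing 4 has no Amenities row at all.
    INSERT INTO Billing VALUES (1, 1500), (2, 1700), (3, 1800), (4, 1900);
    INSERT INTO Amenities VALUES (1, 1), (2, 1), (3, 0);
""")

# Each AVG ignores the NULLs produced by the CASE for non-matching rows,
# so one pass over the join yields both averages and their difference.
diff = conn.execute("""
    SELECT ABS(AVG(CASE WHEN a.wifi = 1 THEN b.price END)
             - AVG(CASE WHEN a.wifi = 0 THEN b.price END)) AS price_diff
    FROM Billing b
    JOIN Amenities a ON b.id = a.id
""").fetchone()[0]

print(diff)
```

With these rows the wifi listings average 1600 and the single non-wifi listing is 1800, so the query returns 200.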
Let's assume we're working with data like the following (problems with your data model are noted below):
Billing
+------------+---------+
| listing_id | price |
+------------+---------+
| 1 | 1500.00 |
| 2 | 1700.00 |
| 3 | 1800.00 |
| 4 | 1900.00 |
+------------+---------+
Amenities
+------------+------+
| listing_id | wifi |
+------------+------+
| 1 | 1 |
| 2 | 1 |
| 3 | 0 |
+------------+------+
Notice that I changed "id" to "listing_id" to make it clear what it was (using "id" as an attribute name is problematic anyways). Also, note that one listing doesn't have an entry in the Amenities table. Depending on your data, that may or may not be a concern (again, refer to the bottom for a discussion of your data model).
Based on this data, your averages should be as follows:
Listings with wifi average $1600 (listings 1 and 2).
Listings without wifi (just listing 3) average $1800.
So the difference would be $200.
To achieve this result in SQL, it may be helpful to first get the average cost per amenity (whether wifi is offered). This would be obtained with the following query:
SELECT
Amenities.wifi AS has_wifi,
AVG(Billing.price) AS avg_cost
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
which gives you the following results:
+----------+-----------------------+
| has_wifi | avg_cost |
+----------+-----------------------+
| 0 | 1800.0000000000000000 |
| 1 | 1600.0000000000000000 |
+----------+-----------------------+
So far so good. So now we need to calculate the difference between these 2 rows. There are a number of different ways to do this, but one is to use a CASE expression to make one of the values negative, and then simply take the SUM of the result (note that I'm using a CTE, but you can also use a sub-query):
WITH
avg_by_wifi(has_wifi, avg_cost) AS
(
SELECT Amenities.wifi, AVG(Billing.price)
FROM Billing
INNER JOIN Amenities ON
Amenities.listing_id = Billing.listing_id
GROUP BY Amenities.wifi
)
SELECT
ABS(SUM
(
CASE
WHEN has_wifi = 1 THEN avg_cost
ELSE -1 * avg_cost
END
))
FROM avg_by_wifi
which gives us the expected value of 200.
Now regarding your data model:
If both your Billing and Amenities table only have 1 row for each listing, it makes sense to combine them into 1 table. For example: Listings(listing_id, price, wifi)
However, this is still problematic, because you probably have a bunch of other amenities you want to model (pool, sauna, etc.) So you might want to model a many-to-many relationship between listings and amenities using an intermediate table:
Listings(listing_id, price)
Amenities(amenity_id, amenity_name)
ListingsAmenities(listing_id, amenity_id)
This way, you could list multiple amenities for a given listing without having to add additional columns. It also becomes easy to store additional information about an amenity: What's the wifi password? How deep is the pool? etc.
Of course, using this model makes your original query (difference in average cost of listings by wifi) a bit trickier, but definitely still doable.
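One possible way to write that query against the many-to-many model is sketched below, using an in-memory SQLite database from Python. The sample rows, the 'wifi' and 'pool' amenity names, and the choice to treat a listing with no link row as "no wifi" are all assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Listings (listing_id INTEGER PRIMARY KEY, price REAL);
    CREATE TABLE Amenities (amenity_id INTEGER PRIMARY KEY, amenity_name TEXT);
    CREATE TABLE ListingsAmenities (listing_id INTEGER, amenity_id INTEGER);
    -- Invented sample rows: listings 1 and 2 have wifi, 3 and 4 do not.
    INSERT INTO Listings VALUES (1, 1500), (2, 1700), (3, 1800), (4, 1900);
    INSERT INTO Amenities VALUES (1, 'wifi'), (2, 'pool');
    INSERT INTO ListingsAmenities VALUES (1, 1), (2, 1), (2, 2), (3, 2);
""")

# Derive a 0/1 has_wifi flag per listing with EXISTS, then average per flag.
rows = conn.execute("""
    WITH flagged AS (
        SELECT l.price,
               EXISTS (SELECT 1
                       FROM ListingsAmenities la
                       JOIN Amenities a ON a.amenity_id = la.amenity_id
                       WHERE la.listing_id = l.listing_id
                         AND a.amenity_name = 'wifi') AS has_wifi
        FROM Listings l
    )
    SELECT has_wifi, AVG(price) FROM flagged GROUP BY has_wifi ORDER BY has_wifi
""").fetchall()

print(rows)
```

Using EXISTS rather than a plain join avoids counting a listing several times when it has multiple amenities.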

result repetition in SQL inner join with one to many relationship

I am implementing an application that provides the opening hours of several venues. A simplified version of my DB implementation consists of two tables:
+-----------+     +------------------+
| Venue     |     | opening_hour     |
+-----------+     +------------------+
| venue_id  |     | opening_hour_id  |
| name      |     | day              |
+-----------+     | close_time       |
                  | open_time        |
                  | venue_id         |
                  +------------------+
In this case there is a one-to-many relationship between venue and opening hour.
Now, I would like to retrieve a list of all venues available in the database and their corresponding opening hours. To solve this problem I am using the following code:
SELECT ven.name as name, oh.day as day
FROM venue ven INNER JOIN opening_hour oh
ON oh.venue_id = ven.venue_id
With this implementation, for each day's opening hours I get a result row with the venue name and the day value. This means that if a venue is open 6 days a week, I receive 6 rows with the same name and the corresponding day. As a result I find myself with a lot of repeated data that I have to manipulate on the server side.
The only two solutions I can think of with my limited DB knowledge are to either keep the current solution or to extract all venues and then perform a single query for each one of them in order to extract their opening hours. The latter is clearly the worse solution since it would require a ridiculous number of DB requests.
Can anyone think of a better approach? The ideal would be to receive a row containing the venue name and an array formed by all the opening hours.
note: Not sure if this is relevant in this case, but I am using a PostgreSQL database.
This will give the venue name and an array of all days when the venue is open:
SELECT ven.name, array_agg(oh.day)
FROM venue ven
NATURAL JOIN opening_hour oh
GROUP BY ven.name;
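To see the one-row-per-venue shape in action, here is a small sketch run from Python against in-memory SQLite. SQLite has no array_agg, so group_concat is swapped in and the string is split on the client; the venues and hours are invented. It also groups by venue_id, on the assumption that two venues could share a name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE venue (venue_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE opening_hour (
        opening_hour_id INTEGER PRIMARY KEY,
        day TEXT, open_time TEXT, close_time TEXT, venue_id INTEGER
    );
    -- Invented sample data.
    INSERT INTO venue VALUES (1, 'Cafe'), (2, 'Bar');
    INSERT INTO opening_hour (day, open_time, close_time, venue_id) VALUES
        ('Mon', '08:00', '16:00', 1),
        ('Tue', '08:00', '16:00', 1),
        ('Fri', '18:00', '23:00', 2);
""")

# group_concat (SQLite) plays the role of array_agg (PostgreSQL) here:
# one row per venue, with all of its days collapsed into a single value.
rows = conn.execute("""
    SELECT ven.name, group_concat(oh.day) AS days
    FROM venue ven
    JOIN opening_hour oh ON oh.venue_id = ven.venue_id
    GROUP BY ven.venue_id, ven.name
    ORDER BY ven.venue_id
""").fetchall()

# The concatenation order is not guaranteed, so sort after splitting.
days_by_venue = {name: sorted(days.split(",")) for name, days in rows}
print(days_by_venue)
```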
For the ones using MSSQL;
SELECT
v.name,
REPLACE(REPLACE(REPLACE('['+(
SELECT
'''' + convert(nvarchar(max),s2.open_time) + '''' as a
FROM opening_hour s2
WHERE s2.venue_id= s.venue_id
FOR XML PATH('')
) + ']','<a>',''),'</a>',','),',]',']') as opening_hours
FROM opening_hour s
INNER JOIN Venue v on v.venue_id = s.venue_id
GROUP BY s.venue_id,v.name
Just to note here, this does not return an array data type. It is just a string in an array-like format.

Retrieving values from two columns based on different conditions

I have a question for you all. I have 'inherited' a DB at work and I have to create a report from a table using different conditions. Please note I'm no sql expert, I hope what I write makes sense.
Trying to simplify, I have a HARDWARE table that contains the following:
HWTYPE - type of hardware
HWMODEL - model of hardware
PHONENUM - phone number
USERID - user the hardware is assigned to
the data looks like this:
HWTYPE | HWMODEL | PHONENUM | USERID
-------+------------+------------+----------
SIM | SIMVOICE | 123456 | CIRO
SIM | SIMVOICE | 124578 | LEO
PHONE | APPLE | | CIRO
PHONE | SAMSUNG | | LEO
Now as you can see, every user is assigned one phone and one SIM with a phone number.
I need to sort the data per user, so that every line of the query result looks like:
HW | PHONENUM | USERID
---------+--------------+------
APPLE | 123456 | CIRO
SAMSUNG | 124578 | LEO
so basically: group column PHONENUM and HWMODEL based on USER.
And this is where I get stuck! I tried union, join, case etc. but I still don't get the correct result.
Again apologies for the (probably) very basic question. I tried to look for something similar but could not find anything.
Thanks to whoever will want to help me.
regards
Leo
I don't know whether I understood your question correctly,
but I think you just need to write the following query to get your output:
SELECT
HWTYPE, HWMODEL, USERID
FROM
HARDWARE
GROUP BY USERID ,HWTYPE,HWMODEL
ORDER BY HWTYPE
Placing my comment as an answer;
Write this as your SQL:
SELECT
(HWTYPE, HWMODEL, USERID)
FROM
HARDWARE
GROUP BY (USERID) /*Other clauses can be added here, but ensure you use commas to separate them!*/
Taking this step by step:
SELECT ... -> what columns you want to see
FROM ... -> what table you want it from (use joins if you need from multiple tables)
GROUP BY... -> What you want to collect together
There is also:
WHERE... -> conditions for when to include/what not to include
You can also get your expected result using the query below. It uses a self join on the HARDWARE table, which also keeps the query short.
SELECT H2.HWMODEL AS HW,H1.PHONENUM,H1.USERID FROM HARDWARE H1 INNER JOIN HARDWARE H2 ON H1.USERID = H2.USERID
WHERE ISNULL(H1.PHONENUM,'') <> ''
AND ISNULL(H2.PHONENUM,'') = ''
ORDER BY H2.HWMODEL ASC
Hope this will help you.
If your goal is a list of Phone numbers and models sorted by user, you can use this query:
SELECT HWMODEL,PHONENUM,USERID FROM HARDWARE ORDER BY USERID
Hope to have understood your question
SELECT
hw.HWMODEL as HW
,phone.PHONENUM
,user.USERID
FROM
(
SELECT DISTINCT
USERID
FROM
HARDWARE
) as user
LEFT JOIN
(
SELECT
USERID
,PHONENUM
FROM
HARDWARE
WHERE
PHONENUM IS NOT NULL
) as phone
ON phone.USERID = user.USERID
LEFT JOIN
(
SELECT
USERID
,HWMODEL
FROM
HARDWARE
WHERE
HWTYPE = 'PHONE'
) AS hw
ON hw.USERID = user.USERID
ORDER BY
user.USERID
,hw.HWMODEL
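Another option, not taken from the answers above, is conditional aggregation: group by USERID and pick the phone model and the SIM number with MAX(CASE ...). The sketch below runs it from Python on in-memory SQLite with the rows from the question, assuming the empty PHONENUM cells are NULL and each user has exactly one phone and one SIM.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE HARDWARE (HWTYPE TEXT, HWMODEL TEXT, PHONENUM TEXT, USERID TEXT);
    -- Rows from the question; blank phone numbers are assumed to be NULL.
    INSERT INTO HARDWARE VALUES
        ('SIM',   'SIMVOICE', '123456', 'CIRO'),
        ('SIM',   'SIMVOICE', '124578', 'LEO'),
        ('PHONE', 'APPLE',    NULL,     'CIRO'),
        ('PHONE', 'SAMSUNG',  NULL,     'LEO');
""")

# Each CASE returns NULL for rows of the other hardware type, and MAX
# skips NULLs, so every user collapses into a single combined row.
rows = conn.execute("""
    SELECT MAX(CASE WHEN HWTYPE = 'PHONE' THEN HWMODEL END)  AS HW,
           MAX(CASE WHEN HWTYPE = 'SIM'   THEN PHONENUM END) AS PHONENUM,
           USERID
    FROM HARDWARE
    GROUP BY USERID
    ORDER BY USERID
""").fetchall()

for row in rows:
    print(row)
```

This scans the table once instead of joining it to itself, but it silently picks one value if a user ever has two phones or two SIMs.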

SQL: SUM of MAX values WHERE date1 <= date2 returns "wrong" results

Hi stackoverflow users
I'm having a bit of a problem trying to combine SUM, MAX and WHERE in one query and after an intense Google search (my search engine skills usually don't fail me) you are my last hope to understand and fix the following issue.
My goal is to count people in a certain period of time and because a person can visit more than once in said period, I'm using MAX. Due to the fact that I'm defining people as male (m) or female (f) using a string (for statistic purposes), CHAR_LENGTH returns the numbers I'm in need of.
SELECT SUM(max_pers) AS "People"
FROM (
SELECT "guests"."id", MAX(CHAR_LENGTH("guests"."gender")) AS "max_pers"
FROM "guests"
GROUP BY "guests"."id")
So far, so good. But now, as stated before, I'd like to only count the guests which visited in a certain time interval (for statistic purposes as well).
SELECT "statistic"."id", SUM(max_pers) AS "People"
FROM (
SELECT "guests"."id", MAX(CHAR_LENGTH("guests"."gender")) AS "max_pers"
FROM "guests"
GROUP BY "guests"."id"),
"statistic", "guests"
WHERE ( "guests"."arrival" <= "statistic"."from" AND "guests"."departure" >= "statistic"."to")
GROUP BY "statistic"."id"
This query returns the following, x = desired result:
x * (x+1)
So if the result should be 3, it's 12. If it should be 5, it's 30 etc.
I could probably solve this algebraically but I'd rather understand what I'm doing wrong and learn from it.
Thanks in advance and I'm certainly going to answer all further questions.
PS: I'm using LibreOffice Base.
EDIT: An example
guests table:
ID | arrival | departure | gender |
10 | 1.1.14 | 10.1.14 | mf |
10 | 15.1.14 | 17.1.14 | m |
11 | 5.1.14 | 6.1.14 | m |
12 | 10.2.14 | 24.2.14 | f |
13 | 27.2.14 | 28.2.14 | mmmmmf |
statistic table:
ID | from | to | name |
1 | 1.1.14 | 31.1.14 |January | expected result: 3
2 | 1.2.14 | 28.2.14 |February| expected result: 7
MAX(...) is the wrong function: You want COUNT(DISTINCT ...).
Add proper join syntax, simplify (and remove unnecessary quotes) and this should work:
SELECT s.id, COUNT(DISTINCT g.id) AS People
FROM statistic s
LEFT JOIN guests g ON g.arrival <= s."from" AND g.departure >= s."to"
GROUP BY s.id
Note: Using LEFT join means you'll get a result of zero for statistics ids that have no guests. If you would rather no row at all, remove the LEFT keyword.
You have a very strange data structure. In any case, I think you want:
SELECT g.stat_id AS id, sum(g.numpersons) AS People
FROM (select s.id as stat_id, g.id, max(char_length(g.gender)) as numpersons
from guests g join
statistic s
on g.arrival <= s."from" AND g.departure >= s."to"
group by s.id, g.id
) g
GROUP BY g.stat_id;
Thanks for all your inputs. I wasn't familiar with JOIN but it was necessary to solve my problem.
Since my databank is designed in german, I made quite the big mistake while translating it and I'm sorry if this caused confusion.
Selecting guests.id and later on grouping by guests.id wouldn't make any sense since the id is unique. What I actually wanted to do is select and group by guests.adr_id, which links a visiting guest to an address database.
The correct solution to my problem is the following code:
SELECT statname, SUM (numpers) FROM (
SELECT statistic.name AS statname, guests.adr_id, MAX( CHAR_LENGTH( guests.gender ) ) AS numpers
FROM guests
JOIN statistic ON (guests.arrival <= statistic."to" AND guests.departure >= statistic."from" )
GROUP BY guests.adr_id, statistic.name )
GROUP BY statname
I also noted that my database structure is a mess but I created it learning by doing and haven't found any time to rewrite it yet. Next time posting, I'll try better.
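For reference, here is a sketch of that final query run from Python on in-memory SQLite with the example rows from the question. A few things are assumptions: the ID column of the example is treated as adr_id, dates are rewritten as ISO strings so they compare correctly as text, LENGTH replaces CHAR_LENGTH (which SQLite lacks), and the statistic id is added to the grouping only to get a stable order.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE guests (adr_id INTEGER, arrival TEXT, departure TEXT, gender TEXT);
    CREATE TABLE statistic (id INTEGER, "from" TEXT, "to" TEXT, name TEXT);
    -- Example rows from the question, with dates in ISO format.
    INSERT INTO guests VALUES
        (10, '2014-01-01', '2014-01-10', 'mf'),
        (10, '2014-01-15', '2014-01-17', 'm'),
        (11, '2014-01-05', '2014-01-06', 'm'),
        (12, '2014-02-10', '2014-02-24', 'f'),
        (13, '2014-02-27', '2014-02-28', 'mmmmmf');
    INSERT INTO statistic VALUES
        (1, '2014-01-01', '2014-01-31', 'January'),
        (2, '2014-02-01', '2014-02-28', 'February');
""")

# Inner query: largest party size per address and period (a stay counts if
# it overlaps the period). Outer query: sum those maxima per period.
rows = conn.execute("""
    SELECT statname, SUM(numpers)
    FROM (
        SELECT s.id AS stat_id, s.name AS statname, g.adr_id,
               MAX(LENGTH(g.gender)) AS numpers
        FROM guests g
        JOIN statistic s
          ON g.arrival <= s."to" AND g.departure >= s."from"
        GROUP BY s.id, s.name, g.adr_id
    )
    GROUP BY stat_id, statname
    ORDER BY stat_id
""").fetchall()

print(rows)
```

With this data the result matches the expected values from the question: 3 for January and 7 for February.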

How to properly loop through a table and output to an array in PostgreSQL?

Here is a DB example
table "Users"
fname | lname | id | email
Joe | smith | 1 | yadda@goo.com
Bob | smith | 2 | bob@goo.com
Jane | smith | 3 | jane@goo.com
table "Awards"
userId | award
1 | bigaward
1 | smallaward
1 | thisaward
2 | thataward
table "Invites"
userId | invited
1 | true
3 | true
Basically, how do you write a query in PostgreSQL that allows you to create something like this:
[{
fname:"Joe",
lname:"Smith",
id: 1,
email: "yadda@goo.com",
invited: true,
awards: ["bigaward", "smallaward", "thisaward"]
},
{
fname:"Jane",
lname:"Smith",
id: 3,
email: "jane@goo.com",
invited: true,
awards: []
}]
Here is what I am trying to do...
SELECT users.fname, users.lname, users.id, users.email, invites.invited, awards.award(needs to be an array)
FROM users
JOIN awards on ....(unknown)
JOIN invites on invites.userid = users.id
WHERE invited = true
the above array would be the desired output, just can't figure out a good one shot query. I tried the PostgreSQL docs, but to no avail. I think I might need a WITH statement?
Thanks in advance, Postgres guru!
PostgreSQL v. 9.2
Answered by RhodiumToad on the postgresql IRC:
SELECT users.fname, users.lname, .... array(select awards.award from awards where awards.userid = users.id) as awards
FROM users
JOIN invites on invites.userid = users.id
WHERE invited = true
array() and then throw a query inside of it... brilliant!
I think it can be as simple as:
SELECT u.fname, u.lname, u.id, u.email, i.invited, array_agg(a.award)
FROM users u
JOIN invites i ON i.userid = u.id AND i.invited
LEFT JOIN awards a ON a.userid = u.id
GROUP BY u.fname, u.lname, u.id, u.email, i.invited
Your display in JSON format just makes it seem more complicated.
Use a basic GROUP BY including all columns not to be aggregated, leaving award, which you aggregate into an array with the help of the aggregate function array_agg().
The LEFT JOIN is important, so as not to exclude users without any awards by mistake.
Note that I ignored CaMeL-case spelling in my example to avoid the ugly double-quoting.
This is considerably faster for bigger tables than notoriously slow correlated subqueries, as demonstrated in the added example in the question.
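One detail worth checking with the LEFT JOIN approach: in PostgreSQL, array_agg over a user with no awards yields an array holding a single NULL rather than an empty array, so the NULL has to be filtered out somewhere. The sketch below illustrates the idea from Python on in-memory SQLite by fetching the flat LEFT JOIN rows and grouping them on the client. The sample rows mirror the question, but the example.com addresses are placeholders and the client-side grouping is a swapped-in technique, not the array_agg query itself.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (fname TEXT, lname TEXT, id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE awards (userid INTEGER, award TEXT);
    CREATE TABLE invites (userid INTEGER, invited INTEGER);
    -- Sample rows modelled on the question; emails are placeholders.
    INSERT INTO users VALUES
        ('Joe', 'Smith', 1, 'joe@example.com'),
        ('Bob', 'Smith', 2, 'bob@example.com'),
        ('Jane', 'Smith', 3, 'jane@example.com');
    INSERT INTO awards VALUES
        (1, 'bigaward'), (1, 'smallaward'), (1, 'thisaward'), (2, 'thataward');
    INSERT INTO invites VALUES (1, 1), (3, 1);
""")

rows = conn.execute("""
    SELECT u.id, u.fname, u.lname, u.email, i.invited, a.award
    FROM users u
    JOIN invites i ON i.userid = u.id AND i.invited
    LEFT JOIN awards a ON a.userid = u.id
    ORDER BY u.id, a.award
""").fetchall()

result = {}
for uid, fname, lname, email, invited, award in rows:
    user = result.setdefault(uid, {
        "fname": fname, "lname": lname, "id": uid, "email": email,
        "invited": bool(invited), "awards": [],
    })
    if award is not None:  # LEFT JOIN yields NULL for users without awards
        user["awards"].append(award)

payload = json.dumps(list(result.values()))
print(payload)
```

Jane has no awards, so she appears once with an empty awards list, which is the shape the question asks for.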
You will need to use a hasNext or forEach (or similar) method to iterate through the query results. Whilst you're iterating you can add the results to an array and then encode that array to JSON.
I think I may have misunderstood your question. Apologies if I have.