Writing a Mathematical Formula in SQL?

I have these tables: users, comments, ratings, and items
I would like to know if it is possible to write SQL query that basically does this:
user_id is in each table. I'd like a SQL query to count each occurrence in each table (except users of course). BUT, I want some tables to carry more weight than the others. Then I want to tally up a "score".
Here is an example:
user_id 5 occurs...
2 times in items;
5 times in comments;
11 times in ratings.
I want a formula/point system that totals something like this:
items 2 x 5 = 10;
comments 5 x 1 = 5;
ratings 11 x .5 = 5.5
TOTAL 21.5
This is what I have so far.....
SELECT u.users
COUNT(*) r.user_id
COUNT(*) c.user_id
COUNT(*) i.user_id
FROM users as u
JOIN COMMENTS as c
ON u.user_id = c_user_id
JOIN RATINGS as r
ON r.user_id = u.user_id
JOIN ITEMS as i
i.user_id = u.user_id
WHERE
????
GROUP BY u.user_id
ORDER by total DESC
I am not sure how to do the mathematical formula portion (if possible). Or how to tally up a total.
Final Code based on John Woo's Answer!
$sql = mysql_query("
SELECT u.username,
(a.totalCount * 5) +
(b.totalCount) +
(c.totalCount * .2) totalScore
FROM users u
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM items
GROUP BY user_id
) a ON a.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM comments
GROUP BY user_id
) b ON b.user_id= u.user_id
LEFT JOIN
(
SELECT user_id, COUNT(user_id) totalCount
FROM ratings
GROUP BY user_id
) c ON c.user_id = u.user_id
ORDER BY totalScore DESC LIMIT 10;");

Maybe this can help you,
SELECT u.user_ID,
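(COALESCE(a.totalCount, 0) * 5) +
COALESCE(b.totalCount, 0) +
(COALESCE(c.totalCount, 0) * .2) totalScore -- COALESCE keeps the score from becoming NULL when a LEFT JOIN finds no rows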
FROM users u LEFT JOIN
(
SELECT user_ID, COUNT(user_ID) totalCount
FROM items
GROUP BY user_ID
) a ON a.user_ID = u.user_ID
LEFT JOIN
(
SELECT user_ID, COUNT(user_ID) totalCount
FROM comments
GROUP BY user_ID
) b ON b.user_ID = u.user_ID
LEFT JOIN
(
SELECT user_ID, COUNT(user_ID) totalCount
FROM ratings
GROUP BY user_ID
) c ON c.user_ID = u.user_ID
ORDER BY totalScore DESC
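Based on your query above, the syntax can be repaired as follows. Note, however, that each COUNT(*) here counts rows of the joined result rather than rows per table, so all three terms are the same number and the subquery version above is the safer choice:
SELECT u.user_id,
(COUNT(*) * .5) +
COUNT(*) +
(COUNT(*) * 2) totalScore
FROM users as u
LEFT JOIN COMMENTS as c
ON u.user_id = c.user_id
LEFT JOIN RATINGS as r
ON r.user_id = u.user_id
LEFT JOIN ITEMS as i
ON i.user_id = u.user_id
GROUP BY u.user_id
ORDER by totalScore DESC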
The only difference is the use of LEFT JOIN. Do not use INNER JOIN in this situation, because a user_id is not guaranteed to exist in every table, and an inner join would drop such users from the result entirely.
Hope this makes sense

Here's an alternative approach:
SELECT
u.user_id,
SUM(s.weight) AS totalScore
FROM users u
LEFT JOIN (
SELECT user_id, 5.0 AS weight
FROM items
UNION ALL
SELECT user_id, 1.0
FROM comments
UNION ALL
SELECT user_id, 0.5
FROM ratings
) s
ON u.user_id = s.user_id
GROUP BY
u.user_id
I.e. for every occurrence of every user in every table, a row with a specific weight is produced. The UNIONed set of weights is then joined to the users table for subsequent grouping and aggregating.

Related

Distinct Count with data from another table

I have 4 tables
All ID related things are ints and the rest are texts.
I want to count the number of distinct albums a user is tagged in. For example, if a user is tagged once in album1, once in album2 and once in album3, the result should be 3, and it should still be 3 if the user is tagged more than once in any of them.
I tried to do:
SELECT COUNT(DISTINCT ALBUM_ID) FROM PICTURES WHERE ID=(SELECT PICTURE_ID FROM TAGS WHERE USER_ID=userId);
But this returned 1 although it was supposed to return 3 and the same happened without DISTINCT.
How can I get the amount?
EDIT:
I want to check only one user(I have the user's ID and name)
You must join users with LEFT joins to tags and pictures and aggregate:
SELECT u.id, u.name, COUNT(DISTINCT p.album_id) counter
FROM users u
LEFT JOIN tags t ON t.user_id = u.id
LEFT JOIN pictures p ON p.id = t.picture_id
GROUP BY u.id, u.name
If you want the result for a specific user only:
SELECT u.id, u.name, COUNT(DISTINCT p.album_id) counter
FROM users u
LEFT JOIN tags t ON t.user_id = u.id
LEFT JOIN pictures p ON p.id = t.picture_id
WHERE u.id = ?
GROUP BY u.id, u.name -- you may omit this line, because SQLite allows it
Or with a correlated subquery:
SELECT u.id, u.name,
(
SELECT COUNT(DISTINCT p.album_id)
FROM tags t INNER JOIN pictures p
ON p.id = t.picture_id
WHERE t.user_id = u.id
) counter
FROM users u
WHERE u.id = ?
Replace ? with the id of the user that you want.
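For context on why the original attempt returned 1, here is a minimal sketch using Python's built-in sqlite3 module. The table layout (pictures.id, pictures.album_id, tags.picture_id, tags.user_id) is inferred from the queries above, and the sample rows are made up. In SQLite, a subquery compared with = is treated as a scalar and only its first row is used, so only one picture is ever matched; IN or a join considers every tagged picture.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users    (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE pictures (id INTEGER PRIMARY KEY, album_id INTEGER);
    CREATE TABLE tags     (picture_id INTEGER, user_id INTEGER);
    INSERT INTO users    VALUES (1, 'sam');
    -- four pictures spread over three albums (album 3 holds two of them)
    INSERT INTO pictures VALUES (10, 1), (11, 2), (12, 3), (13, 3);
    -- user 1 is tagged in all four pictures
    INSERT INTO tags     VALUES (10, 1), (11, 1), (12, 1), (13, 1);
""")

# "=" with a subquery: SQLite keeps only the first row of the subquery.
with_equals = conn.execute("""
    SELECT COUNT(DISTINCT album_id) FROM pictures
    WHERE id = (SELECT picture_id FROM tags WHERE user_id = ?)
""", (1,)).fetchone()[0]

# "IN" compares against every row the subquery returns.
with_in = conn.execute("""
    SELECT COUNT(DISTINCT album_id) FROM pictures
    WHERE id IN (SELECT picture_id FROM tags WHERE user_id = ?)
""", (1,)).fetchone()[0]

# The join-based form from the answer, restricted to one user.
joined = conn.execute("""
    SELECT u.id, u.name, COUNT(DISTINCT p.album_id) AS counter
    FROM users u
    LEFT JOIN tags t     ON t.user_id = u.id
    LEFT JOIN pictures p ON p.id = t.picture_id
    WHERE u.id = ?
    GROUP BY u.id, u.name
""", (1,)).fetchone()

print(with_equals, with_in, joined)
```

Many other databases reject the = form outright when the subquery returns more than one row, which makes the mistake easier to spot there.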

Select only those users who have the most visits to provided district

I have a query that selects users with the districts which they visited and visits count.
select users.id, places.district, count(users.id) as counts from users
left join visits on users.id = visits.user_id
inner join places on visits.place_id = places.id
group by users.id, places.district
I need to select only those users whose most-visited district is the provided one. For example, user 1 visited district A one time and district B three times. If I provide district B as the parameter, user 1 should be selected. If I provide district A, user 1 should not be selected.
I think that's ranking, then filtering:
select *
from (
select u.id, p.district, count(*) as cnt_visits,
rank() over(partition by u.id order by count(*) desc) as rn
from users u
inner join visits v on u.id = v.user_id
inner join places p on p.id = v.place_id
group by u.id, p.district
) t
where rn = 1 and district = ?
Note that you don't actually need table users to get this result. We could simplify the query as:
select *
from (
select v.user_id, p.district, count(*) as cnt_visits,
rank() over(partition by v.user_id order by count(*) desc) as rn
from visits v
inner join places p on p.id = v.place_id
group by v.user_id, p.district
) t
where rn = 1 and district = ?
This query handles top ties: if a user had the same, maximum number of visits in two different districts, both are taken into account. If you don't need that feature, then we can simplify the subquery with distinct on:
select *
from (
select distinct on (v.user_id) v.user_id, p.district, count(*) as cnt_visits
from visits v
inner join places p on p.id = v.place_id
group by v.user_id, p.district
order by v.user_id, cnt_visits desc
) t
where district = ?
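The rank-then-filter idea can be sketched with Python's built-in sqlite3 module (window functions need SQLite 3.25 or newer). The two-table schema follows the queries above; the sample visits and the helper name top_visitors are assumptions made for illustration. The aggregation is done in a CTE first and ranked afterwards, which is equivalent to ranking by count(*) directly but easier to read. DISTINCT ON is PostgreSQL-specific, so only the rank() variant is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE places (id INTEGER PRIMARY KEY, district TEXT);
    CREATE TABLE visits (user_id INTEGER, place_id INTEGER);
    INSERT INTO places VALUES (1, 'A'), (2, 'B');
    -- user 1: A once, B three times; user 2: A twice, B twice (a tie)
    INSERT INTO visits VALUES (1, 1), (1, 2), (1, 2), (1, 2),
                              (2, 1), (2, 1), (2, 2), (2, 2);
""")

def top_visitors(district):
    """Users whose most-visited district (ties included) is `district`."""
    return conn.execute("""
        WITH counts AS (
            SELECT v.user_id, p.district, COUNT(*) AS cnt_visits
            FROM visits v
            JOIN places p ON p.id = v.place_id
            GROUP BY v.user_id, p.district
        ),
        ranked AS (
            SELECT user_id, district, cnt_visits,
                   RANK() OVER (PARTITION BY user_id
                                ORDER BY cnt_visits DESC) AS rn
            FROM counts
        )
        SELECT user_id, cnt_visits
        FROM ranked
        WHERE rn = 1 AND district = ?
        ORDER BY user_id
    """, (district,)).fetchall()

print(top_visitors("A"))  # user 1 is excluded: B is their top district
print(top_visitors("B"))  # user 2 appears for both districts because of the tie
```

Filtering on the district must happen after ranking; filtering inside the ranked subquery would make every user's only remaining district rank first.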

SQL query to find the top 3 in a category

Calling all sql enthusiasts!
Quick info: using PostgreSQL.
I have a query that returns the maximum number of likes for a user per category. What I want now is to show the top 3 users with the most likes per category.
A helpful resource was using this example to solve the problem:
select type, variety, price
from fruits
where (
select count(*) from fruits as f
where f.type = fruits.type and f.price <= fruits.price
) <= 2;
I understand this, but my query is using joins and I am also a beginner, so I was not able to use this information effectively.
Down to business, this is my query for returning the MAX likes for a user per category.
SELECT category, username, MAX(post_likes) FROM (
SELECT c.name category, u.username username, SUM(p.like_count) post_likes, COUNT(*) post_num
FROM categories c
JOIN topics t ON c.id = t.category_id
JOIN posts p ON t.id = p.topic_id
JOIN users u ON u.id = p.user_id
GROUP BY c.name, u.username) AS leaders
WHERE post_likes > 0
GROUP BY category, username
HAVING MAX(post_likes) >= (SELECT SUM(p.like_count)
FROM categories c
JOIN topics t ON c.id = t.category_id
JOIN posts p ON t.id = p.topic_id
JOIN users u ON u.id = p.user_id WHERE c.name = leaders.category
GROUP BY u.username order by sum desc limit 1)
ORDER BY MAX(post_likes) DESC;
Any and all help would be greatly appreciated. I am having a difficult time wrapping my head around this problem. Thanks!
If you want the most likes per category, use window functions:
SELECT cu.*
FROM (SELECT c.name as category, u.username as username,
SUM(p.like_count) as post_likes, COUNT(*) as post_num,
ROW_NUMBER() OVER (PARTITION BY c.name ORDER BY SUM(p.like_count) DESC) as seqnum -- rank by likes, not by number of posts
FROM categories c JOIN
topics t
ON c.id = t.category_id JOIN
posts p
ON t.id = p.topic_id JOIN
users u
ON u.id = p.user_id
GROUP BY c.name, u.username
) cu
WHERE seqnum <= 3;
This returns at most three rows per category, even if there are ties; which of the tied users make the cut is then arbitrary. If you want ties handled differently, consider DENSE_RANK() or RANK() instead of ROW_NUMBER().
Also, use as for column aliases in the SELECT clause. Although optional, one day you will leave out a comma and be grateful that you are in the habit of using as.
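The difference between the three ranking functions is easiest to see with ties. The sketch below uses Python's built-in sqlite3 module (SQLite 3.25 or newer) on a hypothetical, already aggregated table named leaders; the table, the sample rows and the helper top3 are assumptions for illustration, standing in for the joined and grouped subquery above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE leaders (category TEXT, username TEXT, post_likes INTEGER);
    -- bob, cy and dee are tied for second place
    INSERT INTO leaders VALUES
        ('sql', 'ann', 10), ('sql', 'bob', 8), ('sql', 'cy', 8),
        ('sql', 'dee', 8),  ('sql', 'eve', 1);
""")

def top3(ranking_function):
    """Usernames whose rank within their category is 3 or better."""
    query = f"""
        SELECT username
        FROM (
            SELECT username,
                   {ranking_function}() OVER (
                       PARTITION BY category ORDER BY post_likes DESC
                   ) AS seqnum
            FROM leaders
        ) ranked
        WHERE seqnum <= 3
    """
    return sorted(row[0] for row in conn.execute(query))

by_row_number = top3("ROW_NUMBER")  # exactly 3 rows, tie broken arbitrarily
by_rank       = top3("RANK")        # ranks 1, 2, 2, 2, 5
by_dense_rank = top3("DENSE_RANK")  # ranks 1, 2, 2, 2, 3

print(by_row_number, by_rank, by_dense_rank)
```

ROW_NUMBER() keeps the result size fixed, RANK() keeps everyone tied inside the top three positions, and DENSE_RANK() keeps the top three distinct like counts.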

Get last instance of counted rows

There are two tables jobs and users. users has a 1-to-many relation to jobs.
I want to grab the email of all users who have done 5 or more jobs.
The query below does that. However, how can I also retrieve the date of the last job done by each user?
So the desired output would be:
Email            jobs done   date of last job
jack#email.com   5+          1-20-2015
joe#email.com    5+          2-20-2015
Query that grabs all emails of users who have done 5+ jobs
select
email
, case
when times_used >= 5
then '5+'
end as times_used
from
(
select
u.id
, u.email as email
, count(*) as times_used
from
jobs j
join users u on
j.user_id = u.id
group by
u.id
)
a
where
times_used >= 5
group by
times_used
, email
You could add a join for another derived table that pulls the last date for each user:
select
b.email,
case when times_used >= 5 then '5+' end as 'jobs done',
b.max_date 'date of last job'
from (
select u.id, count(*) as times_used
from jobs j
join users u on j.user_id = u.id
group by u.id
) a
join (
select u.id, u.email, max(j.date) max_date
from jobs j
join users u on j.user_id = u.id
group by u.id, email
) b on b.id = a.id
where times_used >= 5
But if you only want the email and the date of the last job for all users that have 5+ jobs, then the query below should be enough:
select u.id, u.email, max(j.date) max_date
from jobs j
join users u on j.user_id = u.id
group by u.id, u.email
having count(j.id) >= 5
Both queries assume that the jobs table looks like id (pk), user_id, date so you have to adjust according to your actual table definition.
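As a quick check of the HAVING version, here is a minimal sketch with Python's built-in sqlite3 module. It uses the assumed jobs layout (id, user_id, date) mentioned above; the sample users, the example.com addresses and the ISO-formatted dates are made up, and a jobs_done column is added to show the count alongside the last date.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);
    CREATE TABLE jobs  (id INTEGER PRIMARY KEY, user_id INTEGER, date TEXT);
    INSERT INTO users VALUES (1, 'jack@example.com'), (2, 'joe@example.com');
""")
# jack has five jobs (16th to 20th of January), joe only two.
sample_jobs = [(1, f"2015-01-{day}") for day in range(16, 21)]
sample_jobs += [(2, "2015-02-19"), (2, "2015-02-20")]
conn.executemany("INSERT INTO jobs (user_id, date) VALUES (?, ?)", sample_jobs)

# HAVING filters on the aggregate after grouping, so the count and the
# latest date come out of the same pass over the joined rows.
frequent_users = conn.execute("""
    SELECT u.email,
           COUNT(j.id) AS jobs_done,
           MAX(j.date) AS last_job
    FROM jobs j
    JOIN users u ON j.user_id = u.id
    GROUP BY u.id, u.email
    HAVING COUNT(j.id) >= 5
""").fetchall()

print(frequent_users)
```

MAX on a text column compares strings, so this only gives the latest date when dates are stored in a sortable format such as YYYY-MM-DD or in a real date type.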
You could also try a window function approach, which can be more efficient:
WITH user_jobs AS (
SELECT
u.id as user_id,
j.id as job_id,
u.email,
ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY j.date DESC) as rn,
ROW_NUMBER() OVER (PARTITION BY u.id ORDER BY j.date) as job_number
FROM
jobs j
join users u ON j.user_id = u.id
)
SELECT
user_id,
job_id,
email,
job_number
FROM user_jobs
WHERE rn = 1 and job_number >= 5

Select all threads and order by the latest one

Now that my question "Select all forums and get latest post too.. how?" has been answered, I am trying to write a query to select all threads in one particular forum and order them by the date of the latest post (column "updated_at").
This is my structure again:
forums                    forum_threads             forum_posts
----------                -------------             -----------
id                        id                        id
parent_forum (NULLABLE)   forum_id                  content
name                      user_id                   thread_id
description               title                     user_id
icon                      views                     updated_at
                          created_at                created_at
                          updated_at
                          last_post_id (NULLABLE)
I tried writing this query, and it works.. but not as expected: It doesn't order the threads by their last post date:
SELECT DISTINCT ON(t.id) t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC;
How can I solve this one?
Assuming you want a single row per thread and not all rows for all posts.
DISTINCT ON is still the most convenient tool. But the leading ORDER BY items have to match the expressions of the DISTINCT ON clause. If you want to order the result some other way, you need to wrap it into a subquery and add another ORDER BY to the outer query:
SELECT *
FROM (
SELECT DISTINCT ON (t.id)
t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC
) sub
ORDER BY updated_at DESC;
If you are looking for a query without subquery for some unknown reason, this should work, too:
SELECT DISTINCT
t.id
, first_value(u.username) OVER w AS username
, first_value(p.updated_at) OVER w AS updated_at
, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
WINDOW w AS (PARTITION BY t.id ORDER BY p.updated_at DESC)
ORDER BY updated_at DESC;
There is quite a bit going on here:
1. The tables are joined and rows are selected according to JOIN and WHERE clauses.
2. The two instances of the window function first_value() are run (on the same window definition) to retrieve username and updated_at from the latest post per thread. This results in as many identical rows as there are posts in the thread.
3. The DISTINCT step is executed after the window functions and reduces each set to a single instance.
4. ORDER BY is applied last and updated_at references the OUT column (SELECT list), not one of the two IN columns (FROM list) of the same name.
Yet another variant, a subquery with the window function row_number():
SELECT id, username, updated_at, title
FROM (
SELECT t.id
, u.username
, p.updated_at
, t.title
, row_number() OVER (PARTITION BY t.id
ORDER BY p.updated_at DESC) AS rn
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
) sub
WHERE rn = 1
ORDER BY updated_at DESC;
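The row_number() variant is portable enough to try outside PostgreSQL. Below is a minimal sketch with Python's built-in sqlite3 module (SQLite 3.25 or newer); the tables are trimmed to the columns the query touches, and the usernames, titles and dates are made-up sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (id INTEGER PRIMARY KEY, username TEXT);
    CREATE TABLE forum_threads (id INTEGER PRIMARY KEY, forum_id INTEGER,
                                title TEXT);
    CREATE TABLE forum_posts (id INTEGER PRIMARY KEY, thread_id INTEGER,
                              user_id INTEGER, updated_at TEXT);
    INSERT INTO users VALUES (1, 'alice'), (2, 'bob');
    INSERT INTO forum_threads VALUES
        (1, 3, 'first thread'), (2, 3, 'second thread'), (3, 9, 'other forum');
    INSERT INTO forum_posts VALUES
        (1, 1, 1, '2020-01-01'), (2, 2, 2, '2020-01-03'),
        (3, 2, 1, '2020-01-04'), (4, 1, 2, '2020-01-05'),
        (5, 3, 1, '2020-01-09');
""")

# Inner query: number each thread's posts, newest first.
# Outer query: keep only the newest post per thread, then sort the threads.
threads = conn.execute("""
    SELECT id, username, updated_at, title
    FROM (
        SELECT t.id, u.username, p.updated_at, t.title,
               ROW_NUMBER() OVER (PARTITION BY t.id
                                  ORDER BY p.updated_at DESC) AS rn
        FROM forum_threads t
        LEFT JOIN forum_posts p ON p.thread_id = t.id
        LEFT JOIN users u       ON u.id = p.user_id
        WHERE t.forum_id = 3
    ) sub
    WHERE rn = 1
    ORDER BY updated_at DESC
""").fetchall()

for thread in threads:
    print(thread)
```

Thread 1 was created first but received the most recent post, so it sorts ahead of thread 2, which is the ordering the question asks for.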
Similar case:
Return records distinct on one column but order by another column
You'll have to test which is faster. Depends on a couple of circumstances.
Forget the distinct on:
SELECT t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY p.updated_at DESC;