Very hard greatest n per group query - sql

I have a very complexe query here, I try to give you an overview about the necessary tables here:
RPG
RPGCharacter
RPGPost
User
We have X Chars per RPG, x Posts per Char. 1 User can have X Chars, but 1 Char only depens on 1 User.
What I want is a query in which I got the last post per RPG within information about the Username who wrote this, the character and the RPG itself addition to a number how much RPGPosts per RPG we have (total).
This is how far I solved it until now:
SELECT c.RPGID, c.Name, DateTime, r.Name, u.Username, t.count
FROM dbo.RPGCharacter c inner join
(
SELECT CharacterID,
MAX(DateTime) MaxDate
FROM RPGPost
GROUP BY CharacterID
) MaxDates ON c.RPGCharacterID = MaxDates.CharacterID
INNER JOIN RPGPost p ON MaxDates.CharacterID = p.CharacterID
AND MaxDates.MaxDate = p.DateTime
Inner join RPG r on c.RPGID = r.RPGID
Inner join [User] u on u.UserID = c.OwnerID
inner join (Select RPG.RPGID, Count(*) as Count from RPGPost
inner join RPGCharacter on RPGPost.CharacterID = RPGCharacter.RPGCharacterID
inner join RPG on RPG.RPGID = RPGCharacter.RPGID
where RPGPost.IsDeleted = 0
Group by RPG.RPGID) t on r.RPGID = t.RPGID
Order by DateTime desc
Result : http://abload.de/image.php?img=16iudw.jpg
This query gives me all I want but has an Errors:
1) It gives me the last post per Character, but I need the last Post per RPG

Does this help? This should give you the last post per CharacterID in the RPGPost table and include the total number of posts for that CharacterID.
WITH RankedPost AS (
SELECT
P.PostID,
P.CharacterID,
P.DateTime
RANK() OVER (
PARTITION BY CharacterID,
ORDER BY DateTime DESC) Rank,
RANK() OVER (
PARTITION BY CharacterID,
ORDER BY DateTime ASC) Count
FROM RPGPost P)
SELECT
P.DateTime
P.CharacterID,
P.Count
FROM RankedPost P
WHERE
RankedPost.Rank = 0;

Related

postgres join max date

I need to construct a join that will give me the most recent price for each product. I vastly simplified the table structures for the purpose of the example, and each table row counts will be in the millions. My previous stabs at this have not exactly been very effecient.
In PostgreSQL, you could try DISTINCT ON to only get the first row per product id in descending create_date order;
SELECT DISTINCT ON (products.id) products.*, prices.*
FROM products
JOIN prices
ON products.id = prices.product_id
ORDER BY products.id, create_date DESC
(of course, except for illustrative purposes, you should of course select the exact columns you need)
The simplest way to do it is using the row_number function.
SELECT
p.name,
t.amount AS latest_price
FROM (
SELECT
p.*,
row_number() OVER (PARTITION BY product_id ORDER BY create_date DESC) AS rn
FROM
prices p) t
JOIN products p ON p.id = t.product_id
WHERE
rn = 1
While the DISTINCT ON answer worked for my instance, I found there's a faster way for me to get what I need.
SELECT
DISTINCT ON(u.id) u.id,
(CAST(data AS JSON) ->> 'Finished') AS Finished,
ee.post_value
FROM
users_user u
JOIN events_event ee on u.id = ee.actor_id
WHERE
u.id > 20000
ORDER BY
u.id DESC,
ee.time DESC;
takes ~25s on my DB, while
SELECT
u.id,
(CAST(data AS JSON) ->> 'Finished') AS Finished,
e.post_value
FROM
users_user u
JOIN events_event e on u.id = e.actor_id
LEFT JOIN events_event ee on ee.actor_id = e.actor_id
AND ee.time > e.time
WHERE
u.id > 20000
AND ee.id IS NULL
ORDER BY
u.id DESC;
takes ~15s.

SQL query to find the top 3 in a category

Calling all sql enthusiasts!
Quick info: using PostgreSQL.
I have a query that return the maximum number of likes for a user per category. What I want now, is to show the top 3 users with the most likes per category.
A helpful resource was using this example to solve the problem:
select type, variety, price
from fruits
where (
select count(*) from fruits as f
where f.type = fruits.type and f.price <= fruits.price
) <= 2;
I understand this, but my query is using joins and I am also a beginner, so I was not able to use this information effectively.
Down to business, this is my query for returning the MAX likes for a user per category.
SELECT category, username, MAX(post_likes) FROM (
SELECT c.name category, u.username username, SUM(p.like_count) post_likes, COUNT(*) post_num
FROM categories c
JOIN topics t ON c.id = t.category_id
JOIN posts p ON t.id = p.topic_id
JOIN users u ON u.id = p.user_id
GROUP BY c.name, u.username) AS leaders
WHERE post_likes > 0
GROUP BY category, username
HAVING MAX(post_likes) >= (SELECT SUM(p.like_count)
FROM categories c
JOIN topics t ON c.id = t.category_id
JOIN posts p ON t.id = p.topic_id
JOIN users u ON u.id = p.user_id WHERE c.name = leaders.category
GROUP BY u.username order by sum desc limit 1)
ORDER BY MAX(post_likes) DESC;
Any and all help would be greatly appreciated. I am having a difficult time wrapping my head around this problem. Thank!
If you want the most likes per category, use window functions:
SELECT cu.*
FROM (SELECT c.name as category, u.username as username,
SUM(p.like_count) as post_likes, COUNT(*) as post_num,
ROW_NUMBER() OVER (PARTITION BY c.name ORDER BY COUNT(*) DESC) as seqnum
FROM categories c JOIN
topics t
ON c.id = t.category_id JOIN
posts p
ON t.id = p.topic_id JOIN
users u
ON u.id = p.user_id
GROUP BY c.name, u.username
) cu
WHERE seqnum <= 3;
This always returns three rows per category, even if there are ties. If you want to do something else, then consider DENSE_RANK() or RANK() instead of ROW_NUMBER().
Also, use as for column aliases in the FROM clause. Although optional, one day you will leave out a comma and be grateful that you are in the habit of using as.

Select all threads and order by the latest one

Now that I got the Select all forums and get latest post too.. how? question answered, I am trying to write a query to select all threads in one particular forum and order them by the date of the latest post (column "updated_at").
This is my structure again:
forums forum_threads forum_posts
---------- ------------- -----------
id id id
parent_forum (NULLABLE) forum_id content
name user_id thread_id
description title user_id
icon views updated_at
created_at created_at
updated_at
last_post_id (NULLABLE)
I tried writing this query, and it works.. but not as expected: It doesn't order the threads by their last post date:
SELECT DISTINCT ON(t.id) t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC;
How can I solve this one?
Assuming you want a single row per thread and not all rows for all posts.
DISTINCT ON is still the most convenient tool. But the leading ORDER BY items have to match the expressions of the DISTINCT ON clause. If you want to order the result some other way, you need to wrap it into a subquery and add another ORDER BY to the outer query:
SELECT *
FROM (
SELECT DISTINCT ON (t.id)
t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY t.id, p.updated_at DESC
) sub
ORDER BY updated_at DESC;
If you are looking for a query without subquery for some unknown reason, this should work, too:
SELECT DISTINCT
t.id
, first_value(u.username) OVER w AS username
, first_value(p.updated_at) OVER w AS updated_at
, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
WINDOW w AS (PARTITION BY t.id ORDER BY p.updated_at DESC)
ORDER BY updated_at DESC;
There is quite a bit going on here:
The tables are joined and rows are selected according to JOIN and WHERE clauses.
The two instances of the window function first_value() are run (on the same window definition) to retrieve username and updated_at from the latest post per thread. This results in as many identical rows as there are posts in the thread.
The DISTINCT step is executed after the window functions and reduces each set to a single instance.
ORDER BY is applied last and updated_at references the OUT column (SELECT list), not one of the two IN columns (FROM list) of the same name.
Yet another variant, a subquery with the window function row_number():
SELECT id, username, updated_at, title
FROM (
SELECT t.id
, u.username
, p.updated_at
, t.title
, row_number() OVER (PARTITION BY t.id
ORDER BY p.updated_at DESC) AS rn
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
) sub
WHERE rn = 1
ORDER BY updated_at DESC;
Similar case:
Return records distinct on one column but order by another column
You'll have to test which is faster. Depends on a couple of circumstances.
Forget the distinct on:
SELECT t.id, u.username, p.updated_at, t.title
FROM forum_threads t
LEFT JOIN forum_posts p ON p.thread_id = t.id
LEFT JOIN users u ON u.id = p.user_id
WHERE t.forum_id = 3
ORDER BY p.updated_at DESC;

Highest Count with a group

I'm having an absolute brain fade
SELECT p.ProductCategory, f.ProductSubCategory, COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
GROUP BY p.ProductCategory, f.ProductSubCategory
ORDER BY 1,3 DESC
This shows me the count for each ProductSubCategory, I would like to see only the highest ProductSubCategory per ProductCategory.
I wish to see (I don't care about the Count value)
There are a couple of different ways to do this. One involves joining the results back to themselves and using the max aggregate. But since you are using SQL Server, you can use ROW_NUMBER to achieve the same result:
with cte as (
select p.productcategory, p.ProductSubCategory, COUNT(*) cnt,
ROW_NUMBER() over (partition by p.productcategory order by count(*) desc) rn
from products p
join sales s on p.ProductSubCategory = s.ProductSubCategory
group by p.productcategory, p.ProductSubCategory
)
select *
from cte
where rn = 1
You already got the answer, Please see the following code to. It may help you.
SELECT p.ProductCategory,
f.ProductSubCategory,
COUNT(*) AS Cnt
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory
JOIN (
SELECT p.ProductCategory,
f.ProductSubCategory,
ROW_NUMBER() OVER ( PARTITION BY p.ProductCategory,
f.ProductSubCategory
ORDER BY COUNT(*) DESC) [Row]
FROM Sales f
JOIN Products p ON f.ProductSubCategory = p.ProductSubCategory) Lu
ON P.ProductCategory = Lu.ProductCategory
AND f.ProductSubCategory = Lu.ProductSubCategory
WHERE Lu.Row = 1
GROUP By p.ProductCategory,
f.ProductSubCategory

SELECT distinct gives wrong count

I've got a problem where my count(*) will return the number of rows before distinct rows are filtered.
This is a simplified version of my query. Note that I'll extract a lot of other data from the tables, so group by won't return the same result, as I'd have to group by maybe 10 columns. The way it works is that m is a map mapping q, c and kl, so there can be several references to q.id. I only want one.
SELECT distinct on (q.id) count(*) over() as full_count
from q, c, kl, m
where c.id = m.chapter_id
and q.id = m.question_id
and q.active = 1
and c.class_id = m.class_id
and kl.id = m.class_id
order by q.id asc
If I run this i get full_count = 11210 while it only returns 9137 rows. If I run it without the distinct on (q.id), distinct on (q.id) is indeed the number of rows.
So it seems that the count function doesn't have access to the filtered rows. How can I solve this? Do I need to rethink my approach?
I'm not entirely sure what exactly you are trying to count, but this might get you started:
select id,
full_count,
id_count
from (
SELECT q.id,
count(*) over() as full_count,
count(*) over (partition by q.id) as id_count,
row_number() over (partition by q.id order by q.id) as rn
from q
join m on q.id = m.question_id
join c on c.id = m.chapter_id and c.class_id = m.class_id
join kl on kl.id = m.class_id
where q.active = 1
) t
where rn = 1
order by q.id asc
If you need the count per id, then the column id_count would be what you need. If you need the overall count, but just on row per id, then the full_count is probably what you want.
(note that I re-wrote your implicit join syntax to use explicit JOINs)
Can you use a subquery:
select qid, count(*) over () as full_count
from (SELECT distinct q.id
from q, c, kl, m
where c.id = m.chapter_id
and q.id = m.question_id
and q.active = 1
and c.class_id = m.class_id
and kl.id = m.class_id
) t
order by q.id asc
But the group by is the right approach. The key word distinct in selectis really just syntactic sugar for doing a group by on all non-aggregate-functioned columns.