Retrieve the last 2 posts for each category - sql

Let's say I have 2 tables: blog_posts and categories. Each blog post belongs to only ONE category, so there is basically a foreign key between the 2 tables.
I would like to retrieve the 2 latest posts from each category. Is it possible to achieve this in a single query?
GROUP BY would group everything and leave me with only one row per category, but I want 2 of them.
It would be easy to perform 1 + N queries (N = number of categories): first retrieve the categories, then retrieve 2 posts from each category.
I believe it would also be quite easy to perform M queries (M = number of posts I want from each category): the first query selects the latest post for each category (with a GROUP BY), the second query retrieves the second-latest post for each category, and so on.
I'm just wondering if someone has a better solution for this. I don't really mind doing 1 + N queries, but out of curiosity and for general SQL knowledge, it would be appreciated!
Thanks in advance to whoever can help me with this.
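For reference, on databases that support window functions (MySQL 8.0+, PostgreSQL, SQL Server), a single query can do this directly. A minimal sketch, assuming blog_posts has category and date columns (the column names are assumptions, not from the original schema):

SELECT *
FROM (
    SELECT bp.*,
           ROW_NUMBER() OVER (PARTITION BY bp.category ORDER BY bp.date DESC) AS rn
    FROM blog_posts bp
) ranked          -- rn = 1 is the newest post in its category, rn = 2 the second newest
WHERE rn <= 2;

The techniques below emulate the same row-numbering idea on engines without window functions.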

Check out this MySQL article on how to work with the top N things in arbitrarily complex groupings; it's good stuff. You can try this:
SET @counter = 0;
SET @category = '';

SELECT *
FROM
(
    SELECT
        @counter := IF(posts.category = @category, @counter + 1, 0) AS counter,
        @category := posts.category,
        posts.*
    FROM
    (
        SELECT *
        FROM test
        ORDER BY category, date DESC
    ) posts
) posts
HAVING counter < 2
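For completeness, the snippet above assumes a table roughly like this (a sketch only; the query really only needs the category and date columns, the rest is hypothetical):

CREATE TABLE test (
    id       INT PRIMARY KEY AUTO_INCREMENT,  -- hypothetical surrogate key
    category INT NOT NULL,                    -- category the post belongs to
    date     DATETIME NOT NULL,               -- used to pick the 2 most recent rows
    title    VARCHAR(255)                     -- any other post columns
);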

SELECT  p.*
FROM    (
        SELECT  id,
                COALESCE(
                (
                SELECT  datetime
                FROM    posts pi
                WHERE   pi.category = c.id
                ORDER BY
                        pi.category DESC, pi.datetime DESC, pi.id DESC
                LIMIT 1, 1
                ), '1900-01-01') AS post_datetime,
                COALESCE(
                (
                SELECT  id
                FROM    posts pi
                WHERE   pi.category = c.id
                ORDER BY
                        pi.category DESC, pi.datetime DESC, pi.id DESC
                LIMIT 1, 1
                ), 0) AS post_id
        FROM    category c
        ) q
JOIN    posts p
ON      p.category <= q.id
        AND p.category >= q.id
        AND p.datetime >= q.post_datetime
        AND (p.datetime, p.id) >= (q.post_datetime, q.post_id)
Create an index on posts (category, datetime, id) for this to be fast.
Note the p.category <= q.id AND p.category >= q.id hack: it makes MySQL use "Range checked for each record", which is more index-efficient.
See this article in my blog for a similar problem:
MySQL: emulating ROW_NUMBER with multiple ORDER BY conditions
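For reference, the covering index mentioned above could be created like this (the index name is arbitrary):

CREATE INDEX ix_posts_category_datetime_id ON posts (category, datetime, id);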

Related

Optimization of multiple aggregate sorting in SQL

I have a Postgres query written for a Spree Commerce store that sorts all of its products in the following order: in stock (then by first available), backorderable (then by first available), sold out (then by first available).
In order to chain it with Rails scopes I had to put it in the ORDER BY clause as opposed to anywhere else. The query itself works, and is fairly performant, but it is complex. I was curious if anyone with a bit more knowledge could suggest a better way to do it? I'm interested in performance, but also in different ways to approach the problem.
ORDER BY (
    SELECT
        CASE
            WHEN tt.count_on_hand > 0 THEN 2
            WHEN zz.backorderable = true THEN 1
            ELSE 0
        END
    FROM (
        SELECT
            row_number() OVER (dpartition),
            z.id,
            bool_or(backorderable) OVER (dpartition) AS backorderable
        FROM (
            SELECT DISTINCT ON (spree_variants.id)
                spree_products.id,
                spree_stock_items.backorderable AS backorderable
            FROM spree_products
            JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id"
                                 AND "spree_variants"."deleted_at" IS NULL
            JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id"
                                    AND "spree_stock_items"."deleted_at" IS NULL
            JOIN "spree_stock_locations" ON spree_stock_locations.id = spree_stock_items.stock_location_id
            WHERE spree_stock_locations.active = true
        ) z WINDOW dpartition AS (PARTITION BY id)
    ) zz
    JOIN (
        SELECT
            row_number() OVER (dpartition),
            t.id,
            sum(count_on_hand) OVER (dpartition) AS count_on_hand
        FROM (
            SELECT DISTINCT ON (spree_variants.id)
                spree_products.id,
                spree_stock_items.count_on_hand AS count_on_hand
            FROM spree_products
            JOIN "spree_variants" ON "spree_variants"."product_id" = "spree_products"."id"
                                 AND "spree_variants"."deleted_at" IS NULL
            JOIN "spree_stock_items" ON "spree_stock_items"."variant_id" = "spree_variants"."id"
                                    AND "spree_stock_items"."deleted_at" IS NULL
        ) t WINDOW dpartition AS (PARTITION BY id)
    ) tt ON tt.row_number = 1 AND tt.id = spree_products.id
    WHERE zz.row_number = 1 AND zz.id = spree_products.id
) DESC, available_on DESC
The FROM shown above determines whether or not a product is backorderable, and the JOIN shown above determines the stock in inventory. Note that these are very similar queries, except that backorderability has to be determined by a location's ability to support backorders and its state (WHERE spree_stock_locations.active = true).
Thanks for any advice!
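One possible direction, as a rough untested sketch rather than a drop-in replacement: since the two derived tables scan nearly the same joins, the availability rank could be computed in a single grouped pass per product, folding the active-location condition into the backorderable aggregate. The semantics differ slightly from the query above (it sums count_on_hand across all stock locations instead of taking one DISTINCT ON row per variant):

SELECT spree_products.id,
       CASE
         WHEN sum(spree_stock_items.count_on_hand) > 0 THEN 2
         WHEN bool_or(spree_stock_items.backorderable AND spree_stock_locations.active) THEN 1
         ELSE 0
       END AS availability_rank   -- untested sketch; aggregates over all locations, not one row per variant
FROM spree_products
JOIN spree_variants
  ON spree_variants.product_id = spree_products.id
 AND spree_variants.deleted_at IS NULL
JOIN spree_stock_items
  ON spree_stock_items.variant_id = spree_variants.id
 AND spree_stock_items.deleted_at IS NULL
JOIN spree_stock_locations
  ON spree_stock_locations.id = spree_stock_items.stock_location_id
GROUP BY spree_products.id

Such a subquery could be joined to spree_products once, and its availability_rank used directly in the ORDER BY.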

Complex SQL converted to active record

I have been puzzling over the following query, and how it could be done using active record.
select * from links where id in
(select id from
(select votable_id as id, count(votable_id) as count from votes where vote_scope = 'flag' and votable_type = 'Link' group by votable_id )
as x where count < 3);
I am currently just using the raw SQL query in my finder (find_by_sql), but was wondering if there is a more 'railsy' way of doing this?
EDIT
Thanks to Joachim Isaksson, the query can be squashed to:
select * from links where id in
(select votable_id from votes
where vote_scope = 'flag'
and votable_type = 'Link'
group by votable_id
HAVING COUNT(*) < 3) ;
Let's start with me not being a Rails guru in any capacity, and I can't test-run this, so I'm a bit "running blind" here. In other words, take this with a grain of salt :)
Rewriting your query as a join (assuming here your links table has id and one more field called link, for the sake of the GROUP BY):
SELECT links.*
FROM links
JOIN votes
ON links.id = votes.votable_id
AND votes.vote_scope = 'flag'
AND votes.votable_type = 'Link'
GROUP BY votes.votable_id, links.id, links.link
HAVING COUNT(*) < 3;
(SQLfiddle to test with)
...should make something like this work (line split for readability)
Link.joins("JOIN votes ON links.id = votes.votable_id
AND votes.vote_scope = 'flag' AND votes.votable_type = 'Link'")
.group("votes.votable_id, links.id, links.link")
.having("COUNT(*) < 3")
Having your associations set up correctly may allow you to do the join in a more "railsy" way too.
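One subtle point, sketched here as untested SQL: both the IN and the JOIN forms only return links that have at least one 'flag' vote. If links with no flag votes at all should also count as having fewer than 3, a LEFT JOIN variant would be needed, for example:

SELECT links.*
FROM links
LEFT JOIN votes
       ON links.id = votes.votable_id
      AND votes.vote_scope = 'flag'
      AND votes.votable_type = 'Link'
GROUP BY links.id               -- assumes links.id is the primary key; strict GROUP BY modes may want all selected columns listed
HAVING COUNT(votes.votable_id) < 3;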

Fetch Certain Number of random Rows Belonging to Different Tables into One Result Set?

See the tables: http://i.stack.imgur.com/4FitK.png
Can anyone help me construct a query that will fetch random questions based on the category, the difficulty level, and the total number of questions set in the Question Set Config table?
I have constructed one:
SELECT c.question_id
, s.question_set_id
FROM qm_question_category c
, qm_question_set_cfg s
WHERE ( c.category_id = s.category_id
AND c.difficulty_level = s.difficulty_level
)
AND ROWNUM <= (SELECT SUM(total_questions)
FROM qm_question_set_cfg
WHERE question_set_id = 101138) /* Set ID */
ORDER BY dbms_random.value
This fetches the total number of questions randomly from the existing questions based on the categories and difficulty level.
But what I want is to first fetch the questions in each category + difficulty randomly, and then merge those rows into one single result set (e.g. 10 random questions from Category1, 10 from Category2, and so on up to 10 from CategoryN, as set in the Question Set Config).
Try to use the order by like this:
order by rank() over (partition by c.category_id order by dbms_random.value)
What I'm trying to do is randomize the order of the records for every category on its own and then order them so the result will be:
category a
category b
category c
category a
category b
category c
...
So now you can fetch 30 records and be sure that you have 10 from category a, 10 from category b and 10 from category c.
UPDATE: after reading your comment I realized you may be looking for something like this:
select *
from (select c.question_id,
c.category_id,
s.set_id,
s.no_of_questions,
rank() over(partition by c.category_id order by dbms_random.value) rank
from question_category c, question_set_config s
where c.category_id = s.category_id
and c.difficulty_level = s.difficulty_level
and s.set_id = 101138) t
where rank <= t.no_of_questions
I used the rank() analytic function to give a "line number" to each random question within its category. Then I could request only those whose rank is smaller than or equal to no_of_questions.

Getting a MySQL group by query to display the row in that group with the highest value

I'm trying to figure out how to query my database so that it will essentially first ORDER my results and then GROUP them. This question seems to be fairly common and I have found examples, but I still don't quite grasp how to do it and how to apply the examples to my own situation, so all help is definitely appreciated.
Here are my MySQL tables:
books
  book_id
  book_title

users
  user_id
  user_name

book_reviews
  review_id
  book_id
  user_id
  review_date (unix timestamp date)
I would like to query 30 of the latest book reviews. They will simply display as:
Book Name
Username of Reviewer
However I would like to display each book no more than one time, so the review shown in the list should be the most recently added review for that book. To do this I have simply been grouping by book_title and ordering by review_date DESC, but querying this way doesn't return the row with the most recent review_date as the grouped row, so my data is incorrect.
Here is my current query:
SELECT books.book_title, users.user_name, book_reviews.review_id
FROM books, users, book_reviews
WHERE book_reviews.book_id = books.book_id
  AND book_reviews.user_id = users.user_id
GROUP BY book_title
ORDER BY review_date DESC
LIMIT 30
From what I've read it seems like I have to have a subquery where I get the MAX(review_date) value but I still don't understand how to link it all up.
Thanks a ton.
Use:
SELECT x.book_title,
       x.user_name
  FROM (SELECT b.book_title,
               u.user_name,
               br.review_date,
               CASE
                 WHEN @book = b.book_title THEN @rownum := @rownum + 1
                 ELSE @rownum := 1
               END AS rank,
               @book := b.book_title
          FROM BOOKS b
          JOIN BOOK_REVIEWS br ON br.book_id = b.book_id
          JOIN USERS u ON u.user_id = br.user_id
          JOIN (SELECT @rownum := 0, @book := '') r
         ORDER BY b.book_title, br.review_date DESC) x
 WHERE x.rank = 1
 ORDER BY x.review_date DESC
 LIMIT 30
MySQL (before 8.0) doesn't have analytic/ranking/window functions, but this ranks the reviews on a per-book basis, so that the latest review for each book is marked as 1.
I exposed the review date so the outer query can order by the latest of those per-book latest reviews.
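For reference, the MAX(review_date) subquery idea mentioned in the question would look roughly like this (an untested sketch; if two reviews of the same book share the exact same review_date, both rows come back):

SELECT b.book_title,
       u.user_name
FROM book_reviews br
JOIN (SELECT book_id, MAX(review_date) AS max_review_date
      FROM book_reviews
      GROUP BY book_id) latest
  ON latest.book_id = br.book_id
 AND latest.max_review_date = br.review_date
JOIN books b ON b.book_id = br.book_id
JOIN users u ON u.user_id = br.user_id
ORDER BY br.review_date DESC
LIMIT 30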

SQL latest/top items in category

What is a scalable way to select the latest 10 items from each category?
I have a schema like this:
item, category, updated
So I want to select the 10 most recently updated items from each category. The current solution I can come up with is to query for the categories first and then issue some sort of union query:
categories = sql.execute("select category from categories_table")
query = ""
for cat in categories:
    query += " union select top 10 * from table where category = '%s' order by updated desc" % cat
result = sql.execute(query)
I am not sure how efficient this will be for bigger databases (1 million rows).
If there is a way to do this in one go - that would be nice.
Any help appreciated.
This will not compile, but you'll get the general idea:
from i in table
group i by i.category into g
select new { cat = g.Key, LastTens = g.OrderByDescending(o => o.Updated).Take(10).Select(...) }
EDIT: the question asked for SQL:
SELECT * FROM
(
    SELECT
        ROW_NUMBER() OVER (PARTITION BY category ORDER BY updated DESC) AS PartNum,
        category,
        [...]
    FROM
        items   -- the table holding (item, category, updated) from the question
) AS Partitioned
WHERE PartNum <= 10
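Since the question mentions databases of around a million rows, an index matching the partitioning and ordering columns helps the query above; a hedged sketch, using the same hypothetical items table name:

CREATE INDEX ix_items_category_updated ON items (category, updated);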