Trying to remove an inner SQL select statement - sql

I am making a music player where we have stations. I have a table called histories. It has data on the songs a user likes, dislikes or skipped. We store all the times that a person has liked a song or disliked it. We want to get a current snapshot of all the songs the user has either liked (event_type=1) or disliked (event_type=2) in a given station.
The table has the following columns:
id (PK int autoincrement)
station_id (FK int)
song_id (FK int)
event_type (int, either 1, 2, or 3)
Here is my query:
SELECT song_id, event_type, id
FROM histories
WHERE id IN (SELECT MAX(id) AS id
FROM histories
WHERE station_id = 187
AND (event_type=1 OR event_type=2)
GROUP BY station_id, song_id)
ORDER BY id;
Is there a way to make this query run without the inner select? I am pretty sure it would run a lot faster without it.

You can use JOIN instead. Something like this:
SELECT h1.song_id, h1.event_type, h1.id
FROM histories AS h1
INNER JOIN
(
SELECT station_id, song_id, MAX(id) AS MaxId
FROM histories
WHERE station_id = 187
AND event_type IN (1, 2)
GROUP BY station_id, song_id
) AS h2 ON h1.station_id = h2.station_id
AND h1.song_id = h2.song_id
AND h1.id = h2.maxid
ORDER BY h1.id;

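@Mahmoud Gamal's answer is correct, but you can probably drop some conditions that are not needed. Since id is the primary key, joining on it alone identifies the row, and since station_id is already fixed by the WHERE clause, grouping by song_id is enough.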
SELECT h1.song_id, h1.event_type, h1.id
FROM histories AS h1
INNER JOIN
(
SELECT MAX(id) AS MaxId
FROM histories
WHERE station_id = 187
AND event_type IN (1, 2)
GROUP BY song_id
) AS h2 ON h1.id = h2.maxid
ORDER BY h1.id;
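To sanity-check the shortened join, it can be run against a small in-memory table. The following is only a sketch using Python's sqlite3 module as a stand-in database; the sample rows are invented for illustration, and the alias max_id is just a naming choice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE histories ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " station_id INTEGER, song_id INTEGER, event_type INTEGER)"
)
# Invented sample events: (station_id, song_id, event_type)
conn.executemany(
    "INSERT INTO histories (station_id, song_id, event_type) VALUES (?, ?, ?)",
    [
        (187, 10, 1),  # id 1: song 10 liked
        (187, 10, 2),  # id 2: song 10 later disliked
        (187, 11, 1),  # id 3: song 11 liked
        (187, 11, 3),  # id 4: song 11 skipped, excluded by the filter
        (200, 10, 1),  # id 5: a different station
    ],
)

# Latest like/dislike per song in station 187, joined back on the primary key.
latest = conn.execute(
    """
    SELECT h1.song_id, h1.event_type, h1.id
    FROM histories AS h1
    INNER JOIN (
        SELECT MAX(id) AS max_id
        FROM histories
        WHERE station_id = 187
          AND event_type IN (1, 2)
        GROUP BY song_id
    ) AS h2 ON h1.id = h2.max_id
    ORDER BY h1.id
    """
).fetchall()
print(latest)
```

With this data, song 10 should show up as disliked (its most recent like/dislike event) and song 11 as liked, because the later skip is filtered out before MAX(id) is taken.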

Based on your description, this is the answer:
SELECT DISTINCT song_id, event_type, id
FROM histories
WHERE station_id = 187
AND (event_type=1 OR event_type=2)
ORDER BY id
But you must be doing the MAX for some reason - why?

Related

Select column in latest record in joined table while grouping on original table

Let us say we have three Postgres tables:
book_details that correspond to a given observation of a given book. book_details are never updated, only new observations are added. But only the most recent for a given book is relevant.
rental_events indicate that a reader borrowed books in a given period
book_rentals indicate which books were borrowed in that rental. They are unique on (rental_id, book_id).
Or using simplified table definitions:
CREATE TABLE IF NOT EXISTS book_details(
book_id bigint NOT NULL,
title VARCHAR,
category VARCHAR,
author_id bigint NOT NULL,
updated_at timestamp without time zone NOT NULL
);
CREATE TABLE IF NOT EXISTS book_rentals(
rental_id bigint NOT NULL,
book_id bigint NOT NULL,
PRIMARY KEY (rental_id, book_id)
);
CREATE TABLE IF NOT EXISTS rental_events(
rental_id bigint NOT NULL,
reader_id bigint NOT NULL,
started_at timestamp without time zone NOT NULL,
ended_at timestamp without time zone NOT NULL
);
Now let us say we would like to get the 5 most rented books and their latest title (the title in the latest matching book_details entry). What would be an efficient way to do that? (Completing the pseudo query below.)
SELECT COUNT(DISTINCT book_rentals.rental_id) AS rental_count,
[[latest(book_details).title]]
FROM book_rentals
INNER JOIN book_details
ON book_rentals.book_id = book_details.book_id
GROUP BY book_rentals.book_id
ORDER BY rental_count DESC
LIMIT 5;
And finally the same question, but considering only books that are currently considered to be in a given category, that is only books for which latest(book_details).category = 'Sci-Fi'.
Use a CTE that returns the latest observation of each book and join to book_rentals and aggregate:
WITH books AS (
SELECT b.book_id, b.title, b.category
FROM (
SELECT *, ROW_NUMBER() OVER (PARTITION BY book_id ORDER BY updated_at DESC) rn
FROM book_details
) b
WHERE b.rn = 1
)
SELECT b.title, COUNT(DISTINCT r.rental_id) AS rental_count
FROM books b INNER JOIN book_rentals r
ON r.book_id = b.book_id
WHERE b.category = 'Sci-Fi'
GROUP BY b.book_id, b.title
ORDER BY rental_count DESC
LIMIT 5;
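DISTINCT in COUNT(DISTINCT r.rental_id) is probably unnecessary: (rental_id, book_id) is the primary key of book_rentals and the CTE returns one row per book, so COUNT(*) should give the same result.
Remove the WHERE clause to consider all books instead of only those in the 'Sci-Fi' category.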
I recommend phrasing the logic like this:
SELECT t.title, t.rental_count
FROM (SELECT DISTINCT ON (br5.book_id) bd.title, br5.rental_count
FROM (SELECT br.book_id, COUNT(DISTINCT br.rental_id) AS rental_count
FROM book_rentals br
GROUP BY br.book_id
ORDER BY rental_count DESC
LIMIT 5
) br5 JOIN
book_details bd
ON br5.book_id = bd.book_id
-- DISTINCT ON requires its expressions to lead the ORDER BY
ORDER BY br5.book_id, bd.updated_at DESC
) t
ORDER BY t.rental_count DESC;
The subquery reduces the number of books to 5. Then it looks for the most recent title in book_details.
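The "aggregate first, then look up the latest title" idea can be tried on a toy dataset. This sketch runs on SQLite through Python's sqlite3 module, so it swaps the PostgreSQL-specific DISTINCT ON for a correlated subquery with ORDER BY ... LIMIT 1. The schema is trimmed (author_id is left out) and all rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE book_details (
        book_id INTEGER NOT NULL,
        title TEXT,
        category TEXT,
        updated_at TEXT NOT NULL
    );
    CREATE TABLE book_rentals (
        rental_id INTEGER NOT NULL,
        book_id INTEGER NOT NULL,
        PRIMARY KEY (rental_id, book_id)
    );
    -- Invented observations: book 1 was renamed in a later observation.
    INSERT INTO book_details VALUES
        (1, 'Dune (draft)', 'Sci-Fi',  '2020-01-01'),
        (1, 'Dune',         'Sci-Fi',  '2021-01-01'),
        (2, 'Emma',         'Classic', '2020-06-01');
    INSERT INTO book_rentals VALUES (100, 1), (101, 1), (101, 2);
    """
)

# Step 1 (inner query): count rentals per book and keep the top 5.
# Step 2 (scalar subquery): fetch the most recent title for each of those books.
top_books = conn.execute(
    """
    SELECT (SELECT bd.title
            FROM book_details bd
            WHERE bd.book_id = br5.book_id
            ORDER BY bd.updated_at DESC
            LIMIT 1) AS latest_title,
           br5.rental_count
    FROM (SELECT book_id, COUNT(*) AS rental_count
          FROM book_rentals
          GROUP BY book_id
          ORDER BY rental_count DESC
          LIMIT 5) AS br5
    ORDER BY br5.rental_count DESC
    """
).fetchall()
print(top_books)
```

Because the rental counts are computed before book_details is touched, the multiple observations per book cannot inflate the counts.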

Order by subquery column value

I want to order an SQL query according to a subquery table.
TABLE books (id, name, author, date)
TABLE user (id, name, created)
TABLE books_likes (user_id, book_id, date)
My query selects each book a user has liked. I want the result ordered by books_likes.date, using not a join but a subquery, which I expect to be easier to index.
First attempt:
SELECT id
FROM books
WHERE id IN (SELECT book_id
FROM books_likes
WHERE user_id = 1)
ORDER BY books_likes.date
Second attempt:
SELECT id
FROM books
WHERE id IN (SELECT book_id
FROM books_likes
WHERE user_id = 1
ORDER BY date)
Neither of these works. The column isn't found (first attempt) or I get a syntax error (second attempt).
The problem with your WHERE IN + subquery approach is that it can only limit the records in the books table, but it can't make use of the columns inside the books_likes table.
I would phrase your query using a join:
SELECT
b.id
FROM books b
INNER JOIN books_likes bl
ON b.id = bl.book_id
WHERE
bl.user_id = 1
ORDER BY
bl.date;
The following index might help the above query, if MySQL chooses to use it:
CREATE INDEX idx ON books_likes (user_id, book_id, date);
I don't think you need a join at all. You should try:
SELECT bl.book_id
FROM books_likes bl
WHERE bl.user_id = 1
ORDER BY bl.date;
This query should use an index on books_likes(user_id, date, book_id).
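As a rough check that the join-free query returns the liked books in like-date order, here is a sketch that uses SQLite (via Python's sqlite3) as a stand-in for MySQL. The index name idx_likes_user_date and the sample rows are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE books_likes (user_id INTEGER, book_id INTEGER, date TEXT);
    -- Index shaped like the one suggested above: filter column first, then sort column.
    CREATE INDEX idx_likes_user_date ON books_likes (user_id, date, book_id);
    -- Invented likes, deliberately inserted out of date order.
    INSERT INTO books_likes VALUES
        (1, 30, '2023-03-01'),
        (1, 10, '2023-01-01'),
        (2, 20, '2023-02-01'),
        (1, 20, '2023-02-15');
    """
)

# Only books_likes is needed: it already holds both the book id and the like date.
liked = [
    row[0]
    for row in conn.execute(
        "SELECT bl.book_id FROM books_likes bl "
        "WHERE bl.user_id = 1 ORDER BY bl.date"
    )
]
print(liked)
```

If other columns from books are needed (name, author), the join from the first answer is still required; this form only works when the book id alone is enough.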

How to get last edited post of every user in PostgreSQL?

I have user data in two tables like
1. USERID | USERPOSTID
2. USERPOSTID | USERPOST | LAST_EDIT_TIME
How do I get the last edited post and its time for every user? Assume that every user has 5 posts, and each one is edited at least once.
Will I have to write a loop iterating over every user, find the USERPOST with MAX(LAST_EDIT_TIME) and then collect the values? I tried GROUP BY, but I can't put USERPOSTID or USERPOST in an aggregate function. TIA.
Seems like something like this should work:
create table users(
id serial primary key,
username varchar(50)
);
create table posts(
id serial primary key,
userid integer references users(id),
post_text text,
update_date timestamp default current_timestamp
);
insert into users(username)values('Kalpit');
insert into posts(userid,post_text)values(1,'first test');
insert into posts(userid,post_text)values(1,'second test');
select *
from users u
join posts p on p.userid = u.id
where p.update_date =
( select max( update_date )
from posts
where userid = u.id )
fiddle: http://sqlfiddle.com/#!15/4b240/4/0
You can use a windowing function here (assuming the two tables are named USERS and USERPOST):
select
USERID
, USERPOSTID
, USERPOST
, LAST_EDIT_TIME
from (
select
u.USERID
, p.USERPOSTID
, p.USERPOST
, p.LAST_EDIT_TIME
, row_number() over (
partition by u.USERID
order by p.LAST_EDIT_TIME desc) row_num
from
USERS u
join USERPOST p
on p.USERPOSTID = u.USERPOSTID
) most_recent
where row_num = 1
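The window-function approach can be tried end to end on a small dataset. This is a sketch on SQLite through Python's sqlite3 (window functions need SQLite 3.25 or newer) rather than PostgreSQL. The table names USERS and USERPOST follow the answer above, and the rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE USERS (USERID INTEGER, USERPOSTID INTEGER);
    CREATE TABLE USERPOST (
        USERPOSTID INTEGER PRIMARY KEY,
        USERPOST TEXT,
        LAST_EDIT_TIME TEXT
    );
    -- Invented data: user 1 owns posts 1 and 2, user 2 owns post 3.
    INSERT INTO USERS VALUES (1, 1), (1, 2), (2, 3);
    INSERT INTO USERPOST VALUES
        (1, 'first',  '2024-01-01 10:00:00'),
        (2, 'second', '2024-01-02 10:00:00'),
        (3, 'third',  '2024-01-01 12:00:00');
    """
)

# Number each user's posts from most to least recently edited, keep number 1.
last_edits = conn.execute(
    """
    SELECT USERID, USERPOST, LAST_EDIT_TIME
    FROM (
        SELECT u.USERID, p.USERPOST, p.LAST_EDIT_TIME,
               ROW_NUMBER() OVER (
                   PARTITION BY u.USERID
                   ORDER BY p.LAST_EDIT_TIME DESC) AS row_num
        FROM USERS u
        JOIN USERPOST p ON p.USERPOSTID = u.USERPOSTID
    ) AS most_recent
    WHERE row_num = 1
    ORDER BY USERID
    """
).fetchall()
print(last_edits)
```

No loop over users is needed: the PARTITION BY clause restarts the numbering for each user inside a single query.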

SQLite3. Select only highest revision using FKs, MAX() and GROUP BY

What you need to know about schema and data:
SELECT * FROM 'income'; -- Returns all 309 rows.
SELECT * FROM 'income' WHERE businessday_revision = 0; -- 308 rows
SELECT * FROM 'income' WHERE businessday_revision = 1; -- 1 row
The businessday table has:
id INTEGER,
revision INTEGER,
....
PRIMARY KEY(id, revision)
The income table has:
id -- integer primary key, quite unimportant I think
businessday_id -- FK
businessday_revision -- FK, when a day is edited, a new revision is created
The foreign key looks like this:
FOREIGN KEY(businessday_id, businessday_revision) REFERENCES businessday(id, revision) ON DELETE CASCADE,
The problem
I want to select incomes only from the latest revision on each day. Which should be 308 rows.
But sadly I'm too dense to figure it out. I've found that I can get all the latest businessday revisions using this:
SELECT id, MAX(revision)
FROM businessday
GROUP BY id;
Is there some way I can use this data to select my incomes? Something along the lines of:
-- Pseudo-code:
SELECT *
FROM income i
WHERE i.businessday_id = businessday.id THAT EXISTS IN
(SELECT id, MAX(revision)
FROM businessday
GROUP BY id);
I obviously have no clue here, please point me in the right direction!
This should work:
SELECT i.*
FROM Income i
INNER JOIN (
SELECT id, MAX(revision) maxrevision
FROM businessDay
GROUP BY id
) t ON i.businessday_id = t.id AND i.businessday_revision = t.maxrevision
How about using join?
SELECT i.*
FROM income i
INNER JOIN
(
SELECT id, MAX(revision) revision
FROM businessday
GROUP BY id
) s ON i.businessday_id = s.id AND
i.businessday_revision = s.revision
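A small reproduction of the join against an in-memory SQLite database, driven from Python's sqlite3 module. It is only a sketch: the amount column and all rows are hypothetical, added so there is something to select.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE businessday (
        id INTEGER,
        revision INTEGER,
        PRIMARY KEY (id, revision)
    );
    CREATE TABLE income (
        id INTEGER PRIMARY KEY,
        businessday_id INTEGER,
        businessday_revision INTEGER,
        amount INTEGER,  -- hypothetical column, not in the original schema
        FOREIGN KEY (businessday_id, businessday_revision)
            REFERENCES businessday (id, revision) ON DELETE CASCADE
    );
    -- Invented data: day 1 was edited once (revisions 0 and 1), day 2 never.
    INSERT INTO businessday VALUES (1, 0), (1, 1), (2, 0);
    INSERT INTO income VALUES (1, 1, 0, 50), (2, 1, 1, 75), (3, 2, 0, 20);
    """
)

# Join each income row to the highest revision of its business day.
rows = conn.execute(
    """
    SELECT i.id, i.businessday_id, i.businessday_revision, i.amount
    FROM income i
    INNER JOIN (
        SELECT id, MAX(revision) AS maxrevision
        FROM businessday
        GROUP BY id
    ) t ON i.businessday_id = t.id
       AND i.businessday_revision = t.maxrevision
    ORDER BY i.id
    """
).fetchall()
print(rows)
```

The income row attached to revision 0 of day 1 is dropped, while the rows for the latest revision of each day remain.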

Query optimization in Oracle SQL

Let's say I have an oracle database schema like so:
tournaments( id, name )
players( id, name )
gameinfo( id, pid (references players.id), tid (references tournaments.id), date)
So a row in the gameinfo table means that a certain player played a certain game in a tournament on a given date. Tournaments has about 20 records, players about 160 000 and game info about 2 million. I have to write a query which lists tournaments (with tid in the range of 1-4) and the number of players that played their first game ever in that tournament.
I came up with the following query:
select tid, count(pid)
from gameinfo g
where g.date = (select min(date) from gameinfo g1 where g1.pid = g.pid)
and g.tid in (1,2,3,4)
group by tid;
This is clearly suboptimal (it ran for about 58 minutes).
I had another idea, that I could make a view of:
select pid, tid, min(date)
from gameinfo
where tid in(1,2,3,4)
group by pid, tid;
And run my queries on this view, as it only had about 600 000 records, but this still seems less than optimal.
Can you give any advice on how this could be optimized ?
My first recommendation is to try analytic functions. The row_number() function enumerates each player's games in date order, so a player's first game ever has a seqnum of 1:
select gi.*
from (select gi.*,
row_number() over (partition by gi.pid order by date) as seqnum
from gameinfo gi
) gi
where tid in(1,2,3,4) and seqnum = 1
My second suggestion is to put the date of the first tournament into the players table, since it seems like important information for using the database.
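To see the row_number() approach produce the per-tournament counts the question asks for, here is a sketch on SQLite via Python's sqlite3 (SQLite 3.25 or newer for window functions) instead of Oracle. The rows are invented, and the outer GROUP BY is an addition on top of the answer's query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE gameinfo (
        id INTEGER PRIMARY KEY,
        pid INTEGER,
        tid INTEGER,
        date TEXT
    );
    -- Invented games: player 3's first game is in tournament 5 (outside 1-4).
    INSERT INTO gameinfo (pid, tid, date) VALUES
        (1, 1, '2020-01-01'),
        (1, 2, '2020-02-01'),
        (2, 2, '2020-01-15'),
        (3, 5, '2019-12-01'),
        (3, 1, '2020-03-01'),
        (4, 2, '2020-04-01');
    """
)

# Number every player's games over ALL tournaments first, then filter on tid.
# Filtering on tid inside the subquery would wrongly count player 3 for tournament 1.
counts = conn.execute(
    """
    SELECT tid, COUNT(*) AS first_timers
    FROM (
        SELECT pid, tid,
               ROW_NUMBER() OVER (PARTITION BY pid ORDER BY date) AS seqnum
        FROM gameinfo
    ) gi
    WHERE seqnum = 1
      AND tid IN (1, 2, 3, 4)
    GROUP BY tid
    ORDER BY tid
    """
).fetchall()
print(counts)
```

This reads gameinfo once instead of running a correlated MIN(date) lookup per row, which is the main reason the analytic form is usually worth trying first.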