How to get last edited post of every user in PostgreSQL? - sql

I have user data in two tables like
1. USERID | USERPOSTID
2. USERPOSTID | USERPOST | LAST_EDIT_TIME
How do I get the last edited post and its time for every user? Assume that every user has 5 posts, and each one is edited at least once.
Will I have to write a loop iterating over every user, find the USERPOST with MAX(LAST_EDIT_TIME) and then collect the values? I tried GROUP BY, but I can't put USERPOSTID or USERPOST in an aggregate function. TIA.

Seems like something like this should work:
create table users(
id serial primary key,
username varchar(50)
);
create table posts(
id serial primary key,
userid integer references users(id),
post_text text,
update_date timestamp default current_timestamp
);
insert into users(username)values('Kalpit');
insert into posts(userid,post_text)values(1,'first test');
insert into posts(userid,post_text)values(1,'second test');
select *
from users u
join posts p on p.userid = u.id
where p.update_date =
( select max( update_date )
from posts
where userid = u.id )
fiddle: http://sqlfiddle.com/#!15/4b240/4/0

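You can use a window function here (this assumes a USERS table and a USERPOST table that carries USERID):
select
USERS.USERID
, most_recent.USERPOSTID
, most_recent.LAST_EDIT_TIME
from
USERS
left join (
select
USERID
, USERPOSTID
, LAST_EDIT_TIME
, row_number() over (
partition by USERID
order by LAST_EDIT_TIME desc) row_num
from
USERPOST
) most_recent
on most_recent.USERID = USERS.USERID
and most_recent.row_num = 1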
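
PostgreSQL's DISTINCT ON is another way to express "latest row per user". The sketch below reuses the users/posts schema from the first answer; the second user and the explicit timestamps are made-up sample data, and the tie-break on p.id is only an assumption about which post should win when two edits share a timestamp.

```sql
-- Schema borrowed from the first answer.
create table users (
    id       serial primary key,
    username varchar(50)
);
create table posts (
    id          serial primary key,
    userid      integer references users(id),
    post_text   text,
    update_date timestamp default current_timestamp
);

-- Made-up sample data with explicit timestamps so the result is deterministic.
insert into users (username) values ('Kalpit'), ('Asha');
insert into posts (userid, post_text, update_date) values
    (1, 'first test',  '2020-01-01 10:00'),
    (1, 'second test', '2020-01-02 10:00'),
    (2, 'hello',       '2020-01-03 10:00');

-- One row per user: the post with the greatest update_date.
-- p.id desc breaks ties when two posts share the same timestamp.
select distinct on (u.id)
       u.id as userid, u.username, p.id as postid, p.post_text, p.update_date
from users u
join posts p on p.userid = u.id
order by u.id, p.update_date desc, p.id desc;
-- user 1 -> 'second test', user 2 -> 'hello'
```

Unlike the correlated MAX() subquery, this returns exactly one row per user even if two posts have the same update_date. Users without any posts are dropped by the inner join; a left join would keep them.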

Related

SQL SELECT statement with a many-to-many extra table

I have a table that links users. Consider the following:
**Table contracts:**
contract_id int,
contract_number varchar,
user_id int
**Table users:**
user_id int
**Table user_links**
user_id int,
linked_user_id int
The user_links table can have 0 rows for a particular user_id, given the user doesn't have linked users, so a select statement can return either a row or NULL.
The approach with
left join user_links ul on ul.user_id = contracts.user_id OR ul.linked_user_id = contracts.user_id doesn't seem to work if there is no row in the user_links table.
Given only an int user_id, how can I get rows from the contracts table for both user_id AND linked_user_id?
For example, if the user_id 1 has a linked_user_id 2, I need the rows from contracts for both users; however, if the user doesn't have a row in user_links table, I still need to get their contracts.
Assuming your input user_id is the variable @user_id, the query below returns all contracts of that user and, if there are any, of the linked users.
SELECT * from contracts c
where c.user_id = @user_id
OR c.user_id IN ( SELECT linked_user_id from user_links ul
WHERE ul.user_id = @user_id)
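
To check the "no row in user_links" case, here is a small sketch with made-up sample data. It uses a PostgreSQL prepared statement (the name contracts_for is hypothetical) in place of a variable, which is an assumption about the database in use.

```sql
create table users (user_id int primary key);
create table contracts (
    contract_id     int primary key,
    contract_number varchar,
    user_id         int references users
);
create table user_links (
    user_id        int references users,
    linked_user_id int references users
);

-- Made-up sample data: user 1 is linked to user 2; user 3 has no links.
insert into users values (1), (2), (3);
insert into contracts values (10, 'C-10', 1), (20, 'C-20', 2), (30, 'C-30', 3);
insert into user_links values (1, 2);

-- $1 plays the role of the input user_id.
prepare contracts_for(int) as
select c.*
from contracts c
where c.user_id = $1
   or c.user_id in (select ul.linked_user_id
                    from user_links ul
                    where ul.user_id = $1)
order by c.contract_id;

execute contracts_for(1);  -- contracts 10 and 20
execute contracts_for(3);  -- contract 30 only
```

Because the link is only followed from user_id to linked_user_id, contracts_for(2) returns just contract 20 with this data. If links should work in both directions, the subquery would need a second branch.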

SELECT after specific row with non-sequential (uuid) primary key

With the following table:
CREATE TABLE users (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
inserted_at timestamptz NOT NULL DEFAULT now()
-- other fields
);
How could I retrieve n rows after a specific id, ordered by inserted_at?
I am expecting something like this:
select u.*
from users u
where u.inserted_at > (select u2.inserted_at from users u2 where u2.id = 'f4ae4105-1afb-4ba6-a2ad-4474c9bae483')
order by u.inserted_at
limit 10;
For this, you want one additional index on users(inserted_at).
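
One caveat: if two rows can share the same inserted_at, comparing on the timestamp alone may skip or repeat rows at the page boundary. A possible variant, sketched here with made-up sample IDs and a hypothetical index name, compares on the pair (inserted_at, id) so the order is total.

```sql
-- gen_random_uuid() is built in from PostgreSQL 13; older versions need pgcrypto.
create table users (
    id          uuid primary key default gen_random_uuid(),
    inserted_at timestamptz not null default now()
);

-- Composite index matching the ORDER BY below.
create index users_inserted_at_id_idx on users (inserted_at, id);

-- Made-up sample rows: the first two share the same timestamp.
insert into users (id, inserted_at) values
    ('00000000-0000-0000-0000-000000000001', '2024-01-01 00:00+00'),
    ('00000000-0000-0000-0000-000000000002', '2024-01-01 00:00+00'),
    ('00000000-0000-0000-0000-000000000003', '2024-01-02 00:00+00');

-- Rows strictly after the anchor row in (inserted_at, id) order.
select u.*
from users u
where (u.inserted_at, u.id) > (
        select u2.inserted_at, u2.id
        from users u2
        where u2.id = '00000000-0000-0000-0000-000000000001'
      )
order by u.inserted_at, u.id
limit 10;
-- returns the rows ending in ...002 and ...003
```

With the plain inserted_at > (...) filter, the row ending in ...002 would be missed here because its timestamp equals the anchor's.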

PostgreSQL: Select the group with specific members

Given the tables below:
CREATE TABLE users (
id bigserial PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE groups (
id bigserial PRIMARY KEY
);
CREATE TABLE group_members (
group_id bigint REFERENCES groups ON DELETE CASCADE,
user_id bigint REFERENCES users ON DELETE CASCADE,
PRIMARY KEY (group_id, user_id)
);
How do we select a group with a specific set of users?
We want an SQL function that takes an array of user IDs and returns the group ID (from the group_members table) with the exact same set of user IDs.
Also, please add indexes if they will make your solution faster.
First, we get the "candidate" groups from the group_members relation (those containing at least one of the given users), and then, in a second pass, keep only the groups whose size equals the size of the user_ids array and whose members all appear in it (here I use a CTE: https://www.postgresql.org/docs/current/static/queries-with.html):
with target(id) as (
select * from unnest(array[2, 3]) -- here is your input
), candidates as (
select group_id
from group_members
where user_id in (select id from target) -- find all groups which include at least one input user
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having count(*) = (select count(*) from target) -- filter out all "bigger" and "smaller" groups
and bool_and(user_id in (select id from target)) -- filter out groups with a member outside the input
;
Demonstration with some sample data: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=a98c09f20e837dc430ac66e01c7f0dd0
This query will use the indexes you already have, but it is probably worth adding a separate index on group_members (user_id) to avoid intermediate hashing in the first stage of the CTE query.
The SQL function is straightforward:
create or replace function find_groups(int8[]) returns int8 as $$
with candidates as (
select group_id
from group_members
where user_id in (select * from unnest($1))
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having count(*) = array_length($1, 1) -- same size as the input (assumes no duplicate IDs in $1)
and bool_and(user_id = any($1)) -- and no member outside the input
;
$$ language sql;
See the same DBfiddle for demonstration.
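
A size check alone is not enough for an exact match: a group {2, 5} has the same size as the input {2, 3} and shares a member with it. The sketch below, with made-up sample data and a hypothetical function name find_group_exact, checks both size and membership. It assumes the input array contains no duplicates.

```sql
create table users (id bigserial primary key, name text not null);
create table groups (id bigserial primary key);
create table group_members (
    group_id bigint references groups on delete cascade,
    user_id  bigint references users on delete cascade,
    primary key (group_id, user_id)
);
-- Extra index for lookups by user_id (the primary key leads with group_id).
create index group_members_user_id_idx on group_members (user_id);

-- Made-up sample data: users 1..5, groups 1..3.
insert into users (name) values ('a'), ('b'), ('c'), ('d'), ('e');
insert into groups default values;
insert into groups default values;
insert into groups default values;
insert into group_members values
    (1, 2), (1, 3),          -- group 1 = {2, 3}
    (2, 2), (2, 5),          -- group 2 = {2, 5}
    (3, 2), (3, 3), (3, 4);  -- group 3 = {2, 3, 4}

create or replace function find_group_exact(int8[]) returns int8 as $$
    select group_id
    from group_members
    where group_id in (select group_id from group_members
                       where user_id = any($1))
    group by group_id
    having count(*) = array_length($1, 1)   -- same number of members
       and bool_and(user_id = any($1))      -- and none outside the input
$$ language sql stable;

select find_group_exact(array[2, 3]::int8[]);  -- 1
select find_group_exact(array[2, 4]::int8[]);  -- null, no group is exactly {2, 4}
```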

Select rows where the last row of associated table has a specific value

I have two tables:
User (id, name)
UserEvent (id, user_id, name, date)
How can I get all the users where the last (ordered by date) UserEvent.name has a value of 'played'?
I wrote an example on SQLFiddle with some specific data: http://sqlfiddle.com/#!9/b76e24 - For this scenario I would just get 'Mery' from table User, because even though 'John' has associated events, the name of the last one is not 'played'.
This is probably fastest:
SELECT u.*
FROM usr u -- avoiding "User" as table name
JOIN LATERAL (
SELECT name
FROM userevent
WHERE user_id = u.id
ORDER BY date DESC NULLS LAST
LIMIT 1
) ue ON ue.name = 'played';
LATERAL requires Postgres 9.3+:
What is the difference between LATERAL and a subquery in PostgreSQL?
Or you could use DISTINCT ON (faster for few rows per user):
SELECT u.*
FROM usr u -- avoiding "User" as table name
JOIN (
SELECT DISTINCT ON (user_id)
user_id, name
FROM userevent
ORDER BY user_id, date DESC NULLS LAST
) ue ON ue.user_id = u.id
AND ue.name = 'played';
Details for DISTINCT ON:
Select first row in each GROUP BY group?
SQL Fiddle with valid test case.
If date is defined NOT NULL, you don't need NULLS LAST. (Neither in the index below.)
PostgreSQL sort by datetime asc, null first?
Key to read performance for both but especially the first query is a matching multicolumn index:
CREATE INDEX userevent_foo_idx ON userevent (user_id, date DESC NULLS LAST, name);
Optimize GROUP BY query to retrieve latest record per user
Aside: Never use reserved words as identifiers.
Return the max date from user event grouping by user id. Take that result set and join it back to user event by the user id and max date and filter for just the played records.
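
A minimal sketch of that description, using the usr/userevent names from the answer above. The event rows are made-up sample data (the fiddle's actual data is not reproduced here), and latest is just an alias chosen for illustration.

```sql
create table usr (id int primary key, name text);
create table userevent (
    id      int primary key,
    user_id int references usr,
    name    text,
    date    timestamp not null
);

insert into usr values (1, 'John'), (2, 'Mery');
insert into userevent values
    (1, 1, 'played', '2020-01-01'),
    (2, 1, 'paused', '2020-01-02'),   -- John's latest event is not 'played'
    (3, 2, 'paused', '2020-01-01'),
    (4, 2, 'played', '2020-01-03');   -- Mery's latest event is 'played'

select u.*
from usr u
join (
    -- latest event date per user
    select user_id, max(date) as max_date
    from userevent
    group by user_id
) latest on latest.user_id = u.id
join userevent ue
  on ue.user_id = latest.user_id
 and ue.date    = latest.max_date     -- join back to the row with that date
where ue.name = 'played';
-- returns Mery only
```

If a user can have two events with the same latest date, this may return that user more than once; the LATERAL and DISTINCT ON queries above avoid that.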
Here it is:
First I get the max id for each user and join it to the row with that id to test whether its name is 'played'; if so, I get the user's name. (This assumes that ids increase with date.)
SELECT
ids.*,
u.name,
ue.*
FROM (
SELECT max(id) AS id FROM UserEvent
GROUP BY user_id
) as ids
LEFT JOIN UserEvent ue ON ue.id = ids.id
LEFT JOIN User u ON u.id = ue.user_id
WHERE ue.name = 'played';

Trying to remove an inner SQL select statement

I am making a music player where we have stations. I have a table called histories. It has data on the songs a user likes, dislikes or skipped. We store all the times that a person has liked a song or disliked it. We want to get a current snapshot of all the songs the user has either liked (event_type=1) or disliked (event_type=2) in a given station.
The table has the following rows:
id (PK int autoincrement)
station_id (FK int)
song_id (FK int)
event_type (int, either 1, 2, or 3)
Here is my query:
SELECT song_id, event_type, id
FROM histories
WHERE id IN (SELECT MAX(id) AS id
FROM histories
WHERE station_id = 187
AND (event_type=1 OR event_type=2)
GROUP BY station_id, song_id)
ORDER BY id;
Is there a way to make this query run without the inner select? I am pretty sure it will run a lot faster without it.
You can use JOIN instead. Something like this:
SELECT h1.song_id, h1.event_type, h1.id
FROM histories AS h1
INNER JOIN
(
SELECT station_id, song_id, MAX(id) AS MaxId
FROM histories
WHERE station_id = 187
AND event_type IN (1, 2)
GROUP BY station_id, song_id
) AS h2 ON h1.station_id = h2.station_id
AND h1.song_id = h2.song_id
AND h1.id = h2.maxid
ORDER BY h1.id;
@Mahmoud Gamal's answer is correct, but you can probably get rid of some conditions that are not needed.
SELECT h1.song_id, h1.event_type, h1.id
FROM histories AS h1
INNER JOIN
(
SELECT MAX(id) AS MaxId
FROM histories
WHERE station_id = 187
AND event_type IN (1, 2)
GROUP BY song_id
) AS h2 ON h1.id = h2.maxid
ORDER BY h1.id;
Based on your description, this is the answer:
SELECT DISTINCT song_id, event_type, id
FROM histories
WHERE station_id = 187
AND (event_type=1 OR event_type=2)
ORDER BY id
But you must be doing the MAX for some reason - why?
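
On why the MAX(id) matters: a song can be liked and later disliked, and the snapshot should only show the most recent of the two. The sketch below uses made-up sample rows and a window function instead of a join; it assumes a database with window function support (for example PostgreSQL, or MySQL 8+).

```sql
create table histories (
    id         int primary key,
    station_id int,
    song_id    int,
    event_type int   -- 1 = liked, 2 = disliked, 3 = skipped
);

-- Made-up sample data: song 5 was liked, then disliked, on station 187.
insert into histories values
    (1, 187, 5, 1),
    (2, 187, 5, 2),
    (3, 187, 6, 1),
    (4, 187, 6, 3),
    (5, 188, 5, 1);

select song_id, event_type, id
from (
    select song_id, event_type, id,
           row_number() over (partition by song_id order by id desc) as rn
    from histories
    where station_id = 187
      and event_type in (1, 2)
) ranked
where rn = 1      -- keep only the latest like/dislike per song
order by id;
-- (5, 2, 2) and (6, 1, 3)
```

The plain SELECT DISTINCT version would return three rows for this data, including the outdated like for song 5. Whether the window version is actually faster than the join depends on the data and indexes, so it is worth comparing the plans.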