With the following table:
CREATE TABLE users (
id uuid PRIMARY KEY DEFAULT gen_random_uuid(),
inserted_at timestamptz NOT NULL DEFAULT now()
-- other fields
);
How can I retrieve n rows after a specific id, ordered by inserted_at?
I am expecting something like this:
select u.*
from users u
where u.inserted_at > (select u2.inserted_at from users u2 where u2.id = 'f4ae4105-1afb-4ba6-a2ad-4474c9bae483')
order by u.inserted_at
limit 10;
For this, you want one additional index on users(inserted_at).
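If several rows can share the same inserted_at, the page boundary in the query above is ambiguous. A minimal sketch of one way to handle that, assuming id is an acceptable tie-breaker (the index name is hypothetical):

```sql
-- Hypothetical index name; covers both the ordering and the range condition.
CREATE INDEX users_inserted_at_id_idx ON users (inserted_at, id);

-- Row-wise comparison: rows are ordered by inserted_at first, then by id,
-- so rows sharing the anchor's inserted_at are neither skipped nor repeated.
SELECT u.*
FROM users u
WHERE (u.inserted_at, u.id) > (
    SELECT u2.inserted_at, u2.id
    FROM users u2
    WHERE u2.id = 'f4ae4105-1afb-4ba6-a2ad-4474c9bae483'
)
ORDER BY u.inserted_at, u.id
LIMIT 10;
```

The ORDER BY has to list the same columns as the row comparison, otherwise "after" means something different in the filter and in the sort.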
Given the tables below:
CREATE TABLE users (
id bigserial PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE groups (
id bigserial PRIMARY KEY
);
CREATE TABLE group_members (
group_id bigint REFERENCES groups ON DELETE CASCADE,
user_id bigint REFERENCES users ON DELETE CASCADE,
PRIMARY KEY (group_id, user_id)
);
How do we select a group with a specific set of users?
We want an SQL function that takes an array of user IDs and returns the group ID (from the group_members table) with the exact same set of user IDs.
Also, please add indexes if they will make your solution faster.
First, we collect "candidate" groups from the group_members relation, and then, in a second pass, check that each candidate's size equals the size of the user_ids array (here I use a CTE: https://www.postgresql.org/docs/current/static/queries-with.html):
with target(id) as (
select * from unnest(array[2, 3]) -- here is your input (assumed to contain no duplicates)
), candidates as (
select group_id
from group_members
where user_id in (select id from target)
group by group_id
having count(*) = (select count(*) from target) -- keep only groups that include every input user
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having count(*) = (select count(*) from target) -- filter out all "bigger" groups
;
Demonstration with some sample data: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=a98c09f20e837dc430ac66e01c7f0dd0
This query will use the indexes you already have, but it is probably worth adding a separate index on group_members (user_id) to avoid intermediate hashing in the first stage of the CTE query.
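A minimal sketch of that extra index (the index name is hypothetical). The existing primary key on (group_id, user_id) leads with group_id, so it does not help much when filtering by user_id alone:

```sql
-- Hypothetical index name; supports lookups of group_members by user_id.
create index group_members_user_id_idx on group_members (user_id);
```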
The SQL function is straightforward:
create or replace function find_groups(int8[]) returns int8 as $$
-- assumes the input array contains no duplicates or NULLs
with candidates as (
select group_id
from group_members
where user_id in (select * from unnest($1))
group by group_id
having count(*) = array_length($1, 1) -- group includes every input user
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having count(*) = array_length($1, 1) -- group has no extra members
;
$$ language sql;
See the same DBfiddle for demonstration.
I have user data in two tables like
1. USERID | USERPOSTID
2. USERPOSTID | USERPOST | LAST_EDIT_TIME
How do I get the last edited post and its time for every user? Assume that every user has 5 posts, and each one is edited at least once.
Will I have to write a loop iterating over every user, find the USERPOST with MAX(LAST_EDIT_TIME) and then collect the values? I tried GROUP BY, but I can't put USERPOSTID or USERPOST in an aggregate function. TIA.
Seems like something like this should work:
create table users(
id serial primary key,
username varchar(50)
);
create table posts(
id serial primary key,
userid integer references users(id),
post_text text,
update_date timestamp default current_timestamp
);
insert into users(username)values('Kalpit');
insert into posts(userid,post_text)values(1,'first test');
insert into posts(userid,post_text)values(1,'second test');
select *
from users u
join posts p on p.userid = u.id
where p.update_date =
( select max( update_date )
from posts
where userid = u.id )
fiddle: http://sqlfiddle.com/#!15/4b240/4/0
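You can use a window function here (assuming the first table is named USERS and the second USERPOST):
select
USERID
, USERPOSTID
, USERPOST
, LAST_EDIT_TIME
from (
select
u.USERID
, p.USERPOSTID
, p.USERPOST
, p.LAST_EDIT_TIME
, row_number() over (
partition by u.USERID
order by p.LAST_EDIT_TIME desc) as row_num
from
USERS u
join USERPOST p on p.USERPOSTID = u.USERPOSTID
) most_recent
where row_num = 1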
Problem: Find the most recent record based on (created) column for each (linked_id) column in multiple tables, the results should include (user_id, MAX(created), linked_id). The query must also be able to be used with a WHERE clause to find a single record based on the (linked_id).
There are actually several tables in question, but here are 3 of them so you can get an idea of the structure (several other columns in each table have been omitted since they are not to be returned).
CREATE TABLE em._logs_adjustments
(
id serial NOT NULL,
user_id integer,
created timestamp with time zone NOT NULL DEFAULT now(),
linked_id integer,
CONSTRAINT _logs_adjustments_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE em._logs_assets
(
id serial NOT NULL,
user_id integer,
created timestamp with time zone NOT NULL DEFAULT now(),
linked_id integer,
CONSTRAINT _logs_assets_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
CREATE TABLE em._logs_condition_assessments
(
id serial NOT NULL,
user_id integer,
created timestamp with time zone NOT NULL DEFAULT now(),
linked_id integer,
CONSTRAINT _logs_condition_assessments_pkey PRIMARY KEY (id)
)
WITH (
OIDS=FALSE
);
This is the query I'm currently using, with a small hack to get around the need for user_id in the GROUP BY clause; if possible, array_agg should be removed.
SELECT MAX(MaxDate), linked_id, (array_agg(user_id ORDER BY MaxDate DESC))[1] AS user_id FROM (
SELECT user_id, MAX(created) as MaxDate, asset_id AS linked_id FROM _logs_assets
GROUP BY asset_id, user_id
UNION ALL
SELECT user_id, MAX(created) as MaxDate, linked_id FROM _logs_adjustments
GROUP BY linked_id, user_id
UNION ALL
SELECT user_id, MAX(created) as MaxDate, linked_id FROM _logs_condition_assessments
GROUP BY linked_id, user_id
) as subQuery
GROUP BY linked_id
ORDER BY linked_id DESC
I receive the desired results, but I don't believe this is the right way to do it, especially with the array_agg workaround. Some tables have upwards of 1.5 million records, and the query takes 10-15+ seconds to run. Any help or steering in the right direction is much appreciated.
Use distinct on. From the PostgreSQL documentation:
SELECT DISTINCT ON ( expression [, ...] ) keeps only the first row of each set of rows where the given expressions evaluate to equal. The DISTINCT ON expressions are interpreted using the same rules as for ORDER BY (see above). Note that the "first row" of each set is unpredictable unless ORDER BY is used to ensure that the desired row appears first.
select distinct on (linked_id) created, linked_id, user_id
from (
select user_id, created, asset_id as linked_id
from _logs_assets
union all
select user_id, created, linked_id
from _logs_adjustments
union all
select user_id, created, linked_id
from _logs_condition_assessments
) s
order by linked_id desc, created desc
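The question also asks for a lookup by a single linked_id. One possible sketch, assuming the same column names as the query above and a placeholder value of 42, pushes the filter into each branch so that every table can use its own index. The index name is hypothetical, and only one table's index is shown:

```sql
-- Hypothetical index; a similar one on each log table's linking column
-- lets every branch below find its rows without a full scan.
create index _logs_adjustments_linked_id_created_idx
    on _logs_adjustments (linked_id, created desc);

-- Most recent record for one linked_id (42 is a placeholder value).
select user_id, created, linked_id
from (
    select user_id, created, asset_id as linked_id
    from _logs_assets
    where asset_id = 42
    union all
    select user_id, created, linked_id
    from _logs_adjustments
    where linked_id = 42
    union all
    select user_id, created, linked_id
    from _logs_condition_assessments
    where linked_id = 42
) s
order by created desc
limit 1;
```

With only one linked_id in play, a plain order by with limit 1 is enough and distinct on is not needed.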
Ok, so the title is a bit convoluted. This is basically a greatest-n-per-group type problem, but I can't for the life of me figure it out.
I have a table, user_stats:
------------------+---------+---------------------------------------------------------
id | bigint | not null default nextval('user_stats_id_seq'::regclass)
user_id | bigint | not null
datestamp | integer | not null
post_count | integer |
friends_count | integer |
favourites_count | integer |
Indexes:
"user_stats_pk" PRIMARY KEY, btree (id)
"user_stats_datestamp_index" btree (datestamp)
"user_stats_user_id_index" btree (user_id)
Foreign-key constraints:
"user_user_stats_fk" FOREIGN KEY (user_id) REFERENCES user_info(id)
I want to get the stats for each id by latest datestamp. This is a biggish table, somewhere in the neighborhood of 41m rows, so I've created a temp table of user_id, last_date using:
CREATE TEMP TABLE id_max_date AS
(SELECT user_id, MAX(datestamp) AS date FROM user_stats GROUP BY user_id);
The problem is that datestamp isn't unique, since there can be more than 1 stat update in a day (it should have been a real timestamp, but the guy who designed this was kind of an idiot and there's too much data to go back at the moment). So some IDs have multiple rows when I do the JOIN:
SELECT user_stats.user_id, user_stats.datestamp, user_stats.post_count,
user_stats.friends_count, user_stats.favourites_count
FROM id_max_date JOIN user_stats
ON id_max_date.user_id=user_stats.user_id AND date=datestamp;
If I was doing this as subselects I guess I could LIMIT 1, but I've always heard those are horribly inefficient. Thoughts?
DISTINCT ON is your friend.
select distinct on (user_id) * from user_stats
order by user_id, datestamp desc, id desc;
Basically you need to decide how to resolve ties, and you need some other column besides datestamp which is guaranteed to be unique (at least over a given user) so it can be used as the tiebreaker. If nothing else, you can use the id primary key column.
Another solution, if you're using PostgreSQL 8.4 or later, is window functions:
WITH numbered_user_stats AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY datestamp DESC) AS RowNum
FROM user_stats
) SELECT u.user_id, u.datestamp, u.post_count, u.friends_count, u.favourites_count
FROM numbered_user_stats AS u
WHERE u.RowNum = 1;
Using the existing infrastructure, you can use:
SELECT u.user_id, u.datestamp,
MAX(u.post_count) AS post_count,
MAX(u.friends_count) AS friends_count,
MAX(u.favourites_count) AS favourites_count
FROM id_max_date AS m JOIN user_stats AS u
ON m.user_id = u.user_id AND m.date = u.datestamp
GROUP BY u.user_id, u.datestamp;
This gives you a single value for each of the 'not necessarily unique' columns. However, it does not absolutely guarantee that the three maxima all appeared in the same row (though there is at least a moderate chance that they will - and that they will all come from the last of entries created on the given day).
For this query, the index on date stamp alone is no help; an index on user ID and date stamp could speed this query up considerably - or, perhaps more accurately, it could speed up the query that generates the id_max_date table.
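A minimal sketch of such a composite index, using the column names from the user_stats table in the question (the index name is hypothetical):

```sql
-- Hypothetical index name; user_id first so each user's rows are adjacent,
-- datestamp descending so the latest row per user comes first.
CREATE INDEX user_stats_user_id_datestamp_index
    ON user_stats (user_id, datestamp DESC);
```

On a table of roughly 41m rows, building this index will take a while and use noticeable disk space, so it is worth testing on a copy first.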
Clearly, you can also write the id_max_date expression as a sub-query in the FROM clause:
SELECT u.user_id, u.datestamp,
MAX(u.post_count) AS post_count,
MAX(u.friends_count) AS friends_count,
MAX(u.favourites_count) AS favourites_count
FROM (SELECT u2.user_id, MAX(u2.datestamp) AS date
FROM user_stats AS u2
GROUP BY u2.user_id) AS m
JOIN user_stats AS u ON m.user_id = u.user_id AND m.date = u.datestamp
GROUP BY u.user_id, u.datestamp;
Sorry the title isn't more help. I have a database of media-file URLs that came from two sources:
(1) RSS feeds and (2) manual entries.
I want to find the ten most-recently added URLs, but a maximum of one from any feed. To simplify, table 'urls' has columns 'url, feed_id, timestamp'.
feed_id='' for any URL that was entered manually.
How would I write the query? Remember, I want the ten most-recent urls, but only one from any single feed_id.
Assuming feed_id = 0 is the manually entered stuff this does the trick:
select p.* from programs p
left join
(
select max(id) id1 from programs
where feed_id <> 0
group by feed_id
order by max(id) desc
limit 10
) t on id1 = id
where id1 is not null or feed_id = 0
order by id desc
limit 10;
It works because the id column is constantly increasing; it's also pretty speedy. t is a table alias.
This was my original answer:
(
select
feed_id, url, dt
from feeds
where feed_id = ''
order by dt desc
limit 10
)
union
(
select feed_id, min(url), max(dt)
from feeds
where feed_id <> ''
group by feed_id
order by max(dt) desc
limit 10
)
order by dt desc
limit 10
Assuming this table
CREATE TABLE feed (
feed varchar(20) NOT NULL,
add_date datetime NOT NULL,
info varchar(45) NOT NULL,
PRIMARY KEY (feed,add_date)
);
this query should do what you want. The inner query selects the last entry by feed and picks the 10 most recent, and then the outer query returns the original records for those entries.
select f2.*
from (select feed, max(add_date) max_date
from feed f1
group by feed
order by max_date desc
limit 10) f1
left join feed f2 on f1.feed=f2.feed and f1.max_date=f2.add_date;
Here's the (abbreviated) table:
CREATE TABLE programs (
id int(11) NOT NULL auto_increment,
feed_id int(11) NOT NULL,
`timestamp` timestamp NOT NULL default CURRENT_TIMESTAMP on update CURRENT_TIMESTAMP,
PRIMARY KEY (id)
) ENGINE=InnoDB;
And here's my query based on sambo99's concept:
(SELECT feed_id,id,timestamp
FROM programs WHERE feed_id=''
ORDER BY timestamp DESC LIMIT 10)
UNION
(SELECT feed_id,min(id),max(timestamp)
FROM programs WHERE feed_id<>'' GROUP BY feed_id
ORDER BY timestamp DESC LIMIT 10)
ORDER BY timestamp DESC LIMIT 10;
Seems to work. More testing needed, but at least I understand it. (A good thing!). What's the enhancement using the 'id' column?
You probably want a union. Something like this should work:
(SELECT
url, feed_id, timestamp
FROM rss_items
GROUP BY feed_id
ORDER BY timestamp DESC
LIMIT 10)
UNION
(SELECT
url, feed_id, timestamp
FROM manual_items
GROUP BY feed_id
ORDER BY timestamp DESC
LIMIT 10)
ORDER BY timestamp DESC
LIMIT 10
Would it work to group by the field that you want to be distinct?
SELECT url, feedid FROM urls GROUP BY feedid ORDER BY timestamp DESC LIMIT 10;
MySQL doesn't have the greatest support for this type of query.
You can do it using a combination of "GROUP-BY" and "HAVING" clauses, but you'll scan the whole table, which can get costly.
There is a more efficient solution published here, assuming you have an index on group ids:
http://www.artfulsoftware.com/infotree/queries.php?&bw=1390#104
(Basically, create a temp table, insert into it top K for every group, select from the table, drop the table. This way you get the benefit of the early termination from the LIMIT clause).
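On MySQL 8.0 or later, window functions offer another route. A sketch, assuming the programs table shown earlier and assuming feed_id = 0 marks manually entered rows:

```sql
-- Assumes MySQL 8.0+ (ROW_NUMBER is not available in earlier versions)
-- and that feed_id = 0 identifies manual entries.
SELECT id, feed_id, `timestamp`
FROM (
    SELECT p.*,
           ROW_NUMBER() OVER (
               PARTITION BY feed_id
               ORDER BY `timestamp` DESC, id DESC
           ) AS rn
    FROM programs p
) ranked
WHERE feed_id = 0   -- every manual entry competes on its own
   OR rn = 1        -- only the newest row of each real feed
ORDER BY `timestamp` DESC, id DESC
LIMIT 10;
```

This still ranks every row in the table before the LIMIT applies, so the temp-table approach may remain faster on very large tables.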