Array_agg alternative - PostgreSQL - sql

I am using PostgreSQL version 8.3.4, which doesn't support the function "array_agg".
My definitions for the tables are:
create table photos (id integer, user_id integer, primary key (id, user_id));
create table tags (photo_id integer, user_id integer, info text, primary key (user_id, photo_id, info));
I came across this query, which gives me what I need:
SELECT photo_id
FROM tags t
GROUP BY 1
HAVING (SELECT count(*) >= 1
FROM (
SELECT photo_id
FROM tags
WHERE info = ANY(array_agg(t.info))
AND photo_id <> t.photo_id
GROUP BY photo_id
HAVING count(*) >= 1
) t1
)
but I can't use it because of my version.
Is there any alternative query to this one that I can use?

select
t2.photo_id, count(*)
from tags t1
join tags t2 on t1.info = t2.info and t1.photo_id <> t2.photo_id
group by t2.photo_id
;
Add HAVING count(*) = k after the GROUP BY if you want exactly k matches.
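To see what the self-join returns, here is a minimal sketch that runs it through Python's sqlite3 module as a stand-in for PostgreSQL 8.3 (the query uses only a join, GROUP BY and HAVING, so it should not depend on array_agg). The sample rows and the helper name photos_sharing_tags are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE tags (photo_id INTEGER, user_id INTEGER, info TEXT, "
    "PRIMARY KEY (user_id, photo_id, info))"
)
# Hypothetical sample data: photos 1 and 2 share two tags, photo 3 shares none.
conn.executemany(
    "INSERT INTO tags (photo_id, user_id, info) VALUES (?, ?, ?)",
    [(1, 10, "cat"), (1, 10, "sun"), (2, 11, "cat"), (2, 11, "sun"), (3, 12, "dog")],
)


def photos_sharing_tags(conn, k):
    """Return (photo_id, matches) for photos with at least k tag rows
    that also appear on some other photo."""
    return conn.execute(
        """
        SELECT t2.photo_id, count(*) AS matches
        FROM tags t1
        JOIN tags t2 ON t1.info = t2.info AND t1.photo_id <> t2.photo_id
        GROUP BY t2.photo_id
        HAVING count(*) >= ?
        ORDER BY t2.photo_id
        """,
        (k,),
    ).fetchall()


print(photos_sharing_tags(conn, 1))
print(photos_sharing_tags(conn, 3))
```

Note that count(*) counts matching rows, so a tag shared with two other photos is counted twice.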

Related

How do I order a SELECT query by the sum of a column on a related table?

I have a table of posts and a table of ratings. One post can have many ratings, and a rating can be any positive integer.
In SQL, how can I express a SELECT query that will return posts ordered by the sum of their ratings?
For reference, a sample schema:
CREATE TABLE Posts (
id INT PRIMARY KEY
);
CREATE TABLE Ratings (
id INT PRIMARY KEY,
post_id INT REFERENCES Posts(id),
rating INT
);
Additionally, I only require posts, but that is mainly irrelevant.
It is unclear whether you want posts with no ratings. If so, use a left join:
select p.id, sum(r.rating)
from posts p left join
ratings r
on p.id = r.post_id
group by p.id
order by sum(r.rating) desc;
(I assume you want the highest sums first; hence the desc.)
If you want only posts with ratings, no join is needed:
select r.post_id, sum(r.rating)
from ratings r
group by r.post_id
order by sum(r.rating) desc;
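The difference between the two variants is easiest to see on a tiny data set. The following is a sketch only, run through Python's sqlite3 module with the schema from the question and made-up rows. The COALESCE is my addition, not part of the answer: it makes unrated posts sort as 0 instead of NULL, since databases differ on where NULLs land in a descending sort.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Posts (id INT PRIMARY KEY);
    CREATE TABLE Ratings (
        id INT PRIMARY KEY,
        post_id INT REFERENCES Posts(id),
        rating INT
    );
""")
# Hypothetical sample data: post 3 has no ratings at all.
conn.executemany("INSERT INTO Posts (id) VALUES (?)", [(1,), (2,), (3,)])
conn.executemany(
    "INSERT INTO Ratings (id, post_id, rating) VALUES (?, ?, ?)",
    [(1, 1, 5), (2, 1, 4), (3, 2, 2)],
)

# Every post, including unrated ones (left join).
all_posts = conn.execute("""
    SELECT p.id, COALESCE(SUM(r.rating), 0) AS total
    FROM Posts p
    LEFT JOIN Ratings r ON r.post_id = p.id
    GROUP BY p.id
    ORDER BY total DESC
""").fetchall()

# Only posts that have at least one rating (no join needed).
rated_only = conn.execute("""
    SELECT post_id, SUM(rating) AS total
    FROM Ratings
    GROUP BY post_id
    ORDER BY total DESC
""").fetchall()

print(all_posts)
print(rated_only)
```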
Assuming 2 tables, a primary key and foreign key usage, and the fact that a post name may be re-used, I present the following:
select p.post_name, sum(r.rating) as post_rating
from posts p, ratings r
where p.post = r.post
group by r.post, p.post_name
order by post_rating desc, post_name asc
.headers on
create table post ( post char(20)) ;
create table ratings ( post char(20), rating number) ;
insert into post values ( 'post1') ;
insert into post values ( 'post2') ;
insert into post values ( 'post3') ;
insert into ratings values ( 'post1', 1 ) ;
insert into ratings values ( 'post1', 3 ) ;
insert into ratings values ( 'post1', 6 ) ;
insert into ratings values ( 'post2', 2 ) ;
insert into ratings values ( 'post2', 8 ) ;
insert into ratings values ( 'post3', 1 ) ;
insert into ratings values ( 'post3', 1 ) ;
select a.post , sum(b.rating) from post a
join ratings b on a.post = b.post
group by a.post
order by sum(b.rating) ;
output:
post|sum(b.rating)
post3|2
post1|10
post2|10

PostgreSQL: Select the group with specific members

Given the tables below:
CREATE TABLE users (
id bigserial PRIMARY KEY,
name text NOT NULL
);
CREATE TABLE groups (
id bigserial PRIMARY KEY
);
CREATE TABLE group_members (
group_id bigint REFERENCES groups ON DELETE CASCADE,
user_id bigint REFERENCES users ON DELETE CASCADE,
PRIMARY KEY (group_id, user_id)
);
How do we select a group with a specific set of users?
We want an SQL function that takes an array of user IDs and returns the group ID (from the group_members table) with the exact same set of user IDs.
Also, please add indexes if they will make your solution faster.
First, we need to get the "candidate" groups from the group_members relation, and then, in a second pass, ensure that the group size is the same as the size of the user_ids array (here I use a CTE, see https://www.postgresql.org/docs/current/static/queries-with.html):
with target(id) as (
select * from unnest(array[2, 3]) -- here is your input (assumed to contain no duplicates)
), candidates as (
select group_id
from group_members
where user_id in (select id from target)
group by group_id
having count(*) = (select count(*) from target) -- keep only groups which include every input user
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having array_length(array_agg(user_id), 1)
= array_length(array(select id from target), 1) -- filter out all "bigger" groups
;
Demonstration with some sample data: http://dbfiddle.uk/?rdbms=postgres_9.6&fiddle=a98c09f20e837dc430ac66e01c7f0dd0
This query will use the indexes you already have, but it is probably worth adding a separate index on group_members (user_id) to avoid intermediate hashing in the first stage of the CTE query.
The SQL function is straightforward:
create or replace function find_groups(int8[]) returns int8 as $$
with candidates as (
select group_id
from group_members
where user_id in (select * from unnest($1))
group by group_id
having count(*) = array_length($1, 1) -- group includes every input user (input assumed duplicate-free)
)
select group_id
from group_members
where group_id in (select group_id from candidates)
group by group_id
having array_length(array_agg(user_id), 1) = array_length($1, 1)
;
$$ language sql;
See the same DBfiddle for demonstration.
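For an exact match, two conditions have to hold for a group: it has as many members as the input, and every one of those members is in the input. A size check on its own would also accept a group of the same size that shares only one user with the input. Below is a minimal sketch of that double check, run through Python's sqlite3 module rather than PostgreSQL; the helper name find_group, the sample groups, and the single-table setup (users and groups tables omitted) are simplifications of mine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE group_members (group_id INTEGER, user_id INTEGER, "
    "PRIMARY KEY (group_id, user_id))"
)
# Hypothetical sample data: group 1 = {2, 3}, group 2 = {2, 5},
# group 3 = {2, 3, 4}, group 4 = {3}.
conn.executemany(
    "INSERT INTO group_members (group_id, user_id) VALUES (?, ?)",
    [(1, 2), (1, 3), (2, 2), (2, 5), (3, 2), (3, 3), (3, 4), (4, 3)],
)


def find_group(conn, user_ids):
    """Return the ids of groups whose member set equals user_ids exactly."""
    ids = sorted(set(user_ids))  # ignore duplicates in the input
    if not ids:
        return []
    placeholders = ",".join("?" * len(ids))
    sql = f"""
        SELECT group_id
        FROM group_members
        GROUP BY group_id
        HAVING count(*) = ?  -- same size as the input
           AND sum(CASE WHEN user_id IN ({placeholders}) THEN 1 ELSE 0 END) = ?
        ORDER BY group_id
    """
    params = [len(ids), *ids, len(ids)]
    return [row[0] for row in conn.execute(sql, params)]


print(find_group(conn, [2, 3]))
```

Group 2 has the right size but only one matching member, and group 3 contains both users but is larger, so only group 1 comes back for the input [2, 3].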

SQLite: How would you rephrase the query?

I created a basic movie database and for that I'm working with SQLite.
I have a table, which looks like this:
CREATE TABLE movie_collection (
user_id INTEGER NOT NULL,
movie_id INTEGER NOT NULL,
PRIMARY KEY (user_id, movie_id),
FOREIGN KEY (user_id) REFERENCES user (id),
FOREIGN KEY (movie_id) REFERENCES movie (id)
)
As one simple task, I want to show one user (let's say user_id = 1) the whole movie collection, in which that user might or might not have some movies of their own. I also have to prevent duplicate rows where more than one user has the same movie in their collection. In particular, if this involves the current user (user_id = 1), then that user has priority. That is, if there are, say, three records as follows:
user_id movie_id
-------- ---------
1 17
5 17
8 17
Then the result set must have the record (1, 17) and not the other two.
For this task I wrote a sql query like this:
SELECT movie_collect.user_id, movie_collect.movie_id
FROM (
SELECT user_id, movie_id FROM movie_collection WHERE user_id = 1
UNION
SELECT user_id, movie_id FROM movie_collection WHERE user_id != 1 AND movie_id NOT IN (SELECT movie_id FROM movie_collection WHERE user_id = 1)
) AS movie_collect
Although this query delivers pretty much what I need, just out of curiosity I wanted to ask whether someone else has another idea for solving this problem.
Thank you.
The outer query is superfluous:
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id = 1
UNION
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id != 1
AND movie_id NOT IN (SELECT movie_id
FROM movie_collection
WHERE user_id = 1)
And UNION removes duplicates, and a row whose movie is not in user 1's collection can never have user_id = 1 anyway, so you do not need to check user_id in the second subquery:
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id = 1
UNION
SELECT user_id, movie_id
FROM movie_collection
WHERE movie_id NOT IN (SELECT movie_id
FROM movie_collection
WHERE user_id = 1)
And the only difference between the two subqueries is the WHERE clause, so you can combine them:
SELECT user_id, movie_id
FROM movie_collection
WHERE user_id = 1
OR movie_id NOT IN (SELECT movie_id
FROM movie_collection
WHERE user_id = 1);
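A quick way to convince yourself that the three rewrites are equivalent is to run them against the same data and compare the rows. This is only a sketch using Python's sqlite3 module; the sample rows extend the example from the question with a few made-up ones, and the foreign keys are left out to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE movie_collection (user_id INTEGER NOT NULL, "
    "movie_id INTEGER NOT NULL, PRIMARY KEY (user_id, movie_id))"
)
# Rows for movie 17 come from the question; the rest are hypothetical.
conn.executemany(
    "INSERT INTO movie_collection (user_id, movie_id) VALUES (?, ?)",
    [(1, 17), (5, 17), (8, 17), (5, 20), (8, 20), (1, 3)],
)

NOT_OWNED = "movie_id NOT IN (SELECT movie_id FROM movie_collection WHERE user_id = 1)"

queries = {
    "union_with_check": f"""
        SELECT user_id, movie_id FROM movie_collection WHERE user_id = 1
        UNION
        SELECT user_id, movie_id FROM movie_collection
        WHERE user_id != 1 AND {NOT_OWNED}""",
    "union_without_check": f"""
        SELECT user_id, movie_id FROM movie_collection WHERE user_id = 1
        UNION
        SELECT user_id, movie_id FROM movie_collection WHERE {NOT_OWNED}""",
    "single_where": f"""
        SELECT user_id, movie_id FROM movie_collection
        WHERE user_id = 1 OR {NOT_OWNED}""",
}

# Sort in Python so the comparison does not depend on result order.
results = {name: sorted(conn.execute(sql).fetchall()) for name, sql in queries.items()}
for name, rows in results.items():
    print(name, rows)
```

All three return the same rows here. Note that movie 20, which user 1 does not own, still appears once per other owner, exactly as in the original query.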

How to get last edited post of every user in PostgreSQL?

I have user data in two tables like
1. USERID | USERPOSTID
2. USERPOSTID | USERPOST | LAST_EDIT_TIME
How do I get the last edited post and its time for every user? Assume that every user has 5 posts, and each one is edited at least once.
Will I have to write a loop iterating over every user, find the USERPOST with MAX(LAST_EDIT_TIME) and then collect the values? I tried GROUP BY, but I can't put USERPOSTID or USERPOST in an aggregate function. TIA.
Seems like something like this should work:
create table users(
id serial primary key,
username varchar(50)
);
create table posts(
id serial primary key,
userid integer references users(id),
post_text text,
update_date timestamp default current_timestamp
);
insert into users(username)values('Kalpit');
insert into posts(userid,post_text)values(1,'first test');
insert into posts(userid,post_text)values(1,'second test');
select *
from users u
join posts p on p.userid = u.id
where p.update_date =
( select max( update_date )
from posts
where userid = u.id )
fiddle: http://sqlfiddle.com/#!15/4b240/4/0
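You can use a window function here:
-- assumes USERS(USERID, USERPOSTID) and USERPOST(USERPOSTID, USERPOST, LAST_EDIT_TIME)
select
USERID
, USERPOSTID
, USERPOST
, LAST_EDIT_TIME
from (
select
u.USERID
, p.USERPOSTID
, p.USERPOST
, p.LAST_EDIT_TIME
, row_number() over (
partition by u.USERID
order by p.LAST_EDIT_TIME desc) row_num
from
USERS u
join USERPOST p
on p.USERPOSTID = u.USERPOSTID
) most_recent
where row_num = 1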
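So no loop over users is needed. As a rough sketch against the two-table layout from the question, here is the correlated MAX() approach run through Python's sqlite3 module. The table names user_posts and posts, the lowercase column names and the sample rows are assumptions of mine, since the question does not name the tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical names for the two tables sketched in the question.
    CREATE TABLE user_posts (userid INTEGER, userpostid INTEGER PRIMARY KEY);
    CREATE TABLE posts (userpostid INTEGER PRIMARY KEY, userpost TEXT, last_edit_time TEXT);
""")
conn.executemany(
    "INSERT INTO user_posts (userid, userpostid) VALUES (?, ?)",
    [(1, 10), (1, 11), (2, 20), (2, 21)],
)
# ISO-8601 timestamps compare correctly as plain strings.
conn.executemany(
    "INSERT INTO posts (userpostid, userpost, last_edit_time) VALUES (?, ?, ?)",
    [
        (10, "a", "2020-01-01 10:00:00"),
        (11, "b", "2020-01-02 09:00:00"),
        (20, "c", "2020-01-03 08:00:00"),
        (21, "d", "2020-01-01 08:00:00"),
    ],
)

latest = conn.execute("""
    SELECT up.userid, p.userpostid, p.userpost, p.last_edit_time
    FROM user_posts up
    JOIN posts p ON p.userpostid = up.userpostid
    WHERE p.last_edit_time = (
        -- newest edit time among this user's posts
        SELECT max(p2.last_edit_time)
        FROM user_posts up2
        JOIN posts p2 ON p2.userpostid = up2.userpostid
        WHERE up2.userid = up.userid
    )
    ORDER BY up.userid
""").fetchall()

print(latest)
```

If two posts of the same user share the same latest edit time, this variant returns both of them, whereas the row_number() variant picks one.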

SQLite3. Select only highest revision using FKs, MAX() and GROUP BY

What you need to know about schema and data:
SELECT * FROM 'income'; -- Returns all 309 rows.
SELECT * FROM 'income' WHERE businessday_revision = 0; -- 308 rows
SELECT * FROM 'income' WHERE businessday_revision = 1; -- 1 row
The businessday table has:
id INTEGER,
revision INTEGER,
....
PRIMARY KEY(id, revision)
The income table has:
id -- integer primary key, quite unimportant I think
businessday_id -- FK
businessday_revision -- FK, when a day is edited, a new revision is created
The foreign key looks like this:
FOREIGN KEY(businessday_id, businessday_revision) REFERENCES businessday(id, revision) ON DELETE CASCADE,
The problem
I want to select incomes only from the latest revision on each day. Which should be 308 rows.
But sadly I'm too dense to figure it out. I've found that I can get all the latest businessday revisions using this:
SELECT id, MAX(revision)
FROM businessday
GROUP BY id;
Is there some way I can use this data to select my incomes? Something along the lines of:
-- Pseudo-code:
SELECT *
FROM income i
WHERE i.businessday_id = businessday.id THAT EXISTS IN
(SELECT id, MAX(revision)
FROM businessday
GROUP BY id);
I obviously have no clue here, please point me in the right direction!
This should work:
SELECT i.*
FROM Income i
INNER JOIN (
SELECT id, MAX(revision) maxrevision
FROM businessDay
GROUP BY id
) t ON i.businessday_id = t.id AND i.businessday_revision = t.maxrevision
How about using join?
SELECT i.*
FROM income i
INNER JOIN
(
SELECT id, MAX(revision) revision
FROM businessday
GROUP BY id
) s ON i.businessday_id = s.id AND
i.businessday_revision = s.revision
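Both answers use the same technique: compute the highest revision per day in a derived table, then join income to it on both key columns. A minimal sketch with Python's sqlite3 module shows the effect on a three-row example; the amount column and the sample rows are made up, since the question elides the real columns.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE businessday (
        id INTEGER,
        revision INTEGER,
        PRIMARY KEY (id, revision)
    );
    CREATE TABLE income (
        id INTEGER PRIMARY KEY,
        businessday_id INTEGER,
        businessday_revision INTEGER,
        amount INTEGER,  -- hypothetical payload column
        FOREIGN KEY (businessday_id, businessday_revision)
            REFERENCES businessday (id, revision) ON DELETE CASCADE
    );
""")
# Day 1 was edited once (revisions 0 and 1); day 2 was never edited.
conn.executemany(
    "INSERT INTO businessday (id, revision) VALUES (?, ?)",
    [(1, 0), (1, 1), (2, 0)],
)
conn.executemany(
    "INSERT INTO income (id, businessday_id, businessday_revision, amount) "
    "VALUES (?, ?, ?, ?)",
    [(1, 1, 0, 100), (2, 1, 1, 120), (3, 2, 0, 50)],
)

latest_income = conn.execute("""
    SELECT i.id, i.businessday_id, i.businessday_revision, i.amount
    FROM income i
    INNER JOIN (
        SELECT id, MAX(revision) AS maxrevision
        FROM businessday
        GROUP BY id
    ) t ON i.businessday_id = t.id AND i.businessday_revision = t.maxrevision
    ORDER BY i.id
""").fetchall()

print(latest_income)
```

The income row attached to revision 0 of day 1 is dropped, which mirrors the 309 to 308 reduction described in the question.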