aggregate functions are not allowed in WHERE - when joining PostgreSQL tables - sql

In a game using PostgreSQL 9.3.10, some players have paid for a "VIP status", which is indicated by the vip column containing a date in the future:
# \d pref_users
Column | Type | Modifiers
------------+-----------------------------+--------------------
id | character varying(32) | not null
first_name | character varying(64) | not null
last_name | character varying(64) |
vip | timestamp without time zone |
Also players can rate other players by setting nice column to true, false or leaving it at null:
# \d pref_rep
Column | Type | Modifiers
-----------+-----------------------------+-----------------------------------------------------------
id | character varying(32) | not null
author | character varying(32) | not null
nice | boolean |
I calculate a "reputation" of VIP-players by issuing this SQL JOIN statement:
# select u.id, u.first_name, u.last_name,
count(nullif(r.nice, false))-count(nullif(r.nice, true)) as rep
from pref_users u, pref_rep r
where u.vip>now()and u.id=r.id group by u.id order by rep asc;
id | first_name | last_name | rep
-------------------------+--------------------------------+--------------------
OK413274501330 | ali | salimov | -193
OK357353924092 | viktor | litovka | -137
DE20287 | sergej warapow |
My question is the following:
How to find all negatively rated players who have rated other players?
(The background is that I have given all VIP players the ability to rate others. Until then, only positively rated players could rate others.)
I have tried the following, but get the error below:
# select count(*) from pref_rep r, pref_users u
where r.author = u.id and u.vip > now() and
u.id in (select id from pref_rep
where (count(nullif(nice, false)) -count(nullif(nice, true))) < 0);
ERROR: aggregate functions are not allowed in WHERE
LINE 1: ...now() and u.id in (select id from pref_rep where (count(null...
^
UPDATE:
I am trying it with temporary table now -
First I fill it with all negatively rated VIP-users and this works well:
# create temp table my_temp as select u.id, u.first_name, u.last_name,
count(nullif(r.nice, false))-count(nullif(r.nice, true)) as rep
from pref_users u, pref_rep r
where u.vip>now() and u.id=r.id group by u.id;
SELECT 362
But then my SQL JOIN returns too many identical rows and I can not find what condition is missing there:
# select u.id, u.first_name, u.last_name
from pref_rep r, pref_users u, my_temp t
where r.author=u.id and u.vip>now()
and u.id=t.id and t.rep<0;
id | first_name | last_name
-------------------------+--------------------------------+----------------------------
OK400153108439 | Vladimir | Pelix
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
OK123283032465 | Edik | Lehtik
I get the same problem (multiple rows with the same data) for this statement:
# select u.id, u.first_name, u.last_name
from pref_rep r, pref_users u
where r.author = u.id and u.vip>now()
and u.id in (select id from my_temp where rep < 0);
I wonder what condition could be missing here?

First of all, I would write your first query as this:
select
u.id, u.first_name, u.last_name,
sum(case
when r.nice=true then 1
when r.nice=false then -1
end) as rep
from
pref_users u inner join pref_rep r on u.id=r.id
where
u.vip>now()
group by
u.id, u.first_name, u.last_name;
(it's the same as yours, except that it yields NULL rather than 0 for a player whose ratings are all NULL, but I find it clearer).
To find negatively rated players, you can use the same query as before, just adding HAVING clause:
having
sum(case
when r.nice=true then 1
when r.nice=false then -1
end)<0
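To see the WHERE vs. HAVING difference in isolation, here is a minimal sketch. It runs against an in-memory SQLite database from Python rather than PostgreSQL (an assumption made purely so the snippet is self-contained), it leaves out the vip filter, and the sample ratings are made up:

```python
import sqlite3

def negatively_rated(ratings):
    """Return (id, rep) pairs for players whose summed rating is below zero."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE pref_rep (id TEXT, author TEXT, nice INTEGER)")
    conn.executemany("INSERT INTO pref_rep VALUES (?, ?, ?)", ratings)
    # WHERE is evaluated per row before grouping, so it cannot see sum();
    # HAVING is evaluated per group after GROUP BY, so it can.
    rows = conn.execute("""
        SELECT id,
               sum(CASE nice WHEN 1 THEN 1 WHEN 0 THEN -1 END) AS rep
        FROM pref_rep
        GROUP BY id
        HAVING sum(CASE nice WHEN 1 THEN 1 WHEN 0 THEN -1 END) < 0
        ORDER BY id
    """).fetchall()
    conn.close()
    return rows

# Hypothetical ratings: (rated player, author, nice); 1 = nice, 0 = not nice, None = unset
sample = [
    ("A", "B", 0), ("A", "C", 0), ("A", "D", 1),
    ("B", "A", 1), ("B", "C", None),
    ("C", "A", 0),
]
print(negatively_rated(sample))
```

Players A and C end up with a negative total here, while B (one positive rating, one unset) is filtered out by HAVING.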
To find negatively rated players who have rated other players, one solution is this:
select
s.id, s.first_name, s.last_name, s.rep
from (
select
u.id, u.first_name, u.last_name,
sum(case
when r.nice=true then 1
when r.nice=false then -1
end) as rep
from
pref_users u inner join pref_rep r on u.id=r.id
where
u.vip>now()
group by
u.id, u.first_name, u.last_name
having
sum(case
when r.nice=true then 1
when r.nice=false then -1
end)<0
) s
where
exists (select * from pref_rep p where p.author = s.id)
Alternatively, the HAVING clause can be removed from the inner query, and you can just use this WHERE clause on the outer query:
where
rep<0
and exists (select * from pref_rep p where p.author = s.id)
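The duplicate rows in the question come from joining pref_rep on author: every rating a player has given produces one output row. EXISTS only asks whether at least one such rating exists. A small sketch of the difference, again using in-memory SQLite from Python as a stand-in for PostgreSQL (an assumption), with made-up data and without the vip filter:

```python
import sqlite3

# Subquery shared by both statements: ids with a negative rating total.
NEGATIVE_IDS = """
    SELECT id FROM pref_rep
    GROUP BY id
    HAVING sum(CASE nice WHEN 1 THEN 1 WHEN 0 THEN -1 END) < 0
"""

def join_vs_exists():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE pref_users (id TEXT PRIMARY KEY);
        CREATE TABLE pref_rep (id TEXT, author TEXT, nice INTEGER);
        INSERT INTO pref_users VALUES ('A'), ('B'), ('C'), ('D');
        -- A is rated negatively twice and has rated three other players.
        INSERT INTO pref_rep VALUES
            ('A', 'B', 0), ('A', 'C', 0),
            ('B', 'A', 1), ('C', 'A', 1), ('D', 'A', 0);
    """)
    # Plain join: one row per rating authored by the player.
    joined = conn.execute(f"""
        SELECT u.id FROM pref_users u
        JOIN pref_rep r ON r.author = u.id
        WHERE u.id IN ({NEGATIVE_IDS})
    """).fetchall()
    # EXISTS: one row per player, however many ratings they authored.
    semi = conn.execute(f"""
        SELECT u.id FROM pref_users u
        WHERE EXISTS (SELECT 1 FROM pref_rep p WHERE p.author = u.id)
          AND u.id IN ({NEGATIVE_IDS})
    """).fetchall()
    conn.close()
    return joined, semi

joined, semi = join_vs_exists()
print(len(joined), len(semi))
```

Player D is also negatively rated in this data but never rated anyone, so neither query returns D.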

You forgot to mention that pref_users.id is defined as PRIMARY KEY - else your first query would not work. It also means that id is already indexed.
The best query largely depends on typical data distribution.
Assuming that:
... most users don't get any negative ratings.
... most users don't vote at all.
... some or many of those who vote do it often.
It would pay to identify the few possible candidates and only calculate the total rating for those to arrive at the final selection - instead of calculating the total for every user and then filtering only few.
SELECT *
FROM ( -- filter candidates in a subquery
SELECT *
FROM pref_users u
WHERE u.vip > now()
AND EXISTS (
SELECT 1
FROM pref_rep
WHERE author = u.id -- at least one rating given
)
AND EXISTS (
SELECT 1
FROM pref_rep
WHERE id = u.id
AND NOT nice -- at least one neg. rating received
)
) u
JOIN LATERAL ( -- calculate total only for identified candidates
SELECT sum(CASE nice WHEN true THEN 1 WHEN false THEN -1 END) AS rep
FROM pref_rep
WHERE id = u.id
) r ON r.rep < 0;
Indexes
Obviously, you need an index on pref_rep.author besides the (also assumed!) PRIMARY KEY indexes on both id columns.
If your tables are big some more advanced indexes will pay.
For one, you only seem to be interested in current VIP users (u.vip > now()). A plain index on vip would go a long way. Or even a partial multicolumn index that includes the id and truncates older tuples from the index:
CREATE INDEX pref_users_index_name ON pref_users (vip, id)
WHERE vip > '2015-04-21 18:00';
Consider details:
Add datetime constraint to a PostgreSQL multi-column partial index
If (and only if) negative votes are a minority, a partial index on pref_rep might also pay:
CREATE INDEX pref_rep_downvote_idx ON pref_rep (id)
WHERE NOT nice;
Test performance with EXPLAIN ANALYZE, and repeat a couple of times to rule out caching effects.

Related

Query to get record id if exists in any table

I have these 4 tables; they store user items.
Each item is unique and can exist in only one table at a time.
Also, I am searching using the serial column.
users:
======
id
name
user_bags:
==========
id
user_id
serial
user_store:
===========
id
user_id
serial
user_storage:
============
id
user_id
serial
I have a list of items and need to check whether each one is in any of the tables, and show the id of the record in that table.
user_name | user_bags | user_store | user_storage |
==================================================================
A | 2390 | | |
------------------------------------------------------------------
B | | 352 | |
------------------------------------------------------------------
A | 5500 | | |
------------------------------------------------------------------
C | | | 6440 |
------------------------------------------------------------------
I tried this:
SELECT
users.name AS user_name,
(SELECT id FROM user_bags WHERE user_bags.serial = 'abc' AND user_bags.user_id = users.id) AS user_bags,
(SELECT id FROM user_storage WHERE user_storage.serial = 'abc' AND user_storage.user_id = users.id) AS user_storage,
(SELECT id FROM user_store WHERE user_store.serial = 'abc' AND user_store.user_id = users.id) AS user_store
FROM
users
How do I write a better (faster) and proper query? I will be looking up several thousand serials and there are millions of records in each table at a time.
Updated: And only show users for which a match was found.
Your query is fine. For performance, you want indexes on (user_id, serial, id) in the three tables used in the subqueries.
With the indexes, a left join would probably have equivalent performance:
select u.name, ub.id as user_bags, us.id as user_storage, ust.id as user_store
from users u left join
user_bags ub
on ub.serial = 'abc' and ub.user_id = u.id left join
user_storage us
on us.serial = 'abc' and us.user_id = u.id left join
user_store ust
on ust.serial = 'abc' and ust.user_id = u.id;
This is about the best I can do with the given information.
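For the "only show users with a match" part of the question, one option is to filter on COALESCE over the three joined ids. This is a sketch under assumptions: in-memory SQLite driven from Python instead of the asker's (unnamed) database, sample rows invented to mirror the table in the question, and a hypothetical helper name find_serial:

```python
import sqlite3

def find_serial(serial):
    """Return (user_name, user_bags, user_store, user_storage) for users holding the serial."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users        (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE user_bags    (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
        CREATE TABLE user_store   (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
        CREATE TABLE user_storage (id INTEGER PRIMARY KEY, user_id INTEGER, serial TEXT);
        INSERT INTO users VALUES (1, 'A'), (2, 'B'), (3, 'C');
        INSERT INTO user_bags    VALUES (2390, 1, 'abc');
        INSERT INTO user_store   VALUES (352,  2, 'def');
        INSERT INTO user_storage VALUES (6440, 3, 'ghi');
    """)
    rows = conn.execute("""
        SELECT u.name, ub.id, ust.id, us.id
        FROM users u
        LEFT JOIN user_bags    ub  ON ub.user_id  = u.id AND ub.serial  = :s
        LEFT JOIN user_store   ust ON ust.user_id = u.id AND ust.serial = :s
        LEFT JOIN user_storage us  ON us.user_id  = u.id AND us.serial  = :s
        -- keep only users for whom at least one table produced a match
        WHERE COALESCE(ub.id, ust.id, us.id) IS NOT NULL
    """, {"s": serial}).fetchall()
    conn.close()
    return rows

print(find_serial("abc"))
print(find_serial("def"))
```

The serial conditions stay in the ON clauses so that the joins remain outer joins; only the final COALESCE test removes users without any match.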
Let me restructure / standardize this a little for you.
SELECT users.name AS [user_name],
user_bags.id AS user_bags,
user_storage.id AS user_storage,
user_store.id AS user_store
FROM users
LEFT JOIN user_store ON user_store.user_id = users.id
LEFT JOIN user_bags ON user_bags.user_id = users.id AND user_bags.serial = user_store.serial
LEFT JOIN user_storage ON user_storage.user_id = users.id AND user_storage.serial = user_store.serial
WHERE user_store.serial = 'abc'
Make sure that you have covering indexes:
CREATE INDEX IX_users_ID ON Users ( ID )
CREATE INDEX IX_user_store_ID_Serial ON user_store ( ID, serial )
CREATE INDEX IX_user_bags_ID_Serial ON user_bags ( ID, serial )
CREATE INDEX IX_user_storage_ID_Serial ON user_storage ( ID, serial )
There are ways to improve performance from here but I would need to know the tables, see query plans and know their respective record counts.

Query sql to get the first occurrence in a many to many relationship

I have a User table that has a many to many relationship with Areas. This relationship is stored in the Rel_User_area table. I want to show the user name and the first area that appears in the list of areas.
Ex.
User
id | Name
1 | Peter
2 | Joe
Area
id | Name
1 | Area A
2 | Area B
3 | Area C
Rel_User_area
iduser | idarea
1 | 1
1 | 3
2 | 3
The result I want:
User Name | Area
Peter |Area A
Joe |Area C
Using the minimum area id to determine "First" you could use a correlated subquery (A subquery that refers to field(s) in the main query to filter results):
SELECT user.name, area.name
FROM
user
INNER JOIN Rel_User_Area RUA ON user.id = RUA.iduser
INNER JOIN Area ON RUA.idarea = area.id
WHERE area.id = (SELECT min(idarea) FROM Rel_User_Area WHERE iduser = RUA.iduser)
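A quick way to check the correlated subquery against the sample data from the question. This sketch uses in-memory SQLite from Python (an assumption; the question does not name its RDBMS, and `user` is a reserved word in some systems, so the table name may need quoting or renaming elsewhere):

```python
import sqlite3

def first_area_per_user():
    """Return (user name, area name) using the lowest area id per user."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE User (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Area (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE Rel_User_Area (iduser INTEGER, idarea INTEGER);
        INSERT INTO User VALUES (1, 'Peter'), (2, 'Joe');
        INSERT INTO Area VALUES (1, 'Area A'), (2, 'Area B'), (3, 'Area C');
        INSERT INTO Rel_User_Area VALUES (1, 1), (1, 3), (2, 3);
    """)
    rows = conn.execute("""
        SELECT User.name, Area.name
        FROM User
        JOIN Rel_User_Area RUA ON User.id = RUA.iduser
        JOIN Area ON RUA.idarea = Area.id
        -- the subquery is re-evaluated for each outer row's user
        WHERE Area.id = (SELECT min(idarea)
                         FROM Rel_User_Area
                         WHERE iduser = RUA.iduser)
        ORDER BY User.id
    """).fetchall()
    conn.close()
    return rows

print(first_area_per_user())
```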
There are other ways of doing this that are RDBMS-specific. In Teradata, for example, I would use a QUALIFY clause, which doesn't exist in MySQL, SQL Server, Oracle, Postgres, etc. The correlated subquery above should work regardless of the RDBMS. The Teradata version:
SELECT user.name, area.name
FROM
user
INNER JOIN Rel_User_Area RUA ON user.id = RUA.iduser
INNER JOIN Area ON RUA.idarea = area.id
QUALIFY ROW_NUMBER() OVER (PARTITION BY user.id ORDER BY area.id ASC) = 1;
using the ID from Rel_user_Area you mentioned in comments...
This should be pretty platform independent.
SELECT U.name as Username, A.Name as Area
FROM (SELECT min(ID) minID, IDUser
FROM Rel_user_Area
GROUP BY IDUser) M
INNER JOIN Rel_user_Area UA
on UA.ID = M.minID
INNER JOIN User U
on U.ID = UA.IDuser
INNER JOIN Area A
on A.ID = UA.IDArea
If CROSS APPLY and TOP work (you could substitute LIMIT 1 for TOP in PostgreSQL or MySQL):
This will run the CROSS APPLY subquery once for each record in User; thus you get the first (lowest) Rel_user_Area ID per user.
SELECT U.name as Username, A.Name as Area
FROM User U
CROSS APPLY (SELECT TOP 1 IDUser, IDArea
FROM Rel_user_Area z
WHERE Z.IDUSER = U.ID
ORDER BY ID ASC) UA
INNER JOIN Area A
on A.ID = UA.IDArea

selecting records from parent / child tables

I have parent / child tables that look like this:
CREATE TABLE users(
id integer primary key AUTOINCREMENT,
pnum varchar(10),
dloc varchar(100),
cc varchar(10),
name varchar(255),
group active bit(1)
);
CREATE TABLE group_members(
id integer primary key AUTOINCREMENT,
group_id integer,
member_id integer,
FOREIGN KEY (group_id) REFERENCES users(id),
FOREIGN KEY (member_id) REFERENCES users(id)
);
Users Data looks like:
ID PNUM DLOC CC NAME GRP
86|23101|dloc 89| | |0
87|23101|dloc 90| | |0
88|23102|dloc 91| | |0
590|12345|Group | |Test Group|1
591|90000|dloc 1 | | |0
group_members data looks like:
ID GROUP_ID MEMBER_ID
1 |590 | 87
2 |590 | 88
Based on the PNUM, I would like to be able to get the dloc values for all users, whether it's a group or not.
So for example, if someone requests pnum 23101, I would like to get back
"dloc 89" and
"dloc 90"
But if they request 12345, I would like to get back
"dloc 90", and
"dloc 91"
So far, I have come up with this query:
SELECT users.dloc
FROM users
WHERE users.id IN
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
That works for groups, but it won't return any results if I run the same query with pnum 23101.
What I've tried so far
I tried to see if I could use OR like so:
SELECT users.dloc
FROM users
WHERE users.id in
(SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='12345');
OR
(SELECT users.dloc
FROM users
WHERE pnum='12345')
Any suggestions would be appreciated.
You may use a UNION to execute both queries, one of which matches directly on PNUM for values where GROUP <> 1 and the other which matches on PNUM and joins through group_members for values where GROUP = 1.
In the second part, you need to join twice to users to get the members of the group matching the original PNUM.
Since the GROUP condition is opposite in each, only one part of the UNION will ever return results.
/* First part of UNION directly matches PNUM for GROUP = 0 */
SELECT dloc
FROM users
WHERE
PNUM = 23101
AND `group` <> 1
UNION
/**
Second part of UNION matches GROUP = 1
and joins through group_members back to users
to get member dloc (from the second users join)
*/
SELECT uu.dloc
FROM users u
INNER JOIN group_members m ON u.`GROUP` = 1 AND u.id = m.group_id
INNER JOIN users uu ON m.member_id = uu.id
WHERE
u.PNUM = 23101
This does unfortunately require placing the PNUM value twice in the query, once per UNION part, but that isn't so bad.
Here it is in action (using MySQL rather than SQLite, but that doesn't really matter)
Using your original method with OR and an IN() subquery, it can also be done, but I've added WHERE conditions for GROUP = 0 and GROUP = 1.
SELECT users.dloc
FROM users
WHERE users.id in (
SELECT group_members.member_id
FROM group_members
INNER JOIN users on users.id = group_members.group_id
WHERE users.pnum='23101' AND `GROUP` = 1
)
OR users.id IN (
SELECT users.ID
FROM users
WHERE pnum='23101' AND `GROUP` = 0
);
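To try the UNION approach on the sample rows from the question, here is a sketch using Python's sqlite3 module with an in-memory database. The schema is trimmed to the columns the query needs, the flag column is assumed to be named "group" (quoted, since it is a keyword), and dlocs_for_pnum is a hypothetical helper name:

```python
import sqlite3

def dlocs_for_pnum(pnum):
    """Return dloc values for a pnum, expanding groups to their members."""
    conn = sqlite3.connect(":memory:")
    conn.executescript('''
        CREATE TABLE users (id INTEGER PRIMARY KEY, pnum TEXT, dloc TEXT, "group" INTEGER);
        CREATE TABLE group_members (id INTEGER PRIMARY KEY, group_id INTEGER, member_id INTEGER);
        INSERT INTO users VALUES
            (86,  '23101', 'dloc 89', 0),
            (87,  '23101', 'dloc 90', 0),
            (88,  '23102', 'dloc 91', 0),
            (590, '12345', 'Group',   1),
            (591, '90000', 'dloc 1',  0);
        INSERT INTO group_members VALUES (1, 590, 87), (2, 590, 88);
    ''')
    rows = conn.execute('''
        -- direct match for non-group rows
        SELECT dloc FROM users WHERE pnum = :p AND "group" <> 1
        UNION
        -- group rows: follow group_members back to the member users
        SELECT uu.dloc
        FROM users u
        JOIN group_members m ON u."group" = 1 AND u.id = m.group_id
        JOIN users uu ON uu.id = m.member_id
        WHERE u.pnum = :p
        ORDER BY dloc
    ''', {"p": pnum}).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(dlocs_for_pnum("23101"))
print(dlocs_for_pnum("12345"))
```

Binding the value as a named parameter (:p) means it only has to be supplied once, even though it appears in both halves of the UNION.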
And here's the alternative method in action...

Find rows that have same value in one column and other values in another column?

I have a PostgreSQL database that stores users in a users table and conversations they take part in a conversation table. Since each user can take part in multiple conversations and each conversation can involve multiple users, I have a conversation_user linking table to track which users are participating in each conversation:
# conversation_user
id | conversation_id | user_id
----+------------------+--------
1 | 1 | 32
2 | 1 | 3
3 | 2 | 32
4 | 2 | 3
5 | 2 | 4
In the above table, user 32 is having one conversation with just user 3 and another with both 3 and user 4. How would I write a query that would show that there is a conversation between just user 32 and user 3?
I've tried the following:
SELECT conversation_id AS cid,
user_id
FROM conversation_user
GROUP BY cid HAVING count(*) = 2
AND (user_id = 32
OR user_id = 3);
SELECT conversation_id AS cid,
user_id
FROM conversation_user
GROUP BY (cid HAVING count(*) = 2
AND (user_id = 32
OR user_id = 3));
SELECT conversation_id AS cid,
user_id
FROM conversation_user
WHERE (user_id = 32)
OR (user_id = 3)
GROUP BY cid HAVING count(*) = 2;
These queries throw an error that says that user_id must appear in the GROUP BY clause or be used in an aggregate function. Putting them in an aggregate function (e.g. MIN or MAX) doesn't sound appropriate. I thought that my first two attempts were putting them in the GROUP BY clause.
What am I doing wrong?
This is a case of relational division. We have assembled an arsenal of techniques under this related question:
How to filter SQL results in a has-many-through relation
The special difficulty is to exclude additional users. There are basically 4 techniques.
Select rows which are not present in other table
I suggest LEFT JOIN / IS NULL:
SELECT cu1.conversation_id
FROM conversation_user cu1
JOIN conversation_user cu2 USING (conversation_id)
LEFT JOIN conversation_user cu3 ON cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
WHERE cu1.user_id = 32
AND cu2.user_id = 3
AND cu3.conversation_id IS NULL;
Or NOT EXISTS:
SELECT cu1.conversation_id
FROM conversation_user cu1
JOIN conversation_user cu2 USING (conversation_id)
WHERE cu1.user_id = 32
AND cu2.user_id = 3
AND NOT EXISTS (
SELECT 1
FROM conversation_user cu3
WHERE cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
);
Both queries do not depend on a UNIQUE constraint for (conversation_id, user_id), which may or may not be in place. Meaning, the query even works if user_id 32 (or 3) is listed more than once for the same conversation. You would get duplicate rows in the result, though, and need to apply DISTINCT or GROUP BY.
The only condition is the one you formulated:
... a query that would show that there is a conversation between just user 32 and user 3?
Audited query
The query you linked in the comment wouldn't work. You forgot to exclude other participants. Should be something like:
SELECT * -- or whatever you want to return
FROM conversation_user cu1
WHERE cu1.user_id = 32
AND EXISTS (
SELECT 1
FROM conversation_user cu2
WHERE cu2.conversation_id = cu1.conversation_id
AND cu2.user_id = 3
)
AND NOT EXISTS (
SELECT 1
FROM conversation_user cu3
WHERE cu3.conversation_id = cu1.conversation_id
AND cu3.user_id NOT IN (3,32)
);
Which is similar to the other two queries, except that it will not return multiple rows if user_id = 3 is linked multiple times.
You can use conditional aggregation to select all conversation ids that have only the 2 specific participants:
select conversation_id as cid from conversation_user
group by conversation_id
having count(*) = 2
and count(case when user_id not in (32,3) then 1 end) = 0
If (conversation_id, user_id) is not unique, then replace having count(*) = 2 with having count(distinct user_id) = 2.
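The conditional-aggregation variant is easy to verify against the sample rows from the question. A sketch, assuming in-memory SQLite driven from Python instead of PostgreSQL, using the count(distinct ...) form and a hypothetical helper name:

```python
import sqlite3

def conversations_between(user_a, user_b):
    """Return ids of conversations whose only participants are user_a and user_b."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE conversation_user (
            id INTEGER PRIMARY KEY, conversation_id INTEGER, user_id INTEGER);
        INSERT INTO conversation_user VALUES
            (1, 1, 32), (2, 1, 3), (3, 2, 32), (4, 2, 3), (5, 2, 4);
    """)
    rows = conn.execute("""
        SELECT conversation_id
        FROM conversation_user
        GROUP BY conversation_id
        -- exactly two different participants ...
        HAVING count(DISTINCT user_id) = 2
        -- ... and none of them is anybody else
           AND count(CASE WHEN user_id NOT IN (?, ?) THEN 1 END) = 0
        ORDER BY conversation_id
    """, (user_a, user_b)).fetchall()
    conn.close()
    return [r[0] for r in rows]

print(conversations_between(32, 3))
```

Conversation 2 is rejected because user 4 also takes part in it.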
If you just want confirmation:
select conversation_id
from conversation_user
group by conversation_id
having bool_and ( user_id in (3,32))
and count(*) = 2;
If you want full details, you can use window functions and a CTE like this:
with a as (
select *
,bool_and( user_id in (3,32) )
over ( partition by conversation_id)
and 2 = count(user_id)
over ( partition by conversation_id)
as conv_candidates
from conversation_user
)
select * from a where conv_candidates;
Because you want conversations with just 2 users, you can use a self outer join on other users and filter out hits.
To find all 2-user conversations and who they're between:
SELECT
a.conversation_id cid,
a.user_id user_id_1,
b.user_id user_id_2
FROM conversation_user a
JOIN conversation_user b ON b.conversation_id = a.conversation_id
AND b.user_id > a.user_id
LEFT JOIN conversation_user c ON c.conversation_id = a.conversation_id
AND c.user_id NOT IN (a.user_id, b.user_id)
WHERE c.conversation_id IS NULL -- only return misses on join to others
To find all 2-user conversations for a particular user, just add (the user may be on either side of the pair, because of b.user_id > a.user_id):
AND 32 IN (a.user_id, b.user_id)

Too many sub-queries, that I am already confused and I am still missing one column - Oracle

I have the following query that has no errors:
SELECT u.user_name, u.user_lastn, outer_s.movie_id, outer_s.times_rented
FROM users u,
(
SELECT * FROM
(
SELECT user_id, movie_id, count (movie_id) as times_rented
FROM movie_queue
GROUP BY (user_id, movie_id)
ORDER BY user_id, movie_id
) inner_s
WHERE times_rented>1
) outer_s
WHERE u.user_id= outer_s.user_id;
This is what it returns:
USER_NAME USER_LASTN MOVIE_ID TIMES_RENTED
------------------------ ------------------------ ---------- ------------
John Smith 1 3
John Smith 6 2
Mary Berman 4 2
Mary Berman 6 4
Elizabeth Johnson 1 2
Peter Quigley 2 2
What I still need to do is to show the name of the movie, instead of the movie_id, but
the name of the movies are located in another table named movies that is similar to the
following sample:
MOVIE_ID MOVIE_NAME
---------- ---------------------------------------------
1 E.T. the Extra-Terrestrial
2 Jurassic Park
3 Indiana Jones and the Kingdom of the Crystal
4 War of the Worlds
5 Signs
Desired result:
What I want to see in the final table are the following columns:
USER_NAME | USER_LASTN | MOVIE_NAME | TIMES_RENTED |
Question:
But after all the many subqueries I am very confused, how can I get the movie_name there instead of the movie_id?
Attempted:
I tried getting the desired result by changing the query to
SELECT u.user_name, u.user_lastn, m.movie_name, outer_s.times_rented
FROM users u, movie m (etc.....)
But It returned 120 rows instead of the 6 I should get.
Help please!!
SELECT u.user_name, u.user_lastn, m.movie_name, COUNT(q.movie_id)
FROM users u
JOIN movie_queue q ON q.user_id = u.user_id
JOIN movies m ON m.movie_id = q.movie_id
GROUP BY u.user_name, u.user_lastn, m.movie_name
HAVING COUNT(q.movie_id) > 1
You just need to join the results of your query to the movies table. However, first, I'm going to rewrite the query to simplify it and use proper join syntax:
SELECT u.user_name, u.user_lastn, m.movie_name, outer_s.movie_id, outer_s.times_rented
FROM users u join
(SELECT user_id, movie_id, count (movie_id) as times_rented
FROM movie_queue
GROUP BY (user_id, movie_id)
having count (movie_id) > 1
) outer_s
on u.user_id= outer_s.user_id join
movies m
on outer_s.movie_id = m.movie_id
Or you could use CTEs to make the query more readable:
WITH outer_s as (SELECT user_id, movie_id, count (movie_id) as times_rented
FROM movie_queue
GROUP BY (user_id, movie_id)
having count (movie_id) > 1
)
SELECT u.user_name, u.user_lastn, m.movie_name, outer_s.movie_id, outer_s.times_rented FROM users u join outer_s
on u.user_id= outer_s.user_id join
movies m
on outer_s.movie_id = m.movie_id
Using a CTE offers the advantages of improved readability and ease in maintenance of complex queries. The query can be divided into separate, simple, logical building blocks. These simple blocks can then be used to build more complex, interim CTEs until the final result set is generated.
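As a rough, runnable illustration of the CTE version, the sketch below builds a tiny data set in an in-memory SQLite database from Python. SQLite stands in for Oracle here (an assumption), the rental rows are invented, and the CTE is named repeats rather than outer_s:

```python
import sqlite3

def repeat_rentals():
    """Return (user_name, user_lastn, movie_name, times_rented) for movies rented more than once."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE users (user_id INTEGER PRIMARY KEY, user_name TEXT, user_lastn TEXT);
        CREATE TABLE movies (movie_id INTEGER PRIMARY KEY, movie_name TEXT);
        CREATE TABLE movie_queue (user_id INTEGER, movie_id INTEGER);
        INSERT INTO users VALUES (1, 'John', 'Smith'), (2, 'Mary', 'Berman');
        INSERT INTO movies VALUES
            (1, 'E.T. the Extra-Terrestrial'), (2, 'Jurassic Park'), (4, 'War of the Worlds');
        INSERT INTO movie_queue VALUES (1, 1), (1, 1), (1, 1), (1, 2), (2, 4), (2, 4);
    """)
    rows = conn.execute("""
        WITH repeats AS (
            -- building block 1: per-user, per-movie counts above 1
            SELECT user_id, movie_id, count(movie_id) AS times_rented
            FROM movie_queue
            GROUP BY user_id, movie_id
            HAVING count(movie_id) > 1
        )
        -- building block 2: decorate the counts with names
        SELECT u.user_name, u.user_lastn, m.movie_name, r.times_rented
        FROM users u
        JOIN repeats r ON r.user_id = u.user_id
        JOIN movies m ON m.movie_id = r.movie_id
        ORDER BY u.user_id, m.movie_id
    """).fetchall()
    conn.close()
    return rows

for row in repeat_rentals():
    print(row)
```

Joining movies on movie_id is what keeps the row count at one per (user, movie) pair; listing the table in FROM without a join condition is what produced the 120 rows in the question.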