Postgresql IN statement - sql

Why is this query deleting all users when the user_id column does not exist in the profile table?
The initial query:
DELETE FROM users
WHERE user_id IN (
SELECT user_id FROM profile
)
AND user_id != 35
AND user_id != 6;
To be more specific, running the subquery from the IN clause on its own returns:
SELECT user_id FROM profile;
ERROR: column "user_id" does not exist
LINE 1: SELECT user_id FROM profile;
Running the initial query returns
DELETE 100
Why is that happening?
Shouldn't it have stopped execution on the error, or returned false or null?
Running PostgreSQL Server 9.0

This behavior is correct per ANSI standards.
If the unqualified column name doesn't resolve in the inner scope, the outer scope is considered. So you are effectively running an unintentional correlated subquery: the inner user_id refers to users.user_id.
As long as the table profile contains at least one row then
FROM users
WHERE user_id IN (
SELECT user_id FROM profile
)
will end up matching all rows in users (except any where users.user_id IS NULL, because NULL IN (NULL) does not evaluate to true). To avoid this issue, use two-part (table-qualified) column names:
DELETE FROM users
WHERE user_id IN (SELECT p.user_id
FROM profile p)
That would give the error:
column p.user_id does not exist
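The scoping rule described above can be reproduced outside PostgreSQL. The following is a minimal sketch using Python's built-in sqlite3 module as a stand-in. It assumes SQLite resolves unqualified names in subqueries the same way; the sample rows and the profile_id column are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE users (user_id INTEGER);
    CREATE TABLE profile (profile_id INTEGER);  -- hypothetical column; note: no user_id here
    INSERT INTO users VALUES (1), (2), (6), (35);
    INSERT INTO profile VALUES (10);
""")

# Unqualified user_id in the subquery silently resolves to users.user_id,
# so the IN test is true for every non-NULL row of users.
conn.execute("""
    DELETE FROM users
    WHERE user_id IN (SELECT user_id FROM profile)
      AND user_id != 35
      AND user_id != 6
""")
remaining = [row[0] for row in conn.execute("SELECT user_id FROM users ORDER BY user_id")]

# Qualifying the column with the table alias turns the mistake into an error.
try:
    conn.execute("DELETE FROM users WHERE user_id IN (SELECT p.user_id FROM profile p)")
    qualified_error = None
except sqlite3.OperationalError as exc:
    qualified_error = str(exc)
```

With these sample rows, only the two explicitly excluded users survive the first DELETE, while the qualified version fails before touching any data.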

Related

PostgreSQL subquery (illogical) bug in select statement with join equivalent working? [duplicate]

I have a funny bug with Postgres 9.5.1.
I have two tables that contain related data: contacts (id, name) and jobs (id, contact_id, name).
I'm not sure of the validity of this query (given the curious behavior explained just after).
-- get unassigned contacts
select * from contacts where id not in (select contact_id from jobs);
Edit: The following case is how I tried to analyze the issue. See the end of the post and the comments for why the query is not correct.
When testing with contact id=20, which has no job, I got a result that looked strange to me (a notable difference between a subquery and its join equivalent).
First, I need to assert some prerequisites (step A).
Next, I show the result with join (step B).
Finally, I show the result using a subquery (step D).
(Step C is the complementary query of D and is only here to highlight what I found strange.)
A-0. check that there is data in both tables : OK
select count(distinct id) from contacts;
--> returns 10100
select count(distinct id) from jobs;
--> returns 12000
select count(distinct id) from contacts where id in (select contact_id from jobs);
--> returns 10000
A-1. get name in table contacts for id=20 : OK
select name from contacts where id=20;
--> returns "NAME"
A-3. check contact id=20 is NOT in table jobs : OK
select id from jobs where contact_id=20;
--> returns nothing (0 row)
B. get name and (null) job id for contact id=20 with join : OK
select c.id, c.name, j.id
from contacts c
left join jobs j
on j.contact_id=c.id
where c.id=20;
--> returns 20, "NAME", <NULL>
C. get contact id=20 only if it is assigned in jobs : OK
select name from contacts where id in (select contact_id from jobs) and id=20;
--> returns nothing (0 row); (that's the expected result)
D. get contact id=20 only if it is NOT assigned in jobs : KO
select name from contacts where id not in (select contact_id from jobs) and id=20;
--> returns nothing (0 row); (that's not the expected result - "NAME")
Funny conclusion
C and D queries got the same results.
In logical terms, this could mean that in pgsql:
id NOT IN (..values..) == id IN (..values..)
FALSE == TRUE
Can a "Postgres guru" find me a nice explanation or should I call the FBI?
Epilogue (following the answers)
My query
select * from contacts where id not in (select contact_id from jobs);
was not correct, because NOT IN can never return true when the subquery yields a NULL. It is therefore not the right construct to test for the (non)existence of a value.
See NULL values inside NOT IN clause.
The correct query is the following :
-- to get unassigned contacts
select * from contacts c where not exists (select 1 from jobs where contact_id=c.id);
For a specified id :
select * from contacts c where not exists (select 1 from jobs where contact_id=c.id) and id=20;
This query works too :
select * from contacts where id not in (select contact_id from jobs where contact_id is not null);
What you are seeing is a null-safety problem. If any value returned by the not in subquery is null, the predicate can never evaluate to true, whatever the other values are. We say that not in is not null-safe.
Imagine that the subquery returns: (1, 2, null). The not in condition becomes:
id <> 1 and id <> 2 and id <> null
For id = 20, the first two conditions evaluate as true, but the last one is unknown, which contaminates the whole predicate, which in turn evaluates to unknown. As a consequence, all rows are filtered out.
This is one of the reasons why the use of not in is usually discouraged. You can simply rewrite this with not exists:
select name
from contacts c
where c.id = 20 and not exists(select 1 from jobs j where j.contact_id = c.id);
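The difference between the three forms is easy to check on a tiny data set. This is a sketch only, using Python's sqlite3 module in place of Postgres (an assumption: both follow the same three-valued logic for NOT IN); the rows are invented to mimic the situation in the question, with one job whose contact_id is NULL.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE contacts (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE jobs (id INTEGER PRIMARY KEY, contact_id INTEGER, name TEXT);
    INSERT INTO contacts VALUES (1, 'ASSIGNED'), (20, 'NAME');
    -- the second job has no contact: this NULL is what breaks NOT IN
    INSERT INTO jobs VALUES (1, 1, 'job with contact'), (2, NULL, 'job without contact');
""")

def names(sql):
    """Run a query and return its first column as a list."""
    return [row[0] for row in conn.execute(sql)]

# Plain NOT IN: the NULL in the subquery makes the predicate unknown for id 20.
not_in = names(
    "SELECT name FROM contacts WHERE id NOT IN (SELECT contact_id FROM jobs) AND id = 20"
)

# NOT IN with NULLs filtered out of the subquery.
not_in_filtered = names(
    "SELECT name FROM contacts "
    "WHERE id NOT IN (SELECT contact_id FROM jobs WHERE contact_id IS NOT NULL) AND id = 20"
)

# NOT EXISTS: null-safe by construction.
not_exists = names(
    "SELECT name FROM contacts c "
    "WHERE c.id = 20 AND NOT EXISTS (SELECT 1 FROM jobs j WHERE j.contact_id = c.id)"
)
```

Only the first form loses the unassigned contact; the other two return it.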

Selectively retrieve data from tables when one record in first table is linked to multiple records in second table

I have 2 tables:
1. Tbl_Master: columns:
a. SEQ_id
b. M_Email_id
c. M_location_id
d. Del_flag
2. Tbl_User: columns
a. U_email_id
b. Last_logged_date
c. User_id
The first table is the master table. It has unique rows, i.e. a single record for each user in the system.
Each user can be uniquely identified by the email id in each table.
One user can have multiple profiles, which means that for one U_email_id value there can be many User_id values in Tbl_User,
i.e. there can be multiple entries in the second table for each user.
Now I have to select only those users whose last login was before 2012, i.e. before 1-Jan-2012.
But if one user has 2 or more User_id values, and one of them has a Last_logged_date before 2012
while another has one in 2012 or later, then that user should be ignored.
Finally, all the resulting users will be marked for deletion by setting Del_flag in the master table to ‘Yes’.
For example:
Record in Tbl_Master:
A123 ram#abc.com D234 No
A123 john#abc.com D256 No
Records in Tbl_User can look like:
ram#abc.com '11-Dec-2011' Ram1
ram#abc.com '05-Apr-2014' Ram2
john#abc.com '15-Dec-2010' John1
In such a case only John's record should be selected, not Ram's, because one of Ram's profiles has last_logged_date > 1-Jan-2012.
Another possibility was
SELECT
m.M_Email_id,
MAX(u.Last_logged_date) AS last_login
FROM
Tbl_Master m
INNER JOIN
Tbl_User u on u.U_email_id = m.M_Email_id
GROUP BY m.M_Email_id
HAVING
-- Year(MAX(u.Last_logged_date)) < 2012 -- use the appropriate function of your DBMS
EXTRACT(YEAR FROM(MAX(u.Last_logged_date))) < 2012 -- should be the version for oracle
-- see http://docs.oracle.com/cd/B14117_01/server.101/b10759/functions045.htm#i1017161
Your UPDATE operation can use this select in the WHERE clause.
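As a rough sketch of how that select could drive the UPDATE, here is the idea run against the sample data from the question with Python's sqlite3 module. Several things are assumptions for the sake of a runnable example: SQLite stands in for the real DBMS, dates are stored as ISO strings so that plain string comparison orders them correctly, and the year test is written as a comparison against '2012-01-01' instead of EXTRACT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Tbl_Master (SEQ_id TEXT, M_Email_id TEXT, M_location_id TEXT, Del_flag TEXT);
    CREATE TABLE Tbl_User (U_email_id TEXT, Last_logged_date TEXT, User_id TEXT);
    INSERT INTO Tbl_Master VALUES
        ('A123', 'ram#abc.com',  'D234', 'No'),
        ('A123', 'john#abc.com', 'D256', 'No');
    -- dates rewritten as ISO strings (assumption) so they compare correctly as text
    INSERT INTO Tbl_User VALUES
        ('ram#abc.com',  '2011-12-11', 'Ram1'),
        ('ram#abc.com',  '2014-04-05', 'Ram2'),
        ('john#abc.com', '2010-12-15', 'John1');
""")

# Flag a master row only when the user's most recent login, across all
# profiles, is before 2012.
conn.execute("""
    UPDATE Tbl_Master
    SET Del_flag = 'Yes'
    WHERE M_Email_id IN (
        SELECT U_email_id
        FROM Tbl_User
        GROUP BY U_email_id
        HAVING MAX(Last_logged_date) < '2012-01-01'
    )
""")
flags = dict(conn.execute("SELECT M_Email_id, Del_flag FROM Tbl_Master"))
```

Because the test is on the maximum date per email, Ram's 2014 login keeps his row unflagged even though one of his profiles is older.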
Try this. This answer is for SQL Server; I haven't worked with Oracle.
select * from Tbl_Master
outer apply
(
select U_email_id,max(Last_logged_date)as LLogged,count(U_email_id) as RecCount
from Tbl_User
where Tbl_User.U_email_id = Tbl_Master.M_Email_id
group by U_email_id
)as a
-- filter on the latest login only; requiring a minimum record count would drop single-profile users such as John
where Year(LLogged) < 2012

postgresql :: retrieve row matching exact condition or number of rows matching part of condition

I have table "permission" like
| user_id | object_id | readable | writable |
I need to find out whether the given object_id is accessible to the current user_id, under the following rules:
if there are no records for the object_id at all, return true
if there is a record for the object_id but for a different user_id, and no record for the given user_id, return false
if there is a record for the given user_id and object_id, check it against the provided readable and writable conditions
I'm not sure whether it's possible to build an SQL query for this that doesn't involve nested queries. So far I have come up with:
select (
select count(*)
from permission
where
object_id = 123456
) = 0 OR (
select readable = true and writable = false
from permission
where user_id=1 and object_id=123456
)
Is there a more elegant solution?
Thanks!
Try:
SELECT count(*) as total_count, -- count of permission records for the object
bool_or(user_id = 1 AND readable) as user_readable, -- user has read permission; false if only other users have records, null if the object has no records at all
bool_or(user_id = 1 AND writable) as user_writable -- user has write permission; same false/null behaviour
FROM permission
WHERE object_id = 123456
Then build your logic cases from this query like:
SELECT total_count = 0 OR user_readable as readable,
total_count = 0 OR user_writable as writable
FROM (first select here) AS t
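The aggregate approach above can be tried end to end with a small sketch. It uses Python's sqlite3 module, which has no bool_or, so MAX over 0/1 integers is swapped in as an equivalent (an assumption of this example, not part of the answer). The access helper and the sample rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE permission (user_id INTEGER, object_id INTEGER, readable INTEGER, writable INTEGER);
    INSERT INTO permission VALUES (1, 100, 1, 0);  -- user 1 may read, but not write, object 100
    INSERT INTO permission VALUES (2, 200, 1, 1);  -- only user 2 has a record for object 200
""")

def access(user_id, object_id):
    """Return (readable, writable) for the user under the three rules."""
    row = conn.execute(
        """
        SELECT COUNT(*) = 0 OR MAX(user_id = ? AND readable),
               COUNT(*) = 0 OR MAX(user_id = ? AND writable)
        FROM permission
        WHERE object_id = ?
        """,
        (user_id, user_id, object_id),
    ).fetchone()
    # An aggregate query without GROUP BY always yields exactly one row.
    return bool(row[0]), bool(row[1])

no_records = access(1, 999)       # object without any permission rows
other_user_only = access(1, 200)  # rows exist, but none for user 1
own_record = access(1, 100)       # user 1 has a row of its own
```

The three calls cover the three rules from the question in order: open access when nothing is recorded, denial when only other users are recorded, and the stored flags otherwise.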
select true as readable, true as writable
where not exists (select 1 from permission where object_id = 123456)
union all
select readable, writable
from permission
where user_id = 1 and object_id = 123456
Note you have disjoint possibilities:
No permissions exist for the object
Permissions exist for the object:
    Permission exists for the user
    No permission exists for the user
That is, if no permissions exist then return true. Otherwise return the permission for the user. If there are permissions that exist but not for this user, return nothing (and you could make the query explicitly return "false, false" in this case, but why waste effort?)
Also, this assumes that (object_id, user_id) is a unique key for the permission table.
You could do it like this:
select
case
when (not exists (select object_id from permission where object_id=123456)) then true
when (not exists (select * from permission where object_id=123456 and user_id=1)) then false
else <check your conditions here>
end;
It is slightly less efficient than araqnid's query, but could be more readable.

PostgreSQL where all in array

What is the easiest and fastest way to achieve a clause where all elements in an array must be matched - not only one when using IN? After all it should behave like mongodb's $all.
Thinking about group conversations where conversation_users is a join table between conversation_id and user_id I have something like this in mind:
WHERE (conversations_users.user_id ALL IN (1,2))
UPDATE 16.07.12
Adding more info about schema and case:
The join-table is rather simple:
Table "public.conversations_users"
Column | Type | Modifiers | Storage | Description
-----------------+---------+-----------+---------+-------------
conversation_id | integer | | plain |
user_id | integer | | plain |
A conversation has many users and a user belongs to many conversations. In order to find all users in a conversation I am using this join table.
In the end I am trying to figure out a Ruby on Rails scope that finds me a conversation depending on its participants, e.g.:
scope :between, ->(*users) {
joins(:users).where('conversations_users.user_id all in (?)', users.map(&:id))
}
UPDATE 23.07.12
My question is about finding an exact match of people. Therefore:
Conversation between (1,2,3) won't match if querying for (1,2)
Assuming the join table follows good practice and has a unique compound key defined, i.e. a constraint to prevent duplicate rows, then something like the following simple query should do.
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
It's important to note that the number 2 at the end is the length of the list of user_ids. That obviously needs to change if the user_id list changes length. If you can't assume your join table doesn't contain duplicates, change "count(*)" to "count(distinct user_id)" at some possible cost in performance.
This query finds all conversations that include all the specified users even if the conversation also includes additional users.
If you want only conversations with exactly the specified set of users, one approach is to use a nested subquery in the where clause as below. Note, first and last lines are the same as the original query, only the middle two lines are new.
select conversation_id from conversations_users where user_id in (1, 2)
and conversation_id not in
(select conversation_id from conversations_users where user_id not in (1,2))
group by conversation_id having count(*) = 2
Equivalently, you can use a set difference operator if your database supports it. Here is an example in Oracle syntax. (For Postgres or DB2, change the keyword "minus" to "except".)
select conversation_id from conversations_users where user_id in (1, 2)
group by conversation_id having count(*) = 2
minus
select conversation_id from conversations_users where user_id not in (1,2)
A good query optimizer should treat the last two variations identically, but check with your particular database to be sure. For example, the Oracle 11GR2 query plan sorts the two sets of conversation ids before applying the minus operator, but skips the sort step for the last query. So either query plan could be faster depending on multiple factors such as the number of rows, cores, cache, indices etc.
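To make the difference between the "at least these users" query and the two "exactly these users" variants concrete, here is a small sketch run through Python's sqlite3 module (an assumption: SQLite stands in for the target database and uses EXCEPT as Postgres does). Conversation 4, with users 1, 2 and 3, is added to the sample data purely for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversations_users (
        conversation_id INTEGER,
        user_id INTEGER,
        UNIQUE (conversation_id, user_id)  -- no duplicate rows, as the answer assumes
    );
    INSERT INTO conversations_users VALUES
        (1, 1), (1, 2),
        (2, 1), (2, 3),
        (3, 1), (3, 2),
        (4, 1), (4, 2), (4, 3);
""")

def ids(sql):
    """Run a query and return the conversation ids in sorted order."""
    return sorted(row[0] for row in conn.execute(sql))

# Conversations that include users 1 and 2, possibly among others.
superset = ids("""
    SELECT conversation_id FROM conversations_users WHERE user_id IN (1, 2)
    GROUP BY conversation_id HAVING COUNT(*) = 2
""")

# Exactly users 1 and 2, via a nested NOT IN subquery.
exact_not_in = ids("""
    SELECT conversation_id FROM conversations_users WHERE user_id IN (1, 2)
      AND conversation_id NOT IN
          (SELECT conversation_id FROM conversations_users WHERE user_id NOT IN (1, 2))
    GROUP BY conversation_id HAVING COUNT(*) = 2
""")

# Exactly users 1 and 2, via set difference.
exact_except = ids("""
    SELECT conversation_id FROM conversations_users WHERE user_id IN (1, 2)
    GROUP BY conversation_id HAVING COUNT(*) = 2
    EXCEPT
    SELECT conversation_id FROM conversations_users WHERE user_id NOT IN (1, 2)
""")
```

Conversation 4 shows up only in the superset result; both exact variants drop it because it also contains user 3.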
I'm collapsing those users into an array. I'm also using a CTE (the thing in the WITH clause) to make this more readable.
=> select * from conversations_users ;
conversation_id | user_id
-----------------+---------
1 | 1
1 | 2
2 | 1
2 | 3
3 | 1
3 | 2
(6 rows)
=> WITH users_on_conversation AS (
SELECT conversation_id, array_agg(user_id) as users
FROM conversations_users
WHERE user_id in (1, 2) --filter here for performance
GROUP BY conversation_id
)
SELECT * FROM users_on_conversation
WHERE users @> array[1, 2];
conversation_id | users
-----------------+-------
1 | {1,2}
3 | {1,2}
(2 rows)
EDIT (Some resources)
array functions: http://www.postgresql.org/docs/9.1/static/functions-array.html
CTEs: http://www.postgresql.org/docs/9.1/static/queries-with.html
This preserves ActiveRecord objects.
In the below example, I want to know the time sheets which are associated with all codes in the array.
codes = [8,9]
Timesheet.joins(:codes).select('count(*) as count, timesheets.*').
where('codes.id': codes).
group('timesheets.id').
having('count(*) = ?', codes.length)
You should have the full ActiveRecord objects to work with. If you want it to be a true scope, you can just use your above example and pass in the results with .pluck(:id).
While @Alex's answer with IN and count() is probably the simplest solution, I expect this PL/pgSQL function to be faster:
CREATE OR REPLACE FUNCTION f_conversations_among_users(_user_arr int[])
RETURNS SETOF conversations AS
$BODY$
DECLARE
_sql text := '
SELECT c.*
FROM conversations c';
i int;
BEGIN
FOREACH i IN ARRAY _user_arr LOOP
_sql := _sql || '
JOIN conversations_users x' || i || ' USING (conversation_id)';
END LOOP;
_sql := _sql || '
WHERE TRUE';
FOREACH i IN ARRAY _user_arr LOOP
_sql := _sql || '
AND x' || i || '.user_id = ' || i;
END LOOP;
/* uncomment for conversations with exact list of users and no more
_sql := _sql || '
AND NOT EXISTS (
SELECT 1
FROM conversations_users u
WHERE u.conversation_id = c.conversation_id
AND u.user_id <> ALL (' || quote_literal(_user_arr) || '::int[])
)';
*/
-- RAISE NOTICE '%', _sql;
RETURN QUERY EXECUTE _sql;
END;
$BODY$ LANGUAGE plpgsql VOLATILE;
Call:
SELECT * FROM f_conversations_among_users('{1,2}')
The function dynamically builds and executes a query of the form:
SELECT c.*
FROM conversations c
JOIN conversations_users x1 USING (conversation_id)
JOIN conversations_users x2 USING (conversation_id)
...
WHERE TRUE
AND x1.user_id = 1
AND x2.user_id = 2
...
This form performed best in an extensive test of queries for relational division.
You could also build the query in your app, but I went by the assumption that you want to use one array parameter. Also, this is probably fastest anyway.
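For the "build the query in your app" route, a minimal sketch in Python might look like the following. The build_query helper is hypothetical, explicit ON clauses replace USING, and sqlite3 is used only to show that the generated statement runs; conversation 4 (users 1, 2, 3) is invented sample data. Only positional aliases are interpolated into the SQL text, while the user ids themselves are bound as parameters.

```python
import sqlite3

def build_query(user_ids):
    """Build one self-join per requested user; ids are bound as parameters."""
    if not user_ids:
        raise ValueError("need at least one user id")
    joins = " ".join(
        f"JOIN conversations_users x{n} ON x{n}.conversation_id = c.conversation_id"
        for n in range(len(user_ids))
    )
    conditions = " AND ".join(f"x{n}.user_id = ?" for n in range(len(user_ids)))
    return f"SELECT c.conversation_id FROM conversations c {joins} WHERE {conditions}"

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE conversations (conversation_id INTEGER PRIMARY KEY);
    CREATE TABLE conversations_users (conversation_id INTEGER, user_id INTEGER);
    INSERT INTO conversations VALUES (1), (2), (3), (4);
    INSERT INTO conversations_users VALUES
        (1, 1), (1, 2), (2, 1), (2, 3), (3, 1), (3, 2), (4, 1), (4, 2), (4, 3);
""")

user_ids = [1, 2]
sql = build_query(user_ids)
found = sorted(row[0] for row in conn.execute(sql, user_ids))
```

Like the function, this finds conversations in which at least the listed users take part, so conversation 4 is included.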
Either query requires an index like the following to be fast:
CREATE INDEX conversations_users_user_id_idx ON conversations_users (user_id);
A multi-column primary (or unique) key on (user_id, conversation_id) works just as well, but one on (conversation_id, user_id) (which you may very well have!) would be inferior. There is a short rationale at the link above, and a comprehensive assessment under this related question on dba.SE.
I also assume you have a primary key on conversations.conversation_id.
Can you run a performance test with EXPLAIN ANALYZE on @Alex's query and this function and report your findings?
Note that both solutions find conversations where at least the users in the array take part - including conversations with additional users.
If you want to exclude those, un-comment the additional clause in my function (or add it to any other query).
Tell me if you need more explanation on the features of the function.
Create a mapping table with all possible values and use this:
select
t1.conversation_id from conversations_users as t1
inner join mapping_table as map on t1.user_id=map.user_id
group by
t1.conversation_id
having
count(distinct t1.user_id)=
(select count(distinct user_id) from mapping_table)
select id from conversations where not exists(
select * from conversations_users cu
where cu.conversation_id=conversations.id
and cu.user_id not in(1,2,3)
)
This can easily be made into a Rails scope.
I am guessing that you don't really want to start messing with temporary tables.
Your question was unclear as to whether you want conversations with exactly the set of users, or conversations with a superset. The following is for the superset:
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u left outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null
To perform well, this query assumes that you have indexes on user_id in both users and conversations_users.
For the exact set . . .
with users as (select user_id from users where user_id in (<list>)
),
conv as (select conversation_id, user_id
from conversations_users
where user_id in (<list>)
)
select distinct conversation_id
from users u full outer join
conv c
on u.user_id = c.user_id
where c.conversation_id is not null and u.user_id is not null
Based on @Alex Blakemore's answer, the equivalent Rails 4 scope on your Conversation class would be:
# Conversations exactly with users array
scope :by_users, -> (users) {
self.by_any_of_users(users)
.group("conversations.id")
.having("COUNT(*) = ?", users.length) -
joins(:conversations_users)
.where("conversations_users.user_id NOT IN (?)", users)
}
# generates an IN clause
scope :by_any_of_users, -> (users) { joins(:conversations_users).where(conversations_users: { user_id: users }).distinct }
Note that you could optimize this by replacing the Rails - (minus) with a .where("NOT IN") clause, but that would be much harder to read.
Based on Alex Blakemore's answer:
select conversation_id
from conversations_users cu
where user_id in (1, 2)
group by conversation_id
having count(distinct user_id) = 2
I have found an alternative query with the same goal: finding the conversation_id of a conversation that contains user_1 and user_2 (ignoring additional users).
select *
from conversations_users cu1
where 2 = (
select count(distinct user_id)
from conversations_users cu2
where user_id in (1, 2) and cu1.conversation_id = cu2.conversation_id
)
It is slower according to the plan Postgres reports via EXPLAIN, and I guess that is true because more conditions are being evaluated: as a correlated subquery, it is executed at least once for each row of conversations_users. The positive point of this query is that you aren't grouping, so you can select additional fields of the conversations_users table. In some situations (like mine) that can be handy.

GROUP_CONCAT and LEFT_JOIN issue - Not all rows returned

Let's say my DB schema is as follows:
T_PRODUCT
id_product (int, primary)
two entries: (id_product =1) , (id_product =2)
T_USER
id_user (int, primary)
id_product (int, foreign key)
name_user (varchar)
two entries: (id_product=1,name_user='John') , (id_product=1,name_user='Mike')
If I run a first query to get all products with their users (if there are any), I get this:
SELECT T_PRODUCT.id_product, T_USER.name_user
FROM T_PRODUCT
LEFT JOIN T_USER on T_USER.id_product = T_PRODUCT.id_product;
>>
id_product name_user
1 John
1 Mike
2 NULL
Looks good to me.
Now if I want to do the same thing, except I'd like to have one product per line, with concatenated user names (if there are any users, otherwise NULL):
SELECT T_PRODUCT.id_product, GROUP_CONCAT(T_USER.name_user)
FROM T_PRODUCT
LEFT JOIN T_USER on T_USER.id_product = T_PRODUCT.id_product;
>>
id_product name_user
1 John,Mike
**expected output**:
id_product name_user
1 John,Mike
2 NULL
If there are no users for a product, the GROUP_CONCAT prevents mysql from producing a line for this product, even though there's a LEFT JOIN.
Is this an expected MySQL behaviour?
Is there any way I could get the expected output using GROUP_CONCAT or another function?
Ah, found my answer:
You should never forget the GROUP BY clause when working with GROUP_CONCAT.
I was missing GROUP BY T_PRODUCT.id_product in my second query.
I'll leave the question up in case someone is as absent-minded as I am.
Edit:
From this answer, I figured I can also activate the SQL Mode ONLY_FULL_GROUP_BY to force MySQL to throw an error in case the GROUP BY is missing or incorrect.
Alternatively, you could use a subquery; just test it for performance:
SELECT
T_PRODUCT.id_product,
(SELECT GROUP_CONCAT(T_USER.name_user)
FROM T_USER
WHERE T_USER.id_product = T_PRODUCT.id_product
)
FROM T_PRODUCT
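Both fixes, the added GROUP BY and the correlated subquery, can be compared side by side with a short sketch. It runs through Python's sqlite3 module, whose GROUP_CONCAT behaves like MySQL's for this case (an assumption of the example; SQLite also tolerates the missing GROUP BY in the same permissive way). The table aliases p and u are introduced here for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T_PRODUCT (id_product INTEGER PRIMARY KEY);
    CREATE TABLE T_USER (
        id_user INTEGER PRIMARY KEY,
        id_product INTEGER REFERENCES T_PRODUCT (id_product),
        name_user TEXT
    );
    INSERT INTO T_PRODUCT VALUES (1), (2);
    INSERT INTO T_USER (id_product, name_user) VALUES (1, 'John'), (1, 'Mike');
""")

# Without GROUP BY the aggregate collapses the whole join into a single row.
without_group_by = conn.execute("""
    SELECT p.id_product, GROUP_CONCAT(u.name_user)
    FROM T_PRODUCT p
    LEFT JOIN T_USER u ON u.id_product = p.id_product
""").fetchall()

# With GROUP BY there is one row per product, NULL for products without users.
with_group_by = conn.execute("""
    SELECT p.id_product, GROUP_CONCAT(u.name_user)
    FROM T_PRODUCT p
    LEFT JOIN T_USER u ON u.id_product = p.id_product
    GROUP BY p.id_product
    ORDER BY p.id_product
""").fetchall()

# Correlated subquery: no join and no GROUP BY in the outer query.
with_subquery = conn.execute("""
    SELECT p.id_product,
           (SELECT GROUP_CONCAT(u.name_user)
            FROM T_USER u
            WHERE u.id_product = p.id_product)
    FROM T_PRODUCT p
    ORDER BY p.id_product
""").fetchall()
```

The order of names inside the concatenated string is not guaranteed unless the database is told how to order them, so it is safer to compare the names as a set.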