How to check if values exist in a table - postgres PLPGSQL - sql

ERD
users: id, name
groups: id, name
users_in_groups: user_id, group_id
Problem summary
I'm writing a stored procedure in postgres that receives a group name and an array of users and adds the users to the group. I want to assert first that the users exist in users, because I want to raise a custom error that I can catch in my server (if I rely on the default errors, such as an FK violation, I cannot classify it specifically enough in my server).
The stored procedure
CREATE FUNCTION add_users_to_group(group_name text, users text[])
RETURNS VOID AS $$
DECLARE
does_all_users_exists boolean;
BEGIN
SELECT exist FROM (
WITH to_check (user_to_check) as (select unnest(users))
SELECT bool_and(EXISTS (
SELECT * FROM users where id = to_check.user_to_check
)) as exist from to_check) as existance INTO does_all_users_exists;
IF NOT does_all_users_exists THEN
RAISE EXCEPTION '%', does_all_users_exists USING ERRCODE = 'XXXXX';
END IF;
-- TODO: loop through each user and insert into users_in_groups
END;
$$ LANGUAGE PLPGSQL VOLATILE STRICT SECURITY INVOKER;
The problem
When I execute the function with users that exist in the users table, I get the error I raise, and the message is f (so my variable was false). But when I run only the query that checks the existence of all the users:
WITH to_check (user_to_check) as (select unnest(users))
SELECT bool_and(EXISTS (
SELECT * FROM users where id = to_check.user_to_check
)) as exist from to_check
I get true, but I get it inside a table like so:
# | exist (boolean)
1 | true
So I guess I need to extract the true somehow.
Anyway, I know there is probably a better way to validate existence before the insert; you are welcome to suggest one.

Your logic seems unnecessarily complex. You can just check if any user doesn't exist using NOT EXISTS:
SELECT 1
FROM UNNEST(users) user_to_check
WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.id = user_to_check)
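A minimal sketch of the same anti-join check, runnable outside PostgreSQL. It assumes Python's built-in sqlite3 module as a stand-in database; since SQLite has no unnest(), the candidate ids are staged in a temporary table. The names missing_users and candidates are hypothetical and not part of the answer above.

```python
import sqlite3

def missing_users(conn, user_ids):
    """Return the candidate ids that have no matching row in users."""
    # SQLite has no unnest(); stage the candidates in a temp table instead.
    conn.execute("CREATE TEMP TABLE IF NOT EXISTS candidates (id TEXT)")
    conn.execute("DELETE FROM candidates")
    conn.executemany("INSERT INTO candidates (id) VALUES (?)",
                     [(u,) for u in user_ids])
    rows = conn.execute(
        """
        SELECT c.id
        FROM candidates c
        WHERE NOT EXISTS (SELECT 1 FROM users u WHERE u.id = c.id)
        ORDER BY c.id
        """
    ).fetchall()
    return [r[0] for r in rows]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id TEXT PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("u1", "Ann"), ("u2", "Bob")])

print(missing_users(conn, ["u1", "u2"]))        # every candidate exists
print(missing_users(conn, ["u1", "u3", "u9"]))  # some candidates are absent
```

An empty result means every user exists. In the PL/pgSQL function, a non-empty result is the point where the custom error would be raised.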

When you want to avoid issues with unique and foreign key constraints, you can SELECT and INSERT the records that you need for the next step. And you can do this for both tables (users and groups) in a single query, including the INSERT in users_in_groups:
CREATE FUNCTION add_users_to_group(group_name text, users text[])
RETURNS VOID AS $$
WITH id_users AS (
-- get id's for existing users:
SELECT id, name
FROM users
WHERE name =any($2)
), dml_users AS (
-- create id's for the new users:
INSERT INTO users (name)
SELECT s.name
FROM unnest($2) s(name)
WHERE NOT EXISTS(SELECT 1 FROM id_users i WHERE i.name = s.name)
-- Just to be sure, not sure you want this:
ON conflict do NOTHING
-- Result:
RETURNING id
), id_groups AS (
-- get id for an existing group:
SELECT id, name
FROM groups
WHERE name = $1
), dml_group AS (
-- create id for the new group:
INSERT INTO groups (name)
SELECT s.name
FROM (VALUES($1)) s(name)
WHERE NOT EXISTS(SELECT 1 FROM id_groups i WHERE i.name = s.name)
-- Just to be sure, not sure you want this:
ON conflict do NOTHING
-- Result:
RETURNING id
)
INSERT INTO users_in_groups(user_id, group_id)
SELECT user_id, group_id
FROM (
-- get all user-id's
SELECT id FROM dml_users
UNION
SELECT id FROM id_users
) s1(user_id)
-- get all group-id's
, (
SELECT id FROM dml_group
UNION
SELECT id FROM id_groups
) s2(group_id);
$$ LANGUAGE sql VOLATILE STRICT SECURITY INVOKER;
And you don't need PLpgSQL either, SQL will do.

Related

Stored Procedure calling variables from table

I have a stored procedure that uses a variable ID, I have a list of valid IDs in a table.
I'm trying to write a stored procedure that runs a specific piece of code if the ID exists in table. I'm just not sure of the syntax.
Below is my pseudo-code of what I'm attempting to do.
IF
@ID = possible id IN (SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid')
SELECT * FROM dbo.[results]
ELSE
SELECT * FROM dbo.[otherresults]
I'm using SQL Server
Typically, this is the case where you would use EXISTS; as in....
IF EXISTS (SELECT * FROM ID_TABLE WHERE ID = @ID)
While @ID IN (SELECT DISTINCT ...) would work, that query requires going through the table data to assemble a result set that is then checked for @ID's inclusion. EXISTS queries do not create result sets, and they return early on the first row fitting the criteria.
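As a rough illustration of the EXISTS branch, here is a sketch that assumes Python's built-in sqlite3 module instead of SQL Server. The table id_table and the helpers id_exists and pick_source are made up for the example; only the shape of the EXISTS test follows the answer above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_table (id INTEGER)")
conn.executemany("INSERT INTO id_table VALUES (?)", [(1,), (2,), (2,), (3,)])

def id_exists(conn, candidate):
    """True if at least one row in id_table has the given id."""
    # EXISTS yields 1 or 0; duplicates in the table do not matter.
    (found,) = conn.execute(
        "SELECT EXISTS (SELECT 1 FROM id_table WHERE id = ?)", (candidate,)
    ).fetchone()
    return bool(found)

def pick_source(conn, candidate):
    """Mirror the IF EXISTS ... ELSE branch: choose which table to read."""
    return "results" if id_exists(conn, candidate) else "otherresults"

print(pick_source(conn, 2))   # id present
print(pick_source(conn, 42))  # id absent
```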
If your query
SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid'
always returns only one id, then the solution below is enough:
@ID = (SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid')
If it returns a list of ids, then you need to create a temp table and store those ids, like below:
create table #temp_table(id int)
insert into #temp_table SELECT DISTINCT ID FROM [dbo].ID_TABLE WHERE ID = 'valid'

SQL Limit number of references to another table without locking

Is there a technique to avoid locking a row but still be able to limit the number of rows in another table that reference it?
For example:
create table accounts (
id integer,
name varchar,
max_users integer
);
create table users (
id integer,
account_id integer,
email varchar
);
I want to limit the number of users that belong to an account, using the max_users value in accounts. Is there a way to ensure that concurrent calls won't create more users than permitted without locking the account row?
Something like this doesn't work: when it runs in two concurrent transactions, the select count(*)... condition can be true in both of them, even when only one more user is allowed:
begin;
insert into users(id, account_id, email)
select 1, 1, 'john@abc.com' where (select count(*) from users where account_id = 1) < (select max_users from accounts where id = 1);
commit;
And the following works, but I'm having performance issues, mostly caused by transactions waiting for locks:
begin;
select id from accounts where id = 1 for update;
insert into users(id, account_id, email)
select 1, 1, 'john@abc.com' where (select count(*) from users where account_id = 1) < (select max_users from accounts where id = 1);
commit;
EDIT: Bonus question: what if the value is not stored in the database, but is something you can set dynamically?

Remove duplicates from table based on multiple criteria and persist to other table

I have a taccounts table with columns like account_id(PK), login_name, password, last_login. Now I have to remove some duplicate entries according to a new business logic.
Duplicate accounts are those with either the same email or the same (login_name & password). The account with the latest login must be preserved.
Here are my attempts (some email values are null or blank):
DELETE
FROM taccounts
WHERE email is not null and char_length(trim(both ' ' from email))>0 and last_login NOT IN
(
SELECT MAX(last_login)
FROM taccounts
WHERE email is not null and char_length(trim(both ' ' from email))>0
GROUP BY lower(trim(both ' ' from email)))
Similarly for login_name and password
DELETE
FROM taccounts
WHERE last_login NOT IN
(
SELECT MAX(last_login)
FROM taccounts
GROUP BY login_name, password)
Is there any better way or any way to combine these two separate queries?
Also, some other tables have account_id as a foreign key. How do I propagate this change to those tables?
I am using PostgreSQL 9.2.1
EDIT: Some of the email values are null and some of them are blank (''). So, if two accounts have different login_name & password and their emails are null or blank, they must be considered two different accounts.
If most of the rows are deleted (mostly dupes) and the table fits into RAM, consider this route:
1. SELECT surviving rows into a temporary table.
2. Reroute FK references to survivors.
3. DELETE all rows from the base table.
4. Re-INSERT survivors.
1a. Distill surviving rows
CREATE TEMP TABLE tmp AS
SELECT DISTINCT ON (login_name, password) *
FROM (
SELECT DISTINCT ON (email) *
FROM taccounts
ORDER BY email, last_login DESC
) sub
ORDER BY login_name, password, last_login DESC;
About DISTINCT ON:
Select first row in each GROUP BY group?
To identify duplicates for two different criteria, use a subquery to apply the two rules one after the other. The first step preserves the account with the latest last_login, so this is "serializable".
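The two-pass idea can be sketched outside the database as well. The following is an illustrative Python version, not the query itself: keep_latest is a hypothetical helper that plays the role of DISTINCT ON with ORDER BY ... last_login DESC, and the sample rows are invented.

```python
from datetime import datetime

def keep_latest(rows, key):
    """Keep, per key value, only the row with the greatest last_login."""
    best = {}
    for row in rows:
        k = key(row)
        if k not in best or row["last_login"] > best[k]["last_login"]:
            best[k] = row
    return list(best.values())

accounts = [
    {"account_id": 1, "email": "a@x.tld", "login_name": "ann", "password": "p1",
     "last_login": datetime(2013, 3, 1)},
    {"account_id": 2, "email": "a@x.tld", "login_name": "ann2", "password": "p2",
     "last_login": datetime(2013, 3, 5)},
    {"account_id": 3, "email": "b@x.tld", "login_name": "ann2", "password": "p2",
     "last_login": datetime(2013, 3, 9)},
    {"account_id": 4, "email": "c@x.tld", "login_name": "cy", "password": "p3",
     "last_login": datetime(2013, 3, 2)},
]

# Rule 1: one survivor per email (the inner DISTINCT ON).
by_email = keep_latest(accounts, key=lambda r: r["email"])
# Rule 2: one survivor per (login_name, password) (the outer DISTINCT ON).
survivors = keep_latest(by_email, key=lambda r: (r["login_name"], r["password"]))

print(sorted(r["account_id"] for r in survivors))
```

Here account 1 loses to account 2 on email, and account 2 then loses to account 3 on (login_name, password), so the rules chain in the same order as the nested query.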
Inspect results and test for plausibility.
SELECT * FROM tmp;
Temporary tables are dropped automatically at the end of a session. In pgAdmin (which you seem to be using) the session lives as long as the editor window is open.
1b. Alternative query for updated definition of "duplicates"
SELECT *
FROM taccounts t
WHERE NOT EXISTS (
SELECT 1 FROM taccounts t1
WHERE ( NULLIF(t1.email, '') = t.email
OR (NULLIF(t1.login_name, ''), NULLIF(t1.password, '')) = (t.login_name, t.password))
AND (t1.last_login, t1.account_id) > (t.last_login, t.account_id)
);
This doesn't treat NULL or empty string ('') as identical in any of the "duplicate" columns.
The row expression (t1.last_login, t1.account_id) takes care of the possibility that two dupes could share the same last_login. The one with the bigger account_id is chosen in this case - which is unique, since it is the PK.
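To make the tie-break concrete, here is a small Python sketch of the same survivor rule. Python tuples compare left to right, much like the SQL row expression. The helpers norm, is_duplicate_of and survivors are hypothetical, the sample rows are invented, and the sketch assumes last_login is never null.

```python
def norm(value):
    """Map '' to None, mirroring NULLIF(value, '')."""
    return value or None

def is_duplicate_of(a, b):
    """True if a and b share an email or a (login_name, password) pair.

    None never matches anything, as with NULL in SQL comparisons.
    """
    email_a, email_b = norm(a["email"]), norm(b["email"])
    login_a = (norm(a["login_name"]), norm(a["password"]))
    login_b = (norm(b["login_name"]), norm(b["password"]))
    same_email = email_a is not None and email_a == email_b
    same_login = None not in login_a and login_a == login_b
    return same_email or same_login

def survivors(rows):
    """Keep rows for which no duplicate ranks higher."""
    def rank(r):
        # last_login first, account_id as the tie-breaker.
        return (r["last_login"], r["account_id"])
    return [r for r in rows
            if not any(is_duplicate_of(o, r) and rank(o) > rank(r) for o in rows)]

accounts = [
    {"account_id": 1, "email": "a@x.tld", "login_name": "ann", "password": "p",
     "last_login": "2013-03-01"},
    {"account_id": 2, "email": "a@x.tld", "login_name": "ann_b", "password": "q",
     "last_login": "2013-03-01"},   # same email and same last_login as account 1
    {"account_id": 3, "email": "", "login_name": "bob", "password": "r",
     "last_login": "2013-03-02"},   # blank email: never a duplicate by email
    {"account_id": 4, "email": None, "login_name": "cy", "password": "s",
     "last_login": "2013-03-03"},   # null email: never a duplicate by email
]

print([r["account_id"] for r in survivors(accounts)])
```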
2a. How to identify all incoming FKs
SELECT c.confrelid::regclass::text AS referenced_table
, c.conname AS fk_name
, pg_get_constraintdef(c.oid) AS fk_definition
FROM pg_attribute a
JOIN pg_constraint c ON (c.conrelid, c.conkey[1]) = (a.attrelid, a.attnum)
WHERE c.confrelid = 'taccounts'::regclass -- (schema-qualified) table name
AND c.contype = 'f'
ORDER BY 1, contype DESC;
Only building on the first column of the foreign key. More about that:
Find the referenced table name using table, field and schema name
Or inspect the Dependents tab in the right-hand pane of the pgAdmin object browser after selecting the table taccounts.
2b. Reroute to new primary
If you have tables referencing taccounts (incoming foreign keys to taccounts) you will want to update all those fields, before you delete the dupes.
Reroute all of them to the new primary row:
UPDATE referencing_tbl r
SET referencing_column = tmp.reference_column
FROM tmp
JOIN taccounts t1 USING (email)
WHERE r.referencing_column = t1.referencing_column
AND referencing_column IS DISTINCT FROM tmp.reference_column;
UPDATE referencing_tbl r
SET referencing_column = tmp.reference_column
FROM tmp
JOIN taccounts t2 USING (login_name, password)
WHERE r.referencing_column = t2.referencing_column
AND referencing_column IS DISTINCT FROM tmp.reference_column;
3. & 4. Go in for the kill
Now, dupes are not referenced any more. Go in for the kill.
ALTER TABLE taccounts DISABLE TRIGGER ALL;
DELETE FROM taccounts;
VACUUM taccounts;
INSERT INTO taccounts
SELECT * FROM tmp;
ALTER TABLE taccounts ENABLE TRIGGER ALL;
Disable all triggers for the duration of the operation. This avoids checking for referential integrity during the operation. Everything should be fine once you re-activate triggers. We took care of all incoming FKs above. Outgoing FKs are guaranteed to be sound, since you have no concurrent write access and all values have been there before.
In addition to Erwin's excellent answer, it can often be useful to create an intermediate link table that relates the old keys to the new ones.
DROP SCHEMA tmp CASCADE;
CREATE SCHEMA tmp ;
SET search_path=tmp;
CREATE TABLE taccounts
( account_id SERIAL PRIMARY KEY
, login_name varchar
, email varchar
, last_login TIMESTAMP
);
-- create some fake data
INSERT INTO taccounts(last_login)
SELECT gs FROM generate_series('2013-03-30 14:00:00' ,'2013-03-30 15:00:00' , '1min'::interval) gs
;
UPDATE taccounts
SET login_name = 'User_' || (account_id %10)::text
, email = 'Joe' || (account_id %9)::text || '@somedomain.tld'
;
SELECT * FROM taccounts;
--
-- Create (temp) table linking old id <--> new id
-- After inspection this table can be used as a source for the FK updates
-- and for the final delete.
--
CREATE TABLE update_ids AS
WITH pairs AS (
SELECT one.account_id AS old_id
, two.account_id AS new_id
FROM taccounts one
JOIN taccounts two ON two.last_login > one.last_login
AND ( two.email = one.email OR two.login_name = one.login_name)
)
SELECT old_id,new_id
FROM pairs pp
WHERE NOT EXISTS (
SELECT * FROM pairs nx
WHERE nx.old_id = pp.old_id
AND nx.new_id > pp.new_id
)
;
SELECT * FROM update_ids
;
UPDATE other_table_with_fk_to_taccounts dst
SET account_id = ids.new_id
FROM update_ids ids
WHERE dst.account_id = ids.old_id
;
DELETE FROM taccounts del
WHERE EXISTS (
SELECT * FROM update_ids ex
WHERE ex.old_id = del.account_id
);
SELECT * FROM taccounts;
Yet another way to accomplish the same is to add a column with a pointer to the preferred key to the table itself and use that for your updates and deletes.
ALTER TABLE taccounts
ADD COLUMN better_id INTEGER REFERENCES taccounts(account_id)
;
-- find the *better* records for each record.
UPDATE taccounts dst
SET better_id = src.account_id
FROM taccounts src
WHERE src.login_name = dst.login_name
AND src.last_login > dst.last_login
AND src.email IS NOT NULL
AND NOT EXISTS (
SELECT * FROM taccounts nx
WHERE nx.login_name = dst.login_name
AND nx.email IS NOT NULL
AND nx.last_login > src.last_login
);
-- Find records that *do* have an email address
UPDATE taccounts dst
SET better_id = src.account_id
FROM taccounts src
WHERE src.login_name = dst.login_name
AND src.email IS NOT NULL
AND dst.email IS NULL
AND NOT EXISTS (
SELECT * FROM taccounts nx
WHERE nx.login_name = dst.login_name
AND nx.email IS NOT NULL
AND nx.last_login > src.last_login
);
SELECT * FROM taccounts ORDER BY account_id;
UPDATE other_table_with_fk_to_taccounts dst
SET account_id = src.better_id
FROM taccounts src
WHERE dst.account_id = src.account_id
AND src.better_id IS NOT NULL
;
DELETE FROM taccounts del
WHERE EXISTS (
SELECT * FROM taccounts ex
WHERE ex.account_id = del.better_id
);
SELECT * FROM taccounts ORDER BY account_id;

Return id if a row exists, INSERT otherwise

I'm writing a function in node.js to query a PostgreSQL table.
If the row exists, I want to return the id column from the row.
If it doesn't exist, I want to insert it and return the id (insert into ... returning id).
I've been trying variations of case and if else statements and can't seem to get it to work.
A solution in a single SQL statement. Requires PostgreSQL 9.1 or later, though, because it relies on a data-modifying CTE.
Consider the following demo:
Test setup:
CREATE TEMP TABLE tbl (
id serial PRIMARY KEY
,txt text UNIQUE -- obviously there is a unique column (or set of columns)
);
INSERT INTO tbl(txt) VALUES ('one'), ('two');
INSERT / SELECT command:
WITH v AS (SELECT 'three'::text AS txt)
,s AS (SELECT id FROM tbl JOIN v USING (txt))
,i AS (
INSERT INTO tbl (txt)
SELECT txt
FROM v
WHERE NOT EXISTS (SELECT * FROM s)
RETURNING id
)
SELECT id, 'i'::text AS src FROM i
UNION ALL
SELECT id, 's' FROM s;
The first CTE v is not strictly necessary, but it lets you enter your values only once.
The second CTE s selects the id from tbl if the "row" exists.
The third CTE i inserts the "row" into tbl if (and only if) it does not exist, returning id.
The final SELECT returns the id. I added a column src indicating the "source" - whether the "row" pre-existed and id comes from a SELECT, or the "row" was new and so is the id.
This version should be as fast as possible as it does not need an additional SELECT from tbl and uses the CTEs instead.
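For comparison, the same select-or-insert logic can be written as two separate statements from application code. This is only a sketch under stated assumptions: it uses Python's built-in sqlite3 module rather than PostgreSQL, get_or_create_id is a hypothetical helper, and like the CTE it is not safe against concurrent writers on its own.

```python
import sqlite3

def get_or_create_id(conn, txt):
    """Return (id, src): src is 's' if the row existed, 'i' if inserted."""
    row = conn.execute("SELECT id FROM tbl WHERE txt = ?", (txt,)).fetchone()
    if row is not None:
        return row[0], "s"
    # No existing row: insert it and report the generated id.
    cur = conn.execute("INSERT INTO tbl (txt) VALUES (?)", (txt,))
    return cur.lastrowid, "i"

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (id INTEGER PRIMARY KEY, txt TEXT UNIQUE)")
conn.executemany("INSERT INTO tbl (txt) VALUES (?)", [("one",), ("two",)])

print(get_or_create_id(conn, "three"))  # row is new, so src is 'i'
print(get_or_create_id(conn, "three"))  # second call finds it, so src is 's'
```

The single-statement CTE saves a round trip to the server; the two-step form is easier to read but needs the same care about race conditions.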
To make this safe against possible race conditions in a multi-user environment, and for updated techniques using the new UPSERT in Postgres 9.5 or later, see:
Is SELECT or INSERT in a function prone to race conditions?
I would suggest doing the checking on the database side and just returning the id to nodejs.
Example:
CREATE OR REPLACE FUNCTION foo(p_param1 tableFoo.attr1%TYPE, p_param2 tableFoo.attr2%TYPE) RETURNS tableFoo.id%TYPE AS $$
DECLARE
v_id tableFoo.id%TYPE;
BEGIN
SELECT id
INTO v_id
FROM tableFoo
WHERE attr1 = p_param1
AND attr2 = p_param2;
IF v_id IS NULL THEN
INSERT INTO tableFoo(id, attr1, attr2) VALUES (DEFAULT, p_param1, p_param2)
RETURNING id INTO v_id;
END IF;
RETURN v_id;
END;
$$ LANGUAGE plpgsql;
And then on the Node.js side (I'm using node-postgres in this example):
var pg = require('pg');
pg.connect('someConnectionString', function(connErr, client){
//do some errorchecking here
client.query('SELECT foo($1, $2) AS id;', ['foo', 'bar'], function(queryErr, result){
//errorchecking
var id = result.rows[0].id;
});
});
Something like this, if you are on PostgreSQL 9.1 or later:
with test_insert as (
insert into foo (id, col1, col2)
select 42, 'Foo', 'Bar'
where not exists (select * from foo where id = 42)
returning foo.id, foo.col1, foo.col2
)
select id, col1, col2
from test_insert
union
select id, col1, col2
from foo
where id = 42;
It's a bit long, and you need to repeat the id to test for several times, but I can't think of a different solution that involves a single SQL statement.
If a row with id=42 exists, the writeable CTE will not insert anything, and thus the existing row will be returned by the second part of the union.
When testing this, I actually thought the new row would be returned twice (therefore a union, not a union all), but it turns out that the second select statement works on the snapshot taken before the whole statement runs, so it does not see the newly inserted row. So in case a new row is inserted, it will be taken from the "returning" part.
create table t (
id serial primary key,
a integer
)
;
insert into t (a)
select 2
from (
select count(*) as s
from t
where a = 2
) s
where s.s = 0
;
select id
from t
where a = 2
;

Query returning exact number of rows

I have a table that stores two foreign keys, implementing a n:m relationship.
One of them points to a person (subject), the other one to a specific item.
Now, the number of items a person may have is specified in a different table, and I need a query that returns as many rows as the number of items a person may have.
The rest of the records may be filled with NULL values or whatever else.
It has proven to be a pain to solve this problem from the application side, so I've decided to try a different approach.
Edit:
Example
CREATE TABLE subject_items
(
sub_item integer NOT NULL,
sal_subject integer NOT NULL,
CONSTRAINT pkey PRIMARY KEY (sub_item, sal_subject),
CONSTRAINT fk1 FOREIGN KEY (sal_subject)
REFERENCES subject (sub_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE,
CONSTRAINT fk2 FOREIGN KEY (sub_item)
REFERENCES item (item_id) MATCH SIMPLE
ON UPDATE CASCADE ON DELETE CASCADE
)
I need a query/function which would return all subject items (subject may have 5 items)
but there are only 3 items assigned to the subject.
Return would be somewhat like:
sub_item | sal_subject
2 | 1
3 | 1
4 | 1
NULL | 1
NULL | 1
I am using postgresql-8.3
Consider this largely simplified version of your plpgsql function. Should work in PostgreSQL 8.3:
CREATE OR REPLACE FUNCTION x.fnk_abonemento_nariai(_prm_item integer)
RETURNS SETOF subject_items AS
$BODY$
DECLARE
_kiek integer := num_records -- get number at declaration time
FROM subjekto_abonementai WHERE num_id = _prm_item;
_counter integer;
BEGIN
RETURN QUERY -- get the records that actually exist
SELECT sub_item, sal_subject
FROM sal_subject
WHERE sub_item = _prm_item;
GET DIAGNOSTICS _counter = ROW_COUNT; -- save number of returned rows.
RETURN QUERY
SELECT NULL::integer, NULL::integer -- fill the rest with null values
FROM generate_series(_counter + 1, _kiek);
END;
$BODY$ LANGUAGE plpgsql VOLATILE STRICT;
Details about plpgsql in the manual (link to version 8.3).
Could work like this (pure SQL solution):
SELECT a.sal_subject
, b.sub_item
FROM (
SELECT generate_series(1, max_items) AS rn
, sal_subject
FROM subject
) a
LEFT JOIN (
SELECT row_number() OVER (PARTITION BY sal_subject ORDER BY sub_item) AS rn
, sal_subject
, sub_item
FROM subject_items
) b USING (sal_subject, rn)
ORDER BY sal_subject, rn
Generate the maximum rows per subject, let's call them theoretical items.
See the manual for generate_series().
Apply a row-number to existing items per subject.
Manual about window functions.
LEFT JOIN the existing items to the theoretical items per subject. Missing items are filled in with NULL.
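The three steps above can be sketched end to end as follows. This is an approximation that assumes Python's built-in sqlite3 module instead of PostgreSQL: a recursive CTE (slots, a made-up name) stands in for generate_series(), and a correlated count stands in for row_number(). The sample data matches the example in the question, one subject with room for 5 items and 3 assigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE subject (sal_subject INTEGER PRIMARY KEY, max_items INTEGER);
    CREATE TABLE subject_items (sub_item INTEGER, sal_subject INTEGER);
    INSERT INTO subject VALUES (1, 5);
    INSERT INTO subject_items VALUES (2, 1), (3, 1), (4, 1);
""")

PADDED = """
WITH RECURSIVE slots(sal_subject, rn, max_items) AS (
    -- theoretical items: rn = 1 .. max_items per subject
    SELECT sal_subject, 1, max_items FROM subject WHERE max_items >= 1
    UNION ALL
    SELECT sal_subject, rn + 1, max_items FROM slots WHERE rn < max_items
),
numbered AS (
    -- number the existing items per subject, ordered by sub_item
    SELECT i.sal_subject, i.sub_item,
           (SELECT count(*) FROM subject_items j
            WHERE j.sal_subject = i.sal_subject
              AND j.sub_item <= i.sub_item) AS rn
    FROM subject_items i
)
SELECT n.sub_item, s.sal_subject
FROM slots s
LEFT JOIN numbered n ON n.sal_subject = s.sal_subject AND n.rn = s.rn
ORDER BY s.sal_subject, s.rn
"""

rows = conn.execute(PADDED).fetchall()
for sub_item, sal_subject in rows:
    print(sub_item, "|", sal_subject)
```

Slots without a matching item come back with sub_item set to None, which is the NULL padding asked for in the question.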
In addition to the table you disclosed in the question, I assume a column that holds the maximum number of items in the subject table:
CREATE temp TABLE subject
( sal_subject integer, -- primary key of subject
max_items int); -- max. number of items
Query for PostgreSQL 8.3, substituting for the missing window function row_number():
SELECT a.sal_subject
, b.sub_item
FROM (
SELECT generate_series(1, max_items) AS rn
, sal_subject
FROM subject
) a
LEFT JOIN (
SELECT rn, sal_subject, arr[rn] AS sub_item
FROM (
SELECT generate_series(1, ct) rn, sal_subject, arr
FROM (
SELECT s.sal_subject
, s.ct
, ARRAY(
SELECT sub_item
FROM subject_items s0
WHERE s0.sal_subject = s.sal_subject
ORDER BY sub_item
) AS arr
FROM (
SELECT sal_subject
, count(*) AS ct
FROM subject_items
GROUP BY 1
) s
) x
) y
) b USING (sal_subject, rn)
ORDER BY sal_subject, rn
More about substituting row_number() in this article by Quassnoi.
I was able to come up with this simple solution:
First return all the rows that exist, then loop, returning NULL rows until the required number is reached. I'm posting it here in case someone stumbles on the same problem.
I'm still looking for easier/faster solutions, if they exist.
CREATE OR REPLACE FUNCTION fnk_abonemento_nariai(prm_item integer)
RETURNS SETOF subject_items AS
$BODY$DECLARE _kiek integer;
DECLARE _rec subject_items;
DECLARE _counter integer := 0;
BEGIN
/*get the number of records we need*/
SELECT INTO _kiek num_records
FROM subjekto_abonementai
WHERE num_id = prm_item;
/*get the records that actually exist */
FOR _rec IN SELECT sub_item, sal_subject
FROM sal_subject
WHERE sub_item = prm_item LOOP
RETURN NEXT _rec;
_counter := COALESCE(_counter, 0) + 1;
END LOOP;
/*fill the rest with null values*/
While _kiek > _counter loop
_rec.sub_item := NULL;
_rec.sal_subject := NULL;
Return next _rec;
_counter := COALESCE(_counter, 0) + 1;
end loop;
END;$BODY$
LANGUAGE plpgsql VOLATILE;