Postgres 9.3 index involving JSON column - sql

I have this table structure:
CREATE TABLE user_items
(
user_id bigint references users(id) NOT NULL,
item_id bigint references items(id) NOT NULL,
col1 json DEFAULT '[{"text":""}]',
col2 json DEFAULT '[{"date":"","text":""}]',
col3 json DEFAULT '{"text":""}',
PRIMARY KEY (user_id, item_id)
)
I will be running queries such as this:
SELECT * FROM user_items WHERE item_id = '?' AND col1 IS NOT NULL
Do I need an index (item_id, col1) in this case ?
And if so, what's the right way to do it, because when trying it Postgres is throwing an error since col1 is a JSON type.

I suggest to use a partial index on item_id:
CREATE INDEX foo_idx ON user_items (item_id)
WHERE col1 IS NOT NULL
The data type of col1 is irrelevant here. Be sure to include the verbatim WHERE clause in queries to allow Postgres to use this index.

You would only need it if each item_id occurs many times and col1 is usually NULL. If either of those is not true, just make the index on (item_id). The database will have to visit the row and filter out the ones where col1 is NULL, but if NULL is rare that will be no big deal.
If NULL is common, then try a "functional" or "expression" index on (item_id, (col1 is not null))

Related

Composite Not Null constraint on two columns

I want to have a constraint in which I make sure that at least of two columns is not null.
Basically, from those two columns, one must contain values.
How can I have a constraint like that?
Is it possible on liquibase? If not, is it possible via SQL or some postgres specific thing?
I like using num_nonnulls() for this:
For at least one not null column:
check (num_nonnulls(col1, col2) >= 1)
For exactly one not null column:
check (num_nonnulls(col1, col2) = 1)
Liquibase has not built-in change for check constraints (at least not in the community version), so you will need a <sql> change for this:
<sql>
alter table the_table
add constraint at_least_one_not_null
check (num_nonnulls(col1, col2) >= 1)
</sql>
You can use a check constraint. For at least one non-NULL value:
check (col1 is not null or col2 is not null)
If you need for exactly one to contain values:
check (col1 is not null and col2 is null or
col1 is null and col2 is not null
)
Or in Postgres:
check ( (col1 is not null)::int + (col2 is not null)::int = 1 )

Auto-increment primary keys in SQL

I need help with the insert statements for a plethora of tables in our DB.
New to SQL - just basic understanding
Summary:
Table1
Col1 Col2 Col3
1 value1 value1
2 value2 value2
3 value3 value3
Table2
Col1 Col2 Col3
4 value1 value1
5 value2 value2
6 value3 value3
Multiple tables use the same sequence of auto-generated primary keys when user creates a static data record from the GUI.
However, creating a script to upload static data from one environment to the other is something I'm looking for.
Example from one of the tables:
Insert into RULE (PK_RULE,NAME,RULEID,DESCRIPTION)
values
(4484319,'TESTRULE',14,'TEST RULE DESCRIPTION')
How do I design my insert statement so that it reads the last value from the PK column (4484319 here) and auto inserts 4484320 without explicitly mentioning the same?
Note: Our DB has hundreds and thousands of records.
I think there's something similar to (SELECT MAX(ID) + 1 FROM MyTable) which could potentially solve my problem but I don't know how to use it.
Multiple tables use the same sequence of auto-generated primary keys when user creates a static data record from the GUI.
Generally, multiple tables sharing a single sequence of primary keys is a poor design choice. Primary keys only need to be unique per table. If they need to be unique globally there are better options such as UUID primary keys.
Instead, one gives each table their own independent sequence of primary keys. In MySQL it's id bigint auto_increment primary key. In Postgres you'd use bigserial. In Oracle 12c it's number generated as identity.
create table users (
id number generated as identity,
name text not null
);
create table things (
id number generated as identity,
description text not null
);
Then you insert into each, leaving off the id, or setting it null. The database will fill it in from each sequence.
insert into users (name) values ('Yarrow Hock'); -- id 1
insert into users (id, name) values (null, 'Reaneu Keeves'); -- id 2
insert into things (description) values ('Some thing'); -- id 1
insert into things (id, description) values (null, 'Shiny stuff'); -- id 2
If your schema is not set up with auto incrementing, sequenced primary keys, you can alter the schema to use them. Just be sure to set each sequence to the maximum ID + 1. This is by far the most sane option in the long run.
If you really must draw from a single source for all primary keys, create a sequence and use that.
create sequence master_seq
start with ...
Then get the next key with nextval.
insert into rule (pk_rule, name, ruleid, description)
values (master_seq.nextval, 'TESTRULE', 14, 'TEST RULE DESCRIPTION')
Such a sequence goes up to 1,000,000,000,000,000,000,000,000,000 which should be plenty.
The INSERT and UPDATE statements in Oracle have a ...RETURNING...INTO... clause on them which can be used to return just-inserted values. When combined with a trigger-and-sequence generated primary key (Oracle 11 and earlier) or an identity column (Oracle 12 and up) this lets you get back the most-recently-inserted/updated value.
For example, let's say that you have a table TABLE1 defined as
CREATE TABLE TABLE1 (ID1 NUMBER
GENERATED ALWAYS AS IDENTITY
PRIMARY KEY,
COL2 NUMBER,
COL3 VARCHAR2(20));
You then define a function which inserts data into TABLE1 and returns the new ID value:
CREATE OR REPLACE FUNCTION INSERT_TABLE1(pCOL2 NUMBER, vCOL3 VARCHAR2)
RETURNS NUMBER
AS
nID NUMBER;
BEGIN
INSERT INTO TABLE1(COL2, COL3) VALUES (pCOL2, vCOL3)
RETURNING ID1 INTO nID;
RETURN nID;
END INSERT_TABLE1;
which gives you an easy way to insert data into TABLE1 and get the new ID value back.
dbfiddle here

what is the convention for normalizing a table with many primary keys

I have a database table like so :
col1 PRI
col2 PRI
col3 PRI
col4 PRI
col5 PRI
col6
col7
col8
So, it looks like all the columns from 1 to 5 need to be unique and that it makes 'sense' to just make those keys primary. Is this the right way of designing or should we just add a new auto generated column with a unique constraint on the 5 columns? We will query with either a subset of those columns (col1 - col3) or all 5 columns
This is fine; I see no need to have a 'generated' column:
PRIMARY KEY(a,b,c,d,e)
If you have this, it will work efficiently:
WHERE b=22 AND c=333 AND a=4444 -- in any order
Most other combinations will be less efficient.
(Please use real column names so we can discuss things in more detail.)
If you set the columns as UNIQUE, will fail because col1 cannot be equal on 2 diferent rows.
But if you set the columns as PRIMARY KEY but not UNIQUE, the database assumes that the combination of all the primary keys must be the 'UNIQUE' value, so col1+col2+col3+col4+col5 cannot be found on any other row.
Hope it helps.
EDIT
Here an example:
create table example (
col1 bigint not null unique,
col2 bigint not null,
primary key (col1,col2));
insert into example values(1,1); ==> Success
insert into example values(1,2); ==> Failure - col1 is unique and '1' was used
insert into example values(2,1); ==> Success - '2' was never used on col1
insert into example values(2,7); ==> Failure - '2' was already used on col1
But if you use instead:
create table example (
col1 bigint not null,
col2 bigint not null,
primary key (col1,col2));
insert into example values(1,1); ==> Success
insert into example values(1,2); ==> Success
insert into example values(2,1); ==> Success
insert into example values(1,2); ==> Failure - '1','2' combination was used

No unique or exclusion constraint matching the ON CONFLICT

I'm getting the following error when doing the following type of insert:
Query:
INSERT INTO accounts (type, person_id) VALUES ('PersonAccount', 1) ON
CONFLICT (type, person_id) WHERE type = 'PersonAccount' DO UPDATE SET
updated_at = EXCLUDED.updated_at RETURNING *
Error:
SQL execution failed (Reason: ERROR: there is no unique or exclusion
constraint matching the ON CONFLICT specification)
I also have an unique INDEX:
CREATE UNIQUE INDEX uniq_person_accounts ON accounts USING btree (type,
person_id) WHERE ((type)::text = 'PersonAccount'::text);
The thing is that sometimes it works, but not every time. I randomly get
that exception, which is really strange. It seems that it can't access that
INDEX or it doesn't know it exists.
Any suggestion?
I'm using PostgreSQL 9.5.5.
Example while executing the code that tries to find or create an account:
INSERT INTO accounts (type, person_id, created_at, updated_at) VALUES ('PersonAccount', 69559, '2017-02-03 12:09:27.259', '2017-02-03 12:09:27.259') ON CONFLICT (type, person_id) WHERE type = 'PersonAccount' DO UPDATE SET updated_at = EXCLUDED.updated_at RETURNING *
SQL execution failed (Reason: ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification)
In this case, I'm sure that the account does not exist. Furthermore, it never outputs the error when the person has already an account. The problem is that, in some cases, it also works if there is no account yet. The query is exactly the same.
Per the docs,
All table_name unique indexes that, without regard to order, contain exactly the
conflict_target-specified columns/expressions are inferred (chosen) as arbiter
indexes. If an index_predicate is specified, it must, as a further requirement
for inference, satisfy arbiter indexes.
The docs go on to say,
[index_predicate are u]sed to allow inference of partial unique indexes
In an understated way, the docs are saying that when using a partial index and
upserting with ON CONFLICT, the index_predicate must be specified. It is not
inferred for you. I learned this
here, and the following example demonstrates this.
CREATE TABLE test.accounts (
id int PRIMARY KEY GENERATED BY DEFAULT AS IDENTITY,
type text,
person_id int);
CREATE UNIQUE INDEX accounts_note_idx on accounts (type, person_id) WHERE ((type)::text = 'PersonAccount'::text);
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10);
so that we have:
unutbu=# select * from test.accounts;
+----+---------------+-----------+
| id | type | person_id |
+----+---------------+-----------+
| 1 | PersonAccount | 10 |
+----+---------------+-----------+
(1 row)
Without index_predicate we get an error:
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10) ON CONFLICT (type, person_id) DO NOTHING;
-- ERROR: there is no unique or exclusion constraint matching the ON CONFLICT specification
But if instead you include the index_predicate, WHERE ((type)::text = 'PersonAccount'::text):
INSERT INTO test.accounts (type, person_id) VALUES ('PersonAccount', 10)
ON CONFLICT (type, person_id)
WHERE ((type)::text = 'PersonAccount'::text) DO NOTHING;
then there is no error and DO NOTHING is honored.
A simple solution of this error
First of all let's see the cause of error with a simple example. Here is the table mapping products to categories.
create table if not exists product_categories (
product_id uuid references products(product_id) not null,
category_id uuid references categories(category_id) not null,
whitelist boolean default false
);
If we use this query:
INSERT INTO product_categories (product_id, category_id, whitelist)
VALUES ('123...', '456...', TRUE)
ON CONFLICT (product_id, category_id)
DO UPDATE SET whitelist=EXCLUDED.whitelist;
This will give you error No unique or exclusion constraint matching the ON CONFLICT because there is no unique constraint on product_id and category_id. There could be multiple rows having the same combination of product and category id (so there can never be a conflict on them).
Solution:
Use unique constraint on both product_id and category_id like this:
create table if not exists product_categories (
product_id uuid references products(product_id) not null,
category_id uuid references categories(category_id) not null,
whitelist boolean default false,
primary key(product_id, category_id) -- This will solve the problem
-- unique(product_id, category_id) -- OR this if you already have a primary key
);
Now you can use ON CONFLICT (product_id, category_id) for both columns without any error.
In short: Whatever column(s) you use with on conflict, they should have unique constraint.
The easy way to fix it is by setting the conflicting column as UNIQUE
I did not have a chance to play with UPSERT, but I think you have a case from
docs:
Note that this means a non-partial unique index (a unique index
without a predicate) will be inferred (and thus used by ON CONFLICT)
if such an index satisfying every other criteria is available. If an
attempt at inference is unsuccessful, an error is raised.
I solved the same issue by creating one UNIQUE INDEX for ALL columns you want to include in the ON CONFLICT clause, not one UNIQUE INDEX for each of the columns.
CREATE TABLE table_name (
element_id UUID NOT NULL DEFAULT gen_random_uuid(),
timestamp TIMESTAMP NOT NULL DEFAULT now():::TIMESTAMP,
col1 UUID NOT NULL,
col2 STRING NOT NULL ,
col3 STRING NOT NULL ,
CONSTRAINT "primary" PRIMARY KEY (element_id ASC),
UNIQUE (col1 asc, col2 asc, col3 asc)
);
Which will allow to query like
INSERT INTO table_name (timestamp, col1, col2, col3) VALUES ('timestamp', 'uuid', 'string', 'string')
ON CONFLICT (col1, col2, col3)
DO UPDATE timestamp = EXCLUDED.timestamp, col1 = EXCLUDED.col1, col2 = excluded.col2, col3 = col3.excluded;

In Postgresql, force unique on combination of two columns

I would like to set up a table in PostgreSQL such that two columns together must be unique. There can be multiple values of either value, so long as there are not two that share both.
For instance:
CREATE TABLE someTable (
id int PRIMARY KEY AUTOINCREMENT,
col1 int NOT NULL,
col2 int NOT NULL
)
So, col1 and col2 can repeat, but not at the same time. So, this would be allowed (Not including the id)
1 1
1 2
2 1
2 2
but not this:
1 1
1 2
1 1 -- would reject this insert for violating constraints
CREATE TABLE someTable (
id serial PRIMARY KEY,
col1 int NOT NULL,
col2 int NOT NULL,
UNIQUE (col1, col2)
)
autoincrement is not postgresql. You want a integer primary key generated always as identity (or serial if you use PG 9 or lower. serial was soft-deprecated in PG 10).
If col1 and col2 make a unique and can't be null then they make a good primary key:
CREATE TABLE someTable (
col1 int NOT NULL,
col2 int NOT NULL,
PRIMARY KEY (col1, col2)
)
Create unique constraint that two numbers together CANNOT together be repeated:
ALTER TABLE someTable
ADD UNIQUE (col1, col2)
If, like me, you landed here with:
a pre-existing table,
to which you need to add a new column, and
also need to add a new unique constraint on the new column as well as an old one, AND
be able to undo it all (i.e. write a down migration)
Here is what worked for me, utilizing one of the above answers and expanding it:
-- up
ALTER TABLE myoldtable ADD COLUMN newcolumn TEXT;
ALTER TABLE myoldtable ADD CONSTRAINT myoldtable_oldcolumn_newcolumn_key UNIQUE (oldcolumn, newcolumn);
---
ALTER TABLE myoldtable DROP CONSTRAINT myoldtable_oldcolumn_newcolumn_key;
ALTER TABLE myoldtable DROP COLUMN newcolumn;
-- down
Seems like regular UNIQUE CONSTRAINT :)
CREATE TABLE example (
a integer,
b integer,
c integer,
UNIQUE (a, c));
More here