Unique constraint for permutations across multiple columns - sql

Given the following three columns in a Postgres database: first, second, third; how can I create a constraint such that permutations are unique?
E.g. if ('foo', 'bar', 'shiz') exists in the db, ('bar', 'shiz', 'foo') would be excluded as non-unique.

You could use hstore to create the unique index. hstore keeps its keys in a canonical sorted order, so every permutation of the same three values produces an equal hstore value:
CREATE UNIQUE INDEX hidx ON test USING BTREE (hstore(ARRAY[a,b,c], ARRAY[a,b,c]));
UPDATE: Actually,
CREATE UNIQUE INDEX hidx ON test USING BTREE (hstore(ARRAY[a,b,c], ARRAY[null,null,null]));
might be a better idea, since it works the same but should take less space.
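For reference, this is how the constraint then behaves (assuming the hstore extension is installed and a table test with text columns a, b, c):
INSERT INTO test VALUES ('foo', 'bar', 'shiz');  -- ok
INSERT INTO test VALUES ('bar', 'shiz', 'foo');  -- rejected: duplicate key value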

For only three columns, this unique index using only basic expressions should perform very well, and no additional modules like hstore or custom functions are needed. The three expressions pick the smallest, the middle, and the greatest of the three values, so every permutation maps to the same ordered triple:
CREATE UNIQUE INDEX t_abc_uni_idx ON t (
LEAST(a,b,c)
, GREATEST(LEAST(a,b), LEAST(b,c), LEAST(a,c))
, GREATEST(a,b,c)
);
It also needs the least disk space:
SELECT pg_column_size(row(hstore(t))) AS hst_row
     , pg_column_size(row(hstore(ARRAY[a,b,c], ARRAY[a,b,c]))) AS hst1
     , pg_column_size(row(hstore(ARRAY[a,b,c], ARRAY[null,null,null]))) AS hst2
     , pg_column_size(row(ARRAY[a,b,c])) AS arr
     , pg_column_size(row(LEAST(a,b,c)
                        , GREATEST(LEAST(a,b), LEAST(b,c), LEAST(a,c))
                        , GREATEST(a,b,c))) AS columns
FROM   t;
 hst_row | hst1 | hst2 | arr | columns
---------+------+------+-----+---------
      59 |   59 |   56 |  69 |      30
Numbers are bytes per index row for the example above, measured with pg_column_size(). My example uses only single characters; the difference in size is constant.

You can do this by creating a unique index on a function which returns a sorted array of the values in the columns:
CREATE OR REPLACE FUNCTION sorted_array(anyarray)
RETURNS anyarray
AS $BODY$
SELECT array_agg(x) FROM (SELECT unnest($1) AS x ORDER BY x) AS y;
$BODY$
LANGUAGE sql IMMUTABLE;
CREATE UNIQUE INDEX ON test (sorted_array(ARRAY[first, second, third]));
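With the index in place, inserting any permutation of an existing row fails (assuming the question's table test with text columns first, second, third):
INSERT INTO test VALUES ('foo', 'bar', 'shiz');  -- ok
INSERT INTO test VALUES ('bar', 'shiz', 'foo');  -- rejected: duplicate key value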

Suggestion from a co-worker, a variation of @julien's idea:
Sort the terms alphabetically and place a delimiter on either side of each term. Concatenate them and place them in a separate field that becomes the primary key.
Why the delimiter? Without it, ('a', 'aa', 'aaa') and ('aa', 'aa', 'aa') would both concatenate to the same string; with delimiters, both can be inserted.
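A hedged sketch of that idea (the function, index, and table t(a, b, c) are illustrative; the delimiter must be a character that cannot occur in the data):
-- Build a canonical key: sort the values and join them with a delimiter.
CREATE OR REPLACE FUNCTION perm_key(VARIADIC text[])
RETURNS text AS $$
SELECT string_agg(x, '|' ORDER BY x) FROM unnest($1) AS x;
$$ LANGUAGE sql IMMUTABLE;

-- Enforce uniqueness on the canonical key.
CREATE UNIQUE INDEX t_perm_key_idx ON t (perm_key(a, b, c));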

Related

PostgreSQL constraint to prevent overlapping ranges

I wonder if it's possible to write a constraint that would make ranges unique. These ranges are represented as two string-typed columns, bottom and top. Say, if I have the following row in a database,
| id | bottom | top   |
|----|--------|-------|
| 1  | 10000  | 10999 |
inserting the row (2, 10100, 10200) would immediately result in constraint violation error.
P.S. I can't switch to integers, unfortunately; only strings.
Never store numbers as strings, and always use a range data type like int4range to store ranges. With ranges, you can easily use an exclusion constraint:
ALTER TABLE tab ADD EXCLUDE USING gist (bottom_top WITH &&);
Here, bottom_top is a range data type.
If you have to stick with the broken data model using two string columns, you can strip # characters and still have an exclusion constraint with
ALTER TABLE tab ADD EXCLUDE USING gist (
   int4range(
      CAST(trim(bottom, '#') AS integer),
      CAST(trim(top, '#') AS integer),
      '[]'
   ) WITH &&
);
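A minimal sketch of the recommended model (table and column names are assumptions):
CREATE TABLE tab (
  id         serial PRIMARY KEY
, bottom_top int4range NOT NULL
, EXCLUDE USING gist (bottom_top WITH &&)
);

INSERT INTO tab (bottom_top) VALUES ('[10000,10999]');  -- ok
INSERT INTO tab (bottom_top) VALUES ('[10100,10200]');  -- fails: overlaps the first range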

Filter an SQL Array text[] for matching value containing a parameter

I have a table with a TEXT[] column. I want to return all rows where at least one of the array values contains my parameter.
Right now I'm doing WHERE array_to_string(arr, ',') ilike '%myString%'
But I feel there must be a better-optimized way of doing that kind of search.
Plus, I would also like to search for values beginning or ending with my parameter.
CREATE TABLE IF NOT EXISTS my_table
(
id BIGSERIAL,
col_array TEXT[],
CONSTRAINT my_table_pkey PRIMARY KEY (id)
)
insert into my_table(col_array)
VALUES ('{ABC,DEF}'),
('{FGH,IJK}'),
('{LMN}'),
('{OPQ}');
select * from my_table where ARRAY_TO_STRING(col_array, ',') ilike '%F%';
This works, as it returns only the first 2 rows.
You can find a sqlfiddle here: http://sqlfiddle.com/#!17/09632/7
I would use a sub-query:
select t.*
from my_table t
where exists (select *
              from unnest(t.col_array) as x(e)
              where x.e ilike '%F%')
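For the "beginning with" or "ending with" searches the question also mentions, only the pattern changes:
select t.*
from my_table t
where exists (select 1
              from unnest(t.col_array) as x(e)
              where x.e ilike 'F%')  -- 'F%' = begins with F; use '%F' for ends with F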
You might want to re-consider your decision to de-normalize your model.
Quote from the manual:
Arrays are not sets; searching for specific array elements can be a sign of database misdesign. Consider using a separate table with a row for each item that would be an array element. This will be easier to search, and is likely to scale better for a large number of elements.

How to lookup based on ranged values

I have a table like:
id        name
001to005  ABC
006to210  PQR
211to300  XYZ
This is not the final table, I can make it any way I want... so I would like to look up the id in this data and extract the name: e.g. if the id is in the range 001-005 the name is ABC, if in 006-210 then PQR, and so on.
My approach would be to store id as a regular expression in the table, like this:
id               name
[0][0][1-5]      ABC
[0-2][0-9][0-9]  PQR
[2-3][0-9][0-9]  XYZ
and then query:
select * from table where '004' ~ id
This query returns ABC, which is correct, but when the ranges get bigger my input value can match both the 2nd and 3rd rows. For example:
select * from table where '299' ~ id
This query results in 2 rows, so my question is: what regular expression can I use to make the match more restrictive, or is there another approach to solve this?
Do not store regular expressions for simple ranges; that would be extremely expensive and cannot use an index: every single expression in the table would have to be evaluated for every query.
You could use range types like @a_horse commented. But as long as you don't need the added functionality of range types, this simple layout is smaller and faster:
CREATE TABLE tbl (
  id_lo int  NOT NULL
, id_hi int  NOT NULL
, name  text NOT NULL
);
INSERT INTO tbl VALUES
  (  1,   5, 'ABC')
, (  6, 210, 'PQR')
, (211, 300, 'XYZ');
CREATE UNIQUE INDEX foo ON tbl (id_lo, id_hi DESC);
Two integers occupy 8 bytes; an int4range value occupies 17 bytes. Size matters in tables and indexes.
Query:
SELECT * FROM tbl
WHERE 4 BETWEEN id_lo AND id_hi;
Lower (id_lo) and upper (id_hi) bounds are included in the range, as your sample data suggests.
Note that range types exclude the upper bound by default.
Also assuming that leading zeros are insignificant, so we can operate with plain integer.
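For comparison, the range-type layout mentioned above would look roughly like this (a sketch; table and column names are assumptions):
CREATE TABLE tbl_range (
  ids  int4range NOT NULL
, name text NOT NULL
);
INSERT INTO tbl_range VALUES
  ('[1,5]',     'ABC')
, ('[6,210]',   'PQR')
, ('[211,300]', 'XYZ');
SELECT name FROM tbl_range WHERE ids @> 4;  -- range contains the element 4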
Related:
PostgreSQL daterange not using index correctly
Optimizing queries on a range of timestamps (two columns)
Find overlapping date ranges in PostgreSQL
To enforce distinct ranges in the table:
Preventing adjacent/overlapping entries with EXCLUDE in PostgreSQL
You still don't need a range type in the table for this:
Postgres: How to find nearest tsrange from timestamp outside of ranges?

Find position(s) in array matching a given sub-array

Given this table:
CREATE TABLE datasets.travel(path integer[], path_timediff double precision[]);
INSERT INTO datasets.travel
VALUES (array[50,49,49,49,49,50], array[NULL,438,12,496,17,435]);
I am looking for some kind of function or query in PostgreSQL that, for a given input array[49,50], will find the matching consecutive index values in path, which is [5,6] here, and the corresponding element in path_timediff, which is 435 in the example (array index 6).
My ultimate purpose is to find all such occurrences of [49,50] in path and all the corresponding elements in path_timediff. How can I do that?
Assuming you have a primary key in your table that you did not show:
CREATE TABLE datasets.travel (
travel_id serial PRIMARY KEY
, path integer[]
, path_timediff float8[]
);
Here is one way with generate_subscripts() in a LATERAL join:
SELECT t.travel_id, i+1 AS position, path_timediff[i+1] AS timediff
FROM  (SELECT * FROM datasets.travel WHERE path @> ARRAY[49,50]) t
    , generate_subscripts(t.path, 1) i
WHERE  path[i:i+1] = ARRAY[49,50];
This finds all matches, not just the first.
i+1 works for a sub-array of length 2. Generalize with i + array_length(sub_array, 1) - 1.
The subquery is not strictly necessary, but it can use a GIN index on (path) for a fast pre-selection:
(SELECT * FROM datasets.travel WHERE path @> ARRAY[49,50])
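A matching index for that pre-selection could look like this (the index name is illustrative):
CREATE INDEX travel_path_gin_idx ON datasets.travel USING gin (path);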
Related:
How to access array internal index with postgreSQL?
Parallel unnest() and sort order in PostgreSQL
PostgreSQL unnest() with element number

Copy a table (including indexes) in postgres

I have a postgres table. I need to delete some data from it. I was going to create a temporary table, copy the data in, recreate the indexes and then delete the rows I need. I can't delete data from the original table, because it is the source of data. In one case I need results that depend on deleting X; in another case I'll need to delete Y. So I need all the original data to always be around and available.
However it seems a bit silly to recreate the table and copy it again and recreate the indexes. Is there anyway in postgres to tell it "I want a complete separate copy of this table, including structure, data and indexes"?
Unfortunately, PostgreSQL does not have "CREATE TABLE ... LIKE X INCLUDING INDEXES".
Newer PostgreSQL (since 8.3, according to the docs) can use "INCLUDING INDEXES":
# select version();
version
-------------------------------------------------------------------------------------------------
PostgreSQL 8.3.7 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
(1 row)
As you can see I'm testing on 8.3.
Now, let's create table:
# create table x1 (id serial primary key, x text unique);
NOTICE: CREATE TABLE will create implicit sequence "x1_id_seq" for serial column "x1.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "x1_pkey" for table "x1"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x1_x_key" for table "x1"
CREATE TABLE
And see how it looks:
# \d x1
Table "public.x1"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x1_pkey" PRIMARY KEY, btree (id)
"x1_x_key" UNIQUE, btree (x)
Now we can copy the structure:
# create table x2 ( like x1 INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES );
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "x2_pkey" for table "x2"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x2_x_key" for table "x2"
CREATE TABLE
And check the structure:
# \d x2
Table "public.x2"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x2_pkey" PRIMARY KEY, btree (id)
"x2_x_key" UNIQUE, btree (x)
If you are using PostgreSQL pre-8.3, you can simply use pg_dump with option "-t" to specify 1 table, change table name in dump, and load it again:
=> pg_dump -t x2 | sed 's/x2/x3/g' | psql
SET
SET
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
And now the table is:
# \d x3
Table "public.x3"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x3_pkey" PRIMARY KEY, btree (id)
"x3_x_key" UNIQUE, btree (x)
You can also use CREATE TABLE ... AS:
CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name
    [ (column_name [, ...] ) ]
    [ WITH ( storage_parameter [= value] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
    [ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
    [ TABLESPACE tablespace ]
AS query
Here is an example
CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2002-01-01';
The other way to create a new table from the first is to use
CREATE TABLE films_recent (LIKE films INCLUDING INDEXES);
INSERT INTO films_recent
SELECT *
FROM films
WHERE date_prod >= '2002-01-01';
Note that PostgreSQL has a patch out to fix tablespace issues if the second method is used.
I ended up doing something like this:
create table NEW ( like ORIGINAL including all);
insert into NEW select * from ORIGINAL;
This will copy the schema and the data, including indexes, but not triggers or foreign key constraints.
Note that the sequence of a serial column is shared with the original table (the copied default still points to it), so adding a new row to either table increments the same counter.
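If you want the copy to have its own counter, you can point the copied column at a new sequence (a sketch; sequence and column names are assumptions):
CREATE SEQUENCE new_id_seq;
ALTER TABLE NEW ALTER COLUMN id SET DEFAULT nextval('new_id_seq');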
I have a postgres table. I need to delete some data from it.
I presume that ...
delete from yourtable
where <condition(s)>
... won't work for some reason. (Care to share that reason?)
I was going to create a temporary table, copy the data in, recreate the indexes and then delete the rows I need.
Look into pg_dump and pg_restore. Using pg_dump with some clever options and perhaps editing the output before pg_restoring might do the trick.
Since you are doing "what if"-type analysis on the data, I wonder if might you be better off using views.
You could define a view for each scenario you want to test based on the negation of what you want to exclude. I.e., define a view based on what you want to INclude. E.g., if you want a "window" on the data where you "deleted" the rows where X=Y, then you would create a view as rows where (X != Y).
Views are stored in the database (in the System Catalog) as their defining query. Every time you query the view the database server looks up the underlying query that defines it and executes that (ANDed with any other conditions you used). There are several benefits to this approach:
You never duplicate any portion of your data.
The indexes already in use for the base table (your original, "real" table) will be used (as seen fit by the query optimizer) when you query each view/scenario. There is no need to redefine or copy them.
Since a view is a "window" (NOT a snapshot) on the "real" data in the base table, you can add/update/delete on your base table and simply re-query the view scenarios with no need to recreate anything as the data changes over time.
There is a trade-off, of course. Since a view is a virtual table and not a "real" (base) table, you're actually executing a (perhaps complex) query every time you access it. This may slow things down a bit. But it may not. It depends on many issues (size and nature of the data, quality of the statistics in the System Catalog, speed of the hardware, usage load, and much more). You won't know until you try it. If (and only if) you actually find that the performance is unacceptably slow, then you might look at other options. (Materialized views, copies of tables, ... anything that trades space for time.)
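A sketch of one such scenario view (table and column names are illustrative):
-- A "window" that behaves as if the rows where x = y had been deleted.
-- IS DISTINCT FROM also handles NULLs sensibly, unlike <>.
CREATE VIEW original_without_xy AS
SELECT * FROM original
WHERE x IS DISTINCT FROM y;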
A simple way is to use INCLUDING ALL:
CREATE TABLE new_table (LIKE original_table INCLUDING ALL);
Create a new table using a select to grab the data you want. Then swap the old table with the new one.
create table mynewone as select * from myoldone where ...
Re-create the indexes after the table swap.
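One way to do the swap (a sketch; the backup table name is illustrative):
BEGIN;
ALTER TABLE myoldone RENAME TO myoldone_backup;
ALTER TABLE mynewone RENAME TO myoldone;
COMMIT;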