PostgreSQL constraint to prevent overlapping ranges - sql

I wonder if it's possible to write a constraint that would make ranges unique. These ranges are represented as two string-typed columns bottom and top. Say, If I have the following row in a database,
| id | bottom | top |
|----|--------|-------|
| 1 | 10000 | 10999 |
inserting the row (2, 10100, 10200) would immediately result in constraint violation error.
P.S
I can't switch to integers, unfortunately -- only strings

Never store numbers as strings, and always use a range data type like int4range to store ranges. With ranges, you can easily use an exclusion constraint:
ALTER TABLE tab ADD EXCLUDE USING gist (bottom_top WITH &&);
Here, bottom_top is a range data type.
If you have to stick with the broken data model using two string columns, you can strip # characters and still have an exclusion constraint with
ALTER TABLE tab ADD EXCLUDE USING gist (
int4range(
CAST(trim(bottom, '#') AS integer),
CAST(trim(top, '#') AS integer),
'[]'
) WITH &&
);

Related

EXCLUDE constraint with conditional statement

I have a place table in postgres that contains the following columns
id | idplace | date_slot | type
-----+---------+-------------------------+------------
1 | 360 | [2023-02-20,2023-02-27) | 'zero'
The column type accepts only 3 values ('zero','one' or 'two')
I have created a constraint earlier to exclude any entry that is similar to another and it works. Here is the gist constraint
ALTER TABLE ONLY place ADD CONSTRAINT place_idplace_date_slot_excl1 EXCLUDE USING gist (idplace WITH =, type WITH =,date_slot WITH &&)
Now I want to modify it with the following rules:
-if the column type is equal to 'zero' do not accept any similar entry.
-if the column type is equal to 'one' accept only one another entry that contains 'two' (not zero or one).
-if the column type is equal to 'two' accept only one another entry that contains 'one' (not zero or two).
I cannot think of a better way than adding a generated column:
ALTER TABLE place
ADD col int4range GENERATED ALWAYS AS (
CASE WHEN type = 'zero' THEN INT4RANGE '[1,2]'
WHEN type = 'one' THEN INT4RANGE '[1,1]'
WHEN type = 'two' THEN INT4RANGE '[2,2]'
END
) STORED
NOT NULL;
Then you can use that column in your exclusion constraint with the && operator.

Adding column to sqlite database and distribute rows based on primary key

I have some data elements containing a timestamp and information about Item X sales related to this timestamp.
e.g.
timestamp | items X sold
------------------------
1 | 10
4 | 40
7 | 20
I store this data in an SQLite table. Now I want to add to this table. Especially if I get data about another item Y.
The item Y data might or might not have different timestamps but I want to insert this data into the existing table so that it looks like this:
timestamp | items X sold | items Y sold
------------------------------------------
1 | 10 | 5
2 | NULL | 10
4 | 40 | NULL
5 | NULL | 3
7 | 20 | NULL
Later on additional sales data (columns) must be added with the same scheme.
Is there an easy way to accomplish this with SQLite?
In the end I want to fetch data by timestamp and get an overview which items were sold at this time. Most examples consider the usecase to add a complete row (one record) or a complete column if it perfectly matches to the other columns.
Or is sqlite the wrong tool at all? And I should rather use csv or excel?
(Using pythons sqlite3 package to create and manipulate the DB)
Thanks!
Dynamically adding columns is not a good design. You could add them using
ALTER TABLE your_table ADD COLUMN the_column_name TEXT
the column, for existing rows would be populated with nulls, although you could specify a DEFAULT value and the existing rows would then be populated with that value.
e.g. the following demonstrates the above :-
DROP TABLE IF EXISTS soldv1;
CREATE TABLE IF NOT EXISTS soldv1 (timestamp INTEGER PRIAMRY KEY, items_sold_x INTEGER);
INSERT INTO soldv1 VALUES(1,10),(4,40),(7,20);
SELECT * FROM soldv1 ORDER BY timestamp;
ALTER TABLE soldv1 ADD COLUMN items_sold_y INTEGER;
UPDATE soldv1 SET items_sold_y = 5 WHERE timestamp = 1;
INSERT INTO soldv1 VALUES(2,null,10),(5,null,3);
SELECT * FROM soldv1 ORDER BY timestamp;
resulting in the first query returning :-
and the second query returning :-
However, as stated, the above is not considered a good design as the schema is dynamic.
You could alternately manage an equivalent of the above with the addition of either a new column (to also be part of the primary key) or by prefixing/suffixing the timestamp with a type.
Consider, as an example, the following :-
DROP TABLE IF EXISTS soldv2;
CREATE TABLE IF NOT EXISTS soldv2 (type TEXT, timestamp INTEGER, items_sold INTEGER, PRIMARY KEY(timestamp,type));
INSERT INTO soldv2 VALUES('x',1,10),('x',4,40),('x',7,20);
INSERT INTO soldv2 VALUES('y',1,5),('y',2,10),('y',5,3);
INSERT INTO soldv2 VALUES('z',1,15),('z',2,5),('z',9,25);
SELECT * FROM soldv2 ORDER BY timestamp;
This has replicated, data-wise, your original data and additionally added another type (column items_sold_z) without having to change the table's schema (nor having the additional complication of needing to update rather than insert as per when applying timestamp 1 items_sold_y 5).
The result from the query being :-
Or is sqlite the wrong tool at all? And I should rather use csv or excel?
SQLite is a valid tool. What you then do with the data can probably be done as easy as in excel (perhaps simpler) and probably much simpler than trying to process the data in csv format.
For example, say you wanted the total items sold per timestamp and how many types were sold then :-
SELECT timestamp, count(items_sold) AS number_of_item_types_sold, sum(items_sold) AS total_sold FROM soldv2 GROUP by timestamp ORDER BY timestamp;
would result in :-

Changing a column type from integer to string

Using PostgreSQL, what's the command to migrate an integer column type to a string column type?
Obviously I'd like to preserve the data, by converting the old integer data to strings.
You can convert from INTEGER to CHARACTER VARYING out-of-the-box, all you need is ALTER TABLE query chaning column type:
SQL Fiddle
PostgreSQL 9.3 Schema Setup:
CREATE TABLE tbl (col INT);
INSERT INTO tbl VALUES (1), (10), (100);
ALTER TABLE tbl ALTER COLUMN col TYPE CHARACTER VARYING(10);
Query 1:
SELECT col, pg_typeof(col) FROM tbl
Results:
| col | pg_typeof |
|-----|-------------------|
| 1 | character varying |
| 10 | character varying |
| 100 | character varying |
I suggest a four step process:
Create a new string column. name it temp for now. See http://www.postgresql.org/docs/9.3/static/ddl-alter.html for details
Set the string column. something like update myTable set temp=cast(intColumn as text) see http://www.postgresql.org/docs/9.3/static/functions-formatting.html for more interesting number->string conversions
Make sure everything in temp looks the way you want it.
Remove your old integer column. Once again, see http://www.postgresql.org/docs/9.3/static/ddl-alter.html for details
Rename temp to the old column name. Again: http://www.postgresql.org/docs/9.3/static/ddl-alter.html
This assumes you can perform the operation while no clients are connected; offline. If you need to make this (drastic) change in an online table, take a look at setting up a new table with triggers for live updates, then swap to the new table in an atomic operation. see ALTER TABLE without locking the table?

Using ALTER TABLE command in psql to add to a table

I am trying to solve this extra credit problem for my homework. So we haven't learned about this yet, but I thought I would give it a try because extra credit is always good. I am trying to write an ALTER TABLE statement to add a column to a table. The full definition is here.
Use the ALTER TABLE command to add a field to the table called rank
that is of type smallint. We’ll use this field to store a ranking of
the teams. The team with the highest points value will be ranked
number 1; the team with the second highest points value will be
ranked number 2; etc. Write a PL/pgSQL function named update rank
that updates the rank field to contain the appropriate number for
all teams. (There are both simple and complicated ways of doing this.
Think about how it can be done with very little code.) Then, define a
trigger named tr update rank that fires after an insert or update
of any of the fields {wins, draws}. This trigger should be executed
once per statement (not per row).
The table that I am using is
Table "table.group_standings"
Column | Type | Modifiers
--------+-----------------------+-----------
team | character varying(25)| not null
wins | smallint | not null
losses | smallint | not null
draws | smallint | not null
points | smallint | not null
Indexes:
"group_standings_pkey" PRIMARY KEY, btree (team)
Check constraints:
"group_standings_draws_check" CHECK (draws >= 0)
"group_standings_losses_check" CHECK (losses >= 0)
"group_standings_points_check" CHECK (points >= 0)
"group_standings_wins_check" CHECK (wins >= 0)
heres my code
ALTER TABLE group_standings ADD COLUMN rank smallint;
I need help with writing the function to rank the teams

Copy a table (including indexes) in postgres

I have a postgres table. I need to delete some data from it. I was going to create a temporary table, copy the data in, recreate the indexes and the delete the rows I need. I can't delete data from the original table, because this original table is the source of data. In one case I need to get some results that depends on deleting X, in another case, I'll need to delete Y. So I need all the original data to always be around and available.
However it seems a bit silly to recreate the table and copy it again and recreate the indexes. Is there anyway in postgres to tell it "I want a complete separate copy of this table, including structure, data and indexes"?
Unfortunately PostgreSQL does not have a "CREATE TABLE .. LIKE X INCLUDING INDEXES'
New PostgreSQL ( since 8.3 according to docs ) can use "INCLUDING INDEXES":
# select version();
version
-------------------------------------------------------------------------------------------------
PostgreSQL 8.3.7 on x86_64-pc-linux-gnu, compiled by GCC cc (GCC) 4.2.4 (Ubuntu 4.2.4-1ubuntu3)
(1 row)
As you can see I'm testing on 8.3.
Now, let's create table:
# create table x1 (id serial primary key, x text unique);
NOTICE: CREATE TABLE will create implicit sequence "x1_id_seq" for serial column "x1.id"
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "x1_pkey" for table "x1"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x1_x_key" for table "x1"
CREATE TABLE
And see how it looks:
# \d x1
Table "public.x1"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x1_pkey" PRIMARY KEY, btree (id)
"x1_x_key" UNIQUE, btree (x)
Now we can copy the structure:
# create table x2 ( like x1 INCLUDING DEFAULTS INCLUDING CONSTRAINTS INCLUDING INDEXES );
NOTICE: CREATE TABLE / PRIMARY KEY will create implicit index "x2_pkey" for table "x2"
NOTICE: CREATE TABLE / UNIQUE will create implicit index "x2_x_key" for table "x2"
CREATE TABLE
And check the structure:
# \d x2
Table "public.x2"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x2_pkey" PRIMARY KEY, btree (id)
"x2_x_key" UNIQUE, btree (x)
If you are using PostgreSQL pre-8.3, you can simply use pg_dump with option "-t" to specify 1 table, change table name in dump, and load it again:
=> pg_dump -t x2 | sed 's/x2/x3/g' | psql
SET
SET
SET
SET
SET
SET
SET
SET
CREATE TABLE
ALTER TABLE
ALTER TABLE
ALTER TABLE
And now the table is:
# \d x3
Table "public.x3"
Column | Type | Modifiers
--------+---------+-------------------------------------------------
id | integer | not null default nextval('x1_id_seq'::regclass)
x | text |
Indexes:
"x3_pkey" PRIMARY KEY, btree (id)
"x3_x_key" UNIQUE, btree (x)
[CREATE [ [ GLOBAL | LOCAL ] { TEMPORARY | TEMP } ] TABLE table_name
[ (column_name [, ...] ) ]
[ WITH ( storage_parameter [= value] [, ... ] ) | WITH OIDS | WITHOUT OIDS ]
[ ON COMMIT { PRESERVE ROWS | DELETE ROWS | DROP } ]
[ TABLESPACE tablespace ]
AS query][1]
Here is an example
CREATE TABLE films_recent AS
SELECT * FROM films WHERE date_prod >= '2002-01-01';
The other way to create a new table from the first is to use
CREATE TABLE films_recent (LIKE films INCLUDING INDEXES);
INSERT INTO films_recent
SELECT *
FROM books
WHERE date_prod >= '2002-01-01';
Note that Postgresql has a patch out to fix tablespace issues if the second method is used
There are many answers on the web, one of them can be found here.
I ended up doing something like this:
create table NEW ( like ORIGINAL including all);
insert into NEW select * from ORIGINAL
This will copy the schema and the data including indexes, but not including triggers and constraints.
Note that indexes are shared with original table so when adding new row to either table the counter will increment.
I have a postgres table. I need to
delete some data from it.
I presume that ...
delete from yourtable
where <condition(s)>
... won't work for some reason. (Care to share that reason?)
I was going to create a temporary
table, copy the data in, recreate the
indexes and the delete the rows I
need.
Look into pg_dump and pg_restore. Using pg_dump with some clever options and perhaps editing the output before pg_restoring might do the trick.
Since you are doing "what if"-type analysis on the data, I wonder if might you be better off using views.
You could define a view for each scenario you want to test based on the negation of what you want to exclude. I.e., define a view based on what you want to INclude. E.g., if you want a "window" on the data where you "deleted" the rows where X=Y, then you would create a view as rows where (X != Y).
Views are stored in the database (in the System Catalog) as their defining query. Every time you query the view the database server looks up the underlying query that defines it and executes that (ANDed with any other conditions you used). There are several benefits to this approach:
You never duplicate any portion of your data.
The indexes already in use for the base table (your original, "real" table) will be used (as seen fit by the query optimizer) when you query each view/scenario. There is no need to redefine or copy them.
Since a view is a "window" (NOT a shapshot) on the "real" data in the base table, you can add/update/delete on your base table and simply re-query the view scenarios with no need to recreate anything as the data changes over time.
There is a trade-off, of course. Since a view is a virtual table and not a "real" (base) table, you're actually executing a (perhaps complex) query every time you access it. This may slow things down a bit. But it may not. It depends on many issues (size and nature of the data, quality of the statistics in the System Catalog, speed of the hardware, usage load, and much more). You won't know until you try it. If (and only if) you actually find that the performance is unacceptably slow, then you might look at other options. (Materialized views, copies of tables, ... anything that trades space for time.)
A simple way is include all:
CREATE TABLE new_table (LIKE original_table INCLUDING ALL);
Create a new table using a select to grab the data you want. Then swap the old table with the new one.
create table mynewone as select * from myoldone where ...
mess (re-create) with indexes after the table swap.