Is it possible to write a PostgreSQL extension that modifies the DDL syntax?
I'm creating an extension on top of PostgreSQL and PostGIS to support the integrity constraints of a specific spatial data model (OMT-G). For that, I want to modify the CREATE TABLE syntax, which accepts constraints with this syntax:
CONSTRAINT constraint_name CHECK ( expression )
But I want to create my own syntax, like in the following example, which would then call functions or triggers I have already written.
CREATE TABLE school_district (
id integer PRIMARY KEY,
school_name varchar(120),
geom geometry,
SPATIAL_CONSTRAINT PLANAR_SUBDIVISION (geom),
SPATIAL_CONSTRAINT CONTAINS school (geom)
);
Is that possible? If so, how?
As others have commented, it's not possible to alter the Postgres grammar via an extension. There's been some discussion related to this on the pgsql-hackers mailing list, but no one sees any feasible way to make the Bison grammar extensible.
The other problem with what you're proposing is that CHECK constraints (which is what it appears you're trying to create here) cannot safely reference other tables.
It sounds like what you really want here is extensible foreign key support. That's something the community would actually like as well, at least for arrays. The idea is to support something like int[], where each element should be treated as a foreign key reference to another table. What you're describing is similar: instead of a different data type, you want to use a different operator.
For now though, I think the best you could do is to provide your users a function that will put an appropriate column and trigger on the table. The column would be a foreign key to the school table. The trigger would find an appropriate school and populate its primary key in the new column. (You'd probably want a BEFORE DELETE trigger on the school table to deal with deleting a school, too.)
The reason you need the foreign key field is because foreign key triggers operate with different visibility rules than normal queries, so you can't fully simulate them in user space. If you're not quite that paranoid you could just have a single AFTER INSERT trigger that looks for a school and throws an error if it doesn't find one.
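A minimal sketch of that simpler, less paranoid AFTER INSERT variant might look like this (all table and column names here are illustrative assumptions, not from the question):
-- Hypothetical sketch: reject a school_district row that does not
-- contain at least one school. Assumes a "school" table with a
-- geometry column "geom".
CREATE OR REPLACE FUNCTION check_district_contains_school() RETURNS trigger AS $$
BEGIN
    IF NOT EXISTS (SELECT 1 FROM school s WHERE ST_Contains(NEW.geom, s.geom)) THEN
        RAISE EXCEPTION 'school district % does not contain any school', NEW.id;
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER district_contains_school
    AFTER INSERT OR UPDATE ON school_district
    FOR EACH ROW EXECUTE PROCEDURE check_district_contains_school();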
As far as pure-DDL solutions go, the most promising option is CREATE TABLE ... CONSTRAINT ... EXCLUDE, which uses a GiST index on a single table and supports a relatively limited set of operators that work only on the bounding box (e.g. &&):
CREATE TABLE polygons (
geom geometry(Polygon,4326),
EXCLUDE USING gist (geom WITH &&)
);
INSERT INTO polygons(geom)
VALUES('SRID=4326;POLYGON ((0 0, 0 1, 1 0, 0 0))');
But then, this conflicts (even though the geometries don't actually overlap):
INSERT INTO polygons(geom)
VALUES('SRID=4326;POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))');
ERROR: conflicting key value violates exclusion constraint "polygons_geom_excl"
DETAIL: Key (geom)=(0103000020E61000000100000005000000000000000000F03F000000000000F03F000000000000F03F0000000000000040000000000000004000000000000000400000000000000040000000000000F03F000000000000F03F000000000000F03F) conflicts with existing key (geom)=(0103000020E61000000100000004000000000000000000000000000000000000000000000000000000000000000000F03F000000000000F03F000000000000000000000000000000000000000000000000).
As @Jim mentioned above, the best approach when one table needs a constraint built over another table is to write a good trigger function and use it on both tables. It would normally be written in PL/pgSQL, where you can embed useful messages, such as:
RAISE EXCEPTION 'School % is not located in school district % (it is % away)',
s.name, d.name, ST_Distance(s.geom, d.geom);
This way, if you edit either the school_district or schools table, the trigger would do the check on UPDATE, INSERT, or DELETE to see if the conditions remain valid.
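Putting those pieces together, such a trigger function could look roughly like the sketch below. It assumes a "schools" table with name, geom, and a district_id column pointing at school_district, none of which appear in the original question; an analogous trigger on school_district (like the one sketched earlier) would cover edits from the other side.
-- Hypothetical sketch: verify a school lies inside its assigned district.
CREATE OR REPLACE FUNCTION check_school_in_district() RETURNS trigger AS $$
DECLARE
    d school_district%ROWTYPE;
BEGIN
    SELECT * INTO d FROM school_district WHERE id = NEW.district_id;
    IF NOT ST_Contains(d.geom, NEW.geom) THEN
        RAISE EXCEPTION 'School % is not located in school district % (it is % away)',
            NEW.name, d.school_name, ST_Distance(NEW.geom, d.geom);
    END IF;
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER school_in_district
    BEFORE INSERT OR UPDATE ON schools
    FOR EACH ROW EXECUTE PROCEDURE check_school_in_district();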
Related
Is there a way to get the values for a CHECK Constraint
Example
CONSTRAINT TheCollumn CHECK (TheCollumn IN('One','Two','Three') )
I want to get 'One', 'Two', 'Three' from a query, which I can then use to populate a dropdown without having to retype the values in the dropdown list.
I think you want a foreign key constraint and a reference table:
create table refTheColumn (
name varchar(255) primary key
);
. . .
constraint fk_thecolumn foreign key (theColumn) references refTheColumn(name);
Then you can populate the list with the reference table.
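For example (illustrative values and a generic dropdown query):
insert into refTheColumn (name) values ('One'), ('Two'), ('Three');

-- populate the dropdown from the reference table:
select name from refTheColumn order by name;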
It's a bad idea, but here's the general approach:
USE tempdb
CREATE TABLE #tmp (v varchar(50));
ALTER TABLE #tmp ADD CONSTRAINT TheCollumn CHECK(v IN ('One', 'Two', 'Three'));
GO
SELECT definition FROM sys.check_constraints WHERE Name = 'TheCollumn'
Would output:
([v]='Three' OR [v]='Two' OR [v]='One')
You'd have to parse that in code (parsing it in SQL would be very challenging and unwise). A foreign key, as Gordon Linoff suggests, is definitely cleaner and easier to work with.
More reasons why this is a bad idea:
If the check constraint were defined differently, it may be stored differently (hence Damien_The_Unbeliever's point about needing a SQL parser). For example, it might use AND clauses, or it might reference a function (consider [v] = right(SomeOtherColumn, 5), and now you have to interpret that).
The sys tables (sys.check_constraints) could change in future versions and aren't considered a reliable way to access this data, so your code might not survive a SQL Server upgrade (whereas using a reference table would). Even worse, an upgrade that changes the behavior being leveraged might not throw an exception at all, but instead create a bug that's difficult to track down or reproduce across environments (i.e. prod upgrades but dev doesn't).
It seems it is preferable to use the TEXT data type with PostgreSQL (or some other databases which support it as well) rather than character varying(NN), because there is no performance penalty, and the maximum possible length can be adjusted by dropping and re-applying constraints without affecting any views etc. which use the field.
But, how is this constraint applied (SQL code)?
When you create the table you can do something of this sort:
CREATE TABLE names (
name text CONSTRAINT namechk CHECK (char_length(name) <= 255)
)
(namechk is just a name for the constraint)
The same goes for ALTER TABLE, for example:
ALTER TABLE names
ADD CONSTRAINT namechk CHECK (char_length(name) <= 255);
There are really three things here:
Is it better to use text + a check constraint, or varchar(N)?
How would you write an appropriate check constraint?
Should you name your constraints, or let an automatic name be assigned?
Answers:
A varchar(N) will be more obvious when inspecting the schema, and it is what developers coming from other databases will expect to see. However, as you say, it is harder to change later. Bear in mind that applying a new or modified check constraint is not free: all existing rows must be checked against the constraint, so on a large table, a lot of reading is necessary.
The syntax for a check constraint is CONSTRAINT name CHECK (condition) (or just CHECK (condition), and Postgres itself will come up with a name) in a CREATE TABLE statement, or ALTER TABLE table_name ADD CONSTRAINT name CHECK (condition); afterwards. The condition would be an expression using an appropriate string function, e.g. char_length(foo) <= 255.
Naming a constraint is very useful if you want to manage it later. In particular, since you're using this for flexibility, you may well want to write code to drop and recreate the constraint with a new length. If you only ever use graphical tools, this isn't a problem, but managing multiple servers (e.g. development, testing, and production copies) becomes much easier if you can script your changes. With a named constraint, this would look like:
ALTER TABLE foo DROP CONSTRAINT ck_bar_length;
ALTER TABLE foo ADD CONSTRAINT ck_bar_length CHECK ( char_length(bar) <= 100 );
I can't actually think of a disadvantage of naming your constraint.
Is there a best practice, and is it closest to one of these examples?
CREATE TABLE TABLE1
(
ID NUMBER(18) CONSTRAINT TABLE1_PK PRIMARY KEY,
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
or
CREATE TABLE TABLE1
(
ID NUMBER(18),
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
ALTER TABLE TABLE1 ADD CONSTRAINT TABLE1_PK
PRIMARY KEY (ID)
USING INDEX (CREATE UNIQUE INDEX IDX_TABLE1_PK ON TABLE1 (ID));
Is either scenario going to result in a better outcome in general? The first option is much more readable, but perhaps there are reasons why the latter is preferable.
Definitely personal preference. I prefer to do as much as I can in the single CREATE TABLE statement simply because I find it more concise. Almost everything I need is described right there.
Sometimes that's not possible. Say you have two tables with references to each other, or you want to load a table with a bunch of data first, so you add the additional indexes after the table is loaded.
You'll find that many tools that create schemas from DBs will separate them (mostly because it's always correct: define all the tables, then define all of the relationships).
But personally, if practical, I find having it all in one place is best.
When building a deployment script that is eventually going to be run by someone else later on, I prefer splitting the scripts a fair bit. If something goes wrong, it's a bit easier to tell from the logs what exactly failed.
My table creation script will usually only have NOT NULL constraints. The PK, unique and FK constraints will be added afterwards.
This is a minor point though, and I don't have anything in particular against combining it all in one big CREATE TABLE statement.
You may find that your workplace already has a standard in place. e.g. my current client requires separate scripts for the CREATE TABLE, then more separate scripts for constraints, indexes, etc.
The exception, of course, is index-organized tables which must have a PK constraint declared upfront.
It's a personal preference to define any attributes or defaults for a field in the actual CREATE statement. One thing I noticed is that your second statement won't work as written, since you haven't specified that the id field is NOT NULL.
I guess it's a personal best practice for readability that I specify the table's primary key upfront.
Another thing to consider when creating the table is how you want rows identified: by a single column or by a composite key. ALTER TABLE is good for creating composite keys after the fact.
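For example, a composite key added after the fact might look like this (the table and column names are made up):
ALTER TABLE order_items
    ADD CONSTRAINT order_items_pk PRIMARY KEY (order_id, product_id);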
I'm learning SQL and stumbled across CONSTRAINT. I can define them like this:
CREATE TABLE products (
product_no integer,
name text,
price numeric CHECK (price > 0)
);
and like this:
CREATE TABLE products (
product_no integer,
name text,
price numeric CONSTRAINT positive_price CHECK (price > 0)
);
Why would I give them names? Why should or shouldn't I give them names?
As far as I can see in this example, and in most situations I can think of, I can't reuse them. So what is the benefit of giving a name to a CONSTRAINT?
There are significant benefits of giving explicit names to your constraints. Just a few examples:
You can drop them by name (see the sketch below).
If you use conventions when choosing the name, then you can collect them from meta tables and process them programmatically.
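A quick sketch of both points, reusing the positive_price constraint from the question (the catalog query is PostgreSQL-specific):
-- drop the constraint by its explicit name:
ALTER TABLE products DROP CONSTRAINT positive_price;

-- collect constraints that follow a naming convention from the catalog:
SELECT conname, conrelid::regclass AS table_name
FROM pg_constraint
WHERE conname LIKE 'positive_%';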
It seems you are using PostgreSQL and there the difference isn't actually that big.
This is because a system generated name in PostgreSQL is actually somewhat meaningful. But "positive_price" is still easier to understand than "foo_price_check":
Think about which error message is better to understand:
new row for relation "foo" violates check constraint "foo_price_check"
or
new row for relation "foo" violates check constraint "positive_price"
In Oracle this is even worse, because a system-generated name does not contain any hint of what went wrong:
ORA-02290: check constraint (SYS_C0024109) violated
vs.
ORA-02290: check constraint (POSITIVE_PRICE) violated
You don't specify the RDBMS. The following points apply to SQL Server, and I'd guess quite likely to other RDBMSs too.
You need to know the name of a constraint to drop it. The constraint's name also appears in constraint violation error messages, so giving it an explicit name can make those messages more meaningful (SQL Server auto-generates a name for the constraint that tells you nothing about its intent).
Constraints are objects in SQL, in the same way that a PK, FK, table, or almost anything else is. If you give your constraint a name, you can easily drop it if required, say for example during some sort of bulk data import. If you don't give it a name, you can still drop it, but you first have to find out the auto-generated name that SQL Server gave it.
If your constraint is violated, having its name in the error message helps to debug it and present the error message to the user.
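For example, around a bulk import you might do something along these lines (SQL Server syntax; the table and constraint names are made up):
-- drop the named constraint before the bulk load...
ALTER TABLE dbo.Orders DROP CONSTRAINT CK_Orders_Quantity;

-- ... bulk import runs here ...

-- ...and recreate it afterwards
ALTER TABLE dbo.Orders ADD CONSTRAINT CK_Orders_Quantity CHECK (Quantity > 0);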
Named constraints have scenarios where they are really useful. Here are the ones I have encountered so far:
Schema compare tools
If you ever need to compare environments like DEV and UT to see the differences, you may get "false positives" at table level, which is really annoying. The table definition is literally the same, but because the autogenerated constraint name is different, it is marked as changed.
Yes, some tools allow you to skip/ignore constraint names, but not all of them.
Deployment scripts from a state-based migration tool
If you are using a state-based migration tool like MS SSDT, and you manually review generated changes before applying them to PRODUCTION, you want your final script to be as short as possible.
Having hundreds of lines that drop a constraint and recreate it with a different name is noise. Some constraints can be excluded by the tool, but some (DEFAULT/CHECK) are still regenerated. An explicit name and a good naming convention solve this for all of them.
EIBTI (Explicit Is Better Than Implicit)
Last, but not least: being explicit simplifies things. If you have an explicit name, you can refer to that database object without searching system/metadata views.
Why does the SQL standard accept this? What are the benefits?
If I have these tables:
create table prova_a (a number, b number);
alter table prova_a add primary key (a,b);
create table prova_b (a number, b number);
alter table prova_b add foreign key (a,b) references prova_a(a,b) ;
insert into prova_a values (1,2);
You can insert these rows without error:
insert into prova_b values (123,null);
insert into prova_b values (null,123);
Note1: This comes from this answer.
Note2: This can be avoided by setting NOT NULL on both columns.
Remarks: I'm not asking how to avoid it; I'm interested in what the benefits are.
References:
Oracle documentation: The relational model permits the value of foreign keys to match either the referenced primary or unique key value, or be null. If any column of a composite foreign key is null, then the non-null portions of the key do not have to match any corresponding portion of a parent key.
SQL Server documentation: A FOREIGN KEY constraint can contain null values; however, if any column of a composite FOREIGN KEY constraint contains null values, verification of all values that make up the FOREIGN KEY constraint is skipped.
I know some DBMSs simply don't enforce referential integrity on foreign key constraints. SQLite comes to mind. It's talked about here.
Other DBMSs are different, I know that MS SQL Server will complain if you attempt something like that.
SQLite has its uses, but it is not meant to be used in high-concurrency situations. If you are seeing this behavior in a different DBMS, check its documentation to see if it does something similar. Most should be enforcing integrity, however.
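For what it's worth, recent SQLite versions can enforce foreign keys, but enforcement is off by default and must be switched on per connection:
-- SQLite only: turn on foreign key enforcement for this connection
PRAGMA foreign_keys = ON;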
At least do your DEV work with a reasonably standard RDBMS, even if you are doing your production system with something like SQLite (which is an excellent database; it runs on your iPod Touch!). It will flush out all these mistakes, like lint really. If you run your code with SQL Server Express, which you can download for free, you'll get plenty of errors such as:
Msg 8111, Level 16, State 1, Line 2
Cannot define PRIMARY KEY constraint on nullable column in table 'prova_a'.
Msg 1750, Level 16, State 0, Line 2
Could not create constraint. See previous errors.
Oracle and SQL Server both allow NULL foreign keys, and it is easily understandable why this is necessary.
Think of a tree, for instance, where every row has a parent key that references the primary key of the same table. There has to be a root node in the tree that does not have a parent, and the parent key will be null.
A more tangible example: think of employees and managers. Some people in the company, even if it is only the CEO, will not have a manager. Were it not possible to set the manager id on the employee table to NULL, you would have to create a "No Manager" employee, something that is just wrong because it has no real-life correspondence.
Now that we know this, it is obvious why your composite keys behave the way they do. Logically, if part of the composite key is NULL, the entire key is NULL, just as a string concatenation returns NULL if one of its pieces is NULL. There can be no match, so the constraint is not enforced in these cases.
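Incidentally, the standard anticipates this: the default matching rule (MATCH SIMPLE) skips partially-NULL keys, and MATCH FULL exists for when you want them rejected. A sketch in PostgreSQL syntax, which accepts the clause directly (Oracle does not support it):
alter table prova_b
    add foreign key (a, b) references prova_a (a, b) match full;
-- now (123, NULL) and (NULL, 123) are rejected;
-- (NULL, NULL) is still allowed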
The SQL standard doesn't accept this; you've found a DBMS that doesn't enforce referential integrity. Uninstall it now if you're smart. At a bare minimum, don't use it for production purposes.
Earlier SQL standards (SQL-86) had no referential integrity; SQL-89 level 2 fixed that.
Try adding this declaration:
alter table prova_b add primary key (a,b);
This will forbid NULLs in prova_b. It will also forbid duplicate entries. In Oracle and SQL Server, it will also create an index. This index will speed up lookups and joins, at the cost of slowing down inserts a tiny bit.
Is this what you want to do?
As to why standard SQL lets you do something you consider stupid, that's a philosophical question. Most tools allow some stupid choices. Tools that try to forbid all stupid choices generally end up forbidding some really smart choices unintentionally.