How to add a length constraint to a text field - sql

It seems it is preferable to use the TEXT datatype when using PostgreSQL (or some other databases which support it as well) rather than character varying(NN) because there is no performance penalty, and the maximum possible length can be adjusted by dropping and re-applying constraints without effecting any views etc. which use the field.
But, how is this constraint applied (SQL code)?

When you create the table you can do something of this sort,
CREATE TABLE names (
name text CONSTRAINT namechk CHECK (char_length(name) <= 255)
)
(namechk is just a name for the constraint)
Same goes for ALTER TABLE for example:
ALTER TABLE names
ADD CONSTRAINT namechk CHECK (char_length(name) <= 255);

There are really three things here:
Is it better to use text + a check constraint, or varchar(N)?
How would you write an appropriate check constraint?
Should you name your constraints, or let an automatic name be assigned?
Answers:
A varchar(N) will be more obvious when inspecting the schema, and what developers coming from other DBs will expect to see. However, as you say, it is harder to change later. Bear in mind that applying a new/modified check constraint is not free - all existing rows must be checked against the constraint, so on a large table, a lot of reading is necessary.
The syntax for a check constraint is CONSTRAINT name CHECK (condition) (or just CHECK (condition) and Postgres itself will come up with a name) in a CREATE TABLE statement, and ALTER TABLE table_name ADD CONSTRAINT name CHECK (condition);. condition would be an expression using an appropriate string function, e.g. char_length(foo) <= 255.
Adding a name for a constraint is very useful if you want to manage the constraint later. In particular, since you're using this for flexibility, you may well want to write code to drop and recreate the constraint with a new length. If you only ever use graphical tools, this isn't a problem, but managing multiple servers (e.g. development, testing, and production copies) becomes much easier if you can script your changes. With a named constraint, this would like ALTER TABLE foo DROP CONSTRAINT ck_bar_length; ALTER TABLE foo ADD CONSTRAINT ck_bar_length CHECK ( char_length(bar) <= 100 ); I can't actually think of a disadvantage of naming your constraint.

Related

PostgreSQL - Modify CREATE TABLE syntax in an extension

Is it possible to write an PostgreSQL extension that modifies the DDL syntax?
I'm creating an extension on top of PostgreSQL and PostGIS to support integrity constraints of a specific spatial data model (OMT-G). For that, I want to modify the CREATE TABLE syntax, which accept CONSTRAINTS with this syntax:
CONSTRAINT constraint_name CHECK ( expression )
But I want to create my own syntax, like those in the following example, which will then call functions or triggers I have written already.
CREATE TABLE school_district (
id integer PRIMARY KEY,
school_name varchar(120)
geom geometry,
SPATIAL_CONSTRAINT PLANAR_SUBDIVISION (geom),
SPATIAL_CONSTRAINT CONTAINS school (geom)**
);
Is that possible? If so, how?
As others have commented, it's not possible to alter Postgres grammar via an extension. There's been some discussion related to this on the hacker's mailing list, but no one sees any feasible way to make the Bison grammar extensible.
The other problem with what you're proposing is that CHECK constraints (which it appears is what you're trying to do here) can not safely reference other tables.
It sounds like what you really want here is extensible foreign key support. That's something the community would actually like as well, at least for arrays. The idea is to support something like int[], where each element should be treated as a foreign key reference to another table. What you're describing is similar: instead of a different data type, you want to use a different operator.
For now though, I think the best you could do is to provide your users a function that will put an appropriate column and trigger on the table. The column would be a foreign key to the school table. The trigger would find an appropriate school and populate it's primary key in the new column. (You'd probably want a BEFORE DELETE trigger on the school table to deal with deleting a school too.)
The reason you need the foreign key field is because foreign key triggers operate with different visibility rules than normal queries, so you can't fully simulate them in user space. If you're not quite that paranoid you could just have a single AFTER INSERT trigger that looks for a school and throws an error if it doesn't find one.
As far as any pure-DDL solutions, the most promising method is using CREATE TABLE ... CONSTRAINT ... EXCLUDE, which operates GiST on a single table, and has relatively limited operators that work only on the bounding box (e.g. &&):
CREATE TABLE polygons (
geom geometry(Polygon,4326),
EXCLUDE USING gist (geom WITH &&)
);
INSERT INTO polygons(geom)
VALUES('SRID=4326;POLYGON ((0 0, 0 1, 1 0, 0 0))');
But then, this conflicts (even though the geometries don't actually overlap):
INSERT INTO polygons(geom)
VALUES('SRID=4326;POLYGON ((1 1, 1 2, 2 2, 2 1, 1 1))');
ERROR: conflicting key value violates exclusion constraint "polygons_geom_excl"
DETAIL: Key (geom)=(0103000020E61000000100000005000000000000000000F03F000000000000F03F000000000000F03F0000000000000040000000000000004000000000000000400000000000000040000000000000F03F000000000000F03F000000000000F03F) conflicts with existing key (geom)=(0103000020E61000000100000004000000000000000000000000000000000000000000000000000000000000000000F03F000000000000F03F000000000000000000000000000000000000000000000000).
As mentioned above by #Jim, the best approach to have one table to build a constraint over another table is to make a good trigger function, and use it on both tables. It would normally be written in PL/pgSQL, where you can embed useful messages, such as:
RAISE EXCEPTION 'School % is not located in school district % (it is % away)',
s.name, d.name, ST_Distance(s.geom, d.geom);
This way, if you edit either the school_district or schools table, the trigger would to the check on UPDATE, INSERT, or DELETE to see if the conditions remain valid.

What is the best practice DDL for creating Tables? Single statement with all objects or many individual statements creating and altering?

Is there a best practice in that is closest to one of these examples?
CREATE TABLE TABLE1
(
ID NUMBER(18) CONSTRAINT TABLE1_PK PRIMARY KEY,
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
or
CREATE TABLE TABLE1
(
ID NUMBER(18),
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
ALTER TABLE TABLE1 ADD CONSTRAINT TABLE1_PK
PRIMARY KEY (ID)
USING INDEX (CREATE UNIQUE INDEX IDX_TABLE1_PK ON TABLE1 (ID));
Is either scenario going to result in a better outcome in general? The first option is much more readable, but perhaps there are reasons why the latter is preferable.
Definitely personal preference. I prefer to do as much as I can in the single CREATE TABLE statement simply because I find it more concise. Most everything I need is described right there.
Sometimes that's not possible. Say you have two tables with references to each, or you want to load up a table with a bunch of data first, so you add the additional indexes after the table is loaded.
You'll find many tool that create schemas from DBs will separate them (mostly because it's always correct -- define all the tables, then define all of the relationships).
But personally, if practical, I find having it all in one place is best.
When building a deployment script that is eventually going to be run by someone else later on, I prefer splitting the scripts a fair bit. If something goes wrong, it's a bit easier to tell from the logs what exactly failed.
My table creation script will usually only have NOT NULL constraints. The PK, unique and FK constraints will be added afterwards.
This is a minor point though, and I don't have anything in particular against combining it all in one big CREATE TABLE statement.
You may find that your workplace already has a standard in place. e.g. my current client requires separate scripts for the CREATE TABLE, then more separate scripts for constraints, indexes, etc.
The exception, of course, is index-organized tables which must have a PK constraint declared upfront.
It's a personal preference to define any attributes or defaults for a field in the actual create statement. One thing I noticed is your second statement won't work since you haven't specified the id field is NOT NULL.
I guess it's a personal best practice for readability that I specify the table's primary key upfront.
Another thing to consider when creating the table is how you want items identified, uniquely or composite. ALTER TABLE is good for creating composite keys after the fact.

Where to state a foreign key when creating a table?

I would like to design a table named arguments whose an attribute name is linked to another attribute name in a table called names.
I see two ways to express it in SQL:
by creating a constraint on the table:
CREATE TABLE names ( name text UNIQUE,
short text UNIQUE,
comment text);
CREATE TABLE arguments ( name text UNIQUE,
comment text,
FOREIGN KEY (name) REFERENCES names (name));
by qualifying the attribute on-the-fly:
CREATE TABLE names ( name text UNIQUE,
short text UNIQUE,
comment text);
CREATE TABLE arguments ( name text UNIQUE REFERENCES names (name),
comment text);
I would like to know:
if one of the two is commonly known as better than the other, and
if it can have consequences that I should be aware of.
Thank you for your help.
These are just different syntax for the same end result.
Either is appropriate, but the former is a more common style in my experience. This may simply be to allow the human mind to more easily digest all the information. First describe the data, second describe how it relates to the rest of the world.
One comment I would make though, is that is more common to have IDs as unique identifiers and references. This allows you to change the Value in the Name field without changing it's Identity and breaking Referential Integrity. There are databases that can cascade such changes and update all occurrences of the Name, but in general it's considered cleaner to have Identifiers that are Separate from the Data.
While the first option is known as out-of-line constraint declaration and the second option is in-line, both of them are functionally same.
What would be better is to assign a name to the foreign key constraint. If you have a name, you can selectively enable and disable the constraint if required.
Create table
CREATE TABLE arguments
(
name text UNIQUE,
comment text,
constraint arguments_fk FOREIGN KEY (name) REFERENCES names (name)
);
Disable constraint
ALTER TABLE arguments NOCHECK CONSTRAINT arguments_fk;
Enable constraint
ALTER TABLE arguments CHECK CONSTRAINT arguments_fk;
This is for SQL Server. Oracle has equivalent commands.
Use a foreign key. If you later find you have a performance problem (a measurable problem), then you can change it up to be different.
Keep things simple at first and get the product into the users' hands as fast as possible. Don't optimize things that you can't prove need it.

Named CONSTRAINT benefits

I'm learning SQL and stumbled about CONSTRAINT. I can define them like this:
CREATE TABLE products (
product_no integer,
name text,
price numeric CHECK (price > 0)
);
and like this:
CREATE TABLE products (
product_no integer,
name text,
price numeric CONSTRAINT positive_price CHECK (price > 0)
);
Why do I give them names? or Why I should or should not give them names?
As I can see in this example and most situations in my mind I can't reuse them. So what is the benefit of giving a name to a CONSTRAINT?
There are significant benefits of giving explicit names to your constraints. Just a few examples:
You can drop them by name.
If you use conventions when choosing the
name, then you can collect them
from meta tables and process them
programmatically.
It seems you are using PostgreSQL and there the difference isn't actually that big.
This is because a system generated name in PostgreSQL is actually somewhat meaningful. But "positive_price" is still easier to understand than "foo_price_check":
Think about which error message is better to understand:
new row for relation "foo" violates check constraint "foo_price_check"
or
new row for relation "foo" violates check constraint "positive_price"
In Oracle this is even worse because a system generated does not contain any hint on what was wrong:
ORA-02290: check constraint (SYS_C0024109) violated
vs.
ORA-02290: check constraint (POSITIVE_PRICE) violated
You don't specify RDBMS. the following points apply to SQL Server and I guess quite likely other RDBMSs too.
You need to know the name of the constraint to drop it, also the name of the constraint appears in constraint violation error messages so giving an explicit name can make these more meaningful (SQL Server will auto generate a name for the constraint that tells you nothing about the intent of the constraint).
Constraints as object in SQL, in the same manner that a PK, FK, Table or almost anything else is. If you give your constraint a name, you can easily drop it if required, say for example during some sort of bulk data import. If you don't give it a name, you can still drop it, however you have to find out the auto-genreated name that SQL will give it.
If your constraint is violated, having its name in the error message helps to debug it and present the error message to the user.
Named constraint have scenarios where they are really useful. Here are the ones I have encountered so far:
Schema compare tools
If you ever need to compare environment like DEV and UT to see differences, you may got "false possitives" on table level which is really annoying. The table definition is literally the same, but because autogenerated name is different, it is marked as change.
Yes, some tools allow to skip/ignore constraint name, but not all of them.
Deployment script from state-based migration tool
If you are using tools for state-based migration tool like MS SSDT, and then you manually review generated changes before applying them on PRODUCTION, you want your final script to be as short as possible.
Having a hundred of lines, that do drop constraint and create it with different name is "noise". Some constraint could be disabled from that tool, some are stil regenerated(DEFAULT/CHECK). Having explicit name and a good naming convention solves it for all.
EIBTI
Last, but not least. Being explicit simplifies things. If you have explicit name, you could refer that database object without searching system/metadata views.

What is the purpose of constraint naming

What is the purpose of naming your constraints (unique, primary key, foreign key)?
Say I have a table which is using natural keys as a primary key:
CREATE TABLE Order
(
LoginName VARCHAR(50) NOT NULL,
ProductName VARCHAR(50) NOT NULL,
NumberOrdered INT NOT NULL,
OrderDateTime DATETIME NOT NULL,
PRIMARY KEY(LoginName, OrderDateTime)
);
What benefits (if any) does naming my PK bring?
Eg.
Replace:
PRIMARY KEY(LoginName, OrderDateTime)
With:
CONSTRAINT Order_PK PRIMARY KEY(LoginName, OrderDateTime)
Sorry if my data model is not the best, I'm new to this!
Here's some pretty basic reasons.
(1) If a query (insert, update, delete) violates a constraint, SQL will generate an error message that will contain the constraint name. If the constraint name is clear and descriptive, the error message will be easier to understand; if the constraint name is a random guid-based name, it's a lot less clear. Particulary for end-users, who will (ok, might) phone you up and ask what "FK__B__B_COL1__75435199" means.
(2) If a constraint needs to be modified in the future (yes, it happens), it's very hard to do if you don't know what it's named. (ALTER TABLE MyTable drop CONSTRAINT um...) And if you create more than one instance of the database "from scratch" and use system-generated default names, no two names will ever match.
(3) If the person who gets to support your code (aka a DBA) has to waste a lot of pointless time dealing with case (1) or case (2) at 3am on Sunday, they're quite probably in a position to identify where the code came from and be able to react accordingly.
To identify the constraint in the future (e.g. you want to drop it in the future), it should have a unique name. If you don't specify a name for it, the database engine will probably assign a weird name (e.g. containing random stuff to ensure uniqueness) for you.
It keeps the DBAs happy, so they let your schema definition into the production database.
When your code randomly violates some foreign key constraint, it sure as hell saves time on debugging to figure out which one it was. Naming them greatly simplifies debugging your inserts and your updates.
It helps someone to know quickly what constraints are doing without having to look at the actual constraint, as the name gives you all the info you need.
So, I know if it is a primary key, unique key or default key, as well as the table and possibly columns involved.
By correctly naming all constraints, You can quickly associate a particular constraint with our data model. This gives us two real advantages:
We can quickly identify and fix any errors.
We can reliably modify or drop constraints.
By naming the constraints you can differentiate violations of them. This is not only useful for admins and developers, but your program can also use the constraint names. This is much more robust than trying to parse the error message. By using constraint names your program can react differently depending on which constraint was violated.
Constraint names are also very useful to display appropriate error messages in the user’s language mentioning which field caused a constraint violation instead of just forwarding a cryptic error message from the database server to the user.
See my answer on how to do this with PostgreSQL and Java.
While the OP's example used a permanent table, just remember that named constraints on temp tables behave like named constraints on permanent tables (i.e. you can't have multiple sessions with the exact same code handling the temp table, without it generating an error because the constraints are named the same). Because named constraints must be unique, if you absolutely must name a constraint on a temp table try to do so with some sort of randomized GUID (like SELECT NEWID() ) on the end of it to ensure that it will uniquely-named across sessions.
Another good reason to name constraints is if you are using version control on your database schema. In this case, if you have to drop and re-create a constraint using the default database naming (in my case SQL Server) then you will see differences between your committed version and the working copy because it will have a newly generated name. Giving an explicit name to the constraint will avoid this being flagged as a change.