What is the best practice DDL for creating Tables? Single statement with all objects or many individual statements creating and altering?

What is the best practice DDL for creating Tables? Single statement with all objects or many individual statements creating and altering? - sql

Is there a best practice in that is closest to one of these examples?
CREATE TABLE TABLE1
(
ID NUMBER(18) CONSTRAINT TABLE1_PK PRIMARY KEY,
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
or
CREATE TABLE TABLE1
(
ID NUMBER(18),
NAME VARCHAR2(10) CONSTRAINT NAME_NN NOT NULL
);
ALTER TABLE TABLE1 ADD CONSTRAINT TABLE1_PK
PRIMARY KEY (ID)
USING INDEX (CREATE UNIQUE INDEX IDX_TABLE1_PK ON TABLE1 (ID));
Is either scenario going to result in a better outcome in general? The first option is much more readable, but perhaps there are reasons why the latter is preferable.

Definitely personal preference. I prefer to do as much as I can in the single CREATE TABLE statement simply because I find it more concise. Most everything I need is described right there.
Sometimes that's not possible. Say you have two tables with references to each, or you want to load up a table with a bunch of data first, so you add the additional indexes after the table is loaded.
You'll find many tool that create schemas from DBs will separate them (mostly because it's always correct -- define all the tables, then define all of the relationships).
But personally, if practical, I find having it all in one place is best.

When building a deployment script that is eventually going to be run by someone else later on, I prefer splitting the scripts a fair bit. If something goes wrong, it's a bit easier to tell from the logs what exactly failed.
My table creation script will usually only have NOT NULL constraints. The PK, unique and FK constraints will be added afterwards.
This is a minor point though, and I don't have anything in particular against combining it all in one big CREATE TABLE statement.
You may find that your workplace already has a standard in place. e.g. my current client requires separate scripts for the CREATE TABLE, then more separate scripts for constraints, indexes, etc.
The exception, of course, is index-organized tables which must have a PK constraint declared upfront.

It's a personal preference to define any attributes or defaults for a field in the actual create statement. One thing I noticed is your second statement won't work since you haven't specified the id field is NOT NULL.
I guess it's a personal best practice for readability that I specify the table's primary key upfront.
Another thing to consider when creating the table is how you want items identified, uniquely or composite. ALTER TABLE is good for creating composite keys after the fact.

Related

SQL: lookup for UUID

I have my user table (pseudo sql, because I use an ORM and I must support several different DB types):
id: INTEGER, PK, AUTOINCREMENT
UUID : BINARY(16) (inserted by an update, it's a hash(id) )
I am currently using id for FK in all other tables.
However, in my REST API, I have to serve informations with the UUID, which causes a problem later to query.
Should I:
FK on the UUID instead?
just lookup id(UUID) each time (fast thanks to cache mechanism after a while)?

In general, it is better to use the auto-incremented id for the foreign key reference rather than some other combination of unique columns.
One important reason is that indexes on a single integer are more efficient than indexes on other column types -- if for no other reason than the index being smaller, so it occupies less disk and less memory. Also, there is additional overhead to storing the longer UUID in secondary tables.
This is not the only consideration. Another consideration is that you could change the UUID, if necessary, without changing the foreign key references. For instance, you may wake up one day and say "that id has to start with AAA". You can alter the table and update the table and be done with it -- or you could worry about foreign key references as well. Or, you might add an organization column and decide that the unique key is a combination of the UUID and organization. These operations are much harder/slower if the UUID is being used as a foreign key reference.
When you have composite primary keys (more than one column), using the auto-incremented id is an even better idea. In this case, using the id for joins prevents mistakes where one of the join conditions might be left out.
As you point out, looking up the UUID for a given id should be a fast operation with the correct indexes. There may be some borderline cases where you would not want to have an id, but in general, it is a good idea.

Oracle Database Enforce CHECK on multiple tables

I am trying to enforce a CHECK Constraint in a ORACLE Database on multiple tables
CREATE TABLE RollingStocks (
Id NUMBER,
Name Varchar2(80) NOT NULL,
RollingStockCategoryId NUMBER NOT NULL,
CONSTRAINT Pk_RollingStocks Primary Key (Id),
CONSTRAINT Check_RollingStocks_CategoryId
CHECK ((RollingStockCategoryId IN (SELECT Id FROM FreightWagonTypes))
OR
(RollingStockCategoryId IN (SELECT Id FROM LocomotiveClasses)))
);
...but i get the following error:
*Cause: Subquery is not allowed here in the statement.
*Action: Remove the subquery from the statement.
Can you help me understanding what is the problem or how to achieve the same result?

Check constraints are very limited in Oracle. To do a check like you propose, you'd have to implement a PL/SQL trigger.
My advise would be to avoid triggers altogether. Implement a stored procedure that modifies the database and includes the checks. Stored procedures are easier to maintain, although they are slightly harder to implement. But changing a front end from direct table access to stored procedure access pays back many times in the long run.

What you are trying to is ensure that the values inserted in one table exist in another table i.e. enforce a foreign key. So that would be :
CREATE TABLE RollingStocks (
...
CONSTRAINT Pk_RollingStocks Primary Key (Id),
CONSTRAINT RollingStocks_CategoryId_FK (RollingStockCategoryId )
REFERENCES FreightWagonTypes (ID)
);
Except that you want to enforce a foreign key which references two tables. This cannot be done.
You have a couple of options. One would be to merge FreightWagonTypes and LocomotiveClasses into a single table. If you need separate tables for other parts of your application then you could build a materialized view for the purposes of enforcing the foreign key. Materialized Views are like tables and can be referenced by foreign keys. This option won't work if the key values for the two tables clash.
Another option is to recognise that the presence of two candidate referenced tables suggests that RollingStock maybe needs to be split into two tables - or perhaps three: a super type and two sub-type tables, that is RollingStock and FreightWagons, Locomotives.
By the way, what about PassengerCoaches, GuardsWagons and RestaurantCars?

Oracle doesn't support complex check constraints like that, unfortunately.
In this case, your best option is to change the data model a bit - add a parent table over FreightWagonTypes and LocomotiveClasses, which will hold all the ids from both of these tables. That way you can add a FK to a single table.

What is the purpose of constraint naming

What is the purpose of naming your constraints (unique, primary key, foreign key)?
Say I have a table which is using natural keys as a primary key:
CREATE TABLE Order
(
LoginName VARCHAR(50) NOT NULL,
ProductName VARCHAR(50) NOT NULL,
NumberOrdered INT NOT NULL,
OrderDateTime DATETIME NOT NULL,
PRIMARY KEY(LoginName, OrderDateTime)
);
What benefits (if any) does naming my PK bring?
Eg.
Replace:
PRIMARY KEY(LoginName, OrderDateTime)
With:
CONSTRAINT Order_PK PRIMARY KEY(LoginName, OrderDateTime)
Sorry if my data model is not the best, I'm new to this!

Here's some pretty basic reasons.
(1) If a query (insert, update, delete) violates a constraint, SQL will generate an error message that will contain the constraint name. If the constraint name is clear and descriptive, the error message will be easier to understand; if the constraint name is a random guid-based name, it's a lot less clear. Particulary for end-users, who will (ok, might) phone you up and ask what "FK__B__B_COL1__75435199" means.
(2) If a constraint needs to be modified in the future (yes, it happens), it's very hard to do if you don't know what it's named. (ALTER TABLE MyTable drop CONSTRAINT um...) And if you create more than one instance of the database "from scratch" and use system-generated default names, no two names will ever match.
(3) If the person who gets to support your code (aka a DBA) has to waste a lot of pointless time dealing with case (1) or case (2) at 3am on Sunday, they're quite probably in a position to identify where the code came from and be able to react accordingly.

To identify the constraint in the future (e.g. you want to drop it in the future), it should have a unique name. If you don't specify a name for it, the database engine will probably assign a weird name (e.g. containing random stuff to ensure uniqueness) for you.

It keeps the DBAs happy, so they let your schema definition into the production database.

When your code randomly violates some foreign key constraint, it sure as hell saves time on debugging to figure out which one it was. Naming them greatly simplifies debugging your inserts and your updates.

It helps someone to know quickly what constraints are doing without having to look at the actual constraint, as the name gives you all the info you need.
So, I know if it is a primary key, unique key or default key, as well as the table and possibly columns involved.

By correctly naming all constraints, You can quickly associate a particular constraint with our data model. This gives us two real advantages:
We can quickly identify and fix any errors.
We can reliably modify or drop constraints.

By naming the constraints you can differentiate violations of them. This is not only useful for admins and developers, but your program can also use the constraint names. This is much more robust than trying to parse the error message. By using constraint names your program can react differently depending on which constraint was violated.
Constraint names are also very useful to display appropriate error messages in the user’s language mentioning which field caused a constraint violation instead of just forwarding a cryptic error message from the database server to the user.
See my answer on how to do this with PostgreSQL and Java.

While the OP's example used a permanent table, just remember that named constraints on temp tables behave like named constraints on permanent tables (i.e. you can't have multiple sessions with the exact same code handling the temp table, without it generating an error because the constraints are named the same). Because named constraints must be unique, if you absolutely must name a constraint on a temp table try to do so with some sort of randomized GUID (like SELECT NEWID() ) on the end of it to ensure that it will uniquely-named across sessions.

Another good reason to name constraints is if you are using version control on your database schema. In this case, if you have to drop and re-create a constraint using the default database naming (in my case SQL Server) then you will see differences between your committed version and the working copy because it will have a newly generated name. Giving an explicit name to the constraint will avoid this being flagged as a change.

Why specify primary/foreign key attributes in column names

A couple of recent questions discuss strategies for naming columns, and I was rather surprised to discover the concept of embedding the notion of foreign and primary keys in column names. That is
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo_pk = t2.id_foo_fk
I have to confess I have never worked on any database system that uses this sort of scheme, and I'm wondering what the benefits are. The way I see it, once you've learnt the N principal tables of a system, you'll write several orders of magnitude more requests with those tables.
To become productive in development, you'll need to learn which tables are the important tables, and which are simple tributaries. You'll want to commit an good number of column names to memory. And one of the basic tasks is to join two tables together. To reduce the learning effort, the easiest thing to do is to ensure that the column name is the same in both tables:
select t1.col_a, t1.col_b, t2.col_z
from t1 inner join t2 on t1.id_foo = t2.id_foo
I posit that, as a developer, you don't need to be reminded that much about which columns are primary keys, which are foreign and which are nothing. It's easy enough to look at the schema if you're curious. When looking at a random
tx inner join ty on tx.id_bar = ty.id_bar
... is it all that important to know which one is the foreign key? Foreign keys are important only to the database engine itself, to allow it to ensure referential integrity and do the right thing during updates and deletes.
What problem is being solved here? (I know this is an invitation to discuss, and feel free to do so. But at the same time, I am looking for an answer, in that I may be genuinely missing something).

I agree with you. Putting this information in the column name smacks of the crappy Hungarian Notation idiocy of the early Windows days.

I agree with you that the foreign key column in a child table should have the same name as the primary key column in the parent table. Note that this permits syntax like the following:
SELECT * FROM foo JOIN bar USING (foo_id);
The USING keyword assumes that a column exists by the same name in both tables, and that you want an equi-join. It's nice to have this available as shorthand for the more verbose:
SELECT * FROM foo JOIN bar ON (foo.foo_id = bar.foo_id);
Note, however, there are cases when you can't name the foreign key the same as the primary key it references. For example, in a table that has a self-reference:
CREATE TABLE Employees (
emp_id INT PRIMARY KEY,
manager_id INT REFERENCES Employees(emp_id)
);
Also a table may have multiple foreign keys to the same parent table. It's useful to use the name of the column to describe the nature of the relationship:
CREATE TABLE Bugs (
...
reported_by INT REFERENCES Accounts(account_id),
assigned_to INT REFERENCES Accounts(account_id),
...
);
I don't like to include the name of the table in the column name. I also eschew the obligatory "id" as the name of the primary key column in every table.

I've espoused most of the ideas proposed here over the 20-ish years I've been developing with SQL databases, I'm embarrassed to say. Most of them delivered few or none of the expected benefits and were with hindsight, a pain in the neck.
Any time I've spent more than a few hours with a schema I've fairly rapidly become familiar with the most important tables and their columns. Once it got to a few months, I'd pretty much have the whole thing in my head.
Who is all this explanation for? Someone who only spends a few minutes with the design isn't going to be doing anything serious anyway. Someone who plans to work with it for a long time will learn it if you named your columns in Sanskrit.
Ignoring compound primary keys, I don't see why something as simple as "id" won't suffice for a primary key, and "_id" for foreign keys.
So a typical join condition becomes customer.id = order.customer_id.
Where more than one association between two tables exists, I'd be inclined to use the association rather than the table name, so perhaps "parent_id" and "child_id" rather than "parent_person_id" etc

I only use the tablename with an Id suffix for the primary key, e.g. CustomerId, and foreign keys referencing that from other tables would also be called CustomerId. When you reference in the application it becomes obvious the table from the object properties, e.g. Customer.TelephoneNumber, Customer.CustomerId, etc.

I used "fk_" on the front end of any foreign keys for a table mostly because it helped me keep it straight when developing the DB for a project at my shop. Having not done any DB work in the past, this did help me. In hindsight, perhaps I didn't need to do that but it was three characters tacked onto some column names so I didn't sweat it.
Being a newcomer to writing DB apps, I may have made a few decisions which would make a seasoned DB developer shudder, but I'm not sure the foreign key thing really is that big a deal. Again, I guess it is a difference in viewpoint on this issue and I'll certainly take what you've written and cogitate on it.
Have a good one!

I agree with you--I take a different approach that I have seen recommended in many corporate environments:
Name columns in the format TableNameFieldName, so if I had a Customer table and UserName was one of my fields, the field would be called CustomerUserName. That means that if I had another table called Invoice, and the customer's user name was a foreign key, I would call it InvoiceCustomerUserName, and when I referenced it, I would call it Invoice.CustomerUserName, which immediately tells me which table it's in.
Also, this naming helps you to keep track of the tables your columns are coming from when you're joiining.
I only use FK_ and PK_ in the ACTUAL names of the foreign and primary keys in the DBMS.

Should I use an ENUM for primary and foreign keys?

An associate has created a schema that uses an ENUM() column for the primary key on a lookup table. The table turns a product code "FB" into it's name "Foo Bar".
This primary key is then used as a foreign key elsewhere. And at the moment, the FK is also an ENUM().
I think this is not a good idea. This means that to join these two tables, we end up with four lookups. The two tables, plus the two ENUM(). Am I correct?
I'd prefer to have the FKs be CHAR(2) to reduce the lookups. I'd also prefer that the PKs were also CHAR(2) to reduce it completely.
The benefit of the ENUM()s is to get constraints on the values. I wish there was something like: CHAR(2) ALLOW('FB', 'AB', 'CD') that we could use for both the PK and FK columns.
What is: Best PracticeYour preference
This concept is used elsewhere too. What if the ENUM()'s values are longer? ENUM('Ding, dong, dell', 'Baa baa black sheep'). Now the ENUM() is useful from a space point-of-view. Should I only care about this if there are several million rows using the values? In which case, the ENUM() saves storage space.

ENUM should be used to define a possible range of values for a given field. This also implies that you may have multiple rows which have the same value for this perticular field.
I would not recommend using an ENUM for a primary key type of foreign key type.
Using an ENUM for a primary key means that adding a new key would involve modifying the table since the ENUM has to be modified before you can insert a new key.
I am guessing that your associate is trying to limit who can insert a new row and that number of rows is limited. I think that this should be achieved through proper permission settings either at the database level or at the application and not through using an ENUM for the primary key.
IMHO, using an ENUM for the primary key type violates the KISS principle.

but when you only trapped with differently 10 or less rows that wont be a problem
e.g's
CREATE TABLE `grade`(
`grade` ENUM('A','B','C','D','E','F') PRIMARY KEY,
`description` VARCHAR(50) NOT NULL
)
This table it is more than diffecult to get a DML

We've had more discussion about it and here's what we've come up with:
Use CHAR(2) everywhere. For both the PK and FK. Then use mysql's foreign key constraints to disallow creating an FK to a row that doesn't exist in the lookup table.
That way, given the lookup table is L, and two referring tables X and Y, we can join X to Y without any looking up of ENUM()s or table L and can know with certainty that there's a row in L if (when) we need it.
I'm still interested in comments and other thoughts.

Having a lookup table and a enum means you are changing values in two places all the time. Funny... We spent to many years using enums causing issues where we need to recompile to add values. In recent years, we have moved away from enums in many situations an using the values in our lookup tables. The biggest value I like about lookup tables is that you add or change values without needing to compile. Even with millions of rows I would stick to the lookup tables and just be intelligent in your database design

We Keep Coding

sql objective-c vba vb.net react-native apache vue.js tensorflow api pandas