Why do we use default along with not null in database columns? - sql

Isn't it extra overhead?
When we specify default x (for example, alter table users add column id default 0), it is not going to allow null at the database level. So why use not null along with default for a column?

A default clause is only applied when you don't reference the column when inserting. Explicitly inserting (or updating) null into a column will still store null. Using a not null constraint prevents that.
So both clauses serve different purposes, and there is no overlap.
The SQL standard allows you to use DEFAULT instead of a value to explicitly assign the default value in an insert or update. Be aware though that not all DBMSes support this.
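To make the difference concrete, here is a minimal sketch that runs the SQL through Python's built-in sqlite3 module. SQLite is only a stand-in for whatever DBMS you use, and the users table and its flag_loose / flag_strict columns are made up for illustration.

```python
import sqlite3

def default_vs_not_null():
    """Show that DEFAULT alone still admits an explicit NULL."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table: one column with only DEFAULT,
    # one with DEFAULT plus NOT NULL.
    conn.execute(
        "CREATE TABLE users ("
        " name TEXT,"
        " flag_loose INTEGER DEFAULT 0,"
        " flag_strict INTEGER NOT NULL DEFAULT 0)"
    )
    # Columns omitted: both defaults are applied.
    conn.execute("INSERT INTO users (name) VALUES ('a')")
    # Explicit NULL: accepted by the DEFAULT-only column.
    conn.execute("INSERT INTO users (name, flag_loose) VALUES ('b', NULL)")
    rows = conn.execute(
        "SELECT name, flag_loose, flag_strict FROM users ORDER BY name"
    ).fetchall()
    # Explicit NULL into the NOT NULL column is rejected.
    try:
        conn.execute("INSERT INTO users (name, flag_strict) VALUES ('c', NULL)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True
    conn.close()
    return rows, rejected

rows, rejected = default_vs_not_null()
print(rows)      # [('a', 0, 0), ('b', None, 0)]
print(rejected)  # True
```

Row 'b' ends up with NULL in the DEFAULT-only column, while the same explicit NULL is refused by the column that also has NOT NULL.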

A DEFAULT value is useful when you don't want to always specify a value (especially when you have tables with 10+ columns, and a successful INSERT only requires a couple of columns' worth of data). If you had a cars table with a has_steering_wheel column (which is t in most cases), you could rely on the DEFAULT instead of specifying the column every time you wanted to do an INSERT. Basically, it saves you some keystrokes.
A NOT NULL constraint is useful when you need to require a value, and there should be no exceptions. A typical example would be a color on a cars table.
To combine both DEFAULT and NOT NULL would be to require a value, and that value doesn't usually deviate from the standard. An example would be a has_power_locks column on a cars table (typically the answer is t, and sometimes it could be f, but it should never be NULL -- unless your application is coded to handle NULL with either t or f).

Related

Is there any difference between inline and out-of-line constraint in sql?

Which one should I use, or which one is better? Is there any difference?
searchtype_id bigint NOT NULL
or
CONSTRAINT searchtype_id_nn CHECK ((searchtype_id IS NOT NULL))
Is there a difference? Yes. NOT NULL is part of the storage definition of the type of the column. So, NOT NULL can affect how the value is stored (is a NULL-flag required?). A NOT NULL definition can also be used for optimizations during the compilation phase of a query.
By contrast, a CHECK constraint does validate that the data meets certain characteristics, but it is less likely that this information will be used in the compilation phase.
The NOT NULL definition predates the CHECK constraint and is standard across all databases.
NULL-ability is something that I think of as part of the type -- because it is a declaration built into the language that says "this column is required to have a value". An integer column that can take on NULL values is subtly different from an integer that cannot.
I would recommend using the NOT NULL syntax rather than a CHECK constraint. It gives the database more information about the column.
The two are different, and it's hard to choose between them because some things apply to one but not the other: a NOT NULL constraint can only be declared inline, while a CHECK constraint can also be declared out-of-line.
If I had to choose one, I'd choose out-of-line, because:
I can name each constraint individually, which helps with debugging when an error is raised.
A CHECK constraint that refers to multiple columns can only be declared out-of-line, so the out-of-line form covers both single-column and multi-column checks.
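For what it's worth, both spellings from the question reject a NULL. A rough sketch using Python's sqlite3 module as a stand-in engine (the table names inline_nn and out_of_line are invented here; the storage and optimizer differences described above are engine-specific and are not shown by this):

```python
import sqlite3

def both_forms_reject_null():
    """Insert NULL into an inline NOT NULL column and a named CHECK column."""
    conn = sqlite3.connect(":memory:")
    # Inline form from the question.
    conn.execute("CREATE TABLE inline_nn (searchtype_id BIGINT NOT NULL)")
    # Out-of-line, named CHECK form from the question.
    conn.execute(
        "CREATE TABLE out_of_line ("
        " searchtype_id BIGINT,"
        " CONSTRAINT searchtype_id_nn CHECK (searchtype_id IS NOT NULL))"
    )
    results = {}
    for table in ("inline_nn", "out_of_line"):
        try:
            conn.execute(f"INSERT INTO {table} (searchtype_id) VALUES (NULL)")
            results[table] = "accepted"
        except sqlite3.IntegrityError as exc:
            # The message wording differs between engines and versions.
            results[table] = f"rejected: {exc}"
    conn.close()
    return results

print(both_forms_reject_null())
```

The error text you get back is where the constraint name can help with debugging, though how much of it is reported depends on the engine and version.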

NULL vs NOT NULL Performance differences

Recently I was looking at a SQL table we have and noticed the following.
[FooColumn] CHAR (1) DEFAULT ('N') NOT NULL,
Above you can see that FooColumn would always default to 'N' but still has "NOT NULL" specified.
Would there be some Storage/Performance differences in setting a column to "NOT NULL" instead of "NULL" ?
How would SQL Server treat a "NOT NULL" column differently from a "NULL" column?
NOTE: This is ONLY for SQL and not the overhead of externally doing NULL checks
You should only use NOT NULL when you have a reason (i.e. a required field for the UI or for backend relationships). The NOT NULL vs NULL performance difference is negligible and, as per this article from 2016 (SQL SERVER), performance shouldn't be a consideration when deciding between NOT NULL and NULL.
Even though that field will default to 'N', a command could still set it to NULL if nulls were allowed. It comes down to whether NULL is a valid piece of data for that column.
EDIT
In a data-driven technical application, in my experience these are some guidelines we use:
for numeric fields, NULL is unknown to the user, and all numbers have meaning.
for string fields, NULL and "" are identical to the user, so it depends on your backend application.
I know that your question was excluding ISNULL checks but if you are doing a lot of them then it might be a code smell that those fields should be NOT NULL if possible since they can get expensive.
It's a complicated "debate".
NULL means unknown. It's different from 0 or empty string.
NOT NULL means you NEED to insert a value in there, always, even if it's a blank string or a 0. Many designers argue that it's better design. Others see no issues with having NULL values. Different software houses will enforce different rules.
Having a "default" value simply means that when you create new records without specifying a value, it will use the default value instead. This is regardless of whether the field is NULL or NOT NULL.
Having NULL values MAY have an impact on performance (as the DBMS needs to deal with this special case). It will depend on which DBMS you are using, which version, which config, etc. You need to benchmark with your own setup to see what's what.
Here's a good article: http://www.itprotoday.com/microsoft-sql-server/designing-performance-null-or-not-null
As the question asked is "NULL vs NOT NULL Performance differences", the answer must be based on the storage structure of the row and on how the row is treated differently when it contains a NULL.
The answer is: there is no difference.
Here are articles discussing row structure in SQL Server:
https://www.red-gate.com/simple-talk/sql/database-administration/sql-server-storage-internals-101/
https://aboutsqlserver.com/2013/10/15/sql-server-storage-engine-data-pages-and-data-rows/
Here the column is defined as CHAR(1) so it is a fixed size column.
The difference between an empty string '' and NULL is recorded in the row structure information. Storing NULL instead of an empty string saves no structural space, and the structural information does not change depending on the definition of the constraint.
If you are looking for performance in the relation of data structure then you need to look elsewhere.
IMHO :
A column defined as CHAR(1) often contains coded information with few distinct values.
It is also common that this kind of column points to a "translation" table through FK.
So, if it is a "2-state indicator value" then the BIT type can be used knowing that all columns of this type are grouped together in the same byte.
If more distinct values are needed, then the TINYINT type also occupies 1 fixed-size byte but does not involve the collation when values are compared. (Note: TINYINT offers more values than CHAR(1).)
Elsewhere, if you don't have a FK constraint yet, this must be balanced.
[FooColumn] CHAR (1) DEFAULT ('N') NOT NULL,
It is far better than NCHAR(1), VARCHAR(1) or NVARCHAR(1) !
(For MySQL check FooColumn CHARACTER SET)
But, depending on your RDBMS and existing development, investigate whether you can use BIT or TINYINT (no collation).
The extra cost of the needed test to check 'NOT NULL' compared to none for 'NULL' is very, very minimal.

What's the appropriate table structure for storing dynamic fields?

I like the entity-attribute-value thing because I can add new fields and have the rows automatically removed when the foreign table row is removed, but I don't like the fact that I can't enforce a data type. And the select queries are complicated.
Are there better ways that don't involve creating a table for each attribute?
If I create a very big table with every possible attribute, will this table take up space even if most rows will have NULL on most columns?
You can enforce data types in an EAV model by using multiple value fields. This gets a bit tricky because you need another column to specify the type and then additional constraints to specify that only one value is filled and that it matches the type.
In most databases, you can handle this using check constraints.
In addition, you can use just a single string value and then enforce contents of the string using check constraints. This is often sufficient. Such constraints make good use of regular expressions in databases that support them.
As for your second question. Each row is going to occupy space for the entity/attribute columns. Whether or not the NULL value occupies any space depends on the database, but this space would typically be small.
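The multiple-value-column idea above could look roughly like this. It is only a sketch run through Python's sqlite3 module; the entity_attribute table, its column names, and the two supported types ('int' and 'text') are all assumptions made for the example, and typeof() is SQLite-specific, so another DBMS would need its own way to test the stored type.

```python
import sqlite3

def build_eav():
    """Create a hypothetical EAV table with one value column per type."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE entity_attribute (
            entity_id  INTEGER NOT NULL,
            attribute  TEXT    NOT NULL,
            value_type TEXT    NOT NULL CHECK (value_type IN ('int', 'text')),
            int_value  INTEGER,
            text_value TEXT,
            PRIMARY KEY (entity_id, attribute),
            -- exactly one value column is filled, and it matches value_type
            CHECK (
                (value_type = 'int'  AND typeof(int_value)  = 'integer'
                                     AND text_value IS NULL)
             OR (value_type = 'text' AND typeof(text_value) = 'text'
                                     AND int_value IS NULL)
            )
        )
        """
    )
    return conn

def try_insert(conn, row):
    """Return True if the row satisfies the constraints, False otherwise."""
    try:
        conn.execute("INSERT INTO entity_attribute VALUES (?, ?, ?, ?, ?)", row)
        return True
    except sqlite3.IntegrityError:
        return False

conn = build_eav()
print(try_insert(conn, (1, "doors", "int", 4, None)))       # True
print(try_insert(conn, (1, "color", "text", None, "red")))  # True
print(try_insert(conn, (2, "doors", "int", None, "four")))  # False
print(try_insert(conn, (2, "color", "text", 5, "red")))     # False
```

The type column plus the table-level CHECK is the "additional constraints" part: it costs an extra column per supported type, which is the trade-off for getting typed values back out.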

Unique Constraint column can only contain one NULL value

A Unique Constraint can be created upon a column that can contain NULLs. However, at most, only a single row may ever contain a NULL in that column.
I do not understand why this is the case since, by definition, a NULL is not equal to another NULL (since NULL is really an unknown value and one unknown value does not equal another unknown value).
My questions:
1. Why is this so?
2. Is this specific to MsSQL?
I have a hunch that it is because a Unique Constraint can act as a reference field for a Foreign Key, and the FK would otherwise not know which record in the referenced table it was referring to if more than one record with NULL existed. But it is just a hunch.
(Yes, I understand that UCs can be across multiple columns, but that doesn't change the question; rather it just complicates it a bit.)
Yes, it's "specific" to Microsoft SQL Server (in that some other database systems have the opposite approach, the one you expected - and the one defined in the ANSI standard, but I believe there are other database systems that are the same as SQL Server).
If you're working on a version of SQL Server that supports filtered indexes, you can apply one of those:
CREATE UNIQUE INDEX IX_T ON [Table] ([Column]) WHERE [Column] IS NOT NULL
(But note that this index cannot be the target of an FK constraint)
The "Why" of it really just comes down to, that's how it was implemented long ago (possibly pre-standards) and it's one of those awkward situations where to change it now could potentially break a lot of existing systems.
Re: Foreign Keys - you would be correct, if it wasn't for the fact that a NULL value in a foreign key column causes the foreign key not to be checked - there's no way (in SQL Server) to use NULL as an actual key.
Yes it's a SQL Server feature (and a feature of a few other DBMSs) that is contrary to the ISO SQL standard. It perhaps doesn't make much sense given the logic applied to nulls in other places in SQL - but then the ISO SQL Standard isn't very consistent about its treatment of nulls either. The behaviour of nullable uniqueness constraints in Standard SQL is not very helpful. Such constraints aren't necessarily "unique" at all because they permit duplicate rows. E.g., the constraint UNIQUE(foo,bar) permits the following rows to exist simultaneously in a table:
foo bar
------ ------
999 NULL
999 NULL
(!)
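You can see the standard-style behaviour in an engine that treats NULLs as distinct for uniqueness purposes. A small sketch using Python's sqlite3 module (SQLite behaves this way; SQL Server, as discussed above, would reject the second row; the table name t is just a placeholder):

```python
import sqlite3

def duplicate_null_rows():
    """Insert the (999, NULL) pair twice under UNIQUE (foo, bar)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t (foo INTEGER, bar INTEGER, UNIQUE (foo, bar))")
    conn.execute("INSERT INTO t VALUES (999, NULL)")
    # Accepted here: the NULLs are treated as distinct from each other.
    conn.execute("INSERT INTO t VALUES (999, NULL)")
    conn.execute("INSERT INTO t VALUES (999, 1)")
    try:
        # A duplicate with no NULLs in it is still rejected.
        conn.execute("INSERT INTO t VALUES (999, 1)")
        dup_rejected = False
    except sqlite3.IntegrityError:
        dup_rejected = True
    null_rows = conn.execute(
        "SELECT COUNT(*) FROM t WHERE bar IS NULL"
    ).fetchone()[0]
    conn.close()
    return null_rows, dup_rejected

print(duplicate_null_rows())  # (2, True)
```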
Avoid nullable uniqueness constraints. It's usually straightforward to move the columns to a new table as non-nullable columns and put the uniqueness constraint there. The information that would have been represented by populating those columns with nulls can (presumably) be represented by simply not populating those columns in the new table at all.

How liberal should I be with NOT NULL columns?

I'm designing a database schema, and I'm wondering what criteria I should use for deciding whether each column should be nullable or not.
Should I mark as NOT NULL only those columns that absolutely must be filled out for a row to make any sense at all to my application?
Or should I mark all columns that I intend to never be null?
What are the performance implications of small vs large numbers of NOT NULL columns?
I assume lots of NOT NULL columns would slow down inserts a bit, but it might actually speed up selects, since the query execution plan generator has more information about the columns..
Can someone with more knowledge than me give me the low-down?
Honestly, I've always thought NOT NULL should be the default. NULL is the odd special case, and you should make a case for it whenever you use it. Plus it's much easier to change a column from NOT NULL to nullable than it is to go the other way.
There are no significant performance consequences. Don't even think about considering this as an issue. To do so is a huge early optimization antipattern.
"Should I only mark as NOT NULL only those columns that absolutely must be filled out for a row to make any sense at all to my application?"
Yes. It's as simple as that. You're a lot better off with a NULLable column without any NULL values in it, than with the need for NULLs and having to fake it. And anyway, any ambiguous cases are better filtered out in your Business Rules.
EDIT:
There's another argument for nullable fields that I think is ultimately the most compelling, which is the Use Case argument. We've all been subject to data entry forms that require values for some fields; and we've all abandoned forms where we had no sensible values for required fields. Ultimately, the application, the form, and the database design are only defensible if they reflect the user requirements; and it's clear that there are many, many database columns for which users can present no value - sometimes at given points in the business process, sometimes ever.
Err on the side of NOT NULL. You will, at some point, have to decide what NULL "means" in your application - more than likely, it will be different things for different columns. Some of the common cases are "not specified", "unknown", "inapplicable", "hasn't happened yet", etc. You will know when you need one of those values, and then you can appropriately allow a NULLable column and code the logic around it.
Allowing random things to be NULL is, sooner or later, always a nightmare IME. Use NULL carefully and sparingly - and know what it means in your logic.
Edit: There seems to be an idea that I'm arguing for NO null columns, ever. That's ridiculous. NULL is useful, but only where it's expected.
Le Dorfier's DateOfDeath example is a good example. A NULL DateOfDeath would indicate "not happened yet". Now, I can write a view LivingPersons WHERE DateOfDeath IS NULL.
But, what does a NULL OrderDate mean? That the order wasn't placed yet? Even though there's a record in the Order table? How about a NULL address? Those are the thoughts that should go through your head before you let NULL be a value.
Back to DateOfDeath - a query of persons WHERE DateOfDeath > '1/1/1999' would not return the NULL records - even though we logically know they must die after 1999. Is that what you want? If not, then you better include OR DateOfDeath IS NULL in that query. If you allow all columns to be NULL, you have to think about that every single time you write a query. IME, that's too much of a mental tax for the 10% or so of columns that actually have legit meaning when they're NULL.
I have found marking a column as NOT NULL is usually a good idea unless you have a useful meaning for NULL in the column. Otherwise you may unexpectedly find NULL in there later when you realise you don't want it, and changing is harder.
I try to avoid using NULL's in the database as much as possible. This means that character fields are always not null. Same for numeric fields, especially anything representing money or similar (shares, units, etc).
I have 2 exceptions:
Dates where the date might not be known (eg. DivorcedOn)
Optional foreign key relationships (MarriedToPersonId). Though on occasion I have used "blank" rows in the foreign key table and made the relationship mandatory (eg. JobDescriptionCode)
I have also on occasion used explicit bit fields for "unknown"/"not set" (eg. JobDescriptionCode and IsEmployeed).
I have a few core reasons why:
NULLs will always cause problems in numeric fields. Always. Always. Always. Doesn't matter how careful you are; at some point select X + Y as Total is going to happen and it will return NULL.
NULLs can easily cause problems in string fields, typically address fields (eg. select AddrLine1 + AddrLine2 from Addresses).
Guarding against NULLs in the business logic tier is a tedious waste of effort... just don't let them in the DB and you can save 100's of lines of code.
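The X + Y case from the list above is easy to reproduce. A minimal sketch via Python's sqlite3 module, with a made-up amounts table; the COALESCE version is the kind of guard you end up writing everywhere if the columns stay nullable:

```python
import sqlite3

def totals():
    """Compare a plain sum with a COALESCE-guarded sum over nullable columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE amounts (x INTEGER, y INTEGER)")
    conn.executemany("INSERT INTO amounts VALUES (?, ?)", [(1, 2), (5, None)])
    # Any arithmetic with NULL yields NULL.
    naive = [r[0] for r in conn.execute(
        "SELECT x + y FROM amounts ORDER BY x")]
    # Guarded version: treat a missing value as 0 (an application decision).
    guarded = [r[0] for r in conn.execute(
        "SELECT COALESCE(x, 0) + COALESCE(y, 0) FROM amounts ORDER BY x")]
    conn.close()
    return naive, guarded

print(totals())  # ([3, None], [3, 5])
```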
My preferred defaults:
Strings -> "", aka an empty string
Numbers -> 0
Dates -> Today or NULL (see exception #1)
Bit -> false
You may find Chris Date's Database In Depth a useful resource for these kinds of questions. You can get a taste for his ideas in this interview, where he says among other things:
So yes, I do think SQL is pretty bad. But you explicitly ask what its major flaws are. Well, here are a few:
Duplicate rows
Nulls
Left-to-right column ordering
Unnamed columns and duplicate column names
Failure to support "=" properly
Pointers
High redundancy
In my own experience, nearly all "planned nulls" can be represented better with a child table that has a foreign key to a base table. Participating in the child table is optional, and that's where the null/not null distinction is actually made.
This maps well to the interpretation of a relation as a first-order logic proposition. It also is just common sense. When one does not know Bob's address, does one write in one's Rolodex:
Bob. ____
Or does one merely refrain from filling out an address card for Bob until one has an actual address for him?
Edit: Date's argument appears on pages 53-55 of Database In Depth, under the section heading "Why Nulls are Prohibited."
I lean toward NOT NULL unless I see a reason otherwise -- like someone else said, like it or not, NULL is the weird special case.
One of my favorites in regards to NULL is:
SELECT F1 FROM T WHERE F2 <> 'OK'
...which (in DB2 at least) won't include any rows where f2 is null -- because in relational jargon, (NULL <> 'OK') IS NULL. But your intent was to return all not-OK rows. You need an extra OR predicate, or write F2 IS DISTINCT FROM 'OK' instead (which is special case coding in the first place).
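Here is that query run against three rows, as a sketch using Python's sqlite3 module rather than DB2 (the sample data is invented). Only the extra OR predicate is shown, since that form works on pretty much any engine:

```python
import sqlite3

def not_ok_rows():
    """Run the <> 'OK' filter with and without the extra NULL predicate."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE T (F1 INTEGER, F2 TEXT)")
    conn.executemany(
        "INSERT INTO T VALUES (?, ?)",
        [(1, "OK"), (2, "FAIL"), (3, None)],
    )
    # NULL <> 'OK' evaluates to NULL, so row 3 is filtered out.
    plain = [r[0] for r in conn.execute(
        "SELECT F1 FROM T WHERE F2 <> 'OK' ORDER BY F1")]
    # The extra OR predicate brings the NULL row back.
    with_or = [r[0] for r in conn.execute(
        "SELECT F1 FROM T WHERE F2 <> 'OK' OR F2 IS NULL ORDER BY F1")]
    conn.close()
    return plain, with_or

print(not_ok_rows())  # ([2], [2, 3])
```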
IMO, NULL is just one of those programmer's tools, like pointer arithmetic or operator overloading, that requires as much art as science.
Joe Celko writes about this in SQL For Smarties -- the trap of using NULL in an application is that its meaning is, well, undefined. It could mean unknown, uninitialized, incomplete, not applicable -- or as in the dumb example above, does it mean OK or not-OK?
Thanks for all the great answers, guys. You gave me a lot to think about, and helped me form my own opinion/strategy, which boils down to this:
Allow nulls if-and-only-if a null in that column would have a specific meaning to your application.
A couple of common meanings for null:
Anything that comes directly from the user
Here null means "user did not enter"
For these columns, it's better to allow nulls, or you'll just get asdasd#asd.com type input anyway.
Foreign keys for "0 or 1" relationships
null means "no related row"
So allow nulls for these columns
This one is controversial, but this is my opinion.
In general, if you cannot think of a useful meaning for null in a column, it should be NOT NULL. You can always change it to nullable later.
Example of the sort of thing I ended up with:
create table SalesOrderLine (
Id int identity primary key,
-- a line must have exactly one header:
IdHeader int not null foreign key references SalesOrderHeader,
LineNumber int not null, -- a line must have a line number
IdItem int not null, -- cannot have null item
Quantity decimal not null, -- maybe could sell 0, but not null
UnitPrice decimal not null, -- price can be 0, but not null
-- a null delivery address means not for delivery:
IdDeliveryAddress int foreign key references Address,
Comment varchar(100), -- null means user skipped it
Cancelled bit not null default (0), -- true boolean, not three-state!
Delivered datetime, -- null means not yet delivered
Logged datetime not null default (GetDate()) -- must be filled out
)
I would tend to agree with dorfier.
Be serious in your application about being flexible when receiving database NULL values and treating them as empty values, and you give yourself a lot of flexibility to let NULL's get inserted for values you don't specify.
There's probably a lot of cases where you need some very serious data integrity (and/or the intense speed optimization of disallowing NULL fields) but I think that these concerns are tempered against the extra effort it takes to make sure every field has a default value and/or gets set to a sensible value.
Stick with NOT NULL on everything until someone squeaks with pain about it. Then remove it on one column at a time, as reluctantly as possible. Avoid nulls in your DB as much as you can, for as long as you can.
Personally I think you should mark the columns as null or not null based on what kind of data they contain, whether there is a genuine requirement for the data to always be there, and whether the data is always known at the time of input. Marking a column as not null when the users don't have the data will force them to make up the data, which makes all your data useless (this is how you end up with junk data such as an email field containing "thisissilly#Ihatethisaplication.com"). Failing to require something that must be there for the process to work (say, the key field to show which customer made the order) is equally stupid. Null versus not null is a data integrity issue at heart; do what makes the most sense towards keeping your data usable.
If you can think long term, having NULLs in a column affects how you can design your queries. Whether you use CASE statements, COALESCE, or have to explicitly test for NULL values can make the decision for you.
From a performance standpoint, it's faster to not have to worry about NULLS. From a design standpoint, using NULL is an easy way to know that an item has never been filled in. Useful examples include "UpdatedDateTime" columns. NULL means an item has never been updated.
Personally I allow NULLs in most situations.
What are the performance implications of small vs large numbers of NOT NULL columns?
This may be stating the obvious, but, when a column is nullable, each record will require 1 extra bit of storage. So a BIT column will consume 100% more storage when it is nullable, while a UNIQUEIDENTIFIER will consume only 0.8% more storage when it is nullable.
In the pathological case, if your database has a single table consisting of a single BIT column, the decision to make that column nullable would cut your database's performance in half. However, under the vast majority of real world scenarios, nullability will not have a measurable performance impact.
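The percentages above are just one extra bit divided by the column width. A quick check in Python, taking the one-bit-per-nullable-column premise at face value (real row formats are more involved, and the function name here is made up):

```python
def nullable_overhead_percent(column_bits, null_flag_bits=1):
    """Extra storage for the null flag, as a percentage of the column width."""
    return 100.0 * null_flag_bits / column_bits

bit_overhead = nullable_overhead_percent(1)     # BIT column: 1 bit wide
guid_overhead = nullable_overhead_percent(128)  # UNIQUEIDENTIFIER: 16 bytes

print(bit_overhead)   # 100.0
print(guid_overhead)  # 0.78125
```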
Using 'Not Null' or 'Null' should be primarily driven by your particular persistence requirements.
Having a value being Nullable means there are two or three states (three states with Bit fields)
For instance; if I had a bit field which was called 'IsApproved' and the value is set at a later stage than insertion. Then there are three states:
'IsApproved' Not answered
'IsApproved' Is Approved
'IsApproved' Is Not Approved
So if a field can legitimately be considered Not Answered and no default value is suitable, it should be considered for being nullable.
Any nullable column is a violation of third normal form.
But, that's not an answer.
Maybe this is: there are two types of columns in databases - ones that hold the structure of the data, and ones that hold the content of the data. Keys are structure, user-enterable fields are data. Other things - well - it's a judgment call.
Stuff that's structure, that is used in join clauses, is typically not null. Stuff that's data is typically nullable.
When you have a column that holds one of a list of choices or null (no choice made), it is usually a good idea to have a specific value for "no choice made" rather than a nullable column. These types of columns often participate in joins.