MySQL "NULL" questions - sql

I have a table with several columns.
Sometimes some of these column fields may be empty (i.e. I won't use them in some cases).
My questions:
Would it be smart to set them to NULL in phpMyAdmin?
What does the "NULL" property actually do?
Would I gain anything at all by setting them to NULL?
Is it possible to use a NULL field the same way even though it is set to null?

The concept of the NULL value is a common source of confusion for newcomers to SQL, who often think that NULL is the same as an empty string '', or a value of zero.
This is not the case. Conceptually, NULL means "a missing unknown value" and it is treated somewhat differently from other values. For example, to test for NULL, you cannot use the arithmetic comparison operators such as =, <, or <>.
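A quick way to see this is to run both forms of the test. The sketch below uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption, so it runs without a server; SQLite applies the same rule here), and the items table is hypothetical.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, note TEXT)")
conn.executemany("INSERT INTO items (note) VALUES (?)", [("a",), (None,), ("",)])

# "note = NULL" evaluates to NULL (unknown) for every row, so no row matches.
eq_rows = conn.execute("SELECT id FROM items WHERE note = NULL").fetchall()

# "IS NULL" is the correct test and finds only the row whose note is NULL.
is_rows = conn.execute("SELECT id FROM items WHERE note IS NULL").fetchall()

print(eq_rows)  # []
print(is_rows)  # [(2,)]
```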
If you have columns that may contain "a missing unknown value", you have to set them to accept NULLs.
On the other hand, a table with many NULL columns may be indicating that this table needs to be refactored into smaller tables that better describe the entities they represent.

I recommend you read Problems with NULL Values.

1- Would it be smart to set them to NULL in phpMyAdmin?
All fields are NULL by default unless you specify a default value for them or insert some value for them (this applies to nullable columns, i.e. those not declared NOT NULL). No need to do this...
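A minimal sketch of that default behaviour, again using Python's sqlite3 module in place of MySQL (an assumption; the people table and its columns are made up for illustration): an INSERT that omits a nullable column leaves NULL there, while a column with a declared DEFAULT gets that default.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.
conn = sqlite3.connect(":memory:")
# "nickname" has no NOT NULL constraint and no DEFAULT; "status" has a DEFAULT.
conn.execute(
    "CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT NOT NULL,"
    " nickname TEXT, status TEXT DEFAULT 'active')"
)
# Insert a row without mentioning nickname or status.
conn.execute("INSERT INTO people (name) VALUES ('Ann')")

row = conn.execute("SELECT nickname, status FROM people").fetchone()
# Omitted nullable column -> NULL (None); omitted column with DEFAULT -> default.
print(row)  # (None, 'active')
```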
2- What does the "NULL" property actually do?
Null means that you have not assigned any value to it.
3- Would I gain anything at all by setting them to NULL?
As said before, all fields are null by default unless you specify a default value for them or insert some value for them. I don't think you are going to gain anything.
4- Is it possible to use a NULL field the same way even though it is set to null?
What would you gain from a field having a value of NULL? No need for this either.

Going to try to answer your questions all at once here.
NULL represents something along the lines of "Unknown"/"No value" or "Not applicable". So yes, if there are columns that are unused in certain circumstances, it would be appropriate to set them to NULL when not used (as no other value is appropriate).
It is possible to constrain a column to NOT NULL, meaning that the column must have a value for each row. An example would be the "name" column of a "person" record. It doesn't make sense for a name to be NULL, as everybody has a name.
You can "use" a NULL column; just keep in mind that you have to be careful when doing comparisons. A NULL field is never equal to another field (not even to another NULL), so check with "IS NULL" or "IS NOT NULL".

Brief answers to your questions:
Yes, NULL means that the field contains nothing at all. If that's the true state of affairs, that's what the data should say. An example would be the shipped_date for an order which has not yet shipped. In this case, NULL would accurately represent the value until the order ships out, since until it does there isn't a valid time at which it did (and in this case, checking for the NULL value might be quite a valuable tool in determining which orders do still need to be shipped).
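The shipped_date idea can be sketched like this, with Python's sqlite3 standing in for the real database (an assumption) and a hypothetical orders table; the IS NULL test is what picks out the orders still waiting to ship.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
# Hypothetical orders table: shipped_date stays NULL until the order ships.
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, shipped_date TEXT)")
conn.executemany(
    "INSERT INTO orders (id, shipped_date) VALUES (?, ?)",
    [(1, "2024-01-05"), (2, None), (3, None)],
)

# Orders that still need to be shipped are exactly those with a NULL shipped_date.
pending = [r[0] for r in conn.execute(
    "SELECT id FROM orders WHERE shipped_date IS NULL ORDER BY id")]
print(pending)  # [2, 3]
```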
NULL means that the field contains nothing. "Nothing" is different from, say, the value 0 or the string "", as these are values. NULL means roughly the same thing as "N/A" or "I decline to answer". What exactly it means depends on the context of the column. Of course, some columns should never be NULL, and you can enforce that with your table design.
If most of the fields in a column are NULL, you should rethink exactly how you're using that column. Generally speaking, a large number of NULL values indicates you could design your tables better. As to defaulting, you can always set a nullable column to default to NULL.
The same way as what? NULL is a unique value. It's not equivalent to 0, or "", or anything else like that. In a query, you must check for IS NULL or IS NOT NULL, and if a null is pulled into a dataset, you must check for it specifically there too. Asking whether a column set to NULL is equal to 0, or "", or what have you, will never return true: the comparison itself yields NULL (unknown).
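The following sketch evaluates those comparisons directly, using Python's sqlite3 as a stand-in DBMS (an assumption). sqlite3 reports SQL NULL as Python None, so the comparisons involving NULL come back as None rather than as true or false.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.
conn = sqlite3.connect(":memory:")
# Compare NULL with 0 and with the empty string, plus two ordinary comparisons.
result = conn.execute(
    "SELECT NULL = 0, NULL = '', NULL <> 0, 0 = 0, '' = ''"
).fetchone()
print(result)  # (None, None, None, 1, 1)
```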

Now sometimes some of these column fields may be empty (i.e. I won't use them in some cases).
Would it be smart to set them to NULL in phpMyAdmin?
Yes, that's what it's for.
What does the "NULL" property actually do?
It makes the database allow NULL as a value stored in the column. "NOT NULL" means a column must have a value that is not NULL.
Would I gain anything at all by setting them to NULL?
No. If your logic requires that a column never contains NULL as a value, it's better to set it to "NOT NULL". Think of it as an assertion: it is safe to assume the column value will never be NULL, so you don't have to test for it. The database takes care of that assertion.
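A sketch of the NOT NULL "assertion" at work, using Python's sqlite3 in place of MySQL (an assumption; the person table and the try_insert helper are hypothetical): the database itself rejects the NULL, and SQLite reports the violation as an IntegrityError.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")

def try_insert(value):
    """Hypothetical helper: True if the insert succeeded, False if NOT NULL rejected it."""
    try:
        conn.execute("INSERT INTO person (name) VALUES (?)", (value,))
        return True
    except sqlite3.IntegrityError:
        return False

ok_named = try_insert("Ann")  # accepted
ok_null = try_insert(None)    # rejected: NOT NULL constraint failed
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
print(ok_named, ok_null, row_count)  # True False 1
```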
Is it possible to use a NULL field the same way even though it is set to null?
I'm not sure what you mean by that... Anyway, NULL and NOT NULL columns are identical in every way, except that NULL columns can contain NULL.
And NULL is a strange value. val = NULL is never true, even if val is NULL. For that you have to test with ISNULL(), IS NULL or IS NOT NULL. See the MySQL Reference Manual: Comparison Functions and Operators.
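The same rule applies when comparing two columns with each other. In this sketch (Python's sqlite3 as a stand-in, an assumption; the pairs table is hypothetical), plain = misses the row where both sides are NULL, while SQLite's IS operator treats two NULLs as equal. MySQL spells its null-safe equality operator <=> instead.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for MySQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pairs (id INTEGER PRIMARY KEY, a INTEGER, b INTEGER)")
conn.executemany("INSERT INTO pairs (id, a, b) VALUES (?, ?, ?)",
                 [(1, 1, 1), (2, None, None), (3, 1, None)])

# a = b is NULL (not true) whenever either side is NULL, so row 2 is missed.
plain = [r[0] for r in conn.execute(
    "SELECT id FROM pairs WHERE a = b ORDER BY id")]
# SQLite's IS treats two NULLs as equal and NULL vs non-NULL as unequal.
null_safe = [r[0] for r in conn.execute(
    "SELECT id FROM pairs WHERE a IS b ORDER BY id")]
print(plain, null_safe)  # [1] [1, 2]
```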

Related

Null vs empty field in database

I have a database in which data is imported from another table. If the data is empty there, it becomes NULL when imported here. When I query the columns with a condition like "name does not start with 'a'", it should return all records whose name doesn't start with 'a', including NULL/empty columns. It returns the empty records but not the NULL ones, and I need the NULL fields too. I'm using Hibernate and SQL Server 2005. How can I achieve this? Please help.
Thanks
Null and Empty are different things.
When you say "Retrieve all the entries that do not start with a" it means that it will retrieve all the entries with something that is not a. Null is not something. Null is nothing. Empty is something.
You should modify your query to add OR name IS NULL (using your own column name), so that the NULL fields are retrieved as well.
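A minimal sketch of the fix, assuming Python's sqlite3 as a stand-in for SQL Server and a hypothetical people table with a name column; the NULL row only appears once OR name IS NULL is added.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE people (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO people (id, name) VALUES (?, ?)",
                 [(1, "alice"), (2, "bob"), (3, ""), (4, None)])

# NULL NOT LIKE 'a%' is NULL (unknown), so row 4 is silently dropped.
without = [r[0] for r in conn.execute(
    "SELECT id FROM people WHERE name NOT LIKE 'a%' ORDER BY id")]
# Adding OR name IS NULL brings the NULL row back.
with_null = [r[0] for r in conn.execute(
    "SELECT id FROM people WHERE name NOT LIKE 'a%' OR name IS NULL ORDER BY id")]
print(without, with_null)  # [2, 3] [2, 3, 4]
```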
From Wiki:
Null is a special marker used in Structured Query Language (SQL) to indicate that a data value does not exist in the database. Introduced by the creator of the relational database model...
...Since Null is not a member of any data domain, it is not considered a "value", but rather a marker (or placeholder) indicating the absence of value. Because of this, comparisons with Null can never result in either True or False, but always in a third logical result, Unknown.
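The third logical result can be observed directly. This sketch uses Python's sqlite3 module as a stand-in DBMS (an assumption); SQLite returns 1 and 0 for TRUE and FALSE, and None for UNKNOWN.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
# FALSE AND UNKNOWN is FALSE, TRUE OR UNKNOWN is TRUE; the others stay UNKNOWN.
row = conn.execute(
    "SELECT NULL AND 0, NULL AND 1, NULL OR 1, NULL OR 0, NOT (NULL = 1)"
).fetchone()
print(row)  # (0, None, 1, None, None)
```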
Check out this discussion.

SQL Server 2008 - Default column value - should i use null or empty string?

For some time I've been debating whether, for columns where I don't know if data will be passed in, I should set the value to an empty string ('') or just allow null.
I would like to hear what the recommended practice is here.
If it makes a difference, I'm using C# as the consuming application.
I'm afraid that...
it depends!
There is no single answer to this question.
As indicated in other responses, at the level of SQL, NULL and the empty string have very different semantics: the former indicates that the value is unknown, the latter that the value is this "invisible thing" (in displays and reports) but is nonetheless a "known value". An example commonly given in this context is that of the middle name. A null value in the "middle_name" column would indicate that we do not know whether the underlying person has a middle name or not, and if so what this name is; an empty string would indicate that we "know" that this person does not have a middle name.
This said, two other kinds of factors may help you choose between these options, for a given column.
The very semantics of the underlying data, at the level of the application.
Some considerations in the way SQL works with null values
Data semantics
For example, it is important to know if the empty string is a valid value for the underlying data. If that is the case, we may lose information if we also use the empty string for "unknown info". Another consideration is whether some alternate value may be used when we have no info for the column; maybe 'n/a', 'unspecified' or 'tbd' are better values.
SQL behavior and utilities
Considering SQL behavior, the choice of using or not using NULL may be driven by space considerations, by the desire to create a filtered index, or by the convenience of the COALESCE() function (which can be emulated with CASE expressions, but in a more verbose fashion). Another consideration is whether any query may concatenate multiple columns (as in SELECT name + ', ' + middle_name AS LongName etc.), since a NULL in any of them makes the whole result NULL.
Beyond the validity of the choice of NULL vs. empty string in a given situation, a general consideration is to be as consistent as possible, i.e. to stick to ONE particular way, and to depart from it only purposely and explicitly, for good reasons and in few cases.
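The concatenation concern can be sketched with Python's sqlite3 as a stand-in (an assumption: SQLite concatenates with || rather than SQL Server's +, but with SQL Server's default CONCAT_NULL_YIELDS_NULL setting the outcome is the same). The person table is hypothetical.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (name TEXT, middle_name TEXT)")
conn.execute("INSERT INTO person VALUES ('Smith', NULL)")

# Concatenating with a NULL yields NULL ('||' is SQLite's concatenation operator).
raw = conn.execute("SELECT name || ', ' || middle_name FROM person").fetchone()[0]
# COALESCE substitutes '' for the NULL, so the rest of the string survives.
fixed = conn.execute(
    "SELECT name || ', ' || COALESCE(middle_name, '') FROM person").fetchone()[0]
print(raw, repr(fixed))  # None 'Smith, '
```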
Don't use empty string if there is no value. If you need to know if a value is unknown, have a flag for it. But 9 times out of 10, if the information is not provided, it's unknown, and that's fine.
NULL means unknown value. An empty string means a known value - a string with length zero. These are totally different things.
Empty string when I want a valid default value that may or may not be changed, for example, a user's middle name.
NULL when it is an error if the ensuing code does not set the value explicitly.
However, by initializing strings with String.Empty instead of null, you can reduce the chances of a NullReferenceException occurring.
Theory aside, I tend to view:
Empty string as a known value
NULL as unknown
In this case, I'd probably use NULL.
One important thing is to be consistent: mixing NULLs and empty strings will end in tears.
On a practical implementation level, an empty string takes 2 bytes in SQL Server, whereas NULLs are bitmapped. In some conditions, and for wide/large tables, this makes a difference in performance because there is more data to shift around.

SQL: Using NULL values vs. default values

What are the pros and cons of using NULL values in SQL as opposed to default values?
PS. Many similar questions have been asked here, but none answer my question.
I don't know why you're even trying to compare these two cases. NULL means that some column is empty/has no value, while a default value gives a column some value when we don't set it directly in the query.
Maybe an example will be a better explanation. Let's say we have a member table. Each member has an ID and a username. Optionally, he might have an e-mail address (but he doesn't have to). Also, each member has a postCount column (which is increased every time the user writes a post). So the e-mail column can have a null value (because e-mail is optional), while the postCount column is NOT NULL but has a default value of 0 (because when we create a new member he doesn't have any posts).
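The member table described above could be declared along these lines. This is a sketch using Python's sqlite3 in place of the original DBMS (an assumption), with snake_case column names chosen for illustration.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE member (
        id         INTEGER PRIMARY KEY,
        username   TEXT NOT NULL,
        email      TEXT,                       -- optional: NULL when not given
        post_count INTEGER NOT NULL DEFAULT 0  -- always has a value
    )
""")
# A brand-new member: no e-mail given, no posts yet.
conn.execute("INSERT INTO member (username) VALUES ('newbie')")
email, post_count = conn.execute("SELECT email, post_count FROM member").fetchone()
print(email, post_count)  # None 0
```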
Null values are not ... values!
Null means 'has no value'... Besides the database aspect, one important dimension of non-valued variables or fields is that it is not possible to use '=' (or '>', '<') when comparing variables.
Writing something like (VB):
If myFirstValue = mySecondValue Then
will not return either True or False if one or both of the variables are non-valued. You will have to use a workaround such as:
If (IsNull(myFirstValue) And IsNull(mySecondValue)) Or myFirstValue = mySecondValue Then
The 'usual' code used in such circumstances is
If Nz(myFirstValue, defaultValue) = Nz(mySecondValue, defaultValue) Then
This is not strictly correct, as non-valued variables will be considered 'equal' to the 'defaultValue' value (usually a zero-length string).
In spite of this unpleasant behaviour, never, never, never set your default values to a zero-length string (or '0's) without a valuable reason, and easing value comparison in code is not a valuable reason.
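The problem with the Nz() shortcut can be shown outside the database too. This sketch is in Python, with None standing in for VB's Null; nz and null_safe_equal are hypothetical helpers written for illustration, not library functions.

```python
from typing import Optional

def nz(value: Optional[str], default: str = "") -> str:
    """Hypothetical helper mimicking Access's Nz(): replace None with a default."""
    return default if value is None else value

def null_safe_equal(a: Optional[str], b: Optional[str]) -> bool:
    """Hypothetical helper: equal only if both are missing, or both present and equal."""
    if a is None or b is None:
        return a is None and b is None
    return a == b

# A missing value versus a real zero-length string:
print(nz(None) == nz(""))           # True: the Nz() shortcut calls them equal
print(null_safe_equal(None, ""))    # False: the explicit test keeps them apart
print(null_safe_equal(None, None))  # True
```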
NULL values are meant to indicate that the attribute is either not applicable or unknown. There are religious wars fought over whether they're a good thing or a bad thing but I fall in the "good thing" camp.
They are often necessary to distinguish known values from unknown values in many situations and they make a sentinel value unnecessary for those attributes that don't have a suitable default value.
For example, whilst the default value for a bank balance may be zero, what is the default value for a mobile phone number? You may need to distinguish between "customer has no mobile phone" and "customer's mobile number is not (yet) known", in which case a blank column won't do (and having an extra column to decide whether that column is one or the other is not a good idea).
Default values are simply what the DBMS will put in a column if you don't explicitly specify it.
It depends on the situation, but it's really ultimately simple. Which one is closer to the truth?
A lot of people deal with data as though it's just data, and truth doesn't matter. However, whenever you talk to the stakeholders in the data, you find that truth always matters. Sometimes more, sometimes less, but it always matters.
A default value is useful when you may presume that if the user (or other data source) had provided a value, the value would have been the default. If this presumption does more harm than good, then NULL is better, even though dealing with NULL is a pain in SQL.
Note that there are three different ways default values can be implemented. First, in the application, before inserting new data. The database never sees the difference between a default value provided by the user or one provided by the app!
Second, by declaring a default value for the column, and leaving the data missing in an insert.
Third, by substituting the default value at retrieval time, whenever a NULL is detected. Only a few DBMS products permit this third mode to be declared in the database.
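The second mode, and an emulation of the third, can be sketched with Python's sqlite3 (an assumption; the account table and account_v view are hypothetical). SQLite has no declarative retrieval-time default, so the sketch uses a view with COALESCE instead.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
# Mode 2: a declared column default, applied when the INSERT omits the column.
conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY,"
             " tier TEXT DEFAULT 'basic', region TEXT)")
conn.execute("INSERT INTO account (id) VALUES (1)")
conn.execute("INSERT INTO account (id, region) VALUES (2, 'EU')")

# Mode 3 (emulated): store NULL, substitute a default at retrieval time.
conn.execute("CREATE VIEW account_v AS"
             " SELECT id, tier, COALESCE(region, 'unknown') AS region FROM account")

stored = conn.execute("SELECT id, tier, region FROM account ORDER BY id").fetchall()
shown = conn.execute("SELECT id, tier, region FROM account_v ORDER BY id").fetchall()
print(stored)  # [(1, 'basic', None), (2, 'basic', 'EU')]
print(shown)   # [(1, 'basic', 'unknown'), (2, 'basic', 'EU')]
```

The stored data still says "unknown region" honestly as NULL; only the view presents a substitute.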
In an ideal world, data is never missing. If you are developing for the real world, required data will eventually be missing. Your applications can either do something that makes sense or something that doesn't make sense when that happens.
As with many things, there are good and bad points to each.
Good points about default values: they give you the ability to set a column to a known value if no other value is given. For example, when creating BOOLEAN columns I commonly give the column a default value (TRUE or FALSE, whatever is appropriate) and make the column NOT NULL. In this way I can be confident that the column will have a value, and it'll be set appropriately.
Bad points about default values: not everything has a default value.
Good things about NULLs: not everything has a known value at all times. For example, when creating a new row representing a person I may not have values for all the columns - let's say I know their name but not their birth date. It's not appropriate to put in a default value for the birth date - people don't like getting birthday cards on January 1st (if that's the default) if their birthday is actually July 22nd.
Bad things about NULLs: NULLs require careful handling. In most databases built on the relational model as commonly implemented NULLs are poison - the presence of a NULL in a calculation causes the result of the calculation to be NULL. NULLs used in comparisons can also cause unexpected results because any comparison with NULL returns UNKNOWN (which is neither TRUE nor FALSE). For example, consider the following PL/SQL script:
declare
  nValue NUMBER;
begin
  IF nValue > 0 THEN
    dbms_output.put_line('nValue > 0');
  ELSE
    dbms_output.put_line('nValue <= 0');
  END IF;
  IF nValue <= 0 THEN
    dbms_output.put_line('nValue <= 0');
  ELSE
    dbms_output.put_line('nValue > 0');
  END IF;
end;
The output of the above is:
nValue <= 0
nValue > 0
This may be a little surprising. You have a NUMBER (nValue) which is both less than or equal to zero and greater than zero, at least according to this code. The reason this happens is that nValue is actually NULL, and all comparisons with NULL result in UNKNOWN instead of TRUE or FALSE. This can result in subtle bugs which are hard to figure out.
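NULL propagation in arithmetic, and the same ELSE-branch trap in plain SQL, can be sketched with Python's sqlite3 as a stand-in (an assumption; the line_item table is hypothetical).

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
# Illustrative table: the second line item has no known price.
conn.execute("CREATE TABLE line_item (qty INTEGER, price REAL)")
conn.executemany("INSERT INTO line_item VALUES (?, ?)", [(2, 10.0), (3, None)])

# Any NULL operand makes the arithmetic result NULL.
totals = [r[0] for r in conn.execute(
    "SELECT qty * price FROM line_item ORDER BY rowid")]

# price > 5 is UNKNOWN for the NULL row, so CASE falls through to ELSE,
# the same trap as the ELSE branches in the PL/SQL example.
labels = [r[0] for r in conn.execute(
    "SELECT CASE WHEN price > 5 THEN 'expensive' ELSE 'cheap' END "
    "FROM line_item ORDER BY rowid")]
print(totals, labels)  # [20.0, None] ['expensive', 'cheap']
```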
To me, they are somewhat orthogonal.
Default values allow you to gracefully evolve your database schema (think adding columns) without having to modify client code. Plus, they save some typing, but relying on default values for this is IMO bad.
Nulls are just that: nulls. Missing value and a huge PITA when dealing with Three-Valued Logic.
In a Data Warehouse, you would always want to have default values rather than NULLs.
Instead, you would have values such as "unknown", "not ready", "missing".
This allows INNER JOINs to be performed efficiently on the Fact and Dimension tables as 'everything always has a value'
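The effect on INNER JOINs can be sketched with Python's sqlite3 (an assumption), using a hypothetical fact_sales table and a dim_customer dimension that contains a -1 "Unknown" member.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for the warehouse DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY,"
             " name TEXT NOT NULL)")
conn.execute("CREATE TABLE fact_sales (amount REAL, customer_key INTEGER)")
conn.executemany("INSERT INTO dim_customer VALUES (?, ?)",
                 [(-1, "Unknown"), (1, "Acme")])
# Two facts: one with a known customer, one whose customer is missing.
conn.executemany("INSERT INTO fact_sales VALUES (?, ?)", [(100.0, 1), (50.0, None)])

join = ("SELECT COUNT(*), SUM(f.amount) FROM fact_sales f"
        " JOIN dim_customer d ON f.customer_key = d.customer_key")
with_null = conn.execute(join).fetchone()  # NULL key matches nothing: 50.0 sale vanishes

# Same data, but the missing customer now points at the -1 "Unknown" member.
conn.execute("UPDATE fact_sales SET customer_key = -1 WHERE customer_key IS NULL")
with_unknown = conn.execute(join).fetchone()  # every fact survives the INNER JOIN
print(with_null, with_unknown)  # (1, 100.0) (2, 150.0)
```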
Nulls and default values are different things used for different purposes. If you are trying to avoid using nulls by giving everything a default value, that is a poor practice as I will explain.
Null means we do not know what the value is or will be. For instance suppose you have an enddate field. You don't know when the process being recorded will end, so null is the only appropriate value; using a default value of some fake date way out in the future will cause as much trouble to program around as handling the nulls and is more likely in my experience to create a problem with incorrect results being returned.
Now there are times when we might know what the value should be if the person inserting the record does not. For instance, if you have a date inserted field, it is appropriate to have a default value of the current date and not expect the user to fill this in. You are likely to actually have better information that way for this field.
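Both cases can be sketched with Python's sqlite3 as a stand-in (an assumption; the process table is hypothetical): date_created gets a CURRENT_TIMESTAMP default, while end_date is simply left nullable rather than given a fake far-future date.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for the real DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE process (
        id           INTEGER PRIMARY KEY,
        date_created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,  -- known at insert time
        end_date     TEXT                                      -- unknown until it ends
    )
""")
conn.execute("INSERT INTO process (id) VALUES (1)")
# created holds the insert time as 'YYYY-MM-DD HH:MM:SS'; ended is None.
created, ended = conn.execute("SELECT date_created, end_date FROM process").fetchone()
```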
Sometimes, it's a judgement call and depends on the business rules you have to apply. Suppose you have a speaker honoraria field (which is the amount a speaker would get paid). A default value of 0 could be dangerous, as it might mean that speakers are hired and we intend to pay them nothing. It is also possible that there may occasionally be speakers who are donating their time for a particular project (or who are employees of the company and thus not paid extra to speak), where zero is a correct value, so you can't use zero as the value to determine that you don't know how much this speaker is to be paid. In this case, Null is the only appropriate value, and the code should trigger an issue if someone tries to add the speaker to a conference. In a different situation, you may already know that the minimum any speaker will be paid is 3000 and that only speakers who have negotiated a different rate will have data entered in the honoraria field. In this case, it is appropriate to put in a default value of 3000. In other cases, different clients may have different minimums, so the default should be handled differently (usually through a lookup table that automatically populates the minimum honoraria value for that client on the data entry form).
So I feel the best rule is to leave the value as null if you truly cannot know, at the time the data is entered, what the value of the field should be. Use a default value only if it has meaning all the time for that particular situation, and use some other technique to fill in the value if it could be different under different circumstances.
I so appreciate all of this discussion. I am in the midst of building a data warehouse and am using the Kimball model rather strictly. There is one very vocal user, however, who hates surrogate keys and wants NULLs all over the place. I told him that it is OK to have NULLable columns for attributes of dimensions and for any dates or numbers that are used in calculations, because default values there imply incorrect data. There are, I agree, advantages to allowing NULL in certain columns, but it makes cubing a lot better and more reliable if there is a surrogate key for every foreign key to a dimension, even if that surrogate is -1 or 0 for a dummy record. SQL likes integers for joins, and if there is a missing dimension value and a dummy is provided as a surrogate key, then you will get the same number of records using one dimension as you would cubing on another dimension. However, calculations have to be done correctly, and you have to account for NULL values in those. Birthday should be NULL so that age is not calculated, for example. I believe in good data governance, and making these decisions with the users forces them to think about their data in more ways than ever.
As one responder already said, NULL is not a value.
Be very wary of anything proclaimed by anyone who speaks of "the NULL value" as if it were a value.
NULL is not equal to itself. x=y does not yield true if both x and y are NULL (it yields UNKNOWN). x=y yields true if both x and y are the default value.
There are almost endless consequences to this seemingly very simple difference. And most of those consequences are booby traps that bite you real bad.
Nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.
DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though.
REF: http://www.craigsmullins.com/bp7.htm
So, the best modeling practice for DB2 for z/OS is to use "NOT NULL WITH DEFAULT" as a standard for all columns. It's the same practice followed in some major shops I knew. It makes programmers' lives easier, since they don't have to handle the null indicator, and it actually saves storage by eliminating the extra byte for the null indicator.
Two very good Access-oriented articles about Nulls by Allen Browne:
Nulls: Do I need them?
Common Errors with Null
Aspects of working with Nulls in VBA code:
Nothing? Empty? Missing? Null?
The articles are Access-oriented, but could be valuable to those using any database, particularly relative novices because of the conversational style of the writing.

TSQL: No value instead of Null

Due to a weird request, I can't put null in a database if there is no value. I'm wondering what I can put in the stored procedure for nothing instead of null.
For example:
insert into blah (blah1) values (null)
Is there something like nothing or empty for "blah1" instead of using null?
I would push back on this bizarre request. That's exactly what NULL is for in SQL, to denote a missing or inapplicable value in a column.
Is the requester experiencing grief over SQL logic with NULL?
edit: Okay, I've read your reply with the extra detail about this job assignment (btw, generally you should edit your original question instead of posting more information in an answer).
You'll have to declare all columns as NOT NULL and designate a special value in the domain of that column's data type to signify "no value." The appropriate value to choose might be different on a case by case basis, i.e. zero may signify nothing in a person_age column, but it might have significance in an items_in_stock column.
You should document the no-value value for each column. But I suppose they don't believe in documentation either. :-(
Depends on the data type of the column. For numbers (integers, etc) it could be zero (0) but if varchar then it can be an empty string ("").
I agree with other responses that NULL is best suited for this because it transcends all data types denoting the absence of a value. Therefore, zero and empty string might serve as a workaround/hack but they are fundamentally still actual values themselves that might have business domain meaning other than "not a value".
(If only the SQL language supported a "Not Applicable" (N/A) value type that would serve as an alternative to NULL...)
If null is a valid value for whatever you're storing, you can either:
Use a sentinel value like Int32.MaxValue, an empty string, or "XXXXXXXXXX" and assume it will never be a legitimate value
Add a bit column 'Exists' that you populate with true at the same time you insert.
Edit: But yeah, I'll agree with the other answers that trying to change the requirements might be better than trying to solve the problem.
If you're using a varchar or equivalent field, then use the empty string.
If you're using a numeric field such as int then you'll have to force the user to enter data, else come up with a value that means NULL.
I don't envy you your situation.
There's a difference between NULLs as assigned values (e.g. inserted into a column) and NULLs as a SQL artifact (as for a field in a missing record for an OUTER JOIN, which might be a foreign concept to these users; lots of people use Access, or any database, just to maintain single-table lists). I wouldn't be surprised if naive users would prefer to use an alternative for assignments; and though repugnant, it should work OK. Just let them use whatever they want.
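The outer-join case is worth seeing, because the NULLs appear even when every column is declared NOT NULL. This sketch uses Python's sqlite3 as a stand-in (an assumption; customer and purchase are hypothetical tables).

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
# Every column is NOT NULL, yet the outer join below still yields a NULL.
conn.execute("CREATE TABLE customer (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("CREATE TABLE purchase (customer_id INTEGER NOT NULL, item TEXT NOT NULL)")
conn.executemany("INSERT INTO customer VALUES (?, ?)", [(1, "Ann"), (2, "Bob")])
conn.execute("INSERT INTO purchase VALUES (1, 'book')")

rows = conn.execute("""
    SELECT c.name, p.item
    FROM customer c LEFT OUTER JOIN purchase p ON p.customer_id = c.id
    ORDER BY c.id
""").fetchall()
print(rows)  # [('Ann', 'book'), ('Bob', None)]
```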
There is some validity to the requirement to not use NULL values. NULL values can cause a lot of headache when they are in a field that will be included in a JOIN or a WHERE clause or in a field that will be aggregated.
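Some SQL implementations restrict NULLable fields in keys and indexes; for example, SQL Server does not allow a nullable column in a PRIMARY KEY, and a UNIQUE index there treats NULLs as duplicates of each other, so only one row may hold NULL.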
MSSQL especially behaves in unexpected ways when NULL is evaluated for equality. Does a NULL value in a PaymentDue field mean the same as zero when we search for records that are up to date? What if we have names in a table and somebody has no middle name. It is conceivable that either an empty string or a NULL could be stored, but how do we then get a comprehensive list of people that have no middle name?
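One way to answer the middle-name question when both representations have crept in is to fold NULL into the empty string with COALESCE. A sketch with Python's sqlite3 as a stand-in (an assumption; the person table is hypothetical):

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, middle_name TEXT)")
# Mixed storage: one empty string, one NULL, one real middle name.
conn.executemany("INSERT INTO person (id, middle_name) VALUES (?, ?)",
                 [(1, ""), (2, None), (3, "Lee")])

# Testing only for '' misses the NULL row.
only_empty = [r[0] for r in conn.execute(
    "SELECT id FROM person WHERE middle_name = '' ORDER BY id")]
# COALESCE folds NULL into '' so both representations are caught in one test.
no_middle = [r[0] for r in conn.execute(
    "SELECT id FROM person WHERE COALESCE(middle_name, '') = '' ORDER BY id")]
print(only_empty, no_middle)  # [1] [1, 2]
```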
In general I prefer to avoid NULL values. If you cannot represent what you want to store using either a number (including zero) or a string (including the empty string as mentioned before) then you should probably look closer into what you are trying to store. Perhaps you are trying to communicate more than one piece of data in a single field.

TSQL - When to use 'not null'

What general guidelines should I go by when considering whether I should mark a field 'not null' as opposed to just declaring everything but the primary key null?
Should 'not null' fields have DEFAULT values?
Depends on how you want your application to behave.
First, if there will never, ever be a possible row where this value does NOT contain meaningful data, then use NOT NULL. That is, this value will always be meaningful and represent something.
Do you want the value to always be filled out by the user or programmer in some form or fashion? Use NOT NULL without a DEFAULT
Do you want it to be optional to users and programmers? Use NOT NULL with a DEFAULT
I think you've got 2 questions there:
Should you mark fields as not null?
Yes, assuming that you never intend a valid row to have a null value in that field. Think of "not null" as the easiest type of constraint you can put on a field. Constraints in a database help ensure the data is kept consistent by meeting expectations.
Should not null fields have defaults?
Only when there is an obvious default. For example the Paid field of an invoices table might have a default of 0 (false). In general it works the other way around - if a field has a default value, then it should probably also be not null.
Don't create defaults just for the sake of defaults - if a field should not be null, but there isn't a universal default, then leave it be. That ensures that any INSERT statements must provide a value for that field.
A lot of people look down upon so-called "magic numbers" and would advocate leaving the field as null instead of putting a default down. The only time I ever use default values is when I have a bit field and I just want it to default to false.
Do what is semantically correct.
NULL - does not always exist
NOT NULL - always exists
Try not to define and persist artificial values, like "No value selected" for a drop-down field, or a "No Manager" for an employee's manager.
Whether to use defaults depends on how data gets inserted. If there is a UI with validation, you don't need defaults, IMHO.
Not null fields need to have defaults if they are appropriate, especially if you are adding a new field to the table (or changing the field from allowing nulls to not allowing nulls) and you need to give a value to all existing records. However, a default is not appropriate or available for all possible fields. For instance, we have a person table where lastname is not allowed to be null. There is no default lastname we could assign, though; if the person doesn't have a name, the record doesn't get created. On the other hand, you might have a DateCreated field with a default value of the current date. This is also a field that you would want to have as not null, and you would want to make sure that the current date was put in whether the record was inserted from the user interface, from an import, or from the query window.
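The "adding a new field" case can be sketched with Python's sqlite3 (an assumption; the person table and the is_active and nickname columns are hypothetical): the default fills every existing row, and SQLite refuses to add a NOT NULL column to a populated table without one.

```python
import sqlite3

# Assumption: SQLite (via Python's sqlite3) stands in for SQL Server.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE person (id INTEGER PRIMARY KEY, last_name TEXT NOT NULL)")
conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Smith"), (2, "Jones")])

# Adding a NOT NULL column to a table that already has rows needs a default,
# which then becomes the value for every existing row.
conn.execute("ALTER TABLE person ADD COLUMN is_active INTEGER NOT NULL DEFAULT 1")
existing = conn.execute("SELECT id, is_active FROM person ORDER BY id").fetchall()

# Without a default, SQLite rejects the new NOT NULL column.
try:
    conn.execute("ALTER TABLE person ADD COLUMN nickname TEXT NOT NULL")
    added_without_default = True
except sqlite3.DatabaseError:
    added_without_default = False
print(existing, added_without_default)  # [(1, 1), (2, 1)] False
```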