I watched many videos, but I didn't understand anything about null. Is it important? What is null? What is it used for? Is there any difference between Kotlin and other languages? Please explain it as simply as you can.
Nullability does not have an inherent purpose, a single thing it is "for"; rather, null is something that happens.
A reference is null when it does not point to any allocated instance. This means the instance no longer exists, or it never existed.
You can have null variables because something didn't happen, or because your code deliberately supports null. For example:
suspend fun getServerResult(id: String): Result? {
    // Without the local model we can't proceed with the server call, so bail out.
    val localModel = source.findInDb(id) ?: return null
    //...
}
In the above case we can't proceed with the server operation because we don't have the local model that we need, so we can return null.
data class SomeScreen(val screenText: String?)
someTextView.text = someScreen.screenText
In Android, TextView supports setting null text; it clears the displayed text.
However, there are better patterns for everything, always. So in the first example, you could return an Error result and never return null. In the second example, you could use the empty text "" instead.
In Kotlin, nullability sprouts from the root of the inheritance hierarchy: everything inherits from Any or Any?. Compare Java, where everything stems from Object, which is nullable by default.
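A minimal sketch of what that distinction buys you in practice (the variable names are just for illustration):

fun main() {
    val greeting: String = "hello"   // non-nullable: assigning null is a compile error
    val nickname: String? = null     // nullable: must be handled before use

    // println(nickname.length)      // does not compile: nickname may be null
    println(nickname?.length)        // safe call: prints "null" instead of crashing
    println(nickname?.length ?: 0)   // Elvis operator: falls back to 0 when null
}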
The simple answer is that null is what a reference points to when you don't have an object of the correct type for it to point to.
There's no single reason why you might not have such an object, and so there's no single meaning for null; it will depend on the context. But most of the common meanings are fairly similar (and not specific to Kotlin); there's a lot of overlap but also subtle differences:
An empty value. For example, if you have an object representing an address, its house-number field might be null for addresses that don't have a house number. Or a leaf node of a binary tree might have its ‘left’ and ‘right’ pointers set to null to indicate that it has no child nodes.
An unavailable/missing value. For example, you may not be authorised to access certain data fields within some object; in that case, they may be returned as null. Or if there's a database error, you may choose to return null for non-critical fields instead of throwing an exception.
An unknown value. For example, a genealogical database may store null for birth dates which are uncertain.
An unspecified value. For example, when creating a TreeSet, you can specify a Comparator for it to use when comparing two elements. But you can pass null, to get the elements' natural ordering instead.
An uninitialised value. Sometimes you don't have all the relevant data until after an object has been initialised; so what should its properties hold before that? You could specify values such as empty string, but it's neater and safer to use a special value such as null.
An invalid value. For example, if an input string isn't a valid date, then you might store null in the corresponding date property to indicate this.
An inapplicable value. For example, the maxByOrNull() function gives the largest value in a collection. As long as the collection has at least one element, it can find a largest (though it may not be unique); however, it can't do so if the collection is empty, so it returns null in that case (see the sketch below).
A default value. For example, when creating a connection to a remote system, it might allow you to pass null for parameters such as time-out durations that you don't care about, so it can pick the most suitable defaults.
An unchanged value. For example, if you pass an object used to update a database record, you might make all its fields nullable, and use null to indicate that the record should keep its existing value for that field.
The common threads are:
They all represent a value which is absent for some reason.
And there will usually be some context (often spelled out or at least implied in the documentation) providing the exact meaning.
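To make the "inapplicable value" case concrete, here is a minimal Kotlin sketch using maxByOrNull():

fun main() {
    val scores = listOf(3, 7, 2)
    val empty = emptyList<Int>()

    println(scores.maxByOrNull { it })  // 7: a largest element exists
    println(empty.maxByOrNull { it })   // null: no largest element is applicable
}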
I have heard both ends of the story. One view is that using a special value of a numeric variable to indicate special state is a design flaw. However, I encounter this practice all the time.
An example would be an unsigned byte, where the value 255 indicates lack of information.
Is this a bad practice? If so, in what exceptional cases is it allowed or encouraged?
That depends very much on the situation; as always, strive to write things the simplest, easiest to understand way. Hopefully exceptional situation handling doesn't clutter up the code for the normal case (that's the idea behind exceptions, but they create their own mess...).
Be careful when selecting the value to be used for exceptional cases. For example, C/Unix conventions use this quite a bit, but make sure to use "impossible" values. So, getchar(3) returns a character code as an int, but returns a non-character EOF for end of file. Similarly, read(2) returns the number of characters read (could be 0 if nothing to be read), or -1 on error.
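The same convention survives on the JVM: java.io.InputStream.read() returns a byte as an Int in the range 0..255, or the "impossible" value -1 at end of stream. A small Kotlin sketch:

import java.io.ByteArrayInputStream

fun main() {
    val input = ByteArrayInputStream(byteArrayOf(72, 105))  // the bytes for "Hi"
    while (true) {
        val b = input.read()   // 0..255, or -1 at end of stream
        if (b == -1) break     // -1 can never be a real byte value here
        print(b.toChar())      // prints: Hi
    }
}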
This is absolutely not a bad practice, assuming that you can afford it, i.e. the range of representable integers has enough extra values that you never need to use in real-life situations.
For example, if you needed a full range of unsigned bytes, including the 255, then giving 255 a new meaning of "unknown" would be a problem. If 255 is never used for the "real" data, then you are free to assign it any meaning that you would like.
What's wrong, however, is using special numbers throughout the code without assigning them a special name. For example,
if (myByte == 255) {
    // Process the situation when the value is unknown
}
is unquestionably bad. You should name all your special constants, and use only these names throughout your code:
if (myByte == UNKNOWN_VALUE) {
    // You can skip the comment about processing an unknown value:
    // the name of the constant tells your readers what's going on.
}
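The same rule sketched in Kotlin, assuming (purely for illustration) that 255 is the chosen sentinel:

const val UNKNOWN_VALUE = 255   // sentinel: 255 never appears in real data here

fun describe(myByte: Int): String =
    if (myByte == UNKNOWN_VALUE) "unknown" else "value = $myByte"

fun main() {
    println(describe(42))    // value = 42
    println(describe(255))   // unknown
}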
I use a string type for the Id attribute on all my domain objects. E.g.:
public class Person {
    public string Id { get; set; }
    // ... more properties
}
No tricks here: null represents a "no-value" value. When a new Person is created and before it is persisted, Id will remain null.
Now there is a discussion about enlarging the "no-value" space, so that null, the empty string, and whitespace-only strings all count as "no-value" values.
I.e. to check whether an entity is new, instead of doing if (person.Id == null), it would become if (string.IsNullOrWhiteSpace(person.Id)).
In my humble opinion this is a smell or a design principle violation, but I can't figure out which one.
Question: which (if any) design principle does this decision violate (the decision to allow for more than just null to represent no-value value)?
(I think it should be something similar to Occam's razor, or entropy, or KISS; I'm just not sure.)
It does violate the KISS principle. If there is no special need for handling empty strings besides nulls, then why do it? All operations must now check for two values instead of one. When exploring the DB, a simple SELECT to find "NULL" records becomes slightly less trivial for no good reason.
Another violated principle is the principle of least surprise - usually people expect only NULL values to represent NULL objects. The design with two special values is less obvious, and less "readable".
If something more should be hidden behind these "second-category-special-objects", then it should be made explicit. Otherwise, it should be trivial to handle empty string input, and store it as NULL to be coherent with the rest of the system.
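For illustration, a minimal Kotlin sketch of that normalization, so only null ever reaches storage (the function name is hypothetical):

fun normalizeId(raw: String?): String? =
    raw?.takeUnless { it.isBlank() }   // null, "" and "   " all collapse to null

fun main() {
    println(normalizeId(null))    // null
    println(normalizeId(""))      // null
    println(normalizeId("   "))   // null
    println(normalizeId("A17"))   // A17
}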
EDIT:
I've also found another "principle" in Bob Martin's book Clean Code - "one word per concept" - which is related to this case. The empty string and NULL are two "words" used for one concept, so they clearly violate this guideline.
I'm gonna go out on a limb and say defining both null and "" as the empty String for your application does not violate any design principles and is not a code smell. You need to just clearly define the semantics of the field for your purpose, and you have done so (i.e., in your application, both null and "" mean "no value").
You should have tests that ensure behavior is correct for both null and "".
This is not to say that you also can't make the decision to force all empty strings to null. That is an equally valid decision. You would need to have tests that verify that in all cases where you set the "No value", the actual value is null. You might want to go this way if your persistence layer expects null and only null to indicate no value.
So, in this case, there are no wrong decisions, just decisions.
Wikipedia says NULL is for representing "missing information and inapplicable information".
I have always understood NULL to be appropriate for representing "unknown" information - the field is applicable to the record, it is just an unknown value.
Given the problems that NULL introduces to queries (three-valued logic and tons of exceptions that have to be accounted for), should NULL still be used in records for fields that are not applicable to a record? Or should it just be used for applicable fields whose values are unknown for a given record?
I too accept null to mean "unknown", but "inapplicable" also fits: for example, when saving the CEO's employee record, we set employee_manager_id = null because the CEO has no manager.
One technique to avoid the hassle of nulls is to use a special value instead, for example saving -1 for a person.age, but then you have a bit more complexity checking this value. When using a special value for a foreign key (as in the manager id example), you actually have to create a dummy record (with id = 0, for example), and this may introduce problems when processing "all employees" - you must remember to skip the dummy record.
Personally, I think things stay cleaner just using null and suffering the hassle of more complex SQL - at least everyone knows what's going on.
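A small Kotlin sketch of the two approaches from the manager example (the names are illustrative):

data class Employee(val id: Int, val name: String, val managerId: Int?)

fun main() {
    // null reads as "inapplicable": the CEO simply has no manager.
    val ceo = Employee(id = 1, name = "Grace", managerId = null)

    // The sentinel alternative needs a dummy "manager zero" record that
    // every query over "all employees" must remember to skip.
    val ceoWithSentinel = Employee(id = 1, name = "Grace", managerId = 0)

    println(ceo)
    println(ceoWithSentinel)
}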
In fact, Null does not connote 'unknown' so much as just 'no data'. It is used in SQL (and other environments) where data is simply absent for a given field.
With regard to your concern about three-valued logic and exceptions, your application is probably making use of more languages than just SQL. The code of your system that interfaces with SQL should handle the question of what to do with NULL fields.
If NULL is simply unacceptable (i.e. you can't have your data structure without a non-null value), then you had better avoid the concepts of "unknown" and "no data" altogether. Make the field required by declaring the column NOT NULL; that makes it so that NULL cannot be entered as a valid value. E.g.
CREATE TABLE foo (bar INT NOT NULL);
Note: I use C# as an example, but the problem is virtually the same in Java and probably many other languages.
Assume you implement a value object (as in the Value Object pattern by M. Fowler) and it has a nullable field:
class MyValueObject
{
    // Nullable field (with public access to keep the example short):
    public string MyField;
}
Then, when overriding Equals(), how do you treat the case when both value objects have their MyField set to null? Are they equal or not?
In C#, treating them as equal seems obvious, because:
This is the behaviour of Equals() when you use a C# struct instead of a class and do not override Equals().
The following expressions are true:
null == null
object.ReferenceEquals(null, null)
object.Equals(null, null)
However, in SQL (at least in SQL Server's dialect), NULL = NULL evaluates to UNKNOWN (which behaves as false in a WHERE clause), whereas NULL IS NULL is true.
I am wondering what implementation is expected when using an O/R mapper (in my case, NHibernate). If you implement the "natural" C# equality semantics, may there be any ill effects when the O/R mapper maps them to the database?
Or maybe allowing nullable fields in value objects is wrong anyway?
Since ORMs know the relational model, they usually expose a way to query using SQL semantics.
NHibernate, for example, provides the is [not] null operator in HQL, and Restrictions.Is[Not]Null in Criteria.
Of course, there's an API where these paradigms collide: LINQ. Most ORMs try to do the right thing when comparing to null (i.e. replacing the comparison with is null), although there can be issues sometimes, especially where the behavior is not obvious.
Personally, I think that if it can be null (in error-free code), then the two should be treated as equal.
However, if it shouldn't be null (i.e. a Name for a Customer, or a Street Address for a Delivery), then it should never get to null in the first place.
I think you have two issues:
One being that you need to know if one instance of MyValueObject is equal to another instance.
And secondly, how that should translate to persistence.
I think you need to look at these separately, as it seems that your angle is coupling them too close to each other, which seems to me to violate some DDD principles - the Domain should not know or care about persistence.
If you are unsure of the effect of the null value of MyField, either (a) have it return a different type other than string; (b) have it return a derivative of string like EmptyString (or a similar Special Case implementation); (c) or override the Equals method and specify exactly what it means for these instances to be equal.
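For what it's worth, a Kotlin data class gives you something like option (c) for free: the generated equals() uses structural equality, under which two null fields compare equal. A minimal sketch:

data class MyValueObject(val myField: String?)

fun main() {
    println(MyValueObject(null) == MyValueObject(null))  // true: null fields compare equal
    println(MyValueObject("x") == MyValueObject(null))   // false
}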
If your ORM cannot translate a particular expression (that involves MyValueObject) to SQL, then perhaps it's OK to do the harder work in the persistence layer (have the comparison happen outside the SQL translation - yes, there are performance issues, but I'm sure they're not impossible to solve) in favour of keeping your Domain Model clean. It seems to me the solution should derive from "what's best for the domain model".
@James Anderson makes a good point: reserve null for error and failure states. I think Special Case seems more and more appropriate.
We generally prefer to have all our varchar/nvarchar columns non-nullable, with an empty string ('') as a default value. Someone on the team suggested that nullable is better because:
A query like this:
Select * From MyTable Where MyColumn IS NOT NULL
is faster than this:
Select * From MyTable Where MyColumn = ''
Anyone have any experience to validate whether this is true?
On some platforms (and even versions), this is going to depend on how NULLs are indexed.
My basic rule of thumb for NULLs is:
Don't allow NULLs until justified
Don't allow NULLs unless the data can really be unknown
A good example of this is modeling address lines. If you have an AddressLine1 and AddressLine2, what does it mean for the first to have data and the second to be NULL? It seems to me, you either know the address or not, and having partial NULLs in a set of data just asks for trouble when somebody concatenates them and gets NULL (ANSI behavior). You might solve this with allowing NULLs and adding a check constraint - either all the Address information is NULL or none is.
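A hedged Kotlin analogue of that all-or-nothing constraint (the class and property names are just for illustration):

data class Address(val line1: String?, val line2: String?) {
    init {
        // Mirror the check constraint: all address data is null, or none of it is.
        require((line1 == null) == (line2 == null)) {
            "Partial address data is not allowed"
        }
    }
}

fun main() {
    println(Address(null, null))              // fine: address unknown
    println(Address("10 High St", "Flat 2"))  // fine: address known
    // Address("10 High St", null) would throw IllegalArgumentException
}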
Similar thing with middle initial/name. Some people don't have one. Is this different from it being unknown and do you care?
Also, date of death - what does NULL mean? Not dead? Unknown date of death? Many times a single column is not sufficient to encode knowledge in a domain.
So to me, whether to allow NULLs would depend very much on the semantics of the data first - performance is going to be second, because having data misinterpreted (potentially by many different people) is usually a far more expensive problem than performance.
It might seem like a little thing (in SQL Server the implementation is a bitmask stored with the row), but only allowing NULLs after justification seems to me to work best. It catches things early in development, forces you to address assumptions and understand your problem domain.
If you want to know that there is no value, use NULL.
As for speed, IS NULL should be faster, because it doesn't use string comparison.
If you need NULL, use NULL. Ditto empty string.
As for performance, "it depends"
If you have varchar, you store the actual value in the row along with its length. If you have char, you always store the full declared length. A NULL may not be stored in-row at all, depending on the engine (SQL Server, for example, uses a NULL bitmap).
This means IS NULL is quicker, query for query, but it could add COALESCE/NULLIF/ISNULL complexity.
So, your colleague is partially correct but may not appreciate it fully.
Blindly using an empty string is using a sentinel value rather than working through the NULL semantics issue.
FWIW and personally:
I would tend to use NULL but don't always. I like to avoid dates like 31 Dec 9999 which is where NULL avoidance leads you.
From Cade Roux's answer... I also find discussions about "is date of death nullable?" pointless. For a field, in practical terms, either there is a value or there isn't.
Sentinel values are worse than NULLs. Magic numbers, anyone?
Tell that guy on your team to get his prematurely optimizin' head out of his ass! (But in a nice way).
Developers like that can be poison to the team, full of low-level optimization myths, all of which may be true or have been true at one point in time for some specific vendor or query pattern, or possibly only true in theory but never true in practice. Acting upon these myths is a costly waste of time, and can destroy an otherwise good design.
He probably means well and wants to contribute his knowledge to the team. Unfortunately, he is wrong. Not wrong in the sense of whether a benchmark will prove his statement correct or incorrect. He's wrong in the sense that this is not how you design a database. The question of whether to make a field NULL-able is a question about domain of the data for the purposes of defining the type of the field. It should be answered in terms of what it means for the field to have no value.
In a nutshell, NULL = UNKNOWN! Which means (using the date-of-death example) that the entity could be 1) alive, 2) dead but with the date of death not known, or 3) not known to be dead or alive. For numeric columns I always default them to 0 (zero), because somewhere along the line you may have to perform aggregate calculations, and NULL + 123 = NULL. For alphanumerics I use NULL, since it's least expensive performance-wise and it's easier to say '...where a IS NULL' than '...where a = "" '. Using '...where a = " "' (a space) is not a good idea, because a space is not a NULL! For dates, if you have to leave a date column NULL, you may want to add a status indicator column: in the above example, A = alive, D = dead, Q = dead but date of death not known, N = unknown whether alive or dead.
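That last suggestion - a status column alongside the nullable date - sketched in Kotlin (the names and codes follow the example above):

import java.time.LocalDate

enum class VitalStatus { ALIVE, DEAD, DEAD_DATE_UNKNOWN, UNKNOWN }

data class PersonRecord(val status: VitalStatus, val dateOfDeath: LocalDate?)

fun main() {
    val records = listOf(
        PersonRecord(VitalStatus.ALIVE, null),                      // 1) alive
        PersonRecord(VitalStatus.DEAD, LocalDate.of(1995, 3, 14)),  // 2) dead, date known
        PersonRecord(VitalStatus.DEAD_DATE_UNKNOWN, null),          // dead, date unknown
        PersonRecord(VitalStatus.UNKNOWN, null),                    // 3) alive or dead unknown
    )
    records.forEach(::println)
}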