Related
When comparing two different bags using an Any-of-Any function, is XACML Version 3 able to identify which was the element that produced the match (signalled by the boolean true value). Beside this return value, is there an index value available, or as integer, or as a list of integers?
No, there isn't any such mechanism. As you know, in XACML (and ALFA), attribute values are always bags of values. You can convert a single-valued bag into a single atomic value but you can never know the order of a bag nor can you pick a specific value by first, last, or any index. Bags are unordered.
You could potentially introduce a function that would sort a bag e.g. alphabetically and then return the first element or the last.
Consequently, you cannot know which value produced the match. If, for instance, you have a function that says stringAtLeastOneMemberOf(userQualifications, requiredQualifications), you cannot know which one triggered the match.
HTH,
David.
What are the pros and cons of using NULL values in SQL as opposed to default values?
PS. Many similar questions has been asked on here but none answer my question.
I don't know why you're even trying to compare these to cases. null means that some column is empty/has no value, while default value gives a column some value when we don't set it directly in query.
Maybe some example will be better explanation. Let's say we've member table. Each member has an ID and username. Optional he might has an e-mail address (but he doesn't have to). Also each member has a postCount column (which is increased every time user write a post). So e-mail column can have a null value (because e-mail is optional), while postCount column is NOT NULL but has default value 0 (because when we create a new member he doesn't have any posts).
Null values are not ... values!
Null means 'has no value' ... beside the database aspect, one important dimension of non valued variables or fields is that it is not possible to use '=' (or '>', '<'), when comparing variables.
Writting something like (VB):
if myFirstValue = mySecondValue
will not return either True or False if one or both of the variables are non-valued. You will have to use a 'turnaround' such as:
if (isnull(myFirstValue) and isNull(mySecondValue)) or myFirstValue = mySecondValue
The 'usual' code used in such circumstances is
if Nz(myFirstValue) = Nz(mySecondValue, defaultValue)
Is not strictly correct, as non-valued variables will be considered as 'equal' to the 'defaultValue' value (usually Zero-length string).
In spite of this unpleasant behaviour, never never never turn on your default values to zero-length string (or '0's) without a valuable reason, and easing value comparison in code is not a valuable reason.
NULL values are meant to indicate that the attribute is either not applicable or unknown. There are religious wars fought over whether they're a good thing or a bad thing but I fall in the "good thing" camp.
They are often necessary to distinguish known values from unknown values in many situations and they make a sentinel value unnecessary for those attributes that don't have a suitable default value.
For example, whilst the default value for a bank balance may be zero, what is the default value for a mobile phone number. You may need to distinguish between "customer has no mobile phone" and "customer's mobile number is not (yet) known" in which case a blank column won't do (and having an extra column to decide whether that column is one or the other is not a good idea).
Default values are simply what the DBMS will put in a column if you don't explicitly specify it.
It depends on the situation, but it's really ultimately simple. Which one is closer to the truth?
A lot of people deal with data as though it's just data, and truth doesn't matter. However, whenever you talk to the stakeholders in the data, you find that truth always matters. sometimes more, sometimes less, but it always matters.
A default value is useful when you may presume that if the user (or other data source) had provided a value, the value would have been the default. If this presumption does more harm then good, then NULL is better, even though dealing with NULL is a pain in SQL.
Note that there are three different ways default values can be implemented. First, in the application, before inserting new data. The database never sees the difference between a default value provided by the user or one provided by the app!
Second, by declaring a default value for the column, and leaving the data missing in an insert.
Third, by substituting the default value at retrieval time, whenever a NULL is detected. Only a few DBMS products permit this third mode to be declared in the database.
In an ideal world, data is never missing. If you are developing for the real world, required data will eventually be missing. Your applications can either do something that makes sense or something that doesn't make sense when that happens.
As with many things, there are good and bad points to each.
Good points about default values: they give you the ability to set a column to a known value if no other value is given. For example, when creating BOOLEAN columns I commonly give the column a default value (TRUE or FALSE, whatever is appropriate) and make the column NOT NULL. In this way I can be confident that the column will have a value, and it'll be set appropriate.
Bad points about default values: not everything has a default value.
Good things about NULLs: not everything has a known value at all times. For example, when creating a new row representing a person I may not have values for all the columns - let's say I know their name but not their birth date. It's not appropriate to put in a default value for the birth date - people don't like getting birthday cards on January 1st (if that's the default) if their birthday is actually July 22nd.
Bad things about NULLs: NULLs require careful handling. In most databases built on the relational model as commonly implemented NULLs are poison - the presence of a NULL in a calculation causes the result of the calculation to be NULL. NULLs used in comparisons can also cause unexpected results because any comparison with NULL returns UNKNOWN (which is neither TRUE nor FALSE). For example, consider the following PL/SQL script:
declare
nValue NUMBER;
begin
IF nValue > 0 THEN
dbms_output.put_line('nValue > 0');
ELSE
dbms_output.put_line('nValue <= 0');
END IF;
IF nValue <= 0 THEN
dbms_output.put_line('nValue <= 0');
ELSE
dbms_output.put_line('nValue > 0');
END IF;
end;
The output of the above is:
nValue <= 0
nValue > 0
This may be a little surprising. You have a NUMBER (nValue) which is both less than or equal to zero and greater than zero, at least according to this code. The reason this happens is that nValue is actually NULL, and all comparisons with NULL result in UNKNOWN instead of TRUE or FALSE. This can result in subtle bugs which are hard to figure out.
Share and enjoy.
To me, they are somewhat orthogonal.
Default values allow you to gracefully evolve your database schema (think adding columns) without having to modify client code. Plus, they save some typing, but relying on default values for this is IMO bad.
Nulls are just that: nulls. Missing value and a huge PITA when dealing with Three-Valued Logic.
In a Data Warehouse, you would always want to have default values rather than NULLs.
Instead you would have value such as "unknown","not ready","missing"
This allows INNER JOINs to be performed efficiently on the Fact and Dimension tables as 'everything always has a value'
Nulls and default values are different things used for different purposes. If you are trying to avoid using nulls by giving everything a default value, that is a poor practice as I will explain.
Null means we do not know what the value is or will be. For instance suppose you have an enddate field. You don't know when the process being recorded will end, so null is the only appropriate value; using a default value of some fake date way out in the future will cause as much trouble to program around as handling the nulls and is more likely in my experience to create a problem with incorrect results being returned.
Now there are times when we might know what the value should be if the person inserting the record does not. For instance, if you have a date inserted field, it is appropriate to have a default value of the current date and not expect the user to fill this in. You are likely to actually have better information that way for this field.
Sometimes, it's a judgement call and depends on the business rules you have to apply. Suppose you have a speaker honoraria field (Which is the amount a speaker would get paid). A default value of 0 could be dangerous as it it might mean that speakers are hired and we intend to pay them nothing. It is also possible that there may occasionally be speakers who are donating their time for a particular project (or who are employees of the company and thus not paid extra to speak) where zero is a correct value, so you can't use zero as the value to determine that you don't know how much this speaker is to be paid. In this case Null is the only appropriate value and the code should trigger an issue if someone tries to add the speaker to a conference. In a different situation, you may know already that the minimum any speaker will be paid is 3000 and that only speakers who have negotiated a different rate will have data entered in the honoraria field. In this case, it is appropriate to put in a default value of 3000. In another cases, different clients may have different minimums, so the default should be handled differently (usually through a lookup table that automatically populates the minimum honoraria value for that client on the data entry form.
So I feel the best rule is leave the value as null if you truly cannot know at the time the data is entered what the value of the field should be. Use a default value only it is has meaning all the time for that particular situation and use some other technique to fill in the value if it could be different under different circumstances.
I so appreciate all of this discussion. I am in the midst of building a data warehouse and am using the Kimball model rather strictly. There is one very vocal user, however, who hates surrogate keys and wants NULLs all over the place. I told him that it is OK to have NULLable columns for attributes of dimensions and for any dates or numbers that are used in calculations because default values there imply incorrect data. There are, I agree, advantages to allowing NULL in certain columns but it makes cubing a lot better and more reliable if there is a surrogate key for every foreign key to a dimension, even if that surrogate is -1 or 0 for a dummy record. SQL likes integers for joins and if there is a missing dimension value and a dummy is provided as a surrogate key, then you will get the same number of records using one dimension as you would cubing on another dimension. However, calculations have to be done correctly and you have to accommodate for NULL values in those. Birthday should be NULL so that age is not calculated, for example. I believe in good data governance and making these decisions with the users forces them to think about their data in more ways than ever.
As one responder already said, NULL is not a value.
Be very ware of anything proclaimed by anyone who speaks of "the NULL value" as if it were a value.
NULL is not equal to itself. x=y yields false if both x and y are NULL. x=y yields true if both x and y are the default value.
There are almost endless consequences to this seemingly very simple difference. And most of those consequences are booby traps that bite you real bad.
Nulls NEVER save storage space in DB2 for OS/390 and z/OS. Every nullable column requires one additional byte of storage for the null indicator. So, a CHAR(10) column that is nullable will require 11 bytes of storage per row – 10 for the data and 1 for the null indicator. This is the case regardless of whether the column is set to null or not.
DB2 for Linux, Unix, and Windows has a compression option that allows columns set to null to save space. Using this option causes DB2 to eliminate the unused space from a row where columns are set to null. This option is not available on the mainframe, though.
REF: http://www.craigsmullins.com/bp7.htm
So, the best modeling practice for DB2 Z/OS is to use "NOT NULL WITH DEFAULT" as a standard for all columns. It's the same followed in some major shops I knew. Makes the life of programmers more easier not having to handle the Null Indicator and actually saves on storage by eliminating the need to use the extra byte for the NULL INDICATOR.
Two very good Access-oriented articles about Nulls by Allen Browne:
Nulls: Do I need them?
Common Errors with Null
Aspects of working with Nulls in VBA code:
Nothing? Empty? Missing? Null?
The articles are Access-oriented, but could be valuable to those using any database, particularly relative novices because of the conversational style of the writing.
I'm trying to write an SQL query that would search within a CSV (or similar) array in a column. Here's an example:
insert into properties set
bedrooms = 1,2,3 (or 1-3)
title = nice property
price = 500
I'd like to then search where bedrooms = 2+. Is this even possible?
The correct way to handle this in SQL is to add another table for a multi-valued property. It's against the relational model to store multiple discrete values in a single column. Since it's intended to be a no-no, there's little support for it in the SQL language.
The only workaround for finding a given value in a comma-separated list is to use regular expressions, which are in general ugly and slow. You have to deal with edge cases like when a value may or may not be at the start or end of the string, as well as next to a comma.
SELECT * FROM properties WHERE bedrooms RLIKE '[[:<:]]2[[:>:]]';
There are other types of queries that are easy when you have a normalized table, but hard with the comma-separated list. The example you give, of searching for a value that is equal to or greater than the search criteria, is one such case. Also consider:
How do I delete one element from a comma-separated list?
How do I ensure the list is in sorted order?
What is the average number of rooms?
How do I ensure the values in the list are even valid entries? E.g. what's to prevent me from entering "1,2,banana"?
If you don't want to create a second table, then come up with a way to represent your data with a single value.
More accurately, I should say I recommend that you represent your data with a single value per column, and Mike Atlas' solution accomplishes that.
Generally, this isn't how you should be storing data in a relational database.
Perhaps you should have a MinBedroom and MaxBedroom column. Eg:
SELECT * FROM properties WHERE MinBedroom > 1 AND MaxBedroom < 3;
I have a three tables
Results:
TestID
TestCode
Value
Tests:
TestID
TestType
SysCodeID
SystemCodes
SysCodeID
ParentSysCodeID
Description
The question I have is for when the user is entering data into the results table.
The formatting code when the row gets the focus changes the value field to a dropdown combobox if the testCode is of type SystemList. The drop down has a list of all the system codes that have a parentsyscodeID of the test.SysCodeID. When the user chooses a value in the list it translates into a number which goes into the value field.
The datatype of the Results.Value field is integer. I made it an integer instead of a string because when reporting it is easier to do calculations and sorting if it is a number. There are issues if you are putting integer/decimal value into a string field. As well, when the system was being designed they only wanted numbers in there.
The users now want to put strings into the value field as well as numbers/values from a list and I'm wondering what the best way of doing that would be.
Would it be bad practice to convert the field over to a string and then store both strings and integers in the same field? There are different issues related to this one but i'm not sure if any are a really big deal.
Should I add another column into the table of string datatype and if the test is a string type then put the data the user enters into the different field.
Another option would be to create a 1-1 relationship to another table and if the user types in a string into the value field it adds it into the new table with a key of a number.
Anyone have any interesting ideas?
What about treating Results.Value as if it were a numeric ValueCode that becomes an foreign key referencing another table that contains a ValueCode and a string that matches it.
CREATE TABLE ValueCodes
(
Value INTEGER NOT NULL PRIMARY KEY,
Meaning VARCHAR(32) NOT NULL UNIQUE
);
CREATE TABLE Results
(
TestID ...,
TestCode ...,
Value INTEGER NOT NULL FOREIGN KEY REFERENCES ValueCodes
);
You continue storing integers as now, but they are references to a limited set of values in the ValueCodes table. Most of the existing values appear as an integer such as 100 with a string representing the same value "100". New codes can be added as needed.
Are you saying that they want to do free-form text entry? If that's the case, they will ruin the ability to do meaningful reporting on the field, because I can guarantee that they will not consistently enter the strings.
If they are going to be entering one of several preset strings (for example, grades of A, B, C, etc.) then make a lookup table for those strings which maps to numeric values for sorting, evaluating, averaging, etc.
If they really want to be able to start entering in free-form text and you can't dissuade them from it, add another column along the lines of other_entry. Have a predefined value that means "other" to put in your value column. That way, when you're doing reporting you can either roll up all of those random "other" values or you can simply ignore them. Make sure that you add the "other" into your SystemCodes table so that you can keep a foreign key between that and the Results table. If you don't already have one, then you should definitely consider adding one.
Good luck!
The users now want to put strings into
the value field as well as
numbers/values from a list and I'm
wondering what the best way of doing
that would be.
It sounds like the users want to add new 'testCodes'. If that is the case why not just add them to your existing testcode table and keep your existing format.
Would it be bad practice to convert
the field over to a string and then
store both strings and integers in
the same field? There are different
issues related to this one but i'm not
sure if any are a really big deal.
No it's not a big deal. Often PO numbers or Invoice numbers have numbers or a combination of letters and numbers. You are right however about the performance of the database on a number field as opposed to a string, but if you index the string field you end up with the database doing it's scans on numeric indexes anyway.
The problems you may have had with your decimals as strings probably have to do with the floating point data types in which the server essentially estimates the value of the field and only retains accuracy to a certain number of digits. This can lead to a whole host of rounding errors if you are concerned about the digits. You can avoid that issue by using currency fields or the like that have static accuracy of the decimals. lol I learned this the hard way.
Tom H. did a great job addressing everything else.
I think the easiest way to do it would be to convert Results.Value to a "string" (char, varchar, whatever). Yes, this ruins the ability to do numeric sorting (and you won't be able to do a cast or convert on the column any longer since text will be intermingled with integer values), but I think any other method would be too complex to maintain properly. (For example, in the 1-1 case you mentioned, is that integer value the actual value or a foreign key to the string table? Now we need another column to determine that.)
I would create the extra column for string values. It's not true normalization but it's the easiest to implement and to work with.
Using the same field for both numbers and strings would work to as long as you don't plan on doing anything with the numbers like summing or sorting.
The extra table approach while good from a normalization standpoint is probably overly complex.
I'd convert the value field to string and add a column indicating what the datatype should be treated as for post processing and reporting.
Sql Server at least has an IsNumeric function you can use:
ORDER BY IsNumeric(Results.Value) DESC,
CASE WHEN IsNumeric(Results.Value) = 1 THEN Len(Results.Value) ELSE 99 END,
Results.Value
One of two solutions comes to mind. It kind of depends on what you're doing with the numbers. If they just represent a choice of some kind, then pick one. If you need to do math on it (sorting, conversion, etc..) then pick another.
Change the column to be a varchar, and then either put numbers or text in it. Sorting numerically will suck, but hey, it's one column.
Have both a varchar column for the text, and an int column for the number. Use a view to hide the differences, and to control the sorting if necessary. You can coalesce the two columns together if you don't care about whether you're looking at numbers or text.
Due to a weird request, I can't put null in a database if there is no value. I'm wondering what can I put in the store procedure for nothing instead of null.
For example:
insert into blah (blah1) values (null)
Is there something like nothing or empty for "blah1" instead using null?
I would push back on this bizarre request. That's exactly what NULL is for in SQL, to denote a missing or inapplicable value in a column.
Is the requester experiencing grief over SQL logic with NULL?
edit: Okay, I've read your reply with the extra detail about this job assignment (btw, generally you should edit your original question instead of posting more information in an answer).
You'll have to declare all columns as NOT NULL and designate a special value in the domain of that column's data type to signify "no value." The appropriate value to choose might be different on a case by case basis, i.e. zero may signify nothing in a person_age column, but it might have significance in an items_in_stock column.
You should document the no-value value for each column. But I suppose they don't believe in documentation either. :-(
Depends on the data type of the column. For numbers (integers, etc) it could be zero (0) but if varchar then it can be an empty string ("").
I agree with other responses that NULL is best suited for this because it transcends all data types denoting the absence of a value. Therefore, zero and empty string might serve as a workaround/hack but they are fundamentally still actual values themselves that might have business domain meaning other than "not a value".
(If only the SQL language supported a "Not Applicable" (N/A) value type that would serve as an alternative to NULL...)
Is null is a valid value for whatever you're storing?
Use a sentry value like INT32.MaxValue, empty string, or "XXXXXXXXXX" and assume it will never be a legitimate value
Add a bit column 'Exists' that you populate with true at the same time you insert.
Edit: But yeah, I'll agree with the other answers that trying to change the requirements might be better than trying to solve the problem.
If you're using a varchar or equivalent field, then use the empty string.
If you're using a numeric field such as int then you'll have to force the user to enter data, else come up with a value that means NULL.
I don't envy you your situation.
There's a difference between NULLs as assigned values (e.g. inserted into a column), and NULLs as a SQL artifact (as for a field in a missing record for an OUTER JOIN. Which might be a foreign concept to these users. Lots of people use Access, or any database, just to maintain single-table lists.) I wouldn't be surprised if naive users would prefer to use an alternative for assignments; and though repugnant, it should work ok. Just let them use whatever they want.
There is some validity to the requirement to not use NULL values. NULL values can cause a lot of headache when they are in a field that will be included in a JOIN or a WHERE clause or in a field that will be aggregated.
Some SQL implementations (such as MSSQL) disallow NULLable fields to be included in indexes.
MSSQL especially behaves in unexpected ways when NULL is evaluated for equality. Does a NULL value in a PaymentDue field mean the same as zero when we search for records that are up to date? What if we have names in a table and somebody has no middle name. It is conceivable that either an empty string or a NULL could be stored, but how do we then get a comprehensive list of people that have no middle name?
In general I prefer to avoid NULL values. If you cannot represent what you want to store using either a number (including zero) or a string (including the empty string as mentioned before) then you should probably look closer into what you are trying to store. Perhaps you are trying to communicate more than one piece of data in a single field.