Boolean Look up Tables - Bool Column or Simple Lookup? - sql

We need a table to store requests for a copy of a contract, let's say contract_copy_requests.
The user either made a request, or did not. The field is unlikely to ever toggle.
Which of these column options is "correct":
profile_id, request date (and only write ids that made a request in here) OR
profile_id, has_requested_copy, updated_date
The latter is more complete and extensible, but the former is simpler and doesn't require either a nullable field or backfilling existing data with false for everyone since they haven't made the choice yet.
Is there any real reason to do #2? Btw, this is more of a generic question - I have had this use case come up about 3 times in the past month and I always lean toward #1 but wanted to validate it.

The first option is better.
It is smaller, and you lose no functionality: for example, it will be easy to get the users that didn't make a request:
... WHERE NOT EXISTS (SELECT 1 FROM contract_copy_requests AS ccr
WHERE ccr.request.id = /* column from the outside */)

Related

Autoincrement varchar column optionally

I have a table with guid identifier and one field that is a 5 characters string that can be specified by user, but it is optional, and it should be unique per user. I'm looking for a way to have this field always there, even if user doesn't specify it. The easiest approach is to have it like "00001", "00002"... etc. in case that user doesn't specify it, it is stored like this. I'm using SQL and entity framework core. What is the best way to achieve this?
EDIT: maybe trigger that will check after insert if that field is not specified and then just take current row number and convert it to string? does this make sense?
Cheers
Setting a default value to '00001' can be done define the field with:
NOT NULL DEFAULT right('0000' || to_char(SomeSequence.nextval),5) (pseudo-code to be adapted to the DBMS you are connected to).
Compared to the solution in your EDIT, this will at least guarantee that 2 inserts at the same time from 2 different users get assigned different values.
The real problem comes with the unique constraint on the column. This does not work nicely when mixing manual input with calculated values.
If as a user, I input (manually) 00005, then the insertion will fail when SomeSequence reaches 5.
I think this problem will exist regardless of how you implement the generation of values (sequence, trigger, external code, ...)
Even if you are fine with coding some additional (and probably complicated) logic to manage that, it will probably decrease concurrency.

'-999' used for all condition

I have a sample of a stored procedure like this (from my previous working experience):
Select * from table where (id=#id or id='-999')
Based on my understanding on this query, the '-999' is used to avoid exception when no value is transferred from users. So far in my research, I have not found its usage on the internet and other company implementations.
#id is transferred from user.
Any help will be appreciated in providing some links related to it.
I'd like to add my two guesses on this, although please note that to my disadvantage, I'm one of the very youngest in the field, so this is not coming from that much of history or experience.
Also, please note that for any reason anybody provides you, you might not be able to confirm it 100%. Your oven might just not have any leftover evidence in and of itself.
Now, per another question I read before, extreme integers were used in some systems to denote missing values, since text and NULL weren't options at those systems. Say I'm looking for ID#84, and I cannot find it in the table:
Not Found Is Unlikely:
Perhaps in some systems it's far more likely that a record exists with a missing/incorrect ID, than to not be existing at all? Hence, when no match is found, designers preferred all records without valid IDs to be returned?
This however has a few problems. First, depending on the design, user might not recognize the results are a set of records with missing IDs, especially if only one was returned. Second, current query poses a problem as it will always return the missing ID records in addition to the normal matches. Perhaps they relied on ORDERing to ease readability?
Exception Above SQL:
AFAIK, SQL is fine with a zero-row result, but maybe whatever thing that calls/used to call it wasn't as robust, and something goes wrong (hard exception, soft UI bug, etc.) when zero rows are returned? Perhaps then, this ID represented a dummy row (e.g. blanks and zeroes) to keep things running.
Then again, this also suffers from the same arguments above regarding "record is always outputted" and ORDER, with the added possibility that the SQL-caller might have dedicated logic to when the -999 record is the only record returned, which I doubt was the most practical approach even in whatever era this was done at.
... the more I type, the more I think this is the oven, and only the great grandmother can explain this to us.
If you want to avoid exception when no value transferred from user, in your stored procedure declare parameter as null. Like #id int = null
for instance :
CREATE PROCEDURE [dbo].[TableCheck]
#id int = null
AS
BEGIN
Select * from table where (id=#id)
END
Now you can execute it in either ways :
exec [dbo].[TableCheck] 2 or exec [dbo].[TableCheck]
Remember, it's a separate thing if you want to return whole table when your input parameter is null.
To answer your id = -999 condition, I tried it your way. It doesn't prevent any exception

What's the reasoning behind result columns being excluded from auto-select statements in PetaPoco

If I have a POCO class with ResultColumn attribute set and then when I do a Single<Entity>() call, my result column isn't mapped. I've set my column to be a result column because its value should always be generated by SQL column's default constraint. I don't want this column to be injected or updated from business layer. What I'm trying to say is that my column's type is a simple SQL data type and not a related entity type (as I've seen ResultColumn being used mostly on those).
Looking at code I can see this line in PetaPoco:
// Build column list for automatic select
QueryColumns = ( from c in Columns
where !c.Value.ResultColumn
select c.Key
).ToArray();
Why are result columns excluded from automatic select statement because as I understand it their nature is to be read only. So used in selects only. I can see this scenario when a column is actually a related entity type (complex). Ok. but then we should have a separate attribute like ComputedColumnAttribute that would always be returned in selects but never used in inserts or updates...
Why did PetaPoco team decide to omit result columns from selects then?
How am I supposed to read result columns then?
I can't answer why the creator did not add them to auto-selects, though I would assume it's because your particular use-case is not the main one that they were considering. If you look at the examples and explanation for that feature on their site, it's more geared towards extra columns you bring back in a join or calculation (like maybe a description from a lookup table for a code value). In these situations, you could not have them automatically added to the select because they are not part of the underlying table.
So if you want to use that attribute, and get a value for the property, you'll have to use your own manual select statement rather than relying on the auto-select.
Of course, the beauty of using PetaPoco is that you can easily modify it to suit your needs, by either creating a new attribute, like you suggest above, or modifying the code you showed to not exclude those fields from the select (assuming you are not using ResultColumn in other join-type situations).

Standard use of 'Z' instead of NULL to represent missing data?

Outside of the argument of whether or not NULLs should ever be used: I am responsible for an existing database that uses NULL to mean "missing or never entered" data. It is different from empty string, which means "a user set this value, and they selected 'empty'."
Another contractor on the project is firmly on the "NULLs do not exist for me; I never use NULL and nobody else should, either" side of the argument. However, what confuses me is that since the contractor's team DOES acknowledge the difference between "missing/never entered" and "intentionally empty or indicated by the user as unknown," they use a single character 'Z' throughout their code and stored procedures to represent "missing/never entered" with the same meaning as NULL throughout the rest of the database.
Although our shared customer has asked for this to be changed, and I have supported this request, the team cites this as "standard practice" among DBAs far more advanced than I; they are reluctant to change to use NULLs based on my ignorant request alone. So, can anyone help me overcome my ignorance? Is there any standard, or small group of individuals, or even a single loud voice among SQL experts which advocates the use of 'Z' in place of NULL?
Update
I have a response from the contractor to add. Here's what he said when the customer asked for the special values to be removed to allow NULL in columns with no data:
Basically, I designed the database to avoid NULLs whenever possible. Here is the rationale:
• A NULL in a string [VARCHAR] field is never necessary because an empty (zero-length) string furnishes exactly the same information.
• A NULL in an integer field (e.g., an ID value) can be handled by using a value that would never occur in the data (e.g, -1 for an integer IDENTITY field).
• A NULL in a date field can easily cause complications in date calculations. For example, in logic that computes date differences, such as the difference in days between a [RecoveryDate] and an [OnsetDate], the logic will blow up if one or both dates are NULL -- unless an explicit allowance is made for both dates being NULL. That's extra work and extra handling. If "default" or "placeholder" dates are used for [RecoveryDate] and [OnsetDate] (e.g., "1/1/1900") , mathematical calculations might show "unusual" values -- but date logic will not blow up.
NULL handling has traditionally been an area where developers make mistakes in stored procedures.
In my 15 years as a DBA, I've found it best to avoid NULLs wherever possible.
This seems to validate the mostly negative reaction to this question. Instead of applying an accepted 6NF approach to designing out NULLs, special values are used to "avoid NULLs wherever possible." I posted this question with an open mind, and I am glad I learned more about the "NULLs are useful / NULLs are evil" debate, but I am now quite comfortable labeling the 'special values' approach to be complete nonsense.
an empty (zero-length) string furnishes exactly the same information.
No, it doesn't; in the existing database we are modifying, NULL means "never entered" and empty string means "entered as empty".
NULL handling has traditionally been an area where developers make mistakes in stored procedures.
Yes, but those mistakes have been made thousands of times by thousands of developers, and the lessons and caveats for avoiding those mistakes are known and documented. As has been mentioned here: whether you accept or reject NULLs, representation of missing values is a solved problem. There is no need to invent a new solution just because developers continue make easy-to-overcome (and easy-to-identify) mistakes.
As a footnote: I have been a DBE and developer for more than 20 years (which is certainly enough time for me to know the difference beetween a database engineer and a database administrator). Throughout my career I have always been in the "NULLs are useful" camp, though I was aware that several very smart people disagreed. I was extremely skeptical about the "special values" approach, but not well-versed enough in the academics of "How To Avoid NULL the Right Way" to make a firm stand. I always love learning new things—and I still have lots to learn after 20 years. Thanks to all who contributed to make this a useful discussion.
Sack your contractor.
Okay, seriously, this isn't standard practice. This can be seen simply because all RDBMS that I have ever worked with implement NULL, logic for NULL, take account of NULL in foreign keys, have different behaviour for NULL in COUNT, etc, etc.
I would actually contend that using 'Z' or any other place holder is worse. You still require code to check for 'Z'. But you also need to document that 'Z' doesn't mean 'Z', it means something else. And you have to ensure that such documentation is read. And then what happens if 'Z' ever becomes a valid piece of data? (Such as a field for an initial?)
At a basic level, even without debating the validity of NULL vs 'Z', I would insist that the contractor conforms to standard practices that exist within your company, not his. Instituting his standard practice in an environment with an alternative standard practice will cause confusion, maintenance overheads, mis-understanding, and in the end increased costs and mistakes.
EDIT
There are cases where using an alternative to NULL is valid in my opinion. But only where doing so reduces code, rather than creating special cases which require accounting for.
I've used that for date bound data, for example. If data is valid between a start-date and an end-date, code can be simplified by not having NULL values. Instead a NULL start-date could be replaced with '01 Jan 1900' and a NULL end-date could be replaced with '31 Dec 2079'.
This still can change behaviour from what may be expected, and so should be used with care:
WHERE end-date IS NULL no longer give data that is still valid
You just created your own millennium bug
etc.
This is equivalent to reforming abstractions such that all properties can always have valid values. It is markedly different from implicitly encoding specific meaning into arbitrarily chosen values.
Still, sack the contractor.
This is easily one of the weirdest opinions I've ever heard. Using a magic value to represent "no data" rather than NULL means that every piece of code that you have will have to post-process the results to account/discard the "no-data"/"Z" values.
NULL is special because of the way that the database handles it in queries. For instance, take these two simple queries:
select * from mytable where name = 'bob';
select * from mytable where name != 'bob';
If name is ever NULL, it obviously won't show up in the first query's results. More importantly, neither will it show up in the second queries results. NULL doesn't match anything other than an explicit search for NULL, as in:
select * from mytable where name is NULL;
And what happens when the data could have Z as a valid value? Let's say you're storing someone's middle initial? Would Zachary Z Zonkas be lumped in with those people with no middle initial? Or would your contractor come up with yet another magic value to handle this?
Avoid magic values that require you to implement database features in code that the database is already fully capable of handling. This is a solved and well understood problem, and it may just be that your contractor never really grokked the notion of NULL and therefore avoids using it.
If the domain allows missing values, then using NULL to represent 'undefined' is perfectly OK (that's what it is there for). The only downside is that code that consumes the data has to be written to check for NULLs. This is the way I've always done it.
I have never heard of (or seen in practice) the use of 'Z' to represent missing data. As to "the contractor cites this as 'standard practice' among DBAs", can he provide some evidence of that assertion? As #Dems mentioned, you also need to document that 'Z' doesn't mean 'Z': what about a MiddleInitial column?
Like Aaron Alton and many others, I believe that NULL values are an integral part of database design, and should be used where appropriate.
Even if you somehow manage to explain to all your current and future developers and DBAs about "Z" instead of NULL, and even if they code everything perfectly, you will still confuse the optimizer because it will not know that you've cooked this up.
Using a special value to represent NULL (which is already a special value to represent NULL) will result in skews in the data. e.g. So many things happened on 1-Jan-1900 that it will throw out the optimizer's ability to understand that actual range of dates that really are relevant to your application.
This is like a manager deciding: "Wearing a tie is bad for productivity, so we're all going to wear masking tape around our necks. Problem solved."
I've never heard about the wide-spread use of 'Z' as a substitute for NULL.
(Incidentally, I'd not particularly like to work with a contractor who tells you in the face that they and other "advanced" DBAs are so much more knowledgeable and better than you.)
+=================================+
| FavoriteLetters |
+=================================+
| Person | FavoriteLetter |
+--------------+------------------+
| 'Anna' | 'A' |
| 'Bob' | 'B' |
| 'Claire' | 'C' |
| 'Zaphod' | 'Z' |
+---------------------------------+
How would your contractor interpret the data from the last row?
Probably he would choose a different "magic value" in this table to avoid collision with the real data 'Z'? Meaning you'd have to remember several magic values and also which one is used where... how is this better than having just one magic token NULL, and having to remember the three-valued logic rules (and pitfalls) that go with it? NULL at least is standardized, unlike your contractor's 'Z'.
I don't particularly like NULL either, but mindlessly substituting it with an actual value (or worse, with several actual values) everywhere is almost definitely worse than NULL.
Let me repeat my above comment here for better visibility: If you want to read something serious and well-grounded by people who are against NULL, I would recommend the short article "How to handle missing information without using NULLs" (links to a PDF from The Third Manifesto homepage).
Nothing in principle requires nulls for correct database design. In fact there are plenty of databases designed without using null and there are plenty of very good database designers and whole development teams who design databases without using nulls. In general it's a good thing to be cautious about adding nulls to a database because they inevitably lead to incorrect or ambiguous results later on.
I've not heard of using Z being called "standard practice" as a placeholder value instead of nulls but I expect your contractor is referring to the concept of sentinel values in general, which are sometimes used in database design. However, a much more common and flexible way to avoid nulls without using "dummy" data is simply to design them out. Decompose the table such that each type of fact is recorded in a table that doesn't have "extra", unspecified attributes.
In reply to contractors comments
Empty string <> NULL
Empty string requires 2 bytes storage + an offset read
NULL uses null bitmap = quicker
IDENTITY doesn't always start at 1 (why waste half your range?)
The whole concept is flawed as per most other answers here
While I have never seen 'Z' as a magic value to represent null, I have seen 'X' used to represent a field that has not been filled in. That said, I have only ever seen this in one place, and my interface to it was not a database, but rather an XML file… so I would not be prepared to use this an argument for being common practice.
Note that we do have to handle the 'X' specially, and, as Dems mentioned, we do have to document it, and people have been confused by it. In our defence, this is forced on us by an external supplier, not something that we cooked up ourselves!

Positive or negative boolean field names

A table's boolean fields can be named using the positive vs the negative...
for example, calling a field:
"ACTIVE" , 1=on / 0=off
or
"INACTIVE" , 0=on / 1=off
Question:
Is there a proper way to make this type of table design decision, or is it arbitrary?
My specific example is a messages table with a bool field (private/public). This field will be set using a form checkbox when a user enters a new message. Is there a benefit in naming the field "public" vs "private"?
thanks.
I always prefer positive names, to avoid double negatives in code. "Is not inactive" is often cause for a double take when reading. "Is inactive" can always be written as "if (!Active)" whilst taking advantage of built-in language semantics.
My personal preference:
Use prefixes like "Is", "Has", etc. for Boolean fields to make their purpose clear.
Always name variables in the affirmative. For Active/Inactive, I would name it IsActive.
Don't make a bit field nullable unless you really have a specific purpose in doing so.
In your specific use case, the field should be named either IsPublic or IsPrivate--whichever name would result in a True answer when the user ticks the checkbox.
i would not disagree with some of the other answers but definitely avoid the incorrect answer which is not to put in double negatives always
Always use positive names.
If use negative names, you very quickly get into double negation. Not that double negation is rocket surgery, but it's a brain cycle and those are valuable :)
Always use positive.
It's simpler.
Take using the negation to the logical extreme: if InActive is better than Active, then why not InInActive, or InInInActive?
Because it would be less simple.
The proper way to handle these situations is to create a table to house the values associated with the column, and create a foreign key relationship between the two tables. IE:
WIDGETS table:
WIDGET_ID
WIDGET_STATUS (fk)
WIDGET_STATUS_CODES table:
WIDGET_STATUS_CODE (pk)
DESCRIPTION
If possible, WIDGET_STATUS_CODE would be a natural key (IE: ACT for "Active", INA for "Inactive"). This would make records more human readable, but isn't always possible so you'd use an artificial/surrogate key (like an auto-number/sequence/etc).
You want to do this because:
It's readable what status indicates (which was the original question)
Future proof in the need to define/use more statuses
Provides referencial integrity so someone couldn't set the value to 2, 3, 4, etc.
Space is cheap; there's nothing efficient about allowing bad data
Try to avoid boolean fields in databases alltogether.
One, the RM has a much better way to represent truth-valued information than via boolean fields : via the presence of a tuple in a table.
Two, boolean fields are very bad discriminators when querying. It's virtually complete madness to index them, so when querying, the presence of boolean fields gives no benefit at all.