Char(4) versus int as StatusID/StatusCode column in a table - sql

I need a status column that will have about a dozen possible values.
Is there any reason why I should choose int (StatusID) over char(4) (StatusCode)?
Since SQL Server doesn't support named constants, char is far more descriptive than int when used as a constant in stored procedures and views.
To clarify, I would still use a lookup table either way, since I will need more descriptive text for the UI. So this decision is only to help me as the developer when I'm maintaining the stored procedures and views.
Right now I'm leaning toward char(4). Especially since designing views in SQL Server Management Studio prevents me from adding comments (I know it's possible to add it in the script editor, but realistically I will use the View Designer far more often, especially if the view is trivial). StateCODE = 'NEW' is much more readable than StateID = 1000.
I guess the question is whether there will be cases where char(4) is problematic. Since the database is pretty small, I'm not too concerned about a slight performance hit (like using TinyInt versus int); I'm more afraid of code maintenance problems.

Database purists will say a key should have no meaning in the business domain, and that you should create a status table where you look up the description and other meanings of the status.
But for operators and end users, having a descriptive status code can be a blessing. And it doesn't even have to be char(4), you can make it varchar(20). This allows them to query without joins, and inspect the database in an easier way.
In the end, I think the varchar(20) organization will run more smoothly, and go home earlier on Friday. But the int organization has a better abstraction of the database, and they can enjoy meta programming on Friday evening (or boasting on forums).
(All of this assuming that you're writing business support software. One of the more successful business support systems, SAP, makes successful use of meaningful keys.)

There are many pros and cons to each method. I'm sure other arguments will come up in favour of using a char(4). My reasons for choosing an int over a char include:
I always use lookup tables. They allow for an audit trail of the value to be retained and easily examined. For example, if one of your status codes is 'MING' and a business decision is made to change it from 'MING' to 'MONG' from a certain date, my lookup table handles this.
Smaller index - if you need to index this column, it will be thinner.
Extendability - OK, I made that word up, but if you need to go from 4 chars to 5 chars for example, a lookup table would be a blessing.
Descriptions: We use a lot of TLAs here, which is great once you know what they are, but if I gave a business user a report that said "GDA's 2007 1001", they wouldn't necessarily twig that GDA = Good Dead on Arrival. With a lookup table, I can add this description.
Best practice: Can't find the link to hand but it might be something I read in a K. Tripp article. Aim to make your clustered primary key an incrementing integer to optimise the index.
Of course if you are absolutely positive that you will never need any more than a handful of 4 characters, there is no reason not to bang it in the table.
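For illustration, the kind of lookup table these points describe might look like this (a sketch only; the audit/date columns and all names are just one possible design):

-- Small surrogate key for joins and thin indexes, a descriptive code and
-- description for people, and effective dates so a code change leaves a trail.
CREATE TABLE dbo.OrderStatus
(
    StatusID    int          NOT NULL PRIMARY KEY,
    StatusCode  char(4)      NOT NULL,
    Description varchar(100) NOT NULL,
    ValidFrom   date         NOT NULL,
    ValidTo     date         NULL        -- NULL = still current
);

CREATE TABLE dbo.[Order]
(
    OrderID  int NOT NULL PRIMARY KEY,
    StatusID int NOT NULL REFERENCES dbo.OrderStatus (StatusID)
);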

The best approach is a lookup table with the defined values, related to the original table (the one that uses the enumeration) by a foreign key.

Collation ambiguities are one reason to say no to char(4): does 'ABcD' = 'abCD' = 'äBCd'?
If you have 12 possible values, why not tinyint/byte and a Status table?
If you have to store the status for 10 million rows, the 3-byte difference and the collation/string comparisons add up.
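A quick way to see the collation point (a throwaway sketch; the collation names are just two of the many available in SQL Server):

-- Under a case/accent-insensitive collation the 'different' codes compare equal;
-- under a case/accent-sensitive one they don't.
SELECT CASE WHEN 'ABcD' = 'äBCd' COLLATE Latin1_General_CI_AI THEN 'equal' ELSE 'different' END AS ci_ai,
       CASE WHEN 'ABcD' = 'äBCd' COLLATE Latin1_General_CS_AS THEN 'equal' ELSE 'different' END AS cs_as
-- ci_ai: equal, cs_as: different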

The place where I've run into this use case is columns that would map onto things that I would typically use an Enum for when programming. Do you store the integer value of the Enum or the name of the Enum in the database column? Honestly, I've done it both ways. Usually, I ask myself if the database will be used outside the application I'm building. If so, I will choose the human readable format to store in the database. If not, then I'll choose the integer value as it saves a little time when reconstituting (it's just a cast instead of a parse operation) the Enum in code.

You could also use a tinyint instead of an int.

I always choose ints simply because they are easier to map to enums in code.

If you're dealing with huge amounts of data and high throughput then a smallint or tinyint can give better performance and a smaller footprint on the hard disk. If the data in your application is often viewed directly through applications like Access or Cognos then your business people will probably appreciate the descriptive values. I know that when I'm analyzing data as part of my Database Developer role I get tired of joining a lot of lookup tables because I can't remember if 1 = Foo and 2 = Bar or 1 = Bar and 2 = Foo.
Also, although lookups of rows by these codes can benefit from the smaller indexes, performance can also be hurt (in a minor way) by having to do the joins when you are looking up rows regardless of the code but still need the text value. In most applications that's not an issue, though, and it would probably only come into play in large data warehousing/reporting environments.


Is using JOINs to avoid numerical IDs a bad thing? [duplicate]

This question already has answers here:
Performance of string comparison vs int join in SQL
Yesterday I was looking at queries like this:
SELECT <some fields>
FROM Thing
WHERE thing_type_id = 4
... and couldn't help but think this was not very readable. What's '4'? What does it mean? I did the same thing in coding languages before, but now I would use constants for this, turning the 4 into a THING_TYPE_AVAILABLE or some such name. No arcane number with no meaning anymore!
I asked about this on here and got answers as to how to achieve this in SQL.
I'm mostly partial to using JOINS with existing type tables where you have an ID and a Code, with other solutions possibly of use when there are no such tables (not every database is perfect...)
SELECT thing_id
FROM Thing
JOIN ThingType USING (thing_type_id)
WHERE thing_type_code IN ('OPENED', 'ONHOLD')
So I started using this on a query or two and my colleagues were soon upon me: "hey, you have literal codes in the query!" "Um, you know, we usually go with pks for that".
While I can understand that this method is not the usual method (hey, it wasn't for me either until now), is it really so bad?
What are the pros and cons of doing things this way? My main goal was readability, but I'm worried about performance and would like to confirm whether the idea is sound or not.
EDIT: Note that I'm not talking about PL/SQL but straight-up queries, the kind that usually starts with a SELECT.
EDIT 2:
To further clarify my situation with fake (but structurally similar) examples, here are the tables I have:
Thing
------------------------------------------
thing_id | <attributes...> | thing_type_id
       1 |                 |             3
       4 |                 |             7
       5 |                 |             3

ThingType
--------------------------------------------------
thing_type_id | thing_type_code | <attributes...>
            3 | 'TYPE_C'        |
            5 | 'TYPE_E'        |
            7 | 'TYPE_G'        |
thing_type_code is just as unique as thing_type_id. It is currently also used as a display string, which is a mistake in my opinion, but would be easily fixable by adding a thing_type_label field duplicating thing_type_code for now, and changeable at any time later on if needed.
By filtering with thing_type_code = 'TYPE_C', I'm sure to get that one row, which happens to be thing_type_id = 3. Joins can (and quite probably should) still be done with the numerical IDs.
Primary key values should not be coded as literals in queries.
The reasons are:
Relational theory says that PKs should not convey any meaning. Not even a specific identity. They should be strictly row identifiers and not relied upon to be a specific value
Due to operational reasons, PKs are often different in different environments (like dev, qa and prod), even for "lookup" tables
For these reasons, coding literal IDs in queries is brittle.
Coding data literals like 'OPENED' and 'ONHOLD' is GOOD practice, because these values are going to be consistent across all servers and environments. If they do change, changing queries to be in sync will be part of the change script.
I assume that the question is about the two versions of the query -- one with the numeric comparison and the other with the join and string comparison.
Your colleagues are correct that the form with where thing_type_id in (list of ids) will perform better than the join. The difference in performance, however, might be quite minor if thing_type_id is not indexed. The query will already require a full table scan on the original table.
In most other respects, your version with the join is better. In particular, it makes the intent of the query clearer and overall makes the query more maintainable. For a small reference table, the performance hit may not be noticeable. In fact, in some databases, this form could be faster. This would occur when the in is evaluated as a series of or expressions. If the list is long, it might be faster to do an index lookup.
There is one downside to the join approach. If the values in the columns change, then the code also needs to be changed. I wouldn't be surprised if your colleague who suggests using primary keys has had this experience. S/he is working on an application and builds it using joins. Great. Lots of code. All clear. All maintainable. Then every week, the users decide to change the definitions of the codes. That can make almost any sane person prefer primary keys over using the reference table.
See Mark's comment. I assume you are OK but can give my 2 cents on the matter.
If the value is only used in the scope of one query, I like to write it this (readable) way:
declare @HOLD int = 4
SELECT <some fields>
FROM Thing
WHERE thing_type_id = @HOLD
If the values are used many times in many places (queries, stored procedures, views, etc.),
I create a domain table:
create table ThingType (id int not null primary key, description varchar(50))
GO
insert into ThingType values (4, 'HOLD'), (5, 'ONHOLD')
GO
That way I can reuse those types in my selects as an enumerator:
declare @TYPE int
set @TYPE = (select id from ThingType where description = 'HOLD')
SELECT <some fields>
FROM Thing
WHERE thing_type_id = @TYPE
That way I keep meaning and performance (and can also enforce referential integrity over the domain values).
I can also just use an enumerator at the app level and pass only numeric values to the queries; a quick glance at that enumerator will tell me what the number means.
In SQL queries you will definitely introduce a performance hit for JOINs (effectively multiple queries are taking place inside the SQL server). The question is whether the performance hit is significant enough to offset the benefits.
If it's just a readability thing then you may prefer to go for better performance and avoid the JOINs, but I would suggest you take into account potential integrity problems (e.g. what happens if the typed value of 4 in your example is changed by another process further down the line - the entire application may fail).
If the values will NEVER change then use PKs - this is a decision for you as the developer - there is no rule. One option may be best for one query and not for another.
In case of PL/SQL it makes sense to define constants in your package, e.g.
DECLARE
C_OPENED CONSTANT NUMBER := 3;
C_ONHOLD CONSTANT NUMBER := 4;
BEGIN
SELECT <some fields>
INTO ...
FROM Thing
WHERE thing_type_id in (C_OPENED, C_ONHOLD);
END;
Sometimes it is useful to create a global package (without a body) where all commonly used constants are defined. If a literal changes, you only have to modify the constant definition in a single place.
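A minimal sketch of such a constants-only package might look like this (the package and constant names are invented for illustration):

-- Specification only; no package body is needed when it contains nothing but constants.
CREATE OR REPLACE PACKAGE thing_type_const AS
    c_opened CONSTANT NUMBER := 3;
    c_onhold CONSTANT NUMBER := 4;
END thing_type_const;
/

-- Any PL/SQL block can then reference the constants by name, e.g.
-- WHERE thing_type_id IN (thing_type_const.c_opened, thing_type_const.c_onhold)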

Compare queries on converted columns

Certain parts of my database are required to be extremely flexible, to the point that the user might decide to manipulate the number and/or data types of columns in a table. The data that is already in the table should be preserved, though.
That leaves me with the only option of using nvarchar(max) as the data type for any column in any of those tables.
Say the user chooses to store integers in a certain column and then wants to get all rows with that field in a certain range. Then I have to run a comparison query over the values of that column converted to int.
I am afraid that would be a performance disaster. Assuming that I am left with no other design alternatives, what can I do to improve performance in this scenario?
I can relate to this problem. An application, for instance, might be taking user input from an Excel spreadsheet and need to store this in a format as the user sees it. Once in the database, though, you might have other requirements on filtering and combining data.
You've solved half the problem. By storing the value in a character field, you can store what the user wants.
The second half is to also store the value in a way the database can reasonably manipulate. I would decide on a set of base types, perhaps just float and datetime, depending on the application. Then, when a user inserts a value, you can do the conversion and set the value in separate columns. Your table might have columns like this:
ColumnX_WhatTheUserSees nvarchar(max),
ColumnX_Type char(1) not null default 'C', -- 'C'haracter, 'F'loat, 'D'atetime
ColumnX_Float float,
ColumnX_Datetime datetime
The insertion logic then goes something like this:
insert into t(ColumnX_WhatTheUserSees, ColumnX_Type, ColumnX_Float, ColumnX_Datetime)
select @ColX,
       (case when isnumeric(@ColX) = 1 then 'F'
             when isdate(@ColX) = 1 then 'D'
             else 'C'
        end),
       (case when isnumeric(@ColX) = 1 then cast(@ColX as float) end),
       (case when isdate(@ColX) = 1 then cast(@ColX as datetime) end)
The above code is meant for illustrative purposes only. You may need to handle special cases you are not interested in (perhaps you think '1e5' should be a string or you might want to handle numbers with parentheses as negative numbers).
You can handle the extra columns through an insert or update trigger, so the user would never see the extra complexity. You can provide a view so the user sees only the "WhatTheUserSees" columns.
Finally, SQL does offer the sql_variant data type. This provides an alternative route for what you want. However, it would lose the initial user formatting (which has been important when I've encountered similar problems).
Given what you said, perhaps you could add an additional int column for each column, plus a trigger that populates it whenever the user puts an int into the nvarchar(max) column. Then at least you would only have to convert the data once, rather than each time you query it. Otherwise, yes, you are stuck with the poorly performing conversion to an integer (which is problematic since you have to preserve earlier information that may not be int) in order to do any kind of ordering or mathematical calculation. Another possibility is to have a string column and an int column (and a trigger to make sure only one of the two is populated) and then a view that coalesces them for display when you need to show all records. A meta table to tell you which one the client is using could help you in writing queries. No matter what, this is a mess. Have you considered that a NoSQL solution might be better for your requirement? That is the use case for NoSQL: data that is unstructured. If we knew the real use for this data, it is possible we could suggest a better design alternative.
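As a rough sketch of that extra-int-column idea (using a computed column instead of a trigger, and assuming SQL Server 2012+ for TRY_CONVERT; the table and column names are made up):

-- The user-facing value stays in nvarchar(max); the computed column exposes
-- the int form when the text converts cleanly, and NULL otherwise.
CREATE TABLE dbo.FlexibleData
(
    id          int IDENTITY PRIMARY KEY,
    ColumnX     nvarchar(max) NULL,
    ColumnX_Int AS TRY_CONVERT(int, ColumnX)
);

-- Range queries then compare against the int form instead of hand-converting
-- the text inside every WHERE clause.
SELECT id, ColumnX
FROM dbo.FlexibleData
WHERE ColumnX_Int BETWEEN 10 AND 20;

Persisting (and indexing) such a column, or populating a real int column with a trigger as suggested above, is what actually avoids paying the conversion cost on every query.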
(Turn Rant on - Personally, without knowing more, I would question the need for any application to be that flexible. Often requirements add more flexibility than users actually require or will use and developers dutifully build it. I have seen this in every single COTS program I have had to support. Users in general think they want flexibility - making it a sales point, but find it so hard to use that they will not use it in practice. Sometimes we need to do a better job of pushing back when the requirement will make the software run slowly or be virtually unusable. Turn Rant off. )

When would combining columns into a single, delimited column be better in a RDB schema?

Consider, for example, the case where you have two pieces of data, where one value is rarely used without the other. As one example, here is a table holding user authentication data:
CREATE TABLE users
(
id INT PRIMARY KEY,
auth_name STRING,
auth_password STRING,
auth_password_salt STRING
)
I think that the password is meaningless without the salt, and vice versa. I also have the option of representing the data this way:
CREATE TABLE users
(
id INT PRIMARY KEY,
auth_name STRING,
auth_secret STRING
)
And in auth_secret, store strings such as D5SDfsuuAedW:unguessable42
In general, are there any situations where combining columns into one, delimited column would be a better choice?
Even if it is never a "better choice" overall, are there any costs (performance, space, anything) to having more columns vs fewer columns (for the same data)? My motivation is better understanding and to be able to more competently argue against it when someone suggests this sort of thing.
--edited I changed the example... original example as follows:
CREATE TABLE points
(
id INT PRIMARY KEY,
x_coordinate INT,
y_coordinate INT,
z_coordinate INT
)
vs
CREATE TABLE points
(
id INT PRIMARY KEY,
position STRING
)
In position, storing strings such as 7:3:15
You do that when there is no chance of needing to join, query, report or aggregate the data.
In other words - never. It is bad database design.
First Normal Form (1NF) states that attribute values should be atomic - it is the basic requirement.
The only possible answer to this question is never. Never, ever, store delimited data in a column. It defeats the entire point of columns, which are there to delimit your data, and makes it inordinately difficult to do anything that a database has been designed to do. It's a violation of normalisation so huge that you'll spend hours on Stack Overflow trying to correct it in a month's time.
Never do this.
However, "never say never".
In certain, extremely limited, circumstances it's okay. Never assume it's okay but it can be.
A good example is Stack Overflow's own Posts table, which stores the tags in a delimited format for quick reading. The tags a question has are read from the database far more often than they are edited. The tags are stored in a separate table, PostTags, and then denormalised to Posts when they are updated.
In short, even though you can denormalise your data in this way, don't. Try everything possible to avoid it. If you come across a situation where you've been optimizing for days and the only way to get something quicker is to denormalize, then it's okay. Just ensure that you are only ever going to read data from that column and you have a secondary process in place to ensure that it is kept up-to-date. If the update of the denormalised data fails, roll everything back to ensure that your data is consistent.
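A rough sketch of the pattern described above (illustrative names only, not Stack Overflow's actual schema; STRING_AGG needs SQL Server 2017+):

-- Normalised source of truth plus a denormalised, read-optimised copy on the post row.
CREATE TABLE dbo.Tags     (TagId int PRIMARY KEY, TagName nvarchar(35) NOT NULL);
CREATE TABLE dbo.Posts    (PostId int PRIMARY KEY, Title nvarchar(250) NOT NULL, Tags nvarchar(250) NULL);
CREATE TABLE dbo.PostTags (PostId int NOT NULL REFERENCES dbo.Posts,
                           TagId  int NOT NULL REFERENCES dbo.Tags,
                           PRIMARY KEY (PostId, TagId));

-- When a post's tags are edited, rebuild the delimited copy in the same
-- transaction, so a failure rolls both representations back together.
DECLARE @PostId int = 1;

BEGIN TRANSACTION;

UPDATE p
SET Tags = (SELECT STRING_AGG(t.TagName, '><')
            FROM dbo.PostTags pt
            JOIN dbo.Tags t ON t.TagId = pt.TagId
            WHERE pt.PostId = p.PostId)
FROM dbo.Posts p
WHERE p.PostId = @PostId;

COMMIT;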
You left out a significant option: create an appropriate user-defined data type. (PostgreSQL has long had an intrinsic data type for 2-space.)
PostgreSQL
Oracle
SQL Server
DB2
These implementations differ quite a lot.
But you might not have the luxury of using one of those platforms. You might have to use MySQL, for example, which doesn't support user-defined data types.
Relational theory says that data types can be arbitrarily complex; they can have internal structure. The most common data type that has internal structure is the type "date". Relational theory specifies what the dbms is supposed to do with data types like that. The dbms must either
ignore the internal structure entirely, or
provide functions to manipulate the parts.
In the case of dates, every SQL dbms provides functions to manipulate the parts.
You can make a good argument for a single column that stores 3-space coordinates like "7:3:15" in MySQL. To keep in line with relational theory, you'd want the dbms to ignore the structure, and return only the single value "7:3:15"; manipulation of parts is left to application code.
One problem with implementing something like that in MySQL is that MySQL doesn't enforce CHECK constraints. So it's a lot harder to prevent values like "wibble:frog:foo" from finding their way into the database.

Are there any advantages to use varchar over decimal for Price and Value

I was arguing with my friend against his suggestion to store price, value and other similar information in varchar.
My objections were along these lines:
Calculations will become difficult as we need to cast back and forth.
Integrity of the data will be lost.
Poor performance of Indexes
Sorting and aggregate functions will also need casting
etc. etc.
But he was saying that at his previous employment everybody used to store such values in varchar, because the communication between the DB and the app would be very effective in this approach. (I still can't accept this.)
Are there really any advantages to storing such values in varchar?
Note: I'm not talking about columns like PhoneNo, IDs, ZIP Code, SSN etc. I know varchar is best suited for those. The columns in question are value-based, and will for sure be involved in calculations one way or another.
None at all.
Try casting the values back and forth and see how much data you lose.
DECLARE @foo TABLE (bar varchar(30))
INSERT @foo VALUES (11.2222222222)
INSERT @foo VALUES (22.3333333333)
INSERT @foo VALUES (33.1111111111)
SELECT CAST(CAST(bar AS float) AS varchar(30)) FROM @foo
I would also mention that his current employment does things differently... he isn't at his previous employment any more....
I think a big part of the reason to use the APPROPRIATE (in this case decimal) data type is to prevent invalid data. There's nothing to stop someone entering "The King" as a price in a varchar field.
I can see no advantages, and a whole heap of very severe disadvantages - the most pressing of which is performance (particularly when sorting).
Consider if you want to get a list of the N most expensive products, and you are storing your price as a VARCHAR. Here are some sample values (sorted in descending order)
SELECT Price FROM Table ORDER BY Price DESC
Price
-----
90
600
50
1000
Whoops! The sort order is, well, wrong! (Alphanumerical sorting, rather than value sorting).
If we want to do the sort properly then this means we either need to pad values with zeroes at the start, or convert each value to a double before we sort - but if we have to do a convert on every row this means that SQL server has no way of using statistics to predict what the results will be! This in turn means extremely poor performance, probably a table scan.
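For illustration, getting a numeric sort out of varchar data means converting on every row (a sketch assuming SQL Server 2012+; TRY_CAST just keeps any non-numeric junk from breaking the query):

SELECT Price
FROM [Table]
ORDER BY TRY_CAST(Price AS decimal(18, 2)) DESC   -- 1000, 600, 90, 50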
As Kragen notes, sorts will not necessarily come out in the right order.
Compares won't necessarily work either. If a field is defined as, say, decimal(8,2) and I give it the value "37.20", and later I write "select ... where price=37.2", the result will be true. But if I store a varchar 37.20 and compare it to 37.2, it will not be equal. Similarly if one or the other has leading zeros.
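A throwaway sketch of that compare problem (assuming SQL Server):

-- Stored as decimal, the two spellings compare equal; stored as varchar, they are just different strings.
SELECT CASE WHEN CAST('37.20' AS decimal(8,2)) = 37.2 THEN 'equal' ELSE 'not equal' END AS as_decimal,
       CASE WHEN '37.20' = '37.2' THEN 'equal' ELSE 'not equal' END AS as_varchar
-- as_decimal: equal, as_varchar: not equal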
You could solve these problems by having the application ensure that you always store the numbers with a fixed number of decimal places and padded with leading zeros. Oh, and make sure you have a consistent convention about storing minus signs. But then every place in the app that writes to this field must be sure that it follows exactly the same rules. We could do this, of course, but why? The database engine will do it for us if we just declare the field numeric. Like, yes, I COULD mow my lawn with a pair of scissors, but why would I want to do that?
I don't understand what your friend is saying the advantage is supposed to be. Easier communication between app and database? How? Maybe he was using some unconventional language or database interface that couldn't read numeric values from the DB. I've never had an issue with this. Actually, just saying that gets me wondering if that isn't what happened: that at his previous company they were using some language or tool that couldn't read decimals from the database because of an implementation problem, the only way they could get it to work was to declare all the numbers as varchar, and now he walks away thinking that's a generally good idea.
OK. One-word answer: Don't.
You are right that correct data types have an impact on performance (the SQL optimizer works differently for INT vs. VARCHAR), data consistency, integrity, etc.
If all we needed was VARCHAR, I don't think we would ever have invented the other types.
SQL is not dynamically typed. Static typing makes optimization better, index pages smaller and query operators more efficient.
It is not the source's problem that the consumer wants everything as strings; it is up to the consumer to do type checking when consuming the data. A DB should always have correct types.
(Beyond choosing between INT and VARCHAR, I would say you should also think about whether you need INT or TINYINT.) These considerations make a lot of difference.
Data is best stored in fields whose types match between the two systems - in this case, between your .NET objects and MS SQL Server. You are correct about the loss of data integrity and about the need to cast/convert data types into usable forms. As for other types such as Phone Number, ZIP Code, SSN and so on: they too would benefit from dedicated data types. The main reason these are stored in VARCHAR/NVARCHAR is the number of different possibilities that are not needed in every system. But if you have a type that is commonly used and you want to constrain it, you can build custom data types, called user-defined types, to store that data in SQL Server. (Even more fun is CLR-defined types; see the example on Code Project.)
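A small illustration of the user-defined (alias) type idea (a sketch; the type and table names are made up):

-- A reusable alias type so every price column gets the same definition.
CREATE TYPE dbo.Price FROM decimal(19, 4) NOT NULL;

CREATE TABLE dbo.Product
(
    ProductId int PRIMARY KEY,
    ListPrice dbo.Price
);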
The only advantage I can see with using any sort of variable-sized string-ish format would be if the field would have to accommodate an unknown amount of additional information. For example, "49.95#1/39.95#5/29.95#20/14.95#100,match=true/24.95#100" to indicate that this particular product has price points at 1, 5, 20, and 100 units, and the best 100-unit price is only available when all items are identical. Using strings to store such things is icky, but if the number of price-points is open-ended, using a variable-sized field might be better than having to create another table with one row per product/price-point combination. If you do go that route, it may be good to use XML serialization for the data, rather than an ad-hoc thing as shown above. An ad-hoc approach might allow faster parsing in some cases, but if things really are open-ended it could become a real pain to maintain.
Addendum: If you want to be able to do any type of sorting or searching based on price, you'll need to have separate columns for that. If you want to allow users to e.g. find the ten cheapest items at 100-piece mix/match quantity, and the database holds 10,000 possible items, the only way to satisfy the query with varchar-stored data would be to read all 10,000 items and evaluate what the best price would be given the restrictions. If users can only query based upon a small number of price/restriction combinations, it may be helpful to have a column for each one to allow direct queries.
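If the price points do need to be queried, the extra table this answer alludes to might look something like this (a sketch with invented names):

-- One row per product/price-point combination, so quantity-break prices can be
-- sorted and searched directly instead of being parsed out of a delimited string.
CREATE TABLE dbo.ProductPricePoint
(
    ProductId     int           NOT NULL,
    MinQuantity   int           NOT NULL,
    UnitPrice     decimal(19,4) NOT NULL,
    MatchRequired bit           NOT NULL DEFAULT 0,  -- corresponds to the "match=true" flag in the string example above
    PRIMARY KEY (ProductId, MinQuantity, MatchRequired)
);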

Why use "Y"/"N" instead of a bit field in Microsoft SQL Server?

I'm working on an application developed by another mob and am confounded by the use of a char field instead of bit for all the boolean columns in the database. It uses "Y" for true and "N" for false (these have to be uppercase). The type name itself is then aliased with some obscure name like ybln.
This is very annoying to work with for a lot of reasons, not the least of which is that it just looks downright aesthetically unpleasing.
But maybe it's me that's stupid - why would anyone do this? Is it a database compatibility issue or some design pattern that I am not aware of?
Can anyone enlighten me?
I've seen this practice in older database schemas quite often. One advantage I've seen is that using CHAR(1) fields provides support for more than Y/N options, like "Yes", "No", "Maybe".
Other posters have mentioned that Oracle might have been used. The schema I referred to was in fact deployed on both Oracle and SQL Server. It limited the usage of data types to a common subset available on both platforms.
They did diverge in a few places between Oracle and SQL Server but for the most part they used a common schema between the databases to minimize the development work needed to support both DBs.
Welcome to brownfield. You've inherited an app designed by old-schoolers. It's not a design pattern (at least not a design pattern with something good going for it), it's a vestige of coders who cut their teeth on databases with limited data types. Short of refactoring the DB and lots of code, grit your teeth and gut your way through it (and watch your case)!
Other platforms (e.g. Oracle) do not have a bit SQL type. In which case, it's a choice between NUMBER(1) and a single character field. Maybe they started on a different platform or wanted cross platform compatibility.
I don't like the Y/N char(1) field as a replacement for a bit column either, but there is one major downside to a bit field in a table: you can't create an index on a bit column or include it in a compound index (at least not in SQL Server 2000).
Sure, you could discuss if you'll ever need such an index. See this request on a SQL Server forum.
They may have started development back with Microsoft SQL Server 6.5.
Back then, adding a bit field to an existing table with data in place was a royal pain in the rear. Bit fields couldn't be null, so the only way to add one to an existing table was to create a temp table with all the existing fields of the target table plus the bit field, and then copy the data over, populating the bit field with a default value. Then you had to delete the original table and rename the temp table to the original name. Throw in some foreign key relationships and you've got a long script to write.
Having said that, there were always 3rd party tools to help with the process. If the previous developer chose to use char fields in lieu of bit fields, the reason, in a nutshell, was probably laziness.
The reasons are as follows (btw, they are not good reasons):
1) Y/N can quickly become "X" (for unknown), "L" (for likely), etc. - What I mean by this is that I have personally worked with programmers who were so used to not collecting requirements correctly that they just started with Y/N as sort of 'flags' with the superstition that it might need to expand (to which they should use an int as a status ID).
2) "Performance" - but as was mentioned above, SQL indexes are ruled out if they are not 'selective' enough... a field that only has 2 possible values will never use that index.
3) Laziness. - Sometimes developers want to output directly to some visual display with the letter "Y" or "N" for human readability, and they don't want to convert it themselves :)
Those are all 3 bad reasons that I've heard/seen before.
I can't imagine any disadvantage in not being able to index a "BIT" column, as it would be unlikely to have enough different values to help the execution of a query at all.
I also imagine that in most cases the storage difference between BIT and CHAR(1) is negligible (is that CHAR a NCHAR? does it store a 16bit, 24bit or 32bit unicode char? Do we really care?)
This is terribly common in mainframe files, COBOL, etc.
If you only have one such column in a table, it's not that terrible in practice (no real bit-wasting); after all, SQL Server will not let you say the natural WHERE BooleanColumn, you have to say WHERE BitColumn = 1, and IF @BitFlag = 1 instead of the far more natural IF @BooleanFlag. When you have multiple bit columns, SQL Server will pack them. The case of the Y/N should only be an issue if a case-sensitive collation is used, and to stop invalid data, there is always the option of a constraint.
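The constraint option mentioned above might look like this (a sketch; the table and column names are invented):

-- Restrict the flag to 'Y' or 'N'; under a case-insensitive collation this also
-- admits 'y'/'n', so add a COLLATE clause to the check if exact case matters.
ALTER TABLE dbo.Account
    ADD CONSTRAINT CK_Account_IsActive CHECK (IsActive IN ('Y', 'N'));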
Having said all that, my personal preference is for bits and only allowing NULLs after careful consideration.
Apparently, bit columns aren't a good idea in MySQL.
They probably were used to using Oracle and didn't properly read up on the available datatypes for SQL Server. I'm in exactly that situation myself (and the Y/N field is driving me nuts).
I've seen worse ...
One O/R mapper I had occasion to work with used 'true' and 'false' as they could be cleanly cast into Java booleans.
Also, on a reporting database such as a data warehouse, the DB is the user interface (metadata-based reporting tools notwithstanding). You might want to do this sort of thing as an aid to people developing reports. Also, an index with two values will still get used by index intersection operations on a star schema.
Sometimes such quirks are more associated with the application than the database. For example, handling booleans between PHP and MySQL is a bit hit-and-miss and makes for non-intuitive code. Using CHAR(1) fields and 'Y' and 'N' makes for much more maintainable code.
I don't have any strong feelings either way. I can't see any great benefit to doing it one way over another. I know philosophically the bit fields are better for storage. My reality is that I have very few databases that contain a lot of logical fields in a single record. If I had a lot then I would definitely want bit fields. If you only have a few I don't think it matters. I currently work with Oracle and SQL server DB's and I started with Cullinet's IDMS database (1980) where we packed all kinds of data into records and worried about bits and bytes. While I do still worry about the size of data, I long ago stopped worrying about a few bits.