SQL boolean datatype alternatives

What are the situations when you would use a foreign key to a separate table rather than use a boolean (i.e. BIT in SQL Server)?
For example would you replace the following two booleans in this table:
engine_sensors
--------------
id (int, primary key)
name (varchar(50))
fault_code (int)
display_warning (boolean) /* if fault show driver a warning */
is_test_sensor (boolean) /* ignore this sensor in diagnostic checks */
e.g. display_warning here might not display the warning to a driver but would display a warning to a mechanic who is testing the engine, so a separate table would be appropriate.
is_test_sensor could be replaced by sensor_type (FK to a sensor_types table) which has the types test and live.

If the fields model a boolean value, I would leave them as booleans. That's why they exist.
I would not attempt to future-proof my database design (YAGNI principle); you can always change it at a later date.

This depends on why you'd want to avoid it. You could just have number fields with 0 for false and 1 for true. Not sure there are any benefits, though.

Np. Now I see what you are getting at (I think).
If you are asking what I think you are asking then yes, you might want to use a FK to a sensor table and list the sensors. This is typically what I would do...
CREATE TABLE [WarningReceiver](
[Id] [int] NOT NULL,
[Receiver] [nvarchar](50) NOT NULL,
CONSTRAINT [PK_WarningReceiver_Id] PRIMARY KEY (Id)
);
CREATE TABLE [SensorType](
[Id] [int] NOT NULL,
[Type] [nvarchar](50) NOT NULL,
[DisplayWarningTo] [int] NOT NULL,
[Description] [nvarchar](100) NULL,
CONSTRAINT [PK_SensorType_Id] PRIMARY KEY (Id),
CONSTRAINT [FK_SensorType_WarningReceiver] FOREIGN KEY (DisplayWarningTo) REFERENCES WarningReceiver(Id)
);
------
INSERT INTO WarningReceiver(Id, Receiver) VALUES (1, 'Mechanic');
INSERT INTO WarningReceiver(Id, Receiver) VALUES (2, 'Driver');
INSERT INTO SensorType(Id, Type, DisplayWarningTo) VALUES (1, 'Rear sensor', 2);
INSERT INTO SensorType(Id, Type, DisplayWarningTo) VALUES (2, 'Test sensor', 1);
INSERT INTO SensorType(Id, Type, DisplayWarningTo) VALUES (3, 'Production sensor', 2);
I tend not to use identity columns on 'type' tables like this; I specify my own id, which I map directly to a C# enumerated constant like:
public enum SensorType
{
RearSensor = 1,
TestSensor = 2,
ProductionSensor = 3
}
Then in your code, when you pull your engine sensor out of the database, you can just compare against your enum, e.g.
var engine_sensor = GetEngineSensorTypeId(); // illustrative helper: read the sensor's type id from the db
if (engine_sensor == (int)SensorType.RearSensor)
{
// do something
}
else if (engine_sensor == (int)SensorType.TestSensor)
{
// display something to mechanic or whatever
}
I don't really know what your application domain is, so sorry if this doesn't make sense.
So to wrap up a couple of points and try to answer your question:
Yes, I do think you are better off with FKs.
You could just have them as int columns and define the sensors in code as I did with the enum.
I tend to do both:
--- define the enum for nice strong typing in code, and
--- create a foreign key table to complete my schema in the database. It's still worth having this for two reasons: 1) when writing SQL queries in Management Studio and looking at your engine_sensors table, you see numbers for the sensor type; joining on your FK table shows you what the sensors actually are, which makes things a bit easier; 2) a FK table enforces referential integrity and restricts the values you can put in as sensor types to those you have defined in the sensor type table.
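For instance, on point 1, a quick join like this (a sketch, assuming engine_sensors carries a sensor_type_id FK column pointing at the SensorType table above) turns the bare numbers into readable names:
SELECT es.name, st.[Type], wr.Receiver
FROM engine_sensors es
JOIN SensorType st ON st.Id = es.sensor_type_id
JOIN WarningReceiver wr ON wr.Id = st.DisplayWarningTo;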
Hope this helps somewhat.

You have added one piece of information
'It is a SQL Server database and I understand that you'd use a bit field. I was wondering how bad an idea it is to replace them with foreign keys to a separate table.'
You should edit your original question and put this in if this is important to the solution you seek so that more people will be able to easily find it.
But you still haven't said why you were wondering about replacing them with a FK. If you tell people what your end goal is or what you are trying to achieve, they are more likely to provide a range of solutions.
I'm sorry I can't suggest a solution. Is a foreign key (to what?) better than a boolean value? Compared to what?
I think you need to clarify / re-structure your question a bit. Good luck.

engine_sensors
--------------
id (primary key)
name
fault_code

display_warning_engine_sensors /* if fault show driver a warning */
------------------------------
id (primary key, FK to engine_sensors)

test_sensors /* ignore this sensor in diagnostic checks */
------------
id (primary key, FK to engine_sensors)
Remember: codes are poor, tables are rich. Regardless of how contradictory this seems: never use booleans to represent truth-valued information. The relational model already has a way of representing truth-valued information, and that is as the presence of some tuple in some relation that is the value of some relvar (in SQL terms: as the presence of some row in a table).
You cannot easily "extend" booleans to add extra functionality such as "display a warning only if the time is within the range [x hrs - y hrs]".
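With a row-per-flag table, on the other hand, that kind of extension is just extra columns. A minimal sketch (the warn_from/warn_until names are assumptions for illustration, using SQL Server's TIME type):
CREATE TABLE display_warning_engine_sensors (
id INT NOT NULL PRIMARY KEY REFERENCES engine_sensors(id),
warn_from TIME NULL, /* warn only from this time of day; NULL = always */
warn_until TIME NULL /* warn only until this time of day */
);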

Relational theory can help answer these questions for you. WHERE the booleans go (in the table as shown, or in a separate table as described in your example) should be determined by what the boolean data element is dependent (and fully dependent) on.
For example, if the data element "display_warning" depends only on the sensor (the entity your table is describing), then that column belongs in that table. However if it depends on other entities (the person - owner or mechanic - interacting with the sensor), then it more properly belongs in a different table where it can be fully and only dependent on the primary key of that table.
Ensuring that the data elements are dependent on the key (and nothing other than the key) is arrived at through "normalization" of the data. It's much too involved to include in an answer here, but there are many references on the web that will help you to understand normalization more fully. Wikipedia is as good a place to start as any:
http://en.wikipedia.org/wiki/Database_normalization#Normal_forms

Indexing is one reason to avoid boolean types. An index on a boolean field is rarely useful (and some databases won't let you create one at all) because the column holds only two values, so if you have many records and often search on fields that are boolean, having a separate table might be helpful. A separate sensor type table is also more extensible if you add more types. It is also helpful if there is a one-to-many relationship for a particular sensor type; suppose an engine needed to have three of a particular type of sensor.
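A sketch of that arrangement (the engine_id column is an assumption for illustration): each sensor row points at its type, so one engine can simply carry three rows of the same type.
CREATE TABLE sensor_types (
id INT NOT NULL PRIMARY KEY,
name VARCHAR(50) NOT NULL /* e.g. 'test', 'live' */
);
CREATE TABLE engine_sensors (
id INT NOT NULL PRIMARY KEY,
engine_id INT NOT NULL, /* which engine the sensor belongs to */
sensor_type_id INT NOT NULL REFERENCES sensor_types(id)
);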

Using 'bit' in SQL Server (0 or 1) will automatically map back to a boolean in LINQ to SQL if you still want a boolean in your code.
But you haven't stated what database or why you want an alternative to bool.

First, I'd like to second Oded. If you need boolean values you should use the appropriate column type.
However, if you have to store a great many different boolean values, it is sometimes more useful to use old-school bit masks, stored in an INT, BIGINT or even BINARY column.
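A sketch of the bit-mask idea (the flag positions are assumptions for illustration; the bitwise syntax is SQL Server's):
/* pack many booleans into one INT: bit 0 = display_warning, bit 1 = is_test_sensor, ... */
CREATE TABLE sensor_flags (
sensor_id INT NOT NULL PRIMARY KEY,
flags INT NOT NULL DEFAULT 0
);
/* set bit 1 on sensor 7 */
UPDATE sensor_flags SET flags = flags | 2 WHERE sensor_id = 7;
/* find all test sensors: test bit 1 with bitwise AND */
SELECT sensor_id FROM sensor_flags WHERE flags & 2 <> 0;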

Related

SQL/DB design: multi-column foreign key with mixed NULL/NOT NULL treated as mandatory or optional?

In DB design/SQL, is it theoretically possible to declare something like this:
CREATE TABLE Groups
(
round_id INTEGER NOT NULL,
ordinal_nbr SMALLINT NOT NULL,
PRIMARY KEY (round_id, ordinal_nbr),
FOREIGN KEY (round_id) REFERENCES Rounds (id) /* irrelevant, just a reference to another table's ID */
)
CREATE TABLE Games
(
id INTEGER NOT NULL IDENTITY,
round_id INTEGER NOT NULL, /* !!! */
ordinal_nbr SMALLINT NULL, /* !!! */
scheduled_tipoff DATETIME NOT NULL,
PRIMARY KEY (id),
FOREIGN KEY (round_id, ordinal_nbr) REFERENCES Groups (round_id, ordinal_nbr) /* multi-column FK round_id NOT NULL, ordinal_nbr NULL */
)
Now the question:
Since this probably has to be considered a programming mistake, what is the best way to handle such scenarios: treat such FKs as mandatory, or treat them as optional?
What would be a logical policy here?
As I think about it, it seems to make more sense to consider the whole FK optional. As soon as one piece of information is missing, whether intended or not, the whole FK depends on the column(s) whose value hasn't been set YET.
After all, the NULL destroys the obligation. It makes more sense to me than the other way around.
I don't think there's a general answer to that. In the existing data, are there any null values in games.ordinal_nbr? There might be code out there that expects to be able to put a null in there, so you have to check all the code that uses that table. Even more fun, different databases handle this in different ways: some consider "null = null" to be true, while others consider any comparison false if either side is null, even if both are.
Nullable foreign keys in SQL have plenty of disadvantages. From a semantic modelling perspective it is unlikely to make sense. For example suppose the intended meaning of a null ordinal_nbr is that that attribute is unknown. In that case SQL may not evaluate the other attribute and won't return an error even if there is no matching row for the known value of round_id (YMMV depending on DBMS and other options).
I suggest you redesign it so as to eliminate all nullable foreign keys.
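One way to do that (a sketch, assuming only some Games belong to a Group) is to move the optional relationship into its own table, where both FK columns can be NOT NULL:
CREATE TABLE GameGroups
(
game_id INTEGER NOT NULL,
round_id INTEGER NOT NULL,
ordinal_nbr SMALLINT NOT NULL,
PRIMARY KEY (game_id),
FOREIGN KEY (game_id) REFERENCES Games (id),
FOREIGN KEY (round_id, ordinal_nbr) REFERENCES Groups (round_id, ordinal_nbr)
)
A Game with no group simply has no row here, so no NULLs are needed.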
Why does it have to be a mistake?
Consider an Adults table and an Offspring table that references it.
(mis-stated previously) edit:
What if there were Offspring that you did not know the parents of? Just ignore that they exist?
edit:
This is a real example I just encountered recently:
I am basing my database on a de-normalized source where I have no control over the data. A key referencing the owners table should logically be required. Running a query, I came across an owner name that was:
'UNKOWN OWNER'
Hmm, that could have floated around for a long time, and it was only by chance that I caught it. I run aggregate queries against the owner table where things like that will give me incorrect results. The designers of that database traded away the 'enormous complexity' of dealing with nulls by hiding it with their own brand of null.
If the value had been blank or null, it would have raised an error immediately and I could have changed the table at the beginning of the design. Also, in aggregate queries nulls fall out of joins, so you do not get incorrect results. And when I want them, I just left join the tables.
Create a single field in your Groups table to act as the primary key and join on that.
You have an administrative nightmare here.
NULL<>NULL!!!!!
If X is Unknown and Y is Unknown can you say X=Y?
Depending on your DB configuration the join may even fail, and in my opinion the fact that any DB allows NULL=NULL to return true is a mistake.
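You can check your own DB's behaviour with a one-liner; under standard SQL semantics the comparison evaluates to UNKNOWN, which is not TRUE:
SELECT CASE WHEN NULL = NULL THEN 'equal' ELSE 'not equal' END; /* returns 'not equal' */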

Arrays in database tables and normalization

Is it smart to keep arrays in table columns? More precisely, I am thinking of the following schema, which to my understanding violates normalization:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
Permissions varchar(max) not null default(''),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
and this:
create table Permissions(
GroupID int not null default(-1),
CategoryID int not null default(-1),
PermissionID int not null default(-1),
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID)
);
UPD3: I envision Permissions as a comma-delimited string since MSSQL is our primary deployment target.
UPD: Forgot to mention, in the scope of this concrete question we will consider that the "fetch rows that have permission X" query won't be performed; instead, all the lookups will be made by GroupID and CategoryID only.
UPD2: I envision the typical usage scenario as follows:
int category_id=42;
int[] array_of_groups=new int[]{40,2,42};
if(!Permissions.Check(category_id, array_of_groups, Permission.EatAndDrink)) {
throw new StarveToDeathException();
}
Thoughts?
Thanks in advance!
I'd suggest taking the normalized road for the following reasons:
By having a table containing all possible permissions, you have self-documenting data. You may add a description to each permission. This definitely beats concatenated id values without any meaning.
You get all the advantages of referential integrity and can be sure that there are no bogus permission ids in your data.
Inserting and deleting permissions will be easier - you add or delete records. With the concatenated string you will be updating a column, and will delete the record only when you remove the last permission.
Your design is future-proof - you say you only want to query by CategoryID and GroupID, you can do this already with normalized tables. On top of that, you will also for example be able to add other properties to your permissions, query by permission, etc.
Performance-wise, I think it will actually be faster to get a result set of ids than to parse a string into integers. To be measured with actual data and implementation...
Your second example should probably be:
constraint PK_GroupCategory primary key clustered(GroupID,CategoryID,PermissionID)
Your first example would violate normal form (and string parsing might not be a good use of your processing time), but that doesn't mean it's necessarily wrong for your application. It really depends how you use the data.
Is it smart
Occasionally, it depends. I'd say it depends how narrowly you define the things being normalised.
If you can see no way in which a table with one row for each item would ever be useful, then I'd suggest that the encapsulate-in-a-string approach might be considered.
In the example given, I'd want to be sure that executing a query to find all group/category combinations for a specified permission would not cause me a problem if I had to write a WHERE clause that used string pattern matching. Of course, if I never have to perform such a query then it's a moot point.
In general I'm happiest with this approach when the data being assembled thus has no significance in isolation: the data only makes sense when considered as a complete set. If there's a little more structure, say a list of data/value pairs, then formatting with XML or JSON can be useful.
If you're only querying by GroupID and/or CategoryID then there's nothing wrong with it. Normalizing would mean more tables, rows, and joins. So for large databases this can have a negative performance impact.
If you're absolutely certain you'll never need a query which processes Permissions, and it's only parsed by your application, there's nothing improper about this solution. It could also be preferable if you always want the complete set of permissions (i.e. you're not querying just to get part of the string, but always want all of its values).
The problem with the first implementation is that it doesn't actually use an array but a concatenated string.
This means that you won't easily be able to use the value stored in that string to perform set based queries such as finding all people with a specific permission or specific set of permissions.
If you were using a database that natively supports arrays as an atomic value, such as PostgreSQL, then the argument would be different.
Based upon the second requirement of the proposed query I'd have to suggest the second one is best as you can simply query SELECT count(*) FROM Permissions WHERE CategoryID = 42 AND GroupID IN (40, 2, 42) AND PermissionID = 2 (assuming EatAndDrink has an ID of 2). The first version however would require retrieving all the permissions for each group and parsing the string before you can test if it includes the requested permission.

Decision between storing lookup table id's or pure data

I find this comes up a lot, and I'm not sure the best way to approach it.
The question I have is how to make the decision between using foreign keys to lookup tables, or using lookup table values directly in the tables requesting it, avoiding the lookup table relationship completely.
Points to keep in mind:
With the second method you would need to do mass updates to all records referencing the data if it is changed in the lookup table.
This is focused more towards tables that have a lot of columns referencing many lookup tables; lots of foreign keys means a lot of joins every time you query the table.
This data would be coming from drop-down lists which would be pulled from the lookup tables. In order to match up data when reloading, the values need to be in the existing list (related to the first point).
Is there a best practice here, or any key points to consider?
You can use a lookup table with a VARCHAR primary key, and your main data table uses a FOREIGN KEY on its column, with cascading updates.
CREATE TABLE ColorLookup (
color VARCHAR(20) PRIMARY KEY
);
CREATE TABLE ItemsWithColors (
...other columns...,
color VARCHAR(20),
FOREIGN KEY (color) REFERENCES ColorLookup(color)
ON UPDATE CASCADE ON DELETE SET NULL
);
This solution has the following advantages:
You can query the color names in the main data table without requiring a join to the lookup table.
Nevertheless, color names are constrained to the set of colors in the lookup table.
You can get a list of unique color names (even if none are currently in use in the main data) by querying the lookup table.
If you change a color in the lookup table, the change automatically cascades to all referencing rows in the main data table.
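For instance, with the tables above, a rename only touches the lookup table and the cascade does the rest:
UPDATE ColorLookup SET color = 'Crimson' WHERE color = 'Red';
/* every ItemsWithColors row that referenced 'Red' now reads 'Crimson' */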
It's surprising to me that so many other people on this thread seem to have mistaken ideas of what "normalization" is. Using a surrogate key (the ubiquitous "id") has nothing to do with normalization!
Re comment from #MacGruber:
Yes, the size is a factor. In InnoDB for example, every secondary index stores the primary key value of the row(s) where a given index value occurs. So the more secondary indexes you have, the greater the overhead for using a "bulky" data type for the primary key.
Also this affects foreign keys; the foreign key column must be the same data type as the primary key it references. You might have a small lookup table so you think the primary key size in a 50-row table doesn't matter. But that lookup table might be referenced by millions or billions of rows in other tables!
There's no right answer for all cases. Any answer can be correct for different cases. You just learn about the tradeoffs, and try to make an informed decision on a case by case basis.
In cases of simple atomic values, I tend to disagree with the common wisdom on this one, mainly on the complexity front. Consider a table containing hats. You can do it the "denormalized" way:
CREATE TABLE Hat (
hat_id INT NOT NULL PRIMARY KEY,
brand VARCHAR(255) NOT NULL,
size INT NOT NULL,
color VARCHAR(30) NOT NULL /* color is a string, like "Red", "Blue" */
)
Or you can normalize it more by making a "color" table:
CREATE TABLE Color (
color_id INT NOT NULL PRIMARY KEY,
color_name VARCHAR(30) NOT NULL
)
CREATE TABLE Hat (
hat_id INT NOT NULL PRIMARY KEY,
brand VARCHAR(255) NOT NULL,
size INT NOT NULL,
color_id INT NOT NULL REFERENCES Color(color_id)
)
The end result of the latter is that you've added some complexity - instead of:
SELECT * FROM Hat
You now have to say:
SELECT * FROM Hat H INNER JOIN Color C ON H.color_id = C.color_id
Is that extra join a huge deal? No - in fact, that's the foundation of the relational design model - normalizing allows you to prevent possible inconsistencies in the data. But every situation like this adds a little bit of complexity, and unless there's a good reason, it's worth asking why you're doing it. I consider possible "good reasons" to include:
Are there other attributes that "hang off of" this attribute? Are you capturing, say, both "color name" and "hex value", such that hex value is always dependent on color name? If so, then you definitely want a separate color table, to prevent situations where one row has ("Red", "#FF0000") and another has ("Red", "#FF3333"). Multiple correlated attributes are the #1 signal that an entity should be normalized.
Will the set of possible values change frequently? Using a normalized lookup table will make future changes to the elements of the set easier, because you're just updating a single row. If it's infrequent, though, don't balk at statements that have to update lots of rows in the main table instead; databases are quite good at that. Do some speed tests if you're not sure.
Will the set of possible values be directly administered by the users? I.e. is there a screen where they can add / remove / reorder the elements in the list? If so, a separate table is a must, obviously.
Will the list of distinct values power some UI element? E.g. is "color" a droplist in the UI? Then you'll be better off having it in its own table, rather than doing a SELECT DISTINCT on the table every time you need to show the droplist.
If none of those apply, I'd be hard pressed to find another (good) reason to normalize. If you just want to make sure that the value is one of a certain (small) set of legal values, you're better off using a CONSTRAINT that says the value must be in a specific list; keeps things simple, and you can always "upgrade" to a separate table later if the need arises.
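As a sketch, that constraint approach on the Hat table from above would look like this (the color list is illustrative):
CREATE TABLE Hat (
hat_id INT NOT NULL PRIMARY KEY,
brand VARCHAR(255) NOT NULL,
size INT NOT NULL,
color VARCHAR(30) NOT NULL CHECK (color IN ('Red', 'Blue', 'Green')) /* small closed set of legal values */
)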
One thing no one has considered is that you would not join to the lookup table if the data in it can change over time and the records joined to are historical. The example is a parts table and an orders table. The vendors may drop parts or change part numbers, but the orders table should always have exactly what was ordered at the time it was ordered. Therefore, it should look up the data to do the record insert but should never join to the lookup table to get information about an existing order. Instead, the part number, description, price, etc. should be stored in the orders table. This is especially critical so that price changes do not propagate through historical data and make your financial records inaccurate. In this case, you would also want to avoid using any kind of cascading update.
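A sketch of that pattern (table and column names are illustrative): the order line copies the part data at insert time instead of joining to the parts table later.
CREATE TABLE order_lines (
order_id INT NOT NULL,
line_no INT NOT NULL,
part_number VARCHAR(20) NOT NULL, /* copied from the parts table at order time */
description VARCHAR(100) NOT NULL, /* as it read when ordered */
unit_price DECIMAL(10,2) NOT NULL, /* price paid, immune to later price changes */
PRIMARY KEY (order_id, line_no)
);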
rauhr.myopenid.com wrote:
The way we decided to solve this problem is with 4th normal form.
...
That is not 4th normal form. That is a common mistake called One True Lookup:
http://www.dbazine.com/ofinterest/oi-articles/celko22
4th normal form is:
http://en.wikipedia.org/wiki/Fourth_normal_form
Normalization is pretty universally regarded as part of best practices in databases, and normalization says yeah, you push the data out and refer to it by key.
Since no one else has addressed your second point: When queries become long and difficult to read and write due to all those joins, a view will usually resolve that.
You can even make it a rule to always program against the views, having the view get the lookups.
This makes it possible to optimize the view and make your code resistant to changes in the tables.
In Oracle, you could even convert the view into a materialized view if you ever need to.
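As a sketch, using the Hat/Color tables from the earlier answer:
CREATE VIEW HatDetails AS
SELECT H.hat_id, H.brand, H.size, C.color_name
FROM Hat H INNER JOIN Color C ON H.color_id = C.color_id;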

Should I use an ENUM for primary and foreign keys?

An associate has created a schema that uses an ENUM() column for the primary key on a lookup table. The table turns a product code "FB" into its name "Foo Bar".
This primary key is then used as a foreign key elsewhere. And at the moment, the FK is also an ENUM().
I think this is not a good idea. This means that to join these two tables, we end up with four lookups. The two tables, plus the two ENUM(). Am I correct?
I'd prefer to have the FKs be CHAR(2) to reduce the lookups. I'd also prefer that the PKs were also CHAR(2) to reduce it completely.
The benefit of the ENUM()s is to get constraints on the values. I wish there was something like: CHAR(2) ALLOW('FB', 'AB', 'CD') that we could use for both the PK and FK columns.
What is:
Best practice?
Your preference?
This concept is used elsewhere too. What if the ENUM()'s values are longer? ENUM('Ding, dong, dell', 'Baa baa black sheep'). Now the ENUM() is useful from a space point-of-view. Should I only care about this if there are several million rows using the values? In which case, the ENUM() saves storage space.
ENUM should be used to define a possible range of values for a given field. This also implies that you may have multiple rows which have the same value for this particular field.
I would not recommend using an ENUM as a primary key type or foreign key type.
Using an ENUM for a primary key means that adding a new key would involve modifying the table since the ENUM has to be modified before you can insert a new key.
I am guessing that your associate is trying to limit who can insert a new row and that number of rows is limited. I think that this should be achieved through proper permission settings either at the database level or at the application and not through using an ENUM for the primary key.
IMHO, using an ENUM for the primary key type violates the KISS principle.
But when you are only dealing with roughly 10 or fewer rows, that won't be a problem.
For example:
CREATE TABLE `grade`(
`grade` ENUM('A','B','C','D','E','F') PRIMARY KEY,
`description` VARCHAR(50) NOT NULL
)
A table like this will hardly ever need DML run against it.
We've had more discussion about it and here's what we've come up with:
Use CHAR(2) everywhere, for both the PK and FK. Then use MySQL's foreign key constraints to disallow creating an FK to a row that doesn't exist in the lookup table.
That way, given the lookup table is L, and two referring tables X and Y, we can join X to Y without any looking up of ENUM()s or table L and can know with certainty that there's a row in L if (when) we need it.
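A sketch of that arrangement (L, X and Y as in the paragraph above; the other column details are assumed):
CREATE TABLE L (
code CHAR(2) PRIMARY KEY, /* e.g. 'FB' */
name VARCHAR(50) NOT NULL /* e.g. 'Foo Bar' */
);
CREATE TABLE X (
id INT PRIMARY KEY,
code CHAR(2) NOT NULL,
FOREIGN KEY (code) REFERENCES L(code)
);
CREATE TABLE Y (
id INT PRIMARY KEY,
code CHAR(2) NOT NULL,
FOREIGN KEY (code) REFERENCES L(code)
);
/* X joins straight to Y on code; L is only consulted when the name is needed */
SELECT x.id, y.id FROM X x JOIN Y y ON y.code = x.code;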
I'm still interested in comments and other thoughts.
Having a lookup table and an enum means you are changing values in two places all the time. Funny... We spent too many years using enums, causing issues where we needed to recompile to add values. In recent years, we have moved away from enums in many situations and use the values in our lookup tables. The biggest thing I like about lookup tables is that you can add or change values without needing to compile. Even with millions of rows I would stick with the lookup tables and just be intelligent in your database design.

Flags in database rows, best practices

I am asking this out of curiosity. Basically my question is: when you have a database which needs a row entry to have things which act like flags, what is the best practice? A good example of this would be the badges on Stack Overflow, or the operating system field in Bugzilla. Any subset of the flags may be set for a given entry.
Usually I do C and C++ work, so my gut reaction is to use an unsigned integer field as a set of bits which can be flipped... But I know that isn't a good solution for several reasons. The most obvious is scalability: there will be a hard upper limit on how many flags I can have.
I can also think of a couple of other solutions which scale better but would have performance issues because they would require multiple selects to get all the information.
So, what is the "right" way to do this?
Generally speaking, I avoid bitmask fields. They're difficult to read in the future and they require a much more in-depth knowledge of the data to understand.
The relational solution has been proposed previously. Given the example you outlined, I would create something like this (in SQL Server):
CREATE TABLE Users (
UserId INT IDENTITY(1, 1) PRIMARY KEY,
FirstName VARCHAR(50),
LastName VARCHAR(50),
EmailAddress VARCHAR(255)
);
CREATE TABLE Badges (
BadgeId INT IDENTITY(1, 1) PRIMARY KEY,
[Name] VARCHAR(50),
[Description] VARCHAR(255)
);
CREATE TABLE UserBadges (
UserId INT REFERENCES Users(UserId),
BadgeId INT REFERENCES Badges(BadgeId)
);
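With this schema, listing a given user's badges is a single join (user id 42 is just an example value):
SELECT b.[Name], b.[Description]
FROM UserBadges ub
INNER JOIN Badges b ON b.BadgeId = ub.BadgeId
WHERE ub.UserId = 42;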
If you really need an unbounded selection from a closed set of flags (e.g. stackoverflow badges), then the "relational way" would be to create a table of flags and a separate table which relates those flags to your target entities. Thus, users, flags and usersToFlags.
However, if space efficiency is a serious concern and query-ability is not, an unsigned mask would work almost as well.
A Very Relational Approach
For databases without the set type, you could create a new table to represent the set of entities for which each flag is set.
E.g. for a Table "Students" you could have tables "RegisteredStudents", "SickStudents", TroublesomeStudents etc. Each table will have only one column: the student_id. This would actually be very fast if all you want to know is which students are "Registered" or "Sick", and would work the same way in every DBMS.
For many cases, it depends on a lot of things - like your database backend. If you're using MySQL, for example, the SET datatype is exactly what you want.
Basically, it's just a bitmask, with values assigned to each bit. MySQL supports up to 64-bit values (meaning 64 different toggles). If you only need 8, then it only takes a byte per row, which is pretty awesome savings.
If you honestly have more than 64 values in a single field, your field might be getting more complicated. You may want to move up to the BLOB datatype, which is just a raw set of bits that MySQL has no inherent understanding of. Using this, you can create an arbitrary number of bit fields that MySQL is happy to treat as binary, hex, or decimal values, however you need. If you need more than 64 options, create as many fields as is appropriate for your application. The downside is that it is difficult to make the field human-readable. The BIT datatype is also limited to 64 bits.
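A sketch of the SET type in action (the column values are illustrative):
CREATE TABLE bug (
id INT PRIMARY KEY,
os SET('Windows','Linux','MacOS','BSD') NOT NULL DEFAULT ''
);
INSERT INTO bug VALUES (1, 'Windows,Linux'); /* two flags stored in a single byte */
SELECT id FROM bug WHERE FIND_IN_SET('Linux', os); /* rows with the Linux flag set */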
If the flags have very different meanings and are used directly in SQL queries or VIEWS, then using multiple columns of type BOOLEAN might be a good idea.
Put each flag into an extra column, because you'll read and modify them separately anyway. If you want to group the flags, just give their column names a common prefix, i.e. instead of:
CREATE TABLE ... (
warnings INTEGER,
errors INTEGER,
...
)
you should use:
CREATE TABLE ... (
warning_foo BOOLEAN,
warning_bar BOOLEAN,
warning_...
error_foo BOOLEAN,
error_bar BOOLEAN,
error_... BOOLEAN,
...
)
Although MySQL doesn't have a true BOOLEAN type (BOOLEAN is just an alias for TINYINT(1)), you can use the quasi-standard TINYINT(1) for that purpose, setting it only to 0 or 1.
I would recommend using a BOOLEAN datatype if your database supports this.
Otherwise, the best approach is to use NUMBER(1) or equivalent, and put a check constraint on the column that limits valid values to (0,1), and perhaps NULL if you need that. If there is no built-in type, using a number is less ambiguous than using a character column. (What's the value for true? "T" or "Y" or "t"?)
The nice thing about this is that you can use SUM() to count the number of TRUE rows.
SELECT COUNT(1), SUM(ActiveFlag)
FROM myusers;
If there are more than just a few flags, or likely to be so in the future, I'll use a separate table of flags and a many-to-many table between them.
If there are a handful of flags and I'm never going to use them in a WHERE, I'll use a SET() or bitfield or whatever. They're easy to read and more compact, but a pain to query and sometimes even more of a headache with an ORM.
If there are only a few flags -- and only ever going to be a few flags -- then I'll just make a couple BIT/BOOLEAN/etc columns.
Came across this when I was pondering the best way to store bitmask flags (similar to the OP's original use of integers) in a database.
The other answers are all valid solutions, but I think it's worth mentioning that you may not have to resign yourself to horrible query problems if you choose to store bitmasks directly in the database.
If you are working on an application that uses bitmasks and you really want the convenience of storing them in the database as one integer or byte column, go ahead and do that. Down the road, you can write yourself a little utility that will generate another table of flags (in whatever pattern of rows/columns you choose) from the bitmasks in your primary working table. You can then do ordinary SQL queries on that computed/derived table.
This way your application gets the convenience of only reading/writing the bitmask field/column. But you can still use SQL to really dive into your data if that becomes necessary at a later time.
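A sketch of such a derived query (SQL Server syntax; working_table and flags are assumed names): expand each set bit of the mask into its own row, and you can query the flags relationally again.
SELECT t.id, b.bit_no
FROM working_table t
JOIN (VALUES (0),(1),(2),(3),(4),(5),(6),(7)) AS b(bit_no)
ON t.flags & POWER(2, b.bit_no) <> 0; /* one row per flag that is set */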