Is there standard SQL that works in all databases?

As seen below, the syntax is different for different databases. Isn't there a standard way that works in all databases?
Is there any tool to convert SQL from one dialect to another?
SQL Server 2005:
CREATE TABLE Table01 (
    Field01 int IDENTITY(1,1) PRIMARY KEY
);
SQLite:
CREATE TABLE Table01 (
    Field01 integer PRIMARY KEY AUTOINCREMENT NOT NULL UNIQUE
);

The "query" part of SQL (commonly called DML - Data Manipulation Language) is reasonably standardized, and queries written on one database will often run on another database if no "vendor enhancement" features are used. The "create database bits" parts of SQL (often known as DDL - Data Definition Language) is not standardized, and every database out there does it a little differently. So, as you've discovered, statements such as CREATE TABLE written for one database will not work without some tweaking on another database.
<soapbox>
In my opinion this is a Good Thing. DDL effectively defines what can be created in the database. If all vendors use exactly the same DDL, then all products are going to be exactly the same in terms of the features they support. For example, in order to allow for table inheritance in PostgreSQL there must be some way to define how one table inherits from another, and this definition must be part of how a table is defined, either as part of the CREATE TABLE statement or in some other fashion, but it has to be a DDL statement. Thus, for purely functional reasons (because PostgreSQL supports a feature which most databases do not) DDL in PostgreSQL must be different from that in other databases. A similar situation arises in MySQL because it allows the use of different ISAM engines with different capabilities. Similar situations arise in Oracle...and in SQL Server...and <your favorite database here>.
Standards are a double-edged sword. On one hand they're a great thing, because they provide a "standard" way of doing something. On the other hand they're a terrible thing because the very purpose of a standard is to provide a "standard" way of doing something, which is another way of saying "create stagnation and inhibit innovation".
</soapbox>
Share and enjoy.

There are several SQL standards out there, and SQL 1999 is probably the closest you will get, as each DB adopted a different standard (if any).

No, there's no standard SQL for this. As a shameless plug, I can recommend trying Wizardby if you're into anything related to database schema migration/continuous integration.

Related

Difference between a table (SQL) and a collection (Mongo)?

And what are the (dis)advantages of each?
1) In SQL, you have to create a table and define the data types; in MongoDB, you don't have to create a collection - it is created automatically when you insert data.
2) In SQL, you must insert values according to the column data types; in MongoDB, you can insert values of any type.
3) In SQL, you can't create a column at insert or update time; in MongoDB, you can.
4) In SQL, almost nothing is case sensitive; in MongoDB, everything is case sensitive.
E.g. in SQL, "use [demo]" and "use [DEMO]" will select the same database; in MongoDB, "use demo" and "use Demo" will select two different databases.
Table(SQL) - RDBMS
Maintains relations between the data
Fixed or predefined schema
Data is stored in rows and columns
Foreign key relations are supported by the DB.
Data will not be stored if it violates a column data type, a foreign key, or a primary key constraint.
Joins can be used effectively to query the data.
Vertically scalable (limited by the hardware; you cannot keep adding RAM to a server machine forever, since the machine has its own limit on how much RAM can be added)
Storing and retrieving is comparatively slower when the data is huge.
MongoDB Collection - NoSQL DB
No relation is maintained between the data - dynamic schema
Data is stored as documents
The dynamic schema allows saving documents of any data type or with any number of fields.
Horizontally scalable, which can be done simply by adding more servers - storing and retrieving is faster
No explicit foreign key support is available, though you can design the schema with foreign-key-like references (but remember you need to maintain the relationship yourself).
$lookup performs an operation similar to a LEFT OUTER JOIN in SQL.
Hope it Helps!!
Before you step into the examination of the differences, you should first establish for yourself what kind of data you need to store, and to what degree your data has structure. Table-based databases are perfect for well-structured information. Non-SQL databases (like MongoDB) are best for heterogeneous data (and hence they talk about documents).
So, the answer to your question is another question: What does your data look like?
I know this may not be the answer you are expecting, but it may point for you to the right path of thinking.

Portable SQL: unique primary keys

Trying to develop something which should be portable between the bigger RDBMS'es.
The issue is around generating and using auto-increment numbers as the primary key for a table.
There are two topics here:
1. The mechanism used to generate the auto-increment numbers.
2. How to specify that you want to use this as the primary key on a table.
I'm looking for verification for what I think is the current state of affairs:
Unfortunately standardization came late to this area and in some respects is still not implemented (as a mandatory standard). This means that in 2013 it is still impossible to write a CREATE TABLE statement in a portable way ... if you want it with an auto-generated primary key.
Can this really be so?
Re (1). This is standardized because it came in SQL:2003. As far as I understand, the way to go is SEQUENCEs. I believe these are a mandatory part of SQL:2003, right? The other possibility is the IDENTITY keyword, which is also defined in SQL:2003, but that one is - as far as I can tell - an optional part of the standard ... which means a key player like Oracle doesn't implement it... and can still claim compliance. Ok, so SEQUENCEs are the designated portable method for this, right?
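For reference, a minimal sketch of what I mean (the sequence and table names are invented; NEXT VALUE FOR is the standard spelling, but not every engine accepts it, as the comments note):
CREATE SEQUENCE order_id_seq START WITH 1 INCREMENT BY 1;
-- Standard spelling for fetching the next value:
INSERT INTO orders (order_id, customer)
VALUES (NEXT VALUE FOR order_id_seq, 'Smith');
-- Oracle's traditional spelling of the same thing:
--   VALUES (order_id_seq.NEXTVAL, 'Smith');
-- PostgreSQL's spelling:
--   VALUES (nextval('order_id_seq'), 'Smith');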
Re (2). Database vendors implement this in different ways. In PostgreSQL you can link the CREATE TABLE statement directly with the sequence, in Oracle you would have to create a trigger to ensure the SEQUENCE is used with the table.
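Roughly, the two approaches look like this (all names are invented, and this is only a sketch of the vendor-specific wiring):
-- PostgreSQL: the sequence can be tied to the column in the table definition itself:
CREATE SEQUENCE table01_id_seq;
CREATE TABLE Table01 (
    id integer PRIMARY KEY DEFAULT nextval('table01_id_seq')
);
-- (or simply: id serial PRIMARY KEY, which creates the sequence implicitly)

-- Oracle: the table knows nothing about the sequence; a trigger ties them together:
CREATE SEQUENCE table01_id_seq;
CREATE TABLE Table01 (
    id NUMBER PRIMARY KEY
);
CREATE OR REPLACE TRIGGER table01_id_trg
BEFORE INSERT ON Table01
FOR EACH ROW
BEGIN
    SELECT table01_id_seq.NEXTVAL INTO :NEW.id FROM dual;
END;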
So my conclusion is that without a standardized solution to (2) it really doesn't help much that all the major players now support SEQUENCEs. I would still have to write db-specific code for something as simple as a CREATE TABLE statement.
Is this right?
Standards and their implementation aside, I would also be interested if anyone has a portable solution to the problem, no matter if it is a hack from a RDBMS best practice perspective. For such a solution to work it would have to be independent of any application, i.e. it must be the database that solves the issue, not the application layer. Perhaps if both the concept of TRIGGERs and SEQUENCEs can be said to be standardized then a solution that combines the two of them would be portable?
As for "portable create table statements": It starts with the data types: Whether boolean, int or long data types are part of any SQL standard or not, I really appreciate these types. PostgreSql supports these data types, Oracle does not. Ironically Oracle supports boolean in PL/SQL, but not as a data type in a table. Even the length of table/column names etc. are restricted in Oracle to 30 characters. So not even the most simple "create table" is always portable.
As for auto-generated primary keys: I am not aware of a syntax which is portable, so I do not define this in the "create table". Of course this only delays the problem, and leaves it to the insert statements. This topic is connected with another problem: Getting the generated key after an insert using JDBC in the most efficient way. This differs substantially between Oracle and PostgreSql, and if you have ever dared to use case sensitive table/column names in Oracle, it won't be funny.
As for constraints, I prefer to add them in separate statements after "create table". The set of constraints may differ if you implement a boolean data type in Oracle using char(1) together with a check constraint, whereas PostgreSql supports this data type directly.
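For example, a rough sketch of that boolean case (table and column names are invented):
-- Oracle: CHAR(1) plus a check constraint, added after the CREATE TABLE:
ALTER TABLE orders ADD (is_paid CHAR(1) DEFAULT 'N' NOT NULL);
ALTER TABLE orders ADD CONSTRAINT orders_is_paid_chk CHECK (is_paid IN ('Y', 'N'));

-- PostgreSQL: the native type makes the extra constraint unnecessary:
ALTER TABLE orders ADD COLUMN is_paid boolean NOT NULL DEFAULT false;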
As for "standards": One example
SQL99 standard: for SELECT DISTINCT, ORDER BY expressions must appear in select list
This message is from PostgreSql, Oracle 11g does not complain. After 14 years, will they change it?
Generally speaking, you still have to write database specific code.
As for your conclusion: In our scenario we implemented a portable database application using a model-driven approach. This logical metadata is used by the application, and there are different back ends for different database types. We do not use any ORM, just "direct SQL", because this simplifies tuning of SQL statements, and it gives full access to all SQL features. We wrote our own library, and later we found out that the key ideas match those of "Anorm".
The good news is that while there are tons of small annoyances, it works pretty well, even with complex queries. For example, window aggregate functions are quite portable (row_number(), partition by). You have to use listagg on Oracle, whereas you need string_agg on PostgreSql. Recursive common table expressions require "with recursive" in PostgreSql; Oracle does not accept it. PostgreSql supports "limit" and "offset" in queries; you need to wrap this in Oracle. It drives you crazy if you use SQL arrays in both Oracle and PostgreSql (arrays as columns in tables). There are materialized views on Oracle, but they do not exist in PostgreSql. Surprisingly enough, it is possible to write database stored procedures not only in Java, but in Scala, and this works amazingly well in both Oracle and PostgreSql. This list is not complete. But so far we managed to find an acceptable (= fast) solution for any "portability problem".
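Two of those annoyances side by side, as a sketch (the table and column names are invented):
-- String aggregation: same intent, different spellings.
-- Oracle:
SELECT dept_id, LISTAGG(name, ', ') WITHIN GROUP (ORDER BY name) AS members
FROM employees GROUP BY dept_id;
-- PostgreSQL:
SELECT dept_id, STRING_AGG(name, ', ' ORDER BY name) AS members
FROM employees GROUP BY dept_id;

-- Pagination.
-- PostgreSQL:
SELECT * FROM employees ORDER BY name LIMIT 20 OFFSET 40;
-- Oracle (before 12c's OFFSET/FETCH), wrapped with ROW_NUMBER():
SELECT * FROM (
    SELECT e.*, ROW_NUMBER() OVER (ORDER BY name) AS rn FROM employees e
) WHERE rn BETWEEN 41 AND 60;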
Does it pay off? In our scenario, there is a central Oracle installation (RAC, read/write), but there are distributed PostgreSql installations as localhost databases on each application server (only readonly). This gives a big performance and scalability boost, without the cost penalty.
If you really want to have it solved in the database only, there is one possibility: Put anything in stored procedures, write these in Java/Scala, and restrict yourself in the application to call these procedures, and to read the result sets. This of course just moves the complexity from the application layer into the database, but you accepted hacks :-)
Triggers are quite standardized, if you use Java stored procedures. And if it is supported by your databases, by your management, your data center people, and your colleagues. The non-technical/social aspects are to be considered as well. I have even heard of database tuning people who do not accept the general "left outer join" syntax; they insisted on the Oracle way of using "(+)".
So even if triggers (PL/SQL) and sequences were standardized, there would be so many other things to consider.
Update
As for returning the generated primary keys I can only judge the situation from JDBC's perspective.
PostgreSql returns it, if you use Statement.getGeneratedKeys (I consider this the normal way).
Oracle requires you to specify explicitly, when you create the prepared statement, the (primary key) column(s) whose values you want to get back. This works, but only if you are not using case sensitive table names; in that case all you receive is a misleading ORA-00942: table or view does not exist thrown in Oracle's JDBC driver. There was/is a bug in Oracle's JDBC driver, and I have not found a way to get the value using a portable JDBC method. So at the cost of an additional proprietary "select sequence.currVal from dual" within the same transaction right after the insert, you can get back the primary key. The additional time was acceptable in our case; we compared the times to insert 100000 rows: PostgreSql is faster until the 10000th row, after that Oracle performs better.
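Roughly what that fallback looks like (sequence and table names are invented; CURRVAL is session-local, so this is safe even with concurrent inserts):
INSERT INTO orders (customer) VALUES ('Smith');  -- id filled in by the trigger
SELECT order_id_seq.CURRVAL FROM dual;           -- same session, right after the insert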
See a stackoverflow question regarding the ways to get the primary key and
the bug report with case sensitive table names from 2008
This example shows pretty well the problems. Normally PostgreSql follows the way you expect it to work, but you may have to find a special way for Oracle.

Dynamic Database / Key-Value / Entity-Key-Value Dilemma

I have been programming relational database for many years, but now have come across an unusual and tricky problem:
I am building an application that needs to have very quick and easily defined entities (by the user). Instances of these entities could then be created, updated, deleted etc.
There are two options I can think of.
Option 1 - Dynamically created tables
The first option is to write an engine to dynamically generate the tables, and insert the data into these. However, this would become very tricky, as every query would also need to be dynamic, or at least dynamically created stored procedures etc.
Option 2 - Entity - Key - Value Pattern
This is the only realistic option I can think of, where I have a 5-table structure:
EntityTypes
    EntityTypeID int
    EntityTypeName nvarchar(50)
Entities
    EntityID int
    EntityTypeID int
FieldTypes
    FieldTypeID int
    FieldTypeName nvarchar(50)
    SQLtype int
FieldValues
    EntityID int
    FieldID int
    Value nvarchar(MAX)
Fields
    FieldID int
    FieldName nvarchar(50)
    FieldTypeID int
The "FieldValues" table would work a little like a datawarehouse fact table, and all my inserts/updates would work by filling a "Key/Value" table valued parameter and passing this to a SPROC (to avoid multiple inserts/updates).
All the tables would be heavily indexed, and I would end up doing many self joins to obtain the data.
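To illustrate the kind of querying this implies (the @... parameters are just placeholders), fetching one entity's values is a straight join, while pivoting known fields into columns needs one self-join of FieldValues per field:
-- All field values for one entity:
SELECT f.FieldName, ft.FieldTypeName, fv.Value
FROM FieldValues fv
JOIN Fields f      ON f.FieldID = fv.FieldID
JOIN FieldTypes ft ON ft.FieldTypeID = f.FieldTypeID
WHERE fv.EntityID = @EntityID;

-- Two known fields pivoted into columns (one join per field):
SELECT e.EntityID,
       fvName.Value AS CustomerName,
       fvCity.Value AS City
FROM Entities e
JOIN FieldValues fvName ON fvName.EntityID = e.EntityID AND fvName.FieldID = @NameFieldID
JOIN FieldValues fvCity ON fvCity.EntityID = e.EntityID AND fvCity.FieldID = @CityFieldID;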
I have read a lot about how bad Key/Value databases are, but for this problem it still seems to be the best.
Now my questions!
Can anyone suggest another approach or pattern other than these two options?
Would option two be feasible for medium sized datasets (1 million rows max)?
Are there further optimizations for option 2 I could use?
Any direction and advice much appreciated!
Personally I would just use a "noSQL" (key/value) database like MongoDB.
But if you need to use a relational database, option 2 is the way to go. A good example of that kind of model is the Alfresco Data Dictionary (Alfresco is an enterprise content management system). Its design is similar to what you describe, although they have multiple columns for field values (for every simple type available in the database). If you add a good cache system to that (for example Ehcache) it should work fine.
As others have suggested NoSQL, I'm going to say that, in my opinion, schemaless databases really are best suited for use-cases with no schema.
From the description, and the schema you came up with, it looks like your case is not in fact "no schema", but rather it seems to be "user-defined schema".
In fact, the schema you came up with looks very similar to the internal meta-schema of a relational database. (You're sort of building a relational database on top of a relational database, which in my experience is not a good idea, as this "meta-database" will have at least twice the overhead and complexity for any basic operation - tables will get very large, which doesn't scale well, and the data will be difficult to query and update, problems will be difficult to debug, and so on.)
For use-cases like that, you probably want DDL: Data Definition Language.
You didn't say which SQL database you're using, but most SQL databases (such as MySQL, PostgreSQL and MS-SQL) support some dialect of DDL extensions to SQL syntax, which let you manipulate the actual schema.
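As a rough sketch of what that looks like when a user defines or removes a field (the table and column names are invented, and the exact ADD/DROP COLUMN syntax varies slightly between vendors):
-- User adds a "phone" field to their "contacts" entity type:
ALTER TABLE user42_contacts ADD COLUMN phone varchar(50);
-- User removes the field again:
ALTER TABLE user42_contacts DROP COLUMN phone;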
I've done this successfully for use-cases like yours in the past. It works well for cases where the schema rarely changes, and the data volumes are relatively low for each user. (For high volumes or frequent schema updates, you might want schemaless or some other type of NoSQL database.)
You might need some tables on the side for additional field information that doesn't fit in the SQL schema - you may want to duplicate some schema information there as well, as this can be difficult or inefficient to read back from the actual schema.
Ensuring atomic updates to your field information tables and the schema probably requires transactions, which may not be supported by your database engine - PostgreSQL at least does support transactional schema updates.
You have to be vigilant when it comes to security - you don't want to open yourself up to users creating, storing or deleting things they're not supposed to.
If it suits your use-case, consider using not only separate tables, but separate databases, which can also be created and destroyed on demand using DDL. This could be applicable if each customer has ownership of data collections that can't, shouldn't, or don't need to be queried across customers. (Arguably, these are rare - typically, you want at least analytics or something across customers, but there are cases where each customer "owns" an isolated, hosted wiki, shop or CMS/DMS of some sort.)
(I saw in your comment that you already decided on NoSQL, so just posting this option here for completeness.)
It sounds like this might be a solution in search of a problem. Is there any chance your domain can be refactored? If not - there's still hope.
Your scalability for option 2 will depend a lot on the width of the custom objects. How many fields can be created dynamically? 1 million entities when each entity has 100 fields could be a drag... Efficient indexing could make performance bearable.
For another option - you could have one data table that has a few string fields, a few double fields, and a few integer fields. For example, a table with String1, String2, String3, Int1, Int2, Int3. A second table would have rows that define a user object and map your "CustomObjectName" => String1, and so on. A stored procedure reading INFORMATION_SCHEMA and some dynamic SQL would be able to read the schema table and return a strongly typed recordset...
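A rough sketch of that layout (all names are invented):
CREATE TABLE CustomData (
    ObjectID     int IDENTITY(1,1) PRIMARY KEY,
    ObjectTypeID int NOT NULL,
    String1 nvarchar(200), String2 nvarchar(200), String3 nvarchar(200),
    Int1 int, Int2 int, Int3 int
);
-- Maps user-defined field names onto the physical spare columns:
CREATE TABLE CustomFieldMap (
    ObjectTypeID   int NOT NULL,
    FieldName      nvarchar(50)  NOT NULL,  -- e.g. 'CustomerName'
    PhysicalColumn nvarchar(128) NOT NULL   -- e.g. 'String1'
);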
Yet another option (for recent versions of SQL Server) would be to store a row with an id, a type name, and an XML column that contains an XML document with the object data. In MS Sql Server this can be queried against directly, and maybe even validated against a schema.
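Sketched, with invented names:
CREATE TABLE CustomObjects (
    ObjectID int IDENTITY(1,1) PRIMARY KEY,
    TypeName nvarchar(50) NOT NULL,
    Data     xml NOT NULL
);
-- Querying into the XML directly:
SELECT ObjectID,
       Data.value('(/Object/CustomerName)[1]', 'nvarchar(100)') AS CustomerName
FROM CustomObjects
WHERE TypeName = 'Customer';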
Personally I would take the time to define as many attributes as you can rather than use EAV for everything. Surely you know some of the attributes. Then you only need EAV for the things that are truly client specific.
But if all must be EAV, then a NoSQL database is the way to go. Or you can use a relational database for some stuff and a NoSQL database for the rest.

Most useful SQL meta-queries

I have a passion for meta-queries, by which I mean queries that answer questions about data rather than answering with data.
Before I get a lot of justified criticism, I do realize that the approach of meta-queries is not ideal, as eloquently described here for example. Nevertheless, I believe they do have their place. (So much so that I created a WinForms user control that supports parameterized meta-queries for SQL Server, Oracle, and MySql, and I describe extensively the design and use of this QueryPicker in a three-part series published on Simple-Talk.com.)
My motivation for using meta-queries:
When I sit down with a new database and want to understand it, I probe with meta-queries. Most common are those that let me answer questions about fields and tables, such as "What other tables have this 'xyz' field?" or "What tables have identity columns?" or "What are the keys for this table?" (A couple of these are sketched below.)
I regularly work with multiple database types (SQL Server, Oracle, MySql) and--practicing the great programming ideal of laziness--I do not want to have to look up or remember an arcane SQL recipe every time I need it. I want to point and click.
Sure there are other (better?) ways to get meta-information--for a given database type. SQL Server, particularly, provides SQL Server Management Studio. Oracle and MySql tools do not seem to provide the same usefulness. (I freely admit that I make this claim with my SQL-Server-leaning-view of the universe. :-) Even if they did, they would be different--I want a uniform approach across database types.
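For instance, two of the probes mentioned above, roughly as I would write them (the 'xyz' column name is just a placeholder):
-- Which tables have a column named 'xyz'? (INFORMATION_SCHEMA, works on SQL Server and MySQL)
SELECT TABLE_SCHEMA, TABLE_NAME
FROM INFORMATION_SCHEMA.COLUMNS
WHERE COLUMN_NAME = 'xyz';

-- Which tables have identity columns? (SQL Server specific)
SELECT OBJECT_NAME(object_id) AS TableName, name AS ColumnName
FROM sys.identity_columns;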
So, finally, the question:
What SQL Server, Oracle, or MySql meta-queries do you find useful?
Summary Matrix
This first view summarizes my collection thus far by database type (and, as I said, heavily weighted toward SQL Server).
Query SQL Server Oracle MySql
DB Version yes yes yes
Databases with properties yes yes
Databases with space usage yes
National Language Support yes
Procedures and functions yes yes
Primary keys yes yes
Primary to foreign keys yes
Session Information/brief yes
Session Information/details yes
Session SET options yes
Users and Roles yes
Currently running statements yes
Constraints yes
Indexes yes
Column info/brief yes yes yes
Column info/details yes yes yes
Object level details yes
Rows and space used yes
Row/column counts yes
Non-empty tables yes yes yes
Show table schema yes yes
Seed/max values yes
References By Database Type
I have developed some of these meta-queries myself but many have come from community forums. This second view itemizes the source URLs where appropriate.
SQL Server
System Category
-----------------
DB Version
Databases with properties http://www.mssqltips.com/tip.asp?tip=1033
Databases with space usage http://www.sqlservercentral.com/Forums/Topic261080-5-1.aspx
Procedures and functions
Primary keys http://databases.aspfaq.com/schema-tutorials/schema-how-do-i-show-all-the-primary-keys-in-a-database.html
Primary to foreign keys http://www.sqlservercentral.com/scripts/Miscellaneous/61481/
Session Information/brief http://www.sqlservercentral.com/blogs/glennberry/archive/2009/12/28/how-to-get-a-count-of-sql-connections-by-ip-address.aspx
Session Information/details http://www.mssqltips.com/tip.asp?tip=1817
Session SET options
Users and Roles http://www.sqlservercentral.com/scripts/users/69379/
Currently running statements http://www.sqlservercentral.com/articles/DMV/64425/
Constraints
Indexes http://www.sqlservercentral.com/scripts/Index+Management/63932/
Column Category
-----------------
Column info/brief
Column info/details
Table Category
-----------------
Object level details
Rows and space used http://www.mssqltips.com/tip.asp?tip=1177
Row/column counts
Non-empty tables
DDL Category
-----------------
Show table schema http://www.sqlservercentral.com/scripts/Create+DDL+sql+statements/65863/
Data Category
-----------------
Seed/max values
Oracle
System Category
-----------------
DB Version
National Language Support
Column Category
-----------------
Column info/brief
Column info/details
Table Category
-----------------
Non-empty tables
DDL Category
-----------------
Show table schema
MySql
System Category
-----------------
DB Version
Databases
Procedures and functions
Primary keys http://databases.aspfaq.com/schema-tutorials/schema-how-do-i-show-all-the-primary-keys-in-a-database.html
Column Category
-----------------
Column info/brief
Column info/details
DDL Category
-----------------
Show table schema
Oracle SQL Developer has a set of built-in reports that include these categories. I have expanded one of the categories.
About Your Database
All Objects
Application Express
ASH and AWR
Database Administration
All Tables
Cursors
Database Parameters
Locks
Memory
Sessions
Storage
Top SQL
Users
Waits and Events
Data Dictionary
Jobs
PLSQL
Security
Streams
Table
XML
These are a few of the actual report names:
Tables without Indexes
Tables without Primary Keys
Tables with Unindexed Foreign Keys
Largest Average Row Length
Most Rows
Unusable Indexes
There are many more reports available.
I have a number of these I use regularly on SQL Server, including, but not limited to:
Tables without primary keys
Tables without a clustered index
Tables without any indexes
Scalar User-Defined Functions which are not deterministic
Database objects which do not have extended property 'MS_Description' (the default 'Description' property, which is useful for generating documentation)
Schemas which are empty
SQL modules (views, procs, functions, triggers) without standard documentation/comment blocks
System-specific:
Configuration tables which contain references to missing stored procedures or views
Views based on tables/views which cannot be schemabound or verified (because they are based on a view/table in another database)
Columns in views which are unused in the system
Certain kinds of NULLable columns which do not have defaults
Numeric columns which are NULLable
On Oracle the most useful is the one on v$session about the waits on the running sessions, i.e. what the sessions are doing at that moment (reading from disk, waiting for a lock, ...).
Oracle has a large range of metadata views, probably the one I query the most would be DBA_OBJECTS, which can be queried for all kinds of different object types. The same info and more can be obtained from other views (e.g. more info about the tables may be found in DBA_TABLES).
A good overview of the Oracle data dictionary may be found here.
The concern with using a set of canned scripts off the internet is that "it's not what you know, it's what you know that isn't so, or isn't so anymore." One needs to make sure when lifting scripts it's version appropriate. For example, Oracle as of 10.1 or 10.2 enables setting a column as UNUSED. It still shows up in DBA_TAB_COLUMNS, but it's not really there anymore.
It's better to understand what's in the Data Dictionary -- specifically in Oracle, the contents of the Database Reference (V$, DBA_*) and the PL/SQL Packages and Types reference, as more and more functionality moves in that direction (e.g. the DBMS_STATS package superseding the ANALYZE statement)
Some of the more esoteric but useful ones in Oracle:
DICT -- a name and a brief description of every table/view in the data dictionary.
DBA_TAB_MODIFICATIONS -- which tables have had how much insert/update/delete traffic since last analyzed.
V$OBJECT_USAGE -- when used with ALTER INDEX ... MONITORING USAGE, shows which indexes have not been used in SQL statements since monitoring was enabled; see the sketch after this list. (Indexes used to support foreign key or unique constraints may not appear, but may have been "used" nonetheless.)
V$SESSION_LONGOPS -- what SQL statements are running "long running" operations, like full scans, sorts, and merges, and how long does Oracle think it'll be before it finishes.
DBA_HISTOGRAMS -- What skew has existed in your data
DBA_OBJECTS -- it's got everything
DBA_SOURCE (by line) / DBA_TRIGGERS (by block) -- all the executable code in the system.
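A sketch of driving that index usage monitoring (the index name is invented):
ALTER INDEX emp_name_idx MONITORING USAGE;
-- ...let the workload run for a while, then:
SELECT index_name, table_name, used, start_monitoring
FROM V$OBJECT_USAGE;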
msorens,
I totally agree with you. Understanding the data and the schema helps you code better, avoid bugs better, identify special cases and analyse requirements.
You might like to check out my other post at:
Compare two schemas and update the old schema with the new columns of new schema
My schema comparison script is a mine of information about the Oracle catalog views. Unfortunately it is not complete, but that's a task for another day. ;-)
Matthew

Why use "Y"/"N" instead of a bit field in Microsoft SQL Server?

I'm working on an application developed by another mob and am confounded by the use of a char field instead of bit for all the boolean columns in the database. It uses "Y" for true and "N" for false (these have to be uppercase). The type name itself is then aliased with some obscure name like ybln.
This is very annoying to work with for a lot of reasons, not the least of which is that it just looks downright aesthetically unpleasing.
But maybe it's me that's stupid - why would anyone do this? Is it a database compatibility issue or some design pattern that I am not aware of?
Can anyone enlighten me?
I've seen this practice in older database schemas quite often. One advantage I've seen is that using CHAR(1) fields provides support for more than Y/N options, like "Yes", "No", "Maybe".
Other posters have mentioned that Oracle might have been used. The schema I referred to was in fact deployed on Oracle and SQL Server. It limited the usage of data types to a common subset available on both platforms.
They did diverge in a few places between Oracle and SQL Server but for the most part they used a common schema between the databases to minimize the development work needed to support both DBs.
Welcome to brownfield. You've inherited an app designed by old-schoolers. It's not a design pattern (at least not a design pattern with something good going for it), it's a vestige of coders who cut their teeth on databases with limited data types. Short of refactoring the DB and lots of code, grit your teeth and gut your way through it (and watch your case)!
Other platforms (e.g. Oracle) do not have a bit SQL type. In which case, it's a choice between NUMBER(1) and a single character field. Maybe they started on a different platform or wanted cross platform compatibility.
I don't like the Y/N char(1) field as a replacement for a bit column either, but there is one major downside to a bit field in a table: You can't create an index for a bit column or include it in a compound index (at least not in SQL Server 2000).
Sure, you could discuss if you'll ever need such an index. See this request on a SQL Server forum.
They may have started development back with Microsoft SQL Server 6.5.
Back then, adding a bit field to an existing table with data in place was a royal pain in the rear. Bit fields couldn't be null, so the only way to add one to an existing table was to create a temp table with all the existing fields of the target table plus the bit field, and then copy the data over, populating the bit field with a default value. Then you had to delete the original table and rename the temp table to the original name. Throw in some foreign key relationships and you've got a long script to write.
Having said that, there were always 3rd party tools to help with the process. If the previous developer chose to use char fields in lieu of bit fields, the reason, in a nutshell, was probably laziness.
The reasons are as follows (btw, they are not good reasons):
1) Y/N can quickly become "X" (for unknown), "L" (for likely), etc. - What I mean by this is that I have personally worked with programmers who were so used to not collecting requirements correctly that they just started with Y/N as sort of 'flags' with the superstition that it might need to expand (to which they should use an int as a status ID).
2) "Performance" - but as was mentioned above, SQL indexes are ruled out if they are not 'selective' enough... a field that only has 2 possible values will never use that index.
3) Laziness. - Sometimes developers want to output directly to some visual display with the letter "Y" or "N" for human readability, and they don't want to convert it themselves :)
Those are all 3 bad reasons that I've heard/seen before.
I can't imagine any disadvantage in not being able to index a "BIT" column, as it would be unlikely to have enough different values to help the execution of a query at all.
I also imagine that in most cases the storage difference between BIT and CHAR(1) is negligible (is that CHAR a NCHAR? does it store a 16bit, 24bit or 32bit unicode char? Do we really care?)
This is terribly common in mainframe files, COBOL, etc.
If you only have one such column in a table, it's not that terrible in practice (no real bit-wasting); after all SQL Server will not let you say the natural WHERE BooleanColumn, you have to say WHERE BitColumn = 1 and IF @BitFlag = 1 instead of the far more natural IF @BooleanFlag. When you have multiple bit columns, SQL Server will pack them. The case of the Y/N should only be an issue if case-sensitive collation is used, and to stop invalid data, there is always the option of a constraint.
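For example, a minimal sketch of such a constraint (table and column names are made up):
ALTER TABLE Orders
    ADD CONSTRAINT CK_Orders_IsActive CHECK (IsActive IN ('Y', 'N'));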
Having said all that, my personal preference is for bits and only allowing NULLs after careful consideration.
Apparently, bit columns aren't a good idea in MySQL.
They probably were used to using Oracle and didn't properly read up on the available datatypes for SQL Server. I'm in exactly that situation myself (and the Y/N field is driving me nuts).
I've seen worse ...
One O/R mapper I had occasion to work with used 'true' and 'false' as they could be cleanly cast into Java booleans.
Also, on a reporting database such as a data warehouse, the DB is the user interface (metadata based reporting tools notwithstanding). You might want to do this sort of thing as an aid to people developing reports. Also, an index with two values will still get used by index intersection operations on a star schema.
Sometimes such quirks are more associated with the application than the database. For example, handling booleans between PHP and MySQL is a bit hit-and-miss and makes for non-intuitive code. Using CHAR(1) fields and 'Y' and 'N' makes for much more maintainable code.
I don't have any strong feelings either way. I can't see any great benefit to doing it one way over another. I know philosophically the bit fields are better for storage. My reality is that I have very few databases that contain a lot of logical fields in a single record. If I had a lot then I would definitely want bit fields. If you only have a few I don't think it matters. I currently work with Oracle and SQL server DB's and I started with Cullinet's IDMS database (1980) where we packed all kinds of data into records and worried about bits and bytes. While I do still worry about the size of data, I long ago stopped worrying about a few bits.