Tool for validating SQL Server database schema

Are there any tools available for validating a database schema against a set of design rules, naming conventions, etc.?
I'm not talking about comparing one database to another (as covered by this question).
I want to be able to say "What in this database doesn't meet this set of rules".
Some examples of the type of rules I'm talking about:
- Primary key fields should be the first in the table.
- Foreign keys should have an index on that field.
- Field names ending 'xxx' should be of a certain type.
- Fields with a constraint limiting them to certain values should have a default.
I've written a bunch of scripts to do this in the past and was wondering if there was something generic available.
Ideally I'd like something for SQL Server, but if you're aware of something for other databases it may be useful to know about them too.
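To make the question concrete, here is a minimal sketch (not from the original post) of one such hand-rolled check as a SQL Server catalog query - it flags foreign key columns that are not the leading column of any index, and only handles single-column keys:

    -- Foreign key columns with no supporting index (single-column keys only)
    SELECT OBJECT_NAME(fkc.parent_object_id) AS table_name,
           COL_NAME(fkc.parent_object_id, fkc.parent_column_id) AS column_name
    FROM sys.foreign_key_columns AS fkc
    WHERE NOT EXISTS (
        SELECT 1
        FROM sys.index_columns AS ic
        WHERE ic.object_id = fkc.parent_object_id
          AND ic.column_id = fkc.parent_column_id
          AND ic.key_ordinal = 1  -- column must lead the index to support FK lookups
    );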

One way to accomplish this would be to script out an entire database and then apply rules consisting of regular expressions to the script. SSW's commercial tool does something similar for SQL Server.

A tool called SQLCop does what you're asking for, but I don't believe it actually allows you to write rules yourself.
http://sqlcop.lessthandot.com/detectedissues.php

Automatic verification of database contents

Background
I have a software component that writes data to a postgres database (into several tables) and I want to write an automatic functional test for this component. I already have a host of unit tests in place that check the subcomponents, but I'd like a test that checks the whole system end-to-end.
For each test run, I use a clean database (actually a completely new, this-test-run-only database). The software component is stable in the sense that given the same input, it will always write the same user data to the database.
The database design is relational, such that most tables contain foreign keys. Obviously, I don't want to check the values of these keys, because I don't want to rely on them being generated in a predictable manner by postgres.
Assume that there are no issues regarding user rights on the database, connection issues etc. Also disregard development/production disparities.
I currently use a number of select statements to produce a textual "dump" of the database and compare it to a reference dump (ignoring whitespace and so on), but this seems rather clumsy. Also, this doesn't take into account the relationships between the tables. Extending the current approach to deal with this doesn't strike me as maintainable at all, should the database layout ever change.
My software, as well as the testing framework, is written in C++; the testing scripts are simple bash scripts. I'm open to using any language to achieve this.
Question
How can I automatically verify the database contents in "the database way"?
Even better would be an approach that doesn't rely on postgres as the backend.
pgTap is a testing framework for PostgreSQL. You can use it to test both the structure and the content of a PostgreSQL database. I've used it on projects that had to meet certain contractual standards for seeded data (data for "lookup" tables like state codes and abbreviations, delivery carriers, user roles, etc.). It has worked well for that purpose.
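For a flavor of what that looks like, here is a minimal pgTap sketch (the table and column names are invented for illustration):

    BEGIN;
    SELECT plan(3);

    SELECT has_table('state_codes');
    SELECT has_column('state_codes', 'abbreviation');
    SELECT results_eq(
        $$ SELECT abbreviation FROM state_codes WHERE name = 'Texas' $$,
        $$ VALUES ('TX'::text) $$,
        'Texas abbreviation is seeded correctly'
    );

    SELECT * FROM finish();
    ROLLBACK;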
But I don't yet see a compelling reason to abandon your current method, which is already written and working. Text dumps of single tables are supported by all current SQL dbms, as far as I know. If you move to a different dbms, you'll have to change the name of the dump program and the arguments to it. I can't imagine why you'd need to change the reference file, but I suppose that could happen.
The "database way" is really just to select the data you expect to be in the database, and see if it's really there. That's pretty much what you're doing now, and what pgTap does with perhaps greater flexibility.
To increase maintainability (to reduce duplication), you could generate the INSERT statements from the reference data, or you could generate the reference data from the INSERT statements. I can imagine development environments where that would be a wise thing to do, but I don't know whether yours is one of them.

SQL database checklist

I need to create a SQL database checklist.
I have some basic points, like:
Each table must have a primary key
Normalize data to third normal form
Check identity columns: the value should increment properly.
But can anyone help me enhance this list?
Objects conform to a single naming convention
Create foreign key relationships
Apply appropriate index(es)
Use of schema or other mechanisms for controlling read/write access, etc
Consideration given to how long data should be kept before deletion or archive
Version control over scripts for updating the database structure
Mechanism for applications to determine version of database
Backup and recovery plans in place
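As a sketch of how the first item on such a list can be automated (using SQL Server catalog views, 2005 and later), here is a query that lists tables with no primary key:

    SELECT t.name AS table_name
    FROM sys.tables AS t
    WHERE NOT EXISTS (
        SELECT 1
        FROM sys.key_constraints AS kc
        WHERE kc.parent_object_id = t.object_id
          AND kc.type = 'PK'  -- primary key constraints
    );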
First, it would help to know whether this is supposed to be a recurring checklist or a checklist for each new instance. Also, is there a specific implementation in mind, like SQL Server? MySQL? (This is where the real checklist begins.) For example, you want to keep an eye on the transaction log if it's SQL Server...
If this is a relational DB, ER diagrams go a long way in making sure that you have your problem domain identified and analyzed. You are on the right track using third normal form where practical. I want to emphasize practical, because you also want to try to anticipate and identify which data will be used more than others. If data is highly accessed, you may want to consider indexing more than just the primary key, and/or denormalizing to 2nd normal form (uses more space, but gives better performance). Remember that accessing data and updating data are inversely related where indexing is concerned. Hope this helps.

SQL Compatibility Chart (esp data types)

So... it happens I'm working on some code which... will end up being used on different SQL servers at the same time.
Although the SQL code is different depending on the server, the data types and columns are not.
Therefore, I need to know which data types are common to (at least most) SQL server types.
As a starting point, I have the following types:
byte, char, float, int, text, varchar, blob
Please note that spelling is quite important, since the data type name will end up in the query as-is (e.g., although both int and integer are supported, I need the common one).
So, the question is, does anyone know of a chart comparing compatibility between sql servers? Or perhaps someone which did some research in the field?
As far as bias goes, I'm obviously biased to a particular RDBMS, so no need for answers on which RDBMS happens to be better. Let's keep this focused and on topic, ok?
I think you will end up writing specific, case-by-case SQL statements for each type of database server. Certainly I did.
I've been in your situation, including having the intention to write database-agnostic code, but in the long run it just does not work. One database will not, for example, handle multi-byte strings while another will demand them (i.e., SQL Server CE); this will force you to use either Varchar vs NVarchar on columns, for example. Some databases will support multi-byte strings, but with awful performance. One will use VARCHAR2 (Oracle), and everyone else will use VARCHAR. One will handle BLOBs one way while another will do so differently. Don't get me started on date data types, either.
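As a small illustration of the point (table and column names invented), the "same" column ends up declared differently per product:

    -- SQL Server: multi-byte strings want the N-prefixed type
    CREATE TABLE customers (name NVARCHAR(100));

    -- Oracle: VARCHAR2 instead of VARCHAR
    CREATE TABLE customers (name VARCHAR2(100));

    -- PostgreSQL / MySQL: plain VARCHAR, multi-byte with a UTF-8 encoding
    CREATE TABLE customers (name VARCHAR(100));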
Rather than find the magic subset of the SQL language and data types that works in all databases, you would be wiser to look for a data access method/library that can hide the differences for you (maybe some ORM library that lets you create DB objects as well as access them?)
Like I said, I have been (and still am) in your situation of having to support multiple databases and the best solution for me is to write optimal code for each database, rather that trying to find SQL data types and code that works in all of them (I wasn't able to, not to a satisfactory level).
Also, you will be able to squeeze more performance out of each DB if you create separate SQL text for each database (ie, the performance-related parameters you can specify while creating an Oracle table that do not apply at all when creating a table in any other database).
I say, do not fight the syntax differences in the different databases, you will not win. It's a better idea to put up with and use those differences to your advantage as much as possible.
I'd look into the SQL ANSI standard specification and use the data types specified there. A book like this may help you.
They all have good documentation, so I would just read up on their data types; that would probably have all the info you need. The only other information I could find is pretty old.
Hope that helps.
Edit: Just another thought... you could use the strategy pattern for your SQL, that way it wouldn't matter if it was different, you could use the more advanced features. Though this way you'd have more work to do and more to maintain :/

Which Database can i Safely use a GUID as Primary Key besides SQL Server?

The reason I want to use a GUID is that in the event I have to split the database in two, I won't have primary keys that overlap across both databases. I also want to use the GUID in the URL, so it will need to be indexed.
I will be using ASP.NET C# as my web server.
Postgres has a UUID type. MySQL has a UUID function. Oracle has a SYS_GUID function.
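Roughly, that looks like this (verify against each product's docs):

    -- PostgreSQL: UUID is a native column type
    CREATE TABLE items (id UUID PRIMARY KEY);

    -- MySQL: UUID() generates one as a 36-character string
    SELECT UUID();

    -- Oracle: SYS_GUID() generates a 16-byte RAW value
    SELECT SYS_GUID() FROM dual;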
As others have said, you can use GUIDs/UUIDs in pretty much any modern DB. The algorithm for generating a GUID is pretty straightforward and you can be reasonably sure that you won't get dupes; however, there are some considerations.
+) Although GUIDs are generally representations of 128-bit values, the actual format used differs from implementation to implementation - you may want to consider normalizing them by removing non-significant characters (usually dashes or spaces).
+) To absolutely ensure uniqueness you can also append a value to the GUID. For example, if you're worried about MS and Oracle GUIDs colliding, add "MS" to the former and "Or" to the latter - now even if the GUIDs themselves do collide, the keys won't.
As others have mentioned, however, there is a potentially severe price to pay here: your keys will be large (128 bits) and won't index very well (although this is somewhat dependent on the implementation).
The technique works very well for small databases (especially those where the entire dataset can fit in memory), but as DBs grow you'll definitely have to accept a performance trade-off.
One thing you might consider is a hybrid approach. Without more information it's hard to really know what you're trying to do so these might not help:
1) Remember that primary keys don't have to be a single column - you can have a simple numeric key to identify your rows and another column, containing a single value, that identifies the database that hosts the data or created the key. Creating the primary key as an aggregate of both columns gives the index simpler values to work with and should be significantly faster (a sketch follows below).
2) You can "fake it" by constructing the key as a concatenated field (as in the above idea to append a DB identifier to the key). So your key would be a simple number followed by some DB identifier (perhaps a guid for each DB).
Indexing such a value (since the values would still be sequential) should be much faster.
In both cases you'll have some manual work to do if you ever do split the DB(s) - you'll have to update some keys with a new DB ID, but this would be a one-time, infrequent event. In exchange you can tune your DB much better.
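A sketch of idea 1 (all names are illustrative): a plain numeric key plus a short column identifying the originating database, combined into a composite primary key:

    CREATE TABLE orders (
        order_id  BIGINT  NOT NULL,  -- simple sequential key, cheap to index
        source_db CHAR(2) NOT NULL,  -- which database created the row, e.g. 'MS' or 'Or'
        -- ...data columns...
        PRIMARY KEY (order_id, source_db)
    );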
There are definitely other ways to ensure data integrity across multiple databases. Many enterprise DBMSs have tools built in for clustering data across multiple servers or databases, some have special tools or design patterns that make it easier, etc.
In short I would say that guids are nice and simple and do what you want, but that you should only consider them if either a) the dataset is small or b) the DBMS has specific features to optimize their use as keys (for example sequential guids). If the datasets are going to be very large or if you're trying to limit DBMS-specific dependencies I would play around more with optimizing a "key + identifier" strategy.
Most any RDBMS you will use can take any number and type of columns as a PK. So, if you're storing the GUID as a CHAR(n) for some length n, you should be fine. Now, I'm not sure if this is advisable, as I'm guessing indexing on CHARs is not as efficient as on integers.
Hope that helps.
I suppose you could store a GUID as an int128 as well.
Both MySQL and Postgres are known to support GUID data types (I believe it's called UUID, but it's the same thing).
Unless I have completely lost my memory, a properly designed 3rd+ normal form database schema does not rely on unique ints, or by extension GUIDs or UUIDs for primary keys. Nor does it use intermediate lookup tables of ints/GUIDS/UUIDS to relate the tables containing the data.
You should grind your schema until it expresses the relations amongst tables of data in terms of the data in the tables, not auto-generated identifiers that have no intrinsic relationship to the data.
I freely grant that you may just possibly be doing something that really really requires GUIDs (or auto-increment integers) for primary keys. But I seriously doubt that is the case - it almost never is.
You can implement your own membership provider based on whatever database schema you choose to design. It's nowhere near as tricky as it may look at first.
google "roll your own membership provider" for plenty of pointers.
In my theoretical little world, you'd be able to do this with SQLite. You'd generate the Guid from .Net and write it to the SQLite database as a string. You could also index that field.
You do lose some of the index benefits because it'd be stored as a string, but it should be fully backwards compatible so that you could import/export to/from SQL Server.
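That might look like this in SQLite (names invented); the TEXT primary key is indexed automatically:

    CREATE TABLE items (
        id   TEXT PRIMARY KEY,  -- e.g. '3f2504e0-4f89-11d3-9a0c-0305e82c3301'
        name TEXT NOT NULL
    );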
From looking through the comments it looks like you are trying to use a different database to MS SQL with the ASP.net membership provider - as others have mentioned you could roll your own provider to use a different DB however a quick Google search turned up a few ready made options:
MySQL Provider
MySQL Provider 2
SQLite Provider
Hope these help
If you are using other MS technologies already, you should consider SQL Server Express.
http://www.microsoft.com/express/sql/default.aspx
It is a real implementation of MS SQL Server and it is free. It does have significant limitations, as you might imagine, but if your product can fit inside those you get the support, developer community and stability of SQL Server, and a clear upgrade path if you need to grow.

Naming database table fields

Are there any naming guidelines for naming columns in SQL Server? I searched MSDN but didn't find anything - just guidelines for .Net.
There are lots of different conventions out there (and I'm sure other answers may make some specific suggestions) but I think that the most important thing is that you be consistent. If you are going to use a prefix for something, use it everywhere. If you are going to have a foreign key to another table, use the same column name everywhere. If you are going to separate words with underscores, do that everywhere.
In other words, if someone looks at a few tables, they should be able to extrapolate out and guess the names of other tables and columns. It will require less mental processing to remember what things are called.
There are many resources out there, but nothing that I have been able to truly pin down as a SQL Server specific set or anything published by Microsoft.
However, I really like this list.
Also, it is very important NOT to start stored procedure names with sp_.
To be 100% honest though, the first part of my posted link is the most important. It must make sense for your organization, application, and implementation.
As always, google is your friend...
I find the following short list helpful:
Name tables as plural nouns (or singular - but, as a previous response stated, be consistent): for example "Customers", "Orders", "LineItems"
Stored procedures should be named without any prefixes such as "sp_" since SQL Server uses the "sp_" prefix to denote special meaning for system procedures.
Name columns as you would name attributes on a class (without using underscores)
Try not to use space characters in naming columns or database entities since you would have to escape all names with "[...]"
Many-to-many tables: for example "CustomerOrders"
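Putting those points together, a quick sketch (invented names) that follows the list:

    CREATE TABLE Customers (
        CustomerId INT IDENTITY PRIMARY KEY,
        FirstName  NVARCHAR(50) NOT NULL  -- class-attribute style, no underscores or spaces
    );

    CREATE TABLE Products (
        ProductId INT IDENTITY PRIMARY KEY,
        Name      NVARCHAR(100) NOT NULL
    );

    -- Many-to-many table named by combining the two table names:
    CREATE TABLE CustomerProducts (
        CustomerId INT NOT NULL REFERENCES Customers(CustomerId),
        ProductId  INT NOT NULL REFERENCES Products(ProductId),
        PRIMARY KEY (CustomerId, ProductId)
    );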