I'm working with a client who wants to add timestamps to a bunch of tables so that they may sort the records in those tables chronologically. All of the tables also have an auto incrementing integer field as their primary key (id).
The (simple) idea - save the overhead/storage and rely on the primary key to sort fields chronologically. Sure this works, but I'm uncertain whether or not this approach is acceptable in sound database design.
Pros: less storage required per record, simpler VO classes, etc. etc.
Con: it implies a characteristic of that field, an otherwise simple identifier, whose definition does not in any way define or guarantee that it should/will function as such.
Assume for the sake of my question that the DB table definitions are set in stone. Still - is this acceptable in terms of best practices?
Thanks
You asked for "best practices", rather than "not terrible practices" so: no, you should not rely on an autoincremented primary key to establish chronology. One day you're going to introduce a change to the db design and that will break. I've seen it happen.
A datetime column whose default value is GETDATE() has very little overhead (about as much as an integer) and (better still) tells you not just sequence but actual date and time, which often turns out to be priceless. Even maintaining an index on the column is relatively cheap.
These days, I always put a CreateDate column on data objects connected to real-world events (such as account creation).
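A minimal sketch of that idea, using Python's built-in sqlite3 module rather than SQL Server, so CURRENT_TIMESTAMP stands in for GETDATE(). The table and column names (account, create_date) and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE account (
        id INTEGER PRIMARY KEY AUTOINCREMENT,
        name TEXT NOT NULL,
        -- filled in by the database when the row is inserted
        create_date TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
    )
    """
)
conn.executemany("INSERT INTO account (name) VALUES (?)", [("alice",), ("bob",)])
# An explicit value can still be supplied, e.g. for back-dated imports.
conn.execute(
    "INSERT INTO account (name, create_date) VALUES (?, ?)",
    ("legacy", "2001-01-01 00:00:00"),
)

# Sort by the recorded time (id only breaks ties) and filter on a date range.
oldest_first = [r[0] for r in conn.execute(
    "SELECT name FROM account ORDER BY create_date, id")]
before_2010 = [r[0] for r in conn.execute(
    "SELECT name FROM account WHERE create_date < '2010-01-01'")]
```

Note that the back-dated row has the highest id but sorts first by date, which is exactly the case an id-only ordering gets wrong.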
Edited to add:
If exact chronology is crucial to your application, you can't rely on either auto-increment or timestamps (since there can always be identical timestamps, no matter how high the resolution). You'll probably have to make something application-specific instead.
Further to egrunin's answer, a change to the persistence or processing logic of these rows may cause rows to be inserted into the database in a non-sequential or nondeterministic manner. You may implement a parallelized file processor that throws a row into the DB as soon as the thread finishes transforming it, which may be before another thread has finished processing a row that occurred earlier in the file. Using an ORM for record persistence may result in a similar behavior; the ORM may just maintain a "bag" (unordered collection) of object graphs awaiting persistence, and grab them at random to persist them to the DB when it's told to "flush" its object buffer.
In either case, trusting the autoincrement column to tell you the order in which records came into the SYSTEM is bad juju. It may or may not be able to tell you the order in which records hit the DATABASE; that depends on the DB implementation.
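To make the parallel-processor scenario concrete, here is a small sketch with Python's sqlite3 module. The table (line), the source_pos column and the hard-coded completion order are all assumptions for illustration; real worker threads would finish in an order that varies from run to run.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE line (id INTEGER PRIMARY KEY AUTOINCREMENT, "
    "source_pos INTEGER NOT NULL, body TEXT NOT NULL)"
)

# Position of each record in the input file.
source = [(1, "first"), (2, "second"), (3, "third"), (4, "fourth")]

# Assumed order in which the workers happen to finish (fixed here so the
# example is deterministic).
completion_order = [2, 1, 4, 3]

by_pos = dict(source)
for pos in completion_order:
    conn.execute(
        "INSERT INTO line (source_pos, body) VALUES (?, ?)", (pos, by_pos[pos]))

# The autoincrement id reflects insertion order, not order in the file.
by_id = [r[0] for r in conn.execute("SELECT body FROM line ORDER BY id")]
by_source = [r[0] for r in conn.execute(
    "SELECT body FROM line ORDER BY source_pos")]
```

Only the explicitly stored position recovers the order in which records entered the system.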
You can achieve the same goal in the short term by sorting on the ID column. This would be better than adding additional data to achieve the same result. I don't think that it would be confusing for anyone to look at the data table and know that it's chronological when they see that it's an identity column.
There are a few drawbacks or limitations that I see however.
The chronological sort can be messed up if someone re-seeds the column
Chronology for a date period cannot be ascertained without the additional data
This setup prevents you from sorting chronologically if the system ever accepts new, non-chronological data
Based on the realistic evaluation of these "limitations" you should be able to advise a proper approach.
Auto-incrementing ID will give you an idea of order as Brad points out, but do it right - if you want to know WHEN something was added, have a datetime column. Then you can not only chronologically sort but also apply filters.
Don't do it. You should never rely on the actual value of your ID column. Treat it like a black box, only useful for doing key lookups.
You say "less storage required per record," but how important is that? How big are the rows we're talking about? If you've got 200-byte rows, another 4 bytes probably isn't going to matter much.
Don't optimize without measuring. Get it working right first, and THEN optimize.
@MadBreaker
There are two separate things here: if you need to know the order, you create an order column with autoincrement; however, if you want to know the date and time a row was inserted, you use datetime2.
Chronological order can be guaranteed if you don't allow updates or deletes, but if you want time-based control over your selects you should use datetime2.
You didn't mention if you are running on a single db or clustered. If you are clustered, be wary of increment implementations, as you are not always guaranteed things will come out in the order you would naturally think. For example, Oracle sequences can cache groups of next values (depending on your setup) and give you a 1,3,2,4,5 sort of list...
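The caching effect can be illustrated with a toy model in plain Python. This is not how Oracle is implemented; CachingNode, the cache size of 2 and the alternating arrival pattern are assumptions chosen only to show how block reservation reorders ids.

```python
from itertools import count


class CachingNode:
    """Toy model of one cluster node that reserves sequence values in blocks."""

    def __init__(self, shared_counter, cache_size):
        self._shared = shared_counter
        self._cache_size = cache_size
        self._cached = []

    def next_id(self):
        if not self._cached:
            # Reserve a whole block from the shared sequence at once.
            self._cached = [next(self._shared) for _ in range(self._cache_size)]
        return self._cached.pop(0)


shared = count(1)
node_a = CachingNode(shared, cache_size=2)
node_b = CachingNode(shared, cache_size=2)

# Inserts arrive alternately on the two nodes, in this chronological order.
ids_in_arrival_order = [
    n.next_id() for n in (node_a, node_b, node_a, node_b, node_a)
]
```

Every id is unique, but sorting by id would not reproduce the order in which the rows arrived.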
I have a query
UPDATE dbo.M_Room
SET
//do something
WHERE PK_RoomId = @RoomId AND IsActive = 1 AND FK_DepartmentId = @DepartmentId
Now suppose PK_RoomId is the PK of M_Room and is an autoincremented field.
So according to this I could have used
WHERE PK_RoomId = @RoomId
rather than
WHERE PK_RoomId = @RoomId AND IsActive = 1 AND FK_DepartmentId = @DepartmentId
What risks do I guard against by using the second condition rather than the first one?
Suppose no relationships/constraints (PK, FK etc.) physically exist, and they can't be implemented due to the unmanaged structure of the database. What would your recommendation be in such a scenario?
What should be done to keep the data consistent?
I don't think it's a good idea to change the WHERE to just WHERE PK_RoomId = @RoomId. The first part (the part you want to keep) is for identifying the record. The second part (AND IsActive = 1) is used to maybe restrict the update based on whether the room is active or not. About the last part (AND FK_DepartmentId = @DepartmentId), that could mean that sometimes you only want to update the room if it belongs to the department you specified. This could also be useful.
Why exactly would you want to change the query?
If you are using READ-UNCOMMITTED transactions or no transactions at all, or the data has been sitting round on someone's screen for a long time, the additional conditions could save you from a buried update, presuming that your // do something does something to the IsActive column.
It could also be a final guard against just getting it wrong (seeing if the room isn't active and then forgetting to make use of the fact).
Make sure to check the number of rows updated in either case.
Your second-last paragraph suggests the room_id may not be unique when it is supposed to be; you will always have trouble if that's the case.
Myself, I'd be inclined to check explicitly for buried updates if I suspected they may occur, and I'd think that form of defending against programming errors to be unusual.
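As a rough sketch of "check the number of rows updated", here is the guarded UPDATE run through Python's sqlite3 module instead of SQL Server. The RoomName column, the sample rows and the rename_room helper are hypothetical; only the table and key column names come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE M_Room (PK_RoomId INTEGER PRIMARY KEY, "
    "FK_DepartmentId INTEGER, IsActive INTEGER, RoomName TEXT)"
)
conn.execute("INSERT INTO M_Room VALUES (1, 10, 1, 'Lab')")
conn.execute("INSERT INTO M_Room VALUES (2, 10, 0, 'Store')")  # inactive room


def rename_room(room_id, department_id, new_name):
    """Update only an active room in the given department; report success."""
    cur = conn.execute(
        "UPDATE M_Room SET RoomName = ? "
        "WHERE PK_RoomId = ? AND IsActive = 1 AND FK_DepartmentId = ?",
        (new_name, room_id, department_id),
    )
    # rowcount is the number of rows the UPDATE actually changed.
    return cur.rowcount == 1


ok_active = rename_room(1, 10, "Lab A")
ok_inactive = rename_room(2, 10, "Store B")   # skipped: room is inactive
ok_wrong_dept = rename_room(1, 99, "Lab B")   # skipped: wrong department
```

A count of zero tells the caller the row was not in the state it expected, which is the point at which a stale screen or a logic error can be reported instead of silently ignored.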
What is the point (if any) in having a table in a database with only one row?
Note: I'm not talking about the possibility of having only one row in a table, but when a developer deliberately makes a table that is intended to always have exactly one row.
Edit:
The sales tax example is a good one.
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one large table; I assume I'm missing something.
I've seen something like this when a developer was asked to create a configuration table to store name-value pairs of data that needs to persist without being changed often. He ended up creating a one-row table with a column for each configuration variable. I wouldn't say it's a good idea, but I can certainly see why the developer did it given his instructions. Needless to say it didn't pass review.
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one row; I assume I'm missing something.
This doesn't sound like good design, unless there are some important details you don't know about. If there are three pieces of information that have the same constraints, the same use and the same structure, they should be stored in the same table, 99% of the time. That's a big part of what tables are for fundamentally.
For some things you only need one row - typically system configuration data. For example, "current sales tax rate". This might change in the future and so shouldn't be hardcoded, but you'll typically only ever need one at any given time. This kind of data needs to be in the database so that queries can use it in computations.
It's not necessarily a bad idea.
What if you had some global state (say, a boolean) that you wanted to store somewhere? And you wanted your stored procedures to easily access this state?
You could create a table with a primary key whose value range was limited to exactly one value.
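One way to sketch such a table, here with Python's sqlite3 module: a CHECK constraint pins the primary key to a single value, so a second row cannot be inserted. The table and column names (global_state, maintenance_mode) are invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE global_state (
        id INTEGER PRIMARY KEY CHECK (id = 1),  -- only one key value is legal
        maintenance_mode INTEGER NOT NULL DEFAULT 0
    )
    """
)
conn.execute("INSERT INTO global_state (id) VALUES (1)")

try:
    conn.execute("INSERT INTO global_state (id) VALUES (2)")
    second_row_rejected = False
except sqlite3.IntegrityError:
    second_row_rejected = True

# No WHERE clause is needed: there is only ever one row to update or read.
conn.execute("UPDATE global_state SET maintenance_mode = 1")
row_count, mode = conn.execute(
    "SELECT COUNT(*), MAX(maintenance_mode) FROM global_state").fetchone()
```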
A single row is like a singleton class. Purpose: to control or manage some other process.
A single-row table could act as a critical section or as a deterministic automaton (a kind of dispatcher based on row values).
A single row is useful in a table COMPANY_DESCRIPTION, to obtain consistent data about that company. Useful for company letters and addresses.
A single row is useful to contain an actual value like VAT or Date or Time, and so on.
It can sometimes be useful for emulating features the database system doesn't provide. I'm thinking of sequences in MySQL, for instance.
If your database is your application, then it probably makes sense for storing configuration data that might be required by stored procedures implementing business logic.
If you have an application that could use the file system to store information, then I don't think there is an advantage to using the database over an XML or flat file, except maybe that most developers are now far more well versed in using SQL to store and retrieve data than accessing the file system.
What is the point (if any) in having a table in a database with only one row?
A relational database stores things as relations: tuples of data satisfying some relation.
Like, this one: "a VAT of this many percent is in effect in my country now".
If only one tuple satisfies this relation, then yes, it will be the only one in the table.
SQL cannot store variables: it can store a set consisting of 1 element, this is a one-row table.
Also, SQL is a set based language, and for some operations you need a fake set of only one row, like, to select a constant expression.
You cannot just SELECT out of nothing in Oracle, you need a FROM clause.
Oracle has a pseudotable, dual, which contains only one row and only one column.
Once, a long time ago, it used to have two rows (hence the name dual), but lost its second row somewhere on its way to version 7.
MySQL has this pseudotable too, but MySQL is able to do selects without FROM clause. Still, it's useful when you need an empty rowset: SELECT 1 FROM dual WHERE NULL
I've just observed in some code I'm reviewing three different tables that contain three different kinds of certificates (a la SSL), each having exactly one row. I don't understand why this isn't made into one large table; I assume I'm missing something.
It may be a kind of "have it all or lose" scenario, when all three certificates are needed at once:
SELECT *
FROM ssl1
CROSS JOIN
ssl2
CROSS JOIN
ssl3
If any of the certificates is missing, the whole query returns nothing.
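The all-or-nothing behaviour of the cross join is easy to check with Python's sqlite3 module. The table names follow the query above; the single cert column and its contents are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
for table in ("ssl1", "ssl2", "ssl3"):
    conn.execute(f"CREATE TABLE {table} (cert TEXT NOT NULL)")
    conn.execute(f"INSERT INTO {table} VALUES (?)", (f"{table}-certificate",))

query = "SELECT * FROM ssl1 CROSS JOIN ssl2 CROSS JOIN ssl3"
all_present = conn.execute(query).fetchall()

conn.execute("DELETE FROM ssl2")  # one certificate goes missing
one_missing = conn.execute(query).fetchall()
```

With one row in each table the join yields exactly one combined row; with any table empty, the product is empty.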
A table with a single row can be used to store application level settings that are shared across all database users. 'Maximum Allowed Users' for example.
Funny... I asked myself the same question. If you just want to store some simple value and your ONLY method of storage is an SQL server, that's pretty much what you have to do. If I have to do this, I usually end up creating a table with several columns and one row. I've seen a couple commercial products do this as well.
We have used a single-row table in the past (not often). In our case, this table was used to store system-wide configuration values that were updatable via a web interface. We could have gone the route of a simple name/value table, but the end client preferred a single row. I personally would have preferred the latter, but it really is up to preference, especially if this table will never have any sort of relationship with another table.
I really cannot figure out why this would be the best solution. It seems more efficient to just have some kind of config file that contains the data that would be in the table's one row. Connecting to the database and querying the one row would be more costly. However, if this is going to be some kind of config for the database logic, then it would make a little more sense, depending on the type of database you are using.
I use the totally awesome rails-settings plugin for this http://github.com/Squeegy/rails-settings/tree/master
It's really easy to set up and provides a nice syntax:
Settings.admin_password = 'supersecret'
Settings.date_format = '%m %d, %Y'
Settings.cocktails = ['Martini', 'Screwdriver', 'White Russian']
Settings.foo = 123
Want a list of all the settings?
Settings.all # returns {'admin_password' => 'supersecret', 'date_format' => '%m %d, %Y'}
Set defaults for certain settings of your app. This will cause the defined settings to return the specified value even if they are not in the database. Make a new file in config/initializers/settings.rb with the following:
Settings.defaults[:some_setting] = 'footastic'
A use for this might be to store the current version of the database.
If one were storing database versions for schema changes it would need to reside within the database itself.
I currently analyse the schema and update accordingly but am thinking of moving to versioning. Unless someone has a better idea.
I use vb.net and sql express
Unless there are insert constraints on the table or a timestamp for versioning, this sounds like a bad idea.
There was a table set up like this in a project I inherited. It was for configuration data, and the reason that was given was that it made for very simple queries:
SELECT WidgetSize FROM ConfigTable
SELECT FooLength FROM ConfigTable
Okay fine. We converted to a generalized configuration table:
ID Name IntValue StringValue TextValue
This has served our purposes well.
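For comparison, a sketch of how lookups against the generalized table might look, again using Python's sqlite3 module. The column list is the one given above; the UNIQUE constraint on Name, the sample values and the get_int helper are my assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ConfigTable (ID INTEGER PRIMARY KEY, "
    "Name TEXT NOT NULL UNIQUE, "
    "IntValue INTEGER, StringValue TEXT, TextValue TEXT)"
)
conn.executemany(
    "INSERT INTO ConfigTable (Name, IntValue, StringValue) VALUES (?, ?, ?)",
    [("WidgetSize", 12, None), ("FooLength", 40, None), ("Theme", None, "dark")],
)


def get_int(name):
    """Look up one integer setting by name; None if it is not defined."""
    row = conn.execute(
        "SELECT IntValue FROM ConfigTable WHERE Name = ?", (name,)).fetchone()
    return row[0] if row else None


widget_size = get_int("WidgetSize")
missing = get_int("NoSuchSetting")
```

The queries are slightly longer than SELECT WidgetSize FROM ConfigTable, but adding a setting becomes an INSERT rather than a schema change.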
CREATE TABLE VERSION (VERSION_STRING VARCHAR2(20 BYTE))
?
I used a single datum in a SQLite database as a counter in a dynamic web page. That's the simplest way I can think of to make it thread-safe (or process-safe to be precise). But I am not sure whether it's a good idea.
I think the best way to deal with these scenarios is to, rather than using a database at all, use the configuration file (which is usually XML) or make your own configuration file that is read during start up of the application. It only takes a few minutes to write the code to read the file in.
The advantage here is that there is no chance of accidentally adding additional values for the same XML variable, and it's great for testing because you don't need to write a lot of code to test the different inputs, just a simple change to the text value and re-run the application.
What is the best, DBMS-independent way of generating an ID number that will be used immediately in an INSERT statement, keeping the IDs roughly in sequence?
DBMS independent? That's a problem. The two most common methods are auto incrementing columns, and sequences, and most DBMSes do one or the other but not both. So the database independent way is to have another table with one column with one value that you lock, select, update, and unlock.
Usually I say "to hell with DBMS independence" and do it with sequences in PostgreSQL, or autoincrement columns in MySQL. For my purposes, supporting both is better than trying to find out one way that works everywhere.
If you can create a Globally Unique Identifier (GUID) in your chosen programming language - consider that as your id.
They are harder to work with when troubleshooting (it is much easier to type in a where condition that is an INT) but there are also some advantages. By assigning the GUID as your key locally, you can easily build parent-child record relationships without first having to save the parent to the database and retrieve the id. And since the GUID, by definition, is unique, you don't have to worry about incrementing your key on the server.
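A short sketch of the parent-child point, in Python with the standard uuid module. The Order and OrderLine classes and the sku values are hypothetical; the idea is only that both keys exist before anything is sent to the database.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class Order:
    # The key is assigned locally, with no round trip to the database.
    id: uuid.UUID = field(default_factory=uuid.uuid4)


@dataclass
class OrderLine:
    order_id: uuid.UUID
    sku: str
    id: uuid.UUID = field(default_factory=uuid.uuid4)


order = Order()
# Children can reference the parent's key before either is saved.
lines = [OrderLine(order.id, "A-1"), OrderLine(order.id, "B-2")]
```

The whole graph can then be inserted in one batch, in any order the foreign keys allow.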
There is auto increment or sequence
What is the point of this, that is the least of your worries?
How will you handle SQL itself?
MySQL has Limit,
SQL Server has Top,
Oracle has Rank
Then there are a million other things like triggers, alter table syntax etc etc
Yep, the obvious ways in raw SQL (and in my order of preference) are a) sequences b) auto-increment fields. The better, more modern, more DBMS-independent way is to not touch SQL at all, but to use a (good) ORM.
There's no universal way to do this. If there were, everyone would use it. SQL by definition abhors the idea - it's an antipattern for set-based logic (although a useful one, in many real-world cases).
The biggest problem you'd have trying to interpose an identity value from elsewhere is when a SQL statement involves several records, and several values must be generated simultaneously.
If you need it, then make it part of your selection requirements for a database to use with your application. Any serious DBMS product will provide its own mechanism to use, and it's easy enough to code around the differences in DML. The variations are pretty much all in the DDL.
I'd always go for the DB-specific solution, but if you really have to, the usual way of doing this is to implement your own sequence. Your RDBMS has to support transactions.
You create a sequence table which contains an int column and seed this with the first number, your transaction logic then looks something like this
declare @myID int
begin transaction
update tblSeq set intID = intID + 1
select @myID = intID from tblSeq
insert into tblData (intID, ...) values (@myID, ...)
commit transaction
The transaction forces a write lock such that the next queued insert cannot update the tblSeq value before the record has been inserted into tblData. So long as all inserts go through this transaction, your generated IDs are in sequence.
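Here is the same pattern as a runnable sketch on SQLite via Python's sqlite3 module. BEGIN IMMEDIATE is SQLite's way of taking the write lock up front; other databases would use their own locking syntax. The payload column and the insert_with_next_id helper are made up for the example.

```python
import sqlite3

# isolation_level=None puts the connection in autocommit mode so that the
# BEGIN / COMMIT statements below are issued exactly as written.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE tblSeq (intID INTEGER NOT NULL)")
conn.execute("INSERT INTO tblSeq VALUES (0)")  # seed value
conn.execute("CREATE TABLE tblData (intID INTEGER PRIMARY KEY, payload TEXT)")


def insert_with_next_id(payload):
    conn.execute("BEGIN IMMEDIATE")  # take the write lock up front
    try:
        conn.execute("UPDATE tblSeq SET intID = intID + 1")
        (new_id,) = conn.execute("SELECT intID FROM tblSeq").fetchone()
        conn.execute(
            "INSERT INTO tblData (intID, payload) VALUES (?, ?)",
            (new_id, payload),
        )
        conn.execute("COMMIT")
    except Exception:
        # Undo the increment too, so a failed insert does not burn an id.
        conn.execute("ROLLBACK")
        raise
    return new_id


ids = [insert_with_next_id(p) for p in ("a", "b", "c")]
```

Because the increment and the insert commit together, a rolled-back insert leaves no gap, at the cost of serializing every insert through the one-row table.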
Use an auto-incrementing id column.
Is there really a reason that they have to be in sequence? If you're just using it as an ID, then you should just be able to use part of a UUID or the first couple digits of md5(now()).
You could take the time and massage it. It'd be the equivalent of something like
DateTime.Now.Ticks
So it would be something like YYYYMMDDHHMMSSSS
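A possible reading of that format in Python, taking the last two digits as hundredths of a second (that interpretation, and the time_based_id name, are my assumptions). Two calls within the same hundredth of a second would collide, so this only gives "roughly in sequence", not uniqueness.

```python
from datetime import datetime


def time_based_id(moment: datetime) -> int:
    """Format a datetime as YYYYMMDDHHMMSS plus hundredths of a second."""
    hundredths = moment.microsecond // 10_000
    return int(f"{moment:%Y%m%d%H%M%S}{hundredths:02d}")


example = time_based_id(datetime(2024, 1, 2, 3, 4, 5, 670_000))
now_id = time_based_id(datetime.now())
```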
It may be a bit of a lateral approach, but a good ORM-type library will probably be able to at least hide the differences. For example, in Ruby there is ActiveRecord (commonly used in, but not exclusively tied to, the Ruby on Rails web framework), which has Migrations. Within a table definition, which is declared in platform-agnostic code, implementation details such as datatypes, sequential id generation and index creation are pushed down below your vision.
I have transparently developed a schema on SQLite, then implemented it on MS SQL Server and later ported to Oracle. Without ever changing the code that generates my schema definition.
As I say, it may not be what you're looking for, but the easiest way to encapsulate what varies is to use a library that has already done the encapsulation for you.
With only SQL, the following could be one of the approaches:
1. Create a table to contain the starting id for your needs.
2. When the application is deployed for the first time, the application should read the value into its context.
3. Thereafter, increment the id (in thread-safe fashion) as required, and either:
3.1 Write the id to the database (in thread-safe fashion), which always keeps the updated value, or
3.2 Don't write it to the database, just keep incrementing in memory (in thread-safe manner).
4. If for any reason the server is going down, write the current id value to the database.
5. When the server is up again, it will pick up from where it left off the last time.
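The steps above, in the in-memory variant (3.2), could be sketched like this in Python with sqlite3 and a threading.Lock. The id_seed table, the IdAllocator class and the seed value 100 are hypothetical.

```python
import sqlite3
import threading


class IdAllocator:
    """Hands out ids from memory; the database only stores the last value."""

    def __init__(self, conn):
        self._conn = conn
        self._lock = threading.Lock()
        (self._current,) = conn.execute(
            "SELECT last_id FROM id_seed").fetchone()

    def next_id(self):
        with self._lock:  # only one thread may increment at a time
            self._current += 1
            return self._current

    def save(self):
        """Persist the current value, e.g. during a clean shutdown."""
        with self._lock:
            self._conn.execute(
                "UPDATE id_seed SET last_id = ?", (self._current,))
            self._conn.commit()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE id_seed (last_id INTEGER NOT NULL)")
conn.execute("INSERT INTO id_seed VALUES (100)")

allocator = IdAllocator(conn)
first_batch = [allocator.next_id() for _ in range(3)]
allocator.save()

restarted = IdAllocator(conn)  # simulates the server coming back up
after_restart = restarted.next_id()
```

The weak spot is an unclean shutdown: if save() never runs, the stored value is stale and ids would be reissued, which is the trade-off against writing on every increment (3.1).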
I have a table (T1) in t-sql with a column (C1) that contains almost 30,000 rows of data.
Each row contains a value like MSA123, MSA245, MSA299, etc. I need to run an update script so the MSA part of the string changes to CMA. How can I do this?
update t1
set c1 = replace(c1, 'MSA', 'CMA')
where c1 like 'MSA%'
I don't have SQL Server in front of me, but I believe that this will work:
UPDATE T1 SET C1 = REPLACE(C1, 'MSA', 'CMA');
You can use REPLACE function to do it.
In addition to what fallen888 posted, if there are other values in that table/column as well you can use the LIKE operator in the where clause to make sure you only update the records you care about:
... WHERE [C1] LIKE 'MSA[0-9][0-9][0-9]'
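Before running an update like this it can help to list which rows the pattern actually matches. The sketch below does that on SQLite through Python's sqlite3 module, where GLOB supports the same [0-9] character classes that LIKE does in T-SQL; the sample values other than MSA123 and MSA245 are invented edge cases. It also rewrites only the three-character prefix instead of calling REPLACE, which is a variation on the answers above rather than what they posted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE T1 (C1 TEXT)")
conn.executemany(
    "INSERT INTO T1 VALUES (?)",
    [("MSA123",), ("MSA245",), ("CMSA77",), ("MSAD01",), ("XYZ-MSA",)],
)

# SQLite's GLOB plays the role of T-SQL's LIKE with character classes here.
pattern = "MSA[0-9][0-9][0-9]"
would_change = sorted(r[0] for r in conn.execute(
    "SELECT C1 FROM T1 WHERE C1 GLOB ?", (pattern,)))

# Rewrite only the three-character prefix of the rows that matched.
conn.execute(
    "UPDATE T1 SET C1 = 'CMA' || substr(C1, 4) WHERE C1 GLOB ?", (pattern,))
after = sorted(r[0] for r in conn.execute("SELECT C1 FROM T1"))
```

Values that merely contain MSA somewhere, or that do not follow the three-digit shape, are left untouched.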
While replace will appear to work, what happens when you need to replace M with C for MSA but not for MCA? Or if you have MSAD as well as MSA in the data right now and you didn't want that changed (or CMSA). Do you even know for sure if you have any data being replaced that you didn't want replaced?
The proper answer is never to store data that way. First rule of database design is to only store one piece of information per field. You should have a related table instead. It will be easier to maintain over time.
I have to disagree with HLGEM's post. While it is true that the first normal form talks about atomicity in E.F. Codd's original vision (it is the most controversial aspect of 1NF IMHO), the original request does not necessarily mean that there are no related tables or that the value is not atomic.
MSA123 may be the natural key of the object in question and the company may have simply decided to rename their product line. It is correct to say that if an artificial ID was used then even updates to the natural key would not require as many rows to be updated but if that is what you are implying then I would argue that artificial keys are definitely not the first rule of database design. They have their advantages but they also have many disadvantages which I won't go into here but a little googling would turn up quite a bit of controversy on whether or not to use artificial primary keys.
As for the original request, others have already nailed it, REPLACE is the way to go.