What is the best practice when working with an unmanaged programming pattern - SQL

I have a query
UPDATE dbo.M_Room
SET
    -- do something
WHERE PK_RoomId = @RoomId AND IsActive = 1 AND FK_DepartmentId = @DepartmentId
Now suppose PK_RoomId is the primary key of M_Room and is an auto-incremented field.
So, given that, I could have used
WHERE PK_RoomId = @RoomId
rather than
WHERE PK_RoomId = @RoomId AND IsActive = 1 AND FK_DepartmentId = @DepartmentId
What threats could I avoid by using the second condition rather than the first one?
Suppose no relationships/constraints (PK, FK, etc.) physically exist and cannot be implemented because of the unmanaged structure of the database.
What would your recommendation be in such a scenario?
What should be done to keep the data consistent?

I don't think it's a good idea to change the WHERE clause to just WHERE PK_RoomId = @RoomId. The first part (the part you want to keep) identifies the record. The second part (AND IsActive = 1) restricts the update to rooms that are active. The last part (AND FK_DepartmentId = @DepartmentId) could mean that sometimes you only want to update the room if it belongs to the department you specified. This could also be useful.
Why exactly would you want to change the query?

If you are using READ UNCOMMITTED transactions or no transactions at all, or the data has been sitting around on someone's screen for a long time, the additional conditions could save you from a lost update, presuming that your -- do something does something to the IsActive column.
It could also be a final guard against simply getting it wrong (seeing that the room isn't active and then forgetting to make use of the fact).
Make sure to check the number of rows updated in either case.
Your second-last paragraph suggests the room_id may not be unique when it is supposed to be; you will always have trouble if that's the case.
Myself, I'd be inclined to check explicitly for lost updates if I suspected they might occur, and I'd consider that form of defence against programming errors unusual.
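To make the row-count check concrete, here is a hedged T-SQL sketch (assuming @RoomId and @DepartmentId are procedure parameters and that the -- do something deactivates the room):
UPDATE dbo.M_Room
SET IsActive = 0
WHERE PK_RoomId = @RoomId AND IsActive = 1 AND FK_DepartmentId = @DepartmentId;

IF @@ROWCOUNT = 0
BEGIN
    -- zero rows means the room was already inactive, belongs to another
    -- department, or the id does not exist; raise or log rather than
    -- silently succeeding
    RAISERROR('Room %d was not updated; its state may have changed.', 16, 1, @RoomId);
END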

Related

How to prevent a derived value (SUM) from being manually updated to an incorrect value

I am learning SQL and DB design for a college class. One assignment that was given to us is to create a table with a derived attribute which is the SUM of some child attributes. For example:
ORDERS
orderID {PK}
/orderTotal /* derived from SUM of child itemTotals */
ITEMS
itemNo {PK}
orderID {FK}
itemTotal
Now, I am not even sure this is good practice. From some reading I've done on the web, derived values should not be stored, but rather calculated by user applications. I can understand that perspective, but in this instance my assignment is to store derived values and more importantly to maintain their integrity via triggers, which are relatively new to me so I am enjoying using them. I'd also imagine in some more complex cases that it really would be worth the saved processing time to store derived values. Here are the safeguards I've put in place which are NOT giving me problems:
A trigger which updates parent /orderTotal when new child item is inserted.
A trigger which updates parent /orderTotal when child item is deleted.
A trigger which updates parent /orderTotal when child itemTotal is modified.
However, there is another safeguard I want which I cannot figure out how to accomplish. Since the parent attribute /orderTotal is derived, it should never be manually modified. If somebody does attempt to manually modify it (to an erroneous value which is not actually the correct SUM), I want to either (a) prevent them from doing this or (b) revert it to its old value as soon as they are done.
Which is the better approach, and which is possible (and how)? I am not sure how to accomplish the former, and I tried to accomplish the latter via either a trigger or a constraint, but neither seemed appropriate. The trigger method kept giving me an ORA-04091 error for attempting to mutate the table which fired the trigger. As for the constraint method, I don't think it is appropriate either, since I'm not sure how to express such a specific check inside a constraint.
I am using Oracle SQL by the way, in SQL Developer.
Thanks!
"Now, I am not even sure thise is good practice."
Your intuition is right: this is bad practice. For some reason, a lot of college professors set their students the task of writing poor code; this wouldn't be so bad if they at least explained that it is bad practice and should never be used in the real world. But then I guess most professors have only a limited grasp on what matters in the real world. sigh.
Anyhoo, to answer your question. There are two approaches to this. One would be to use a trigger to "correct", i.e. swallow, the change. This would be wrong, because the user trying to modify the value would probably waste a lot of time trying to discover why their change wasn't sticking, without realising they were breaking a business rule. So it's much better to hurl an exception.
This example uses Oracle syntax, because I'm guessing that's what you're using.
create or replace trigger order_header_trg
    before insert or update on order_header
    for each row
begin
    if :new.order_total != :old.order_total
    then
        raise_application_error
            (-20000, 'You are not allowed to modify the value of ORDER_TOTAL');
    end if;
end;
/
The only problem with this approach is that it will prevent you inserting rows into ORDER_LINES and then deriving a new total for ORDER_HEADER.
This is one reason why denormalised totals are Bad Practice.
The error you're getting - ORA-04091 - says "mutating table". This happens when we attempt to write a trigger which selects from the table which owns the trigger. It almost always points to a poor data model, one which is insufficiently normalised. This is obviously the case here.
Given that you are stuck with the data model, the only workaround is a clunky implementation using multiple triggers and a package. The internet offers various slightly different solutions: here is one.
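On recent Oracle versions (11g and later), the multi-trigger-plus-package pattern can be folded into a single compound trigger. A hedged sketch, assuming the ORDERS/ITEMS tables from the question (with columns orderID, itemTotal, and orderTotal):
CREATE OR REPLACE TRIGGER items_totals_trg
FOR INSERT OR UPDATE OR DELETE ON items
COMPOUND TRIGGER
    -- state declared here is visible to all timing sections below
    TYPE t_ids IS TABLE OF items.orderID%TYPE;
    g_ids t_ids := t_ids();

    AFTER EACH ROW IS
    BEGIN
        g_ids.EXTEND;
        g_ids(g_ids.LAST) := NVL(:NEW.orderID, :OLD.orderID);
    END AFTER EACH ROW;

    AFTER STATEMENT IS
    BEGIN
        -- the mutating-table restriction does not apply at statement level
        FOR i IN 1 .. g_ids.COUNT LOOP
            UPDATE orders o
            SET o.orderTotal = (SELECT NVL(SUM(itemTotal), 0)
                                FROM items
                                WHERE orderID = g_ids(i))
            WHERE o.orderID = g_ids(i);
        END LOOP;
    END AFTER STATEMENT;
END items_totals_trg;
/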
One solution might be to move all the logic for maintaining the orderTotal into an INSTEAD OF UPDATE trigger on the ORDERS table.
The triggers you already have on ITEMS can be simplified to update ORDERS without making a calculation - setting orderTotal to 0 or something like that. The INSTEAD OF trigger will run in place of the update statement.
If an attempt is made to manually update the order total, the INSTEAD OF trigger will fire and re-calculate the existing value.
PS - I don't have an Oracle DB to test on, but from what I can tell, an INSTEAD OF trigger will get around the ORA-04091 error - apologies if this is wrong.
EDIT
Whilst this solution would work in some other RDBMSs, Oracle only supports INSTEAD OF triggers on views.
EDIT 2
If the assignment allows it, a view could be a suitable solution to this problem, since this would enable the order value column to be calculated.
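A hedged Oracle sketch of that view, assuming ORDERS itself no longer stores the total:
CREATE OR REPLACE VIEW orders_v AS
SELECT o.orderID,
       NVL(SUM(i.itemTotal), 0) AS orderTotal  -- derived on read, so it can never drift
FROM orders o
LEFT JOIN items i ON i.orderID = o.orderID
GROUP BY o.orderID;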

Reserve Id Without Insert

Sorry for my question, but I need an answer. How do I reserve an id in the database without inserting the record?
I don't know what id generator you use or what problem you actually need to solve. Roughly, these are your options:
insert a placeholder (null-object)
use Guids instead (guid.comb), you won't get into this problem at all.
write your own IdentityGenerator which allows reserved ids (there are again many possibilities to implement this, it depends on your needs)
This may be helpful for your needs. I'm not sure if there's a way to reserve an Id through NHibernate, but the idea behind the Hi/Lo approach in that link is to split the Id into two values and allow a client to work disconnected by reserving an entire range of values, thus being able to set its own keys before inserting (re-syncing).
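For context, a hedged T-SQL sketch of the hi/lo idea (hibernate_unique_key and next_hi are NHibernate's usual defaults, but treat the names and block size as assumptions):
DECLARE @hi INT;
-- atomically claim one 'hi' block
UPDATE hibernate_unique_key
SET @hi = next_hi, next_hi = next_hi + 1;
-- this client may now hand out ids @hi * 32768 + 0 through @hi * 32768 + 32767
-- without touching the database again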

SQL Best Practices - Ok to rely on auto increment field to sort rows chronologically?

I'm working with a client who wants to add timestamps to a bunch of tables so that they may sort the records in those tables chronologically. All of the tables also have an auto incrementing integer field as their primary key (id).
The (simple) idea - save the overhead/storage and rely on the primary key to sort the records chronologically. Sure this works, but I'm uncertain whether this approach is acceptable in sound database design.
Pros: less storage required per record, simpler VO classes, etc. etc.
Con: it implies a characteristic of that field, an otherwise simple identifier, whose definition does not in any way define or guarantee that it should/will function as such.
Assume for the sake of my question that the DB table definitions are set in stone. Still - is this acceptable in terms of best practices?
Thanks
You asked for "best practices", rather than "not terrible practices" so: no, you should not rely on an autoincremented primary key to establish chronology. One day you're going to introduce a change to the db design and that will break. I've seen it happen.
A datetime column whose default value is GETDATE() has very little overhead (about as much as an integer) and (better still) tells you not just sequence but actual date and time, which often turns out to be priceless. Even maintaining an index on the column is relatively cheap.
These days, I always put a CreateDate column on data objects connected to real-world events (such as account creation).
Edited to add:
If exact chronology is crucial to your application, you can't rely on either auto-increment or timestamps (since there can always be identical timestamps, no matter how high the resolution). You'll probably have to make something application-specific instead.
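For the datetime route, a minimal T-SQL sketch (the table name dbo.Orders is illustrative):
ALTER TABLE dbo.Orders
ADD CreateDate DATETIME NOT NULL
    CONSTRAINT DF_Orders_CreateDate DEFAULT (GETDATE());

-- sort by the timestamp, with the identity column only as a tie-breaker
SELECT * FROM dbo.Orders ORDER BY CreateDate, id;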
Further to egrunin's answer, a change to the persistence or processing logic of these rows may cause rows to be inserted into the database in a non-sequential or nondeterministic manner. You may implement a parallelized file processor that throws a row into the DB as soon as the thread finishes transforming it, which may be before another thread has finished processing a row that occurred earlier in the file. Using an ORM for record persistence may result in a similar behavior; the ORM may just maintain a "bag" (unordered collection) of object graphs awaiting persistence, and grab them at random to persist them to the DB when it's told to "flush" its object buffer.
In either case, trusting the autoincrement column to tell you the order in which records came into the SYSTEM is bad juju. It may or may not be able to tell you the order in which records hit the DATABASE; that depends on the DB implementation.
You can achieve the same goal in the short term by sorting on the ID column. This would be better than adding additional data to achieve the same result. I don't think it would be confusing for anyone to look at the data table and know that it's chronological when they see that it's an identity column.
There are a few drawbacks or limitations that I see however.
The chronological sort can be messed up if someone re-seeds the column
Chronology for a date period cannot be ascertained without the additional data
This setup prevents you from sorting chronologically if the system ever accepts new, non-chronological data
Based on the realistic evaluation of these "limitations" you should be able to advise a proper approach.
Auto-incrementing ID will give you an idea of order as Brad points out, but do it right - if you want to know WHEN something was added, have a datetime column. Then you can not only chronologically sort but also apply filters.
Don't do it. You should never rely on the actual value of your ID column. Treat it like a black box, only useful for doing key lookups.
You say "less storage required per record," but how important is that? How big are the rows we're talking about? If you've got 200-byte rows, another 4 bytes probably isn't going to matter much.
Don't optimize without measuring. Get it working right first, and THEN optimize.
@MadBreaker
There are two separate things: if you need to know the order, you create a column with autoincrement; but if you want to know the date and time a row was inserted, you use datetime2.
Chronological order can be guaranteed if you don't allow updates or deletes, but if you want time control over selects you should use datetime2.
You didn't mention whether you are running on a single DB or clustered. If you are clustered, be wary of increment implementations, as you are not always guaranteed things will come out in the order you would naturally think. For example, Oracle sequences can cache groups of next values (depending on your setup) and give you a 1, 3, 2, 4, 5 sort of list...
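For example, a hedged Oracle sketch of forcing strictly ordered values (at a throughput cost; the sequence name is illustrative):
CREATE SEQUENCE order_seq NOCACHE ORDER;  -- no caching, values issued in request order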

How to change string data with a mass update?

I have a table (T1) in t-sql with a column (C1) that contains almost 30,000 rows of data.
That column contains values like MSA123, MSA245, MSA299, etc. I need to run an update script so the MSA part of each string changes to CMA. How can I do this?
update t1
set c1 = replace(c1, 'MSA', 'CMA')
where c1 like 'MSA%'
I don't have SQL Server in front of me, but I believe that this will work:
UPDATE T1 SET C1 = REPLACE(C1, 'MSA', 'CMA');
You can use the REPLACE function to do it.
In addition to what fallen888 posted, if there are other values in that table/column as well you can use the LIKE operator in the where clause to make sure you only update the records you care about:
... WHERE [C1] LIKE 'MSA[0-9][0-9][0-9]'
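Putting the two together, a hedged T-SQL sketch (table and column names from the question; STUFF rewrites only the three-character prefix, so values like MSAD or CMSA are left alone):
UPDATE T1
SET C1 = STUFF(C1, 1, 3, 'CMA')
WHERE C1 LIKE 'MSA[0-9][0-9][0-9]';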
While replace will appear to work, what happens when you need to replace M with C for MSA but not for MCA? Or if you have MSAD as well as MSA in the data right now and you didn't want that changed (or CMSA). Do you even know for sure if you have any data being replaced that you didn't want replaced?
The proper answer is never to store data that way. First rule of database design is to only store one piece of information per field. You should have a related table instead. It will be easier to maintain over time.
I have to disagree with HLGEM's post. While it is true that the first normal form talks about atomicity in E. F. Codd's original vision (it is the most controversial aspect of 1NF, IMHO), the original request does not necessarily mean that there are no related tables or that the value is not atomic.
MSA123 may be the natural key of the object in question, and the company may have simply decided to rename their product line. It is correct to say that if an artificial ID were used, even updates to the natural key would not require as many rows to be updated; but if that is what you are implying, I would argue that artificial keys are definitely not the first rule of database design. They have their advantages, but they also have many disadvantages which I won't go into here; a little googling will turn up quite a bit of controversy on whether or not to use artificial primary keys.
As for the original request, others have already nailed it, REPLACE is the way to go.

`active' flag or not?

OK, so practically every database-based application has to deal with "non-active" records: either soft deletions or marking something as "to be ignored". I'm curious whether there are any radical alternative thoughts on an `active' column (or a status column).
For example, if I had a list of people
CREATE TABLE people (
id INTEGER PRIMARY KEY,
name VARCHAR(100),
active BOOLEAN,
...
);
That means to get a list of active people, you need to use
SELECT * FROM people WHERE active=True;
Does anyone suggest that non-active records be moved off to a separate table, with a UNION done where appropriate to join the two?
Curiosity striking...
EDIT: I should make clear, I'm coming at this from a purist perspective. I can see how data archiving might be necessary for large amounts of data, but that is not where I'm coming from. If you do a SELECT * FROM people it would make sense to me that those entries are in a sense "active"
Thanks
You partition the table on the active flag, so that active records are in one partition, and inactive records are in the other partition. Then you create an active view for each table which automatically has the active filter on it. The database query engine automatically restricts the query to the partition that has the active records in it, which is much faster than even using an index on that flag.
Here is an example of how to create a partitioned table in Oracle. Oracle doesn't have boolean column types, so I've modified your table structure for Oracle purposes.
CREATE TABLE people
(
    id     NUMBER(10),
    name   VARCHAR2(100),
    active NUMBER(1)
)
PARTITION BY LIST(active)
(
    PARTITION active_records   VALUES (1),
    PARTITION inactive_records VALUES (0)
);
If you wanted to you could put each partition in different tablespaces. You can also partition your indexes as well.
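For instance, a hedged sketch of a locally partitioned index on that table (the index name is illustrative):
CREATE INDEX people_name_ix ON people (name) LOCAL;  -- one index partition per table partition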
Incidentally, this seems a repeat of this question; as a newbie I need to ask, what's the procedure for dealing with unintended duplicates?
Edit: As requested in comments, provided an example for creating a partitioned table in Oracle
Well, to ensure that you only draw active records in most situations, you could create views that contain only the active records. That way it's much easier not to leave out the active filter.
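A minimal sketch of such a view, reusing the people table from the question:
CREATE VIEW active_people AS
SELECT * FROM people WHERE active = True;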
We use an enum('ACTIVE','INACTIVE','DELETED') in most tables so we actually have a 3-way flag. I find it works well for us in different situations. Your mileage may vary.
Moving inactive stuff is usually a stupid idea. It's a lot of overhead with lots of potential for bugs, everything becomes more complicated, like unarchiving the stuff etc. What do you do with related data? If you move all that, too, you have to modify every single query. If you don't move it, what advantage were you hoping to get?
That leads to the next point: WHY would you move it? A properly indexed table requires one additional lookup when the size doubles. Any performance improvement is bound to be negligible. And why would you even think about it until the distant future time when you actually have performance problems?
I think looking at it strictly as a piece of data then the way that is shown in the original post is proper. The active flag piece of data is directly dependent upon the primary key and should be in the table.
That table holds data on people, irrespective of the current status of their data.
The active flag is sort of ugly, but it is simple and works well.
You could move them to another table as you suggested. I'd suggest looking at the percentage of active/inactive records. If you have over 20 or 30% inactive records, then you might consider moving them elsewhere. Otherwise, it's not a big deal.
Yes, we would. We currently have the "active='T/F'" column in many of our tables, mainly to show the 'latest' row. When a new row is inserted, the previous T row is marked F to keep it for audit purposes.
Now, we're moving to a 2-table approach, when a new row is inserted, the previous row is moved to an history table. This give us better performance for the majority of cases - looking at the current data.
The cost is slightly more than the old method: previously you had to update and insert, now you have to insert and update (i.e. instead of inserting a new T row, you modify the existing row with all the new data), so the cost is just that of passing in a whole row of data instead of passing in just the changes. That's hardly going to have any effect.
The performance benefit is that your main table's index is significantly smaller, and you can optimise your tablespaces better (they won't grow quite so much!)
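A hedged T-SQL sketch of that insert-then-update flow (table and column names illustrative):
BEGIN TRANSACTION;
-- archive the current row before it is overwritten
INSERT INTO people_history
SELECT * FROM people WHERE id = @id;
-- then modify the live row in place with the new data
UPDATE people SET name = @newName WHERE id = @id;
COMMIT;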
Binary flags like this in your schema are a BAD idea. Consider the query
SELECT count(*) FROM users WHERE active=1
Looks simple enough. But what happens when you have a large number of users, so many that adding an index to this table is required? Again, it looks straightforward:
ALTER TABLE users ADD INDEX index_users_on_active (active)
EXCEPT!! This index is useless because the cardinality of this column is exactly two! Any database query optimiser will ignore this index because of its low cardinality and do a table scan.
Before filling up your schema with helpful flags consider how you are going to access that data.
https://stackoverflow.com/questions/108503/mysql-advisable-number-of-rows
We use active flags quite often. If your database is going to be very large, I could see the value in migrating inactive values to a separate table, though.
You would then only require a union of the tables when someone wants to see all records, active or inactive.
In most cases a binary field indicating deletion is sufficient. Often there is a clean up mechanism that will remove those deleted records after a certain amount of time, so you may wish to start the schema with a deleted timestamp.
Moving off to a separate table and bringing them back up takes time. Depending on how many records go offline and how often you need to bring them back, it might or might not be a good idea.
If they mostly don't come back once they are buried, and are only used for summaries/reports/whatever, then it will make your main table smaller and your queries simpler, and probably faster.
We use both methods for dealing with inactive records. The method we use depends on the situation. For records that are essentially lookup values, we use the Active bit field. This allows us to deactivate entries so they won't be used, but also allows us to maintain data integrity with relations.
We use the "move to separation table" method where the data is no longer needed and the data is not part of a relation.
The situation really dictates the solution, methinks:
If the table contains users, then several "flag" fields could be used. One for Deleted, Disabled etc. Or if space is an issue, then a flag for disabled would suffice, and then actually deleting the row if they have been deleted.
It also depends on policies for storing data. If there are policies for keeping data archived, then a separate table would most likely be necessary after any great length of time.
No - this is a pretty common thing - couple of variations depending on specific requirements (but you already covered them):
1) If you expect to have a whole BUNCH of data - like multiple terabytes or more - not a bad idea to archive deleted records immediately - though you might use a combination approach of marking as deleted then copying to archive tables.
2) Of course the option to hard delete a record still exists - though us developers tend to be data pack-rats - I suggest that you should look at the business process and decide if there is now any need to even keep the data - if there is - do so... if there isn't - you should probably feel free just to throw the stuff away.....again, according to the specific business scenario.
From a 'purist perspective' the relational model doesn't differentiate between a view and a table - both are relations. So the use of a view with the discriminator is perfectly meaningful and valid, provided the entities are correctly named, e.g. Person/ActivePerson.
Also, from a 'purist perspective' the table should be named person, not people as the name of the relation reflects a tuple, not the entire set.
Regarding indexing the boolean, why not:
ALTER TABLE users ADD INDEX index_users_on_active (id, active) ;
Would that not improve the search?
However I don't know how much of that answer depends on the platform.
This is an old question, but for those searching on low-cardinality/selectivity indexes, I'd like to propose the following approach, which avoids partitioning, secondary tables, etc.:
The trick is to use a "dateInactivated" column that stores the timestamp of when the record is inactivated/deleted. As the name implies, the value is NULL while the record is active; once it is inactivated, write in the system datetime. An index on that column thus ends up with high selectivity as the number of "deleted" records grows, since each record will have a (nearly) unique value.
Then your query becomes:
SELECT * FROM people WHERE dateInactivated is NULL;
The index will pull in just the right set of rows that you care about.
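On the write side, a hedged sketch (GETDATE() is SQL Server's current-timestamp function; substitute your platform's equivalent):
UPDATE people SET dateInactivated = GETDATE() WHERE id = 42;  -- soft delete
UPDATE people SET dateInactivated = NULL     WHERE id = 42;  -- reactivate
CREATE INDEX ix_people_dateInactivated ON people (dateInactivated);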
Filtering data on a bit flag is not really good for performance on big tables. In cases where 'active' denotes virtual deletion, you can create a 'TableName_deleted' table with the same structure and move deleted data there using a delete trigger.
That solution will help with performance and simplify data queries.
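A hedged T-SQL sketch of that trigger (names illustrative; people_deleted mirrors the structure of people):
CREATE TRIGGER trg_people_archive
ON people
AFTER DELETE
AS
BEGIN
    -- the 'deleted' pseudo-table holds the rows just removed
    INSERT INTO people_deleted
    SELECT * FROM deleted;
END;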