Table and column name changes in code after a database switch to PostgreSQL - ruby-on-rails-3

We have a very old application with multiple databases. Recently we decided to switch from MSSQL to PostgreSQL. During the process we decided to keep all the table and column names in lower case. This affects our code a lot, and we want to take a minimal-change approach.
Problems:
1. Table name changes. We thought of overriding the getter and setter for table_name in each model to avoid changes in many places; creating a module and including it in all the models was one option (sketched below). But our setter never gets control, because the "ar-octopus" gem hijacks the setter for table_name, so this approach fails.
2. We tried mapping dynamic methods like "find_by_UserID" to "find_by_userid" by overriding "method_missing" (also sketched below). But dynamic_matchers in ActiveRecord has marked the method as private, so this doesn't work either.
3. The only option now seems to be refactoring the whole codebase to suit the table and column name changes. But we have used raw queries, direct column access like #person["Name"] in views, and many dynamic finders like those mentioned in point 2. So even after refactoring the whole codebase and testing it completely, we couldn't be sure whether all the code had been updated properly.
We want refactoring to be the last option, but we don't know of a better way.
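For reference, the two approaches we tried look roughly like this (a minimal sketch; model names are illustrative):

# Point 1: force lowercase table names via a shared module.
module LowercaseTableName
  def self.included(base)
    base.extend ClassMethods
  end

  module ClassMethods
    def table_name
      super.downcase   # the setter variant is hijacked by ar-octopus, so this fails
    end
  end
end

class Person < ActiveRecord::Base
  include LowercaseTableName

  # Point 2: downcase dynamic finders such as find_by_UserID before
  # re-dispatching; blocked because dynamic_matchers is private.
  def self.method_missing(name, *args, &block)
    if name.to_s =~ /\Afind_by_/i
      super(name.to_s.downcase.to_sym, *args, &block)
    else
      super
    end
  end
end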

Related

Can we add a systematic WHERE clause to Entity Framework when it queries a DB table (i.e. soft delete)?

I have just added an IsValidRecord column to a MyClass SQL table.
It will be used as a logical delete / soft delete.
Now I need to update my application to only query the valid records based on the new column.
I use Entity Framework, database first.
Our app uses a business layer that centralizes all methods fetching the MyClass items.
So I have updated all the methods that query the concerned table with the appropriate filter based on IsValid.
It works fine.
However, I am pretty sure this is bad practice, because devs will forget to set this filter on new methods added in the future, which will obviously return incorrect records.
I wonder if EF has features to automatically filter the queries with the appropriate "AND IsValid = 1" filter?
I used to work for a company that did the same with NHibernate.
The only supported feature that I have seen for EF is this:
Soft Delete
Unfortunately, it overrides OnModelCreating, so I take it that it only works for a Code First architecture.
We use Database First, so I think it does not work, as OnModelCreating is never called?
I would normally implement this filter using application-specific views in the database (after all, some uses of this data may need to be able to see deleted items).
With a simple definition, the views should automatically be considered updatable by SQL, so you shouldn't need to write triggers to manage INSERT/UPDATE/DELETE operations. You then lie to Entity Framework about what its "tables" are, and it should mostly be none the wiser.
Depending on how you want the soft-delete to work, you may choose to hide the existence of the IsValidRow column (nit: we have rows in SQL, not records) in this view and implement an INSTEAD OF DELETE trigger on the view allowing your application to soft delete these rows by asking EF to remove them.
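For illustration, on SQL Server the shape might be roughly this (the data column is a placeholder; an untested sketch):

-- Expose only valid rows; the flag column is deliberately not exposed.
CREATE VIEW dbo.MyClassView
AS
SELECT Id, SomeDataColumn
FROM dbo.MyClass
WHERE IsValidRecord = 1;
GO

CREATE TRIGGER dbo.MyClassView_InsteadOfDelete
ON dbo.MyClassView
INSTEAD OF DELETE
AS
BEGIN
    -- Soft delete: flip the flag instead of removing the row.
    UPDATE mc
    SET IsValidRecord = 0
    FROM dbo.MyClass mc
    JOIN deleted d ON d.Id = mc.Id;
END;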
The best link I have found is this:
EDMX Mapping
Use the EDMX designer to add the filter condition. It's basically exactly what I want...
Are there any down sides for this solution?
At first sight, it sounds good enough to me.
The only disadvantage I can think of is that the filter is well hidden. Other devs in the future might have a very hard time figuring out why / where / how the entities are filtered.

Remove a database field when deleting property from a class using datamapper

I am using datamapper in a Sinatra application. I currently use the command
DataMapper.finalize.auto_upgrade!
to handle the migrations. I had two classes (Artist and Event) with a 'has_n' and 'belongs_to' association: an Event 'belonged_to' one Artist, and an Artist could have many Events associated with it.
I changed the association to be a many_to_many relationship by deleting the previous parts of the class definition which governed the original one_to_many association in the models and adding
has n, :artists, :through => Resource
to the Event class and the corresponding code to the Artist class. When I make a new Event, an error is kicked off.
#<DataObjects::IntegrityError: events.artist_id may not be NULL
The :artist_id field is a relic of the original association between the two classes. The new many_to_many association is accessed by event.artists[i] (where 'i' is just an integer index going from 0 to the number of associated artists -1). Apparently the original association method between the Artist and Event classes is still there? My guess is the solution to this is to not just use the auto_upgrade method built into datamapper but rather to write an explicit migration. If there is a way to handle this type of change to a database and still have the auto_upgrade method work, that would be great!
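For reference, the two models now look roughly like this (a sketch; other properties trimmed):

class Artist
  include DataMapper::Resource
  property :id, Serial
  has n, :events, :through => Resource
end

class Event
  include DataMapper::Resource
  property :id, Serial
  has n, :artists, :through => Resource
end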
If you need more details about my models or anything please ask and I'll gladly add them.
In my experience, DataMapper's auto_upgrade does not work very well -- or, to say the least, it doesn't work the way I expect it to. If you want to add a new column to your model, it will do what it should; try to do anything more sophisticated to a column and it probably won't behave as you expect.
For example, if you create a property of type String, it will initially have a length of 50 characters. If you notice that 50 characters is not enough to hold your string, adding :length => 100 to the model won't be enough to make auto_upgrade change the column's width.
It seems you have stumbled upon another shortcoming, although one may argue that, in your case, maybe DataMapper's behavior isn't that bad (think of legacy databases). But the fact is that, when you changed the association, the Event's artist_id column wasn't removed, and then when you try to save an Event, you'll get an error because the database says it is a required field.
Notice that the error you are getting is not a validation error: DataMapper thinks everything looks ok, but gets an error from the database when trying to save the object.
Hope this helps!
Auto-upgrade is not a shortcoming at all! I think of auto-upgrade as a convenience feature of DataMapper. Its only intended purpose is to add columns for you, as far as I know. So it is great for getting a project started quickly, and for managing test and dev environments without having to write migrations, but it is not the best tool for making modifications to a mature, live project.
For that, DataMapper does have migrations! Use the dm-migrations gem. The shortcoming is that they are not very well documented... at all. I'm actually working on switching a current project of mine over to migrations, and I hope to contribute some instructions to the dm-migrations GitHub wiki. But if you aren't ready to switch to migrations, you can also just update columns manually using an SQL client and continue to use auto-upgrade for new columns. That's what I have been doing for 2 years on my project :)
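For example, removing a leftover column like the artist_id above might look something like this with dm-migrations (a rough sketch; I haven't run this against your schema):

require 'dm-migrations'
require 'dm-migrations/migration_runner'

# Drop the relic artist_id column from events.
migration 1, :drop_artist_id_from_events do
  up do
    modify_table :events do
      drop_column :artist_id
    end
  end

  down do
    modify_table :events do
      add_column :artist_id, Integer
    end
  end
end

migrate_up!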

Same business entity for identical tables?

I've got a legacy database which has about 10 identical tables (only the names differ).
Is it possible to use the same business entity for all the tables without having to create several classes/mapping files?
You can use the entity-name feature if you are using NHibernate v2.1 or higher. It is poorly documented, but I am actively using it. The documentation has become hard to find, but look here:
Section 5.3 in
http://docs.jboss.org/hibernate/core/3.2/reference/en/html/mapping.html#mapping-entityname
A couple of things to be aware of: you must now use the entity name instead of the class name to refer to the objects. In general, moving from class names to entity names is not an entirely transparent change.
Session actions now require two parameters, for example:
_session.Save("MyEntity", myobject)
The entity-name controls what table the data goes into.
Some HQL queries no longer work correctly; sometimes you must use Criteria instead.
If you need a set of sample code I may be able to post some, but I'm far too busy at the moment. I suggest you look at the limited info you can find and set it up for a very simple object and multiple tables to learn how it all works. It does work.
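For orientation only, a two-table mapping over one class has roughly this shape in an hbm.xml file (all names here are made up; trimmed to the essentials):

<!-- Inside the usual <hibernate-mapping> root element -->
<class entity-name="EntityA" name="MyApp.MyEntity, MyApp" table="TableA">
  <id name="Id"><generator class="native" /></id>
  <property name="Name" />
</class>
<class entity-name="EntityB" name="MyApp.MyEntity, MyApp" table="TableB">
  <id name="Id"><generator class="native" /></id>
  <property name="Name" />
</class>

With that in place, _session.Save("EntityA", myobject) writes to TableA and _session.Save("EntityB", myobject) writes to TableB.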
You can create a base class with all the properties, but you still need to map them all.
For that, you can either use copy & paste, XML entities (see the example at http://nhibernate.info/doc/nh/en/index.html#inheritance-tableperconcreate-polymorphism), or a code-based mapping method (Fluent NHibernate or ConfORM). These usually make reuse easier.

Getting rid of hard coded values when dealing with lookup tables and related business logic

Example case:
We're building a renting service, using SQL Server. Information about items that can be rented is stored in a table. Each item has a state that can be either "Available", "Rented" or "Broken". The different states reside in a lookup table.
ItemState table:
id name
1 'Available'
2 'Rented'
3 'Broken'
Adding to this, we have a business rule which states that whenever an item is returned, its state is changed from "Rented" to "Available".
This could be done with an update statement like "update Items set state=1 where id=#itemid". In application code we might have an enum that maps to the ItemState ids. However, these contain hard-coded values that could lead to maintenance issues later on, say if a developer were to change the set of states but forgot to fix the related business logic layer...
What good methods or alternate designs are there for dealing with this type of design issues?
Links to related articles are also appreciated in addition to direct answers.
In my experience this is a case where you actually have to hard-code, preferably by using an enum whose integer values match the ids of your lookup table. I see nothing wrong with saying that "1" is always "Available", and so forth.
Most systems that I've seen hard code the lookup table values and live with it. That's because, in practice, code tables rarely change as much as you think they might. And if they ever do change, you generally need to re-compile any programs that rely on that DDL anyway.
That said, if you want to make the code maintainable (a laudable goal), the best approach would be to externalize the values into a properties file. Then you can edit this file later without having to re-code your entire app.
The limiting factor here is that your app depends for its own internal state on the value you get from the lookup table, so that implies a certain amount of coupling.
For lookups where the app doesn't rely on that code, (for instance, if your code table stores a list of two-letter state codes for use in an address drop-down), then you can lazily load the codes into an object and access them only when needed. But that won't work for what you're doing.
When you have your lookup tables as well as enums defined in the code, then you always have an issue with keeping them in sync. There is not much that can be done here. Both live effectively in two different worlds and are generally unaware of each other.
You may wish to reject lookup tables and let only your business logic operate on these values. In that case you lose the option of relying on referential integrity to back up your data integrity.
The other option is to build your application in such a way that you never need these values in your code. That means moving part of your business logic to the database layer, i.e. putting it in stored procedures and triggers. This also has the benefit of being client-agnostic: anyone can invoke the SPs and be assured the data will be kept in a consistent state, consistent with your business rules as well.
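For example, the "item returned" rule could be captured in a stored procedure so that no caller ever touches the state id (SQL Server syntax; a sketch using the tables from the question):

CREATE PROCEDURE ReturnItem @ItemId int
AS
BEGIN
    -- The state id is looked up by name, never hard-coded by the caller.
    UPDATE Items
    SET state = (SELECT id FROM ItemState WHERE name = 'Available')
    WHERE id = @ItemId;
END;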
You'll need to have some predefined value that never changes, be it an integer, a string or something else.
In your case, the numerical value of the state is the state's surrogate PRIMARY KEY which should never change in a well-designed database.
If you're concerned about the consistency, use a CHAR code: A, R or B.
However, you would have to stick to it just as with a numerical code, so that A always means Available, etc.
Your database structure should be documented just as well as the code is.
The answer depends entirely on the language you're using: solutions for this are not the same in Java, PHP, Smalltalk or even Assembler...
But let me tell you something: while it's true hard coded values are not a great thing, there are times in which you do need them. And this one is pretty much one of them: you need to declare in your code your current knowledge of the business logic, which includes these hard coded states.
So, in this particular case, I would hard code those values.
Don't overdesign it. Before trying to come up with a solution to this problem, you need to figure out whether it's even a problem. Can you think of any legit hypothetical scenario where you would change the values in the ItemState table? Not just "What if someone changes this table?" but "Someone wants to change this table in X way for Y reason; what effect would that have?". You need to stay realistic.
New state? You add a row, but it doesn't affect the existing ones.
Removing a state? You have to remove the references to it in code anyway.
Changing the id of a state? There is no legit reason to do that.
Changing the name of a state? There is no legit reason to do that.
So there really should be no reason to worry about this. But if you must have this cleanly maintainable in the case of irrational people who randomly decide to change Available to 2 because it just fits their Feng Shui better, make sure all tables are generated via a script which reads these values from a configuration file, and then make sure all code reads constants from that same configuration file. Then you have one definition location and any time you want to change the value you modify that configuration file instead of the DB/code.
I think this is a common problem and a valid concern, that's why I googled and found this article in the first place.
What about creating a public static class to hold all the lookup values, but instead of hard-coding them, we initialize these values when the application is loaded and use names to refer to them?
We tried this in my application, and it worked. You can also do some checking, e.g. that the number of possible values of a lookup in code is the same as in the db, and log/email/etc. if not. But I don't want to manually code this for the statuses of 40+ biz entities.
Moreover, this can be seen as part of the bigger problem of OR mapping: we're exposed to too many details of the persistence layer and thus have to take care of them. With technologies like Entity Framework, we don't need to worry about the "sync" part because it's automated, am I right?
Thanks!
I've used a similar method to what you're describing - a table in the database with values and descriptions (useful for reporting, etc.) and an enum in code. I've handled the synchronization with a comment in code saying something like "these values are taken from table X in database ABC" so that the programmer knows the database needs to be updated. To prevent changes from the database side without the corresponding changes in code I set permissions on the table so that only certain people (who hopefully remember they need to change the code as well) have access.
The values have to be hard-coded, which effectively means that they can't be changed in the database, which means that storing them in the database is redundant.
Therefore, hard-code them and don't have a lookup table in the database. Instead, store the item's state directly in the Items table.
You can structure your database so that your application doesn't actually have to care about the codes themselves, but rather the business rules behind them.
I have done both of the following:
Do one or more of your codes have a certain characteristic, such as IsAvailable, that the application cares about? If so, add it as a flag column to the code table, where those that match are set to true (or your DB's equivalent), and those that don't are set to false.
Do you need to use a specific, single code under a certain condition? You can create a singleton table, named something like EnvironmentSettings, with a column such as ItemStateIdOnReturn that's a foreign key to the ItemState table.
If I wanted to avoid declaring an enum in the application, I would use #2 to address the example in the question.
Whether you take this approach depends on your application's priorities. This type of structure comes at the cost of additional development and lookup overhead. Plus, if every individual code comes with its own business rules, then it's not practical to create one new column per required code.
But, it may be worthwhile if you don't want to worry about synchronizing your application with the contents of a code table.
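In SQL terms, the two structures above might look roughly like this (column and table names are illustrative):

-- #1: a flag column for the characteristic the application cares about.
ALTER TABLE ItemState ADD IsAvailable bit NOT NULL DEFAULT 0;
UPDATE ItemState SET IsAvailable = 1 WHERE name = 'Available';

-- #2: a singleton settings table pointing at the code to use on return.
CREATE TABLE EnvironmentSettings (
    LockId int NOT NULL PRIMARY KEY CHECK (LockId = 1),  -- forces a single row
    ItemStateIdOnReturn int NOT NULL REFERENCES ItemState (id)
);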

What's the best way to deprecate a column in a database schema?

After reading through many of the questions here about DB schema migration and versions, I've come up with a scheme to safely update DB schema during our update process. The basic idea is that during an update, we export the database to file, drop and re-create all tables, and then re-import everything. Nothing too fancy or risky there.
The problem is that this system is somewhat "viral", meaning that it is only safe to add columns or tables, since removing them would cause problems when re-importing the data. Normally, I would be fine just ignoring these columns, but the problem is that many of the removed items have actually been refactored, and the presence of the old ones in the code fools other programmers into thinking that they can use them.
So, I would like to find a way to be able to mark columns or tables as deprecated. In the ideal case, the deprecated objects would be marked while updating the schema, but then during the next update our backup script would simply not SELECT the objects which have been marked in this way, allowing us to eventually phase out these parts of the schema.
I have found that MySQL (and probably other DB platforms too, but this is the one we are using) supports a COMMENT attribute on both columns and tables. This would be perfect, except that I can't figure out how to actually use it in a meaningful manner. How would I go about writing an SQL query to get all column names whose comments do not contain the word "deprecated"? Or am I looking at this problem all wrong and missing a much better way to do this?
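For concreteness, I imagine the backup script would need something along these lines against information_schema (untested; the schema name is a placeholder):

SELECT table_name, column_name
FROM information_schema.COLUMNS
WHERE table_schema = 'my_database'
  AND column_comment NOT LIKE '%deprecated%';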
Maybe you should refactor to use views over your tables, where the views never include the deprecated columns.
"Deprecate" usually means (to me at least) that something is marked for removal at some future date, should not used by new functionality and will be removed/changed in existing code.
I don't know of a good way to "mark" a deprecated column, other than to rename it, which is likely to break things! Even if such a facility existed, how much use would it really be?
So do you really want to deprecate or remove? From the content of your question, I'm guessing the latter.
I have the nasty feeling that you may be in one of those "if I wanted to get to there I wouldn't start from here" situations. However, here are some ideas that spring to mind:
Read Recipes for Continuous Database Integration which seems to address much of your problem area
Drop the column explicitly. In MySQL 5.0 (and even earlier?) the facility exists as part of DDL: see the ALTER TABLE syntax.
Look at how ActiveRecord::Migration works in Ruby. A migration can include the "remove_column" directive, which will deal with the problem in a platform-appropriate way (see the sketch after this list). It definitely works with MySQL, from personal experience.
Run a script against your export to remove the column from the INSERT statements, both column and values lists. Probably quite viable if your DB is fairly small, which I'm guessing it must be if you export and re-import it as described.
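For the ActiveRecord idea above, the migration shape is roughly this (table and column names are placeholders):

class RemoveDeprecatedColumn < ActiveRecord::Migration
  def self.up
    remove_column :widgets, :old_unused_column
  end

  def self.down
    # Restores the column, not the data that was in it.
    add_column :widgets, :old_unused_column, :string
  end
end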