Model validation logic in the database via constraints. Good idea, bad idea, or not worth it?

It's always rubbed me the wrong way to write code in my model's clean method to validate various constraints on the data when these same constraints aren't also present in the database.
After all, the database already has constraints for some of my data, like NOT NULL.
So, in my most recent project I've been writing RunSQL migrations that ADD CONSTRAINT some_logic, matching whatever logic I have in my clean() method.
It works OK, but it isn't an insignificant task to remember to add these constraints, add tests for these migrations, and update them when my model changes. Also, of course, I'm violating DRY by writing code in two places to do the same thing.
Should I give up this quixotic quest?

This is by no means a comprehensive answer, but at least I wanted to give my opinion.
There have been many frameworks that pushed the idea of removing constraints from the database in order to check them at the application level. The idea seemed nice to me at first (in the early 2000s), but after some years I came to the (very personal) conclusion that this is a bad idea.
To me, it boils down to two things:
Data survives much longer than applications. Whole systems become obsolete, but the data lives on for many more years. Sometimes the application is replaced, but the database is still the same one.
The application is not as reliable when it comes to validating data. I'm talking about programming defects here. One version of the app may work well, and then the next one has a bug. Or one developer leaves the company, and the replacement, who doesn't know the system as well, changes the app with disastrous consequences. All that time, a simple database constraint (usually very cheap to implement) could have enforced data quality.
Yep, I'm a fan of strict database constraints. Nevertheless, this doesn't mean I'm against application validations. Those can show much nicer error messages.
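To illustrate the two layers working together, here is a minimal sketch using Python's built-in sqlite3 module rather than any particular framework. The `booking` table, its columns, and the helper names are hypothetical and chosen only for illustration; the point is that the CHECK constraint still rejects bad rows when a code path forgets the application-level check.

```python
import sqlite3


def make_db():
    """Create an in-memory database with a CHECK constraint (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE booking (
            id INTEGER PRIMARY KEY,
            start_day INTEGER NOT NULL,
            end_day INTEGER NOT NULL,
            CONSTRAINT end_after_start CHECK (end_day >= start_day)
        )
        """
    )
    return conn


def validate_booking(start_day, end_day):
    """Application-level check: exists mainly to give a friendly message."""
    if end_day < start_day:
        raise ValueError("The end day must not be before the start day.")


def insert_unvalidated(conn, start_day, end_day):
    """Simulate a buggy code path that skips validate_booking entirely."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                "INSERT INTO booking (start_day, end_day) VALUES (?, ?)",
                (start_day, end_day),
            )
        return True
    except sqlite3.IntegrityError:
        # The database refused the row even though the app never checked it.
        return False
```

The duplication the question worries about is still there (one rule, two places), but the two copies serve different purposes: the application check produces the readable message, and the constraint is the backstop.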

If writing too much logic in clean() feels dirty, an in-between solution would be to use Django's built-in validators directly on your model fields.
The validation logic isn't saved in the database, but it is tracked in migrations. Like clean() logic, Validators require you to call Model.clean_fields(), but a ModelForm does this automatically.
You can also dig into django-db-constraints. The library might help do what you're looking to do, and the source code might help you roll a solution that fits your needs.

Related

Trigger for updated date column vs explicitly setting it

I'm wondering if using a trigger to set the updated date column of a table (or all tables) is considered a better practice versus having the application explicitly set it. I realize this could devolve into a debate over preferences and design patterns, so in an effort to parameterize the question a bit - I'd like to get a take from a best practices stand point. Taking separation of concerns into consideration (keeping business logic out of the database type of thing) as well as making sure database columns have what is intended within them (actually updating the "last modified date" column), I'm more inclined to let the database handle this via a trigger. That said, I also tend to shy away from triggers since they tend to hide the consequences of an action in a database. I'm hoping that the many smarter people here on SO have a more concrete thought than mine.

It rather depends on what you want to do. If you want to know when any column has changed, then a trigger would be the most suitable. It would be a bit of a drag to have to change code every time you added a new column.
However, if you were only interested in certain columns, you might simply wish to handle this in the update SQL.
There are of course shades in between - you could choose to handle only certain columns in the trigger.
Another consideration is how many bits of code can update a given table. You might have some all singing all dancing update code, but it is perfectly possible this could be spread out.
One word of caution - triggers tend to be the last place you consider when tracking an issue. For this sake (and this is a personal preference), I tend to avoid them unless they're absolutely necessary.
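As a rough sketch of the "only certain columns" variant, the snippet below uses Python's built-in sqlite3 module; SQL Server trigger syntax differs, so treat this as an illustration of the idea only. The `product` table, its columns, and the trigger name are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory database whose trigger stamps updated_at when name changes."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE product (
            id INTEGER PRIMARY KEY,
            name TEXT NOT NULL,
            notes TEXT,
            updated_at TEXT NOT NULL DEFAULT '2000-01-01 00:00:00'
        );

        -- Fires only when an UPDATE statement sets the name column.
        CREATE TRIGGER product_touch
        AFTER UPDATE OF name ON product
        FOR EACH ROW
        BEGIN
            UPDATE product SET updated_at = datetime('now') WHERE id = NEW.id;
        END;
        """
    )
    conn.execute("INSERT INTO product (id, name) VALUES (1, 'widget')")
    return conn


def updated_at(conn, product_id):
    """Read the current timestamp for one row."""
    row = conn.execute(
        "SELECT updated_at FROM product WHERE id = ?", (product_id,)
    ).fetchone()
    return row[0]
```

Updating `notes` leaves the timestamp alone, while updating `name` refreshes it, no matter which piece of application code issued the statement. The flip side is the caution above: nothing in the calling code hints that the extra write happens.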

Apart from the tuning, the issue with triggers is that if they are disabled, you will never know this, and transactions will be committed. This could cause a major problem if the update-date is important to your business logic (for instance for sync processes or for finance). I prefer to explicitly handle all business related data within the procedures. And use triggers mainly for auditing.

Best way to migrate data from Access to SQL Server

The problem
Ok, sorry that my question is somewhat abstract and subjective, but will try to make it as specific as possible. So, the situation I am in is simple - I am remaking a very old MS Access application on a new website using ASP.NET MVC. As currently the MVC site is using SQL Server 2008 (for many well known reasons) I need to find a way to migrate the tables AND the data, because the information in the old database will be used in the new application.
Alright, so far so good, however there are a few problems. The old application is written in a different language, meaning that I want to translate table names, field names, and all other names that are there to English. Furthermore, I will be making some changes on the models themselves (change the type of some fields, add additional fields to some tables, remove old unnecessary ones and more). So technically I'll be 'having my way' with everything.
Researched solutions
With those things in mind, I researched ways to migrate data from an Access database to SQL Server. Of course, there is a lot of information on the matter; on Stack Overflow alone there are more than a few questions and solutions. So why am I struggling to find the answer? Well, I found a few solutions that will be sufficient to some extent (actually, they will definitely solve my problems), but I am writing to ask whether someone experienced has a better perspective on it than I do. Alright, the solutions and why I am still looking for advice (I'll list just a couple of the most common and popular ones that I found; many of the others share the same capabilities and/or results):
Upsize Wizard (Access) - this is a tool devised specifically for migrating tables and data from Access. It is my favourite one for the moment, as I find it fairly straightforward to work with and it provides good overall results. I was able to migrate the tables to SQL Server (along with the data, of course), which is more or less what I intend to do. It is fast, and it seems to allow you to migrate indexes, primary keys and even, to my knowledge, foreign keys (table relationships). The downsides of this tool, however, are that it ignores your queries (which I honestly don't really need) and that it doesn't provide a way to change the model, names or types of the properties of the tables you migrate, which is what I would prefer, because I will have to make more than a few changes: adding, renaming, deleting, etc. I would then continue with the development process (of the application), which will lead to a few additional minor changes. And finally I would need to apply everything (migration + all changes) on the production server, which overall is prone to mistakes, as I will be doing it by hand (and there are more than a few tables).
SQL Server Migration Assistant (SSMA) - ok, this is a separate tool (not included in Access) with the same idea: to migrate data from Access to ... possibly everywhere, I haven't researched that. Overall it offers more functionality and customization than the Upsize Wizard, but of course it does so in a more complicated way. I haven't put enough effort into making a migration with this tool yet, as it involves a lot of installations and additional work, but according to my research it provides almost all (if not all) of the functionality I require. The downside, however, comes with the naming. As I mentioned, it allows you to apply changes to the tables, schema, fields, indexes, keys and probably everything else, but the articles advise changing the names in Access first, as that is easier and makes the migration process run more smoothly. I am not allowed to make changes to the original Access database, as it will remain in use until the 'renewed' project is published and the data inside it is live, so working on a mere copy of the file is a solution I am not particularly fond of, because I might lose new records. Also, I can't predict the changes I will want to make during development (as I said, I expect to apply some additional changes later on, when I find 'weaknesses' in my data design), so I find it to be a somewhat half-baked solution.
Conclusion
The options presented, the way I see them, are two:
Use the Upsize Wizard to migrate the access tables, then write a script that applies the changes I want to make. Then in the development process add any additional changes to the script. When ready to publish on the production server, reapply the migration with the wizard, run the changes script and pray everything is fine.
Get more involved with the SSMA tool and try producing an updated version of the tables with the migration process. (See how efficient the renaming is and decide whether to use copied file to rename and then find a way to migrate only new records or do it all in the SSMA). Then again write a script for the changes that occur in the development process and re-do and apply it all on the production server when ready and then pray everything is fine.
Option I have not yet seen, apply it and then pray everything is fine.
I have researched the matter for a couple of days now and found a few more solutions, none of which I believe are better than those mentioned. However, I allow for the possibility that I am missing the 'big red X on the map', a practical and easy solution that seems designed specifically for me (though I doubt that a little). Anyway, reducing all the madness that I have written so far to a few simple questions:
Is anyone aware if my conclusions are correct? I am leaning towards option one as it is easier to accomplish.
Has anyone experienced/found a better way to do that, or just found some 'logic-leaps' in my writings as I am overthinking the entire thing a little and may be doing some obvious miscalculation.
Very sorry for asking a trivial question and one that includes decision making that may involve deeper understanding of my project and situation, yet I am working with rather sensitive data and would appreciate feedback, even if only to improve my confidence in the chosen approach.

There is one other tool/method you might want to consider that seems to cater to your specific needs better: use the data import/export tool that ships with SQL Server to do a complete copy of all data into a temporary location within SQL Server, and then write custom queries to rename things and make the other changes you want. It is a bit more work, but you could use the end product as a seed method for your migrations ;) (if you are doing code first anyway)
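A minimal sketch of that "copy raw, then transform with a script" idea follows. It uses Python's built-in sqlite3 module as a stand-in for SQL Server so that it runs anywhere, and every table and column name (`stg_kunden`, `kd_nr`, `kd_name`, `geburtsjahr`, `customers`) is hypothetical. The transform script is written so it can be re-run from scratch, which matters if the import has to be repeated on the production server later.

```python
import sqlite3

# Re-runnable transform: rebuilds the target table from the staging copy,
# renaming columns to English and converting a text column to an integer.
TRANSFORM = """
DROP TABLE IF EXISTS customers;
CREATE TABLE customers (
    id INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    birth_year INTEGER,
    is_active INTEGER NOT NULL DEFAULT 1  -- new column the old schema lacked
);
INSERT INTO customers (id, name, birth_year)
SELECT kd_nr, TRIM(kd_name), CAST(geburtsjahr AS INTEGER)
FROM stg_kunden;
"""


def load_staging(conn, rows):
    """Stand-in for the import tool: copy the old table as-is, original names kept."""
    conn.execute("DROP TABLE IF EXISTS stg_kunden")
    conn.execute(
        "CREATE TABLE stg_kunden (kd_nr INTEGER, kd_name TEXT, geburtsjahr TEXT)"
    )
    conn.executemany("INSERT INTO stg_kunden VALUES (?, ?, ?)", rows)
    conn.commit()


def migrate(conn):
    """Apply the transform script; safe to call repeatedly."""
    conn.executescript(TRANSFORM)
    return conn.execute(
        "SELECT id, name, birth_year, is_active FROM customers ORDER BY id"
    ).fetchall()
```

Because the staging copy is never edited by hand, changes discovered during development only ever go into the transform script, and the final production run is the same two steps: import again, run the script.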

NHibernate - Usefulness

I work in a software and hardware development firm. Today one of my colleagues told me that NHibernate is only useful for small projects, and that for complex or large-scale projects it must be avoided. Also, that it makes code harder to change.
Are those statements true?

eBay uses Hibernate (the Java version that NHibernate is ported from). I don't consider that a small project.
As far as changing code goes, consider this: Let's assume we need to add a new property to an object.
Here is what has to be done with a hand-rolled data access layer:
Add the column to the db table.
Change every stored procedure that deals with that object / table. This is usually several stored procedures in my experience.
Change the code in the mapping layer
Add the property to the Object
Here is what has to be done with NHibernate:
Add the column to the db table.
Add the property to the HBM file
Add the property to the object.

Have to agree with Daniel Augur on the first point.
On the second, "does it make code harder to change?", I'll provide a general view. Any time you use something ready-rolled, you're going to run into restrictions that might not be easy to overcome. Even when the source is available, you may not wish to modify it for fear of deviating to the point of a breaking change.
Part of a software developer's job is determining whether the merits outweigh the drawbacks with 3rd party code.

Does an ORM integrate with existing applications or do I not understand?

Assume Hibernate for the ORM.
I'm not sure how to ask this. I want to build an application that can replace part of another. For example, say I have an application with various modules, called the "big" app. This application may handle HR, financial, purchases, skill sets, etc. But maybe, for whatever reason, I don't like the skill set module, but I like the rest of the application. I want to build an app that uses the same database that the rest of the "big" app uses but use my software as the front end for that piece.
I could build my app and have it hit the database directly with no ORM. My question is is there an advantage to using an ORM here. I'm thinking there is because if the "big" app goes away and another app is purchased, we could continue to use my version of skill set because I am using hibernate instead of hitting things directly. I'm still learning but I thought that my application used objects that I named and that in the case I just described I'd have to change my mapping files only or/and my code very little.
Here is another example. I have a legacy application and legacy database. It uses database X. I decide that I no longer like the old terminal emulator application that is used to get the data and that I want a graphical version. I can use Hibernate with my application, and when I finally decide to get rid of the legacy database and change to the latest Oracle or SQL Server, I can do so with minimal headache? Or is my database going to change so much that it wouldn't have mattered anyway (I'm suggesting that upon changing to a new database more information will want to be captured)?
I was hoping for comments, if I am misunderstanding why hibernate/ORM might or might not be a benefit.
Thank you.

I do not think you will get a huge benefit from Hibernate if the database schema changes to something completely different. You might have to change more than just your mapping, especially if more "structure" is added to the database (tables, columns and similar schema things). That said, if the database is structured mostly the same way, and let's say just the column names and table names change and a couple of tables are merged or something like that, you can get by with just changing your mapping.
But I would really recommend using Hibernate for database agnosticism; that is a pretty easy path.
And even though it doesn't exactly help you if your entire database changes, it has such an incredible number of other strengths that I would choose it over direct DB access most of the time.
Lastly, you could think about using a service layer, such as the repository pattern, that abstracts away the data access, so the business logic of your application wouldn't need to change if the database changes.
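To make the repository idea concrete, here is a small Python sketch (Python rather than Java/Hibernate, purely for brevity). All names in it, including `SkillRepository`, `emp_skill`, and `has_skill`, are hypothetical. The business function depends only on the repository interface, so swapping the storage behind it does not touch that function.

```python
from __future__ import annotations

import sqlite3
from typing import Protocol


class SkillRepository(Protocol):
    """What the business logic needs from storage, and nothing more."""

    def skills_for(self, employee_id: int) -> list[str]: ...


class InMemorySkillRepository:
    """Storage variant 1: a plain dictionary (handy for tests)."""

    def __init__(self, data: dict[int, list[str]]) -> None:
        self._data = data

    def skills_for(self, employee_id: int) -> list[str]:
        return sorted(self._data.get(employee_id, []))


class SqliteSkillRepository:
    """Storage variant 2: a table in a hypothetical 'big app' schema."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn

    def skills_for(self, employee_id: int) -> list[str]:
        rows = self._conn.execute(
            "SELECT skill FROM emp_skill WHERE emp_id = ? ORDER BY skill",
            (employee_id,),
        ).fetchall()
        return [row[0] for row in rows]


def has_skill(repo: SkillRepository, employee_id: int, skill: str) -> bool:
    """Business rule: unaware of which repository implementation it is given."""
    return skill in repo.skills_for(employee_id)
```

If the schema behind `SqliteSkillRepository` changes, only that class needs rewriting. This is the same kind of insulation a mapping file gives you, just drawn at a different boundary.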

Switching from one DBMS to another (ala Oracle to SQL Server) is one thing that using an ORM would certainly make much easier.
As for switching from one "big app" to another "big app", I doubt if using an ORM would help that much. It's likely that the database structure and business logic would be different enough that you would find yourself rewriting lots of code anyways.

You can generate domain objects with Hibernate Tools; if you do that, then it will be painless and fast. However, if you write all the objects by hand, you will die. I think it's a good idea to rewrite part of the app and get to know Hibernate better.

I think it's generally a bad idea to make any decision based on the unknowns versus the knowns. Whether you're deciding on a data access/persistence strategy, what car to buy, or what college to go to, you should put the most weight on the things you know you want today, rather than worrying about what may or may not happen tomorrow.
So when considering ORMs, I wouldn't worry too much about things such as apps "going away" or DBMSs changing (unless that's either already been talked about, or there's a history of this in your company). I'm not saying that these things will never happen, but rather that they should take a back seat to the generally much more important considerations of maintainability, performance, and developer productivity.
So in short, choose an ORM based on its ability to solve the problems and satisfy the requirements that you have today.

How do I 'refactor' SQL Queries?

I have several MS Access queries (in views and stored procedures) that I am converting to SQL Server 2000 (T-SQL). Due to Access's limitations regarding sub-queries, and/or the limitations of the original developer, many views have been created that function only as sub-queries for other views.
I don't have a clear business requirements spec, except to 'do what the Access application does', and half a page of notes on reports/CSV extracts, but the Access application doesn't even do what I suspect is required properly.
I, therefore, have to take a bottom up approach, and 'copy' the Access DB to T-SQL, where I would normally have a better understanding of requirements and take a top down approach, creating new queries to satisfy well defined requirements.
Is there a method I can follow in doing this? Do I spread it all out and spend a few days 'grokking' it, or do I continue just copying the Access views and adopt an evolutionary approach to optimising the querying?

Work out what access does with the queries, and then use this knowledge to check that you've transferred it properly. Only once you've done this can you think about refactoring. I'd start with slow queries and then go from there: work out what indexes you need and then progressively rewrite. This way you can deliver as soon as you've proved that you moved everything successfully (even if it is potentially a bit slower). That's much better than not being able to deliver at all because problem X came along.
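One way to do the "check that you've transferred it properly" step is to compare the rows a legacy view chain returns against the rows the rewritten query returns. The sketch below does this with Python's built-in sqlite3 module; the `orders` table, the two views, and the queries are hypothetical stand-ins for a view that exists only to feed another view.

```python
import sqlite3
from collections import Counter


def make_db():
    """Sample data plus a legacy-style chain: one view used only by another."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE orders (id INTEGER PRIMARY KEY, customer TEXT, amount INTEGER);
        INSERT INTO orders (customer, amount)
        VALUES ('a', 50), ('a', 200), ('b', 300), ('c', 10);

        CREATE VIEW v_big_orders AS
            SELECT customer, amount FROM orders WHERE amount >= 100;
        CREATE VIEW v_big_totals AS
            SELECT customer, SUM(amount) AS total
            FROM v_big_orders GROUP BY customer;
        """
    )
    return conn


LEGACY = "SELECT customer, total FROM v_big_totals"
REFACTORED = (
    "SELECT customer, SUM(amount) AS total "
    "FROM orders WHERE amount >= 100 GROUP BY customer"
)


def same_rows(conn, first_sql, second_sql):
    """True if both queries return the same rows, ignoring row order."""
    first = Counter(conn.execute(first_sql).fetchall())
    second = Counter(conn.execute(second_sql).fetchall())
    return first == second
```

A check like this only proves equivalence for the data it runs against, so it is most useful when run on a realistic copy of the Access data before and after each incremental rewrite.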

I'd probably start with the Access database, exercise the queries in situ and see what the resultset is. Often you can understand what the query accomplishes and then work back to your own design to accomplish it. (To be thorough, you'll need to understand the intent pretty completely anyway.) And that sounds like the best statement of requirements you're going to get - "Just like it's implemented now."
Other than that, your approach is the best I can think of. Once they are in SQL Server, just start testing and grokking.

When you are dealing with a problem like this it's often helpful to keep things working as they are while you make incremental changes. This is better from a risk management perspective.
I'd concentrate on getting it working, then checking the database performance and optimizing performance problems. Then, as you add features and fix bugs, clean up the code that's hard to maintain. As you said, a sub-query is really very similar to a view. So if it's not broken you may not need to change it.

This depends on your timeline. If you have to get the project running absolutely as soon as possible (I know this is true for EVERY project, but if it's REALLY true for you), then yes, duplicate the functionality and infrastructure from Access then do your refactoring either later or as you go.
If you have SOME time you can dedicate to it, then refactoring it now will give you two things:
You'll be happier with the code, and it will (likely) perform better, since actual analysis was done rather than the transcoding equivalent of a copy-paste
You'll likely gain a greater understanding of what the true business rules are, since you'll almost certainly come across things that aren't in the spec (especially considering how you describe them)

I would recommend copying the views to SQL Server immediately, and then use its sophisticated tools to help you grok them.
For example, SQL Server can tell you which views, stored procedures, etc., rely on a particular view, so you can see whether the view is a one-off or is actually used in more than one place. That will help you determine which views are more important than others.
