Should I ignore _NSCoreDataConstraintViolationException? - objective-c

For some reason I only recently found out about unique constraints for Core Data. It looks way cleaner than the alternative (doing a fetch first, then inserting the missing entities in the designated context) so I decided to refactor all my existing persistence code.
If I got it right, the gist of it is to always insert a new entity, and, as longs as I have a proper merge policy, saving the context will take care of the uniqueness and in a more efficient way. The problem is every time I save a context with the inserted entity I get a NSCoreDataConstraintViolationException, no error though. When I do the fetch to make sure
there is indeed only one instance with a unique field
other changes to this entity were applied
everything seems to be okay, but I’m still concerned about this exception, since I do saves and therefore get it quite often, a few times per second in some cases.
My project is in objective-c and I know exceptions are expensive there so I’m having doubts if I’m missing something.
Here is a sample project with this issue (just a few lines of code, be sure to add an exception breakpoint)

NSMergeByPropertyObjectTrumpMergePolicy and constraints are not useful tools and should also never be used. The correct way to manage uniqueness is with a fetch before the insert as it appears you have already been doing.
Let's starts with why the only correct merge policy is NSErrorMergePolicy. You should only be writing to core data in on synchronous say (performBackgroundTask is not enough you also need an operation queue). If you have two performBackgroundTask running at the same time and they contradict then you will lose data. Merge policy is answer the question of "which data would you like to lose?" the correct answer is "Don't lose my data!" which is NSErrorMergePolicy.
The same issue happens when you have a constraint. Let's says you have an entity with a unique constraint on the phone number. And you try to insert another entity with the same phone number. What would you like to happen? It depends on what exactly the data is. It might be two different people, and the phone number should be made different (perhaps they were lacking area code), or it might be one person and the data should be merged. Or you might have a constraint on an uniqueID and the number should just be incremented. But on the database level it doesn't know. It always just does a merge. It will silently lose data.
You can create a custom NSMergePolicy and inspect NSConstraintConflict to decide what do to. But in practice you'd have to think about every time you edit the database and what each change means, which can be very hard outside of the context of writing a change to the database. In other words, the problem with a constraints and merge policy is that it the run is on the wrong level of your application to effectively deal with the problem.
Using constraints with a merge policy of error is OK, as it is a way to find problems with your app (as long as you are monitoring crashes and fixing them). But you still need to do the fetch before the insert to make sure the error doesn't happen.
If you want to clean up code then just have one place that you create your objects. Something like objectWithId:createIfNeed:inContext: which does the fetch and create.

Related

Model validation logic in the database via constraints. Good idea, bad idea, or not worth it?

It's always rubbed me the wrong way to write code in my model's clean method to validate various constraints on the data when these same constraints aren't also present in the database.
After all, the database already has constraints for some of my data, like NOT NULL.
So, I've been writing RawSQL migrations that ADD CONSTRAINT some_logic in my most recent project that matches whatever logic I have in my clean() method.
It works OK, but it isn't an insignificant task to remember to add these constraints, add tests for these migrations, and update them when my model changes. Also, of course, I'm violating DRY by writing code in two places to do the same thing.
Should I give up this quixotic quest?
This is by no means a comprehensive answer, but at least I wanted to give my opinion.
There has been many frameworks that have pushed the idea of removing the constraints from the database, in order to check them at the application level. The idea seemed nice to me at first (in the early 2000s) but after some years I came to the (very personal) conclusion that this is a bad idea.
I think, to me it boils down to two things:
Data survives much longer than the applications. Whole systems go obsolete, but the data survives many more years. Sometimes the application is replaced, but the database is stil the same one.
The application is not as reliable when it comes to validate data. I'm talking about programming defects here. One version of the app may work well and then the next one has a bug. It may be that one developer moves out of the company, then the new replacement -- who doesn't know as much -- changes the app with disastrous consequences. All that time a simple database constraint (that is usually very cheap to implement) could have enforced data quality.
Yep, I'm a fan of strict database constraint. Nevertheless, this doesn't mean I'm against application validations. These ones can show much nicer error messages.
If writing too much logic in clean() feels dirty, an in-between solution would be to use Django's built-in validators directly on your model fields.
The validation logic isn't saved in the database, but it is tracked in migrations. Like clean() logic, Validators require you to call Model.clean_fields(), but a ModelForm does this automatically.
You can also dig into django-db-constraints. The library might help do what you're looking to do, and the source code might help you roll a solution that fits your needs.

Separating code logic from the actual data structures. Best practices? [closed]

Closed. This question needs to be more focused. It is not currently accepting answers.
Want to improve this question? Update the question so it focuses on one problem only by editing this post.
Closed 4 years ago.
Improve this question
I have an application that loads lots of data into memory (this is because it needs to perform some mathematical simulation on big data sets). This data comes from several database tables, that all refer to each other.
The consistency rules on the data are rather complex, and looking up all the relevant data requires quite some hashes and other additional data structures on the data.
Problem is that this data may also be changed interactively by the user in a dialog. When the user presses the OK button, I want to perform all the checks to see that he didn't introduce inconsistencies in the data. In practice all the data needs to be checked at once, so I cannot update my data set incrementally and perform the checks one by one.
However, all the checking code work on the actual data set loaded in memory, and use the hashing and other data structures. This means I have to do the following:
Take the user's changes from the dialog
Apply them to the big data set
Perform the checks on the big data set
Undo all the changes if the checks fail
I don't like this solution since other threads are also continuously using the data set, and I don't want to halt them while performing the checks. Also, the undo means that the old situation needs to be put aside, which is also not possible.
An alternative is to separate the checking code from the data set (and let it work on explicitly given data, e.g. coming from the dialog) but this means that the checking code cannot use hashing and other additional data structures, because they only work on the big data set, making the checks much slower.
What is a good practice to check user's changes on complex data before applying them to the 'application's' data set?
This is probably not much help now, since your app is built, and you probably don't want to reimplement, but I'll mention it for reference.
Using a ORM framework would help you here. Not only does it handle getting the data from the database into an object oriented representation, it also provides the tools to implement isolated temporary changes and views:
Using the ORM framework with transactions, you can allow the user to change the objects in the model without affecting other users, and without commiting the data "for real" until it has been checked. The ACID guarantees of transactions ensures that your changes are not persisted to the database, but held in your transaction, only visible to you. You can then run checks on the data and commit the transaction only if the data validates. If the data doesn't validate, you rollback the transaction and discard the changes. If it does validate, you commit the transaction and changes are made permanent.
Alternatively, you can create views which provide your data for validation. The views combine the base data and temporary tables (local to your current connection). This avoids locking tables, at the expense of having to write and maintain the views.
EDIT: If you already have a rich object model in memory, the hardest part to making that support incremental, local and isolated changes is direct references between objects. When you want to replace object A with A', that contains a change, you don't want to do a deep copy, with all referneces, since you mention that your object model is large. Also, you don't want to have to update all objects that were pointing to A to reference A'. As an example, consider a very large doubly linked list. It's not possible to create a new list that is the same as the old one with just one element changed, without duplicating the entire list. You can achieve isolation by storing the identifier for related objects rather than the object themselves. E.g. Instead of referencing A explicitly, your collaborators store a reference to the unique key that identifies A, key(A). This key is used to fetch the actual object at the time it is needed (e.g. during verification.) Your model then becomes a large Map of keys to objects, which can be decorated for local changes. When looking up an object by key, first check the local map for value, and if not found, check the universal map. To change A to A', you add an entry to the local map, that maps key(A) to A'. (Note that A and A' have the same key, since logically they are the same item.) When you run your veriification code, local changes are then incorporated, since objects referring to key(A) will get A', while other users using key(A) will get the original, A.
This may sound complex written down, but by removing explicit references and computing them on demand is the only way of supporting isolated updates without having to do a deep copy of the data.
An alternative, but equivalent way, is that your validator uses a map to lookup objects with their replacements before it uses them. E.g. your user modifies A, so you put A->A' into the map. The validator is iterating over the model and comes across A. Before using A, it checks the map, and finds A', which it then uses. The difficulty of this approach is that you have to make sure you check the map every time before an object is used. If you miss one, then your view on the model will be inconsistent.
I would try by any means to verify changes before applying them to the data set, as undoing the ripple effects of changes which later turn out to be invalid can easily become a nightmare.
If there is really a lot of data, I understand that creating a full copy of it may not be feasible - although in general "copy on write" would be the simplest and safest solution. If you really are only able to verify the changes by taking into account the whole set of data, you could try a "decorator"-like approach, i.e. somehow creating a "view" of the changes layered on top of the existing body of data, without actually modifying the latter. This could be used to validate the changes, and if the validation succeeds, you can actually apply the changes; otherwise you can simply throw away the "view" and the changes, without affecting the original data in any way.
Hmm, I would suggest rather than loading data copying it in memory. This is expensive, but will allow you to work on all data concurrently. When changes on data are valid, just apply the changes from the copy to all data using some locking strategy. This way you do not need any undo as long as you can apply the changes atomically. You could even try some transaction system if your needs are more complex.
Also think about lazy-loading(copying) your data as you really need them. Finally what comes to my mind is that if you need to workon large data sets from databases using transactions, try considering using Prolog. It might be reasonable to formulate your chcecks as predicates.
Sounds as if you should instead move the rules etc to the database where they belong, by having the checks in our app you will always issues. Instead by placing as much of the logic in for instance stored procedures that are run when the user insert the values you could catch and rollback invalid input. But I guess you have your reasons for keeping it all in memory.

Getting rid of hard coded values when dealing with lookup tables and related business logic

Example case:
We're building a renting service, using SQL Server. Information about items that can be rented is stored in a table. Each item has a state that can be either "Available", "Rented" or "Broken". The different states reside in a lookup table.
ItemState table:
id name
1 'Available'
2 'Rented'
3 'Broken'
Adding to this we have a business rule which states that whenever an item is returned, it's state is changed from "Rented" to "Available".
This could be done with a an update statement like "update Items set state=1 where id=#itemid". In application code we might have an enum that maps to the ItemState id:s. However, these contain hard coded values that could lead to maintenance issues later on. Say if a developer were to change the set of states but forgot to fix the related business logic layer...
What good methods or alternate designs are there for dealing with this type of design issues?
Links to related articles are also appreciated in addition to direct answers.
In my experience this is a case where you actually have to hardcode, preferably by using an Enum which integer values match the id's of your lookup tables. I can't see nothing wrong with saying that "1" is always "Available" and so forth.
Most systems that I've seen hard code the lookup table values and live with it. That's because, in practice, code tables rarely change as much as you think they might. And if they ever do change, you generally need to re-compile any programs that rely on that DDL anyway.
That said, if you want to make the code maintainable (a laudable goal), the best approach would be to externalize the values into a properties file. Then you can edit this file later without having to re-code your entire app.
The limiting factor here is that your app depends for its own internal state on the value you get from the lookup table, so that implies a certain amount of coupling.
For lookups where the app doesn't rely on that code, (for instance, if your code table stores a list of two-letter state codes for use in an address drop-down), then you can lazily load the codes into an object and access them only when needed. But that won't work for what you're doing.
When you have your lookup tables as well as enums defined in the code, then you always have an issue with keeping them in sync. There is not much that can be done here. Both live effectively in two different worlds and are generally unaware of each other.
You may wish to reject using lookup tables and only let your business logic operate these values. In that case you miss the options of relying on referential integrity to back you ap on the data integrity.
The other option is to build up your application in that way that you never need these values in your code. That means moving part of your business logic to the database layer, meaning, putting them in stored procedures and triggers. This will also have the benefit of being agnostic to the client. Anyone can invoke SPs and get assured the data will be kept in the consistence state, consistent with your business logic rules as well.
You'll need to have some predefined value that never changes, be it an integer, a string or something else.
In your case, the numerical value of the state is the state's surrogate PRIMARY KEY which should never change in a well-designed database.
If you're concerned about the consistency, use a CHAR code: A, R or B.
However, you should stick to it as well as to a numerical code so that A always means Available etc.
You database structure should be documented as well as the code is.
The answer depends entirely on the language you're using: solutions for this are not the same in Java, PHP, Smalltalk or even Assembler...
But let me tell you something: while it's true hard coded values are not a great thing, there are times in which you do need them. And this one is pretty much one of them: you need to declare in your code your current knowledge of the business logic, which includes these hard coded states.
So, in this particular case, I would hard code those values.
Don't overdesign it. Before trying to come up with a solution to this problem, you need to figure out if it's even a problem. Can you think of any legit hypothetical scenario where you would change the values in the itemState table? Not just "What if someone changes this table?" but "Someone wants to change this table in X way for Y reason, what effect would that have?". You need to stay realistic.
New state? you add a row, but it doesn't affect the existing ones.
Removing a state? You have to remove the references to it in code anyway.
Changing the id of a state? There is no legit reason to do that.
Changing the name of a state? There is no legit reason to do that.
So there really should be no reason to worry about this. But if you must have this cleanly maintainable in the case of irrational people who randomly decide to change Available to 2 because it just fits their Feng Shui better, make sure all tables are generated via a script which reads these values from a configuration file, and then make sure all code reads constants from that same configuration file. Then you have one definition location and any time you want to change the value you modify that configuration file instead of the DB/code.
I think this is a common problem and a valid concern, that's why I googled and found this article in the first place.
What about creating a public static class to hold all the lookup values, but instead of hard-coding, we initialize these values when the application is loaded and use names to refer them?
In my application, we tried this, it worked. Also you can do some checking, e.g. the number of different possible values of a lookup in code should be the same as in db, if it's not, log/email/etc. But I don't want to manually code this for the status of 40+ biz entities.
Moreover, this can be part of the bigger problem of OR mapping. We're exposed with too much details of the persistence layer, and thus we have to take care of it. With technologies like Entity Framework, we don't need to worry about the "sync" part because it's automated, am I right?
Thanks!
I've used a similar method to what you're describing - a table in the database with values and descriptions (useful for reporting, etc.) and an enum in code. I've handled the synchronization with a comment in code saying something like "these values are taken from table X in database ABC" so that the programmer knows the database needs to be updated. To prevent changes from the database side without the corresponding changes in code I set permissions on the table so that only certain people (who hopefully remember they need to change the code as well) have access.
The values have to be hard-coded, which effectively means that they can't be changed in the database, which means that storing them in the database is redundant.
Therefore, hard-code them and don't have a lookup table in the database. Instead store the items state directly in the items table.
You can structure your database so that your application doesn't actually have to care about the codes themselves, but rather the business rules behind them.
I have done both of the following:
Do one or more of your codes have a certain characteristic, such as IsAvailable, that the application cares about? If so, add it as a flag column to the code table, where those that match are set to true (or your DB's equivalent), and those that don't are set to false.
Do you need to use a specific, single code under a certain condition? You can create a singleton table, named something like EnvironmentSettings, with a column such as ItemStateIdOnReturn that's a foreign key to the ItemState table.
If I wanted to avoid declaring an enum in the application, I would use #2 to address the example in the question.
Whether you take this approach depends on your application's priorities. This type of structure comes at the cost of additional development and lookup overhead. Plus, if every individual code comes with its own business rules, then it's not practical to create one new column per required code.
But, it may be worthwhile if you don't want to worry about synchronizing your application with the contents of a code table.

Upgrade strategies for bad DB schema designs

I've shown up at a new job and discovered database which is in dire need of some help. There are many many things wrong with it, including
No foreign keys...anywhere. They're faked by using ints and managing the relationship in code.
Practically every field can be NULL, which isn't really true
Naming conventions for tables and columns are practically non-existent
Varchars which are storing concatenated strings of relational information
Folks can argue, "It works", which it is. But moving forward, it's a total pain to manage all of this with code and opens us up to bugs IMO. Basically, the DB is being used as a flat file since it's not doing a whole lot of work.
I want to fix this. The issues I see now are:
We have a lot of data (migration, possibly tricky)
All of the DB logic is in code (with migration comes big code changes)
I'm also tempted to do something "radical" like moving to a schema-free DB.
What are some good strategies when faced with an existing DB built upon a poorly designed schema?
Enforce Foreign Keys: If a relationship exists in the domain, then it should have a Foreign Key.
Renaming existing tables/columns is fraught with danger, especially if there are many systems accessing the Database directly. Gotchas include tasks that run only periodically; these are often missed.
Of Interest: Scott Ambler's article: Introduction To Database Refactoring
and Catalog of Database Refactorings
Views are commonly used to transition between changing data models because of the encapsulation. A view looks like a table, but does not exist as a finite object in the database - you can change what column is being returned for a given column alias as desired. This allows you to setup your codebase to use a view, so you can move from the old table structure to the new one without the application needing to be updated. But it means the view has to return the data in the existing format. For example - your current data model has:
SELECT t.column --a list of concatenated strings, assuming comma separated
FROM TABLE t
...so the first version of the view would be the query above, but once you created the new table that uses 3NF, the query for the view would use:
SELECT GROUP_CONCAT(t.column SEPARATOR ',')
FROM NEW_TABLE t
...and the application code would never know that anything changed.
The problem with MySQL is that the view support is limited - you can't use variables within it, nor can they have subqueries.
The reality to the changes you wish to make is effectively rewriting the application from the ground up. Moving logic from the codebase into the data model will drastically change how the application gets the data. Model-View-Controller (MVC) is ideal to implement with changes like these, to minimize the cost of future changes like these.
I'd say leave it alone until you really understand it. Then make sure you don't start with one of the Things You Should Never Do.
Read Scott Ambler's book on Refactoring Databases. It covers a good many techniques for how to go about improving a database - including the transitional measures needed to allow both old and new programs to work with the changing design.
Create a completely new schema and make sure that it is fully normalized and contains any unique, check and not null constraints etc that are required and that appropriate data types are used.
Prepopulate each table that fills the parent role in a foreign key relationship with a single 'Unknown' record.
Create an ETL (Extract Transform Load) process (I can recommend SSIS (SQL Server Integration Services) but there are plenty of others) that you can use to refill the new schema from the existing one on a regular basis. Use the 'Unknown' record as the parent of any orphaned records - there will be plenty ;). You will need to put some thought into how you will consolidate duplicate records - this will probably need to be on a case by case basis.
Use as many iterations as are necessary to refine your new schema (ensure that the ETL Process is maintained and run regularly).
Create views over the new schema that match the existing schema as closely as possible.
Incrementally modify any clients to use the new schema making temporary use of the views where necessary. You should be able to gradually turn off parts of the ETL process and eventually disable it completely.
First see how bad the code is related to the DB if it is all mixed in no DAO layer you shouldn't think about a rewrite but if there is a DAO layer then it would be time to rewrite that layer and DB along with it. If possible make the migration tool based on using the two DAOs.
But my guess is there is no DAO so you need to find what areas of the code you are going to be changing and what parts of the DB that relates to hopefully you can cut it up into smaller parts that can be updated as you maintain. Biggest deal is to get FKs in there and start checking for proper indexes there is a good chance they aren't being done correctly.
I wouldn't worry too much about naming until the rest of the db is under control. As for the NULLs if the program chokes on a value being NULL don't let it be NULL but if the program can handle it I wouldn't worry about it at this point in the future if it is doing a default value move that to the DB but that is way down the line from the sound of things.
Do something about the Varchars sooner rather then later. If anything make that the first pure background fix to the program.
The other thing to do is estimate the effort of each areas change and then add that price to the cost of new development on that section of code. That way you can fix the parts as you add new features.

Automatically update Last Modified Date in ASP.Net MVC app

General Info:
ASP.NET MVC app
using ADO.NET Entity Framework
SQL Server 2005
I have a few tables that more or less have a hierarchical structure of mostly one-to-many relationships. Most if not all of them have a Last Modified column and my question is simply how best to go about making sure that this column is updated properly throughout the hierarchy. I would prefer to have it done at the database level automatically, I'm just not sure how. A colleague mentioned triggers while at the same time mentioning that they might not be the best way to go. I don't know whether to do this within the class I'm using to manage the model or at the database level. I'd prefer not to have to keep updating each reference individually as that gets tedious and I'd have to make a bunch of separate functions for each level.
My questions then are:
Is there some way to do this at the database level with a stored procedure I could call?
If not, what would you suggest I do on the application side of things to best handle this programmatically?
If any more info would be helpful, I'll be happy to provide it, I had a difficult time trying to figure out how to even ask this question.
Another person had the same/similar question here. Their opinion use a trigger. Here's the full response:
Sql Server (Entity Framework): created_at , updated_at Columns
It depends on where you want your penalty to be. You can have it one of two ways.
Use a trigger to propagate changes up your hierarchy. This might be best when you write once and read a lot. eg: Update an address and the last modified gets set on the parent records as well as the address itself.
Just update one last modified and then query for everything when you need to see when something has been modified. This might be best if you write a lot, but read very little. eg: Only update the last modified on an address. When you query for changed records, query for records that have a last modified or records with addresses that have a last modified.
It's hard to say without knowing your situation. If you need to update all related last modified dates, then I would go the trigger route just to keep things simple.