Is there a way to tell NHibernate, when updating a row to a value that would duplicate another row's uniquely constrained column, to remove the duplicate value from that other row?
For example (OtherId and Animal are under a composite unique constraint):
Id | OtherId | Animal
---|---------|-------
1  | 1       | Dog
2  | 1       | Cat
3  | 1       | Bear
4  | 2       | Dog
Updating Id 3 to Dog should result in this:
Id | OtherId | Animal
---|---------|-------
1  | 1       | NULL
2  | 1       | Cat
3  | 1       | Dog
4  | 2       | Dog
EDIT:
I was able to solve my problem by creating a filtered unique index on my table:
CREATE UNIQUE INDEX [Id_OtherId_Animal_Index]
ON [dbo].[Animals] (OtherId, Animal)
WHERE Animal IS NOT NULL;
This way, I prevent insertion of a duplicate (1, Dog) while still allowing (2, Dog). It also allows multiple (1, NULL) rows to be inserted.
Next, based on Frédéric's suggestion below, I edited my service layer to check before insertion whether the new value would be a duplicate. If it would, the service first NULLs the uniquely constrained Animal column on the conflicting row.
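For illustration, the same pattern can be sketched in SQLite, which also supports partial (filtered) unique indexes; the update_animal helper is hypothetical, standing in for the service-layer check:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Animals (Id INTEGER PRIMARY KEY, OtherId INTEGER, Animal TEXT);
-- Partial (filtered) unique index: duplicates are allowed only when Animal IS NULL
CREATE UNIQUE INDEX Id_OtherId_Animal_Index
    ON Animals (OtherId, Animal) WHERE Animal IS NOT NULL;
INSERT INTO Animals VALUES (1, 1, 'Dog'), (2, 1, 'Cat'), (3, 1, 'Bear'), (4, 2, 'Dog');
""")

def update_animal(conn, row_id, other_id, animal):
    # Service-layer pre-check: NULL out any row that would collide, then update
    conn.execute(
        "UPDATE Animals SET Animal = NULL "
        "WHERE OtherId = ? AND Animal = ? AND Id <> ?",
        (other_id, animal, row_id))
    conn.execute("UPDATE Animals SET Animal = ? WHERE Id = ?", (animal, row_id))

update_animal(conn, 3, 1, 'Dog')
print(conn.execute("SELECT Id, Animal FROM Animals ORDER BY Id").fetchall())
# [(1, None), (2, 'Cat'), (3, 'Dog'), (4, 'Dog')]
```

The pre-check and the update should of course run in one transaction so no other writer can slip in between them.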
This answer has been outdated by substantial changes to the OP's question.
I am quite sure there is no such feature in NHibernate, or any other ORM.
By the way, what should updating Id 3 to Cat yield, after having updated it to Dog?
Id | Animal
---|-------
1  |
2  |
3  | Cat
If that means that Id 1 and 2 now hold the empty string, that will be a unique constraint violation too.
If they hold NULL, it then depends on whether the db engine is ANSI-null compliant (NULL not considered equal to NULL). That is not the case for SQL Server, in any version I know of, at least for unique indexes. (I have not tested the unique constraint case.)
Anyway, this kind of logic, updating a row resulting in an additional update on some other rows, has to be handled explicitly.
You have many options for that:
Before each assignment to the Animal property, query the db to find whether another row already has that name, and take appropriate action on that other row. Take care to flush right after handling that other row, to ensure it gets handled prior to the actual update of the first one.
Or inject an event or an interceptor into NHibernate to catch any update on any entity, and add your check for duplicates there. Stack Overflow has examples of NHibernate events and interceptors, like this one.
But your case will probably be a bit tough, since flushing some other changes while already flushing a session will probably cause trouble. You may have to tinker directly with the SQL statement, with IInterceptor.OnPrepareStatement for example, to inject your other update into it first.
Or handle that with some trigger in DB.
Or detect a failed flush due to a unique constraint violation, analyze it, and take appropriate action.
The third option is very likely easier and more robust than the others.
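As a sketch of the trigger option, here is how it might look in SQLite (the question targets SQL Server, whose trigger syntax differs; the table layout is taken from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Animals (Id INTEGER PRIMARY KEY, OtherId INTEGER, Animal TEXT);
INSERT INTO Animals VALUES (1, 1, 'Dog'), (2, 1, 'Cat'), (3, 1, 'Bear'), (4, 2, 'Dog');
-- Before any update of Animal, NULL out the row that would collide with the new value
CREATE TRIGGER null_duplicate_animal BEFORE UPDATE OF Animal ON Animals
BEGIN
    UPDATE Animals SET Animal = NULL
    WHERE OtherId = NEW.OtherId AND Animal = NEW.Animal AND Id <> NEW.Id;
END;
""")

conn.execute("UPDATE Animals SET Animal = 'Dog' WHERE Id = 3")
rows = conn.execute("SELECT Id, Animal FROM Animals ORDER BY Id").fetchall()
print(rows)  # [(1, None), (2, 'Cat'), (3, 'Dog'), (4, 'Dog')]
```

An equivalent trigger would also be needed on INSERT to cover new rows; it is omitted here for brevity.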
Related
I'm trying to 2nf some data:
Refid | Reason
------|---------
1 | Admission
1 | Advice and Support
1 | Behaviour
As you can see, one person might have multiple reasons, so I need another table with the following format:
Refid | Reason1 | Reason2 | Reason3 | ETC...
------|-----------|--------------------|-----------|-------
1 | Admission | Advice and Support | Behaviour
But I don't know how to write a query to extract the data and write it into a new table like this. The reasons don't have dates or other criteria that would give any reason a special order. All reasons are assigned at the time of referral.
Thanks for your help. SQL Server 2012.
You are modelling a many-to-many relationship.
You need 3 tables:
- One for Reasons (say ReasonID and Reason)
- One for each entity identified by RefID (say RefID and ReferenceOtherData)
- A junction (or intersection) table with the keys (RefID, ReasonID)
This way,
- Multiple reasons can apply to one Ref entity
- Multiple Refs can have the same reason
- You turn repeated columns into rows.
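A minimal sketch of those three tables, here in SQLite (the question is on SQL Server 2012, where the DDL is essentially the same; the names follow the suggestions above and are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Reasons (ReasonID INTEGER PRIMARY KEY, Reason TEXT UNIQUE);
CREATE TABLE Refs (RefID INTEGER PRIMARY KEY, ReferenceOtherData TEXT);
CREATE TABLE RefReasons (                 -- junction table
    RefID INTEGER REFERENCES Refs(RefID),
    ReasonID INTEGER REFERENCES Reasons(ReasonID),
    PRIMARY KEY (RefID, ReasonID));
INSERT INTO Refs VALUES (1, 'some referral');
INSERT INTO Reasons (Reason) VALUES ('Admission'), ('Advice and Support'), ('Behaviour');
INSERT INTO RefReasons SELECT 1, ReasonID FROM Reasons;
""")

# Reasons become rows, one per (RefID, ReasonID) pair:
rows = conn.execute("""
    SELECT rr.RefID, s.Reason
    FROM RefReasons rr JOIN Reasons s ON s.ReasonID = rr.ReasonID
    ORDER BY s.Reason""").fetchall()
print(rows)  # [(1, 'Admission'), (1, 'Advice and Support'), (1, 'Behaviour')]
```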
I'm contemplating the efficiency of using a string as a key in a database table. I have an abstract class called Command in my application, with a command key hard-coded into each derived class, such as "CREATE_ADMIN_USER" or "DELETE_NEWS_POST".
There is some configuration I would like to store in a database such as which roles can execute those commands.
What would be more efficient:
CommandKey | Role
--------------------------
CREATE_ADMIN_USER | GodUser
DELETE_NEWS_POST | Admin
DELETE_NEWS_POST | Editor
OR
ID| CommandKey
--------------------------
1 | CREATE_ADMIN_USER
2 | DELETE_NEWS_POST
ID | Role
--------------------------
1 | GodUser
2 | Admin
3 | Editor
CommandKeyID | RoleId
---------------------
1 | 1
2 | 2
2 | 3
In the case of the first one, the drawback is that you have to store more characters for the command key string. This could become a space problem if you have thousands of commands and multiple roles for many of them.
The drawback of the second one is that you have to create the join between the 3 tables.
Which one would be preferable (if there actually is a preferable way): using more space, or joining 3 tables?
Joins on the primary key of a table are usually fairly cheap. So I would expect the second alternative to be more efficient in most DBMSs, but since you haven't specified what DBMS you're using, it's hard to give a definitive answer.
Also, I believe that the second alternative is far more commonly used, for whatever reason. And I can definitively say that the second option will be far more manageable in the event that you decide to change the name of a role or the string of a command key.
For all these reasons, I recommend the second option.
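A sketch of the second option's schema and join, using SQLite for illustration (any DBMS with integer primary keys behaves similarly; the table names are assumptions):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Commands (ID INTEGER PRIMARY KEY, CommandKey TEXT UNIQUE);
CREATE TABLE Roles (ID INTEGER PRIMARY KEY, Role TEXT UNIQUE);
CREATE TABLE CommandRoles (
    CommandKeyID INTEGER REFERENCES Commands(ID),
    RoleID INTEGER REFERENCES Roles(ID),
    PRIMARY KEY (CommandKeyID, RoleID));
INSERT INTO Commands VALUES (1, 'CREATE_ADMIN_USER'), (2, 'DELETE_NEWS_POST');
INSERT INTO Roles VALUES (1, 'GodUser'), (2, 'Admin'), (3, 'Editor');
INSERT INTO CommandRoles VALUES (1, 1), (2, 2), (2, 3);
""")

# Which roles may execute DELETE_NEWS_POST? Two cheap primary-key joins:
roles = [r[0] for r in conn.execute("""
    SELECT r.Role
    FROM Commands c
    JOIN CommandRoles cr ON cr.CommandKeyID = c.ID
    JOIN Roles r ON r.ID = cr.RoleID
    WHERE c.CommandKey = 'DELETE_NEWS_POST'
    ORDER BY r.ID""").fetchall()]
print(roles)  # ['Admin', 'Editor']
```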
Suppose I have a table containing the following data:
Name | Things
-------------
Foo | 5
Bar | 3
Baz | 8
If I want to insert a row, so that the final state of the table is:
Name | Things
-------------
Foo | 5
Qux | 6
Bar | 3
Baz | 8
Is this possible?
I understand we don't typically rely on the order of rows in a table, but I've inherited some code that does precisely that. If I can insert at a location, I can avoid a significant refactor.
As you say, you can't rely on the order of rows in a table (without an ORDER BY).
I would probably refactor the code anyway - there's a chance it will break with no warning at some point in the future - surely better to deal with it now under controlled circumstances?
I would add a FLOAT column to the table. If you wanted to insert a row between the rows whose values in that column were 7.00000 and 8.00000, your new row would get the value 7.50000. If you then wanted to insert a row between 7.00000 and 7.50000, the new row would get 7.25000, and so on. Then, when you order by that column, you get the rows in the desired order. Fairly easy to retrofit, but not as robust as one likes things to be: you should revoke all update/insert permissions on the table and handle I/O via a stored procedure.
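A sketch of that midpoint trick in SQLite (the SortKey column name and the insert_between helper are illustrative; sample data is taken from the question):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Items (Name TEXT, Things INTEGER, SortKey REAL);
INSERT INTO Items VALUES ('Foo', 5, 1.0), ('Bar', 3, 2.0), ('Baz', 8, 3.0);
""")

def insert_between(conn, name, things, after_key, before_key):
    # The new row's sort key is the midpoint of its two neighbours
    conn.execute("INSERT INTO Items VALUES (?, ?, ?)",
                 (name, things, (after_key + before_key) / 2.0))

insert_between(conn, 'Qux', 6, 1.0, 2.0)   # lands between Foo and Bar
names = [r[0] for r in conn.execute(
    "SELECT Name FROM Items ORDER BY SortKey").fetchall()]
print(names)  # ['Foo', 'Qux', 'Bar', 'Baz']
```

Note that repeated halving eventually exhausts float precision, which is one reason the approach is not as robust as one would like.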
(related to Finding the lowest unused unique id in a list and Getting unused unique values on a SQL table)
Suppose I have a table containing an id column and some others (they don't make any difference here):
+-----+-----+
| id |other|
+-----+-----+
The id has a numerically increasing value. My goal is to get the lowest unused id and create that row. So of course the first time I run it, it will return 0, and that row will have been created. After a few executions it will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 3 | ... |
| 4 | ... |
+-----+-----+
Fairly often some of these rows might get deleted. Let's assume the rows with ids 1 and 3 are removed. Now the table will look like this:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
If I now run the query again, it should get back the id 1, and this row should be created:
+-----+-----+
| id |other|
+-----+-----+
| 0 | ... |
| 1 | ... |
| 2 | ... |
| 4 | ... |
+-----+-----+
The next times the query runs, it should return ids 3, 5, 6, etc.
What's the most efficient way to run this kind of query, given that I need to execute it many times per second (it is fair to assume that the ids are the only purpose of the table)? Is it possible to get the next unused id with one query? Or is it easier and faster to introduce another table which keeps track of the unused ids?
If it is significantly faster, it is also acceptable to reuse any hole in the table, provided that all numbers get reused at some point.
Bonus question: I plan to use SQLite for this kind of storing information as I don't need a database except for storing these id's. Is there any other free (as in speech) server which can do this job significantly faster?
I think I'd create a trigger on delete, and insert the old.id in a separate table.
Then you can select min(id) from that table to get the lowest id.
Disclaimer: I don't know what database engine you use, so I don't know if triggers are available to you.
Like Dennis Haarbrink said: a trigger on delete and another on insert.
The trigger on delete would take the deleted id and insert it into an id pool table (only one column, id).
The trigger on before insert would check whether an id value is provided; otherwise it queries the id pool table (e.g. SELECT MIN(id) FROM id_pool_table) and assigns that value (i.e. deletes it from the id_pool_table).
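A sketch of this idea in SQLite (table names assumed; here only the delete side uses a trigger, and the "before insert" check is done by an application-side helper instead):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE things (id INTEGER PRIMARY KEY, other TEXT);
CREATE TABLE id_pool_table (id INTEGER PRIMARY KEY);
-- On delete, return the freed id to the pool
CREATE TRIGGER recycle_id AFTER DELETE ON things
BEGIN
    INSERT INTO id_pool_table VALUES (OLD.id);
END;
""")

def next_id(conn):
    # Take the lowest pooled id if any, else one past the current maximum
    pooled = conn.execute("SELECT MIN(id) FROM id_pool_table").fetchone()[0]
    if pooled is not None:
        conn.execute("DELETE FROM id_pool_table WHERE id = ?", (pooled,))
        return pooled
    return conn.execute("SELECT COALESCE(MAX(id) + 1, 0) FROM things").fetchone()[0]

for _ in range(5):                                   # ids 0..4
    conn.execute("INSERT INTO things VALUES (?, '...')", (next_id(conn),))
conn.execute("DELETE FROM things WHERE id IN (1, 3)")
reused = next_id(conn)
print(reused)  # 1
```

The helper and the insert must run in one transaction, otherwise two concurrent callers can be handed the same id.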
Normally you'd let the database handle assigning the ids. Is there a particular reason you need to have the id's sequential rather than unique? Can you, instead, timestamp them, and just number them when you display them? Or make a separate column for the sequential id, and renumber them?
Alternatively, you could not delete the rows themselves, but rather, mark them as deleted with a flag in a column, and then re-use the id's of the marked rows by finding the lowest numbered 'deleted' row, and reusing that id.
The database doesn't care if the values are sequential, only that they are unique. The desire to have your id values sequential is purely cosmetic, and if you are exposing this value to users -- it should not be your primary key, nor should there be any referential integrity based on the value because a client could change the format if desired.
The fastest and safest way to deal with id value generation is to rely on native functionality that gives you a unique integer value (IE: SQLite's autoincrement). Using triggers only adds overhead, and using MAX(id) + 1 is extremely risky: two concurrent inserts can compute the same value.
Summary
Ideally, use the native unique integer generator (SQLite/MySQL auto_increment, Oracle/PostgreSQL sequences, SQL Server IDENTITY) for the primary key. If you want a value that is always sequential, add an additional column to store that sequential value and maintain it as necessary. MySQL/SQLite/SQL Server unique integer generation only allows one auto-increment column per table; sequences are more flexible.
I have a database that has two tables, these tables look like this
codes
id | code | member_id
---|------|----------
1  | 123  | 2
2  | 234  | 1
3  | 345  |
4  | 456  | 3
members
id | code_id | other info
---|---------|-----------
1  | 2       | blabla
2  | 1       | blabla
3  | 4       | blabla
The basic idea is that if a code is taken then its member_id field is filled in. However, this creates a circular link (members points to codes, codes points to members). Is there a different way of doing this? Is this actually a bad thing?
Update
To answer your questions: there are three different code tables with approx 3.5 million codes each; each table is searched depending on different criteria. If the member_id column is empty then the code is unclaimed; otherwise the code is claimed. This is done so that when we are searching the database we do not need to include another table to tell whether a code is claimed.
The members table contains the claimants for every single code, so all 10.5 million members.
The additional info has things like mobile, flybuys.
The mobile is how we identify the member, but each entry is considered a different member.
It's a bad thing because you can end up with anomalies. For example:
codes
id | code | member_id
---|------|----------
1  | 123  | 2
members
id | code_id | other info
---|---------|-----------
2  | 4       | blabla
See the anomaly? Code 1 references its corresponding member, but that member doesn't reference the same code in return. The problem with anomalies is you can't tell which one is the correct, intended reference and which one is a mistake.
Eliminating redundant columns reduces the chance for anomalies. This is a simple process that follows a few very well defined rules, called rules of normalization.
In your example, I would drop the codes.member_id column. I infer that a member must reference a code, but a code does not necessarily reference a member. So I would make members.code_id reference codes.id. But it could go the other way; you don't give enough information for the reader to be sure (as @OMG Ponies commented).
Yeah, this is not good because it presents opportunities for data integrity problems. You've got a one-to-one relationship, so either remove Code_id from the members table, or member_id from the codes table. (in this case it seems like it would make more sense to drop code_id from members since it sounds like you're more frequently going to be querying codes to see which are not assigned than querying members to see which have no code, but you can make that call)
You could simply drop the member_id column and use a foreign key relationship (or its absence) to signify the relationship or lack thereof. The code_id column would then be used as a foreign key to the code. Personally, I do think it's bad simply because it makes it more work to ensure that you don't have corrupt relationships in the DB -- i.e., you have to check that the two columns are synchronized between the tables -- and it doesn't really add anything in the general case. If you are running into performance problems, then you may need to denormalize, but I'd wait until it was definitely a problem (and you'd likely replicate more than just the id in that case).
It depends on what you're doing. If each member always gets exactly one unique code then just put the actual code in the member table.
If there are a set of codes and several members share a code (but each member still has just one) then remove the member_id from the codes table and only store the unique codes. Access a specific code through a member. (you can still join the code table to search on codes)
If a member can have multiple codes, then remove the code_id from the member table and the member_id from the code table, and create a third table that relates members to codes. Each record in the member table should be a unique record, and each record in the code table should be a unique record.
What is the logic behind having the member code in the code table?
It's unnecessary since you can always just do a join if you need both pieces of information.
By having it there you create the potential for integrity issues since you need to update BOTH tables whenever an update is made.
Yes, this is a bad idea. Never set up a database to have circular references if you can help it. Now any change has to be made in both places, and if one place is missed, you have a severe data integrity problem.
First question: can each code be assigned to more than one member? Or can each member have more than one code? (This includes over time as well as at any one moment, if you need historical records of who had what code when.) If the answer to either is yes, then your current structure cannot work. If the answer to both is no, why do you need two tables?
If you can have multiple codes and multiple members, you need a bridging table that has member id and code id. If you can have multiple members assigned one code, put the code id in the members table. If it is the other way around, put the member id in the code table. Then properly set up the foreign key relationship.
@Bill Karwin correctly identifies this as a probable design flaw which will lead to anomalies.
Assuming code and member are distinct entities, I would create a third table...
What is the relationship between a code and a member called? An oath? If this is a real-life relationship, someone with domain knowledge in the business will be able to give it a name. If not, look for further design flaws:
oaths
code_id | member_id
--------|----------
1 | 2
2 | 1
4 | 3
The data suggest that a unique constraint is required for (code_id, member_id).
Once the data is 'scrubbed', drop the columns codes.member_id and members.code_id.