Efficient way to update SQL 'relationship' table - sql

Say I have three properly normalised tables. One of people, one of qualifications and one mapping people to qualifications:
People:
id | Name
----------
1 | Alice
2 | Bob
Degrees:
id | Name
---------
1 | PhD
2 | MA
People-to-degrees:
person_id | degree_id
---------------------
1 | 2 # Alice has an MA
2 | 1 # Bob has a PhD
So then I have to update this mapping via my web interface. (I made a mistake. Bob has a BA, not a PhD, and Alice just got her B Eng.)
There are four possible states for each entry in this many-to-many relationship mapping:
was true before, should now be false
was false before, should now be true
was true before, should remain true
was false before, should remain false
What I don't want to do is read the values from four checkboxes, then hit the database four times to say "Did Bob have a BA before? Well, he does now." "Did Bob have a PhD before? Because he doesn't any more." and so on.
How do other people address this issue?
I'm curious to see if someone else arrives at the same solution I did.
UPDATE 1: onedaywhen suggests the same thing which occurred to me -- simply delete all the old entries, correct or not, and INSERT new ones.
UPDATE 2: potatopeelings suggests adding some code to the form which stores the original value of the field which can be compared with the new value on submit.

Logically, an UPDATE is a DELETE followed by an INSERT (consider that SQL Server triggers can access logical tables named inserted and deleted but there is no updated table). So you should be able to hit the database only twice i.e. first DELETE all rows (correct or otherwise) for Bob, second INSERT all correct rows for Bob.
If you want to hit the database only once, consider using Standard SQL's MERGE, assuming your DBMS supports it (SQL Server introduced it in 2008).
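The delete-then-insert approach can be sketched as follows. This is a minimal illustration only: it uses Python's built-in sqlite3 module in place of SQL Server, the table name people_to_degrees follows the question, and the helper replace_degrees is a hypothetical name.

```python
import sqlite3

# In-memory SQLite stands in for the real server (an assumption for this sketch).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_to_degrees ("
    " person_id INTEGER, degree_id INTEGER,"
    " PRIMARY KEY (person_id, degree_id))"
)
conn.executemany("INSERT INTO people_to_degrees VALUES (?, ?)", [(1, 2), (2, 1)])
conn.commit()


def replace_degrees(conn, person_id, degree_ids):
    """Hypothetical helper: delete all mappings for one person, then insert the submitted set."""
    with conn:  # one transaction: commits on success, rolls back on error
        conn.execute("DELETE FROM people_to_degrees WHERE person_id = ?", (person_id,))
        conn.executemany(
            "INSERT INTO people_to_degrees (person_id, degree_id) VALUES (?, ?)",
            [(person_id, d) for d in degree_ids],
        )


replace_degrees(conn, 2, [2])  # the form says Bob now holds only degree 2
rows = conn.execute(
    "SELECT person_id, degree_id FROM people_to_degrees ORDER BY person_id, degree_id"
).fetchall()
print(rows)
```

Both statements run inside one transaction, so other readers never see Bob with no degrees at all.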

Assuming the UI is a checkbox grid (1. in Ismail comment in the question)
MA PhD
Alice x
Bob x
where the x represents checked boxes. I'd use a front-end script to send only the changes back to the server, then do the INSERTs and DELETEs on the People-to-degrees table in a single transaction, or use a MERGE (as pointed out in Ismail's link):
BEGIN TRAN
INSERT query
DELETE query
COMMIT
You would pass the INSERT (and DELETE) query a list of (person ID, degree ID) pairs. For your example, the INSERT query would get the single pair (2,2) and the DELETE query the single pair (2,1).
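One way to work out those pairs is a set difference between the state the form was loaded with and the state it was submitted with. The sketch below assumes both states are available as lists of (person_id, degree_id) pairs; it uses Python's sqlite3 module rather than SQL Server, and apply_changes is a hypothetical helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE people_to_degrees ("
    " person_id INTEGER, degree_id INTEGER,"
    " PRIMARY KEY (person_id, degree_id))"
)
conn.executemany("INSERT INTO people_to_degrees VALUES (?, ?)", [(1, 2), (2, 1)])
conn.commit()


def apply_changes(conn, before, after):
    """Hypothetical helper: insert newly checked pairs and delete unchecked ones."""
    before, after = set(before), set(after)
    to_insert = sorted(after - before)  # was false, should now be true
    to_delete = sorted(before - after)  # was true, should now be false
    with conn:  # both statements commit or roll back together
        conn.executemany(
            "INSERT INTO people_to_degrees (person_id, degree_id) VALUES (?, ?)",
            to_insert,
        )
        conn.executemany(
            "DELETE FROM people_to_degrees WHERE person_id = ? AND degree_id = ?",
            to_delete,
        )
    return to_insert, to_delete


before = [(1, 2), (2, 1)]  # checkbox grid as originally rendered
after = [(1, 2), (2, 2)]   # checkbox grid as submitted
inserted, deleted = apply_changes(conn, before, after)
final_rows = conn.execute(
    "SELECT person_id, degree_id FROM people_to_degrees ORDER BY person_id, degree_id"
).fetchall()
print(inserted, deleted, final_rows)
```

Pairs that are unchanged (true before and after, or false before and after) never reach the database.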

Related

Does it follow best-practice DB design to mix staff and customer details in 1 table?

I have a table called Users which currently holds data on both Customers and Staff. It has their names, emails, passwords, etc. It also has a field called TypeOfUserID which holds a value to say what type of user they are, e.g. Customer or Staff.
Would it be better to have two separate tables: Customers and Staff?
It seems like duplication because the fields are the same for both types of user. The only field I can get rid of is the TypeOfUserID column.
However, having them both in one table called Users means that in my front-end application I have to keep adding a clause to check what type of user they are. If for any reason I need to allow a different type of user access e.g. External Supplier then I have to manage the addition of TypeOfUserID in multiple places in the WHERE clauses.
Short Answer:
It depends. If your current needs are met, and you don't foresee this model needing to be changed for a long time / it would be easy to change if you had to, stick with it.
Longer answer:
If staff members are just a special case of user, I don't see any reason you'd want to change anything about the database structure. Yes, for staff-specific stuff you'd need to be sure the person was staff, but I don't really see any way around that- you always have to know they're staff, first.
If, however, you want finer-grained permissions than binary (a person can belong to the 'staff' group but that doesn't necessarily say whether or not they're in the users' group, for instance), you might want to change the database.
The easiest way to do that, of course, would be to have a unique ID associated with each user, and use that key to look up their group permissions in a different table.
Something like:
uid | group
------------
1 | users
1 | staff
2 | users
3 | staff
4 | users
5 | admin
Although you may or may not want an actual string for each group; most likely you'd want another level of indirection by having a 'groups' table. So, that table above would be a 'group_membership' table, and it could look more like:
uid | gid
------------
1 | 1
1 | 2
2 | 1
3 | 2
4 | 1
5 | 3
To go along with it, you'd have the 'groups' table, which would be:
gid | group
-------------
1 | users
2 | staff
3 | admin
But, again, that's only if you're imagining a larger number of roles and you want more flexibility. If you only ever plan on having 'users' and 'staff' and staff are just highly privileged users, all of that extra stuff would be a waste of your time.
However, if you want really fine grained permissions, with maximum flexibility, you can use the above to make them happen via a 'permissions' table:
gid | can_create_user | can_fire_people | can_ban_user
-------------------------------------------------------
1 | false | false | false
2 | true | false | true
3 | true | true | true
Some Example Code
Here's a working PostgreSQL example of getting permissions can_create_user and can_fire_people for a user with uid 1:
SELECT bool_or(can_create_user) AS can_create_user,
bool_or(can_fire_people) AS can_fire_people
FROM permissions
WHERE gid IN (SELECT gid FROM group_membership WHERE uid = 1);
Which would return:
can_create_user | can_fire_people
----------------------------------
true | false
because user 1 is in groups 1 and 2, and group 2 has the can_create_user permission, but neither group has the can_fire_people permission.
(I know you're using SQL Server, but I only have access to a PostgreSQL server at the moment. Sorry about that. The difference should be minor, though.)
Notes
You'll want to make sure that uid and gid are primary keys in the users and groups table, and that there are foreign key constraints for those values in every other table which uses them; you don't want nonexistent groups to have permissions, or nonexistent users to be accidentally added to groups.
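The constraints described in this note can be sketched as below. This is an illustration under assumptions: it runs on SQLite through Python's sqlite3 module rather than SQL Server, and the groups table is named user_groups here only to avoid any clash with SQL keywords.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when asked
conn.executescript(
    """
    CREATE TABLE users       (uid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE user_groups (gid INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE group_membership (
        uid INTEGER NOT NULL REFERENCES users(uid),
        gid INTEGER NOT NULL REFERENCES user_groups(gid),
        PRIMARY KEY (uid, gid)
    );
    INSERT INTO users       VALUES (1, 'user one');
    INSERT INTO user_groups VALUES (1, 'users'), (2, 'staff');
    INSERT INTO group_membership VALUES (1, 1), (1, 2);
    """
)

# Adding user 1 to a group that does not exist must be refused.
try:
    conn.execute("INSERT INTO group_membership VALUES (1, 99)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

membership_count = conn.execute("SELECT COUNT(*) FROM group_membership").fetchone()[0]
print(rejected, membership_count)
```

The composite primary key on (uid, gid) also stops the same user from being added to the same group twice.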
Alternatively
A graph database solves this problem pretty elegantly; you'd simply create edges linking users to groups, and edges linking groups to permissions. If you want to work with a technology that's currently sexy / buzzword compliant, might want to give that a try, depending on how enormous of a change that'd be.
Further information
The phrase you'll want to google is "access control". You'll probably want to implement access control lists (as outlined above) or something similar. Since this is primarily a security-related topic, you might also want to ask this question on sec.se, or at least look around there for related answers.
Even though they look similar, they logically belong to different areas. You will never need a union between those tables. But as your application develops, you will need to add more and more specific fields to these tables, and they will become more different than similar.
You could have a separate table for staff holding only the id from the user table as a foreign key. If you do that, then any functionality related only to staff members can query the staff table, joining to the user table. This solution will also give you flexibility for future extension, as any data related only to staff members (for example the department they work in) can be placed in the staff table.

T-SQL(MSSQL 2005)Reorder Scope_identity

I have a small table of 2 columns on MSSQL Server 2005 which contains a lot of data, let's say about 1 billion records, and it is constantly being written to.
Definition of the table is :
Create table Test(
id int identity(1,1) primary key ,
name varchar(30) )
The PK is an int, which I chose over uniqueidentifier for a number of reasons. The problem comes with the auto increment: I want to reorganize the 'id' every time a row is deleted. The objective is to leave no gaps. The table is active and a lot of rows are written into it, so dropping a column is not an option, nor is locking the table for a long time.
Quick example of what I want to accomplish:
I have this :
id | name
----+-------
1 | Roy
2 | Boss
5 | Jane
7 | Janet
I want to reorganize it so it will look like this :
id | name
----+-------
1 | Roy
2 | Boss
3 | Jane
4 | Janet
I am aware of DBCC CHECKIDENT (TableName, RESEED, position) but I am not sure it will benefit my case, because my table is big and it will take a lot of time to reposition also if I am not mistaken it will lock the table for a very long time. This table is not used by any other table. But if you like you can submit a suggestion to the same problem having in mind that the table is used by other tables.
EDIT 1 :
The objective is to prove that the rows follow each other, so that if a row is deleted I can see that it is deleted and reinstate it. I was thinking of adding a third column that will contain a hash value of the row above; if the row above is deleted I would know that I have a gap and need to restore it. In that case the order will not matter, because I can compare the hash codes and see if they match, so I can see which row follows which. But I still wonder: is there a cleverer and safer way of doing this? Maybe something other than hash codes, some other way of proving that the rows follow each other, or that the new row contains parts of the previous row?
EDIT 2 :
I'll try to explain it one more time; if I can't, well, then I don't want to waste anyone's time.
In the perfect scenario nothing will be missing from this table, but due to server errors some data may be deleted, or some of my associates might be careless and delete it by mistake.
I have logs and can recover that data, but I want to prove that the records are sequenced, that they follow each other even if there is a server error and some of them are deleted but later reinstated.
Is there a way to do this ?
Example:
well let's say that 7 is deleted and after that reinstated as 23 , how would you prove that 23 is 7, meaning that 23 came after 6 and before 8 ?
I would suggest not worrying about trying to reseed your Identity column -- let SQL Server maintain its uniqueness for each row.
Generally this is wanted for presentation logic instead, in which case, you could use the ROW_NUMBER() analytic function:
SELECT Row_Number() Over (Order By Id) NewId,
Id, Name
FROM YourTable
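To show what that query produces with the sample data from the question, here is a small sketch. It assumes SQLite (through Python's sqlite3 module, with window-function support) in place of SQL Server, and the column alias new_id is chosen for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Test (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO Test VALUES (?, ?)",
    [(1, "Roy"), (2, "Boss"), (5, "Jane"), (7, "Janet")],
)

# The stored ids keep their gaps; the gap-free number exists only in the result set.
rows = conn.execute(
    "SELECT ROW_NUMBER() OVER (ORDER BY id) AS new_id, id, name "
    "FROM Test ORDER BY id"
).fetchall()
for row in rows:
    print(row)
```

Nothing is written back to the table, so no locking or renumbering is needed.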
I agree with others that this shouldn't typically be done, but if you absolutely want to do it you can utilize the quirky update to get it done quickly, should be something like this:
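-- Note: SQL Server refuses to UPDATE an IDENTITY column directly,
-- so this only works if id is a plain INT column.
DECLARE @prev_id INT
SET @prev_id = 0 -- SQL Server 2005 does not support DECLARE ... = value
UPDATE Test
SET @prev_id = id = CASE WHEN id - @prev_id = 1 THEN id
ELSE @prev_id + 1
END -- @var = col = expr stores the new id in the variable, not the old one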
You should read about the limitations of quirky update, primarily the conditions that must be met to ensure consistent output. This is a good article but they annoyingly have you sign in, but you can find other resources: http://www.sqlservercentral.com/articles/T-SQL/68467/
Edit: Actually, in this case I think you could just use:
DECLARE @prev_id INT
SET @prev_id = 0
UPDATE Test
SET @prev_id = id = @prev_id + 1 -- the variable takes the newly assigned id
The way to do it is to not implement your proposed fix.
Leave the identity alone.
If identity 7 is deleted you know it is just after 6 and just before 8.
If you need them to stay in the same order then simple.
Place unique constraint on name.
Don't delete the record.
Just add a bool column for active.
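A minimal sketch of that soft-delete idea follows. It assumes SQLite through Python's sqlite3 module rather than SQL Server, with an integer column named active standing in for the bool column.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Test ("
    " id INTEGER PRIMARY KEY AUTOINCREMENT,"
    " name TEXT NOT NULL UNIQUE,"
    " active INTEGER NOT NULL DEFAULT 1)"  # 1 = live, 0 = soft-deleted
)
conn.executemany(
    "INSERT INTO Test (name) VALUES (?)",
    [("Roy",), ("Boss",), ("Jane",), ("Janet",)],
)

# "Delete" Jane by flipping the flag; her row and id stay in place.
conn.execute("UPDATE Test SET active = 0 WHERE name = ?", ("Jane",))
active_names = [
    r[0] for r in conn.execute("SELECT name FROM Test WHERE active = 1 ORDER BY id")
]

# Reinstating her is another flag flip, so she keeps her original position.
conn.execute("UPDATE Test SET active = 1 WHERE name = ?", ("Jane",))
all_rows = conn.execute("SELECT id, name, active FROM Test ORDER BY id").fetchall()
print(active_names, all_rows)
```

Because no row is ever physically removed, the id sequence never develops gaps from deletions.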

How to merge two identical database data to one?

Two customers are going to merge. They are both using my application, each with their own database. In a few weeks they will merge (become one organisation), so they want to have all the data in one database.
So the two database structures are identical. The problem is with the data. For example, I have Table Locations and persons (these are just two tables of 50):
Database 1:
Locations:
Id Name Adress etc....
1 Location 1
2 Location 2
Persons:
Id LocationId Name etc...
1 1 Alex
2 1 Peter
3 2 Lisa
Database 2:
Locations:
Id Name Adress etc....
1 Location A
2 Location B
Persons:
Id LocationId Name etc...
1 1 Mark
2 2 Ashley
3 1 Ben
We see that a person is related to a location (column LocationId). Note that I have more tables that refer to the Locations and Persons tables.
The databases contain their own locations and persons, but the Ids can be the same. If I import everything into DB2, the locations of DB1 should be inserted into DB2 with the ids 3 and 4. The persons from DB1 should then get the new ids 4, 5 and 6, and their LocationId values also have to be changed to the new location ids 3 and 4.
My solution for this problem is to write a query which handles everything, but I don't know where to begin.
What is the best way (in a query) to renumber the Id fields and cascade the change to the child rows? The databases do not enforce referential integrity (foreign keys are NOT defined in the database). Creating foreign keys and cascading is not an option.
I'm using sql server 2005.
You say that both customers are using your application, so I assume that it's some kind of "shrink-wrap" software that is used by more customers than just these two, correct?
If yes, adding special columns to the tables or anything like this probably will cause pain in the future, because you either would have to maintain a special version for these two customers that can deal with the additional columns. Or you would have to introduce these columns to your main codebase, which means that all your other customers would get them as well.
I can think of an easier way to do this without changing any of your tables or adding any columns.
In order for this to work, you need to find out the largest ID that exists in both databases together (no matter in which table or in which database it is).
This may require some copy & paste to get a lot of queries that look like this:
select max(id) as maxlocationid from locations
select max(id) as maxpersonid from persons
-- and so on... (one query for each table)
When you find the largest ID after running the query in both databases, take a number that's larger than that ID, and add it to all IDs in all tables in the second database.
It's very important that the number needs to be larger than the largest ID that already exists in both databases!
It's a bit difficult to explain, so here's an example:
Let's say that the largest ID in any table in both databases is 8000.
Then you run some SQL that adds 10000 to every ID in every table in the second database:
update Locations set Id = Id + 10000
update Persons set Id = Id + 10000, LocationId = LocationId + 10000
-- and so on, for each table
The queries are relatively simple, but this is the most work because you have to build a query like this manually for each table in the database, with the correct names of all the ID columns.
After running the query on the second database, the example data from your question will look like this:
Database 1: (exactly like before)
Locations:
Id Name Adress etc....
1 Location 1
2 Location 2
Persons:
Id LocationId Name etc...
1 1 Alex
2 1 Peter
3 2 Lisa
Database 2:
Locations:
Id Name Adress etc....
10001 Location A
10002 Location B
Persons:
Id LocationId Name etc...
10001 10001 Mark
10002 10002 Ashley
10003 10001 Ben
And that's it! Now you can import the data from one database into the other, without getting any primary key violations at all.
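The whole procedure can be sketched end to end with the sample data from the question. This is an illustration under assumptions: two in-memory SQLite databases (via Python's sqlite3 module) stand in for the two SQL Server databases, only the two example tables are included, and make_db and max_id are hypothetical helper names.

```python
import sqlite3


def make_db(locations, persons):
    """Hypothetical helper: build one in-memory database with the two example tables."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE Locations (Id INTEGER PRIMARY KEY, Name TEXT)")
    conn.execute(
        "CREATE TABLE Persons (Id INTEGER PRIMARY KEY, LocationId INTEGER, Name TEXT)"
    )
    conn.executemany("INSERT INTO Locations VALUES (?, ?)", locations)
    conn.executemany("INSERT INTO Persons VALUES (?, ?, ?)", persons)
    conn.commit()
    return conn


def max_id(conn):
    """Largest Id found in any table of one database."""
    return max(
        conn.execute(f"SELECT COALESCE(MAX(Id), 0) FROM {table}").fetchone()[0]
        for table in ("Locations", "Persons")
    )


db1 = make_db(
    [(1, "Location 1"), (2, "Location 2")],
    [(1, 1, "Alex"), (2, 1, "Peter"), (3, 2, "Lisa")],
)
db2 = make_db(
    [(1, "Location A"), (2, "Location B")],
    [(1, 1, "Mark"), (2, 2, "Ashley"), (3, 1, "Ben")],
)

OFFSET = 10000
largest = max(max_id(db1), max_id(db2))
if OFFSET <= largest:  # the offset must exceed every existing Id
    raise ValueError("offset too small")

with db2:  # shift every Id and every reference to an Id in the second database
    db2.execute("UPDATE Locations SET Id = Id + ?", (OFFSET,))
    db2.execute(
        "UPDATE Persons SET Id = Id + ?, LocationId = LocationId + ?", (OFFSET, OFFSET)
    )

with db2:  # now the first database's rows can be imported unchanged
    db2.executemany(
        "INSERT INTO Locations VALUES (?, ?)",
        db1.execute("SELECT Id, Name FROM Locations").fetchall(),
    )
    db2.executemany(
        "INSERT INTO Persons VALUES (?, ?, ?)",
        db1.execute("SELECT Id, LocationId, Name FROM Persons").fetchall(),
    )

merged = db2.execute(
    "SELECT p.Name, l.Name FROM Persons p "
    "JOIN Locations l ON l.Id = p.LocationId ORDER BY p.Id"
).fetchall()
print(merged)
```

Every person still points at the location they belonged to before the merge, which is the property the offset is meant to preserve.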
If this were my problem, I would probably add some columns to the tables in the database I was going to keep. These would be used to store the pk values from the other db. Then I would insert records from the other tables. For the ones with foreign keys, I would use a known value. Then I would update as required and drop the columns I added.

sql cross reference table

I am trying to build a statistics table for marketing purposes for a specific site.
Currently I am planning to build a table like this:
Source_IP source_city Destination_IP destination_city
127.0.0.1 NY 242.212.12.1 Paris
242.212.12.1 Paris 127.0.0.1 NY
I want to prevent cases like the above, i.e. the combination of (Source_IP, source_city) and (Destination_IP, destination_city) should appear as only one record and not two, whichever direction it is stored in. How can I prevent this in SQL?
Assuming your DBMS allows function based indexes, the following would do the trick:
CREATE UNIQUE INDEX idx_no_dupes
ON your_table (least(source_ip, destination_ip), greatest(source_ip, destination_ip));
The use of least and greatest will always index the tuples in the "same order", so it doesn't matter which value is in which column.
If your DBMS does not support least or greatest you will need to replace this with a CASE construct.
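The CASE variant might look like the sketch below. It is only an illustration: it runs on SQLite (through Python's sqlite3 module), which supports indexes on expressions, and the table and index names are taken from the statement above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE your_table ("
    " source_ip TEXT, source_city TEXT,"
    " destination_ip TEXT, destination_city TEXT)"
)
# The two CASE expressions play the roles of least() and greatest():
# the smaller IP is always indexed first, the larger one second.
conn.execute(
    """
    CREATE UNIQUE INDEX idx_no_dupes ON your_table (
        CASE WHEN source_ip <= destination_ip THEN source_ip ELSE destination_ip END,
        CASE WHEN source_ip <= destination_ip THEN destination_ip ELSE source_ip END
    )
    """
)
conn.execute("INSERT INTO your_table VALUES ('127.0.0.1', 'NY', '242.212.12.1', 'Paris')")

# The same pair in the opposite direction collides with the existing index entry.
try:
    conn.execute(
        "INSERT INTO your_table VALUES ('242.212.12.1', 'Paris', '127.0.0.1', 'NY')"
    )
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM your_table").fetchone()[0]
print(duplicate_rejected, row_count)
```

Note that the IPs are compared as text here, which is enough to give each pair one canonical order.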
For MS SQL Server you could implement a check constraint
My first idea was, well, a first idea. So, there are better suggestions here, but another possibility, though probably not a great one, is to implement a trigger on insert or update to first search the table to make sure that those items don't exist elsewhere. If none is found, then the trigger allows the action, otherwise, it rolls back. Like I said, not great, but a possibility.
normalize your table ...
take one table for locations containing a location ID, the ip, and the city name
and have a 2nd table that stores the pairs: each row holds a pair id and a location id, so one pair consists of two rows ...
example:
table 1:
Location
ID IP City
1 127.0.0.1 Ny
2 242.212.12.1 Paris
table 2:
Pairs
ID Location
1 1
1 2

SQL Table Design - Identity Columns

SQL Server 2008 Database Question.
I have 2 tables, for arguments sake called Customers and Users where a single Customer can have 1 to n Users. The Customers table generates a CustomerId which is a seeded identity with a +1 increment on it. What I'm after in the Users table is a compound key comprising the CustomerId and a sequence number such that in all cases, the first user has a sequence of 1 and subsequent users are added at x+1.
So the table looks like this...
CustomerId (PK, FK)
UserId (PK)
Name
...and if, for example, Customer 485 had three users the data would look like...
CustomerId | UserId | Name
----------
485 | 1 | John
485 | 2 | Mark
485 | 3 | Luke
I appreciate that I can manually add the 1,2,3,...,n entry for UserId however I would like to get this to happen automatically on row insert in SQL, so that in the example shown I could effectively insert rows with the CustomerId and the Name with SQL Server protecting the Identity etc. Is there a way to do this through the database design itself - when I set UserId as an identity it runs 1 to infinity across all customers which isn't what I am looking for - have I got a setting wrong somewhere, or is this not an option?
Hope that makes sense - thanks for your help
I can think of no automatic way to do this without implementing a custom Stored Procedure that inserts the rows and increments the Id appropriately, although others with more knowledge may have a better idea.
However, this smells to me of naturalising a surrogate key - which is not always a good idea.
More info here:
http://www.agiledata.org/essays/keys.html
That's not really an option with a regular identity column, but you could set up an insert trigger to auto populate the user id though.
The naive way to do this would be to have the trigger select the max user id from the users table for the customer id on the inserted record, then add one to that. However, you'll run into concurrency problems there if more than one person is creating a user record at the same time.
A better solution would be to have a NextUserID column on the customers table. In your trigger you would:
Start a transaction.
Increment the NextUserID for the customer (locking the row).
Select the updated next user id.
Use that for the new User record.
Commit the transaction.
This should ensure that simultaneous additions of users don't result in the same user id being used more than once.
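The steps above can be sketched as follows. This is a hedged illustration rather than trigger code: it uses SQLite through Python's sqlite3 module instead of SQL Server, performs the steps in an application-level helper (add_user, a hypothetical name), and assumes NextUserID starts at 0.

```python
import sqlite3

# isolation_level=None lets the sketch issue BEGIN/COMMIT explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE Customers ("
    " CustomerId INTEGER PRIMARY KEY,"
    " NextUserID INTEGER NOT NULL DEFAULT 0)"
)
conn.execute(
    "CREATE TABLE Users ("
    " CustomerId INTEGER NOT NULL REFERENCES Customers(CustomerId),"
    " UserId INTEGER NOT NULL,"
    " Name TEXT,"
    " PRIMARY KEY (CustomerId, UserId))"
)
conn.execute("INSERT INTO Customers (CustomerId) VALUES (485), (486)")


def add_user(conn, customer_id, name):
    """Hypothetical helper: allocate the next per-customer UserId and insert the user."""
    conn.execute("BEGIN IMMEDIATE")  # take the write lock before touching the counter
    try:
        conn.execute(
            "UPDATE Customers SET NextUserID = NextUserID + 1 WHERE CustomerId = ?",
            (customer_id,),
        )
        (user_id,) = conn.execute(
            "SELECT NextUserID FROM Customers WHERE CustomerId = ?", (customer_id,)
        ).fetchone()
        conn.execute("INSERT INTO Users VALUES (?, ?, ?)", (customer_id, user_id, name))
        conn.execute("COMMIT")
    except Exception:
        conn.execute("ROLLBACK")  # the counter increment is undone as well
        raise
    return user_id


ids_485 = [add_user(conn, 485, n) for n in ("John", "Mark", "Luke")]
id_486 = add_user(conn, 486, "Anna")
print(ids_485, id_486)
```

Because the counter update and the insert share one transaction, a failed insert does not burn a number, and two concurrent writers cannot read the same counter value.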
All that said, I would recommend that you just don't do it. It's more trouble than it's worth and just smells like a bad idea to begin with.
So you want a generated user_id field that increments within the confines of a customer_id.
I can't think of one database where that concept exists.
You could implement it with a trigger. But my question is: WHY?
Surrogate keys are supposed to not have any kind of meaning. Why would you try to make a key that, simultaneously, is the surrogate and implies order?
My suggestions:
Create a date_created field, defaulting to getDate(). That will allow you to know the order (time based) in which each user_id was created.
Create an ordinal field - which can be updated by a trigger, to support that order.
Hope that helps.
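As a sketch of the date_created suggestion, the per-customer order can be derived at query time instead of being stored in the key. The example assumes SQLite with window functions (through Python's sqlite3 module) in place of SQL Server; the column names, the alias ordinal, and the sample dates are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Users ("
    " user_id INTEGER PRIMARY KEY AUTOINCREMENT,"  # plain surrogate key, no meaning
    " customer_id INTEGER NOT NULL,"
    " name TEXT,"
    " date_created TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP)"
)
# Explicit timestamps keep the sample output deterministic.
conn.executemany(
    "INSERT INTO Users (customer_id, name, date_created) VALUES (?, ?, ?)",
    [
        (485, "John", "2024-01-01"),
        (486, "Anna", "2024-01-02"),
        (485, "Mark", "2024-01-03"),
        (485, "Luke", "2024-01-04"),
    ],
)

# Number each customer's users by creation time; user_id breaks ties.
rows = conn.execute(
    "SELECT customer_id,"
    "       ROW_NUMBER() OVER (PARTITION BY customer_id"
    "                          ORDER BY date_created, user_id) AS ordinal,"
    "       name "
    "FROM Users ORDER BY customer_id, ordinal"
).fetchall()
for row in rows:
    print(row)
```

The surrogate key stays meaningless, and the 1, 2, 3 sequence per customer is computed only when it is needed.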