unique pair in a "friendship" database - sql

I'm posting this question which is somewhat a summary of my other question.
I have two databases:
1) db_users.
2) db_friends.
I stress that they're stored in separate databases on different servers and therefore no foreign keys can be used.
In 'db_friends' I have the table 'tbl_friends' which has the following columns:
- id_user
- id_friend
Now how do I make sure that each pair is unique at this table ('tbl_friends')?
I'd like to enforce that at the table level, and not through a query.
For example these are invalid rows:
1 - 2
2 - 1
I'd like this to be impossible to add.
Additionally - how would I search for all of the friends of user 713 when he could be mentioned, on some friendship rows, in the second column ('id_friend')?

You're probably not going to be able to do this at the database level -- your application code is going to have to do this. If you make sure that your tbl_friends records always go in with (lowId, highId), then a typical PK/Unique Index will solve the duplicate problem. In fact, I'd go so far as to rename the columns in your tbl_friends to (id_low, id_high) just to reinforce this.
Your query to find anything with user 713 would then be something like
SELECT id_low AS friend FROM tbl_friends WHERE (id_high = ?)
UNION ALL
SELECT id_high AS friend FROM tbl_friends WHERE (id_low = ?)
For efficiency, you'd probably want to index it forward and backward -- that is, by (id_low, id_high) and (id_high, id_low).
If you must do this at a DB level, then a stored procedure to swap arguments to (low,high) before inserting would work.
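A minimal sketch of such a procedure, assuming SQL Server T-SQL syntax and the renamed (id_low, id_high) columns (the procedure name is illustrative):
CREATE PROCEDURE dbo.AddFriendship
    @id_a INT,
    @id_b INT
AS
BEGIN
    -- always store the smaller id first, so the PK/unique index rejects reversed duplicates
    INSERT INTO tbl_friends (id_low, id_high)
    VALUES (CASE WHEN @id_a < @id_b THEN @id_a ELSE @id_b END,
            CASE WHEN @id_a < @id_b THEN @id_b ELSE @id_a END);
END;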

You'd have to use a trigger to enforce that business rule.
Making the two columns in tbl_friends the primary key (unique constraint failing that) would only ensure there can't be duplicates of the same set: 1, 2 can only appear once but 2, 1 would be valid.
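If you do go the trigger route, here is a rough sketch (SQL Server syntax; adapt to your RDBMS, and the trigger name is invented):
CREATE TRIGGER trg_tbl_friends_no_reverse
ON tbl_friends
AFTER INSERT
AS
BEGIN
    -- reject the insert if the reversed pair already exists
    IF EXISTS (SELECT 1
               FROM tbl_friends AS f
               JOIN inserted AS i
                 ON f.id_user = i.id_friend
                AND f.id_friend = i.id_user)
    BEGIN
        RAISERROR('Reversed friendship pair already exists.', 16, 1);
        ROLLBACK TRANSACTION;
    END
END;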
how would I search for all of the friends of user 713 when he could be mentioned, on some friendship rows, in the second column ('id_friend')?
You could use an IN:
WHERE 713 IN (id_user, id_friend)
..or a UNION:
JOIN (SELECT id_user AS user
FROM TBL_FRIENDS
UNION ALL
SELECT id_friend
FROM TBL_FRIENDS) x ON x.user = u.user

Well, a unique constraint on the pair of columns will get you halfway there. I think the easiest way to ensure you don't get the reversed version would be to add a constraint ensuring that id_user < id_friend. You will need to compensate for this ordering at insertion time, but it will get you the database-level constraint you desire without duplicating data or relying on foreign keys.
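A minimal sketch of those two constraints (standard SQL; CHECK constraint support varies by RDBMS, and the constraint names are illustrative):
ALTER TABLE tbl_friends
    ADD CONSTRAINT chk_friend_order CHECK (id_user < id_friend);

ALTER TABLE tbl_friends
    ADD CONSTRAINT uq_friend_pair UNIQUE (id_user, id_friend);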
As for the second question, to find all friends for id = 1 you could select id_user, id_friend from tbl_friends where id_user = 1 or id_friend = 1, and then in your client code throw out all the 1's regardless of column.

One way you could do it is to store the two friends on two rows:
CREATE TABLE FriendPairs (
pair_id INT NOT NULL,
friend_id INT NOT NULL,
PRIMARY KEY (pair_id, friend_id)
);
INSERT INTO FriendPairs (pair_id, friend_id)
VALUES (1234, 317), (1234, 713);
See? It doesn't matter which order you insert them, because both friends go in the friend_id column. So you can enforce uniqueness easily.
You can also query easily for friends of 713:
SELECT f2.friend_id
FROM FriendPairs AS f1
JOIN FriendPairs AS f2 ON (f1.pair_id = f2.pair_id)
WHERE f1.friend_id = 713

Related

Maintaining logical consistency with a soft delete, whilst retaining the original information

I have a very simple table students, structure as below, where the primary key is id. This table is a stand-in for about 20 multi-million row tables that get joined together a lot.
+----+----------+------------+
| id | name     | dob        |
+----+----------+------------+
|  1 | Alice    | 01/12/1989 |
|  2 | Bob      | 04/06/1990 |
|  3 | Cuthbert | 23/01/1988 |
+----+----------+------------+
If Bob wants to change his date of birth, then I have a few options:
Update students with the new date of birth.
Positives: 1 DML operation; the table can always be accessed by a single primary key lookup.
Negatives: I lose the fact that Bob ever thought he was born on 04/06/1990
Add a column, created date default sysdate, to the table and change the primary key to id, created. Every update becomes:
insert into students(id, name, dob) values (:id, :name, :new_dob)
Then, whenever I want the most recent information do the following (Oracle but the question stands for every RDBMS):
select id, name, dob
from ( select a.*, rank() over ( partition by id
                                 order by created desc ) as "rank"
       from students a )
where "rank" = 1
Positives: I never lose any information.
Negatives: All queries over the entire database take that little bit longer. If the table were the size indicated this wouldn't matter, but once you're on your 5th left outer join, using range scans rather than unique scans begins to have an effect.
Add a different column, deleted date default to_date('2100/01/01','yyyy/mm/dd'), or whatever overly early, or futuristic, date takes my fancy. Change the primary key to id, deleted then every update becomes:
update students x
set deleted = sysdate
where id = :id
and deleted = ( select max(deleted) from students where id = x.id );
insert into students(id, name, dob) values ( :id, :name, :new_dob );
and the query to get out the current information becomes:
select id, name, dob
from ( select a.*, rank() over ( partition by id
                                 order by deleted desc ) as "rank"
       from students a )
where "rank" = 1
Positives: I never lose any information.
Negatives: Two DML operations; I still have to use ranked queries, with the additional cost of a range scan rather than a unique index scan in every query.
Create a second table, say student_archive and change every update into:
insert into student_archive select * from students where id = :id;
update students set dob = :newdob where id = :id;
Positives: Never lose any information.
Negatives: 2 DML operations; if you ever want to get all the information ever you have to use union or an extra left outer join.
For completeness, have a horribly de-normalised data-structure: id, name1, dob, name2, dob2... etc.
Number 1 is not an option if I never want to lose any information and always do a soft delete. Number 5 can be safely discarded as causing more trouble than it's worth.
I'm left with options 2, 3 and 4 with their attendant negative aspects. I usually end up using option 2 and the horrific 150 line (nicely-spaced) multiple sub-select joins that go along with it.
tl;dr I realise I'm skating close to the line on a "not constructive" vote here but:
What is the optimal (singular!) method of maintaining logical consistency while never deleting any data?
Is there a more efficient way than those I have documented? In this context I'll define efficient as "fewer DML operations" and/or "being able to remove the sub-queries". If you can think of a better definition when (if) answering, please feel free.
I'd stick to #4 with some modifications. No need to delete data from the original table; it's enough to copy the old values to the archive table before updating (or deleting) the original record. That can be easily done with a row-level trigger. Retrieving all information is, in my opinion, not a frequent operation, and I don't see anything wrong with an extra join/union. Also, you can define a view so that all queries will be straightforward from the end user's perspective.
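A rough sketch of such a trigger (Oracle syntax, assuming student_archive has the same columns as students; the trigger name is illustrative):
CREATE OR REPLACE TRIGGER trg_students_archive
BEFORE UPDATE OR DELETE ON students
FOR EACH ROW
BEGIN
    -- copy the old values to the archive before they change or disappear
    INSERT INTO student_archive (id, name, dob)
    VALUES (:OLD.id, :OLD.name, :OLD.dob);
END;
/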

DB design: Should I use constraints within a table or a new table

I inherited a large existing DB and I'd like to know if I should refactor it because 95% of my queries require joining at least 4 tables.
The DB has 5 tables that only have an ID and Name column with less than 20 rows. I assume the author did this so he could change the names there and not change them in the other tables, but many of those tables are only referenced in one other table. Should I refactor these small two-column tables into the larger table and add a constraint to the column so users can't input incorrect names, instead of having separate tables?
Resist that urge. From your description I can deduce that the existing design is solid and probably well normalized. Your refactoring may actually undo a good db structure.
If you are bothered by writing a lot of joins in your queries I would suggest creating views to mitigate the boilerplate.
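For instance, a hypothetical view (table and column names invented purely for illustration) that hides the repeated joins:
CREATE VIEW order_details AS
SELECT o.order_id,
       c.customer_name,
       s.status_name,
       p.product_name
FROM orders o
JOIN customers      c ON c.customer_id = o.customer_id
JOIN order_statuses s ON s.status_id   = o.status_id
JOIN products       p ON p.product_id  = o.product_id;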
...the author did this so he could change the names there not change
them in the other tables...
That is evidence of good design and exactly what you should strive for in a normalized database.
No.
Your DB is normalized and proper.
And you save space, lookup time, and indexing by storing an int rather than a varchar name.
Small tables are optimized away if they are properly keyed.
Sounds like what you have are lookup tables. Let me tell you what happens when you decide to put all lookups in one table with an additional column to specify which type it is. First, instead of joining to 4 different tables in one query, you have to join to the same table 4 times. There ends up being more contention for the resources in the "one table to rule them all". Further, you lose FK constraints. That means you eventually lose data integrity. So if one lookup is state, nothing will prevent you from putting the id values for a different lookup, such as customer type, in the stateid column of the customeraddress table. When the lookups are separate you can enforce that relationship.
Suppose instead of one big table you decide to have a constraint on the column for customer type. Constraints are now enforced, but you have a problem when they need to change. Now you have to alter the database in order to add a new type. Again, this is usually a very bad idea, especially when the table gets large.
Short story: Replacing strings with ID numbers has nothing to do with normalization. Using natural keys in your case might improve performance. In my tests, queries using natural keys were faster by 1 or 2 orders of magnitude.
You might have accepted an answer too quickly.
The DB has a 5 tables that only have an ID and Name column with less
than 20 rows.
I'm assuming these tables have a structure something like this.
create table a (
  a_id integer primary key,
  a_name varchar(30) not null unique
);
create table b (...
-- Just like a
create table your_data (
  yet_another_id integer primary key,
  a_id integer not null references a (a_id),
  b_id integer not null references b (b_id),
  c_id integer not null references c (c_id),
  d_id integer not null references d (d_id),
  unique (a_id, b_id, c_id, d_id),
  -- other columns go here
);
And it's obvious that your_data will require four joins (at least) to get usable information from it.
But the names in table a, b, c, and d are unique (ahem), so you can use the unique names as targets for foreign key references. You could rewrite the table your_data like this.
create table your_data (
  yet_another_id integer primary key,
  a_name varchar(30) not null references a (a_name),
  b_name varchar(30) not null references b (b_name),
  c_name varchar(30) not null references c (c_name),
  d_name varchar(30) not null references d (d_name),
  unique (a_name, b_name, c_name, d_name),
  -- other columns go here
);
Replacing id numbers with strings doesn't change the normal form. (And replacing strings with id numbers doesn't have anything to do with normalization.) If the original table were in 5NF, then this rewrite will be in 5NF, too.
But what about performance? Aren't id numbers plus joins supposed to be faster than strings?
I tested that by inserting 20 rows into each of the four tables a, b, c, and d. Then I generated a Cartesian product to fill one test table written with id numbers, and another using the names. (So, 160K rows in each.) I updated the statistics, and ran a couple of queries.
explain analyze
select a.a_name, b.b_name, c.c_name, d.d_name
from your_data_id
inner join a on (a.a_id = your_data_id.a_id)
inner join b on (b.b_id = your_data_id.b_id)
inner join c on (c.c_id = your_data_id.c_id)
inner join d on (d.d_id = your_data_id.d_id)
...
Total runtime: 808.472 ms
explain analyze
select a_name, b_name, c_name, d_name
from your_data
Total runtime: 132.098 ms
The query using id numbers takes a lot longer to execute. Next, I used a WHERE clause on all four columns, which returns a single row.
explain analyze
select a.a_name, b.b_name, c.c_name, d.d_name
from your_data_id
inner join a on (a.a_id = your_data_id.a_id and a.a_name = 'a one')
inner join b on (b.b_id = your_data_id.b_id and b.b_name = 'b one')
inner join c on (c.c_id = your_data_id.c_id and c.c_name = 'c one')
inner join d on (d.d_id = your_data_id.d_id and d.d_name = 'd one')
...
Total runtime: 14.671 ms
explain analyze
select a_name, b_name, c_name, d_name
from your_data
where a_name = 'a one' and b_name = 'b one' and c_name = 'c one' and d_name = 'd one';
...
Total runtime: 0.133 ms
The tables using id numbers took about 100 times longer to query.
Tests used PostgreSQL 9.something.
My advice: Try before you buy. I mean, test before you invest. Try rewriting your data table to use natural keys. Think carefully about ON UPDATE CASCADE and ON DELETE CASCADE. Test performance with representative sample data. Edit your original question and let us know what you found.

order by field with more than 10000 ids

I need to do specific ordering with the use of ORDER BY FIELD.
select * from table order by field(id,3,4,1,2.......upto 10000 ids)
Since the required ordering can't be derived within SQL, how much will this affect performance, and is it feasible to do?
Updates from the comments:
Ordering depends on user and category IDs and can be anything the user wants.
The ordering specification changes (about) daily.
So, we need a custom ordering that depends on the user and category and this ordering needs to change daily.
The easiest way would be to put your ordering in a separate table (called ordering_table in this example):
 id | position
----+----------
  1 |       11
  2 |       42
  3 |       23
etc.
The above would mean "put an id of 1 at position 11, 2 at position 42, 3 at position 23, ...". Then you can join that ordering table in:
SELECT t.id, t.col1, t.col2
FROM some_table t
JOIN ordering_table o ON (t.id = o.id)
ORDER BY o.position
Where ordering_table is the table (as above) that defines your strange ordering. This approach simply represents your ordering function as a table (any function with a finite domain is, essentially, just a table after all).
This "ordering table" approach should work fine as long as the ordering table is complete.
If you only need this strange ordering in one place then you could merge the position column into your main table and add NOT NULL and UNIQUE constraints on that column to make sure you cover everything and have a consistent ordering.
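A rough sketch of that single-table variant (MySQL-style syntax; populate position for every row before tightening the constraints, and the constraint name is illustrative):
ALTER TABLE some_table ADD COLUMN position INT NULL;

-- ...fill in position for each row here, then lock it down:
ALTER TABLE some_table MODIFY position INT NOT NULL;
ALTER TABLE some_table ADD CONSTRAINT uq_some_table_position UNIQUE (position);

SELECT id, col1, col2
FROM some_table
ORDER BY position;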
Further commenting indicates that you want different orderings for different users and categories and that the ordering will change on a daily basis. You could make separate tables for each condition (which would lead to a combinatorial explosion) or, as Mikael Eriksson and ypercube suggest, add a couple more columns to the ordering table to hold the user and category:
CREATE TABLE ordering_table (
thing_id INT NOT NULL,
position INT NOT NULL,
user_id INT NOT NULL,
category_id INT NOT NULL
);
The thing_id, user_id, and category_id would be foreign keys to their respective tables, and you'd probably want to index all the columns in ordering_table, but a couple of minutes looking at the query plans to see if the indexes actually get used would be worthwhile. You could also make all four columns the primary key to avoid duplicates. Then, the lookup query would be something like this:
SELECT t.id, t.col1, t.col2
FROM some_table t
LEFT JOIN ordering_table o
ON (t.id = o.thing_id AND o.user_id = $user AND o.category_id = $cat)
ORDER BY COALESCE(o.position, 99999)
Where $user and $cat are the user and category IDs (respectively). Note the change to a LEFT JOIN and the addition of COALESCE to allow for missing rows in ordering_table; these changes will push anything that doesn't have a specified position in the ordering to the bottom of the list rather than removing it from the results completely.

TSQL foreign keys on views?

I have a SQL-Server 2008 database and a schema which uses foreign key constraints to enforce referential integrity. Works as intended. Now the user creates views on the original tables to work on subsets of the data only. My problem is that filtering certain datasets in some tables but not in others will violate the foreign key constraints.
Imagine two tables "one" and "two". "one" contains just an id column with values 1,2,3. "Two" references "one". Now you create views on both tables. The view for table "two" doesn't filter anything while the view for table "one" removes all rows but the first. You'll end up with entries in the second view that point nowhere.
Is there any way to avoid this? Can you have foreign key constraints between views?
Some Clarification in response to some of the comments:
I'm aware that the underlying constraints will ensure integrity of the data even when inserting through the views. My problem lies with the statements consuming the views. Those statements have been written with the original tables in mind and assume certain joins cannot fail. This assumption is always valid when working with the tables - but views potentially break it.
Joining/checking all constraints when creating the views in the first place is annoying because of the large number of referencing tables. Thus I was hoping to avoid that.
I love your question. It screams of familiarity with the Query Optimizer, and how it can see that some joins are redundant if they serve no purpose, or if it can simplify something knowing that there is at most one hit on the other side of a join.
So, the big question is around whether you can make a FK against the CIX of an Indexed View. And the answer is no.
create table dbo.testtable (id int identity(1,1) primary key, val int not null);
go
create view dbo.testview with schemabinding as
select id, val
from dbo.testtable
where val >= 50
;
go
insert dbo.testtable
select 20 union all
select 30 union all
select 40 union all
select 50 union all
select 60 union all
select 70
go
create unique clustered index ixV on dbo.testview(id);
go
create table dbo.secondtable (id int references dbo.testview(id));
go
All this works except for the last statement, which errors with:
Msg 1768, Level 16, State 0, Line 1
Foreign key 'FK__secondtable__id__6A325CF7' references object 'dbo.testview' which is not a user table.
So the Foreign key must reference a user table.
But... the next question is about whether you could reference a unique index that is filtered in SQL 2008, to achieve a view-like FK.
And still the answer is no.
create unique index ixUV on dbo.testtable(val) where val >= 50;
go
This succeeded.
But now if I try to create a table that references the val column
create table dbo.thirdtable (id int identity(1,1) primary key, val int not null check (val >= 50) references dbo.testtable(val));
(I was hoping that the check constraint that matched the filter in the filtered index might help the system understand that the FK should hold)
But I get an error saying:
There are no primary or candidate keys in the referenced table 'dbo.testtable' that matching the referencing column list in the foreign key 'FK__thirdtable__val__0EA330E9'.
If I drop the filtered index and create a non-filtered unique non-clustered index, then I can create dbo.thirdtable without any problems.
So I'm afraid the answer still seems to be No.
It took me some time to figure out the misunderstanding here -- not sure if I still understand completely, but here it is.
I will use an example, close to yours, but with some data -- easier for me to think in these terms.
So, first two tables: A = Department, B = Employee
CREATE TABLE Department
(
DepartmentID int PRIMARY KEY
,DepartmentName varchar(20)
,DepartmentColor varchar(10)
)
GO
CREATE TABLE Employee
(
EmployeeID int PRIMARY KEY
,EmployeeName varchar(20)
,DepartmentID int FOREIGN KEY REFERENCES Department ( DepartmentID )
)
GO
Now I'll toss some data in
INSERT INTO Department
( DepartmentID, DepartmentName, DepartmentColor )
SELECT 1, 'Accounting', 'RED' UNION
SELECT 2, 'Engineering', 'BLUE' UNION
SELECT 3, 'Sales', 'YELLOW' UNION
SELECT 4, 'Marketing', 'GREEN' ;
INSERT INTO Employee
( EmployeeID, EmployeeName, DepartmentID )
SELECT 1, 'Lyne', 1 UNION
SELECT 2, 'Damir', 2 UNION
SELECT 3, 'Sandy', 2 UNION
SELECT 4, 'Steve', 3 UNION
SELECT 5, 'Brian', 3 UNION
SELECT 6, 'Susan', 3 UNION
SELECT 7, 'Joe', 4 ;
So, now I'll create a view on the first table to filter some departments out.
CREATE VIEW dbo.BlueDepartments
AS
SELECT * FROM dbo.Department
WHERE DepartmentColor = 'BLUE'
GO
This returns
DepartmentID DepartmentName       DepartmentColor
------------ -------------------- ---------------
2            Engineering          BLUE
And per your example, I'll add a view for the second table which does not filter anything.
CREATE VIEW dbo.AllEmployees
AS
SELECT * FROM dbo.Employee
GO
This returns
EmployeeID  EmployeeName         DepartmentID
----------- -------------------- ------------
1           Lyne                 1
2           Damir                2
3           Sandy                2
4           Steve                3
5           Brian                3
6           Susan                3
7           Joe                  4
It seems to me that you think that Employee No 5, DepartmentID = 3 points to nowhere?
"You'll end up with entries in the
second view that point nowhere."
Well, it points to the Department table DepartmentID = 3, as specified with the foreign key. Even if you try to join view on view nothing is broken:
SELECT e.EmployeeID
,e.EmployeeName
,d.DepartmentID
,d.DepartmentName
,d.DepartmentColor
FROM dbo.AllEmployees AS e
JOIN dbo.BlueDepartments AS d ON d.DepartmentID = e.DepartmentID
ORDER BY e.EmployeeID
Returns
EmployeeID  EmployeeName         DepartmentID DepartmentName       DepartmentColor
----------- -------------------- ------------ -------------------- ---------------
2           Damir                2            Engineering          BLUE
3           Sandy                2            Engineering          BLUE
So nothing is broken here; the join simply did not find matching records for DepartmentID <> 2. This is actually the same as if I join the tables and then include the filter from the first view:
SELECT e.EmployeeID
,e.EmployeeName
,d.DepartmentID
,d.DepartmentName
,d.DepartmentColor
FROM dbo.Employee AS e
JOIN dbo.Department AS d ON d.DepartmentID = e.DepartmentID
WHERE d.DepartmentColor = 'BLUE'
ORDER BY e.EmployeeID
Returns again:
EmployeeID  EmployeeName         DepartmentID DepartmentName       DepartmentColor
----------- -------------------- ------------ -------------------- ---------------
2           Damir                2            Engineering          BLUE
3           Sandy                2            Engineering          BLUE
In both cases joins do not fail, they simply do as expected.
Now I will try to break the referential integrity through a view (there is no DepartmentID = 127)
INSERT INTO dbo.AllEmployees
( EmployeeID, EmployeeName, DepartmentID )
VALUES( 10, 'Bob', 127 )
And this results in:
Msg 547, Level 16, State 0, Line 1
The INSERT statement conflicted with the FOREIGN KEY constraint "FK__Employee__Depart__0519C6AF". The conflict occurred in database "Tinker_2", table "dbo.Department", column 'DepartmentID'.
If I try to delete a department through the view
DELETE FROM dbo.BlueDepartments
WHERE DepartmentID = 2
Which results in:
Msg 547, Level 16, State 0, Line 1
The DELETE statement conflicted with the REFERENCE constraint "FK__Employee__Depart__0519C6AF". The conflict occurred in database "Tinker_2", table "dbo.Employee", column 'DepartmentID'.
So constraints on underlying tables still apply.
Hope this helps, but then maybe I misunderstood your problem.
Peter already hit on this, but the best solution is to:
Create the "main" logic (that filtering the referenced table) once.
Have all views on related tables join to the view created for (1), not the original table.
I.e.,
CREATE VIEW v1 AS SELECT * FROM table1 WHERE blah
CREATE VIEW v2 AS SELECT * FROM table2 WHERE EXISTS
(SELECT NULL FROM v1 WHERE v1.id = table2.FKtoTable1)
Sure, syntactic sugar for propagating filters for views on one table to views on subordinate tables would be handy, but alas, it's not part of the SQL standard. That said, this solution is still good enough -- efficient, straightforward, maintainable, and guarantees the desired state for the consuming code.
If you try to insert, update or delete data through a view, the underlying table constraints still apply.
Something like this in View2 is probably your best bet:
CREATE VIEW View2
AS
SELECT
T2.col1,
T2.col2,
...
FROM
Table2 T2
INNER JOIN Table1 T1 ON
T1.pk = T2.t1_fk
If you are rolling over tables so that identity columns will not clash, one possibility would be to use a lookup table that references the different data tables by identity value and a table reference.
Foreign keys on this table would work down the line for referencing tables.
This would be expensive in a number of ways
Referential integrity on the lookup table would have to be enforced using triggers.
Additional storage of the lookup table and indexing in addition to the data tables.
Data reading would almost certainly involve a Stored Procedure or three to execute a filtered UNION.
Query plan evaluation would also have a development cost.
The list goes on but it might work on some scenarios.
Using Rob Farley's schema:
CREATE TABLE dbo.testtable(
id int IDENTITY(1,1) PRIMARY KEY,
val int NOT NULL);
go
INSERT dbo.testtable(val)
VALUES(20),(30),(40),(50),(60),(70);
go
CREATE TABLE dbo.secondtable(
id int NOT NULL,
CONSTRAINT FK_SecondTable FOREIGN KEY(id) REFERENCES dbo.TestTable(id));
go
CREATE TABLE z(n tinyint PRIMARY KEY);
INSERT z(n)
VALUES(0),(1);
go
CREATE VIEW dbo.SecondTableCheck WITH SCHEMABINDING AS
SELECT 1 n
FROM dbo.TestTable AS t JOIN dbo.SecondTable AS s ON t.Id = s.Id
CROSS JOIN dbo.z
WHERE t.Val < 50;
go
CREATE UNIQUE CLUSTERED INDEX NoSmallIds ON dbo.SecondTableCheck(n);
go
I had to create a tiny helper table (dbo.z) in order to make this work, because indexed views cannot have self joins, outer joins, subqueries, or derived tables (and TVCs count as derived tables).
Another approach, depending on your requirements, would be to use a stored procedure to return two recordsets. You pass it filtering criteria and it uses the filtering criteria to query table 1, and then those results can be used to filter the query to table 2 so that its results are also consistent. Then you return both results.
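A rough T-SQL sketch of that idea, reusing the Department/Employee example above (the procedure name and @Color parameter are invented):
CREATE PROCEDURE dbo.GetDepartmentsAndEmployees
    @Color varchar(10)
AS
BEGIN
    -- apply the filter once
    SELECT DepartmentID, DepartmentName, DepartmentColor
    INTO #filtered
    FROM dbo.Department
    WHERE DepartmentColor = @Color;

    -- recordset 1: the filtered departments
    SELECT * FROM #filtered;

    -- recordset 2: only employees belonging to those departments
    SELECT e.*
    FROM dbo.Employee AS e
    JOIN #filtered AS f ON f.DepartmentID = e.DepartmentID;
END;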
You could stage the filtered table 1 data to another table. The contents of this staging table are your view 1, and then you build view 2 via a join of the staging table and table 2. This way the processing for filtering table 1 is done once and reused for both views.
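A minimal sketch of the staging approach (the table1/table2/FKtoTable1 names come from the earlier example; the filter shown is a stand-in for whatever view 1 actually filters on):
CREATE TABLE table1_staging (id int PRIMARY KEY);
GO
CREATE VIEW v1 AS
    SELECT t1.* FROM table1 AS t1 JOIN table1_staging AS s ON s.id = t1.id;
GO
CREATE VIEW v2 AS
    SELECT t2.* FROM table2 AS t2 JOIN table1_staging AS s ON s.id = t2.FKtoTable1;
GO
-- periodic refresh: re-apply the filter once and both views stay consistent
TRUNCATE TABLE table1_staging;
INSERT INTO table1_staging (id)
SELECT id FROM table1 WHERE id IN (1, 2, 3);  -- hypothetical filter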
Really what it boils down to is that view 2 has no idea what kind of filtering you performed in view 1, unless you tell view 2 the filtering criteria, or make it somehow dependent on the results of view 1, which means emulating the same filtering that occurs on view1.
Constraints don't perform any kind of filtering, they only prevent invalid data, or cascade key changes and deletes.
No, you can't create foreign keys on views.
Even if you could, where would that leave you? You would still have to declare the FK after creating the view. Who would declare the FK, you or the user? If the user is sophisticated enough to declare a FK, why couldn't he add an inner join to the referenced view? eg:
create view1 as select a, b, c, d from table1 where a in (1, 2, 3)
go
create view2 as select a, m, n, o from table2 where a in (select a from view1)
go
vs:
create view1 as select a, b, c, d from table1 where a in (1, 2, 3)
go
create view2 as select a, m, n, o from table2
--# pseudo-syntax for fk:
alter view2 add foreign key (a) references view1 (a)
go
I don't see how the foreign key would simplify your job.
Alternatively:
Copy the subset of data into another schema or database. Same tables, same keys, less data, faster analysis, less contention.
If you need a subset of all the tables, use another database. If you only need a subset of some tables, use a schema in the same database. That way your new tables can still reference the non-copied tables.
Then use the existing views to copy the data over. Any FK violations will raise an error and identify which views require editing. Create a job and schedule it daily, if necessary.
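For example, a hedged sketch using the views from earlier (it assumes a reporting schema already holds copies of the tables with the same foreign keys):
-- copy the filtered departments through the existing view
INSERT INTO reporting.Department (DepartmentID, DepartmentName, DepartmentColor)
SELECT DepartmentID, DepartmentName, DepartmentColor
FROM dbo.BlueDepartments;

-- an FK violation here identifies a view whose filter needs editing
INSERT INTO reporting.Employee (EmployeeID, EmployeeName, DepartmentID)
SELECT EmployeeID, EmployeeName, DepartmentID
FROM dbo.AllEmployees;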
From a purely data integrity perspective (and nothing to do with the Query Optimizer), I had considered an Indexed View. I figured you could make a unique index on it, which could be broken when you try to have broken integrity in your underlying tables.
But... I don't think you can get around the restrictions of indexed views well enough.
For example:
You can't use outer joins, or sub-queries. That makes it very hard to find the rows that don't exist in the view. If you use aggregates, you can't use HAVING, so that cuts out some options you could use there too. You can't even have constants in an indexed view if you have grouping (whether or not you use a GROUP BY clause), so you can't even try putting an index on a constant field so that a second row will fall over. You can't use UNION ALL, so the idea of having a count which will break a unique index when it hits a second zero won't work.
I feel like there should be an answer, but I'm afraid you're going to have to take a good look at your actual design and work out what you really need. Perhaps triggers (and good indexes) on the tables involved, so that any changes that might break something can roll it all back.
But I was really hoping to be able to suggest something that the Query Optimizer might be able to leverage to help the performance of your system, but I don't think I can.

Merging contacts in SQL table without creating duplicate entries

I have a table that holds only two columns - a ListID and PersonID. When a person is merged with another in the system, I want to update all references from the "source" person to be references to the "destination" person.
Ideally, I would like to call something simple like
UPDATE MailingListSubscription
SET PersonID = #DestPerson
WHERE PersonID = #SourcePerson
However, if the destination person already exists in this table with the same ListID as the source person, a duplicate entry will be made. How can I perform this action without creating duplicated entries? (ListID, PersonID is the primary key)
EDIT: Multiple ListIDs are used. If SourcePerson is assigned to ListIDs 1, 2, and 3, and DestinationPerson is assigned to ListIDs 3 and 4, then the end result needs to have four rows - DestinationPerson assigned to ListID 1, 2, 3, and 4.
--out with the bad
DELETE
FROM MailingListSubscription
WHERE PersonId = #SourcePerson
and ListID in (SELECT ListID FROM MailingListSubscription WHERE PersonID = #DestPerson)
--update the rest (good)
UPDATE MailingListSubscription
SET PersonId = #DestPerson
WHERE PersonId = #SourcePerson
First you should subscribe DestPerson to all lists that SourcePerson is subscribed to and DestPerson isn't already subscribed to. Then delete all of SourcePerson's subscriptions.
This will work with multiple ListIDs.
Insert into MailingListSubscription
(
ListID,
PersonID
)
Select
ListID,
#DestPerson
From
MailingListSubscription as t1
Where
PersonID = #SourcePerson and
Not Exists
(
Select *
From MailingListSubscription as t2
Where
PersonID = #DestPerson and
t1.ListID = t2.ListID
)
Delete From MailingListSubscription
Where
PersonID = #SourcePerson
I have to agree with David B here. Remove all the older stuff that shouldn't be there and then do your update.
Actually, I think you should go back and reconsider your database design as you really shouldn't be in circumstances where you're changing the primary key for a record as you're proposing to do - it implies that the PersonID column is not actually a suitable primary key in the first place.
My guess is your PersonID is exposed to your users, they've renumbered their database for some reason and you're syncing the change back in. This is generally a poor idea as it breaks audit trails and temporal consistency. In these circumstances, it's generally better to use your own non-changing primary key - usually an identity - and set up the PersonID that the users see as an attribute of that. It's extra work but will give you additional consistency and robustness in the long run.
A good rule of thumb is the primary key of a record should not be exposed to the users where possible and you should only do so after careful consideration. OK, I confess to breaking this myself on numerous occasions but it's worth striving for where you can :-)
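A minimal sketch of that separation (T-SQL; the names are illustrative):
CREATE TABLE Person (
    PersonKey  int IDENTITY(1,1) PRIMARY KEY,  -- internal surrogate key, never exposed, never changes
    PersonID   int NOT NULL UNIQUE,            -- the number users see; can be renumbered safely
    PersonName varchar(50) NOT NULL
);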