Database design for customer to skills - sql

I have an issue where we have a customer table includes name, email, address and a skills table which is qts, first aid which is associated by an id. For example
Customer
id = 1
Name = James
Address = some address
Skills
1, qts
2, first aid
I am now trying to pair up the relationship. I first came to a quick solution just by creating a skills table which just has customerId and each skill has a true / false value. Then created a go between customer_skills with an customerId to SkillId. But I would not know how to update the records when values change as there is no unique id.
can anyone help on what would be the best way to do this?
thanks....

The solution you want really depends on your data, and is a question that has been asked thousands of times before. If you google
Entity Attribute Value vs strict relational model you will see countless articles
comparing and contrasting the methods available.
Strict Relational Model
You would add additional BIT or DATETIME fields (Where a NULL datetime represents the customer not having the skill)
to your customer table for each skill. This works well if you have few skills that are unlikely to change much over time.
This allows simple queries to locate customers with skills, especially with various combinations of skills ee.g (Using datetime fields)
SELECT *
FROM Customer
WHERE Skill1 >= '20120101' -- SKILL 1 AQUIRED AFTER 1ST JAN 2012
AND Skill2 IS NOT NULL -- HAS SKILL 2
AND Skill2 IS NULL -- DOES NOT POSSESS SKILL 3
Entity-Attribute-Value Model
This is a slight adaptation of a classic entity-attribute-value model, because the value is boolean represented by the existence of a record.
You would create a table like this:
CREATE TABLE CustomerSkills
( CustomerID INT NOT NULL,
SkillID INT NOT NULL
PRIMARY KEY (CustomerID, SkillID),
FOREIGN KEY (CustomerID) REFERENCES Customer (ID),
FOREIGN KEY (SkillID) REFERENCES Skills (ID)
)
You may want additional columns such as DateAdded, AddedBy etc to track when skills were added and who by etc, but the core principles can be gathered from the above.
With this method it is much easier to add skills, as it doesn't require adding columns, but can make simple queries much more complicated. The above query would have to be written as:
SELECT Customer.*
FROM Customer
INNER JOIN
( SELECT CustomerID
FROM CustomerSkills
WHERE SkillID IN (2, 3) -- SKILL2,SKILL3
OR (SkillID = 1 AND DateAdded >= '20120101')
GROUP BY CustomerID
HAVING COUNT(*) = 2
AND COUNT(CASE WHEN SkillID = 3 THEN 1 END) = 0
) skills
ON Skills.CustomerID = Customer.ID
This is much more complext and resource intensive than with the relational model, but the overall structure is much more flexible.
So to summarise, it really depends on your own particular situation, there are a few factors to consider, but there are plenty of resources out there to help you decide.

If you have a table linking the primary keys from two other tables together in order to form a many-to-many relationship (like in you example) you don't have to update that table. Instead you can just delete and reinsert values into it.
If you are editing a custiomer (customerId 46 for instance) and changing the skills for that customer, you can just delete all skills for the customer and then reinsert the new set of skills when storing the changes.
If your "link table" contains some additional information besides just the two primary key columns, then the situation might be different. But from your description it seems like you just want to link the table together using the primary keys from each table. In that case a delete + reinsert should be fine.
Also in this kind of table, you should make the combination of the two foreign key fields be the primary key of the binding table.

Related

A field referencing two tables - Foreign Key Conflict

Please check the tables below for a simplified version of my problem:
Table Boys
BoyId
BoyName
...
Table Girls
GirlId
GirlName
Table Toys
ToyId
ToyName
ToyOwnerBoyOrGirl ( The toy could be owned by a boy or a girl)
ToyOwnerId
I created two constraints:
1) ToyOwnerId is a foreign key of the Primary Key Boys.BoyId
2) ToyOwnerId is a foreign key of the Primary Key Girls.GirlId
My purpose is to tell the database that ToyOwnerId will always be one of these Ids
My problem:
When I tried to insert a new Toy with an id of a Boy, I got an error that there is a foreign key conflict in the Girls constraint.
Is this a bag design or I can still use the same design with a fix ?
I think you should combine the boys and girls table to one table called children. It would have sex column that would have an M or F. That will simplify things.
You should simply have both IDs in your toys table plus a check constraint to ensure that always either a boy or a girl is the owner.
Table Toys
ToyId
ToyName
ToyOwnerBoyId
ToyOwnerGirlId
CONSTRAINT chkToyOwner CHECK
(
(ToyOwnerBoyId is null and ToyOwnerGirlId is not null)
OR
(ToyOwnerBoyId is not null and ToyOwnerGirlId is null)
)
As to selecting the data, use outer joins:
select ...
from toys
left join boys on boys.boyid = toys.toyownerboyid
left join girls on girls.girlid = toys.toyownergirlid;
To find toys owned by boys:
select ...
from toys
where toyownerboyid is not null;
It looks like a bad design. Why don't have one table for all children and some mark - is it boy or girl? Also I really doubt you need ToyOwnerBoyOrGirl field - as it can be easily obtained by join from toys to owners.
Consider following scheme:
Table Children
ID
Name
Is_Boy
Table Toys
ID
Name
Owner_ID
In this case you need just foreign key from toys to owners, and other tasks you might encounter will be much more simplier to solve.
EDIT: As per OP's comment - Boys and Girls tables are totally different.
So, in this case you still can have table Children (let's use previous terminology) as a "common" table for Boys and Girls.
Something like:
Table Children
ID
Table_Name ('Boys' or 'Girls' here)
Record_ID (ID from Boys or Girls respectively)
...maybe some common fields from boys and girls tables here...
Table Boys
ID
Child_ID
...the rest of fields
Table Girls
ID
Child_ID
...the rest of fields
It's a bad design, given the fact one column set (one column, being ToyOwnerId) is referred to different tables, both as a one-column FK to a one-column PK. The obvious question would then be : how come you have different tables with a similar PK ? And I see you have answered that already, by means of replying the data columns of the respective tables are different. That is a good reason to have different tables. But, then, how to solve the FK issue ? (I understood that these "boys" and "girls" are not the real entities). What you can do, is make a BoyToy and a GirlToy table. If you have very few data columns (data = non-PK and non-FK) then that is a perfect solution. No ?

In SQL Server I need to change data structure of relationships (FK)

Ok I wasn't entirely sure what to title this question, so here's the situation.
I'm big on data integrity... Meaning as many constraints and rules that I can use I want to use in SQL Server and not rely on the application.
So I have a website that has a business directory, and those businesses can create a post.
So I have two tables like this:
tbl_Business ( BusinessID, Title, etc. )
tbl_Business_Post ( PostID, BusinessID, PostTitle, etc. )
There's a FK relationship for the column BusinessID between the two tables. A post cannot exist in the tbl_Business_Post table without the BusinessID existing in the tbl_Business table.
So pretty standard...
I've recently added classifieds to the site. So now I have two more tables:
tbl_Classified ( ClassifiedID, SellerID, ClassifiedTitle, etc. )
tbl_Classified_Seller ( SellerID, SellerName, etc. )
What I'm wanting to do is take advantage of my tbl_Business_Post table to include classifieds in that as well. Think of its usage like a feed... So the site will show recent posts from businesses and classifieds all in one feed.
Here's where I need guidance.
I was tempted to remove the FK relationship on the tbl_Business_Posts...
I thought about creating another separate Posts table that holds the classifieds posts.
Is there a way to make a conditional FK relationship based on a column? For example, if it's a business posting the BusinessID must exist in the Business table, or if its a classifieds post, the SellerID must exist in the Seller table?
Or should I create a separate table to hold the classifieds posts and UNION both the tables on the query?
You might question why I have a "Posts" table and that's hard to explain... but I do need it for the way the site is organized and how the feed works.
It's just that the posts table is perfect and I wanted to combine all posts and organize them by type (Ie: 'business', 'classified', 'etc.') as there might be more later.
So it comes down to, what's the best way to organize this to sustain data integrity from SSMS?
Thank you for guidance.
======== EDIT =========
Full explanation of tbl_Business_Post
PostID PK
Post_Type int <-- 1-21 is business types, 22 for classified type
BusinessID INT <-- This is the FK currently for the tbl_Business
SiblingID INT <-- This is the ID of the related item they're posting on. So for example, if they post a story about one of their products, this is the ProductID, if it's a service, this is the ServiceID.
Post_Title <-- Depending on the post, this could be a Product title, a service title, etc.
So if I changed the structure so it's as follows:
PostID PK
Post_Type int
BusinessID INT <-- this is populated on insert if it's a business.
SellerID INT <-- This is populated on insert if it's a classified seller
SiblingID INT <-- This is either the classifiedID or ProductID, SeviceID, etc. Depending on post type.
So leaning toward Peter's 1st solution/example... interested in the proper way to create check constraints or triggers on this so that if the type is 1-21, it makes sure BusinessID exists in the Business table, or if it's type 22, make sure the SellerID exists in the seller table.
Even going further with this:
If Post_Type = 22, I should make sure that not only is the Seller in the seller table, but the SiblingID is also the ClassifiedID in the Classified table.
1) There's no way to do this kind of conditional FK you're thinking of. What you need here is basically a FK from tbl_Business_Post which points logically to one of two tables, depending on the value in another column of tbl_Business_Post. This situation is what people encounter quite often. But in a relational DB this is not a very native idea.
So OK, this cannot be enforced with a FK. Instead, you can probably enforce this with a trigger or check constraint on tbl_Business_Post.
2) Alternatively, you can do the below.
Create some table tbl_Basic_Post, put there all columns which pertain to the post itself (e.g. PostTitle) and not to the parent entity which this post record belongs/points to (Business or Classified). Then create two other tables which point via a FK to the tbl_Basic_Post table like e.g.
tbl_Business_Post.Basic_Post_ID (FK)
tbl_Classified_Post.Basic_Post_ID (FK)
Put in these two tables the columns which are Business_Post/Classified_Post-specific
(you see, this is basically inheritable in relational DB terms).
Also, make each of these two tables have FKs to their respective parent tables
tbl_Business and tbl_Classified too. Now these FKs become unconditional (in your sense).
To get business posts you join tbl_Basic_Post and tbl_Business_Post.
To get classified posts you join tbl_Basic_Post and tbl_Classified_Post.
Both approaches have their pros and cons.
Approach 1) is simple, does not lead to the creation of too many tables; but it's not trivial to enforce the data integrity.
Approach 2) does not require anything special to enforce data integrity but leads to the creation of more tables.

SQL database design pattern for user favorites?

Asked this on the database site but it seems to be really slow moving. So I'm new to SQL and databases in general, the only thing I have worked on with an SQL database used one to many relationships. I want to know the easiest way to go about implementing a "favorites" mechanism for users in my DB-similar to what loads of sites like Youtube, etc, offer. Users are of course unique, so one user can have many favorites, but one item can also be favorited by many users. Is this considered a many to many relationship? What is the typical design pattern for doing this? Many to many relationships look like a headache(I'm using SQLAlchemy so my tables are interacted with like objects) but this seems to be a fairly common feature on sites so I was wondering what is the most straightforward and easy way to go about it. Thanks
Yes, this is a classic many-to-many relationship. Usually, the way to deal with it is to create a link table, so in say, T-SQL you'd have...
create table user
(
user_id int identity primary key,
-- other user columns
)
create table item
(
item_id int identity primary key,
-- other item columns
)
create table userfavoriteitem
(
user_id int foreign key references user(user_id),
item_id int foreign key references item(item_id),
-- other information about favoriting you want to capture
)
To see who favorited what, all you need to do is run a query on the userfavoriteitem table which would now be a data mine of all sorts of useful stats about what items are popular and who liked them.
select ufi.item_id,
from userfavoriteitem ufi
where ufi.user_id = [id]
Or you can even get the most popular items on your site using the query below, though if you have a lot of users this will get slow and the results should be saved in a special table updated on by a schedules job on the backend every so often...
select top 10 ufi.item_id, count(ufi.item_id),
from userfavoriteitem ufi
where ufi.item_id = [id]
GROUP BY ufi.item_id
I've never seen any explicitly-for-database design patterns (except a couple of trivial misuses of the phrase 'design pattern' when it became fashionable some years ago).
M:M relationships are OK: use a link table (aka association table etc etc). Your example of a User and Favourite sounds like M:M indeed.
create table LinkTable
(
Id int IDENTITY(1, 1), -- PK of this table
IdOfTable1 int, -- PK of table 1
IdOfTable2 int -- PK of table 2
)
...and create a UNIQUE index on (IdOfTable1, IdOfTable2). Or do away with the Id column and make the PF on (IdOfTable1, IdOfTable2) instead.

Data modelling draft/quote/order/invoice

Im currently working on a small project in which I need to model the following scenario:
Scenario
Customer calls, he want an quote on a new car.
Sales rep. register customer information.
Sales rep. create a quote in the system, and add a item to the quote (the car).
Sales rep. send the quote to the customer on email.
Customer accept the quote, and the quote is now not longer a quote but an order.
Sales rep. check the order, everything is OK and he invoice the order. The order is now not longer an order, but an invoice.
Thoughts
I need a bit of help finding out the ideal way to model this, but I have some thoughts.
I'm thinking that both draft/quote/invoice is basically an order.
Draft/quote/invoice need seperate unique numbers(id's) so there for i'm thinking separate tables for all of them.
Model
This is my data model v.1.0, please let me know what you think.
Concerns
I however have som concerns regarding this model:
Draft/quote/invoice might have different items and prices on the order lines. In this model all draft/quote/invoice is connected to the same order and also order lines, making it impossible to have separate quote lines/draft lines/invoice lines. Maybe I shall make new tables for this, but then basically the same information would be stored in multiple tables, and that is not good either.
Sometimes two or more quotes become an invoice, how would this model take care of this?
If you have any tips on how to model this better, please let me know!
EDIT: Data model v.1.4
It looks like you've modeled every one of these things--quote, order, draft, invoice--as structurally identical to all the others. If that's the case, then you can "push" all the similar attributes up into a single table.
create table statement (
stmt_id integer primary key,
stmt_type char(1) not null check (stmt_type in ('d', 'q', 'o', 'i')),
stmt_date date not null default current_date,
customer_id integer not null -- references customer (customer_id)
);
create table statement_line_items (
stmt_id integer not null references statement (stmt_id),
line_item_number integer not null,
-- other columns for line items
primary key (stmt_id, line_item_number)
);
I think that will work for the model you've described, but I think you'll be better served in the long run by modeling these as a supertype/subtype. Columns common to all subtypes get pushed "up" into the supertype; each subtype has a separate table for the attributes unique to that subtype.
This SO question and its accepted answer (and comments) illustrate a supertype/subtype design for blog comments. Another question relates to individuals and organizations. Yet another relating to staffing and phone numbers.
Later . . .
This isn't complete, but I'm out of time. I know it doesn't include line items. Might have missed something else.
-- "Supertype". Comments appear above the column they apply to.
create table statement (
-- Autoincrement or serial is ok here.
stmt_id integer primary key,
stmt_type char(1) unique check (stmt_type in ('d','q','o','i')),
-- Guarantees that only the order_st table can reference rows having
-- stmt_type = 'o', only the invoice_st table can reference rows having
-- stmt_type = 'i', etc.
unique (stmt_id, stmt_type),
stmt_date date not null default current_date,
cust_id integer not null -- references customers (cust_id)
);
-- order "subtype"
create table order_st (
stmt_id integer primary key,
stmt_type char(1) not null default 'o' check (stmt_type = 'o'),
-- Guarantees that this row references a row having stmt_type = 'o'
-- in the table "statement".
unique (stmt_id, stmt_type),
-- Don't cascade deletes. Don't even allow deletes. Every order given
-- an order number must be maintained for accountability, if not for
-- accounting.
foreign key (stmt_id, stmt_type) references statement (stmt_id, stmt_type)
on delete restrict,
-- Autoincrement or serial is *not* ok here, because they can have gaps.
-- Database must account for each order number.
order_num integer not null,
is_canceled boolean not null
default FALSE
);
-- Write triggers, rules, whatever to make this view updatable.
-- You build one view per subtype, joining the supertype and the subtype.
-- Application code uses the updatable views, not the base tables.
create view orders as
select t1.stmt_id, t1.stmt_type, t1.stmt_date, t1.cust_id,
t2.order_num, t2.is_canceled
from statement t1
inner join order_st t2 on (t1.stmt_id = t2.stmt_id);
There should be a table "quotelines", which would be similar to "orderlines". Similarly, you should have an 'invoicelines' table. All these tables should have a 'price' field (which nominally will be the part's default price) along with a 'discount' field. You could also add a 'discount' field to the 'quotes', 'orders' and 'invoices' tables, to handle things like cash discounts or special offers. Despite what you write, it is good to have separate tables, as the amount and price in the quote may not match what the customer actually orders, and again it may not be the same amount that you actually supply.
I'm not sure what the 'draft' table is - you could probably combine the 'draft' and 'invoices' tables as they hold the same information, with one field containing the status of the invoice - draft or final. It is important to separate your invoice data from order data, as presumably you will be paying taxes according to your income (invoices).
'Quotes', 'Orders' and 'Invoices' should all have a field (foreign key) which holds the value of the sales rep; this field would point to the non-existent 'SalesRep' table. You could also add a 'salesrep' field in the 'customers' table, which points to the default rep for the customer. This value would be copied into the 'quotes' table, although it could be changed if a different rep to the default gave the quote. Similarly, this field should be copied when an order is made from a quote, and an invoice from an order.
I could probably add much more, but it all depends on how complex and detailed a system you want to make. You might need to add some form of 'bill of materials' if the cars are configured according to their options and priced accordingly.
Add a new column to line_items ( ex:Status as smallint)
When a quote_line becomes an order_line then set bit you choose from 0 to 3 to 1.
But when qty changes then add a new line with new qte and keep last line unchanged.
Kad.

How to Delete all data from a table which contain self referencing foreign key

I have a table which has employee relationship defined within itself.
i.e.
EmpID Name SeniorId
-----------------------
1 A NULL
2 B 1
3 C 1
4 D 3
and so on...
Where Senior ID is a foreign key whose primary key table is same with refrence column EmpId
I want to clear all rows from this table without removing any constraint. How can i do this?
Deletion need to be performed like this
4, 3 , 2 , 1
How can I do this
EDIT:
Jhonny's Answer is working for me but which of the answers are more efficient.
I don't know if I am missing something, but maybe you can try this.
UPDATE employee SET SeniorID = NULL
DELETE FROM employee
If the table is very large (cardinality of millions), and there is no need to log the DELETE transactions, dropping the constraint and TRUNCATEing and recreating constraints is by far the most efficient way. Also, if there are foreign keys in other tables (and in this particular table design it would seem to be so), those rows will all have to be deleted first in all cases, as well.
Normalization says nothing about recursive/hierarchical/tree relationships, so I believe that is a red herring in your reply to DVK's suggestion to split this into its own table - it certainly is viable to make a vertical partition of this table already and also to consider whether you can take advantage of that to get any of the other benefits I list below. As DVK alludes to, in this particular design, I have often seen a separate link table to record self-relationships and other kinds of relationships. This has numerous benefits:
have many to many up AND down instead of many-to-one (uncommon, but potentially useful)
track different types of direct relationships - manager, mentor, assistant, payroll approver, expense approver, technical report-to - with rows in the relationship and relationship type tables instead of new columns in the employee table
track changing hierarchies in a temporally consistent way (including terminated employee hierarchy history) by including active indicators and effective dates on the relationship rows - this is only fully possible when normalizing the relationship into its own table
no NULLs in the SeniorID (actually on either ID) - this is a distinct advantage in avoiding bad logic, but NULLs will usually appear in views when you have to left join to the relationship table anyway
a better dedicated indexing strategy - as opposed to adding SeniorID to selected indexes you already have on Employee (especially as the number of relationship types grows)
And of course, the more information you relate to this relationship, the more strongly is indicated that the relationship itself merits a table (i.e. it is a "relation" in the true sense of the word as used in relational databases - related data is stored in a relation or table - related to a primary key), and thus a normal form for relationships might strongly indicate that the relationship table be created instead of a simple foreign key relationship in the employee table.
Benefits also include its straightforward delete scenario:
DELETE FROM EmployeeRelationships;
DELETE FROM Employee;
You'll note a striking equivalence to the accepted answer here on SO, since, in your case, employees with no senior relationship have a NULL - so in that answer the poster set all to NULL first to eliminate relationships and then remove the employees.
There is a possibly appropriate usage of TRUNCATE depending upon constraints (EmpployeeRelationships is typically able to be TRUNCATEd since its primary key is usually a composite and not a foreign key in any other table).
Try this
DELETE FROM employee;
Inside a loop, run a command that deletes all rows with an unreferenced EmpID until there are zero rows left. There are a variety of ways to write that inner DELETE command:
DELETE FROM employee WHERE EmpID NOT IN (SELECT SeniorID FROM employee)
DELETE FROM employee e1 WHERE NOT EXISTS
(SELECT * FROM employee e2 WHERE e2.SeniorID = e.EmpID
and probably a third one using a JOIN, but I'm not familiar with the SQL Server syntax for that.
One solution is to normalize this by splitting out "senior" relationship into a separate table. For the sake of generality, make that second table "empID1|empID2|relationship_type".
Barring that, you need to do this in a loop. One way is to do it:
declare #count int
select #count=count(1) from table
while (#count > 0)
BEGIN
delete employee WHERE NOT EXISTS
(select 1 from employee 'e_senior'
where employee.EmpID=e_senior.SeniorID)
select #count=count(1) from table
END