SQL Table Relationships - sql

I am attempting to create some table relationships for a SQL database that handles service activity on a work order ticket. I have one-to-many relationships for these tables and have a question.
I have WOT (Work Order Ticket) table for the creation of a work order ticket.
The WOT can have multiple service activities on different dates associated with it.
Each service activity can have multiple parts used (for a repair).
This is just the basic idea, but I have created the following tables and relationships:
WOT Table:
wot>wotnum – PK
Service Activity Table:
service_activity>serviceid – PK
service_activity>wotnum – FK (link to PK in WOT table)
Part Used Table:
partused>partusedid – PK
partused>serviceid – FK (link to PK in Service Activity table)
Each of the tables above has other columns as well (not shown), but they are unique to the table, such as date fields, part numbers, etc.
The service_activity>serviceid (PK) field and the partused>partusedid (PK) field are both autoincrement fields.
My question is, during data entry, how do I ensure the partused>serviceid field (FK) stays in sync with the service_activity>serviceid field (PK) without having to enter the partused>serviceid field (FK) manually?
Although I have a decent understanding of tables and relationships (critical to get this correct now), I am a bit of a neophyte as to the process of thinking through how the tables will interact during actual data input. I think the answer to this may be simple, but I am just not grasping it yet. If my current solution does not seem adequate, I would welcome a suggestion. I need some help to get going in the right direction.

If you have added the FK correctly, any value you enter in the partused.serviceid column during data entry is automatically checked against the service_activity.serviceid column.
For example, suppose you have only one service_activity row, with serviceid = 456.
When you attempt to insert a row into partused with partused.serviceid = 500, it will throw an error and the insertion will not be allowed.
When you attempt to insert a row into partused with partused.serviceid = 456, it will succeed, because the referenced row in the other table exists.
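The behaviour described above can be tried out with a small sketch. This is only an illustration, assuming SQLite driven from Python's standard `sqlite3` module (the question does not say which database is in use), and the function name `demo_foreign_key` is made up for the example. It also shows one common way to avoid typing the FK by hand: the application reads the key the database just generated (`lastrowid` here) and passes it to the child insert.

```python
import sqlite3

def demo_foreign_key():
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite, which only enforces foreign keys when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE wot (wotnum INTEGER PRIMARY KEY);
        CREATE TABLE service_activity (
            serviceid INTEGER PRIMARY KEY AUTOINCREMENT,
            wotnum    INTEGER NOT NULL REFERENCES wot(wotnum)
        );
        CREATE TABLE partused (
            partusedid INTEGER PRIMARY KEY AUTOINCREMENT,
            serviceid  INTEGER NOT NULL REFERENCES service_activity(serviceid)
        );
    """)
    conn.execute("INSERT INTO wot (wotnum) VALUES (1)")
    conn.execute("INSERT INTO service_activity (serviceid, wotnum) VALUES (456, 1)")

    # serviceid 456 exists, so this part row is accepted.
    conn.execute("INSERT INTO partused (serviceid) VALUES (456)")

    # serviceid 500 does not exist, so the FK rejects the row.
    try:
        conn.execute("INSERT INTO partused (serviceid) VALUES (500)")
        rejected = False
    except sqlite3.IntegrityError:
        rejected = True

    # Instead of typing the FK, take the key generated by the parent insert.
    cur = conn.execute("INSERT INTO service_activity (wotnum) VALUES (1)")
    new_serviceid = cur.lastrowid
    conn.execute("INSERT INTO partused (serviceid) VALUES (?)", (new_serviceid,))

    part_count = conn.execute("SELECT COUNT(*) FROM partused").fetchone()[0]
    return rejected, new_serviceid, part_count
```

Other databases expose the generated key differently, so treat the `lastrowid` step as a pattern rather than a fixed API.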

Related

Is it relevant to use the fact table primary key in dimension table?

I am designing a database and I am wondering something about primary keys and foreign keys. I have a kind of snowflake database diagram with a fact table and some dimension tables (if I can call them that). Because of what I am doing, I need to generate a record in my fact table before adding rows to the dimension tables, and these rows (and tables) use the primary key of my fact table.
I have been reading about this, and I see that I should instead use an ID in each dimension table that is referenced from the fact table (the opposite of what I am doing).
Let me show you part of my diagram for a better understanding:
First of all, sorry, the table attributes are written in French (I am French, and sorry for my bad English, by the way).
The "MASQUENumMasque" column in the "dimension" tables references "NumMasque" in the "MASQUE" table, and I use this foreign key as the primary key of the tables using it.
So my question is very simple: am I doing this right?
If you need more information or if something is unclear, tell me!
Thank you guys!
You are doing it wrong. Data should always be added from the edge of the snowflake model toward the center.
You always have to add the row in the dimension first, and then the row in the fact table that points to the dimension row you just added. Otherwise you will have constraint issues.
Example: You have a fact table ORDERS (order_id, shop_id) and a dimension table SHOP (shop_id, shop_name). When loading a new set of data, you load the dimension table first, because it will then be referenced from the fact table through the shop_id key. If you load the fact table first, orders.shop_id will point to nowhere.
So in your case, the table RONCHI, for example, should have a column Ronchi_id. Your table MASQUE should have a column Ronchi_id pointing to RONCHI's PK.
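The load order in the ORDERS/SHOP example can be sketched as follows. This is a minimal illustration, assuming SQLite through Python's `sqlite3` module; the helper names (`make_db`, `load_fact_first`, `load_dimension_first`) are hypothetical and not part of the original answer.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite, which only enforces foreign keys when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE shop (
            shop_id   INTEGER PRIMARY KEY,
            shop_name TEXT NOT NULL
        );
        CREATE TABLE orders (
            order_id INTEGER PRIMARY KEY,
            shop_id  INTEGER NOT NULL REFERENCES shop(shop_id)
        );
    """)
    return conn

def load_fact_first(conn):
    """Try to insert an order whose shop does not exist yet."""
    try:
        conn.execute("INSERT INTO orders (shop_id) VALUES (1)")
        return True
    except sqlite3.IntegrityError:
        return False  # orders.shop_id would point to nowhere

def load_dimension_first(conn, shop_name):
    """Insert the shop, then an order that references the new shop_id."""
    shop_id = conn.execute(
        "INSERT INTO shop (shop_name) VALUES (?)", (shop_name,)
    ).lastrowid
    order_id = conn.execute(
        "INSERT INTO orders (shop_id) VALUES (?)", (shop_id,)
    ).lastrowid
    return shop_id, order_id
```

With the constraint in place, the fact-first load fails and the dimension-first load succeeds, which is the "edge to center" order the answer describes.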

Db design - table dependency and naming conventions

I'm about to design a db for a new project and I'm kinda stuck on some "concept" stuff.
My initial question is very similar to this one.
Relational table naming convention
That is:
"If I got a table "user" and then I got products that only the user
will have, should the table be named "user-product" or just "product"?
This is a one to many relationship."
In the above thread I found the answer by PerformanceDBA very useful and well written,
but I'm not sure about some points. Quoting part of the answer:
"It doesn't matter if user::product is 1::n. What matters is whether product is a separate entity and whether it is Independent, ie. it can exist on its own. Therefore product, not user_product. And if product exists only in the context of an user, ie. it is Dependent,
then user_product."
This is a very interesting point, but generates another question:
what exactly are the definitions of Independent and Dependent table?
Example 1, we have two tables:
The table User
Id
Username
FullName
The 1::n table Message, representing a collection of messages sent by the users
UserId (FK to User.Id)
Text
Is the Message table dependent on the User table or not?
The question I'm asking myself here is: "Would the message entity exist without the user?" I'm not sure about the answer, because it would be "the message would exist but would be anonymous." Is this enough to make the Message table dependent on the User table (so I should name the table "UserMessage")?
Example 2, we have two tables:
The table User
Id
Username
FullName
The 1::1 table Profile, representing a user profile
UserId (FK to User.Id)
First Name
Last Name
Gender
Same question: is the Profile table dependent on the User table? I think it is, because a profile without a user would not really make sense.
I'm not sure though, so how can I reliably decide whether a table is dependent on another or not?
I think you may really have 3 entities to consider. User, product and user_product. Test relationships by describing them with a verb. The relationship between a user and a product is most likely a many-to-many (a user can order many products, and a product can be ordered by many users). This indicates that a composite table between them that takes the primary keys of both tables is needed (and maybe attributes only if they describe a fact about the user/product combination). user_product is what links a user with his products (and a product with who ordered it) and is thus dependent.
That said, in your examples the message and profile tables are dependent, since they cannot exist without a user (their primary key). Use user - user_message and user - user_profile.
Another example of an independent table would be a lookup table (code/description table).
To answer your last question, an entity is considered dependent if its primary key must exist in another entity before it can exist, i.e., you can't have a profile without a user, so the profile is dependent.
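One way to see what "its primary key must exist in another entity" means is to make the dependent table's primary key the foreign key itself. The sketch below assumes SQLite via Python's `sqlite3`; the table name `user_profile` follows the naming suggested above, and the helper names are hypothetical.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite, which only enforces foreign keys when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE user (
            id       INTEGER PRIMARY KEY,
            username TEXT NOT NULL UNIQUE
        );
        -- Dependent table: its primary key IS the user's key,
        -- so a profile cannot exist before its user does (and is 1::1).
        CREATE TABLE user_profile (
            user_id INTEGER PRIMARY KEY REFERENCES user(id),
            gender  TEXT
        );
    """)
    return conn

def try_insert_profile(conn, user_id, gender):
    """Return True if the profile row was accepted, False if rejected."""
    try:
        conn.execute(
            "INSERT INTO user_profile (user_id, gender) VALUES (?, ?)",
            (user_id, gender),
        )
        return True
    except sqlite3.IntegrityError:
        return False
```

A lookup table, by contrast, would have its own key and no such reference, which is what makes it independent.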

Current primary key is ineffective at preventing duplicates. Does this sound like a good way to rearchitect my tables?

Every so often, I update our research recruitment database with those who responded to our Craigslist ad. Each respondent is given a unique respondentID, which is the primary key.
Sometimes, people respond to these Craigslist ads multiple times. I think we may have duplicate people in our database, which is bad.
I would like to change the primary key of all my recruitment tables from respondentID to Email, which will prevent duplicates and make it easier to look up information. There are probably duplicate email records in my database already, and I need to clean this up if so.
Here's the current architecture for my three recruitment tables:
demographic - contains columns like RespondentID (PK), Email (I want this to be PK), Phone, etc
genre - contains columns like RespondentID (PK), Horror, etc
platform - contains columns like RespondentID (PK), TV, etc.
I want to join all three tables together at some point so we can get a better understanding of someone.
Here are my questions:
How can I eliminate duplicate respondents already in my database? (I can tell if they are duplicates because they will have the same Email value.)
Given my current architecture, how can I transition my database to have Email as the primary key without messing up my data?
After transitioning to a new architecture, what is the process I can use to delete duplicates in my Craigslist ad spreadsheet before I append them to Demo, Genre, and Platform tables?
Here are my ideas about solutions:
Create backup tables. Join the three tables and export the big table to Excel. In Excel, use Data Filtering and Conditional Formatting to find the duplicate entries, and delete them by hand. Unfortunately, I have 20,000 records, which will crash Excel. :( The chief issue is that I don't know how to remove duplicate entries within a table using SQL. (Also, if I have two entries by bobdole@republican.com, one entry should remain.) Can you come up with a smarter solution involving SQL and Access?
After each Email record is unique, I will create new tables with each using Email as the primary key.
When I want to remove duplicates within the data I'd like to import, I should be able to easily do it within Excel. Next, I will use this SQL command to deduplicate between the current database and the incoming data:
DELETE * FROM newParticipantsList
WHERE Email IN (SELECT Email FROM Demo)
I'm going to try to duplicate my current architecture in a small test table in Access and see if I can figure it out. Overall, I don't have much experience with joining tables and removing data in SQL, so it's a little scary.
Maybe I'm just being thick, but why don't you just create a new Identity column in the existing table? You can always remove those records you deem duplicates, but the Identity column is guaranteed to be unique under all circumstances.
It will be up to you to make sure that any new records inserted into the table are not duplicates, by checking the Email column.
To remove duplicates from demographic table you could do something like:
-- Keep the lowest RespondentID for each Email; delete every other row.
WITH RecordsToKeep AS (
    SELECT MIN(RespondentID) AS RespondentID
    FROM demographic
    GROUP BY Email
)
DELETE demographic
FROM demographic
LEFT JOIN RecordsToKeep ON RecordsToKeep.RespondentID = demographic.RespondentID
WHERE RecordsToKeep.RespondentID IS NULL
This will keep the first record for each email address and delete the rest. You will need to remap the genre and platform tables before you delete the source.
In terms of what to do in the future, you could get SQL to do all the de-duplicating for you by importing the data into a staging table and then importing only distinct records into the final table when the address isn't already in the demographic table.
There is no reason to change the Email Address to be the primary key. Strings aren't great primary keys for a number of reasons. The problem you have isn't with duplicate keys; the problem is how you are inserting the data.
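The staging-table import could look roughly like this. It is a sketch only, assuming SQLite via Python's `sqlite3` rather than Access; the `staging` table, the `import_respondents` helper and the UNIQUE constraint on Email are illustrative additions, and RespondentID stays as the primary key, as suggested above.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE demographic (
            RespondentID INTEGER PRIMARY KEY AUTOINCREMENT,
            Email        TEXT NOT NULL UNIQUE,  -- assumption: blocks future duplicates
            Phone        TEXT
        );
        CREATE TABLE staging (Email TEXT NOT NULL, Phone TEXT);
    """)
    return conn

def import_respondents(conn, rows):
    """Load (Email, Phone) rows via staging; return how many were added."""
    before = conn.execute("SELECT COUNT(*) FROM demographic").fetchone()[0]
    conn.execute("DELETE FROM staging")
    conn.executemany("INSERT INTO staging (Email, Phone) VALUES (?, ?)", rows)
    # One row per Email (GROUP BY), and only Emails not already present.
    # MIN(Phone) is an arbitrary way to pick one phone per Email.
    conn.execute("""
        INSERT INTO demographic (Email, Phone)
        SELECT s.Email, MIN(s.Phone)
        FROM staging AS s
        WHERE NOT EXISTS (
            SELECT 1 FROM demographic AS d WHERE d.Email = s.Email
        )
        GROUP BY s.Email
    """)
    after = conn.execute("SELECT COUNT(*) FROM demographic").fetchone()[0]
    return after - before
```

Re-running the same import adds nothing, so the spreadsheet does not need to be cleaned by hand first.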

Simple database table design

I'm trying to design a database structure using best practice, but I can't get my head around something which I'm sure is fundamental. The DB is for users (100+) to record which magazines (100+) they read.
I have a table for the usernames, user info and magazine titles, but I'm unsure where to list the magazines that each user follows. Do I add a column in the user table and link it to the magazine table or would each user be setup with their own "follow" table that lists the magazine there? I'm getting myself confused I think so any help would be great.
Regards
Ryan
What you're struggling with is called a many-to-many relationship.
To solve this problem, you need a third table, perhaps called user_magazines. This third table should have two key fields, one from the user table and the other from the magazine table: for example, a user_id column and a magazine_id column. This is called a compound key. With both of these columns, you can tell which magazines are read by which user.
This is best understood visually:
In the picture above you can see that the third table (the middle table, stock_category) enables us to know what stock item belongs to which categories.
First of all, you must understand what a many-to-many relationship is. Take your example of users and magazines: a single user can follow many magazines, and a single magazine can be followed by many users, so there is a many-to-many relationship between users and magazines.
Whenever a many-to-many relationship exists between two entities, we have to introduce a third entity between them, called an associative entity.
So you have to introduce a third entity, named however you like, that records which user is following which magazine.
You can go to http://sqlrelationship.com/many-to-many-relationship/ for a better understanding using diagrams.
You should have a users table, with an auto-incrementing primary key, username, and anything else you want to store about that user.
Next, a magazines table which contains another auto-incrementing primary key, the name of the mag and anything else you need to store about that magazine.
Finally, a subscriptions table. This should have an auto-incrementing primary key (actually that's not really necessary on this table, but personally I would add it), a user_ID column and a magazine_ID column.
To add a subscription, just add a new record to the subscription table containing the ID of the user and the ID of the relevant magazine. This allows for users to subscribe to multiple magazines.
If you want to get fancy, you can add referential integrity constraints to the subscriptions table. These tell the database management system that a particular column is a reference to another table, and can specify what to do when the referenced row changes (for example, you could have the DBMS automatically delete the subscriptions owned by a particular user if that user is deleted).
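As a rough sketch of the three tables with the "fancy" constraints added, assuming SQLite via Python's `sqlite3` (the helper names are hypothetical): `ON DELETE CASCADE` is the clause that removes a user's subscriptions when the user row is deleted.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite, which only enforces foreign keys when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE users (
            id       INTEGER PRIMARY KEY AUTOINCREMENT,
            username TEXT NOT NULL
        );
        CREATE TABLE magazines (
            id   INTEGER PRIMARY KEY AUTOINCREMENT,
            name TEXT NOT NULL
        );
        CREATE TABLE subscriptions (
            id          INTEGER PRIMARY KEY AUTOINCREMENT,
            user_ID     INTEGER NOT NULL REFERENCES users(id) ON DELETE CASCADE,
            magazine_ID INTEGER NOT NULL REFERENCES magazines(id) ON DELETE CASCADE
        );
    """)
    return conn

def subscribe(conn, user_id, magazine_id):
    """Add one subscription row linking a user to a magazine."""
    conn.execute(
        "INSERT INTO subscriptions (user_ID, magazine_ID) VALUES (?, ?)",
        (user_id, magazine_id),
    )

def subscription_count(conn):
    return conn.execute("SELECT COUNT(*) FROM subscriptions").fetchone()[0]
```

Whether cascading deletes are desirable depends on the application; restricting the delete instead is an equally valid choice.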
You definitely do NOT want to add a column to the user table and have it refer to the magazine table. Users would only be able to follow or subscribe to one magazine which doesn't reflect the real world.
You'll want to have a join table that has a userId and a magazineId. For each magazine that a user subscribes to there will be one entry in the join table.
I'm inferring a little bit about your table structure but if you had:
User (id, login)
Magazine (id, name)
User_Magazine (userId, magazineId)
Perhaps this last table should be called subscription because there may be other information like the subscription end date which you'd want to track and that is really what it is representing in the real world.
You'd be able to put an entry into the User_Magazine table for every subscription.
And if you wanted to see all the magazines a user with the login jdoe had you'd do:
SELECT name
FROM User, Magazine, User_Magazine
WHERE login = 'jdoe'
AND User.id = User_Magazine.userId
AND Magazine.id = User_Magazine.magazineId
You should create a separate table called UserMagazineSubs. Make UserID + MagazineTitle ID a composite key.
This table will capture all User and Magazine relationship details.
A User_To_Magazine table, that has two columns - UserId and MagazineId, and the key is composite containing both columns
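The composite-key variant suggested in the last few answers might be sketched like this, again assuming SQLite via Python's `sqlite3`, with hypothetical helper names and sample rows. The composite primary key means the same user cannot be linked to the same magazine twice, and the lookup uses explicit JOINs, which should return the same rows as the comma-style query above.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite, which only enforces foreign keys when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE User (id INTEGER PRIMARY KEY, login TEXT NOT NULL UNIQUE);
        CREATE TABLE Magazine (id INTEGER PRIMARY KEY, name TEXT NOT NULL);
        CREATE TABLE User_Magazine (
            userId     INTEGER NOT NULL REFERENCES User(id),
            magazineId INTEGER NOT NULL REFERENCES Magazine(id),
            PRIMARY KEY (userId, magazineId)  -- composite key
        );
        -- Sample data, made up for the example.
        INSERT INTO User (id, login) VALUES (1, 'jdoe'), (2, 'asmith');
        INSERT INTO Magazine (id, name) VALUES (1, 'Time'), (2, 'Wired');
    """)
    return conn

def subscribe(conn, user_id, magazine_id):
    """Return True if the link was added, False if it was rejected."""
    try:
        conn.execute(
            "INSERT INTO User_Magazine (userId, magazineId) VALUES (?, ?)",
            (user_id, magazine_id),
        )
        return True
    except sqlite3.IntegrityError:
        return False  # duplicate pair, or unknown user/magazine

def magazines_for(conn, login):
    """Names of all magazines the given login follows, sorted by name."""
    rows = conn.execute("""
        SELECT Magazine.name
        FROM User
        JOIN User_Magazine ON User_Magazine.userId = User.id
        JOIN Magazine      ON Magazine.id = User_Magazine.magazineId
        WHERE User.login = ?
        ORDER BY Magazine.name
    """, (login,)).fetchall()
    return [name for (name,) in rows]
```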

Normalize SQL database

I'm creating a database for a project and I'm a little confused about how normalization applies to my schema. Every time a loan is approved for a customer, they have two options, a check or an EFT, so I want to know whether the loan was a check or an EFT.
These are my 3 tables:
Loans
id_loan (PK)
product
amount
status
Checks
id_check (PK)
id_customer
amount
EFT
id_eft (PK)
id_customer
amount
Then I created a 4th table to establish a relationship between loans and money disposal.
Disposal
id_payment (PK)
id_loan (FK loans)
id_disposal (FK checks or EFT)
disposal_type
In this table I store whether the loan is related to a check or an EFT, disposal_type field is a varchar with two possible values "check" or "EFT". id_disposal field acts as a foreign key for two tables.
The problem is that I think my database isn't normalized with this structure, am I right? What would be the best way to solve this?
You need something like the attached. Note that the customer_loans table is kind of extraneous and overkill, but if there are any columns that relate to the customer and the loan, and not to the customer's loan payments, that's where they would go.
In the object world, you'd use inheritance for this. There would be a base type Disposal which CheckDisposal and EftDisposal would derive from. Modern O/RMs support several techniques for mapping this to a relational structure.
TablePerHierarchy puts all of the records into a single table with a discriminator column to identify what type a specific record holds and maps to. The advantage is that it requires fewer joins to get a record. Disadvantage is that it requires app logic to enforce data integrity.
TablePerType maps records into different tables with a fk relationship back to the base table. Of course this requires more joins (especially for deep or wide hierarchies) but data integrity can be enforced in the DB.
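A table-per-type layout for the loan example might look like the sketch below. It assumes SQLite via Python's `sqlite3`; the columns `check_number` and `account_ref` and the helper names are hypothetical, added only so the subtype tables have something of their own to hold. The shared columns move to a base `disposal` table, and each subtype table reuses the base key as both its primary key and its foreign key, so no column has to point at two different tables.

```python
import sqlite3

def make_db():
    conn = sqlite3.connect(":memory:")
    # Assumption: SQLite, which only enforces foreign keys when this is on.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE loans (
            id_loan INTEGER PRIMARY KEY,
            product TEXT, amount NUMERIC, status TEXT
        );
        CREATE TABLE disposal (            -- base type
            id_disposal INTEGER PRIMARY KEY,
            id_loan     INTEGER NOT NULL REFERENCES loans(id_loan),
            id_customer INTEGER NOT NULL,
            amount      NUMERIC NOT NULL
        );
        CREATE TABLE check_disposal (      -- subtype: check
            id_disposal  INTEGER PRIMARY KEY REFERENCES disposal(id_disposal),
            check_number TEXT NOT NULL     -- hypothetical column
        );
        CREATE TABLE eft_disposal (        -- subtype: EFT
            id_disposal INTEGER PRIMARY KEY REFERENCES disposal(id_disposal),
            account_ref TEXT NOT NULL      -- hypothetical column
        );
    """)
    return conn

def add_disposal(conn, id_loan, id_customer, amount, kind, detail):
    """Insert the base row, then the matching subtype row."""
    table, column = {
        "check": ("check_disposal", "check_number"),
        "EFT": ("eft_disposal", "account_ref"),
    }[kind]
    id_disposal = conn.execute(
        "INSERT INTO disposal (id_loan, id_customer, amount) VALUES (?, ?, ?)",
        (id_loan, id_customer, amount),
    ).lastrowid
    conn.execute(
        f"INSERT INTO {table} (id_disposal, {column}) VALUES (?, ?)",
        (id_disposal, detail),
    )
    return id_disposal

def disposal_type(conn, id_loan):
    """Return 'check', 'EFT', or None for the loan's disposal."""
    row = conn.execute("""
        SELECT CASE
                 WHEN c.id_disposal IS NOT NULL THEN 'check'
                 WHEN e.id_disposal IS NOT NULL THEN 'EFT'
               END
        FROM disposal AS d
        LEFT JOIN check_disposal AS c ON c.id_disposal = d.id_disposal
        LEFT JOIN eft_disposal   AS e ON e.id_disposal = d.id_disposal
        WHERE d.id_loan = ?
    """, (id_loan,)).fetchone()
    return row[0] if row else None
```

As written, nothing stops one disposal from having both a check row and an EFT row; enforcing that would take an extra constraint or a discriminator column, which this sketch leaves out.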