How to use SQL Server views with distinct clause to Link to a detail table? - sql

I may be total standard here, but I have a table with duplicate values across the records i.e. People and HairColour. What I need to do is create another table which contains all the distinct HairColour values in the Group of Person records.
i.e.
Name HairColour
--------------------
Sam Ginger
Julie Brown
Peter Brown
Caroline Blond
Andrew Blond
My Person feature view needs to list out the distinct HairColours:
HairColour Ginger
HairColour Brown
HairColour Blond
Against each of these Person feature rows I record the Recommended Products.
It is a bit weird from a Relational perspective, but there are reasons. I could build up the Person Feature"View as I add Person records using say an INSTEAD OF INSERT trigger on the View. But it gets messy. An alternative is just to have Person Feature as a View based on a SELECT DISTINCT of the Person table and then link Recommended Products to this. But I have no Primary Key on the Person Feature View since it is a SELECT DISTINCT View. I will not be updating this View. Also one would need to think about how to deal with the Person Recommendation records when a Person Feature record disappeared since since it is not based on a physical table.
Any thoughts on this please?
Edit
I have a table of People with duplicate values for HairColour across a number of records, e.g., more than one person has blond hair. I need to create a table or view that represents a distinct list of "HairColour" records as above. Against each of these "HairColour" records I need link another table called Product Recommendation. The main issue to start with is creating this distinct list of records. Should it be a table or could it be a View based on a SELECT DISTINCT query?
So Person >- HairColour (distinct Table or Distinct View) -< Product Recommendation.
If HairColour needs to be a table then I need to make sure it has the correct records in it every time a Person record is added. Obviously using a View would do this automatically, but I am unsure whether you can can hang another table off a View.

If I understand correctly, you need a table with a primary key that lists the distinct hair colors that are found in a different table.
CREATE TABLE Haircolour(
ID INT IDENTITY(1,1) NOT NULL,
Colour VARCHAR(50) NULL
CONSTRAINT [PK_Haircolour] PRIMARY KEY CLUSTERED (ID ASC))
Then insert your records. If this is querying a table called "Person" it will look like this:
INSERT INTO Haircolour (Colour) SELECT DISTINCT HairColour FROM Person
Does this do what you are looking for?
UPDATE:
Your most recent Edit shows that you are looking for a many-to-many relationship between the Person and ProductRecommendation tables, with the HairColour table functioning as a cross reference table.
As ErikE points out, this is a good opportunity to normalize your data.
Create the HairColour table as described above.
Populate it from whatever source you like, for example the insert statement above.
Modify both the Person and the ProductRecommendation tables to include a HairColourID field, which is an integer foreign key that points to the PK field of the HairColour table.
Update Person.HairColourID to point to the color mentioned in the Person.HairColour column.
Drop the Person.HairColour column.
This involves giving up the ability to put free form new color names into the Person table. Any new colors must now be added to the HairColour table; those are the only colors that are available.
The foreign key constraint enforces the list of available colors. This is a good thing. Referential integrity keeps your data clean and prevents a lot of unexpected errors.
You can now confidently build your ProductRecommendation table on a data structure that will carry some weight.

Are you simply looking for a View of distinct hair colors?
CREATE VIEW YourViewName AS
SELECT DISTINCT HairColour
FROM YourTableName
You can query this view like a table:
SELECT 'HairColour: ' + HairColour
FROM YourViewName
If you are trying to create a new (temp) table, the syntax would look like:
SELECT Name, HairColour
INTO #Temp
FROM YourTableName
GROUP BY Name, HairColour
Here the GROUP BY is doing the same work that a DISTINCT keyword would do in the select list. This will create a temp table with unique combinations of "Name" and "HairColour".

You need to clear up a few things in your post (or in your mind) first:
1) What are the objectives? Forget about tables and views and whatever. Phrase your objectives as an ordinary person would. For example, from what I could gather from your post:
"My objective is to have a list of recommended products based on each person's hair colour."
2) Once you have that, check what data you have. I assume you have a "Persons" table, with the columns "Name" and "HairColour". You check your data and ask yourself: "Do I need any more data to reach my objective?" Based on your post I say yes: you also need a "matching" between hair colours and product ids. This must be provided, or programmed by you. There is no automatic method of saying for example "brown means products X,Y,Z.
3) After you have all the needed data, you can ask: Can I perform a query that will return a close approximation of my objective?
See for example this fiddle:
http://sqlfiddle.com/#!2/fda0d6/1
I have also defined your "Select distinct" view, but I fail to see where it will be used. Your objectives (as defined in your post) do not make this clear. If you provide a thorough list in Recommended_Products_HairColour you do not need a distinct view. The JOIN operation takes care of your "missing colors" (namely "Green" in my example)
4) When you have the query, you can follow up with: Do I need it in a different format? Is this a job for the query or the application? etc. But that's a different question I think.

Related

SQL database structure with two changing properties

Let's assume I am building the backend of a university management software.
I have a users table with the following columns:
id
name
birthday
last_english_grade
last_it_grade
profs table columns:
id
name
birthday
I'd like to have a third table with which I can determine all professors teaching a student.
So I'd like to assign multiple teachers to each student.
Those Professors may change any time.
New students may be added any time too.
What's the best way to achieve this?
The canonical way to do this would be to introduce a third junction table, which exists mainly to relate users to professors:
users_profs (
user_id,
prof_id,
PRIMARY KEY (user_id, prof_id)
)
The primary key of this junction table is the combination of a user and professor ID. Note that this table is fairly lean, and avoids the problem of repeating metadata for a given user or professor. Rather, user/professor information remains in your two original tables, and does not get repeated.

Is this database in third normal form/3NF?

I know this is probably a stupid question to some but I'm required to have this database in 3NF but know very little about normalisation as our teacher has not covered it. Could someone give me a simple yes or no answer as to whether it is in 3NF and if it is not, suggest any changes. Thanks.
Simple answer No. Google transitive dependencies, or even just Google 3NF?
Why is this the case? Because you have some columns that are dependant on other columns in the same table, where those columns aren't part of the primary key.
For example, in your Customer Table you have Postcode and Town, but there is a relationship between the two, i.e. you couldn't have a Postcode for Paris without also having a Town of Paris. This is very weak transitive dependence, and most databases would have this without considering it bad practice, but I think this is enough to break 3NF.
There's another place where it's a little less clear, but I am pretty sure you break 3NF. In your Payment Table you have Deposit Paid, Total Price, Amount Still To Pay, and Fully Paid. There's an argument that given Total Price and Deposit Paid you could determine Amount Still to Pay. There's a very strong argument that you could always determine Fully Paid from the other three "paid" columns.
You can create Person table with id,title,firstname,lastname
You can add person_id to customerTable and employeeTable. And remove title,firstname,lastname fields from that table.
You can create TownTable with columns id,name and then add town_id to customerTable and emloyeeTable. Remove column town ftom that tables
Create contactInfoTable with columns id, contact_type_id, contact_info
Add contact_info_id column to employeeTable and customerTable. Delete another columns about contact info (phoneNo,email) from that tables.
Create contactType table woth columns id,name. Fill two rows to that table with names phone and email
Create personAddress table with columns id, address, town_id
Add personAddress_id to customerTable, employeeTable tables. Remove address,town from that tables
Create TownTable woth columns id,name
You can create userTable with columns id,employee_id,username
You can create passwordTable with id and user_id
Create user_role table with id, user_id, role_id
Create role_table and add id,name
Also add create_date,end_date (Date ), active(nvarchar2(1) or integer) to all your tables. And in your selects use active=1 condition.

Join performance

My situation is:
Table member
id
firstname
lastname
company
address data ( 5 fields )
contact data ( 2 fields )
etc
Table member_profile
member_id
html ( something like <h2>firstname lastname</h2><h3>Company</h3><span>date_registration</span> )
date_activity
chat_status
Table news
id
member_id (fk to member_id in member_profile)
title
...
The idea is that the full profile of the member, when viewed is fetched from the member database, in for instance a news overview, the smaller table which holds the basis display info for a member is joined.
However, i have found the need for more often use for the member info that is not stored in the member_profile table, e.g. firstname, lastname and gender, are nescesary when someone has posted a news item (firstname has posted news titled title.
What would be better to do? Move the fields from the member_profile table to the member table, or move the member fields to the member_profile table and perhaps remove them from the member table? Keep in mind that the member_profile table is joined a lot, and also updated on each login, status update etc.
You have two tables named member so i have the feeling your question isn't formed correctly.
What is the relationship between these tables? It looks like you have 3 tables, all one-to-one. So all you need to do is change (fk to member_id in member_profile) to (fk to id in member).
Now you can join in data from either of the 2 extra tables as you wish, without always having to go through member_profile.
[Edit] Also I assume that member_profile.member_id is a fk to member.id. If not, I believe it should :)
Combine them into one table so you're normalizing the name data then create 2 views which replicate the original two tables would be the easy option
Separating the tables between mostly-static fields and frequently-updated fields will improve write performance. So I would stay with what you're doing. If you cache the information from both tables together in a member object, read performance (and thus joining) is less of an issue.

How to enforce DB integrity with non-unique foreign keys?

I want to have a database table that keeps data with revision history (like pages on Wikipedia). I thought that a good idea would be to have two columns that identify the row: (name, version). So a sample table would look like this:
TABLE PERSONS:
id: int,
name: varchar(30),
version: int,
... // some data assigned to that person.
So if users want to update person's data, they don't make an UPDATE -- instead, they create a new PERSONS row with the same name but different version value. Data shown to the user (for given name) is the one with highest version.
I have a second table, say, DOGS, that references persons in PERSONS table:
TABLE DOGS:
id: int,
name: varchar(30),
owner_name: varchar(30),
...
Obviously, owner_name is a reference to PERSONS.name, but I cannot declare it as a Foreign Key (in MS SQL Server), because PERSONS.name is not unique!
Question: How, then, in MS SQL Server 2008, should I ensure database integrity (i.e., that for each DOG, there exists at least one row in PERSONS such that its PERSON.name == DOG.owner_name)?
I'm looking for the most elegant solution -- I know I could use triggers on PERSONS table, but this is not as declarative and elegant as I want it to be. Any ideas?
Additional Information
The design above has the following advantage that if I need to, I can "remember" a person's current id (or (name, version) pair) and I'm sure that data in that row will never be changed. This is important e.g. if I put this person's data as part of a document that is then printed and in 5 years someone might want to print a copy of it exactly unchanged (e.g. with the same data as today), then this will be very easy for them to do.
Maybe you can think of a completely different design that achieves the same purpose and its integrity can be enforced easier (preferably with foreign keys or other constraints)?
Edit: Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, which I posted as answers. Please vote which one you like better.
In your parent table, create a unique constraint on (id, version). Add version column to your child table, and use a check constraint to make sure that it is always 0. Use a FK constraint to map (parentid, version) to your parent table.
Alternatively you could maintain a person history table for the data that has historic value. This way you keep your Persons and Dogs table tidy and the references simple but also have access to the historically interesting information.
Okay, first thing is that you need to normalize your tables. Google "database normalization" and you'll come up with plenty of reading. The PERSONS table, in particular, needs attention.
Second thing is that when you're creating foreign key references, 99.999% of the time you want to reference an ID (numeric) value. I.e., [DOGS].[owner] should be a reference to [PERSONS].[id].
Edit: Adding an example schema (forgive the loose syntax). I'm assuming each dog has only a single owner. This is one way to implement Person history. All columns are not-null.
Persons Table:
int Id
varchar(30) name
...
PersonHistory Table:
int Id
int PersonId (foreign key to Persons.Id)
int Version (auto-increment)
varchar(30) name
...
Dogs Table:
int Id
int OwnerId (foreign key to Persons.Id)
varchar(30) name
...
The latest version of the data would be stored in the Persons table directly, with older data stored in the PersonHistory table.
I would use and association table to link the many versions to the one pk.
A project I have worked on addressed a similar problem. It was a biological records database where species names can change over time as new research improved understanding of taxonomy.
However old records needed to remain related to the original species names. It got complicated but the basic solution was to have a NAME table that just contained all unique species names, a species table that represented actual species and a NAME_VERSION table that linked the two together. At any one time there would be a preferred name (ie the currently accepted scientific name for the species) which was a boolean field held in name_version.
In your example this would translate to a Details table (detailsid, otherdetails columns) a link table called DetailsVersion (detailsid, personid) and a Person Table (personid, non-changing data). Relate dogs to Person.
Persons
id (int),
name,
.....
activeVersion (this will be UID from personVersionInfo)
note: Above table will have 1 row for each person. will have original info with which person was created.
PersonVersionInfo
UID (unique identifier to identify person + version),
id (int),
name,
.....
versionId (this will be generated for each person)
Dogs
DogID,
DogName
......
PersonsWithDogs
UID,
DogID
EDIT: You will have to join PersonWithDogs, PersionVersionInfo, Dogs to get the full picture (as of today). This kind of structure will help you link a Dog to the Owner (with a specific version).
In case the Person's info changes and you wish to have latest info associated with the Dog, you will have to Update PersonWithDogs table to have the required UID (of the person) for the given Dog.
You can have restrictions such as DogID should be unique in PersonWithDogs.
And in this structure, a UID (person) can have many Dogs.
Your scenarios (what can change/restrictions etc) will help in designing the schema better.
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the first of them. Please vote which one you like better.
Solution 1
In PERSONS table, we leave only the name (unique identifier) and a link to current person's data:
TABLE PERSONS:
name: varchar(30),
current_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGE: for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
DISADVANTAGE: if you want to change a person's data, you have to:
add a new PERSONS_DATA row
update PERSONS entry for this person to point to the new PERSONS_DATA row.
Thanks to Michael Gattuso's answer, I discovered another way this relationship can be described. There are two solutions, this is the second of them. Please vote which one you like better.
Solution 2
In PERSONS table, we leave only the name (unique identifier) and a link to the first (not current!) person's data:
TABLE PERSONS:
name: varchar(30),
first_data_id: int
We create a new table, PERSONS_DATA, that contains all data history for that person:
TABLE PERSONS_DATA:
id: int
name: varchar(30)
version: int (auto-generated)
... // some data, like address, etc.
DOGS table stays the same, it still points to a person's name (FK to PERSONS table).
ADVANTAGES:
for each dog, there exists at least one PERSONS_DATA row that contains data of its owner (that's what I wanted)
if I want to change a person's data, I don't have to update the PERSONS row, only add a new PERSONS_DATA row
DISADVANTAGE: to retrieve current person's data, I have to either:
choose PERSONS_DATA with given name and highest version (may be expensive)
choose PERSONS_DATA with special version, e.g. "-1", but then I would have to update two PERSONS_DATA rows each time I add new PERSONS_DATA, and in this solution I wanted to avoid having to update 2 rows...
What do you think?

How to display multiple values in a MySQL database?

I was wondering how can you display multiple values in a database for example, lets say you have a user who will fill out a form that asks them to type in what types of foods they like for example cookies, candy, apples, bread and so on.
How can I store it in the MySQL database under the same field called food?
How will the field food structure look like?
You may want to read the excellent Wikipedia article on database normalization.
You don't want to store multiple values in a single field. You want to do something like this:
form_responses
id
[whatever other fields your form has]
foods_liked
form_response_id
food_name
Where form_responses is the table containing things that are singular (like a person's name or address, or something where there aren't multiple values). foods_liked.form_response_id is a reference to the form_responses table, so the foods liked by the person who has response number six will have a value of six for the form_response_id field in foods_liked. You'll have one row in that table for each food liked by the person.
Edit: Others have suggested a three-table structure, which is certainly better if you are limiting your users to selecting foods from a predefined list. The three-table structure may be better in the case that you are allowing them the ability to enter their own foods, though if you go that route you'll want to be careful to normalize your input (trim whitespace, fix capitalization, etc.) so you don't end up with duplicate entries in that table.
normally, we do NOT work out like this. try to use a relation table.
Table 1: tbl_food
ID primary key, auto increment
FNAME varchar
Table 2: tbl_user
ID primary key, auto increment
USER varchar
Table 3: tbl_userfood
RID auto increment
USERID int
FOODID int
Use similar format to store your data, instead a chunk of data fitted into a field.
Querying in these tables are easier than parsing the chunk of data too.
Use normalization.
More specifically, create a table called users. Create another called foods. Then link the two tables together with a many-to-many table called users_to_foods referencing each others foreign keys.
One way to do it would be to serialize the food data in your programming language, and then store it in the food field. This would then allow you to query the database, get the serialized food data, and convert it back into a native data structure (probably an array in this case) in your programming language.
The problem with this approach is that you will be storing a lot of the same data over and over, e.g. if a lot of people like cookies, the string "cookies" will be stored over and over. Another problem is searching for everyone who likes one particular food. To do that, you would have to select the food data for each record, unserialize it, and see if the selected food is contained within. This is a very inefficient.
Instead you'll want to create 3 tables: a users table, a foods table, and a join table. The users and foods tables will contain one record for each user and food respectively. The join table will have two fields: user_id and food_id. For every food a user chooses as a favorite, it adds a record to the join table of the user's ID and the food ID.
As an example, to pull all the users who like a particular food with id FOOD_ID, your query would be:
SELECT users.id, users.name
FROM users, join_table
WHERE join_table.food_id = FOOD_ID
AND join_table.user_id = users.id;