Suppose I have the following:
Table region_city
id name parent_id
==============================
1 North null
2 South null
3 Manchester 1
4 London 2
In my user table I store the ID of the City that the user is in.
Now in my search form I need to be able to perform a top-level search, i.e. find all Users that belong to a given region (North or South).
Will it make life easier if I included a region_id field in my user table? Or is that going against the normalisation concept?
It does denormalize the table structures and it could introduce data update anomalies. Consider: the user moves from Manchester to London and the city_id changes. The region_id could still point to the North.
The region_id only depends on the city therefore it does not belong in the user table. Since it can be derived from the city.
If the design absolutely calls for only two levels (region and city) and you are willing to forgo the possible addition of other levels in the future (not a decision I would be inclined to make, but you know your data better than I do) then do not include the regionID in your user table; that would denormalize your database. Instead, you have several choices for representing the data (including two related tables, region and city) and you would perform your search by JOINing the city table to the user table or using an IN clause in your search.
Related
I have country, region, county, town data and I'm currently deciding between 2 schema designs (if there's a better one, do tell).
I first thought
Country
Id
Name
Region
Id
CountryId
Name
County
Id
RegionId
Name
Town
Id
CountyId
Name
Does the job however to get all towns in a country you have to 3 inner joins to do the filtering. I guess this could be ok but potentially expensive?
The other design was:
Country
Id
Name
Region
Id
Name
County
Id
Name
Town
Id
CountryId
RegionId
CountyId
Name
This way all hierarchical data so to speak is at the bottom and you can go back up however if you want all regions in a country you're a bit screwed which makes we wonder whether the first design is best.
What do you think is the best schema design?
The best database design depends on how the data is being used.
If this is pretty static data that will all be updated at one time and external references are all to towns, then I would probably go for a denormalized dimension. That is, store the information all in one row:
Town Id
Town name
County name
Region name
Country name
Under the above scenario, the ids for county, region, and country are not necessary (by assumption).
If the data is being provided as separate tables with separate ids, and these tables can be updated independently or row-by-row, then a separate table for each makes sense. Putting all the ids into the towns table may or may not be a good idea. You will have to verify and maintain the hierarchies when data is inserted and updated.
If ids for each level are necessary for your, then you should have appropriate table structure for declaring foreign key constraints. But, this can get complicated. Will an external entity have a "geography" attribute that can be at any level? Will an external always know what level it is going to refer to as?
In other words, you need to know how the data is going to be used in order to define an appropriate data model.
I have 5 tables that have the same structure and same columns: id, name, description. So I wonder what is the best way to design or to avoid having 5 tables that have the same columns:
Create a category table that will include my three common
columns and another column "enum" that will differentiate my categories
ex (city, country, continent, etc.)
Create a category table that will include my three common
columns and create the other five tables that will just include an
id.
Note that I would have an assocation table that should include the id of cities, id countries, id continents, etc. so i can display them into a report
Thank you for your advice.
The decision on how many tables to have under these circumstances simply depends.
The most important factor is whether the five things are independent entities or whether they are related. A simple way to understand this is by understanding foreign key relationships: Will other tables have a column that could refer to any of the five (say "geoid")? Or will other tables have a column that generally refers to one of the five ("cityid", "countryid")? The ability to define helpful foreign key constraints often drives the table structure.
There are other considerations. If your data is at the geographic level, then it might represent hierarchies . . . cities are in countries, countries are on continents. Some databases (such as MySQL) do not support hierarchical queries at all. Under these circumstances, you might consider denormalizing the data for querying purposes.
Other considerations can also come into play. If your application is going to be internationalized, then having all the reference tables in a single place is handy -- for providing cultural-specific information (language, currency symbol, and so on). One way of handling this situation is to put all such references in a single table (and perhaps using more sophisticated foreign key relationships).
The column names are not important, just the data in the columns. If City description, country description and continent description are different information then you are already doing this the right way. The only time you would aim to reduce this data would be if you were repeating information but for the titles of the data it's fine.
In fact. You are doing this correctly. Country will have different values from city for every field mentioned. Id is just an id, every table should have one. Name and description wont be the same across country and city.
Also, this way if you want a countrys name you dont have to go through every country, continent and city. You only have 192 or so entries to go through. If you had all of that in one massive table you would have to use it for everything and go through every result every time you want data. You would also have to distinguish between cities, countries and continents in some other way than the separate tables.
Eg:
method 1, with 5 tables:
SELECT * FROM country
does the same as
method 2, 1 table:
SELECT * FROM table WHERE enumvalue = 'country';
If you have tables representing city, country and continent, and they all have exactly the same fields, you have a fundamental problem. In real life, every city is in a country and every country is in at least one continent (more or less) but your data structure does not reflect that. Your city table should look something like this:
id (primary key)
countryId (foreign key to country)
name
other fields
You'll need a similar relationship between countries and continents. However, before you do so, you have to decide what to do about countries like Russia which is in two continents and Palau which isn't really in any.
You may also want to have a provinceStateTerritory table to help you sort out the 38 places in the United States named Springfield. Or, you may want to handle that situation differently.
I have a very basic question, which would be a more efficient design, something that involves more joins, or just adding columns to one larger table?
For instance, if we had a table that stored relatives like below:
Person | Father | Mother | Cousing | Etc.
________________________________________________
Would it be better to list the name, age, etc. directly in that table.. or better to have a person table with their name, age, etc., and linked by person_id or something?
This may be a little simplistic of an example, since there are more than just those two options. But for the sake of illustration, assume that the relationships cannot be stored in the person table.
I'm doing the latter of the two choices above currently, but I'm curious if this will get to a point where the performance will suffer, either when the person table gets large enough or when there are enough linked columns in the relations table.
Id' go for more "Normality" to increase flexibility and reduce data duplication.
PERSON:
ID
First Name
Last Name
Person_Relations
PersonID
RelationID
TypeID
Relation_Type
TypeID
Description
This way you could support any relationship (4th cousin mothers side once removed) without change code.
It is a much more flexible design to separate out the details of each person from the table relating them together. Typically, this will lead to less data consumption.
You could even go one step further and have three tables: one for people, one for relationship_types, and one for relationships.
People would have all the individual identifying info -- age, name, etc.
Relationship_types would have a key, a label, and potentially a description. This table is for elaborating the details of each possible relationship. So you would have a row for 'parent', a row for 'child', a row for 'sibling', etc.
Then the Relationships table has a four fields: one for the key of each person in the relationship, one for the key of the relationship_type, and one for its own key. Note that you need to be explicit in how you name the person columns to make it clear which party is which part of the relationship (i.e. saying that A and B have a 'parent' relationship only makes sense if you indicate which person is the parent vs which has the parent).
Depending on how you plan to use the data a better structure may be
a table for Person ( id , name etc )
a table for relationships (person_a_id, person_b_id, relation_type
etc)
where person_a_id and person_b_id relate to id in person
sample data may look like
Person
ID Name
1 Frank
2 Suzy
3 Emma
Relationship
A B Relationship
1 2 Wife
2 1 Husband
1 3 Daughter
2 3 Daughter
3 1 Father
3 2 Mother
I am developing an intranet application for my company. The company has a complicated structure in which there are many business lines, departments, divisions, units and groups. I have to take care of all of these things. Some employees work under department level, some of them work under unit level and so on. The problem now is with database design. I am confused about how to design the database. At the beginning, I decided to design it as following:
Employees: Username, Name, Title, OrgCode
Departments: OrgCode, Name
Divisions: OrgCode, Name
Units: OrgCode, Name
but the problem as I said before, some employees are working under departments, so how to make relations between all of these tables. Is it possible to have OrgCode in Employees table as a foreign key to OrgCode in Departments, Divisions and Units tables?
Could you please recommend me how to design it?
UPDATE:
#wizzardz put a nice database design. All what I need now is to have an example of data that fit this database design
Here's a set of data that I am using in the database:
let us assume that we have Employee with the following information:
Username: JohnA
Name: John
Title: Engineer
OrgCode: AA
And let us assume we have department AA, how I will distribute this data into the database design?
You could do a design something like this
rather than going for separate tables for Departments, Divisions etc try to store them in a single table with a TypeId to distingush Departments , Divisitions etc.
Could you try a design like this
In the Level table you need to enter the values like 'Deparments,Divisions', Groups, etc (By keeping it in a separate table you can handle any future addition of new levels by your organisation.)
In the OrganisationLevels table you need to store Department Names,Division Name, GroupName etc.
The Employee table has a forigen key reference with the table Organisation level, that will store which Level an employee is working in the organisation.
If you wants to store the work history of a particular employee/ there is a chance that an employee can be moves to one level to another I would suggest you to go for this design
Sample data wrt the design
Level
Id LevelType
1 Department
2 Division
3 Group
OrganisationLevels
Id Name LevelId Parent*(Give a proper name to this column)*
13 AA 1 NULL
.
.
21 B 2 13 (This column refer to the Id of department it belongs to.)
Employees
Id UserName Name Title
110 JohnA John Engineer
EmployeeWorkDetails
Id EmployeeId OrganisationLevelId StartDate EndDate IsActive
271 110 13 20/09/2011 NULL true
OrgCode from the Employee Table can be removed, because I thought it is the employee code of the employee with that organisation.
I hope this helps.
Employees: Username, Name, Title, OrgCode
Departments: OrgCode, Name
Divisions: OrgCode, Name
Units: OrgCode, Name
To the above mentioned DB design, have one more table called Org containing OrgCode and another column as type (which we give insights on which type of org is it i.e. Departments,Divisions and Units)
then you can have employee table's OrgCode to have refer the OrgCode of Org table(parent-child relationship).
I suggest that you learn the difference between analysis and design. When you design a database, you are inventing tables, columns and constraints that will affect how the data is stored and retrieved. You are concerned with ease of update and query, including operations you will learn about later.
When you analyze the data requirements, you aren't engaged in inventing things, you are engaged in discovering things. And the things you are discovering are things about the "real world" the subject matter is supposed to represent. You break the subject matter down into "entities" and relationships among those entities. Then you relate every value stored in the database to an instance of an attribute, and every attribute to some aspect of either an entity or a relationship. This results in a conceptual model.
In your case, the relationships between employees, departments, units, etc. sound quite complex. It's worth quite a bit of effort to come up with a model that reflects this complex reality accurately.
Once you have a good conceptual model, you can create SQL tables, columns, and constraints that adequately represent the conceptual model. This involves design skills that can be learned. But if you have a lousy conceptual model, you're doomed, no matter how good you are at design.
I am hopping on a project that sits on top of a Sql Server 2008 DB with what seems like an inefficient schema to me. However, I'm not an expert at anything SQL, so I am seeking for guidance.
In general, the schema has tables like this:
ID | A | B
ID is a unique identifier
A contains text, such as animal names. There's very little variety; maybe 3-4 different values in thousands of rows. This could vary with time, but still a small set.
B is one of two options, but stored as text. The set is finite.
My questions are as follows:
Should I create another table for names contained in A, with an ID and a value, and set the ID as the primary key? Or should I just put an index on that column in my table? Right now, to get a list of A's, it does "select distinct(a) from table" which seems inefficient to me.
The table has a multitude of columns for properties of A. It could be like: Color, Age, Weight, etc. I would think that this is better suited in a separate table with: ID, AnimalID, Property, Value. Each property is unique to the animal, so I'm not sure how this schema could enforce this (the current schema implies this as it's a column, so you can only have one value for each property).
Right now the DB is easily readable by a human, but its size is growing fast and I feel like the design is inefficient. There currently is not index at all anywhere. As I said I'm not a pro, but will read more on the subject. The goal is to have a fast system. Thanks for your advice!
This sounds like a database that might represent a veterinary clinic.
If the table you describe represents the various patients (animals) that come to the clinic, then having properties specific to them are probably best on the primary table. But, as you say column "A" contains a species name, it might be worthwhile to link that to a secondary table to save on the redundancy of storing those names:
For example:
Patients
--------
ID Name SpeciesID Color DOB Weight
1 Spot 1 Black/White 2008-01-01 20
Species
-------
ID Species
1 Cocker Spaniel
If your main table should be instead grouped by customer or owner, then you may want to add an Animals table and link it:
Customers
---------
ID Name
1 John Q. Sample
Animals
-------
ID CustomerID SpeciesID Name Color DOB Weight
1 1 1 Spot Black/White 2008-01-01 20
...
As for your original column B, consider converting it to a boolean (BIT) if you only need to store two states. Barring that, consider CHAR to store a fixed number of characters.
Like most things, it depends.
By having the animal names directly in the table, it makes your reporting queries more efficient by removing the need for many joins.
Going with something like 3rd normal form (having an ID/Name table for the animals) makes you database smaller, but requires more joins for reporting.
Either way, make sure to add some indexes.