Here's the deal : this is my first ever database project and I am afraid my solution to this problem isn't quite the best. The database keeps track of different "types" of cooperators. Those types are companies, organisations, employed workers, other "persons" ...
All of those have totally different sets of information, but they all have one in common - contact information. I decided to let user enter what kind of contact information he wants to add to any of the cooperator, whether its e-mail, phone, URL, Fax and so on ...
So I created "Contacts" table in which all the contact data for all the cooperators will be put, regardless of what type the cooperator is it.
The TablesList is the table that contains the list of cooperator types (Companies, Organisations, Workers)
Each row in the Contacts table must contain "TableID" number which identifies what type of the cooperator is it (Company/Worker/Organisation...), and must contain "RowID" which identifies what exact company/worker/organisation the contact is about.
The problem that exists it that Contacts table contains foreign keys from 3 other tables in 1 column, which cannot be good. I could remove the relationships and just fill the column with thos ID's without the DBMS knowing about the constrains, but that just doesnt look like a good solution to me, so now I doubt this idea is any good.
What do you suggest ?
Keep in mind that in future there may be some more types of cooperators added if needed (like temp/contract workers, agencies) and Contacts table should be designed to support them too
Thanks in advance !
Btw im using SQL CE and C#
here is the sketch of whats going on :
EDIT
Although it doesnt feel right, I just removed the relationships and it works just fine with my application so far
I suspect your design is over-normalized. You can simplify it by consolidating data into three tables: Companies, Workers and Organisations. It's not going to be normalized but it is much simpler to work with.
Check out http://www.codinghorror.com/blog/2008/07/maybe-normalizing-isnt-normal.html
Related
I am designing a database for a system and I came up with the following three tables
My problem is that an Address can belong to either a Person or a Company (or other things in the future) So how do I model this?
I discarded putting the address information in both tables (Person
and Company) because of it would be repeated
I thought of adding two columns (PersonId and CompanyId) to the
Address table and keep one of them null, but then I will need to add
one column for every future relation like this that appears (for
example an asset can have an address where its located at)
The last option that occur to me was to create two columns, one
called Type and other Id, so a pair of values would represent a
single record in the target table, for example: Type=Person,Id=5 and
Type=Company,Id=9 this way I can Join the right table using the type
and it will only be two columns no matter how many tables relate to
this table. But I cannot have constraints which reduce data integrity
I don't know if I am designing this properly. I think this should be a common issue (I've faced it at least three times during this small design in objects like Contact information, etc...) But I could not find many information or examples that would resemble mine.
Thanks for any guidance that you can give me
There are several basic approaches you could take, depending on how much you want to future proof your system.
In general, Has-One relationships are modeled by a foreign key on the owning entity, pointing to the primary key on the owned entity. So you would have an AddressId on both Company and Person,which would be a foreign key to Address.Id. The complexity in your case is how to handle the fact that a person can have multiple addresses. If you are 100% sure that there will only ever be a home and work address, you could put two foreign key columns on Person, but this becomes a big problem if there's a third, fourth, fifth etc. address. The other option is to create a join table, PersonAddress, with three columns a PersonId an AddressId and a AddressType, to indicate whether its a home work or whatever address.
To keep this as short as possible I'm going to use and example.
So let's say I have a simple database that has the following tables:
company - ( "idcompany", "name", "createdOn" )
user - ( "iduser", "idcompany", "name", "dob", "createdOn" )
event - ( "idevent", "idcompany", "name", "description", "date", "createdOn" )
Many users can be linked to a single company as well as multiple events and many events can be linked to a single company. All companies, users and events have columns as show above in common. However, what if I wanted to give my customers the ability to add custom fields to both their users and their events for any unique extra information they wish to store. These extra fields would be on a company wide basis, not on a per record basis ( so a company adding a custom field to their users would add it to all of their users not just one specific user ). The custom fields also need to be sesrchable and have the ability to be reported on, ideally automatically with some sort of report wizard. Considering the database is expected to have lots of traffic as well as lots of custom fields, what is the best solution for this?
My current research and findings in possible solutions:
To have generic placeholder columns such as "custom1", "custom2" etc.
** This is not viable as there will eventually be too many custom columns and there will be too many NULL values stored in the database
To have 3x tables per current table. eg: user, user-custom-field, user-custom-field-value. The user table being the same. The user-custom-field table containing the information about the new field such as name, data type etc. And the user-custom-field-value table containing the value for the custom field
** This one is more of a contender if it were not for its complexity and table size implications. I think it will be impossible to avoid a user-custom-field table if I want to automatically report on these fields as I will have to store the information on how to report on these fields here. However, In order to pull almost any data you would have to do a million joins on the user-custom-field-value table as well as the fact that your now storing column data as rows which in a database expected to have a lot of traffic as well as a lot of custom fields would soon cause a problem.
Create a new user and event table for each new company that is added to the system removing the company id from within those tables and instead using it in the table name ( eg user56, 56 being the company id ). Then allowing the user to trigger DB commands that add the new custom columns to the tables giving them the power to decide if it has a default value or auto increments etc.
** Everytime I have seen this solution it has always instantly been shut down by people saying it would be unmanageable as you would eventually get thousands of tables. However nobody really explains what they mean by unmanageable. Firstly as far as my understanding goes, more tables is actually more efficient and produces faster search times as the tables are much smaller. Secondly, yes I understand that making any common table changes would be difficult but all you would have to do is run a script that changes all your tables for each company. Finally I actually see benefits using this method as it would seperate company data making it impossible for one to accidentally access another's data via a potential bug, plus it would potentially give the ability to back up and restore company data individually. If someone could elaborate on why this is perceived as a bad idea It would be appreciated.
Convert fully or partially to a NoSQL database.
** Honestly I have no experience with schemaless databases and don't really know how dynamic user defined fields on a per record basis would work ( although I know it's possible ). If someone could explain the implications of the switch or differences in queries and potential benefits that would be appreciated.
Create a JSON column in each table that requires extra fields. Then add the extra fields into that JSON object.
** The issue I have with this solution is that it is nearly impossible to filter data via the custom columns. You would not be able to report on these columns and until you have received and processed them you don't really know what is in them.
Finally if anyone has a solution not mentioned above or any thoughts or disagreements on any of my notes please tell me as this is all I have been able to find or figure out for myself.
A typical solution is to have a JSON (or XML) column that contains the user-defined fields. This would be an additional column in each table.
This is the most flexible. It allows:
New fields to be created at any time.
No modification to the existing table to do so.
Supports any reasonable type of field, including types not readily available in SQL (i.e. array).
On the downside,
There is no validation of the fields.
Some databases support JSON but do not support indexes on them.
JSON is not "known" to the database for things like foreign key constraints and table definitions.
So, I've read a lot about how stashing multiple values into one column is a bad idea and violates the first rule of data normalisation (which, surprisingly, is not "Do Not Talk About Data Normalisation") so I need some help.
At the moment I'm designing an ASP .NET webpage for the place I work for. I want to display data on a web page depending on what Active Directory groups the person belongs to. The first way of doing this that comes to mind is to have a table with, essentially, a column containing the AD group and the second column containing what list of computers belong to that list.
I've learnt that this is showing great disregard for relational databases, so what is a better way to do it? I want to control this access by SQL tables, so I can add/remove from these tables and change end users access accordingly.
Thanks for the help! :)
EDIT: To describe exactly what I want to do is this:
We have a certain group of computers that need to be checked up on, however these computers are in physically difficult to reach locations. The organisation I belong to has remote control enabled for these computers, however they're not in the business of giving out the remote control password (understandable).
The added layer of complexity is that, depending on who you are, our clients should only be able to see a certain group of computers (that is, the group of computers that their area owns). So, if Group A has Thomas in it, and Group B has Jones in it, if you belong to either group then you would just see one entry. However, if you belong to both groups you should see both Thomas and Jones computers in it.
The reason why I think that storing this data in a SQL cell is the way to go is because, to store them in tables would require (in my mind) a new table for each new "group" of computers. I don't want to crank out SQL tables for every new group, I'd much rather just have an added row in a SQL table somewhere.
Does this make any sense?
You basically have three options in SQL Server:
Storing the values in a single column.
Storing the values in a junction table.
Storing the values as XML (or as some other structured data format).
(Other databases have other options, such as arrays, nested tables, and JSON.)
In almost all cases, using a junction table is the correct approach. Why? Here are some reasons:
SQL Server has (relatively) lousy string manipulation, so doing something as simple as ensuring a unique list is really, really hard.
A junction table allows you to store lots of other information (When was a machine added? What is the full description of the machine? etc. etc.).
Most queries that you want are pretty easy with a junction table (with the one exception of getting a comma-delimited list, alas -- which is just counterintuitive rather than "hard").
All the types are stored natively.
A junction table allows you to enforce constraints (both check and foreign key) on the elements of the list.
Although a delimited list is almost never the right solution, it is possible to think of cases where it might be useful:
The list doesn't change and presentation of the list is very important.
Space usage is an issue (alas, denormalization often results in fewer pages).
Queries do not really access elements of the list, just the entire thing.
XML is also a reasonable choice under some circumstances. In the most recent versions of SQL Server, this can be made pretty efficient. However, it incurs the overhead of reading and parsing XML -- and things like duplicate elimination are still not obvious.
So, you do have options. In almost all cases, the junction table is the right approach.
There is an "it depends" that you should consider. If the data is never going to be queried (or queried very rarely) storing it as XML or JSON would be perfectly acceptable. Many DBAs would freak out but it is much faster to get the blob of data that you are going to send to the client than to recompose and decompose a set of columns from a secondary table. (There is a reason document and object databases are becoming so popular.)
... though I would ask why are you replicating active directory to your database and how are you planning on keeping these in sync.
I not really a bad idea to store multiple values in one column, but will depend the search you want.
If you just only want to know the persons that is part of a group then you can store persons in one column with a group id as key. For update you just update the entire list in a group.
But if you want to search a specified person that belongs to group, then its not recommended that you store this multiple persons in one column. In this case its better to store a itermedium table that store person id, and group id.
Sounds like you want a table that maps users to group IDs and a second table that maps group IDs to which computers are in that group. I'm not sure, your language describing the problem was a bit confusing to me.
a list has some columns like: name, family name, phone number etc.
and rows like name=john familyName= lee number=12321321
name=... familyname=... number=...
an sql database works same way. every row in a sql database is a record. so you jusr add records of your list into your database using insert query.
complete explanation in here:
http://www.w3schools.com/sql/sql_insert.asp
This sounds like a typical many-to-many problem. You have many groups and many computers and they are related to eachother. In this situation, it is often recommended to use a mapping table, a.k.a. "junction table" or "cross-reference" table. This table consist solely of the two foreign keys in your other tables.
If your tables look like this:
Computer
- computerId
- otherComputerColumns
Group
- groupId
- othergroupColumns
Then your mapping table would look like this:
GroupComputer
- groupId
- computerId
And you would insert a single record for every relationship between a group and computer. This is in compliance with the rules for third normal form in regards to database normalization.
You can have a table with the group and group id, another table with the computer and computer id and a third table with the relation of group id and computer id.
So I know the convention for naming M-M relationship tables in SQL is to have something like so:
For tables User and Data the relationship table would be called
UserData
User_Data
or something similar (from here)
What happens then if you need to have multiple relationships between User and Data, representing each in its own table? I have a site I'm working on where I have two primary items and multiple independent M-M relationships between them. I know I could just use a single relationship table and have a field which determines the relationship type, but I'm not sure whether this is a good solution. Assuming I don't go that route, what naming convention should I follow to work around my original problem?
To make it more clear, say my site is an auction site (it isn't but the principle is similar). I have registered users and I have items, a user does not have to be registered to post an item but they do need to be to do anything else. I have table User which has info on registered users and Items which has info on posted items. Now a user can bid on an item, but they can also report a item (spam, etc.), both of these are M-M relationships. All that happens when either event occurs is that an email is generated, in my scenario I have no reason to keep track of the actual "report" or "bid" other than to know who bid/reported on what.
I think you should name tables after their function. Lets say we have Cars and People tables. Car has owners and car has assigned drivers. Driver can have more than one car. One of the tables you could call CarsDrivers, second CarsOwners.
EDIT
In your situation I think you should have two tables: AuctionsBids and AuctionsReports. I believe that report requires additional dictinary (spam, illegal item,...) and bid requires other parameters like price, bid date. So having two tables is justified. You will propably be more often accessing bids than reports. Sending email will be slightly more complicated then when this data is stored in one table, but it is not really a big problem.
I don't really see this as a true M-M mapping table. Those usually are JUST a mapping. From your example most of these will have additional information as well. For example, a table of bids, which would have a User and an Item, will probably have info on what the bid was, when it was placed, etc. I would call this table... wait for it... Bids.
For reporting items you might want what was offensive about it, when it was placed, etc. Call this table OffenseReports or something.
You can name tables whatever you want. I would just name them something that makes sense. I think the convention of naming them Table1Table2 is just because sometimes the relationships don't make alot of sense to an outside observer.
There's no official or unofficial convention on relations or tables names. You can name them as you want, the way you like.
If you have multiple user_data relationships with the same keys that makes absolutely no sense. If you have different keys, name the relation in a descriptive way like: stores_products_manufacturers or stores_products_paymentMethods
I think you're only confused because the join tables are currently simple. Once you add more information, I think it will be obvious that you should append a functional suffix. For example:
Table User
UserID
EmailAddress
Table Item
ItemID
ItemDescription
Table UserItem_SpamReport
UserID
ItemID
ReportDate
Table UserItem_Post
UserID -- can be (NULL, -1, '', ...)
ItemID
PostDate
Table UserItem_Bid
UserId
ItemId
BidDate
BidAmount
Then the relation will have a Role. For instance a stock has 2 companies associated: an issuer and a buyer. The relationship is defined by the role the parent and child play to each other.
You could either put each role in a separate table that you name with the role (IE Stock_Issuer, Stock_Buyer etc, both have a relationship one - many to company - stock)
The stock example is pretty fixed, so two tables would be fine. When there are multiple types of relations possible and you can't foresee them now, normalizing it into a relationtype column would seem the better option.
This also depends on the quality of the developers having to work with your model. The column approach is a bit more abstract... but if they don't get it maybe they'd better stay away from databases altogether..
Both will work fine I guess.
Good luck, GJ
GJ
For a school project I'm making a simple Job Listing website in ASP.NET MVC (we got to choose the framework).
I've thought about it awhile and this is my initial schema:
JobPostings
+---JobPostingID
+---UserID
+---Company
+---JobTitle
+---JobTypeID
+---JobLocationID
+---Description
+---HowToApply
+---CompanyURL
+---LogoURL
JobLocations
+---JobLocationID
+---City
+---State
+---Zip
JobTypes
+---JobTypeID
+---JobTypeName
Note: the UserID will be linked to a Member table generated by a MembershipProvider.
Now, I am extremely new to relational databases and SQL so go lightly on me.
What about naming? Should it be just "Description" under the JobPostings table, or should it be "JobDescription" (same with other columns in that main table). Should it be "JobPostingID" or just "ID"?
General tips are appreciated as well.
Edit: The JobTypes are fixed for our project, there will be 15 job categories. I've made this a community wiki to encourage people to post.
A few thoughts:
Unless you know a priori that there is a limited list of job types, don't split that into a separate table;
Just use "ID" as the primary key on each table (we already know it's a JobLocationID, because it's in the JobLocations table...);
I'd drop the 'Job' prefix from the fields in JobPostings, as it's a bit redundant.
There's a load of domain-specific info that you could include, like salary ranges, and applicant details, but I don't know how far you're supposed to be going with this.
Job Schema http://gfilter.net/junk/JobSchema.png
I split Company out of Job Posting, as this makes maintaining the companies easier.
I also added a XREF table that can store the relationship between companies and locations. You can setup a row for each company office, and have a very simple way to find "Alternative Job Locations for this Company".
This should be a fun project...good luck.
EDIT: I would add Created and LastModifiedBy (Referring to a UserID). These are great columns for general housekeeping.
Looks good to me, I would recommend also adding Created, LastModified and Deleted columns to the user updateable tables as well for future proofing.
Make sure you explicitly define your primary and foreign keys as well in your schema.
What about naming? Should it be just
"Description" under the JobPostings
table, or should it be JobDescription
(same with other columns in that main
table). Should it be "JobPostingID" or
just "ID"?
Personally, I specify generic-sounding fields like "ID" and "Description" with prefixes as you suggest. It avoids confusion about what the id/description applies to when you write queries later on (and saves you the trouble of aliasing them).
I'd recommend folding the data you're going to be storing in JobLocations back into the main table. It's ok to have a table for states and another for countries, but I doubt you want a table that contains every city/state/country pair, you really don't gain anything from it. What happens if someone goes in and edits their location? You'd have to check to make sure no other joblisting points to the location and edit it, else create a new location and point to that instead.
My usual pattern is address and city as text with the record and FK to a state table.