Custom Fields DB Design for Membership Application - sql

I need to design database for a membership application which needs custom classification for multiple organizations
Following are the data set:
Organization type 1:
Name, Email, Joining Year, End Year, Role, Location.
Organization type 2:
Name, Email, Joining Year, End Year, Role, Department, Sub Organization, Location
Organization type 3:
Name, Email, Joining Year, End Year, Role, Identification No.
what would be the best way to design database for it?
few field items are common, few are specific to org, org types are limited
Option 1:
members_table - member_id, name, email, joining_year, end_year, role
members_org_type_1 - member_id, location
members_org_type_2 - member_id, department, sub_org, location
members_org_type_3 - member_id, id_no
Option: 2
members_table - member_id, name, email, joining_year, end_year, role
member_fields - member_id, field_type, field_value
field_labels - field_type, field_label
second type looks promising, but do not know how to do join operations members_table & member_fields with required fields?

This is a common problem in database design and there are 3 most common ways to deal with (4 if we count EAV):
Three separate tables, one for each type.
One table - with a lot of columns - where some of them will be allowed to have Nulls. The integrity cannot be easily dealt by the database (which column combinations will be Null and which not) and are usually dealt by the application. This is #noa's answer and it results in slightly less code and probably easier to come up with a working (although not perfectly) application.
One Member table (this is the supertype) and 3 additional tables, one for each subtype. This allows you to have no Nulls and to enforce which columns will be used, depending on the organization type. (this is your Option 1)
You can also add an org_type column in all tables. This will mean an
additional UNIQUE constraint on Member (org_type, member_ID) and the FOREIGN KEY constraint (from each subtype table) altered to include this org_type column. Something like this:
CREATE TABLE Member
( MemberID
, Org_Type
, Name
, ...
, Role
, PRIMARY KEY (MemberID)
, UNIQUE KEY (Org_Type, MemberID)
, CHECK Org_Type IN (1, 2, 3)
) ;
CREATE TABLE Member_Type_1
( MemberID
, Org_Type
, Location
, PRIMARY KEY (MemberID)
, FOREIGN KEY (Org_Type, MemberID)
REFERENCES Member(Org_Type, MemberID)
, CHECK Org_Type = 1
) ;
and finally there's (your option 2) EAV:
Entity-Attribute-Value model is, according to Wikipedia:
a data model to describe entities where the number of attributes (properties, parameters) that can be used to describe them is potentially vast, but the number that will actually apply to a given entity is relatively modest. In mathematics, this model is known as a sparse matrix. EAV is also known as object–attribute–value model, vertical database model and open schema.
There are various reasons not to use EAV in relational databases, mainly because of problems regarding datatype and referential integrity (that cannot be easily enforced), difficulty in writing even simple queries (that end up written with a lot of joins) and efficiency. See the answer by Simon Righarts at DBA.SE question: Is there a name for this database structure?
There are reasons that it's a valid option in certain cases though, as the article by Aaron Bertrand explains: What is so bad about EAV, anyway?, especially when you have a lot of columns and even more when you don't know in advance what columns you will need (custom made by customers). That may be your case, if you want the organizations to be able to add custom columns.
Note however, that it's not easy to costruct an efficient EAV model/application. You are actually building an RDBMS inside a database.

If org types are limited and rarely changing, just use one table:
members_table - member_id, name, email, joining_year, end_year, role, location, department, sub_org, id_no
Use null values in the fields which aren't relevant to the organization type, and hide the non-applicable fields when you present the information.
I gave a similar answer here, though it was for a different database.

Related

SQL database structure with two changing properties

Let's assume I am building the backend of a university management software.
I have a users table with the following columns:
id
name
birthday
last_english_grade
last_it_grade
profs table columns:
id
name
birthday
I'd like to have a third table with which I can determine all professors teaching a student.
So I'd like to assign multiple teachers to each student.
Those Professors may change any time.
New students may be added any time too.
What's the best way to achieve this?
The canonical way to do this would be to introduce a third junction table, which exists mainly to relate users to professors:
users_profs (
user_id,
prof_id,
PRIMARY KEY (user_id, prof_id)
)
The primary key of this junction table is the combination of a user and professor ID. Note that this table is fairly lean, and avoids the problem of repeating metadata for a given user or professor. Rather, user/professor information remains in your two original tables, and does not get repeated.

Composite primary key: Finding one attribute using another

Data fields
I am designing a database table structure. Say that we need to record employee profiles from different companies. We have the following fields:
+---------+--------------+-----+--------+-----+
| Company | EmployeeName | Age | Gender | Tel |
+---------+--------------+-----+--------+-----+
It's possible that two employees from different company may have the same name (and assume that no 2 employee has the same name in the same company). In this case a composite primary key (Company, EmployeeName) would be necessary in my opinion.
Search
Now I need to get all information by using only one of the 2 attributes in the primary key. For example,
I want to search all employees' profile of Company A:
SELECT EmployeeName, Age, Gender, Tel FROM table WHERE Company = 'Company A'
And I can also search all employees from different company named Donald:
SELECT Company, Age, Gender, Tel FROM table WHERE EmployeeName = 'Donald'
Strategy
In order to implement this requirement, my strategy would be storing all data in a single table, which is easy to read and understandable. However I noticed that it may take a long time to search as the query may need to iterate through all rows. I would like to retrieve these information as quick as possible. Would there be a better strategy for this?
First, your rows should have a unique identifier for each row -- identity/auto-increment/serial, depending on the database. Second, you might reconsider names being unique. Why can't two people at the same company have the same name?
In any case, you have a primary key on, say, (company, name). For the opposite search you simply want another index on (name, company):
create index idx_profiles_name_company on profiles(name, company);
A note explaining Gordon's suggestion for an identity on each row. This is supplemental to his answer above.
In theory there is nothing wrong with a primary key that crosses columns and in a db like PostgreSQL I like to have identity values as secondary keys (i.e. not null unique) and specify natural primary keys. Of course on MS SQL Server or MySQL/InnoDB that would be a recipe for problems. I would also not say "all" but rather "almost all" since there are times when breaking this rule is good.
Regardless, having an identity row simplifies a couple of things and it provides an abstraction around keys in case you get things wrong. Composite keys provide a couple issues that end up eating time (and possibly resulting in downtime) later. These include:
Joins on composite keys are often more expensive than those on simple values, and
Adding or changing a natural primary key which crosses columns is far harder when joins are involved
So depending on your db you should either specify a unique secondary key or make your natural primary key separate (which you should do depends on storage and implementation specifics).

How to include these requirements in my database design?

I am developing an intranet application for my company. The company has a complicated structure in which there are many business lines, departments, divisions, units and groups. I have to take care of all of these things. Some employees work under department level, some of them work under unit level and so on. The problem now is with database design. I am confused about how to design the database. At the beginning, I decided to design it as following:
Employees: Username, Name, Title, OrgCode
Departments: OrgCode, Name
Divisions: OrgCode, Name
Units: OrgCode, Name
but the problem as I said before, some employees are working under departments, so how to make relations between all of these tables. Is it possible to have OrgCode in Employees table as a foreign key to OrgCode in Departments, Divisions and Units tables?
Could you please recommend me how to design it?
UPDATE:
#wizzardz put a nice database design. All what I need now is to have an example of data that fit this database design
Here's a set of data that I am using in the database:
let us assume that we have Employee with the following information:
Username: JohnA
Name: John
Title: Engineer
OrgCode: AA
And let us assume we have department AA, how I will distribute this data into the database design?
You could do a design something like this
rather than going for separate tables for Departments, Divisions etc try to store them in a single table with a TypeId to distingush Departments , Divisitions etc.
Could you try a design like this
In the Level table you need to enter the values like 'Deparments,Divisions', Groups, etc (By keeping it in a separate table you can handle any future addition of new levels by your organisation.)
In the OrganisationLevels table you need to store Department Names,Division Name, GroupName etc.
The Employee table has a forigen key reference with the table Organisation level, that will store which Level an employee is working in the organisation.
If you wants to store the work history of a particular employee/ there is a chance that an employee can be moves to one level to another I would suggest you to go for this design
Sample data wrt the design
Level
Id LevelType
1 Department
2 Division
3 Group
OrganisationLevels
Id Name LevelId Parent*(Give a proper name to this column)*
13 AA 1 NULL
.
.
21 B 2 13 (This column refer to the Id of department it belongs to.)
Employees
Id UserName Name Title
110 JohnA John Engineer
EmployeeWorkDetails
Id EmployeeId OrganisationLevelId StartDate EndDate IsActive
271 110 13 20/09/2011 NULL true
OrgCode from the Employee Table can be removed, because I thought it is the employee code of the employee with that organisation.
I hope this helps.
Employees: Username, Name, Title, OrgCode
Departments: OrgCode, Name
Divisions: OrgCode, Name
Units: OrgCode, Name
To the above mentioned DB design, have one more table called Org containing OrgCode and another column as type (which we give insights on which type of org is it i.e. Departments,Divisions and Units)
then you can have employee table's OrgCode to have refer the OrgCode of Org table(parent-child relationship).
I suggest that you learn the difference between analysis and design. When you design a database, you are inventing tables, columns and constraints that will affect how the data is stored and retrieved. You are concerned with ease of update and query, including operations you will learn about later.
When you analyze the data requirements, you aren't engaged in inventing things, you are engaged in discovering things. And the things you are discovering are things about the "real world" the subject matter is supposed to represent. You break the subject matter down into "entities" and relationships among those entities. Then you relate every value stored in the database to an instance of an attribute, and every attribute to some aspect of either an entity or a relationship. This results in a conceptual model.
In your case, the relationships between employees, departments, units, etc. sound quite complex. It's worth quite a bit of effort to come up with a model that reflects this complex reality accurately.
Once you have a good conceptual model, you can create SQL tables, columns, and constraints that adequately represent the conceptual model. This involves design skills that can be learned. But if you have a lousy conceptual model, you're doomed, no matter how good you are at design.

Database design for customer to skills

I have an issue where we have a customer table includes name, email, address and a skills table which is qts, first aid which is associated by an id. For example
Customer
id = 1
Name = James
Address = some address
Skills
1, qts
2, first aid
I am now trying to pair up the relationship. I first came to a quick solution just by creating a skills table which just has customerId and each skill has a true / false value. Then created a go between customer_skills with an customerId to SkillId. But I would not know how to update the records when values change as there is no unique id.
can anyone help on what would be the best way to do this?
thanks....
The solution you want really depends on your data, and is a question that has been asked thousands of times before. If you google
Entity Attribute Value vs strict relational model you will see countless articles
comparing and contrasting the methods available.
Strict Relational Model
You would add additional BIT or DATETIME fields (Where a NULL datetime represents the customer not having the skill)
to your customer table for each skill. This works well if you have few skills that are unlikely to change much over time.
This allows simple queries to locate customers with skills, especially with various combinations of skills ee.g (Using datetime fields)
SELECT *
FROM Customer
WHERE Skill1 >= '20120101' -- SKILL 1 AQUIRED AFTER 1ST JAN 2012
AND Skill2 IS NOT NULL -- HAS SKILL 2
AND Skill2 IS NULL -- DOES NOT POSSESS SKILL 3
Entity-Attribute-Value Model
This is a slight adaptation of a classic entity-attribute-value model, because the value is boolean represented by the existence of a record.
You would create a table like this:
CREATE TABLE CustomerSkills
( CustomerID INT NOT NULL,
SkillID INT NOT NULL
PRIMARY KEY (CustomerID, SkillID),
FOREIGN KEY (CustomerID) REFERENCES Customer (ID),
FOREIGN KEY (SkillID) REFERENCES Skills (ID)
)
You may want additional columns such as DateAdded, AddedBy etc to track when skills were added and who by etc, but the core principles can be gathered from the above.
With this method it is much easier to add skills, as it doesn't require adding columns, but can make simple queries much more complicated. The above query would have to be written as:
SELECT Customer.*
FROM Customer
INNER JOIN
( SELECT CustomerID
FROM CustomerSkills
WHERE SkillID IN (2, 3) -- SKILL2,SKILL3
OR (SkillID = 1 AND DateAdded >= '20120101')
GROUP BY CustomerID
HAVING COUNT(*) = 2
AND COUNT(CASE WHEN SkillID = 3 THEN 1 END) = 0
) skills
ON Skills.CustomerID = Customer.ID
This is much more complext and resource intensive than with the relational model, but the overall structure is much more flexible.
So to summarise, it really depends on your own particular situation, there are a few factors to consider, but there are plenty of resources out there to help you decide.
If you have a table linking the primary keys from two other tables together in order to form a many-to-many relationship (like in you example) you don't have to update that table. Instead you can just delete and reinsert values into it.
If you are editing a custiomer (customerId 46 for instance) and changing the skills for that customer, you can just delete all skills for the customer and then reinsert the new set of skills when storing the changes.
If your "link table" contains some additional information besides just the two primary key columns, then the situation might be different. But from your description it seems like you just want to link the table together using the primary keys from each table. In that case a delete + reinsert should be fine.
Also in this kind of table, you should make the combination of the two foreign key fields be the primary key of the binding table.

Sql naming best practice

I'm not entirely sure if there's a standard in the industry or otherwise, so I'm asking here.
I'm naming a Users table, and I'm not entirely sure about how to name the members.
user_id is an obvious one, but I wonder if I should prefix all other fields with "user_" or not.
user_name
user_age
or just name and age, etc...
prefixes like that are pointless, unless you have something a little more arbitrary; like two addresses. Then you might use address_1, address_2, address_home, etc
Same with phone numbers.
But for something as static as age, gender, username, etc; I would just leave them like that.
Just to show you
If you WERE to prefix all of those fields, your queries might look like this
SELECT users.user_id FROM users WHERE users.user_name = "Jim"
When it could easily be
SELECT id FROM users WHERE username = "Jim"
I agree with the other answers that suggest against prefixing the attributes with your table names.
However, I support the idea of using matching names for the foreign keys and the primary key they reference1, and to do this you'd normally have to prefix the id attributes in the dependent table.
Something which is not very well known is that SQL supports a concise joining syntax using the USING keyword:
CREATE TABLE users (user_id int, first_name varchar(50), last_name varchar(50));
CREATE TABLE sales (sale_id int, purchase_date datetime, user_id int);
Then the following query:
SELECT s.*, u.last_name FROM sales s JOIN users u USING (user_id);
is equivalent to the more verbose and popular joining syntax:
SELECT s.*, u.last_name FROM sales s JOIN users u ON (u.user_id = s.user_id);
1 This is not always possible. A typical example is a user_id field in a users table, and reported_by and assigned_to fields in the referencing table that both reference the users table. Using a user_id field in such situations is both ambiguous, and not possible for one of the fields.
As other answers suggest, it is a personal preference - pick up certain naming schema and stick to it.
Some 10 years ago I worked with Oracle Designer and it uses naming schema that I like and use since then:
table names are plural - USERS
surrogate primary key is named as singular of table name plus '_id' - primary key for table USERS would be "USER_ID". This way you have consistent naming when you use "USER_ID" field as foreign key in some other table
column names don't have table name as prefix.
Optionally:
in databases with large number of tables (interpret "large" as you see fit), use 2-3
characters table prefixes so that you can logically divide tables in areas. For example: all tables that contain sales data (invoices, invoice items, articles) have prefix "INV_", all tables that contain human resources data have prefix "HR_". That way it is easier to find and sort tables that contain related data (this could also be done by placing tables in different schemes and setting appropriate access rights, but it gets complicated when you need to create more than one database on one server)
Again, pick naming schema you like and be consistent.
Just go with name and age, the table should provide the necessary context when you're wondering what kind of name you're working with.
Look at it as an entity and name the fields accordingly
I'd suggest a User table, with fields such as id, name, age, etc.
A group of records is a bunch of users, but the group of fields represents a user.
Thus, you end up referring to user.id, user.name, user.age (though you won't always include the table name, depending on the query).
For the table names, I usually use pluralized nouns (or noun phrases), like you.
For column names I'd not use the table name as prefix. The table itself specifies the context of the column.
table users (plural):
id
name
age
plain and simple.
It's personal preference. The best advice we can give you is consistency, legibility and ensuring the relationships are correctly named as well.
Use names that make sense and aren't abbreviated if possible, unless the storage mechanism you are using doesn't work well with them.
In relationships, I like to use Id on the primary key and [table_name]_Id on the foreign key. eg. Order.Id and OrderItem.OrderId
Id works well if using a surrogate key as a primary key.
Also your storage mechanism may or may not be case sensitive, so be sure to that into account.
Edit: Also, thre is some theory to suggest that table should be name after what a single record in that table should represent. So, table name "User" instead of "Users" - personally the plural makes more sense to me, just keep it consistent.
First of all, I would suggest using the singular noun, i.e. user instead of users, although this is more of a personal preference.
Second, there are some who prefer to always name the primary key column id, instead of user_id (i.e. table name + id), and similar with for example name instead of employee_name. I think this is a bad idea for the following reason:
-- when every table has an "id" (or "name") column, you get duplicate column names in the output:
select e.id, e.name, d.id, d.name
from employee e, department d
where e.department_id = d.id
-- to avoid this, you need to specify column aliases every time you query:
select e.id employee_id, e.name employee_name, d.id department_id, d.name department_name
from employee e, department d
where e.department_id = d.id
-- if the column name includes the table, there are no conflicts, and the join condition is very clear
select e.employee_id, e.employee_name, d.department_id, d.department_name
from employee e, department d
where e.department_id = d.department_id
I'm not saying you should include the table name in every column in the table, but do it for the key (id) column and other "generic" columns such as name, description, remarks, etc. that are likely to be included in queries.
I explicitly named my columns using a prefix that was related to the table
i.e. table = USERS, column name = user_id, user_name, user_address_street, etc.
before that, when i started using JOINS I had to alias the crap out of the column names to avoid conflict in the query results, and then when accessed from templates in a MVC View, if the query result field name didn't match the published db schema, the template designers would get all confused and have to ask for the SQL VIEW to determine the correct field name to use.
So it looks messy to use a prefix in a column name, but in practice it works better for us.
I'm not entirely sure if there's a standard in the industry
Yes: ISO 11179-5: Naming and identification principles, available here.
I think table and column names must be like that.
Table Name :
User --> Capitalize Each Word and not plural.
Column Names :
Id --> If i see "Id" I understand this is PK column.
GroupId --> I understanding there is an table which named Group and this column is relation column for Group table.
Name --> If there is a column which named "Name" in User table, this means name of user. It's enaughly clear.
Especially if you are using Entity Framework I suppose this more.
Note: Sorry for my bad English. If somebody will correct my bad English i will be happy.