When to create a new table? - sql

I have an application with employees and their work contracts.
Each work contract has a Vacation. We keep track of how many days the company owes the employee ("VacationOwe") and how many he has taken ("VacationTaken").
Should I make "VacationOwe" and "VacationTaken" properties of the Contract, or should I create a class (table) called Vacation with the properties "VacationOwe", "VacationTaken" and "ContractId" and join the two tables?
What are the advantages of each method?
Is there a rule for when you should create a new class or table rather than keep the data in one?

If the two properties are truly only related to an employee, there's no benefit in creating a separate table. Queries will be slower because you'll constantly have to join the two tables.
For this specific example, it seems like the vacation days may also be linked to a year. If that's the case, a separate table would make sense so you can track vacation days taken/owed by employee and year.
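A minimal sketch of the per-year variant, using Python's built-in sqlite3 so it can be run as-is. The Year column, the EmployeeName column and the sample rows are assumptions made for illustration; only VacationOwe, VacationTaken and ContractId come from the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Contract (
    ContractId   INTEGER PRIMARY KEY,
    EmployeeName TEXT NOT NULL              -- hypothetical column
);
-- One row per contract and year, so history is kept instead of overwritten.
CREATE TABLE Vacation (
    ContractId    INTEGER NOT NULL REFERENCES Contract (ContractId),
    Year          INTEGER NOT NULL,         -- assumed: vacation is tracked per year
    VacationOwe   INTEGER NOT NULL,
    VacationTaken INTEGER NOT NULL DEFAULT 0,
    PRIMARY KEY (ContractId, Year)
);
INSERT INTO Contract VALUES (1, 'Alice');
INSERT INTO Vacation VALUES (1, 2023, 25, 25), (1, 2024, 25, 10);
""")

# Remaining days per contract and year.
rows = conn.execute("""
    SELECT c.EmployeeName, v.Year, v.VacationOwe - v.VacationTaken AS DaysLeft
    FROM Contract AS c
    JOIN Vacation AS v ON v.ContractId = c.ContractId
    ORDER BY v.Year
""").fetchall()
print(rows)
```

If vacation is not tracked per year, the two columns can simply live on Contract and the join disappears.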

Should I make "VacationOwe" and "VacationTaken" a property of the employee?
Most probably no, because this defeats normalization. It also does not record when the vacations were taken (in case you care about that), and you have to redo the calculation for every employee, every year.
Use a class (table) called Vacation with the properties "VacationOwe", "VacationTaken" and "EmployeeId" and join the two tables?
It is not good to have a singleton class in this case; in general you should avoid singletons.
So what to do? Well, if your system does not care about vacation details, you could go with the first solution. Maybe you want to consider option (A) below, or, if you would like a more generic approach, you could do something similar to option (B). It all depends on your detailed requirements.

Related

Data Warehouse Modeling

I have a FactCase which reflects the metrics of cases created by customers. It has a field Owner_Key which is linked to DimEmployee.
DimEmployee has all the employees in the organization. However, the Owner_Key in FactCase can also be a particular "Team" (a Queue in Salesforce; basically, as in JIRA, you assign to a team rather than to a person). DimEmployee will not have "Team" related data. We could slam Queues into DimEmployee, but that's obviously a strange fit: it breaks the data modeling rule against mixing granularity (employee / team), and calling the result DimEmployee doesn't make sense in the long term.
Approach 1:
I thought of DimOwner as a separate dimension, but that's not a possibility because "Queue" from Salesforce, which would be used to populate DimOwner, can be used all over the place, not just for owners, and DimOwner is not a business entity that makes much sense.
Approach 2:
The other approach I can think of is to create a new dimension as a union of "DimEmployee" and "DimQueue" that can be used for this type of occurrence. The facts will retain an ‘Employee_Key’ and a ‘new_dimension_key’ to allow both types of analysis. "Queue_key" is TBD; I'm not sure there's enough useful information there. The only question is what to name this ‘new_dimension_key’, i.e. the dimension built from the union.
Please let me know your thoughts on the approaches above, and suggest any better ways to model this.
TIA

SQL - MS Access Form design - add data of ISA relationships

I'm taking a DBMS course and I need to design and build my own DB. I have a database for a hospital where doctors, nurses, support staff etc. are in an ISA relationship to an Employee entity, which holds the rest of the employee data such as name, address and salary.
Designing a form, I want to be able to add an employee with all of their data in one form.
Is there a way to do a "conditional table" of sorts, where if I select "doctor" from a drop-down I get to add to the doctor table too, and the same for the rest of the entities under the ISA relationship?
Thx!
As a general rule, when dealing with data, you do NOT flip or switch tables for a given form or relational database design.
So, for example. If I have a table of customers. Well, now if I want to mark some of the customers as plumbers, and others as doctors? I don't create two tables.
All I would do is add ONE column to that customers table, and it would simply allow me to set the type of customer. There are many reasons for this design, but some significant ones are:
For each new type of customer, you would not create a new table. Worse, all of the forms, the reports, the SQL, the code you write? Well, all of that code would have to be modified EACH time you create a new table. So, you SIMPLY cannot adopt a design in which the concept of changing a table is part of that process.
Forms are bound to ONE table. For related data, you in most cases will use a sub form.
So, think of even an accounting system. It can have huge numbers of customers, and as a result, you can "query" that table to give you all customers. Or you might ask how many accounting firms are in the customer list. Or make a report that summarizes by customer type a "count" of each type of customer.
So, building forms, or reports? They cannot on the fly "change" the tables they are using.
So, in place of a tables called:
SalesJan
SalesFeb
SalesMar
etc.
Well, now you can't query sales from Jan to mar, because the data is in different tables.
So, what you do is have ONE table called "sales", and you add ONE column of the date. Now, at the start of each new month, you don't have to create a new table.
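As a small illustration of the single "sales" table, here is a sketch using Python's built-in sqlite3. The column names SaleId, SaleDate and Amount and the sample rows are made up for the example; the point is only that a Jan-to-Mar query becomes one WHERE clause.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- One table for all months; the date column replaces SalesJan, SalesFeb, ...
CREATE TABLE sales (
    SaleId   INTEGER PRIMARY KEY,
    SaleDate TEXT NOT NULL,    -- ISO-8601 date string, e.g. '2024-01-15'
    Amount   REAL NOT NULL
);
INSERT INTO sales (SaleDate, Amount) VALUES
    ('2024-01-15', 100.0),
    ('2024-02-03', 250.0),
    ('2024-03-28',  75.0),
    ('2024-04-02',  40.0);
""")

# Sales from January through March: a date range, not three separate tables.
total = conn.execute(
    "SELECT SUM(Amount) FROM sales WHERE SaleDate >= ? AND SaleDate < ?",
    ("2024-01-01", "2024-04-01"),
).fetchone()[0]
print(total)
```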
Now, of course in some cases it makes sense to create a separate table. For example, a table of customers, and a table of employees in a database is just fine. It makes sense in this case to use two tables, since the information about a customer and what they can do and the kind of information is VERY different than how you would deal with employees.
So, with above? Well, if I need to print mailing labels for all customers and all employees? That would require two different reports. And very likely the table structure for the two tables is different.
Bottom line:
If you are working on a design, form or report, and you find you need to change the table that the form/report/code etc. is going to operate on? This is a sign that your design approach has gone completely off the rails and is the wrong design.
So, in the case of doctors, nurses etc.? Well, they are all hospital staff, and MOST of the basic information about such employees will be common, much the same, and thus a SINGLE table of "employees" makes the most sense. You would only need a nice "employee type" combo box on that one form, and thus you can add/enter/edit/search any employee in that one table.
The fact that you "want to search" for an employee shows that all these people "are" employees and thus belong in one table. And the basic information about all employees is going to be the same anyway. If you find you are attempting to create a new table but with near identical structures over and over, then just like a new table for each month's sales, or a new table for each new kind of employee? Simply add the "one" column that allows you to make that distinction, and not a whole new table.
Now one COULD even attempt to put patients in the same table, but then again, dealing with patients as opposed to employees is a considerably different kind of "thing".
So employees are employees - even different kinds. (manager, cleaning staff etc.).
And patients are patients - even different kinds (long term care, emergency etc.).
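A rough sketch of the single "employees" table with a type column, written with Python's built-in sqlite3 rather than Access so it can be run anywhere. The column names and the three type values are assumptions for illustration; in Access the CHECK list would more likely be a lookup table feeding the combo box.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE employees (
    EmployeeId   INTEGER PRIMARY KEY,
    Name         TEXT NOT NULL,
    Salary       REAL NOT NULL,
    -- The ONE column that distinguishes kinds of employee.
    EmployeeType TEXT NOT NULL
        CHECK (EmployeeType IN ('doctor', 'nurse', 'support'))
);
INSERT INTO employees (Name, Salary, EmployeeType) VALUES
    ('Dr. Adams', 9000, 'doctor'),
    ('N. Baker',  5000, 'nurse'),
    ('N. Clark',  5200, 'nurse');
""")

# A count per employee type needs no table switching, just GROUP BY.
counts = conn.execute("""
    SELECT EmployeeType, COUNT(*)
    FROM employees
    GROUP BY EmployeeType
    ORDER BY EmployeeType
""").fetchall()
print(counts)
```

Adding a new kind of employee then means allowing one more value in that column, not creating a new table, form and report.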

Redundant field in SQL for Performance

Let's say I have two Tables, called Person, and Couple, where each Couple record stores a pair of Person id's (also assume that each person is bound to at most another different person).
I am planning to support a lot of queries where I will ask for Person records that are not married yet. Do you guys think it's worthwhile to add a 'partnerId' field to Person? (It would be set to null if that person is not married yet)
I am hesitant to do this because the partnerId field is something that is computable - just go through the Couple table to find out. The cost of creating a new couple will also increase because I have to do this extra bookkeeping.
I hope that it doesn't sound like I am asking two different questions here, but I felt that this is relevant. Is it a good/common idea to include extra fields that are redundant (computable/inferable by joining with other tables), but will make your query a lot easier to write and faster?
Thanks!
A better option is to keep the data normalized, and utilize a view (indexed, if supported by your rdbms). This gets you the convenience of dealing with all the relevant fields in one place, without denormalizing your data.
Note: Even if a database doesn't support indexed views, you'll likely still be better off with a view as the indexes on the underlying tables can be utilized.
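To make the view idea concrete, here is a sketch with Python's built-in sqlite3. The Couple column names Person1Id and Person2Id, the Name column and the view name PersonWithPartner are assumptions, since the question does not give the schema. SQLite has no indexed views, so this is a plain view that relies on the indexes of the underlying tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Person (
    PersonId INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL
);
-- UNIQUE on each side creates indexes the view's join can use.
CREATE TABLE Couple (
    Person1Id INTEGER NOT NULL UNIQUE REFERENCES Person (PersonId),
    Person2Id INTEGER NOT NULL UNIQUE REFERENCES Person (PersonId)
);
-- The view exposes a computed PartnerId without storing it on Person.
CREATE VIEW PersonWithPartner AS
SELECT p.PersonId,
       p.Name,
       CASE WHEN c.Person1Id = p.PersonId THEN c.Person2Id
            ELSE c.Person1Id              -- NULL when there is no Couple row
       END AS PartnerId
FROM Person AS p
LEFT JOIN Couple AS c
       ON p.PersonId IN (c.Person1Id, c.Person2Id);

INSERT INTO Person VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cy');
INSERT INTO Couple VALUES (1, 2);
""")

unmarried = conn.execute(
    "SELECT Name FROM PersonWithPartner WHERE PartnerId IS NULL"
).fetchall()
print(unmarried)
```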
Is there always a zero to one relationship between Person and Couples? i.e. a person can have zero or one partner? If so then your Couple table is actually redundant, and your new field is a better approach.
The only reason to split Couple off to another table is if one Person can have many partners.
When someone gets a partner you either write one record to the Couple table or update one record in the Person table. I argue that your Couple table is redundant here. You haven't indicated that there is any extra info on the Couple record besides the link, and it appears that there is only ever zero or one Couple record for every Person record.
How about one table?
-- Self-referencing table: PartnerId stays NULL until the person has a partner
CREATE TABLE Person
(
    PersonId  int NOT NULL PRIMARY KEY,
    PartnerId int NULL REFERENCES Person (PersonId)
);
With this,
Everyone on the system has a row and a PersonId
If you have a partner, they are listed in the PartnerId column
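A short sketch of how the one-table layout might be used, again with Python's built-in sqlite3. The marry helper is hypothetical; it shows the extra bookkeeping the question worries about (both rows are updated, inside one transaction) and the resulting simple query for unmarried people.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Person (
    PersonId  INTEGER PRIMARY KEY,
    PartnerId INTEGER REFERENCES Person (PersonId)   -- NULL means no partner
);
INSERT INTO Person (PersonId) VALUES (1), (2), (3);
""")

def marry(conn, a, b):
    """Hypothetical helper: link two people, updating both rows atomically."""
    with conn:  # commits on success, rolls back if either UPDATE fails
        conn.execute("UPDATE Person SET PartnerId = ? WHERE PersonId = ?", (b, a))
        conn.execute("UPDATE Person SET PartnerId = ? WHERE PersonId = ?", (a, b))

marry(conn, 1, 2)

# Finding unmarried people needs no join at all.
unmarried = [row[0] for row in conn.execute(
    "SELECT PersonId FROM Person WHERE PartnerId IS NULL"
)]
print(unmarried)
```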
Unnormalized data is always bad. Denormalized data, now, that can be beneficial under very specific circumstances. The best advice I ever heard on this subject is to first fully normalize your data, assess performance/goals/objectives, and then carefully denormalize only if it's demonstrably worth the extra overhead.
I agree with Nick. Also consider the need for history of the couples. You could use row versioning in the same table, but this doesn't work very well for application databases; it works best in a DW scenario. A history table in theory would duplicate all the data in the table, not just the relationship. A secondary table would give you the flexibility to add additional information about the relationship, including StartDate and EndDate.

A view created on joining three tables, one table is similar to another

This might be a bad idea but I saw this design and want to evaluate:
I have one view created by joining two tables (a product view based on product and productDetail table).
SELECT
    tP.ProductId,
    tP.ProductType,
    tP.Description,
    tPD.Statement,
    tPD.Condition
FROM
    tbl_Product AS tP
    INNER JOIN tbl_ProductDetail AS tPD
        ON tP.ProductId = tPD.ProductId
Now, I have a new product type which needs a new field (e.g. ExpirationDate) in the detail table. One argument is that since this only applies to a specific product type, we should create a new table called tbl_FoodProductDetial (the new table will have Statement, Condition and ExpirationDate). Intuitively, I think including this field in tbl_ProductDetail is better, so we only need to add the extra field to the view and not worry about any other changes. But the argument against this is that adding a field (ExpirationDate) to a general product detail table is not appropriate. Any suggestions?
If you prefer the third table, how do I join these three tables (one is kind of a sub-table of another)?
You need to trade off architectural purity against more work. Will it cause you maintenance problems in the future to just stick the new field in the same table? If not, go with the approach that is less work (just stick it in!). If you think the extra field will make the application much harder to understand and improve, introduce a third table.
In this case I think you can safely reuse the existing table. A third table will make much more work than I think is appropriate.
I agree with Michael and usr. Also, it sounds like the argument for a third table is using object-oriented derivation logic: the Food class derives from the Product class, and they are attempting to align the database with their classes. That kind of thinking will wind up with so many tables that it will be next to impossible to maintain.
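A sketch of the "just add the column" option the answers lean towards, using Python's built-in sqlite3. The view name vw_Product, the column types and the sample rows are assumptions; the table and column names otherwise follow the question.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE tbl_Product (
    ProductId   INTEGER PRIMARY KEY,
    ProductType TEXT NOT NULL,
    Description TEXT
);
CREATE TABLE tbl_ProductDetail (
    ProductId      INTEGER NOT NULL REFERENCES tbl_Product (ProductId),
    Statement      TEXT,
    Condition      TEXT,
    ExpirationDate TEXT      -- new nullable column; NULL for non-food products
);
-- The existing two-table join only gains one extra column.
CREATE VIEW vw_Product AS
SELECT tP.ProductId, tP.ProductType, tP.Description,
       tPD.Statement, tPD.Condition, tPD.ExpirationDate
FROM tbl_Product AS tP
INNER JOIN tbl_ProductDetail AS tPD
        ON tP.ProductId = tPD.ProductId;

INSERT INTO tbl_Product VALUES (1, 'Tool', 'Hammer'), (2, 'Food', 'Milk');
INSERT INTO tbl_ProductDetail VALUES
    (1, 'Steel head', 'New', NULL),
    (2, 'Keep refrigerated', 'New', '2024-06-01');
""")

rows = conn.execute(
    "SELECT ProductId, ExpirationDate FROM vw_Product ORDER BY ProductId"
).fetchall()
print(rows)
```

Existing consumers of the view that do not select ExpirationDate are unaffected, which is the maintenance argument made above.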

Do these database design styles (or anti-pattern) have names?

Consider a database with tables Products and Employees. There is a new requirement to model current product managers, being the sole employee responsible for a product, noting that some products are simple or mature enough to require no product manager. That is, each product can have zero or one product manager.
Approach 1: alter table Product to add a new NULLable column product_manager_employee_ID so that a product with no product manager is modelled by the NULL value.
Approach 2: create a new table ProductManagers with non-NULLable columns product_ID and employee_ID, with a unique constraint on product_ID, so that a product with no product manager is modelled by the absence of a row in this table.
There are other approaches but these are the two I seem to encounter most often.
Assuming these are both legitimate design choices (as I'm inclined to believe) and merely represent differing styles, do they have names? I prefer approach 2 and find it hard to convey the difference in style to someone who prefers approach 1 without employing an actual example (as I have done here!). It would be nice if I could say, "I prefer the inclination-towards-6NF (or whatever) style myself."
Assuming one of these approaches is in fact an anti-pattern (as I merely suspect may be the case for approach 1 by modelling a relationship between two entities as an attribute of one of those entities) does this anti-pattern have a name?
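For concreteness, the two approaches can be sketched side by side with Python's built-in sqlite3. Both are put in one schema here purely for comparison (a real design would pick one), and the name columns and sample rows are assumptions. The two queries answer the same question, "which products have no product manager?", once per approach.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (
    employee_ID INTEGER PRIMARY KEY,
    name        TEXT NOT NULL
);
-- Approach 1: nullable column on the product itself.
CREATE TABLE Products (
    product_ID INTEGER PRIMARY KEY,
    name       TEXT NOT NULL,
    product_manager_employee_ID INTEGER REFERENCES Employees (employee_ID)
);
-- Approach 2: separate table, no NULLs; UNIQUE keeps it to one manager each.
CREATE TABLE ProductManagers (
    product_ID  INTEGER NOT NULL UNIQUE REFERENCES Products (product_ID),
    employee_ID INTEGER NOT NULL REFERENCES Employees (employee_ID)
);
INSERT INTO Employees VALUES (1, 'Dana');
INSERT INTO Products  VALUES (1, 'Widget', 1), (2, 'Gadget', NULL);
INSERT INTO ProductManagers VALUES (1, 1);
""")

# Approach 1: "no manager" is a NULL value.
unmanaged_1 = conn.execute(
    "SELECT name FROM Products WHERE product_manager_employee_ID IS NULL"
).fetchall()

# Approach 2: "no manager" is the absence of a row.
unmanaged_2 = conn.execute("""
    SELECT p.name FROM Products AS p
    WHERE NOT EXISTS (SELECT 1 FROM ProductManagers AS pm
                      WHERE pm.product_ID = p.product_ID)
""").fetchall()
print(unmanaged_1, unmanaged_2)
```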
Well, the first is nothing more than a one-to-many relationship (one employee to many products). This is sometimes referred to as a O:M relationship (zero to many) because it's optional (not every product has a product manager). Also, not every employee is a product manager, so it's optional on the other side too.
The second is a join table, usually used for a many-to-many relationship. But since one side is only one-to-one (each product is only in the table once) it's really just a convoluted one-to-many relationship.
Personally I prefer the first one but neither is wrong (or bad).
The second would be used for two reasons that come to mind.
You envision the possibility that a product will have more than one manager; or
You want to track the history of who the product manager is for a product. You do this with, say a current_flag column set to 'Y' (or similar) where only one at a time can be current. This is actually a pretty common pattern in database-centric applications.
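The history variant could look roughly like this, using Python's built-in sqlite3. The partial unique index is one possible way to enforce "only one current row per product" and is an assumption, not something the answer specifies (it needs SQLite 3.8.0 or later); assign_manager is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductManagers (
    product_ID   INTEGER NOT NULL,
    employee_ID  INTEGER NOT NULL,
    current_flag TEXT NOT NULL DEFAULT 'Y'
        CHECK (current_flag IN ('Y', 'N'))
);
-- At most one row per product may have current_flag = 'Y'.
CREATE UNIQUE INDEX one_current_manager
    ON ProductManagers (product_ID)
    WHERE current_flag = 'Y';
""")

def assign_manager(conn, product_id, employee_id):
    """Hypothetical helper: retire the current manager, then add the new one."""
    with conn:  # both statements succeed or fail together
        conn.execute(
            "UPDATE ProductManagers SET current_flag = 'N' "
            "WHERE product_ID = ? AND current_flag = 'Y'",
            (product_id,),
        )
        conn.execute(
            "INSERT INTO ProductManagers (product_ID, employee_ID) VALUES (?, ?)",
            (product_id, employee_id),
        )

assign_manager(conn, 1, 10)
assign_manager(conn, 1, 20)

history = conn.execute(
    "SELECT employee_ID, current_flag FROM ProductManagers "
    "WHERE product_ID = 1 ORDER BY rowid"
).fetchall()

# Bypassing the helper and adding a second current row is rejected by the index.
try:
    with conn:
        conn.execute(
            "INSERT INTO ProductManagers (product_ID, employee_ID) VALUES (1, 30)"
        )
    second_current_rejected = False
except sqlite3.IntegrityError:
    second_current_rejected = True
print(history, second_current_rejected)
```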
It looks to me like the two model different behaviour. In the first example, you can have one product manager per product and one employee can be product manager for more than one product (one to many). The second appears to allow for more than one product manager per product (many to many). This would suggest the two solutions are equally valid in different situations and which one you use would depend on the business rule.
There is a flaw in the first approach. Imagine for a second that the business requirements have changed and now you need to be able to assign two Product Managers to a product. What will you do? Add another column to the Product table? Yuck. That obviously violates 1NF.
Another option the second approach gives you is the ability to store attributes for a particular Product Manager <-> Product relation. For example, if you have two Product Managers for a product, you can mark one of them as the primary...
Or, for example, an employee can have a phone number, but as a product manager he/she can have another phone number... This also goes to the special table then.
Approach 1)
Slows down the use of the Product table with the additional Product Manager field (maybe not for all databases but for some).
Linking from the Product table to the Employee table is simple.
Approach 2)
Existing queries using the Product table are not affected.
Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table.
Linking from the Product table to the Employee table is more cumbersome and costly as you have to link to the intermediate table first.
How often must you link between the two tables?
How many other queries use the Product table?
How many records in the Product table?
In the particular case you give, I think the main motivation for two tables is avoiding nulls for missing data, and that's how I would characterise the two approaches.
There's a discussion of the pros and cons on Wikipedia.
I am pretty sure that, given C. J. Date's dislike of this, he defines relational theory so that only the multiple-table solution is "valid". For example, you could call the single-table approach "poorly typed" (since the type of null is unclear - see quote on p4).