Database Relation Anomalies - sql

I am tasked with finding anomalies within this relation. I have identified a few insertion, deletion and update anomalies.
Commission Percentage: the percentage of the total sales made by a salesperson that is paid as commission to that salesperson.
Year of Hire: the year the salesperson was first hired
Department Number: the number of the department where the salesperson works
Manager Name: name of the manager of the department
However, I am confused about one anomaly that I pulled out. Below is the statement:
There can not be a manager with the same name in the company as there is no primary identifier for the manager entity except for the name, which can be a duplicate within the company.
May I know how I should phrase the above statement, and under which anomaly (update/deletion/insertion) I should include it?
Thank you
May I request additional assistance below as well:
How would you change the current design and how does your new design address the problems you have identified with the current design.
My current design is splitting it into 3 relations:
Salesperson(salespersonNumber, salespersonName, commissionPercentage, YearOfHire, departmentNumber)
Product(productNumber, productName, unitPrice)
Manager(managerNumber, managerName, departmentNumber)
However, I am missing the quantity entity.
Quantity requires a composite key of productNumber & salespersonNumber.
Should I make it a relation by itself?
Quantity(productNumber, salespersonNumber)

Anomalies
When identifying (potential) anomalies, you're listing dependent attributes that are affected by the anomalies (you forgot Salesperson Name, btw). Specifically, you listed attributes that depend on a subset of the key (Salesperson Number, Product Number), thus violating 2NF. You're on the right track.
However, be careful not to confuse attributes with anomalies. An update anomaly would be if 1 of the 3 instances of Bilstein got changed. The (assumed) functional dependency Salesperson Name depends on Salesperson Number would be broken and the data would be inconsistent (Salesperson Number 437 would be associated with more than one name). Remember that normalization aims to eliminate redundant associations.
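A minimal sketch of that update anomaly, using Python's built-in sqlite3 module. The table name, product numbers and quantities are made up for illustration; only Salesperson Number 437 and the name Bilstein come from the example above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Unnormalized relation: the salesperson name is repeated on every row.
conn.execute("""
    CREATE TABLE SalespersonProduct (
        salespersonNumber INTEGER,
        productNumber     INTEGER,
        salespersonName   TEXT,
        quantity          INTEGER,
        PRIMARY KEY (salespersonNumber, productNumber)
    )
""")
conn.executemany(
    "INSERT INTO SalespersonProduct VALUES (?, ?, ?, ?)",
    [(437, 1, "Bilstein", 10), (437, 2, "Bilstein", 5), (437, 3, "Bilstein", 7)],
)

# The update anomaly: only one of the three redundant copies is changed.
conn.execute(
    "UPDATE SalespersonProduct SET salespersonName = 'Bilstien' "
    "WHERE salespersonNumber = 437 AND productNumber = 2"
)

# Salesperson numbers that are now associated with more than one name.
inconsistent = conn.execute("""
    SELECT salespersonNumber, COUNT(DISTINCT salespersonName)
    FROM SalespersonProduct
    GROUP BY salespersonNumber
    HAVING COUNT(DISTINCT salespersonName) > 1
""").fetchall()
print(inconsistent)  # [(437, 2)]
```

Nothing in the table definition prevents this state, which is why the dependency Salesperson Number -> Salesperson Name belongs in a relation of its own.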
Identity
The problem with identifying managers by name indicates a poor modeling decision. As you stated, a company's set of managers isn't uniquely identified by name, so there's a mismatch between the logical data model and the world it models. This won't cause insert, update or delete anomalies as long as we use different values for different managers, but it will prevent convenient identification of managers. Possible improvements would be to use multiple attributes (abstract domains are often easily identified by a combination of attributes, but natural domains like people usually aren't, e.g. Manager Name, Birthdate would be more identifying but still not a good solution), turn the Manager Name into a surrogate key (e.g. Scott1, Scott2), or introduce a new surrogate key (e.g. a numeric ID).
Proposed improvement
Your proposed design normalizes the original table as well as addressing the identification problem. It's a good answer except for two issues: it doesn't include the association between Salesperson and Manager, and in your Quantity relation, you forgot to include the quantity as a dependent attribute.
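One possible way to close both gaps, sketched with Python's sqlite3 module. The Department relation is an assumption of this sketch (the answer does not prescribe how to link Salesperson to Manager), and all sample values are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE Manager (
        managerNumber INTEGER PRIMARY KEY,   -- numeric ID, so names may repeat
        managerName   TEXT NOT NULL
    );
    CREATE TABLE Department (                -- hypothetical linking relation
        departmentNumber INTEGER PRIMARY KEY,
        managerNumber    INTEGER NOT NULL REFERENCES Manager (managerNumber)
    );
    CREATE TABLE Salesperson (
        salespersonNumber    INTEGER PRIMARY KEY,
        salespersonName      TEXT NOT NULL,
        commissionPercentage REAL,
        yearOfHire           INTEGER,
        departmentNumber     INTEGER NOT NULL
                             REFERENCES Department (departmentNumber)
    );
    CREATE TABLE Product (
        productNumber INTEGER PRIMARY KEY,
        productName   TEXT NOT NULL,
        unitPrice     REAL
    );
    CREATE TABLE Quantity (
        salespersonNumber INTEGER NOT NULL
                          REFERENCES Salesperson (salespersonNumber),
        productNumber     INTEGER NOT NULL REFERENCES Product (productNumber),
        quantity          INTEGER NOT NULL,  -- the dependent attribute
        PRIMARY KEY (salespersonNumber, productNumber)
    );
    INSERT INTO Manager VALUES (1, 'Scott'), (2, 'Scott');
    INSERT INTO Department VALUES (10, 1), (20, 2);
    INSERT INTO Salesperson VALUES (437, 'Bilstein', 10.0, 2001, 10);
    INSERT INTO Product VALUES (1, 'Widget', 9.99);
    INSERT INTO Quantity VALUES (437, 1, 10);
""")

# Two managers share a name but remain distinguishable by number.
same_name_managers = conn.execute(
    "SELECT COUNT(*) FROM Manager WHERE managerName = 'Scott'"
).fetchone()[0]

# The Salesperson-to-Manager association goes through the department.
manager_of_437 = conn.execute("""
    SELECT m.managerNumber, m.managerName
    FROM Salesperson AS s
    JOIN Department AS d ON d.departmentNumber = s.departmentNumber
    JOIN Manager AS m ON m.managerNumber = d.managerNumber
    WHERE s.salespersonNumber = 437
""").fetchone()

# A quantity for a product that does not exist is rejected.
try:
    conn.execute("INSERT INTO Quantity VALUES (437, 99, 1)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(same_name_managers, manager_of_437, orphan_rejected)
```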
Good job so far, hope this helps.

Related

Should I use a name as primary key if there is nothing else to use, or should I create ids for the entities that don't have anything usable as PK?

I have to specify that this is for a database assignment. I'm pretty good with SQL code but the diagram aspect of the assignment is killing me, I think that every step I take is wrong.
They have given us This scenario and requirements :
A research team has asked you to create a database for a project on movie production
companies; the project aims to use machine learning, neural networks and other
methods to extract information about the situation of movie production companies in
Europe and the health of this sector for a set of specific countries, including the UK.
The data analytics application resulting from this project – which you DO NOT have to
develop; your job is to develop the central, server-side database that underpins it – has been commissioned by a research institute (which shall remain nameless), and it is
intended to be open source, and therefore available to anyone.
Basically, it is a machine learning application that would run on a database with the aim
to identify the correlation between different aspects of the sector, including funding
opportunities and development of new production companies or studios.
The database records every production company in Europe, including the name of the
company, the address, ZIP code, city, country, type of the company (e.g., non-profit
organisation), number of employees and net worth (calculated as total assets minus
total liabilities). Every production company has its name registered with one and only
one local government authority (for example, Companies House in the UK) on a specific
date; each company can have many shareholders. The authority typically requires
information about all the shareholders, including town of birth, mother’s maiden name,
father’s first name, their personal telephone number (only one), national insurance
number (each country in Europe has a similar unique ID), and passport number. Also,
the registration procedure has a cost associated with it (e.g., 12£ in the UK).
The database also records the employees’ data for each company: each employee is
assumed to work for a single production company. Due to the complex structure of
movie production companies and the need for various skills and professions,
employees are categorised into crew and staff. The crew consists of three main groups:
the actors, the director(s) and those who work on other jobs relevant to the filming
(producers, editors, production designers, costume designers, composer, etc.). All
other employees belong to the staff group, including those responsible for HR,
advertising, etc. Employees are identified by an employee ID, first name, last name and
an optional middle name, date of birth and start date. Also, each employee has their
contact details recorded, whether it is a single phone number or multiple, with a
description associated with each of them. Each employee has a single email address,
too.
Members of the crew are paid hourly, and this is recorded in the database as well as a
bonus that depends on their contract. Actors get a bonus for each day of work and
another bonus for each scene completed; directors get a bonus at the end of the
shooting; crew members that work in other jobs relevant to the filming get a bonus at
the end of the shooting, and they have their role recorded as well (e.g., producer or
costume designer).
Staff members have the monthly salary and the working hours (e.g., full time 9-5).
Furthermore, each staff member belongs to a specific department (e.g., advertising),
which is located in a given building at a given address (both recorded in the database).
The database records all movies from each production company. More specifically, for
each movie the following information is recorded: a universal unique movie code (similar to the ISBN for books), the title of the movie, the year and the first release date
(different release dates are not important and should NOT be recorded).
Also, the database records each member of the crew that is part of the movie, and the
role they have in the movie: each crew member can play a single role or multiple roles
in the same movie, and each role has a description associated with it. For example, in
each movie there can be a single protagonist or more than one, the same actor can play
one or several roles, or even have a cameo.
One of the aims of the project is to provide insights on the impact of funding and grants
within the movie industry. To this end, the database should be able to record all the
funding that each production company receives. This must include the name of the
grant, the funding body (e.g., the government of a given country or European Union
grants such as the ERDF), the maximum amount for that grant and the deadline to
submit a proposal.
Then, for each company the database must record the date of the application to a given
grant, the amount requested, the outcome (successful/unsuccessful).
A grant can be given to a single production company or shared among several. Finally,
once the database is ready, the project will run a set of machine learning algorithms to
perform high level data analysis based on the different grants and their corresponding
impact with the aim to investigate the impacts of such funding against a list of criteria.
No additional information is provided at this stage from the project.
In the spec, the requirements are numerated from 1 to 5, as the scenario was not given
at that time. The details of each requirement are provided in the following:
Each production company may have received one or multiple grants, and grants
can be shared by more than one company.
It is possible for each employee to have more than one telephone number. Each
telephone number has a description associated with it (e.g., personal, or work).
Each production company is registered only once but can have many shareholders.
Each employee can either be a member of the crew OR a staff member. Each crew
member can be an actor OR a director OR have another role. Each staff member
belongs to a department. No duplication of data is allowed.
Each crew member may be part of one or more movies in a single role or many.
Based on that I have created THIS DIAGRAM.
I think I have all the entities, attributes and relationships down but I'm missing the keys. Keys can't be names, right? I will use the company entity as an example. So, should I create new attributes like company_id to use as primary keys, or just underline the name attributes and use them as primary keys?
Also, please tell me if there's anything else wrong with the diagram.
Thanks a lot!
I created an ER diagram, but some entities don't have attributes that can be used as primary keys because they are names. I tried using them but I don't think it's right.
The problem with names as primary keys
In your diagram, you have several names used to identify entities: Grant, Production Company, Shareholder (full name), Employee, Movie (Title). You can in theory use them as primary keys. However, this is bad practice:
names can change (e.g. departments and companies can be renamed, movies can have a temporary working title);
names are often not sufficient to distinguish entities (e.g. there may be different people having the same name, e.g. Adam Smith);
names can be spelled differently across sources of information, and are also easily misspelled;
although not really noticeable with modern RDBMS, names are more time consuming to search, and consume more memory when used as foreign keys.
How to choose a primary key?
You'd better use a primary key that guarantees uniqueness. You can then easily decide whether the same name corresponds to a different entity or not.
The next question that you'll then face in your design is surrogate key vs. natural key:
When there's no other unique information, you'll have no choice but to use a surrogate key.
When there are other potentially unique attributes, you may choose to use either a natural key (e.g. company registration number, national insurance number together with a country code, movie code?) or a surrogate one.
Keep in mind that both have advantages and inconveniences, but the surrogate key is in general more robust, as natural keys sometimes appear to be not as stable as expected.
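As an illustration only, here is a sketch that combines both ideas with Python's sqlite3 module: a surrogate company_id as primary key, plus a UNIQUE constraint on an assumed natural key. The table and column names and the sample data are hypothetical, not taken from your diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE production_company (
        company_id          INTEGER PRIMARY KEY,    -- surrogate key
        name                TEXT NOT NULL,          -- may repeat or change
        country_code        TEXT NOT NULL,
        registration_number TEXT NOT NULL,
        UNIQUE (country_code, registration_number)  -- assumed natural key
    )
""")

insert_sql = (
    "INSERT INTO production_company (name, country_code, registration_number) "
    "VALUES (?, ?, ?)"
)
# Two different companies that happen to share a name.
uk_id = conn.execute(insert_sql, ("Acme Films", "GB", "0001")).lastrowid
fr_id = conn.execute(insert_sql, ("Acme Films", "FR", "0001")).lastrowid

# Renaming one company touches a single row; rows that reference
# company_id from other tables would not need to change.
conn.execute(
    "UPDATE production_company SET name = ? WHERE company_id = ?",
    ("Acme Pictures", uk_id),
)

# Registering the same natural key twice is still rejected.
try:
    conn.execute(insert_sql, ("Copycat Films", "FR", "0001"))
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

names = conn.execute(
    "SELECT name FROM production_company ORDER BY company_id"
).fetchall()
print(uk_id, fr_id, names, duplicate_rejected)
```

Keeping the UNIQUE constraint alongside the surrogate key means a surrogate key does not silently allow the same real-world company to be entered twice.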
Other remarks about your ERD
By the way, here are some issues and other remarks:
Works in relation does not relate Staff to anything else. From the name, it's obviously not a reflexive relation either. So this is a diagram error. department (name) and building should either be attached to a Department entity or be attributes of Staff.
In several cases you relate attributes to other attributes (actor-extra role, phone number-description). This is also a diagramming error. Either add the extra attribute to the same entity, or there's a missing relationship with a missing entity.
In one case you relate two entities without a relation between the two (production company- application). This is an inconsistency that must be corrected also.
The following attributes are not real attributes but probably values of an unidentified attribute: producer, composer, actor, editor, xyzzy designer, advertising, HR, janitor.
Government authority is a misleading entity name: nowhere do you refer to data about the authority itself (name of the authority, e.g. "CNC", country of the government, ...). It's only information about the company's registration.
In your diagram you leave the hourly and monthly wage at the level of the Employee. This does not model accurately the requirements.
The link of the relation receive funding and the entity Application with the same attribute outcome seems very ambiguous.
In the name of the entities, stay consistent: either singular or plural. But mixing both will lead to lots of typos.
Better show cardinality in the link between the relation and the entity, than on the top of the relation: this avoids confusion about the direction of reading.
As a side remark, your question provides a wealth of interesting details that are not really needed for answering the core of your question. Better limit yourself to only the information directly related to your issue in your next questions ;-)
Research or not research, keep in mind that GDPR may apply and that it requires inter alia privacy by design (some information about the shareholder and the employee may require some additional thoughts).
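To make the remark about phone numbers and their descriptions more concrete, this is roughly what the "missing entity" could look like once translated to tables. It is a sketch using Python's sqlite3 module; the table and column names and the sample values are assumptions for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        first_name  TEXT NOT NULL,
        last_name   TEXT NOT NULL,
        email       TEXT NOT NULL          -- single email: plain attribute
    );
    -- Phone numbers are multi-valued and carry their own description,
    -- so they become an entity related to employee.
    CREATE TABLE employee_phone (
        employee_id  INTEGER NOT NULL REFERENCES employee (employee_id),
        phone_number TEXT NOT NULL,
        description  TEXT NOT NULL,         -- e.g. personal or work
        PRIMARY KEY (employee_id, phone_number)
    );
    INSERT INTO employee VALUES (1, 'Adam', 'Smith', 'adam.smith@example.com');
    INSERT INTO employee_phone VALUES
        (1, '555-0100', 'work'),
        (1, '555-0101', 'personal');
""")

phones = conn.execute(
    "SELECT phone_number, description FROM employee_phone "
    "WHERE employee_id = 1 ORDER BY phone_number"
).fetchall()
print(phones)  # [('555-0100', 'work'), ('555-0101', 'personal')]
```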

What's the best practice to connect a table to a junction table in relational database design?

I'm building a relational database that will act as a CRM for a travel company. I have removed tables and attributes to make this as simple as possible. Users will send quotes to customers.
A hotel can have many rooms (e.g. hotel 1 can have both a twin room and a triple room).
A room can have many hotels (e.g. both hotels 1 and 2 can have a twin room).
Let's say a customer has a group of 6.
A user could send this customer a quote for hotel 1 with either 3x twin rooms or 2x triple rooms.
A quote will need to contain the hotel and appropriate room type and room type quantities.
What's the best practice to connect table HOTEL_ROOM_JUNCTION to QUOTE, as the key is a multi-attribute, composite key?
Thank you
Noting the Relational Database tag.
Problem
There is a lack of precision in your declarations:
A hotel can have many rooms (e.g. hotel 1 can have both a twin room and a triple room).
A room can have many hotels (e.g. both hotels 1 and 2 can have a twin room).
I think you mean RoomType. From the rest of your declarations, the system you are implementing is for Quotations of rooms across all hotels, not a room booking system for each of the hotels. That is, you need to track RoomType, not Room, per Hotel.
The tables as given are not Relational tables, they do not have any of the requirements that make them Relational. When you start with stamping an id field on every file, it cripples the data analysis & data modelling exercise that is required to create a set of Relational tables. That is anti-Relational:
physical pointers such as record id are expressly prohibited in the Relational Model.
The Primary Key must be "made up from the data".
I appreciate that you have been schooled in that, due to the marketing and promotion of primitive methods as "relational".
For starters, each logical row (not physical record with a record id) must be unique.
The fields in each file should not be prefixed with the filename. In SQL (the data sub-language for the implementation of the Relational Model), the fully qualified address for a column is:
[server.][database.][owner.][table.]column
with defaults (obvious) for each element. If a column is ambiguous, simply prefix it with the table name.
Primary Keys are a special case. In order to avoid confusion (and now, to allow the new NATURAL JOIN), they should be the full name, in both the PK and FK locations. An id on every file would ensure buggy code.
Relational Data Model
If I address all those issues, and model the data according to the Relational Model, it would be:
Notation
All my data models are rendered in IDEF1X, the Standard for modelling Relational databases since 1993.
My IDEF1X Introduction is essential reading for those who are new to the Relational Model, or its modelling method. Note that IDEF1X models are rich in detail and precision, showing all required details, whereas home-grown models have far less than that. Which means, the notation has to be understood.
Content
Relational Key
In order to make the logical rows unique, we need to make a Key from the data. The users know their data, they know what is unique and what is not. Usually they will have a ShortName for such things as Company; Hotel; Customer; etc.
If you do not communicate with the user, there is no chance of supplying the user's needs.
Hotel, UserName, Customer are ShortNames, which are unique, which therefore are the Primary Key. (More, later)
Relational Keys are composites, because they preserve the natural data hierarchies. Get used to it.
If you need the DDL for composite Keys, please ask.
Presuming that a Hotel may be a chain or franchise, we need a Location to make a specific hotel that has rooms unique.
The following are discrete Facts, and should not be mixed together (doing so will lead to complex constraints and horrendous SQL code):
HotelRoomType
that a Hotel.Location has a particular RoomType; and the Price
RoomTypeAvailable
that a Hotel.Location has one of those RoomTypes available on a particular Date; and the Number.
I presume there is a file from the hotels that you will be importing on a daily basis: this is the central table for that, with the constraints, of course.
Quote
that a User is providing a Quote that is requested by a single Customer, for a single TravelDate, for a single Hotel.Location. This allows separate Quotes for separate Hotel.Locations for a single TravelDate; Quotes for a Customer for more than one TravelDate; etc.
If you need multiple Hotel.Locations (and their RoomTypes) on a single Quote, let me know in the comments, and I will update the data model.
QuoteRoomType
that a Quote contains a line item which is a single RoomType in the single Hotel.Location that is available on the TravelDate.
Relational Integrity
A logical feature of the Relational Model, which is distinct from Referential Integrity, which is a physical feature in SQL. It is not possible to achieve this in a Record Filing System with record ids as "primary keys", not even an advanced and progressed one (after the various errors in the initial RFS have been corrected). Genuine logical Keys ("made up from the data") are required.
In RoomTypeAvailable, we have constrained:
RoomTypes to that which the Hotel.Location actually has (in HotelRoomType)
AND is actually available on Date.
In QuoteRoomType, we have constrained:
Hotel.Location to that which is in the Quote,
AND RoomTypes to that which is available in Hotel.Location (in HotelRoomType),
AND which is available on the TravelDate (RoomTypeAvailable.Date "maps to" QuoteRoomType.TravelDate).
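The data model diagram itself is not reproduced in this text, so the following is only a sketch of how composite Keys could express the constraints listed above, written with Python's sqlite3 module. The exact key of Quote, the column names AvailableDate and NumAvailable (standing in for Date and Number), and all sample values are assumptions; Customer and User tables are left out for brevity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE Hotel (
        Hotel TEXT NOT NULL, Location TEXT NOT NULL,
        PRIMARY KEY (Hotel, Location)
    );
    CREATE TABLE RoomType (
        RoomType TEXT PRIMARY KEY, Description TEXT NOT NULL
    );
    CREATE TABLE HotelRoomType (
        Hotel TEXT NOT NULL, Location TEXT NOT NULL, RoomType TEXT NOT NULL,
        Price REAL NOT NULL,
        PRIMARY KEY (Hotel, Location, RoomType),
        FOREIGN KEY (Hotel, Location) REFERENCES Hotel (Hotel, Location),
        FOREIGN KEY (RoomType) REFERENCES RoomType (RoomType)
    );
    CREATE TABLE RoomTypeAvailable (
        Hotel TEXT NOT NULL, Location TEXT NOT NULL, RoomType TEXT NOT NULL,
        AvailableDate TEXT NOT NULL, NumAvailable INTEGER NOT NULL,
        PRIMARY KEY (Hotel, Location, RoomType, AvailableDate),
        FOREIGN KEY (Hotel, Location, RoomType)
            REFERENCES HotelRoomType (Hotel, Location, RoomType)
    );
    CREATE TABLE Quote (
        Customer TEXT NOT NULL, Hotel TEXT NOT NULL, Location TEXT NOT NULL,
        TravelDate TEXT NOT NULL, UserName TEXT NOT NULL,
        PRIMARY KEY (Customer, Hotel, Location, TravelDate),
        FOREIGN KEY (Hotel, Location) REFERENCES Hotel (Hotel, Location)
    );
    CREATE TABLE QuoteRoomType (
        Customer TEXT NOT NULL, Hotel TEXT NOT NULL, Location TEXT NOT NULL,
        TravelDate TEXT NOT NULL, RoomType TEXT NOT NULL,
        NumRoom INTEGER NOT NULL,
        PRIMARY KEY (Customer, Hotel, Location, TravelDate, RoomType),
        -- the line item belongs to the Quote, hence to its Hotel.Location
        FOREIGN KEY (Customer, Hotel, Location, TravelDate)
            REFERENCES Quote (Customer, Hotel, Location, TravelDate),
        -- and the RoomType must be available there on the TravelDate
        FOREIGN KEY (Hotel, Location, RoomType, TravelDate)
            REFERENCES RoomTypeAvailable
                       (Hotel, Location, RoomType, AvailableDate)
    );
    INSERT INTO Hotel VALUES ('Grand', 'Paris');
    INSERT INTO RoomType VALUES ('Twin', 'Two beds'), ('Triple', 'Three beds');
    INSERT INTO HotelRoomType VALUES
        ('Grand', 'Paris', 'Twin', 120.0),
        ('Grand', 'Paris', 'Triple', 150.0);
    INSERT INTO RoomTypeAvailable VALUES
        ('Grand', 'Paris', 'Twin', '2024-06-01', 5);
    INSERT INTO Quote VALUES
        ('Smith', 'Grand', 'Paris', '2024-06-01', 'alice');
""")

def try_add_line(room_type, num_room):
    """Try to add a line item to the Smith quote; report whether it was accepted."""
    try:
        conn.execute(
            "INSERT INTO QuoteRoomType VALUES (?, ?, ?, ?, ?, ?)",
            ("Smith", "Grand", "Paris", "2024-06-01", room_type, num_room),
        )
        return True
    except sqlite3.IntegrityError:
        return False

twin_accepted = try_add_line("Twin", 3)      # available on the TravelDate
triple_accepted = try_add_line("Triple", 2)  # hotel has it, but not available
print(twin_accepted, triple_accepted)  # True False
```

Because the Hotel, Location, RoomType and TravelDate columns are carried down into QuoteRoomType, the two composite foreign keys enforce the constraints declaratively, without triggers or application code.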
1960's Record Filing System • Anti-Relational, Sold as "relational"
This section is relevant for those who prescribe a Record ID field as "primary key" in every file. And somehow think that that is "relational". Others can safely skip it.
For comparison, here is the set of files that one would come up with, if one followed the techniques and methods that are promoted and marketed by the Date; Darwen; Fagin; et al crowd, falsely proposed as "relational".
This is a "mature" or "advanced" model, the fourth or fifth iteration. It has a number of improvements over the initial RFS. The initial or second or third iteration would not be equivalent enough to offer a comparison:
the Facts that are required to support the system have been determined (as opposed to the initial model, the record perspective, which is oblivious to Facts).
the content of the records have been improved to prevent duplicates, to the extent possible given the record content (but it is still streets behind the uniqueness provided in a Relational data model)
Fails Relational
Nevertheless it has no Relational features, which are logical. It has only the physical features of SQL reference-ability. Just a few of the many failures, which the mob prescribes as "relational":
Duplicate rows (logical) are not prevented, because rows are not defined.
No Relational Integrity
which depends on Relational Keys. (Refer to the Relational Keys detailed above.)
Eg. QuoteRoomType is constrained to any RoomTypeAvailable.
It is not possible to constrain it to:
the HotelId that is referenced in the Quote only,
OR to RoomTypes that exist in the HotelId only,
OR to RoomTypesAvailable that are available on the TravelDate only.
One additional field, and one additional index, for the Record id on every file. That will have a marvellous effect on performance.
Horrendous navigation and query code.
No Relational Power
When two distal files need to be JOINed, each of the intermediate files must be additionally JOINed, something that is not required in a Relational database. That is because it breaks the Access Path Independence Rule, a concept that the razor gang have not understood in the fifty years since the advent of the RM. But they will come up with yet another abnormal "normal form", to add to their bag of seventeen thus far.
More, Not Fewer, Joins
Let’s look at what that means. We need a query to provide statistics for RoomTypes that have been quoted for the previous year, so that hotels can re-arrange their room types to suit the expected traffic.
Using the Relational data model (separate section above), we would code:
SELECT RoomType.RoomType, -- Relational database
Description,
SUM( NumRoom )
FROM RoomType
JOIN QuoteRoomType ON RoomType.RoomType = QuoteRoomType.RoomType
WHERE DATEPART( YY, TravelDate ) = DATEPART( YY, GETDATE() ) - 1
GROUP BY RoomType.RoomType, Description
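For readers who want to run the single-JOIN form, here is a rough equivalent using Python's sqlite3 module. SQLite has no DATEPART or GETDATE, so this sketch uses strftime and passes the year in as a parameter; the reduced table definitions and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE RoomType (
        RoomType TEXT PRIMARY KEY, Description TEXT NOT NULL
    );
    -- Only the columns needed for the query; the key carries RoomType
    -- and TravelDate, so no intermediate table has to be joined.
    CREATE TABLE QuoteRoomType (
        Customer TEXT, Hotel TEXT, Location TEXT,
        TravelDate TEXT, RoomType TEXT, NumRoom INTEGER,
        PRIMARY KEY (Customer, Hotel, Location, TravelDate, RoomType)
    );
    INSERT INTO RoomType VALUES ('Twin', 'Two beds'), ('Triple', 'Three beds');
    INSERT INTO QuoteRoomType VALUES
        ('Smith', 'Grand', 'Paris', '2023-06-01', 'Twin',   3),
        ('Jones', 'Grand', 'Paris', '2023-07-10', 'Twin',   2),
        ('Jones', 'Grand', 'Rome',  '2023-07-10', 'Triple', 2),
        ('Lee',   'Grand', 'Paris', '2024-01-05', 'Twin',   4);
""")

previous_year = "2023"  # fixed here so the example is reproducible
stats = conn.execute("""
    SELECT RoomType.RoomType, Description, SUM(NumRoom)
    FROM RoomType
    JOIN QuoteRoomType ON RoomType.RoomType = QuoteRoomType.RoomType
    WHERE strftime('%Y', TravelDate) = ?
    GROUP BY RoomType.RoomType, Description
    ORDER BY RoomType.RoomType
""", (previous_year,)).fetchall()
print(stats)  # [('Triple', 'Three beds', 2), ('Twin', 'Two beds', 5)]
```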
Using the Record Filing System data model, which is the result of following the advice of the Date; Darwen; Fagin; philipxy; AntC; et al gang, which is falsely marketed as "relational" (above), we would be forced to code:
SELECT RoomType, -- Record Filing System
Description,
SUM( NumRoom )
FROM RoomType
JOIN HotelRoomType
ON RoomType.RoomTypeId = HotelRoomType.RoomTypeId
JOIN RoomTypeAvailable
ON HotelRoomType.HotelRoomTypeId = RoomTypeAvailable.HotelRoomTypeId
JOIN QuoteRoomType
ON RoomTypeAvailable.RoomTypeAvailableId = QuoteRoomType.RoomTypeAvailableId
JOIN Quote
ON QuoteRoomType.QuoteId = Quote.QuoteId
WHERE DATEPART( YY, TravelDate ) = DATEPART( YY, GETDATE() ) - 1
GROUP BY RoomType, Description
Gotta love the QueryPlan for that, that the SQL platform will produce.
Re-arranging the order of the JOINs might improve the tortoise.
Resorting to moving fragments such as “partial FDs” or “MVDs” around, might improve it.
Perhaps deploying more “candies”, plus the required additional indices, all over the place, will help. But wait, that would be duplication on a mass scale, it would break Normalisation, there would be Update Anomalies everywhere one looks.
Note that that result set has no reliability; no credibility. Why ? Because, as already proved, the QuoteRoomType is not constrained to the Quote.Hotel (referenced by HotelId);
or to the Quote.TravelDate;
or to the RoomTypes available in QuoteHotel (referenced by HotelId).
Further, there may well be duplicates, because prevention can only be partially implemented. The result of which is unreliable result sets.
Simplicity vs Complexity
If you have the interest and the stamina, you can attempt to elevate the RFS by muddling through their "partial dependencies"; "transitive dependencies"; "candies"; "multi-valued dependencies"; etc, all of which are neither defined in, nor required in, the Relational Model. They are expressly for use in the Record Filing Systems of the last century.
First, the RFS paradigm (marketed as "relational") forces a record mindset, instead of a data-only mindset.
Second, it breaks everything down into fragments, instead of understanding the atoms; the Facts, in their full context (data hierarchies).
Third, it gives you a morass of complexity to handle the fragments, that have no relevance when handling atoms.
When you are done, all that complexity in the Record Filing System will still not be anywhere near the simplicity of the equivalent Relational data model: it will have:
No Relational Integrity (yes, yes, we have Declarative Referential Integrity, and that only for physical records, not for logical rows)
No Relational Power (multiple forced JOINs in every query)
No Relational Speed (those additional columns and indices have an effect).
And the navigation and query code will be horrendous, and prone to errors.
Please feel free to ask specific questions. Also, please supply clarifications as noted, and I will update the data model.
Since a specific room can only exist in one hotel, the table HOTEL_ROOM_JUNCTION is redundant. So the pk hotel_id is an fk in room, and the pk in room is a composite key of hotel_id and room_id.
If one quote can consist of several rooms, you need a connecting table between quote and room with fk quote_id, room_id and hotel_id, and those three will be the pk in that table. (As a rule of thumb, that kind of table will usually need a timestamp.)
(As a side note, I would name the tables QUOTES, ROOMS and HOTELS since they contain many.)
EDIT: I misread the question somewhat. To make my model match what the OP wants, I need to add ROOM_TYPES with pk room_type_id, which will be an fk (not null) in ROOMS but not part of the pk.

Database design....storing relationship in one table, and the data in another table?

I'm looking at this company's database design, and would like to know the purpose of their design, ie store relationship in one table and the data in another, why do this?
They have this,
EMPLOYEE
Id (PK)
DepartmentId
EMPLOYEE_DATA
EmployeeId (PK)
First Name
Last Name
Position
etc...
Rather than this...
EMPLOYEE
Id (PK)
DepartmentId
First Name
Last Name
Position
etc...
...OR this...(employee can belong to many departments)
EMPLOYEE
Id (PK)
First Name
Last Name
etc...
EMPLOYEE_DEPARTMENT
Id
EmployeeId
DepartmentId
Position
That's a link table, or join table, or cross table.. lots of different names.
How would you assign an employee to two different departments with your design? You can't. You can only assign them to one.
With their design, they can assign the same ID to multiple departments by creating multiple records with the employee ID and different department ID's.
EDIT:
You need to be more specific about what you're asking. Your first question seemed to be asking what the purpose of mapping table was. Then you changed it, then you changed it again.. none of which makes much sense.
It seems now that you are asking what the better design is, which is a totally different question than what you originally asked. Please state specifically what question you want answered so we don't have to guess.
EDIT2:
Upon re-reading, if this is the actual design, then no.. It does not support multiple department id's. Such a design makes little sense, except for one situation. If the original design did not include a department, this would allow them to add a department ID without modifying the original EMPLOYEE_DATA table.
They may not have wanted to update legacy code to support the Employee id, so they added it this way.
Purpose of design is determined by business rules.
Business rules dictate entity (logical model perspective) / table (physical model perspective) design. No design is "bad" if it is built according to the requirements that were determined based on business rules. Those rules can however change over time -- foreseeing such changes and building to accommodate/future-proof the data model can really save time, effort and ultimately money.
The first and third examples are the same, except that the third example has an extraneous column (EMPLOYEE_DEPARTMENT.id). ORMs don't like composite keys, so a single column is used in their place.
The second example is fine if:
employees will never work for more than one department
there's no need for historical department tracking
Conclusion:
The first/third example is more realistic for the majority of real-world situations, and can be easily customized to provide more value without major impact (re-writing entire table structure). It uses what is most commonly referred to as a many-to-many relationship to allow many employees to relate to many departments.
If an employee can be in more than one department, then you would need a mapping table but I'd do it like the following:
EMPLOYEE
Id (PK)
First Name
Last Name
DEPARTMENT
Id (PK)
Name
EMPLOYEE_DEPARTMENT
EmployeeId_fk (PK)
DepartmentId_fk (PK)
Position
This would allow for multiple positions in multiple departments.
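A small runnable sketch of that mapping table, using Python's sqlite3 module. The column names are simplified (no _fk suffix, no spaces) and the sample rows are made up. Note that with (EmployeeId, DepartmentId) as the key, an employee holds at most one position per department.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Id INTEGER PRIMARY KEY, FirstName TEXT, LastName TEXT
    );
    CREATE TABLE DEPARTMENT (
        Id INTEGER PRIMARY KEY, Name TEXT
    );
    CREATE TABLE EMPLOYEE_DEPARTMENT (
        EmployeeId   INTEGER NOT NULL REFERENCES EMPLOYEE (Id),
        DepartmentId INTEGER NOT NULL REFERENCES DEPARTMENT (Id),
        Position     TEXT,
        PRIMARY KEY (EmployeeId, DepartmentId)  -- composite key, no extra Id
    );
    INSERT INTO EMPLOYEE VALUES (1, 'Ada', 'Lovelace');
    INSERT INTO DEPARTMENT VALUES (1, 'Sales'), (2, 'Support');
    INSERT INTO EMPLOYEE_DEPARTMENT VALUES (1, 1, 'Manager'), (1, 2, 'Advisor');
""")

# One employee, two departments, a position in each.
memberships = conn.execute("""
    SELECT d.Name, ed.Position
    FROM EMPLOYEE_DEPARTMENT AS ed
    JOIN DEPARTMENT AS d ON d.Id = ed.DepartmentId
    WHERE ed.EmployeeId = 1
    ORDER BY d.Name
""").fetchall()

# Assigning the same employee to the same department twice is rejected.
try:
    conn.execute("INSERT INTO EMPLOYEE_DEPARTMENT VALUES (1, 1, 'Clerk')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

print(memberships, duplicate_rejected)
```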
You would do this if an employee can be a member of multiple departments. With the latter table, each employee can only belong to one department.
The only remotely good reason for doing this is to implement an extension model, where the master table identifying unique customers does not include the customer data that is not always necessary. Instead, you create one core table with the core employee data and an extension table with all the supplementary fields. I've seen people take this approach to avoid creating large tables with many columns that are rarely needed. However, in my experience it's typically premature optimization, and I wouldn't recommend it.
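To show what such an extension model amounts to, here is a sketch of the company's two tables using Python's sqlite3 module, with invented sample rows. Because EmployeeId is both the primary key of EMPLOYEE_DATA and a reference to EMPLOYEE, the relationship is at most one-to-one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when enabled
conn.executescript("""
    CREATE TABLE EMPLOYEE (
        Id INTEGER PRIMARY KEY,
        DepartmentId INTEGER
    );
    CREATE TABLE EMPLOYEE_DATA (
        EmployeeId INTEGER PRIMARY KEY REFERENCES EMPLOYEE (Id),
        FirstName TEXT, LastName TEXT, Position TEXT
    );
    INSERT INTO EMPLOYEE VALUES (1, 10), (2, 20);
    INSERT INTO EMPLOYEE_DATA VALUES (1, 'Ada', 'Lovelace', 'Analyst');
""")

# Reading the full employee needs a join; employee 2 has no extension row.
rows = conn.execute("""
    SELECT e.Id, e.DepartmentId, d.FirstName, d.Position
    FROM EMPLOYEE AS e
    LEFT JOIN EMPLOYEE_DATA AS d ON d.EmployeeId = e.Id
    ORDER BY e.Id
""").fetchall()

# A second extension row for the same employee is rejected by the key.
try:
    conn.execute("INSERT INTO EMPLOYEE_DATA VALUES (1, 'A.', 'King', 'Intern')")
    second_row_rejected = False
except sqlite3.IntegrityError:
    second_row_rejected = True

print(rows, second_row_rejected)
```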
In contrast to many responses, the model included does not support multiple departments per employee - it is not a many to many mapping approach.

SSIS Population of Slowly Changing Dimension with outrigger

Working on a data warehouse, a suitable analogy for the problem is that we have Healthcare Practitioners. Healthcare Practitioners have a number of professional attributes and work in an open number of teams and in an open number of clinical areas.
For example, you may have a nurse who works in children's services across a number of teams as a relief/contractor/bank staff person. Or you may have a newly qualified doctor who works general medicine who is doing time in a special area pending qualifying as a consultant of that special area.
So we have an open number of areas of work and an open number of teams, we can't have team 1, team 2 etc in our dimensions. The other attributes may change over time also, like base location (where they work out of), the main team and area they work in..
So, following Kimball I've gone for outriggers:
Table DimHealthProfessionals:
Key (primary key, identity)
Name
Main Team
Main Area of Work
Base Location
Other Attribute 1
Other Attribute 2
Start Date
End Date
Table OutriggerHealthProfessionalTeam:
HPKey (foreign key to DimHealthProfessionals.Key)
Team Name
Team Type
Other Team Attribute 1
Other Team Attribute 2
Table OutriggerHealthProfessionalAreaOfWork:
HPKey (as above)
Area of Work
Other AoW attribute 1
If any attribute of the HP changes, or the combination of teams or areas of work in which they work changes, we need to create a new entry in the SCD and its outrigger tables to encapsulate this.
And we're doing this in SSIS.
The source data is basically an HP table with the main attributes, a table of areas of work, a table of teams and a pair of mapping tables to map a current set of areas of work to an HP.
I have three data sources, one brings in the HCP information, one the areas of work of all HCPs and one the team memberships.
The problem is how to run over all three datasets to determine if an HP has changed an attribute, and if they have changed an attribute, how we update the DIM and two outriggers appropriately.
Can anyone point me at a best practice for this? OR suggest an alternative way of modelling this dimension?
Admittedly I may not understand everything here, but it seems to me that the relationship in this example should be reversed. Place TeamKey and the WorkAreaKey in the dimHealthProfessionals -- this should simplify things.
With this in place, you simply make sure to deliver outriggers before the dimHealthProfessionals.
Treat outriggers as dimensions in their own right. You may want to treat dimHealthProfessionals as a type 2 dimension, to properly capture the history.
EDIT
Considering that team to person is many-to-many, a fact table is more appropriate.
A column in a dimension table is appropriate only if a person can belong to only one team at a time. Same with work areas.
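A minimal sketch of that many-to-many arrangement, using Python's sqlite3 module purely for illustration. The names DimTeam, FactTeamMembership and TeamKey are assumptions, not taken from the question; the point is only that membership lives in its own table keyed by both dimensions, so a person can belong to any number of teams.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE DimHealthProfessionals (HPKey INTEGER PRIMARY KEY, Name TEXT NOT NULL);
CREATE TABLE DimTeam (TeamKey INTEGER PRIMARY KEY, TeamName TEXT NOT NULL);

-- Hypothetical fact table with no measures: one row per (person, team) pair.
CREATE TABLE FactTeamMembership (
    HPKey   INTEGER NOT NULL REFERENCES DimHealthProfessionals(HPKey),
    TeamKey INTEGER NOT NULL REFERENCES DimTeam(TeamKey),
    PRIMARY KEY (HPKey, TeamKey)
);

INSERT INTO DimHealthProfessionals VALUES (1, 'Relief nurse'), (2, 'Junior doctor');
INSERT INTO DimTeam VALUES (10, 'Paediatrics A'), (11, 'Paediatrics B');
INSERT INTO FactTeamMembership VALUES (1, 10), (1, 11), (2, 10);
""")

# Count team memberships per person by joining the fact table to the dimension.
teams_per_person = conn.execute("""
    SELECT hp.Name, COUNT(*)
    FROM FactTeamMembership f
    JOIN DimHealthProfessionals hp ON hp.HPKey = f.HPKey
    GROUP BY hp.HPKey
    ORDER BY hp.HPKey
""").fetchall()
print(teams_per_person)
```

The same shape would apply to work areas, with a second membership table keyed by person and work area.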
The problem is how to run over all three datasets to determine if an HP has changed an attribute, and if they have changed an attribute, how we update the DIM and two outriggers appropriately.
Can anyone point me at a best practice for this? OR suggest an alternative way of modelling this dimension?
I'm not sure I understand your question fully. If you are unsure about change detection, then use Checksums in the package. Build up a temp table with the data as it is in the source, then compare each row to its counterpart (joined via the business keys) by computing the checksum for both rows and comparing those. If they differ, the data has changed.
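The checksum comparison can be sketched outside SSIS as follows. This is only an illustration in Python, not an SSIS component: the function names, the column names (hp_id, name, base_location, teams) and the idea of folding the team set into one sorted, delimited string are all assumptions made for the example.

```python
import hashlib

def row_checksum(row, columns):
    """Hash the tracked columns of one row, always in the same order."""
    # "\x00" stands in for NULL so that None and "" hash differently;
    # "\x1f" separates values so ("ab", "c") and ("a", "bc") differ.
    parts = ["\x00" if row[c] is None else str(row[c]) for c in columns]
    return hashlib.sha256("\x1f".join(parts).encode("utf-8")).hexdigest()

def detect_changes(source_rows, dim_rows, key, columns):
    """Return (new_keys, changed_keys) by comparing checksums per business key."""
    current = {r[key]: row_checksum(r, columns) for r in dim_rows}
    new_keys, changed_keys = [], []
    for r in source_rows:
        k = r[key]
        if k not in current:
            new_keys.append(k)
        elif current[k] != row_checksum(r, columns):
            changed_keys.append(k)
    return new_keys, changed_keys

# Hypothetical data: "teams" is the sorted team list joined with "|".
tracked = ["name", "base_location", "teams"]
dim = [
    {"hp_id": 1, "name": "A", "base_location": "North", "teams": "T1|T2"},
    {"hp_id": 2, "name": "B", "base_location": "South", "teams": "T1"},
]
source = [
    {"hp_id": 1, "name": "A", "base_location": "North", "teams": "T1|T2"},
    {"hp_id": 2, "name": "B", "base_location": "South", "teams": "T1|T3"},
    {"hp_id": 3, "name": "C", "base_location": None, "teams": ""},
]
new_keys, changed_keys = detect_changes(source, dim, "hp_id", tracked)
print(new_keys, changed_keys)
```

Keys reported as changed would get a new dimension row (and new outrigger rows); keys reported as new would simply be inserted.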
If you are talking about cascading updates in a historized dimension hierarchy (and you can treat the outriggers like a hierarchy in this context), then the foreign key lookups will automatically look up the newer entry in DimHealthProfessionals if you have historization (i.e. validFrom / validThrough timestamps in DimHealthProfessionals). Those different foreign keys result in a different checksum.

Do these database design styles (or anti-patterns) have names?

Consider a database with tables Products and Employees. There is a new requirement to model current product managers, being the sole employee responsible for a product, noting that some products are simple or mature enough to require no product manager. That is, each product can have zero or one product manager.
Approach 1: alter table Product to add a new NULLable column product_manager_employee_ID so that a product with no product manager is modelled by the NULL value.
Approach 2: create a new table ProductManagers with non-NULLable columns product_ID and employee_ID, with a unique constraint on product_ID, so that a product with no product manager is modelled by the absence of a row in this table.
There are other approaches but these are the two I seem to encounter most often.
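For concreteness, the two approaches might look like this. It is a sketch using Python's sqlite3 module; the table names Products1 and Products2 are invented here only so both variants can coexist in one database, and the sample rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE Employees (employee_ID INTEGER PRIMARY KEY, name TEXT NOT NULL);

-- Approach 1: nullable column on the product; NULL means "no product manager".
CREATE TABLE Products1 (
    product_ID INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    product_manager_employee_ID INTEGER REFERENCES Employees(employee_ID)
);

-- Approach 2: separate table; a missing row means "no product manager".
CREATE TABLE Products2 (product_ID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE ProductManagers (
    product_ID  INTEGER NOT NULL UNIQUE REFERENCES Products2(product_ID),
    employee_ID INTEGER NOT NULL REFERENCES Employees(employee_ID)
);

INSERT INTO Employees VALUES (1, 'Ann'), (2, 'Bob');
INSERT INTO Products1 VALUES (10, 'Widget', 1), (11, 'Gadget', NULL);
INSERT INTO Products2 VALUES (10, 'Widget'), (11, 'Gadget');
INSERT INTO ProductManagers VALUES (10, 1);
""")

# The unique constraint on product_ID enforces "zero or one" manager per product.
try:
    conn.execute("INSERT INTO ProductManagers VALUES (10, 2)")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False

unmanaged_1 = conn.execute(
    "SELECT COUNT(*) FROM Products1 WHERE product_manager_employee_ID IS NULL"
).fetchone()[0]
unmanaged_2 = conn.execute(
    "SELECT COUNT(*) FROM Products2 p WHERE NOT EXISTS "
    "(SELECT 1 FROM ProductManagers pm WHERE pm.product_ID = p.product_ID)"
).fetchone()[0]
print(second_manager_allowed, unmanaged_1, unmanaged_2)
```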
Assuming these are both legitimate design choices (as I'm inclined to believe) and merely represent differing styles, do they have names? I prefer approach 2 and find it hard to convey the difference in style to someone who prefers approach 1 without employing an actual example (as I have done here!). It would be nice if I could say, "I prefer the inclination-towards-6NF (or whatever) style myself."
Assuming one of these approaches is in fact an anti-pattern (as I merely suspect may be the case for approach 1 by modelling a relationship between two entities as an attribute of one of those entities) does this anti-pattern have a name?
Well, the first is nothing more than a one-to-many relationship (one employee to many products). This is sometimes referred to as a 0:M relationship (zero-to-many) because it's optional (not every product has a product manager). Also, not every employee is a product manager, so it's optional on the other side too.
The second is a join table, usually used for a many-to-many relationship. But since one side is only one-to-one (each product is only in the table once) it's really just a convoluted one-to-many relationship.
Personally I prefer the first one but neither is wrong (or bad).
The second would be used for two reasons that come to mind.
You envision the possibility that a product will have more than one manager; or
You want to track the history of who the product manager is for a product. You do this with, say a current_flag column set to 'Y' (or similar) where only one at a time can be current. This is actually a pretty common pattern in database-centric applications.
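The history-tracking variant could be sketched like this, again with Python's sqlite3 module. The table name ProductManagerHistory, the helper set_manager and the partial unique index are assumptions for illustration; partial indexes need SQLite 3.8.0 or later, and other databases would enforce "only one current row" by different means.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProductManagerHistory (
    product_ID   INTEGER NOT NULL,
    employee_ID  INTEGER NOT NULL,
    current_flag TEXT NOT NULL CHECK (current_flag IN ('Y', 'N'))
);
-- Partial unique index: at most one 'Y' row per product.
CREATE UNIQUE INDEX one_current_manager
    ON ProductManagerHistory (product_ID) WHERE current_flag = 'Y';
""")

def set_manager(conn, product_id, employee_id):
    """Retire the current manager row (if any) and add the new one atomically."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "UPDATE ProductManagerHistory SET current_flag = 'N' "
            "WHERE product_ID = ? AND current_flag = 'Y'",
            (product_id,),
        )
        conn.execute(
            "INSERT INTO ProductManagerHistory VALUES (?, ?, 'Y')",
            (product_id, employee_id),
        )

set_manager(conn, 10, 1)
set_manager(conn, 10, 2)
history = conn.execute(
    "SELECT employee_ID, current_flag FROM ProductManagerHistory ORDER BY rowid"
).fetchall()

# Bypassing the helper and inserting a second current row is rejected.
try:
    conn.execute("INSERT INTO ProductManagerHistory VALUES (10, 3, 'Y')")
    duplicate_current_allowed = True
except sqlite3.IntegrityError:
    duplicate_current_allowed = False
print(history, duplicate_current_allowed)
```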
It looks to me like the two model different behaviour. In the first example, you can have one product manager per product and one employee can be product manager for more than one product (one to many). The second appears to allow for more than one product manager per product (many to many). This would suggest the two solutions are equally valid in different situations and which one you use would depend on the business rule.
There is a flaw in the first approach. Imagine for a second that the business requirements have changed and you now need to be able to assign two product managers to a product. What will you do? Add another column to the Product table? Yuck. That obviously violates 1NF.
Another option the second approach gives you is the ability to store attributes of a particular Product Manager <-> Product relation. For example, if you have two product managers for a product, you can mark one of them as the primary...
Or, for example, an employee can have a phone number, but as a product manager he/she can have another phone number... This also goes to the special table then.
Approach 1)
Slows down the use of the Product table with the additional Product Manager field (maybe not for all databases but for some).
Linking from the Product table to the Employee table is simple.
Approach 2)
Existing queries using the Product table are not affected.
Increases the size of your database. You've now duplicated the Product ID column to another table as well as added unique constraints and indexes to that table.
Linking from the Product table to the Employee table is more cumbersome and costly as you have to link to the intermediate table first.
How often must you link between the two tables?
How many other queries use the Product table?
How many records in the Product table?
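To make the linking difference concrete, here is a sketch of the same question ("which employee, if any, manages each product?") asked under both approaches. It uses Python's sqlite3 module with made-up rows, and for brevity it keeps both the nullable column and the ProductManagers table in one database, which a real design would not do. It shows only the shape of the queries and says nothing about their relative cost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Employees (employee_ID INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE Products (
    product_ID INTEGER PRIMARY KEY,
    name TEXT NOT NULL,
    product_manager_employee_ID INTEGER REFERENCES Employees(employee_ID)
);
CREATE TABLE ProductManagers (
    product_ID  INTEGER NOT NULL UNIQUE REFERENCES Products(product_ID),
    employee_ID INTEGER NOT NULL REFERENCES Employees(employee_ID)
);
INSERT INTO Employees VALUES (1, 'Ann');
INSERT INTO Products VALUES (10, 'Widget', 1), (11, 'Gadget', NULL);
INSERT INTO ProductManagers VALUES (10, 1);
""")

# Approach 1: one outer join straight from product to employee.
approach_1 = conn.execute("""
    SELECT p.name, e.name
    FROM Products p
    LEFT JOIN Employees e ON e.employee_ID = p.product_manager_employee_ID
    ORDER BY p.product_ID
""").fetchall()

# Approach 2: two outer joins, going through the intermediate table.
approach_2 = conn.execute("""
    SELECT p.name, e.name
    FROM Products p
    LEFT JOIN ProductManagers pm ON pm.product_ID = p.product_ID
    LEFT JOIN Employees e ON e.employee_ID = pm.employee_ID
    ORDER BY p.product_ID
""").fetchall()
print(approach_1, approach_2)
```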
In the particular case you give, I think the main motivation for two tables is avoiding nulls for missing data, and that's how I would characterise the two approaches.
There's a discussion of the pros and cons on Wikipedia.
I am pretty sure that, given C. J. Date's dislike of nulls, he defines relational theory so that only the multiple-table solution is "valid". For example, you could call the single-table approach "poorly typed" (since the type of null is unclear - see quote on p4).