SQL Database Design ERD - Empty entity because of different function - sql

As you can see below, the User is able to make a Call, the Operator will log it, writing the time (error on my part, Column2), his own ID and the ID of the caller. The Operator is also able to create a Solution, by generating a Solution ID and describing the solution.
Note that nothing differentiate the User from the Operator in terms of attributes. Indeed they both inherit their ID from the Person entity.
So I have two questions.
First, as you can see, the Call entity has two attributes which are the same column (ID for User and Operator), but will always represent two different people (i.e. a User will never be an Operator). Is this the correct notation for such a thing?
Secondly, I am not sure about having User and Operator as separate entities because no attribute distinguish them from one another, only their ability to do something or not (User can't create a solution). This would mean that they don't have attributes apart from the ones they inherit. Is this correct or should the two entities be merged under the Personentity?
Thanks in advance.

It's valid to create subtypes with distinct relationships and/or constraints, even if they have no distinct attributes. You'll be able to use referential integrity to ensure that Operator IDs and User IDs don't get mixed up in the Call table, and it's possible to enforce mutual exclusion between the IDs in the User and Operator tables.
As far as notation is concerned, I would show the ID in the User and Operator tables, and use Crow's foot lines to represent the FK constraints between the tables. If I wanted to make the subtyping explicit, I would rather show that on an EER diagram using Chen's notation than on a table diagram.

Related

Enum types in database schema

This might be sort of a basic db question, but I'm more used to working with objects rather than tables. Let's say I have an object 'Movie' with property 'genre'. Genre should be restricted by using enumerated types (eg. the only valid genres are Horror, Action, Comedy, Drama). How should this translate to a db schema?
I could put a 'genre' column in the Movies table and rely on checking inputs to ensure that a 'genre' assignment is valid?
Or, I could include a Genres table with pre-filled rows, and then in the Movies table include a column with a foreign key to the Genres table?
I'm leaning towards the first option, but are there pitfalls/etc. that I'm not considering?
I lean toward using the lookup table, your second option. The reason I prefer this is that I can add a new genre simply by adding a row to the Genres table. There would be no need to modify code or to modify the enum definition in the schema.
See also my answer to How to handle enumerations without enum fields in a database?
Here is a useful heuristic: Do you treat all values the same from the client code?
If you do, then just use the lookup table. Even if you don't envision adding new values1 now, requirements tend to change as the time marches on, and the lookup table will allow you to do that without changing the client code. Your case seems to fall into that category.
If you don't, then enum is likely more appropriate - the "knowledge" about each distinct value is contained in your client code anyway, so there is nothing useful left to store in the database.
The gray zone is if you do a little bit of both. E.g. you need to treat values in special ways, but there is still some additional field (associated to each value) that you can treat generically (e.g. just display it to the user). Or you need to treat just some values in special ways. In cases like these, I'd lean towards the lookup table.
1 Or deleting or modifying old values.

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Folwler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved in by using two separate pieces of technology. The first is to use a relatively well designed database schema with a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have few points that can help you address your questions regarding the relational database structure.
First thing that I see right away is, you are talking about inheritance (at least conceptually). Your objects inherit from each-other, thus you have additional attributes for derived objects. Say you are adding a new type of object, first thing you need to do (conceptually) is to find a base/super (parent) object type for it, that has subset of the attributes and you are adding on top of them (extending base object type).
Once you get used to thinking like said above, next thing is about inheritance mapping patterns for relational databases. I'll steal terms from Martin Fowler to describe it here.
You can hold inheritance chain in the database by following one of the 3 ways:
1 - Single table inheritance: Whole inheritance chain is in one table. So, all new types of objects go into the same table.
Advantages: your search query has only one table to search, and it must be faster than a join for example.
Disadvantages: table grows faster than with option 2 for example; you have to add a type column that says what type of object is the row; some rows have empty columns because they belong to other types of objects.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if search affects only one type, you search only one table at a time; each table grows slower than in option 1 for example.
Disadvantages: you need to use union of queries if searching several types at the same time.
3 - Class table inheritance: One table for the base type object with its attributes only, additional tables with additional attributes for each child object type. So, child tables refer to the base table with PK/FK relations.
Advantages: all types are present in one table so easy to search all together using common attributes.
Disadvantages: base table grows fast because it contains part of child tables too; you need to use join to search all types of objects with all attributes.
Which one to choose?
It's a trade-off obviously. If you expect to have many types of objects added, I would go with Concrete table inheritance that gives reasonable query and scaling options. Class table inheritance seems to be not very friendly with fast queries and scalability. Single table inheritance seems to work with small number of types better.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three table set. You will see it referred to as entity value pair logic on the web...it's a way of handling multiple dynamic attributes for items. Lets say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes...same theory will work for hundreds of products and thousands of attributes. Standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least one third of them are null). New attribute added means altering the table to add another column to it and coming up with a script to populate existing or just leaving it null for all existing. Not the most fun, can be a head ache.
The alternative to this is a name value pair setup. You want a 'header' table to hold the common values amoungst your products (like name, or price...things that all rpoducts always have). In our example above, you will notice that attribute 'a' is being used on each record...this does mean attribute a can be a part of the header table as well. We'll call the key column here 'header_id'.
Second table is a reference table that is simply going to store the attributes that can be assigned to each product and assign an ID to it. We'll call the table attribute with atrr_id for a key. Rather straight forwards, each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
Final table is a mapping table that actually holds the info. You will have your product id, the attribute id, and then the value. Normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
I beleive there is a wiki for it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply figuring out the best methodology to pivot out your data (I'd recommend Postgres as an opensource db option here)

How to model a mutually exclusive relationship in SQL Server

I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am being restricted to the creation of new tables and code. If I need to alter the existing structure I think my client may reject the proposal.. although if its the only way to get it right this is what I will have to do.
I have an Item table that can me link to any number of tables, and these tables may increase over time. The Item can only me linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Example of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration"
For instance a Poster may be allocated to a Person or Office or Building. In the future if they add a Conference Room table it may also be added to that.
My intital thoughts are:
Item
{
ID,
Name
}
LinkedItem
{
ItemID,
LinkedToTableName,
LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!
It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is, that all of these entities - Person, Vehicle, whatever - are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.
Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.
your link table is ok.
the trouble you will have is that you will need to generate dynamic sql at runtime. parameterized sql does not typically allow the objects inthe FROM list to be parameters.
i fyou want to avoid this, you may be able to denormalize a little - say by creating a table to hold the id (assuming the ids are unique across the other tables) and the type_id representing which table is the source, and a generated description - e.g. the name value from the inital record.
you would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries - and then resort to your dynamic queries when needed at runtime.

Inheritance in Database Design

I am designing a new laboratory database with MANY types of my main entities.
The table for each entity will hold fields common to ALL types of that entity (entity_id, created_on, created_by, etc). I will then use concrete inheritance (separate table for each unique set of attributes) to store all remaining fields.
I believe that this is the best design for the standard types of data which come through the laboratory daily. However, we often have a special samples which often are accompanied by specific values the originator wants stored.
Question: How should I model special (non-standard) types of entities?
Option 1: Use entity-value for special fields
One table (entity_id, attribute_name, numerical_value) would hold all data for any special entity.
+ Fewer tables.
- Cannot enforce requiring a particular attribute.
- Must convert (pivot) rows to columns which is inefficient.
Option 2: Strict concrete inheritance.
Create separate table for each separate special case.
+ Follows in accordance with all other rules
- Overhead of many tables with only a few rows.
Option 3: Concrete inheritance with special tables under a different user.
Put all special tables under a different user.
+ Keeps all special and standard tables separate.
+ Easier to search for common standard table in a list without searching through all special tables.
- Overhead of many tables with only a few rows.
Actually the design you described (common table plus subtype-specific tables) is called Class Table Inheritance.
Concrete Table Inheritance would have all the common attributes duplicated in the subtype tables, and you'd have no supertype table as you do now.
I'm strongly against EAV. I consider it an SQL antipattern. It may seem like an elegant solution because it requires fewer tables, but you're setting yourself up for a lot of headache later. You identified a couple of the disadvantages, but there are many others. IMHO, EAV is used appropriately only if you absolutely must not create a new table when you introduce a new subtype, or if you have an unbounded number of subtypes (e.g. users can define new attributes ad hoc).
You have many subtypes, but still a finite number of them, so if I were doing this project I'd stick with Class Table Inheritance. You may have few rows of each subtype, but at least you have some assurance that all rows in each subtype have the same columns, you can use NOT NULL if you need to, you can use SQL data types, you can use referential integrity constraints, etc. From a relational perspective, it's a better design than EAV.
One more option that you didn't mention is called Serialized LOB. That is, add a BLOB column for a semi-structured collection of custom attributes. Store XML, YAML, JSON, or your own DSL in that column. You won't be able to parse individual attributes out of that BLOB easily with SQL, you'll have to fetch the whole BLOB back into your application and extract individual attributes in code. So in some ways it's less convenient. But if that satisfies your usage of the data, then there's nothing wrong with that.
I think it depends mostly on how you want to use this data.
First of all, I don't really see the benefit of option 3 over option 2. I think separating the special tables in another schema will make your application harder to maintain, especially if later on commonalities are found between 'special values'.
As another option I would say:
- Store the special values in an XML fragment (or blob). Most databases have ability to query on XML structures these days, so without the need for many extra tables, you would keep your flexibility for a small performance hit.
If you put all the special values in one table, you get a very sparse table. Most normal DBMSes cannot handle this very well, but there are some implementations that specialize in this. You could benefit from that.
Do you often need to query the key-value pairs? if you basically access that table through it's entry_id, I think having a key-value table is not a bad design. An extra index on the kay column might even help you when you do need to query for special values. If you build an application layer on top of your database, the key-value table will map on a Map or Hash structure, which can also easily be used.
It also depends on the different types of values you want to store. If there are many different types, that need to be easily accessed (instead of being serialized/deserialized to XML/Character-String) you might want to store the type in a separate column, but that will usually lead to a very complicated design.
Hope this helps (a little bit).
-Maarten
http://en.wikipedia.org/wiki/Entity-Attribute-Value_model
Suggest you read about the problems with entity value tables before deciding to use them.
Oracle can deal with sparsely filled tables quite well. I think you can use a similar approach as company salesforce uses. They use tables with a lot of columns, they create columns when needed. You can index those columns much better than an eav model.
So it is flexible but it performs better than an eav model.
Read: Ask Tom 1, Ask Tom 2, High Scalabilty and SalesForce.
The "Option 1" patterns is also called the "Universal Relation" At first look it seems like a short cut to not doing potentially difficult data modeling. It trades effortless data modeling for not being able to do simple select, update, delete without dramatically more effort than it would take on more usual looking data model with multiple tables.

Subtyping database tables

I hear a lot about subtyping tables when designing a database, and I'm fully aware of the theory behind them. However, I have never actually seen table subtyping in action. How can you create subtypes of tables? I am using MS Access, and I'm looking for a way of doing it in SQL as well as through the GUI (Access 2003).
Cheers!
An easy example would be to have a Person table with a primary key and some columns in that table. Now you can create another table called Student that has a foreign key to the person table (its supertype). Now the student table has some columns which the supertype doesn't have like GPA, Major, etc. But the name, last name and such would be in the parent table. You can always access the student name back in the Person table through the foreign key in the Student table.
Anyways, just remember the following:
The hierarchy depicts relationship between supertypes and subtypes
Supertypes has common attributes
Subtypes have uniques attributes
Subtypes of tables is a conceptual thing in EER diagrams. I haven't seen an RDBMS (excluding object-relational DBMSs) that supports it directly. They are usually implemented in either
A set of nullable columns for each property of the subtype in a single table
With a table for base type properties and some other tables with at most one row per base table that will contain subtype properties
The notion of table sub-types is useful when using an ORM mapper to produce class sub-type heirarchy that exactly models the domain.
A sub-type table will have both a Foreign Key back to its parent which is also the sub-types table's primary key.
Keep in mind that in designing a bound application, as with an Access application, subtypes impose a heavy cost in terms of joins.
For instance, if you have a supertype table with three subtype tables and you need to display all three in a single form at once (and you need to show not just the supertype date), you end up with a choice of using three outer joins and Nz(), or you need a UNION ALL of three mutually exclusive SELECT statements (one for each subtype). Neither of these will be editable.
I was going to paste some SQL from the first major app where I worked with super/subtype tables, but looking at it, the SQL is so complicated it would just confuse people. That's not so much because my app was complicated, but it's because the nature of the problem is complex -- presenting the full set of data to the user, both super- and subtypes, is by its very nature complex. My conclusion from working with it was that I'd have been better off with only one subtype table.
That's not to say it's not useful in some circumstances, just that Access's bound forms don't necessarily make it easy to present this data to the user.
I have a similar problem I've been working on.
While looking for a repeatable pattern, I wanted to make sure I didn't abandon referential integrity, which meant that I wouldn't use a (TABLE_NAME, PK_ID) solution.
I finally settled on:
Base Type Table: CUSTOMER
Sub Type Tables: PERSON, BUSINESS, GOVT_ENTITY
I put nullable PRERSON_ID, BUSINESS_ID and GOVT_ENTITY_ID fields in CUSTOMER, with foreign keys on each, and a check constraint that only one is not null. It's easy to add new sub types, just need to add the nullable foreign key and modify the check constraint.