Have a table for every basic type or merge them into one table - SQL

I have over 100 basic types in my project. Assume that all of them contain just an Id and a Title, so which of these approaches is better to use:
Approach 1: create a separate table for each of them
Approach 2: create one table for all of them and use another field as a discriminator
I am using MSSQL Server with the Entity Framework Code-First approach. I actually cannot decide which approach I should choose.
I think the question is self-explanatory, but let me know if you need more details.
UPDATE 1: Please do not refer me to this question. I have already checked it; it wasn't that helpful.
UPDATE 2: Many of these tables have many relations to other tables, but some of them won't be used that much.

100 types that inherit from an Id/Title base type, mapped with EF TPH (so the database will have one table with a discriminator, and programmers will still have 100 types).
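On the database side, that TPH mapping ends up as a single table along the lines of the sketch below (the table name is illustrative; EF Code-First generates the Discriminator column by convention, so the exact DDL is only indicative):

CREATE TABLE BasicTypes
(
    Id            INT IDENTITY(1,1) PRIMARY KEY,
    Title         NVARCHAR(200) NOT NULL,
    Discriminator NVARCHAR(128) NOT NULL   -- one value per CLR type, e.g. 'Color', 'Country', ...
);

-- Each "basic type" is then just a filtered slice of this one table:
-- SELECT Id, Title FROM BasicTypes WHERE Discriminator = 'Color';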

Approach 1 will keep relational integrity and clean navigation properties for your models.
Your IDE will also help you complete the right model names.
A tip: create an interface for all of these types in order to reuse UI controls.
Edited
If you can find a business name for this table, like customer_data, then use a single table. If the name is purely technical, like master_tables, split it into fully semantic classes.


How to join a table within a user-defined function whose name is provided as a parameter?

Context
I have three tables in my SQL Server database: 1) School, 2) College, 3) University.
Then I have another table: Tags.
Each of the three tables (School, College, University) can have Tags associated with it. For that purpose I have three association tables: SchoolTags, CollegeTags, UniversityTags.
Problem
I am trying to create a user-defined function that will take the name of an association table as a parameter (i.e. 'SchoolTags') and the Id of the entity (school/college/university), and will return a list of tags associated with that entityId.
The issue I am having is that I have to join Tags with a table whose name comes in as a parameter. For that I would be creating a dynamic query, and we cannot run dynamic queries in SQL Server user-defined functions.
Question
Any idea how that can be achieved?
Note: I want separate association tables as I have created them and do not want to convert them into a generic association table, and I do not want to add if-else logic based on table names in my function, so that if a new association table is created, I do not need to update my function.
I am using Microsoft SQL Server.
Whatever language you are using, you would probably use an IF:
IF @table = 'school'
BEGIN
    . . .
END
ELSE IF @table = 'college'
BEGIN
    . . .
END
The exact syntax depends on the scripting language for the database you are using.
What you desire is impossible. You cannot pass a table name as a parameter to a UDF and then use dynamic SQL inside the UDF to build and execute a statement specific to the table passed as the argument. You already know that you have no choice but to use if-else statements in your UDF to achieve your goal - it is your pipe-dream of "never having to update (or verify) your code when the schema changes" (yes - I rephrased it to make your issue more obvious) that is the problem.
There are likely to be other ways of implementing some useful functionality, but I suggest that you are thinking too far ahead and trying to implement generic functions without a clear purpose. That is a very difficult and trouble-prone path that requires sophisticated T-SQL skills.
And to reiterate the prior responses: you have a schema problem. You purposely created three different entities, and now you want a common function to use with any of them. So before you spend much time on this particular aspect, you should take some time to think carefully about how you intend to use (i.e., write queries against) these tables. If you find yourself frequently using unions to combine these entities into a common resultset, then you might have a mismatch between your actual business and your model (schema) of it.
Consider normalizing your database into related, logical groupings: one EducationInstitution table and one JoinEducTags table. Those tables sound like they share the same structure and differ only in type, so they should be stored in one table with a Type field distinguishing School, College, University, etc.
Then add the necessary constraints and primary/foreign keys for the one-to-many relationships across all three sets:
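A minimal DDL sketch of that layout (column names and data types are illustrative, not from the original answer; it assumes the existing Tags table has an Id key column):

CREATE TABLE EducationInstitution
(
    InstitutionId INT IDENTITY(1,1) PRIMARY KEY,
    Name          NVARCHAR(200) NOT NULL,
    Type          VARCHAR(20)   NOT NULL
        CHECK (Type IN ('School', 'College', 'University'))
);

CREATE TABLE JoinEducTags
(
    InstitutionId INT NOT NULL REFERENCES EducationInstitution (InstitutionId),
    TagId         INT NOT NULL REFERENCES Tags (Id),   -- Id column name is assumed
    PRIMARY KEY (InstitutionId, TagId)
);
GO

-- The function can now take plain value parameters instead of a table name:
CREATE FUNCTION dbo.GetInstitutionTags (@InstitutionId INT)
RETURNS TABLE
AS
RETURN
(
    SELECT t.*
    FROM   Tags AS t
    JOIN   JoinEducTags AS j ON j.TagId = t.Id
    WHERE  j.InstitutionId = @InstitutionId
);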
You never want to keep restructuring your schema (i.e., adding tables) for each new type. With this approach, your user-defined function only needs to receive value parameters, not identifiers such as table names to be run through dynamic querying. Finally, this approach scales better with efficient storage, and as you will see, normalization saves you from complex querying.

SQL vs NoSQL for data that will be presented to a user after multiple filters have been added

I am about to embark on a project for work that is very outside my normal scope of duties. As a SQL DBA, my initial inclination was to approach the project using a SQL database but the more I learn about NoSQL, the more I believe that it might be the better option. I was hoping that I could use this question to describe the project at a high level to get some feedback on the pros and cons of using each option.
The project is relatively straightforward. I have a set of objects that have various attributes. Some of these attributes are common to all objects whereas some are common only to a subset of the objects. What I am tasked with building is a service where the user chooses a series of filters that are based on the attributes of an object and then is returned a list of objects that matches all^ of the filters. When the user selects a filter, he or she may be filtering on a common or subset attribute but that is abstracted on the front end.
^ There is a chance, depending on user feedback, that the list of objects may match only some of the filters and the quality of the match will be displayed to the user through a score that indicates how many of the criteria were matched.
After watching this talk by Martin Fowler (http://www.youtube.com/watch?v=qI_g07C_Q5I), it would seem that a document-style NoSQL database should suit my needs, but given that I have no experience with this approach, it is also possible that I am missing something obvious.
Some additional information - The database will initially have about 5,000 objects with each object containing 10 to 50 attributes but the number of objects will definitely grow over time and the number of attributes could grow depending on user feedback. In addition, I am hoping to have the ability to make rapid changes to the product as I get user feedback so flexibility is very important.
Any feedback would be very much appreciated and I would be happy to provide more information if I have left anything critical out of my discussion. Thanks.
This problem can be solved by using two separate pieces of technology. The first is a relatively well-designed database schema on a modern RDBMS. By modeling the application using the usual principles of normalization, you'll get really good response out of storage for individual CRUD statements.
Searching this schema, as you've surmised, is going to be a nightmare at scale. Don't do it. Instead look into using Solr/Lucene as your full text search engine. Solr's support for dynamic fields means you can add new properties to your documents/objects on the fly and immediately have the ability to search inside your data if you have designed your Solr schema correctly.
I'm not an expert in NoSQL, so I will not be advocating it. However, I have a few points that can help you address your questions regarding the relational database structure.
The first thing I see right away is that you are talking about inheritance (at least conceptually). Your objects inherit from each other, so derived objects have additional attributes. Say you are adding a new type of object: the first thing you need to do (conceptually) is find a base/super (parent) object type for it that has a subset of the attributes, and then add on top of them (extending the base object type).
Once you get used to thinking that way, the next step is inheritance mapping patterns for relational databases. I'll borrow Martin Fowler's terms to describe them here.
You can hold the inheritance chain in the database in one of three ways (a DDL sketch of each follows the list):
1 - Single table inheritance: the whole inheritance chain lives in one table, so all new types of objects go into the same table.
Advantages: your search query has only one table to hit, which is typically faster than a join, for example.
Disadvantages: the table grows faster than with option 2, for example; you have to add a type column that says what type of object each row is; some columns are empty in rows that belong to other object types.
2 - Concrete table inheritance: Separate table for each new type of object.
Advantages: if a search affects only one type, you search only one table at a time; each table grows more slowly than in option 1, for example.
Disadvantages: you need to union queries when searching several types at the same time.
3 - Class table inheritance: one table for the base object type with its attributes only, plus an additional table with the additional attributes for each child object type. The child tables refer to the base table with PK/FK relations.
Advantages: all types are present in the base table, so it is easy to search them all together on the common attributes.
Disadvantages: the base table grows fast because it holds part of every child row too; you need joins to search all object types with all their attributes.
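A minimal DDL sketch of the three options, using a hypothetical Vehicle/Car/Truck example (every name here is illustrative, not from the question):

-- 1. Single table inheritance: one table, a type column, nullable type-specific columns.
CREATE TABLE Vehicle_STI
(
    Id          INT IDENTITY(1,1) PRIMARY KEY,
    VehicleType VARCHAR(20)   NOT NULL,  -- discriminator: 'Car', 'Truck', ...
    Name        NVARCHAR(100) NOT NULL,  -- common attribute
    TrunkSize   INT NULL,                -- only meaningful for cars
    PayloadKg   INT NULL                 -- only meaningful for trucks
);

-- 2. Concrete table inheritance: one self-contained table per type.
CREATE TABLE Car   (Id INT PRIMARY KEY, Name NVARCHAR(100) NOT NULL, TrunkSize INT NOT NULL);
CREATE TABLE Truck (Id INT PRIMARY KEY, Name NVARCHAR(100) NOT NULL, PayloadKg INT NOT NULL);

-- 3. Class table inheritance: a base table plus one child table per type, sharing the primary key.
CREATE TABLE VehicleBase (Id INT IDENTITY(1,1) PRIMARY KEY, Name NVARCHAR(100) NOT NULL);
CREATE TABLE CarDetail   (Id INT PRIMARY KEY REFERENCES VehicleBase (Id), TrunkSize INT NOT NULL);
CREATE TABLE TruckDetail (Id INT PRIMARY KEY REFERENCES VehicleBase (Id), PayloadKg INT NOT NULL);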
Which one to choose?
It's a trade-off, obviously. If you expect many new object types to be added, I would go with Concrete table inheritance, which gives reasonable query and scaling options. Class table inheritance does not seem very friendly to fast queries and scalability. Single table inheritance works better with a small number of types.
Your call, my friend!
May as well make this an answer. I should comment that I'm not strong in NoSQL, so I tend to lean towards SQL.
I'd do this as a three-table set. You will see it referred to on the web as entity-attribute-value logic... it's a way of handling multiple dynamic attributes for items. Let's say you have a bunch of products and each one has a few attributes.
Prd 1 - a,b,c
Prd 2 - a,d,e,f
Prd 3 - a,b,d,g
Prd 4 - a,c,d,e,f
So here are 4 products and 6 attributes... the same theory works for hundreds of products and thousands of attributes. The standard way of holding this in one table requires the product info along with 6 columns to store the data (in this setup at least a third of them are null). Adding a new attribute means altering the table to add another column, then either writing a script to populate existing rows or leaving the new column null for all of them. Not the most fun; it can be a headache.
The alternative to this is a name-value-pair setup. You want a 'header' table to hold the values common amongst your products (like name or price... things that all products always have). In our example above, you will notice that attribute 'a' is used on every record... this means attribute 'a' can be part of the header table as well. We'll call the key column here 'header_id'.
The second table is a reference table that simply stores the attributes that can be assigned to each product and assigns an ID to each. We'll call the table 'attribute' with 'attr_id' as its key. Rather straightforward; each attribute above will be one row.
Quick example:
attr_id, attribute_name, notes
1,b, the length of time the product takes to install
2,c, spare part required
etc...
It's just a list of all of your attributes and what that attribute means. In the future, you will be adding a row to this table to open up a new attribute for each header.
The final table is a mapping table that actually holds the info: your product id, the attribute id, and then the value. It is normally called the detail table:
prd1, b, 5 mins
prd1, c, needs spare jack
prd2, d, 'misc text'
prd3, b, 15 mins
See how the data is stored as product key, value label, value? Any future product added can have any combination of any attributes stored in this table. Adding new attributes is adding a new line to the attribute table and then populating the details table as needed.
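Pulling the three tables together, a minimal DDL sketch (data types are illustrative; attribute 'a' becomes a regular header column as described above):

CREATE TABLE header
(
    header_id INT IDENTITY(1,1) PRIMARY KEY,
    name      NVARCHAR(100) NOT NULL,
    attr_a    NVARCHAR(100) NULL      -- the attribute shared by every product
);

CREATE TABLE attribute
(
    attr_id        INT IDENTITY(1,1) PRIMARY KEY,
    attribute_name NVARCHAR(100) NOT NULL,
    notes          NVARCHAR(400) NULL
);

CREATE TABLE detail
(
    header_id INT NOT NULL REFERENCES header (header_id),
    attr_id   INT NOT NULL REFERENCES attribute (attr_id),
    value     NVARCHAR(400) NULL,
    PRIMARY KEY (header_id, attr_id)
);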
I believe there is a Wikipedia article on it too... http://en.wikipedia.org/wiki/Entity-attribute-value_model
After this, it's simply figuring out the best methodology to pivot out your data (I'd recommend Postgres as an open-source DB option here).

Assistance in querying an Entity-Attribute-Value model ("dynamic attributes") in SQL Server 2012

I need assistance querying a model that contains one table for a specific object (i.e., Products) and one table for its dynamic attributes.
Let's say that I can store a Chocolate with attributes such as Price, Color, Weight and also a Car with attributes such as Engine, Gears, Color.
In my example I have a table called Products with the following columns:
Id (Int),
Name (NVarchar)
I have another table called dynamicAttributes with the following columns:
Id (int) -- of the attribute
ProductId (int) -- of the specific product
AttributeType (int) -- enum with the following values ("Color", "Price", "Height", "Width", ...)
StringValue -- of the product
IntValue -- of the product
DoubleValue -- of the product
BooleanValue -- of the product
I get from the client a list of attribute codes and a list of their values.
I can get the value type (i.e., boolean, string, int) for each attribute.
What are my best options for querying this model from my app?
Dynamic SQL only? Using the PIVOT keyword?
As others have noted, doing a lot of PIVOT queries is pretty inefficient and it's laborious to write and to debug SQL queries that use PIVOT.
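For illustration, a PIVOT against the dynamicAttributes table from the question might look like the sketch below. The attribute code is an assumed value of the AttributeType enum, and each typed value column (StringValue, IntValue, ...) needs its own PIVOT or conditional-aggregation pass, which is part of what makes this laborious:

SELECT ProductId,
       [1] AS Color   -- assuming AttributeType 1 = Color
FROM (
    SELECT ProductId, AttributeType, StringValue
    FROM   dynamicAttributes
) AS src
PIVOT (
    MAX(StringValue) FOR AttributeType IN ([1])
) AS pvt;
-- Price, Height, Width live in IntValue/DoubleValue, so they would need separate passes.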
An alternative is to fetch the data back from the database in the way it's stored, i.e. in multiple rows. Then write code in your database access layer to massage the rows into a single object instance, adding one attribute to your object per database row. This is called the Table Module pattern in Martin Fowler's awesome book Patterns of Enterprise Application Architecture.
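A minimal sketch of that "read it the way it is stored" query against the tables from the question; the data access layer then folds the returned rows into a single object, one attribute per row:

DECLARE @ProductId INT = 42;   -- example product id

SELECT p.Id,
       p.Name,
       a.AttributeType,
       a.StringValue,
       a.IntValue,
       a.DoubleValue,
       a.BooleanValue
FROM   Products AS p
JOIN   dynamicAttributes AS a ON a.ProductId = p.Id
WHERE  p.Id = @ProductId;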
If you invest some time writing DBAL code in a reusable fashion, you may be able to make it pretty easy for subsequent code in your app to read and save objects stored in an EAV table.
But yeah, I agree with other commenters. I'm generally against using the EAV design. It takes a lot of work to write code to compensate for the ways EAV breaks database conventions. I would think you have better things to do with your time!
For alternatives, see:
My answer to How to design a product table for many kinds of product where each product has many parameters
My presentation Practical Object Oriented Models In SQL
My book SQL Antipatterns Volume 1: Avoiding the Pitfalls of Database Programming

How to model a mutually exclusive relationship in SQL Server

I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am restricted to the creation of new tables and code. If I need to alter the existing structure, I think my client may reject the proposal... although if it's the only way to get it right, that is what I will have to do.
I have an Item table that can be linked to any number of tables, and these tables may increase over time. The Item can only be linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Examples of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration.
For instance, a Poster may be allocated to a Person, Office or Building. If a Conference Room table is added in the future, a Poster may be allocated to that as well.
My initial thoughts are:
Item
{
ID,
Name
}
LinkedItem
{
ItemID,
LinkedToTableName,
LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!
It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is, that all of these entities - Person, Vehicle, whatever - are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
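A sketch of that discriminator as a lookup value rather than a table name (all names here are illustrative):

CREATE TABLE EntityType
(
    EntityTypeId INT PRIMARY KEY,
    Description  NVARCHAR(50) NOT NULL   -- 'Person', 'Vehicle', ... a description, not a table name
);

CREATE TABLE BaseEntity
(
    EntityId     INT IDENTITY(1,1) PRIMARY KEY,
    Name         NVARCHAR(100) NOT NULL,
    EntityTypeId INT NOT NULL REFERENCES EntityType (EntityTypeId)   -- says which sub-type table is "expected"
);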
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.
Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.
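A minimal DDL sketch of that super-type/sub-type layout (column names and types are illustrative; if the existing Item table cannot be altered, the EntityID link could live in a new table instead):

CREATE TABLE Entity
(
    EntityID INT IDENTITY(1,1) PRIMARY KEY,
    Name     NVARCHAR(100) NOT NULL
);

-- EntityID is both the PK and an FK in every sub-type table.
CREATE TABLE Person  (EntityID INT PRIMARY KEY REFERENCES Entity (EntityID) /* person-specific columns ... */);
CREATE TABLE Vehicle (EntityID INT PRIMARY KEY REFERENCES Entity (EntityID) /* vehicle-specific columns ... */);

-- The Item links to its owning Entity: one owner per item, many items per owner.
CREATE TABLE Item
(
    ID       INT IDENTITY(1,1) PRIMARY KEY,
    Name     NVARCHAR(100) NOT NULL,
    EntityID INT NOT NULL REFERENCES Entity (EntityID)
);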
Your link table is OK.
The trouble you will have is that you will need to generate dynamic SQL at runtime; parameterized SQL does not typically allow the objects in the FROM list to be parameters.
If you want to avoid this, you may be able to denormalize a little - say, by creating a table to hold the id (assuming the ids are unique across the other tables), the type_id representing which table is the source, and a generated description - e.g. the name value from the initial record.
You would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries - and then resort to your dynamic queries when needed at runtime.
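A rough sketch of that denormalized lookup and one of the triggers that keeps it current (every name is illustrative, it assumes the source Person table exposes ID and Name columns, and one such trigger - plus a DELETE trigger - would be needed per source table):

CREATE TABLE LinkTarget
(
    SourceId    INT NOT NULL PRIMARY KEY,   -- relies on ids being unique across the source tables
    TypeId      INT NOT NULL,               -- which table the row came from, e.g. 1 = Person
    Description NVARCHAR(200) NOT NULL      -- generated label, e.g. the person's name
);
GO

CREATE TRIGGER trg_Person_LinkTarget
ON Person
AFTER INSERT, UPDATE
AS
BEGIN
    -- keep the denormalized row in step with the base row
    MERGE LinkTarget AS t
    USING (SELECT ID, Name FROM inserted) AS s
        ON t.SourceId = s.ID
    WHEN MATCHED THEN
        UPDATE SET Description = s.Name
    WHEN NOT MATCHED THEN
        INSERT (SourceId, TypeId, Description) VALUES (s.ID, 1, s.Name);
END;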

How to design tables that map the same entity?

Given the two following tables :
Gallery
id | title | desc
Site
id | title | desc | url
I have a tag system which can apply to both the Gallery and Site tables.
I'm wondering if I should do:
TagMap
tagId | entityId | applyTo
Where applyTo could be 'site' or 'gallery', or use separate tables like the following:
TagGalleryMap
tagId | galleryId
and
TagSiteMap
tagId | siteId
What are your thoughts about this?
Having two tables, TagGalleryMap and TagSiteMap, allows you to enforce foreign keys referencing the Gallery and Site tables. The one table solution doesn't allow this. Of course if you add a third master table (say Slideshow) you would need to add another matching intersection table, TagSlideshowMap.
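A minimal sketch of the two-table variant with its foreign keys (data types and the Tag table's key column are assumed; the single TagMap table could not declare a foreign key for entityId because the referenced table differs per row):

CREATE TABLE TagGalleryMap
(
    tagId     INT NOT NULL REFERENCES Tag (id),
    galleryId INT NOT NULL REFERENCES Gallery (id),
    PRIMARY KEY (tagId, galleryId)
);

CREATE TABLE TagSiteMap
(
    tagId  INT NOT NULL REFERENCES Tag (id),
    siteId INT NOT NULL REFERENCES Site (id),
    PRIMARY KEY (tagId, siteId)
);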
Both solutions are valid, so it comes down to their respective advantages and disadvantages:
Method 1
(+) only one table to maintain
(-) the key is more complex (two fields instead of one)
(-) no out of the box foreign key checking (can be implemented by triggers though)
(-) no cascading deletes (can be implemented by triggers though)
(+) queries that do not care about one type or the other are easier to implement
(-) queries that do care about the type need additional predicates
(+) the mapping between the object and the table is straightforward; inheritance can be implemented in the load and save routines
(+) additional types can be included without modifying the database
(+) navigation from the tags to the objects is easier than with the second method, as there is only one table you have to look in.
Method 2
(-) two tables to maintain
(+) simple key that has a clear relationship with its parent
(+) foreign key checking is simple
(+) cascading deletes work directly
(-) queries that do not care about one type or the other more difficult as they have to take multiple tables into account
(+) queries that do care about the type are easier
(-) the mapping between the object and the tables is not straightforward; inherited objects need to pull their data from different tables
(-) additional types require additional tables and therefore modifications of the database
(-) navigation from the tags to the objects is more difficult than with the first method, as there are multiple tables to look in (this can be alleviated with views, as sketched below, but additional tables mean that you have to change the view too).
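As an illustration of that view, a sketch that folds both map tables back into one read-only surface (the view name and the applyTo labels are illustrative):

CREATE VIEW TagMapAll AS
SELECT tagId, galleryId AS entityId, 'gallery' AS applyTo FROM TagGalleryMap
UNION ALL
SELECT tagId, siteId    AS entityId, 'site'    AS applyTo FROM TagSiteMap;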
Edit:
Method 2 should work with all ORM mappers, as it is a simple parent-child relationship.
Method 1 needs to be supported by the ORM mapper. Some ORM mappers support such constructs via the notion of inheritance (e.g. Linq to SQL, .NET Entity Framework). If inheritance is supported, you have a base type (the table) and inheriting types. Each inheriting type needs the information that discriminates it from the other types, based on the data in the table. This discriminator is stored with the inheriting type and used to create the "filtered" queries needed. Where inheritance is provided, eager loading is usually also supported for these types (again, e.g., the two named ORMs).
Note: I used Linq to SQL and the .NET Entity Framework as an example because I know them best, but I believe that other ORM have similar concepts.
I don't see anything terribly ugly with the former idea - (tagId, entityId, applyTo). It would also generalise to further domain objects to which you might wish to attach tags.
I don't know off-hand whether using an applyTo int (or enum) column might be faster, perhaps with a TagMapLabel table that maps the applyTo number to a string ('gallery', 'site', etc.) for rendering purposes.
frank
Why don't you make one table with these fields:
id | title | desc | url | type
and in the type column, put either "Gallery" or "Site".
Then you could create your tag table like this:
TagMap
tagId | entityId | applyTo