Variable amount of sets as SQL database tables - sql

More of a question concerning the database model for a specific problem. The problem is as follows:
I have a number of objects that make up the rows in a fixed table, they are all distinct (of course). One would like to create sets that contain a variable amount of these stored objects. These sets would be user-defined, therefore no hard-coding. Each set will be characterized by a number.
My question is: what advice can you experienced SQL programmers give me to implement such a feature. My most direct approach would be to create a table for each such set using table-variables or temporary tables. Finally, an already present table that contains the names of the sets (as a way to let the user know what sets are currently present in the database).
If not efficient, what direction would I be looking in to solve this?
Thanks.

Table variables and temporary tables are short-lived, narrow in scope and probably not what you want to use for this. One table for each Set is also not a solution I would choose.
By the sound of it you need three tables. One for Objects, one for Sets and one for the relationship between Objects and Sets.
Something like this (using SQL Server syntax to describe the tables).
create table [Object]
(
ObjectID int identity primary key,
Name varchar(50)
-- more columns here necessary for your object.
)
go
create table [Set]
(
SetID int identity primary key,
Name varchar(50)
)
go
create table [SetObject]
(
SetID int references [Set](SetID),
ObjectID int references [Object](ObjectID),
primary key (SetID, ObjectID)
)
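To see how the three tables would be used, here is a minimal sketch that runs against SQLite through Python's sqlite3 module instead of SQL Server (so IDENTITY becomes INTEGER PRIMARY KEY). The sample rows and the members helper are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE Object    (ObjectID INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE "Set"     (SetID    INTEGER PRIMARY KEY, Name TEXT);
CREATE TABLE SetObject (
    SetID    INTEGER REFERENCES "Set"(SetID),
    ObjectID INTEGER REFERENCES Object(ObjectID),
    PRIMARY KEY (SetID, ObjectID)  -- an object appears at most once per set
);
-- Hypothetical sample data
INSERT INTO Object (ObjectID, Name) VALUES (1, 'a'), (2, 'b'), (3, 'c');
INSERT INTO "Set" (SetID, Name) VALUES (1, 'first'), (2, 'second');
INSERT INTO SetObject (SetID, ObjectID) VALUES (1, 1), (1, 2), (2, 2), (2, 3);
""")

def members(set_id):
    """Return the names of all objects in the given set."""
    rows = conn.execute(
        "SELECT o.Name FROM SetObject so "
        "JOIN Object o ON o.ObjectID = so.ObjectID "
        "WHERE so.SetID = ? ORDER BY o.ObjectID",
        (set_id,),
    )
    return [r[0] for r in rows]
```

Creating a new user-defined set is then one INSERT into Set plus one row in SetObject per member, with no schema changes.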
Here is the m:m relation as a pretty picture:

Related

How to design entity tables for entities with multiple names

I want to create a table structure to store customers and I am facing a challenge: for each customer I can have multiple names, one being the primary one and the others being the alternative names.
The initial take on the tables looks like this:
CREATE TABLE dbo.Customer (
CustomerId INT IDENTITY(1,1) NOT NULL --PK
-- other fields below
)
CREATE TABLE dbo.CustomerName (
CustomerNameId INT IDENTITY(1,1) NOT NULL -- PK
,CustomerId INT -- FK to Customer
,CustomerName VARCHAR(30)
,IsPrimaryName BIT)
Though, the name of the customer is part of the Customer entity and I feel that it belongs to the Customer table.
Is there a better design for this situation?
Thank you
Personally, I would keep the Primary name in the Customer table and create an "AlternateNames" table with a zero-to-many relationship to Customer.
This is because presumably most of the time when you are returning customer data, you are only going to be interested in returning the Primary Name. And probably the main (if not only) reason you want the alternate names is for looking up customers when an alternate name has been supplied.
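A rough sketch of that layout, using SQLite via Python's sqlite3 rather than SQL Server. The column names PrimaryName and Name, the sample customers and the find_customer_ids helper are assumptions for illustration only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (
    CustomerId  INTEGER PRIMARY KEY,
    PrimaryName TEXT NOT NULL          -- the name you normally return
);
CREATE TABLE AlternateName (
    AlternateNameId INTEGER PRIMARY KEY,
    CustomerId      INTEGER NOT NULL REFERENCES Customer(CustomerId),
    Name            TEXT NOT NULL      -- zero or more per customer
);
-- Hypothetical sample data
INSERT INTO Customer VALUES (1, 'Acme Corporation'), (2, 'Globex');
INSERT INTO AlternateName (CustomerId, Name) VALUES (1, 'Acme'), (1, 'ACME Corp');
""")

def find_customer_ids(name):
    """Look a customer up by either its primary or an alternate name."""
    rows = conn.execute(
        """
        SELECT CustomerId FROM Customer WHERE PrimaryName = ?
        UNION
        SELECT CustomerId FROM AlternateName WHERE Name = ?
        ORDER BY CustomerId
        """,
        (name, name),
    )
    return [r[0] for r in rows]
```

Ordinary reads only touch Customer; the second table is consulted only when a lookup has to consider alternate names.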
Unfortunately, this is too long for a comment.
Before figuring this out, more information is needed.
Is additional information needed for names? For instance, language or title or date created?
Are the names unique? Is the uniqueness within a customer or over all names?
Are the primary names unique?
Does every customer have to have a primary name?
How often does the primary name change to an alternate name? (As opposed to just having the name updated.)
When querying the data, will you know if the name is a primary or alternate name? (Or do they all need to be compared?)
Depending on the answer to this question, the appropriate data structure can have some tricky nuances. For instance, if you have a flag to identify the primary name, it can be tricky to ensure that exactly one row has this value set -- particularly when updating rows.
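One way to get part of the way there, if you do keep the flag, is a unique index that only covers the flagged rows. The sketch below uses SQLite via Python's sqlite3 (SQL Server has filtered indexes that play a similar role). The index name and the swap_primary helper are hypothetical, and note that this only guarantees at most one primary name per customer, not exactly one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CustomerName (
    CustomerNameId INTEGER PRIMARY KEY,
    CustomerId     INTEGER NOT NULL,
    CustomerName   TEXT NOT NULL,
    IsPrimaryName  INTEGER NOT NULL DEFAULT 0 CHECK (IsPrimaryName IN (0, 1))
);
-- Partial unique index: at most one flagged row per customer
CREATE UNIQUE INDEX UX_CustomerName_Primary
    ON CustomerName (CustomerId) WHERE IsPrimaryName = 1;
INSERT INTO CustomerName (CustomerId, CustomerName, IsPrimaryName)
VALUES (1, 'Acme Corporation', 1), (1, 'Acme', 0);
""")

def try_second_primary():
    """Attempt to flag a second primary name; return True if it was accepted."""
    try:
        conn.execute(
            "INSERT INTO CustomerName (CustomerId, CustomerName, IsPrimaryName) "
            "VALUES (1, 'ACME Corp', 1)"
        )
        return True
    except sqlite3.IntegrityError:
        return False

def swap_primary(customer_id, new_primary_id):
    """Move the flag in one transaction: clear the old row first, then set the new."""
    with conn:
        conn.execute(
            "UPDATE CustomerName SET IsPrimaryName = 0 "
            "WHERE CustomerId = ? AND IsPrimaryName = 1",
            (customer_id,),
        )
        conn.execute(
            "UPDATE CustomerName SET IsPrimaryName = 1 "
            "WHERE CustomerNameId = ? AND CustomerId = ?",
            (new_primary_id, customer_id),
        )
```

The order of the two updates matters: setting the new flag before clearing the old one would violate the index, which is exactly the update trickiness mentioned above.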
Note: If you update the question with the answers, I'll delete this.

Problems on having a field that will be null very often on a table in SQL Server

I have a column that sometimes will be null. This column is also a foreign key, so I want to know if I'll have problems with performance or with data consistency if this column will have weight
I know it's a foolish question but I want to be sure.
There is not necessarily a problem with this, other than that it is likely an indication that you might have a poorly normalized design. There might be performance implications due to the way indexes are structured and the sparseness of the column with nulls, but without knowing your structure or intended querying scenarios any conclusions one might draw would be pure speculation.
A better solution might be a shared primary key, where table A has a primary key and there are zero or one records in B with the same primary key.
If table A can have one or zero B, but more than one A can refer to B, then what you have is a one to many relationship. This can be represented as Pieter laid out in his answer. This allows multiple A records to refer to the same B, and in turn each B may optionally refer to an A.
So you see there are two optional structures to address this problem, and choosing between them is not guesswork. There is a distinct rationale for choosing one or the other, but it depends on the nature of the relationships you are modelling.
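For the shared primary key variant, a minimal sketch might look like this. It uses SQLite via Python's sqlite3 as a stand-in; A and B are the table names from the text above, while the Name and Extra columns and the sample rows are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE A (
    ID   INTEGER PRIMARY KEY,
    Name TEXT NOT NULL
);
-- B reuses A's key: the primary key is also the foreign key,
-- so each A row can have zero or one B rows and no column is ever NULL.
CREATE TABLE B (
    ID    INTEGER PRIMARY KEY REFERENCES A(ID),
    Extra TEXT NOT NULL
);
-- Hypothetical sample data
INSERT INTO A VALUES (1, 'has B'), (2, 'no B');
INSERT INTO B VALUES (1, 'details');
""")

# The optional part shows up as NULL only in query results, via LEFT JOIN.
rows = conn.execute(
    "SELECT A.ID, A.Name, B.Extra FROM A LEFT JOIN B ON B.ID = A.ID ORDER BY A.ID"
).fetchall()
```

Here the absence of a B row, rather than a nullable foreign key column in A, expresses "no related record".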
Instead of this design:
create table Master (
ID int identity not null primary key,
DetailID int null references Detail(ID)
)
go
create table Detail (
ID int identity not null primary key
)
go
consider this instead
create table Master (
ID int identity not null primary key
)
go
create table Detail (
ID int identity not null primary key,
MasterID int not null references Master(ID)
)
go
Now the Foreign Key is never null, rather the existence (or not) of the Detail record indicates whether it exists.
If a Detail can exist for multiple records, create a mapping table to manage the relationship.

Can I use a trigger to create a column?

As an alternative to anti-patterns like Entity-Attribute-Value or Key-Value Pair tables, is it possible to dynamically add columns to a data table via an INSERT trigger on a parameter table?
Here would be my tables:
CREATE TABLE [Parameters]
(
id int NOT NULL
IDENTITY(1,1)
PRIMARY KEY,
Parameter varchar(200) NOT NULL,
Type varchar(200) NOT NULL
)
GO
CREATE TABLE [Data]
(
id int NOT NULL
IDENTITY(1,1)
PRIMARY KEY,
SerialNumber int NOT NULL
)
GO
And the trigger would then be placed on the parameter table, triggered by new parameters being added:
CREATE TRIGGER [TRG_Data_Insert]
ON [Parameters]
FOR INSERT
AS BEGIN
-- The trigger takes the newly inserted parameter
-- record and ADDs a column to the data table, using
-- the parameter name as the column name, the data type
-- as the column data type and makes the new column
-- nullable.
END
GO
This would allow my data mining application to get a list of parameters to mine and have a place to store that data once it mines it. It would also allow a user to add new parameters to mine dynamically, without having to mess with SQL.
Is this possible? And if so, how would you go about doing it?
I think the idea of dynamically adding columns will be a ticking time bomb, just gradually creeping towards one of the SQL Server limits.
You will also be putting the database design in the hands of your users, leaving you at the mercy of their naming conventions and crazy ideas.
So while it is possible, is it better than an EAV table, which is at least obvious to the next developer to pick up your program?
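For what it's worth, here is a sketch of what "possible" could look like, done in application code against SQLite via Python's sqlite3, because SQLite triggers cannot run DDL. In SQL Server the equivalent step would presumably have to be dynamic SQL inside the trigger body, which is not shown here. The type whitelist, the name pattern and the add_parameter helper are all assumptions, and they exist precisely because of the naming and limits risks mentioned above.

```python
import re
import sqlite3

# Hypothetical whitelist: user-facing type name -> actual column type
ALLOWED_TYPES = {"int": "INTEGER", "float": "REAL", "text": "TEXT"}
# Hypothetical rule for acceptable column names
NAME_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]{0,62}")

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Parameters (
    id        INTEGER PRIMARY KEY,
    Parameter TEXT NOT NULL UNIQUE,
    Type      TEXT NOT NULL
);
CREATE TABLE Data (
    id           INTEGER PRIMARY KEY,
    SerialNumber INTEGER NOT NULL
);
""")

def add_parameter(name, type_name):
    """Record a parameter and add a matching nullable column to Data."""
    if not NAME_RE.fullmatch(name):
        raise ValueError(f"unsafe column name: {name!r}")
    if type_name not in ALLOWED_TYPES:
        raise ValueError(f"unsupported type: {type_name!r}")
    with conn:  # both statements commit together or roll back together
        conn.execute(
            "INSERT INTO Parameters (Parameter, Type) VALUES (?, ?)",
            (name, type_name),
        )
        # Identifiers cannot be bound as parameters, hence the validation above.
        conn.execute(
            f'ALTER TABLE Data ADD COLUMN "{name}" {ALLOWED_TYPES[type_name]}'
        )

def data_columns():
    """Return the current column names of the Data table."""
    return [row[1] for row in conn.execute("PRAGMA table_info(Data)")]
```

Even with the validation, every new parameter is a schema change, so the concerns about column limits and user-driven design still apply.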

How to represent many similar attributes of an entity in a database?

Let's say I'm building a website about cars. The car entity has a lot of enum-like attributes:
transmission (manual/automatic)
fuel (gasoline/diesel/bioethanol/electric)
body style (coupe/sedan/convertible/...)
air conditioning (none/simple/dual-zone)
exterior color (black/white/gray/blue/green/...)
interior color (black/white/gray/blue/green/...)
etc.
The list of these attributes is likely to change in the future. What is the optimal way to model them in the database? I can think of the following options but can't really decide:
use fields in the car table with enum values
hard to add more columns later, probably the fastest
use fields in the car table that are foreign keys referencing a lookup table
hard to add more columns later, somewhat slower
create separate tables for each of those attributes that store the possible values and another table to store the connection between the car and the attribute value
easy to add more possible values later, even slower, seems to be too complicated
Ideally, you would create a relational database. Each table in the DB should be represented by a class, as in Hibernate. You should make two tables for the car: one for the interior and one for the exterior. If you want to add extra features, you just add more columns.
Now here is a (very basic) EAV model:
DROP TABLE IF EXISTS example.zvalue CASCADE;
CREATE TABLE example.zvalue
( val_id SERIAL NOT NULL PRIMARY KEY
, zvalue varchar NOT NULL
, CONSTRAINT zval_alt UNIQUE (zvalue)
);
GRANT SELECT ON TABLE example.zvalue TO PUBLIC;
-- NOTE: example.zname(nam_id), referenced below, is not defined in this snippet.
DROP TABLE IF EXISTS example.tabcol CASCADE;
CREATE TABLE example.tabcol
( tabcol_id SERIAL NOT NULL PRIMARY KEY
, tab_id BIGINT NOT NULL REFERENCES example.zname(nam_id)
, col_id BIGINT NOT NULL REFERENCES example.zname(nam_id)
, type_id varchar NOT NULL
, CONSTRAINT tabcol_alt UNIQUE (tab_id,col_id)
);
GRANT SELECT ON TABLE example.tabcol TO PUBLIC;
DROP TABLE IF EXISTS example.entattval CASCADE;
CREATE TABLE example.entattval
( ent_id BIGINT NOT NULL
, tabcol_id BIGINT NOT NULL REFERENCES example.tabcol(tabcol_id)
, val_id BIGINT NOT NULL REFERENCES example.zvalue(val_id)
, PRIMARY KEY (ent_id, tabcol_id, val_id)
);
GRANT SELECT ON TABLE example.entattval TO PUBLIC;
BTW: this is tailored to support system catalogs; you might need a few changes.
This is really a duplicate of this dba.SE post:
https://dba.stackexchange.com/questions/27057/model-with-variable-number-of-properties-of-different-types
Use hstore, json, xml, an EAV pattern, ... see my answer on that post.
Depending upon the number of queries and the size of the database you could either:
1. Make wide tables
2. Make an attributes table and a car_attributes table where: cars -> car_attributes -> attributes
#1 will make faster, easier queries due to fewer joins, but #2 is more flexible.
It is up to the admin UI you need to support:
If there is an interface to manage, for example, the types of transmission, you should store them in a separate entity (your option 3).
If there is no such interface, the best would be to store them as enum-type values. When you need another one (for example 'semi-automatic' for the transmission), you add it only in the DB schema. As a matter of fact, this will be the easiest to support and the fastest to execute.
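The two cases can sit side by side in one table. The sketch below uses SQLite via Python's sqlite3 and treats fuel as a fixed, enum-like column (a CHECK constraint standing in for an enum type) and transmission as a managed lookup table. Which attribute goes which way, the model column and the add_car helper are arbitrary choices for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
-- Managed through an admin UI: values are rows, so adding one is an INSERT
CREATE TABLE transmission (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL UNIQUE
);
INSERT INTO transmission (name) VALUES ('manual'), ('automatic');

CREATE TABLE car (
    id    INTEGER PRIMARY KEY,
    model TEXT NOT NULL,
    -- Enum-like: values live in the schema, so adding one is a schema change
    fuel  TEXT NOT NULL
          CHECK (fuel IN ('gasoline', 'diesel', 'bioethanol', 'electric')),
    transmission_id INTEGER NOT NULL REFERENCES transmission(id)
);
""")

def add_car(model, fuel, transmission):
    """Insert a car, resolving the transmission name through the lookup table."""
    row = conn.execute(
        "SELECT id FROM transmission WHERE name = ?", (transmission,)
    ).fetchone()
    if row is None:
        raise ValueError(f"unknown transmission: {transmission!r}")
    with conn:
        conn.execute(
            "INSERT INTO car (model, fuel, transmission_id) VALUES (?, ?, ?)",
            (model, fuel, row[0]),
        )
```

Adding 'semi-automatic' is then a plain INSERT into transmission, while adding a new fuel means altering the table definition.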
I would create a table CarAttributes
with columns AttributeID, CarID, PropertyName, PropertyValue.
When the result set is returned, we save it in an IDictionary.
It will allow you to add as many rows as you need without adding new columns.
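A small sketch of that idea, with SQLite via Python's sqlite3 and a plain dict playing the role of the IDictionary. The UNIQUE constraint, the sample rows and the load_attributes helper are assumptions added for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE CarAttributes (
    AttributeID   INTEGER PRIMARY KEY,
    CarID         INTEGER NOT NULL,
    PropertyName  TEXT NOT NULL,
    PropertyValue TEXT NOT NULL,
    UNIQUE (CarID, PropertyName)   -- assumed: one value per property per car
);
-- Hypothetical sample data
INSERT INTO CarAttributes (CarID, PropertyName, PropertyValue) VALUES
    (1, 'transmission', 'manual'),
    (1, 'fuel', 'diesel'),
    (2, 'fuel', 'electric');
""")

def load_attributes(car_id):
    """Return all properties of one car as a name -> value dictionary."""
    rows = conn.execute(
        "SELECT PropertyName, PropertyValue FROM CarAttributes WHERE CarID = ?",
        (car_id,),
    )
    return dict(rows.fetchall())
```

Every value comes back as text here, so any typing or validation of the values has to happen in the application.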

Database Design - Optional Bit Fields

I'm building a database that will be used to store Questions and Answers. There are varying Question types that deal so far with a DateTime answer, Free Text, DropDownList, and some that link to other tables in the database. My design question is this: Some Question types have other boolean attributes that are unique to that type. Is it better to have the boolean column in the generic Questions table or create some sort of Flag table for each question type?
As an example, a DropDownList Question might have a boolean attribute to tell whether or not to display a TextBox when a value "Other" is selected, but a Free Text Question would have no use for this.
Thanks heaps
EDIT:
I guess it seems to boil down to this: is it better to store unused columns in a generic Questions table, or to extend out a table for each Question type and have lots of keys back to the Question table, using Views to access the data for the various Question types?
Strip out all the extra attributes from the base question table and have a field for the 'Question Type' and a set of tables for each question type. In your application code, based on the questions type retrieve the row from the particular question type table and use them.
An example:
Base Question Table: t_question <QuestionID, Question, QuestionType, QuestionTypeLink>
Let's say you have two question types: Comprehensive or Simple. Create two tables for each of them with schema: t_compflags <linkID, field1, field2...> and t_simpleflags <linkID, field1, field2...>.
Now in the base question table, QuestionType would take two values: Comprehensive or Simple. Based on that field it uses the QuestionTypeLink to refer the row in either of the tables.
EDIT:
You can't directly enforce a PK-FK constraint on these tables; you have to do that in application code. But if you would like to enforce that constraint, there is a dirty way of doing it. Instead of QuestionTypeLink, have two columns, CompQuestionTypeLink and SimpQuestionTypeLink, which allow nulls and reference the other two tables. But I personally think this is a bad design.
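For reference, the two-nullable-columns variant could be sketched like this, using SQLite via Python's sqlite3. The CHECK constraint requiring exactly one link to be set is an extra assumption, not part of the description above; it relies on SQLite treating IS NULL results as integers, so T-SQL would likely need CASE expressions instead. The field1 columns and the add_question helper are placeholders.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE t_compflags   (linkID INTEGER PRIMARY KEY, field1 INTEGER NOT NULL);
CREATE TABLE t_simpleflags (linkID INTEGER PRIMARY KEY, field1 INTEGER NOT NULL);
CREATE TABLE t_question (
    QuestionID INTEGER PRIMARY KEY,
    Question   TEXT NOT NULL,
    CompQuestionTypeLink INTEGER REFERENCES t_compflags(linkID),
    SimpQuestionTypeLink INTEGER REFERENCES t_simpleflags(linkID),
    -- Assumed rule: exactly one of the two links must be set
    CHECK ((CompQuestionTypeLink IS NULL) <> (SimpQuestionTypeLink IS NULL))
);
-- Hypothetical sample data
INSERT INTO t_compflags   VALUES (1, 1);
INSERT INTO t_simpleflags VALUES (1, 0);
""")

def add_question(text, comp_link=None, simp_link=None):
    """Insert a question pointing at exactly one of the two flag tables."""
    with conn:
        cur = conn.execute(
            "INSERT INTO t_question "
            "(Question, CompQuestionTypeLink, SimpQuestionTypeLink) "
            "VALUES (?, ?, ?)",
            (text, comp_link, simp_link),
        )
    return cur.lastrowid
```

The database now rejects dangling links, but every new question type costs another nullable column and a longer CHECK, which is part of why this feels like a bad design.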
This depends entirely on how much normalisation you want to do and how many columns you're talking about.
If you are expecting quite a number then you should have a 1:1 table relationship simply to extend that question type. Something like
Create Table QuestionType_DropDownList
(QuestionId int primary key, -- assumed: same key as the base question row (1:1)
OtherDisplay bit,
SomethingElse bit)
This is easier to read and easier to query. But it's not easily maintainable. It is unfortunately very much a pros/cons thing.
In my experience I would pick this solution as you never know what the future may hold.
Depending on how many combinations you have you could just express each combination as its own type:
DateTime
DropDownList
DropDownListWithOptionalOther
FreeText
FreeTextNumbersOnly
...
This flattens your design a little at the expense of a potential combinatorial explosion. But I don't know how many combinations you have, or will have.
Could you include the text box automatically if you have a DropDownList choice of "Other"? Or would there be a case when the user wouldn't have to specify what "other" is?
If you have too many combinations to consider, then it still sounds like you'll need to specify the flags on a per-question basis, so it makes sense to include another field in the Questions table to specify these flags. Maybe have them as plain text so you can extend later if you need to? Like a comma-separated list of flags in that field?
I am mulling this over as a possible solution, seems more abstracted to me and allows for the most future extension.
CREATE TABLE dbo.QuestionTypes
(
Id INT IDENTITY(1, 1) PRIMARY KEY,
Type VARCHAR(256) NOT NULL
);
CREATE TABLE dbo.TypeSpecificFlags
(
Id INT IDENTITY(1, 1) PRIMARY KEY,
TypeId INT REFERENCES dbo.QuestionTypes(Id) NOT NULL,
Flag VARCHAR(256) NOT NULL
);
CREATE TABLE dbo.Questions
(
Id INT IDENTITY(1, 1) PRIMARY KEY,
Name VARCHAR(256) NOT NULL,
ShortName VARCHAR(32),
TypeId INT REFERENCES QuestionTypes(Id) NOT NULL,
AllowNulls BIT NOT NULL DEFAULT 1,
Sort INT
);
CREATE TABLE dbo.QuestionsFlags
(
Id INT IDENTITY(1, 1) PRIMARY KEY,
QuestionId INT REFERENCES dbo.Questions(Id) NOT NULL,
FlagId INT REFERENCES dbo.TypeSpecificFlags(Id) NOT NULL,
Answer BIT NOT NULL
);
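To check that this is workable, here is a rough sketch of reading the flags back for one question. It ports the four tables to SQLite via Python's sqlite3 (BIT becomes INTEGER, IDENTITY becomes INTEGER PRIMARY KEY); the sample rows, the flag name ShowTextBoxOnOther and the flags_for helper are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces FKs when asked
conn.executescript("""
CREATE TABLE QuestionTypes (Id INTEGER PRIMARY KEY, Type TEXT NOT NULL);
CREATE TABLE TypeSpecificFlags (
    Id     INTEGER PRIMARY KEY,
    TypeId INTEGER NOT NULL REFERENCES QuestionTypes(Id),
    Flag   TEXT NOT NULL
);
CREATE TABLE Questions (
    Id         INTEGER PRIMARY KEY,
    Name       TEXT NOT NULL,
    ShortName  TEXT,
    TypeId     INTEGER NOT NULL REFERENCES QuestionTypes(Id),
    AllowNulls INTEGER NOT NULL DEFAULT 1,
    Sort       INTEGER
);
CREATE TABLE QuestionsFlags (
    Id         INTEGER PRIMARY KEY,
    QuestionId INTEGER NOT NULL REFERENCES Questions(Id),
    FlagId     INTEGER NOT NULL REFERENCES TypeSpecificFlags(Id),
    Answer     INTEGER NOT NULL
);
-- Hypothetical sample data
INSERT INTO QuestionTypes (Id, Type) VALUES (1, 'DropDownList'), (2, 'FreeText');
INSERT INTO TypeSpecificFlags (Id, TypeId, Flag) VALUES (1, 1, 'ShowTextBoxOnOther');
INSERT INTO Questions (Id, Name, TypeId)
VALUES (1, 'Favourite colour', 1), (2, 'Comments', 2);
INSERT INTO QuestionsFlags (QuestionId, FlagId, Answer) VALUES (1, 1, 1);
""")

def flags_for(question_id):
    """Return the type-specific flags set on a question as name -> bool."""
    rows = conn.execute(
        """
        SELECT f.Flag, qf.Answer
        FROM QuestionsFlags qf
        JOIN TypeSpecificFlags f ON f.Id = qf.FlagId
        WHERE qf.QuestionId = ?
        ORDER BY f.Flag
        """,
        (question_id,),
    )
    return {flag: bool(answer) for flag, answer in rows}
```

A new flag for a question type is then a row in TypeSpecificFlags rather than a new column on Questions.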