How to represent many similar attributes of an entity in a database?

Let's say I'm building a website about cars. The car entity has a lot of enum-like attributes:
transmission (manual/automatic)
fuel (gasoline/diesel/bioethanol/electric)
body style (coupe/sedan/convertible/...)
air conditioning (none/simple/dual-zone)
exterior color (black/white/gray/blue/green/...)
interior color (black/white/gray/blue/green/...)
etc.
The list of these attributes is likely to change in the future. What is the optimal way to model them in the database? I can think of the following options but can't really decide:
1. use fields in the car table with enum values (hard to add more columns later; probably the fastest)
2. use fields in the car table that are foreign keys referencing a lookup table (hard to add more columns later; somewhat slower)
3. create a separate table for each attribute that stores the possible values, plus another table to store the connection between the car and the attribute value (easy to add more possible values later; even slower, and seems too complicated)

Ideally you keep this relational. Each table in the DB can be represented by a class, as in Hibernate. You could make two tables for the car: one for the interior and one for the exterior. If you want to add extra features, you just add more columns.
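A minimal sketch of that split, assuming an existing car table with an id primary key (the table and column names are illustrative, not from the answer):
-- Fixed attributes live as plain columns, split by aspect of the car.
CREATE TABLE car_exterior (
  car_id integer PRIMARY KEY REFERENCES car (id),
  body_style varchar NOT NULL,
  exterior_color varchar NOT NULL
);
CREATE TABLE car_interior (
  car_id integer PRIMARY KEY REFERENCES car (id),
  air_conditioning varchar NOT NULL,
  interior_color varchar NOT NULL
);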

Now here is a (very basic) EAV model:
-- Names of tables and columns. (Assumed definition: the original answer
-- references example.zname below but does not show it.)
DROP TABLE IF EXISTS example.zname CASCADE;
CREATE TABLE example.zname
( nam_id SERIAL NOT NULL PRIMARY KEY
, zname varchar NOT NULL
, CONSTRAINT znam_alt UNIQUE (zname)
);
GRANT SELECT ON TABLE example.zname TO PUBLIC;
-- Distinct attribute values, stored once each.
DROP TABLE IF EXISTS example.zvalue CASCADE;
CREATE TABLE example.zvalue
( val_id SERIAL NOT NULL PRIMARY KEY
, zvalue varchar NOT NULL
, CONSTRAINT zval_alt UNIQUE (zvalue)
);
GRANT SELECT ON TABLE example.zvalue TO PUBLIC;
-- Attribute definitions: one row per (table, column) pair.
DROP TABLE IF EXISTS example.tabcol CASCADE;
CREATE TABLE example.tabcol
( tabcol_id SERIAL NOT NULL PRIMARY KEY
, tab_id BIGINT NOT NULL REFERENCES example.zname(nam_id)
, col_id BIGINT NOT NULL REFERENCES example.zname(nam_id)
, type_id varchar NOT NULL
, CONSTRAINT tabcol_alt UNIQUE (tab_id,col_id)
);
GRANT SELECT ON TABLE example.tabcol TO PUBLIC;
-- The entity-attribute-value triples themselves.
DROP TABLE IF EXISTS example.entattval CASCADE;
CREATE TABLE example.entattval
( ent_id BIGINT NOT NULL
, tabcol_id BIGINT NOT NULL REFERENCES example.tabcol(tabcol_id)
, val_id BIGINT NOT NULL REFERENCES example.zvalue(val_id)
, PRIMARY KEY (ent_id, tabcol_id, val_id)
);
GRANT SELECT ON TABLE example.entattval TO PUBLIC;
BTW: this is tailored to support system catalogs; you might need a few changes.
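For example, listing every attribute/value pair for one entity could look like this (a sketch, assuming the zname definition given above with nam_id and zname columns, and an entity with ent_id 42):
-- List all attribute/value pairs stored for entity 42.
SELECT tn.zname AS entity_table
     , cn.zname AS attribute
     , v.zvalue AS value
FROM example.entattval ea
JOIN example.tabcol tc ON tc.tabcol_id = ea.tabcol_id
JOIN example.zname tn ON tn.nam_id = tc.tab_id
JOIN example.zname cn ON cn.nam_id = tc.col_id
JOIN example.zvalue v ON v.val_id = ea.val_id
WHERE ea.ent_id = 42;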

This is really a duplicate of this dba.SE post:
https://dba.stackexchange.com/questions/27057/model-with-variable-number-of-properties-of-different-types
Use hstore, json, xml, an EAV pattern, ... see my answer on that post.

Depending on the number of queries and the size of the database you could either:
1. Make wide tables
2. Make an attributes table and a car_attributes table, where: cars -> car_attributes -> attributes
#1 will give faster, easier queries due to fewer joins, but #2 is more flexible; a sketch of #2 follows.
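A sketch of option #2, assuming a cars table with a car_id primary key (all names are illustrative):
-- Lookup of possible attribute values...
CREATE TABLE attributes (
  attribute_id serial PRIMARY KEY,
  name varchar NOT NULL,   -- e.g. 'transmission'
  value varchar NOT NULL   -- e.g. 'manual'
);
-- ...and the link table tying cars to their attribute values.
CREATE TABLE car_attributes (
  car_id integer NOT NULL REFERENCES cars (car_id),
  attribute_id integer NOT NULL REFERENCES attributes (attribute_id),
  PRIMARY KEY (car_id, attribute_id)
);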

It depends on the admin UI you need to support:
If there is an interface to manage, for example, the types of transmission, you should store them in a separate entity (your option 3).
If there is no such interface, the best option is to store them as enum-like type values. When you need another one (for example 'semi-automatic' for the transmission) you add it only in the DB schema; this will be the easiest to support and the fastest to execute.
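In PostgreSQL, for example, that schema-only change can be a native enum type (a sketch; the type name is illustrative):
-- Define the known values up front.
CREATE TYPE transmission AS ENUM ('manual', 'automatic');
-- Adding a new value later is a one-line schema change:
ALTER TYPE transmission ADD VALUE 'semi-automatic';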

I would create a table CarAttributes with columns AttributeID, CarID, PropertyName, PropertyValue. When the result set is returned we save it in an IDictionary. This allows you to add as many rows as you need without adding new columns.
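A sketch of that table (the column types are assumptions; the answer doesn't give them):
-- One row per (car, property); new properties need no schema change.
CREATE TABLE CarAttributes (
  AttributeID int IDENTITY PRIMARY KEY,  -- SQL Server syntax assumed; use serial/identity in Postgres
  CarID int NOT NULL,
  PropertyName varchar(100) NOT NULL,
  PropertyValue varchar(255) NOT NULL
);
-- Load one car's properties, e.g. into an IDictionary on the client:
SELECT PropertyName, PropertyValue FROM CarAttributes WHERE CarID = 1;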

Related

Include one table's values in multiple other tables and allow FK references

I'm still a relative novice when it comes to designing SQL databases, so apologies if this is something obvious that I'm missing.
I have a few tables of controlled vocabularies for certain values that I'm representing as FKs referencing the controlled vocab tables (there are a few distinct vocabularies I'm trying to represent). My schema specification allows each of these vocabularies to also allow a controlled set of values for "unknown" information (coming from DataCite). Here is an example using a table dates that must specify a date_type, which should be either a value from date_types or unknown_values. I have a few more tables following this model as well, each with its own specific controlled vocabulary, but which should also allow values from unknown_values. So the values in unknown_values should be shared among many tables of controlled vocabularies with a structure similar to date_types.
CREATE TABLE dates (
date_id integer NOT NULL PRIMARY KEY autoincrement ,
date_value date NOT NULL DEFAULT CURRENT_DATE ,
date_type text NOT NULL ,
FOREIGN KEY ( date_type ) REFERENCES date_types( date_type )
);
CREATE TABLE date_types (
date_type text NOT NULL PRIMARY KEY ,
definition text
);
CREATE TABLE unknown_values (
code text NOT NULL PRIMARY KEY ,
definition text
);
INSERT INTO date_types (date_type, definition)
VALUES
('type_a', 'The first date type'),
('type_b', 'The second date type');
INSERT INTO unknown_values (code, definition)
VALUES
(':unac', 'Temporarily inaccessible'),
(':unal', 'Unallowed, suppressed intentionally'),
(':unap', 'Not applicable, makes no sense'),
(':unas', 'Value unassigned (e.g., Untitled)'),
(':unav', 'Value unavailable, possibly unknown'),
(':unkn', 'Known to be unknown (e.g., Anonymous, Inconnue)'),
(':none', 'Never had a value, never will'),
(':null', 'Explicitly and meaningfully empty'),
(':tba', 'To be assigned or announced later'),
(':etal', 'Too numerous to list (et alia)');
My first thought was a view that creates a union of date_types and unknown_values, but you cannot make FK references onto a view, so that's not suitable.
The "easiest" solution would be to duplicate the values from unknown_values in each controlled vocabulary table (date_types etc.), but this feels incorrect to have duplicated values.
I also thought about a single table for all the controlled vocabularies with a third field (something like vocabulary_category with values like 'date'), so all my tables could reference that one table, but then I would likely need a function and a CHECK constraint to ensure that the value has the right "category". This feels inelegant and messy.
I'm stumped about the best way to proceed, or what to search for to find help. I can't imagine this is too rare of a requirement, but I can't seem to find any solutions online. My target DB is SQLite, but I'd be interested in solutions that would be possible in PostgreSQL as well.
What you are requesting is the ability for an FK to have an optional referenced table. As you discovered, neither Postgres nor SQLite(?) provides this option (afaik no other RDBMS does either). Postgres at least offers a workaround; I do not know if it is doable in SQLite. You need to:
drop the NOT NULL constraint on the currently defined FK column
add an FK column referencing the unknown_values table
add a CHECK constraint requiring that exactly one of the columns date_type and the new FK column is null. See the num_nulls function.
The changes you need:
alter table dates
alter column date_type
drop not null;
alter table dates
add unknown_value text
references unknown_values(code);
alter table dates
add constraint one_null
check (num_nulls(date_type, unknown_value ) = 1);
Note: Postgres does not support the autoincrement keyword. The same is accomplished with an identity column (generated always as identity); for older versions use serial.
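Also note that num_nulls is Postgres-specific. In SQLite the same "exactly one is null" rule can be written arithmetically, since IS NULL evaluates to 0 or 1 there (a sketch; SQLite also cannot add a CHECK constraint to an existing table, so the constraint has to be part of the table definition):
-- The new column can be added in place (SQLite allows REFERENCES here
-- because the default is NULL):
ALTER TABLE dates ADD COLUMN unknown_value text REFERENCES unknown_values(code);
-- ...but the constraint must be declared when the table is (re)created:
-- CHECK ((date_type IS NULL) + (unknown_value IS NULL) = 1)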

Modifying the schema of a table to make it inherit from another one without recreating the table or reinserting the data

I have several tables in my PostgreSQL database that have a couple of common columns. I thought it would be a good idea if I move those common columns into a new table and made the tables inherit from this new table. For example:
create table foo_bar (
code varchar(9) primary key,
name varchar,
bar integer
);
After refactoring:
create table foo (
code varchar(9) primary key,
name varchar
);
create table foo_bar (
bar integer
) inherits (foo);
The problem is, I have lots of data in foo_bar, as well as many views that refer to this table.
Is it possible to alter definition of foo_bar to achieve the above change without dropping the data in the table?
It may or may not be a good idea. Database design isn't exactly object-oriented programming, and there are some caveats. The PostgreSQL documentation on inheritance warns:
A serious limitation of the inheritance feature is that indexes (including unique constraints) and foreign key constraints only apply to single tables, not to their inheritance children. This is true on both the referencing and referenced sides of a foreign key constraint.
There are a few more issues described there which you need to be aware of.
If you really want to go ahead with this, the way to alter the table is given in Laurenz Albe's answer:
-- The parent intentionally has no primary key: as noted above, constraints
-- on the parent would not apply to the child anyway.
CREATE TABLE foo (
code varchar(9),
name varchar
);
-- foo_bar already has matching code and name columns, so it can be
-- attached as a child in place, keeping its data and its views:
ALTER TABLE foo_bar INHERIT foo;
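A quick way to verify the effect: rows that were already in foo_bar now show up through the parent, with no data movement.
-- Includes rows from foo_bar and any other children:
SELECT * FROM foo;
-- The parent's own rows only:
SELECT * FROM ONLY foo;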

Using LIKE queries, Foreign Keys or jsonb arrays for filtering Postgres SELECT

I have a table which is supposed to contain relations to other tables. But I need these relations only for filtering; i.e. the relations in this case are not supposed to work as foreign keys or references of some kind. I just need to search through them. It works like tags, more or less.
Here is the example:
CREATE TABLE "public"."table_name"
(
"id" uuid NOT NULL DEFAULT uuid_generate_v4(),
"relations" text NOT NULL,
"some_column" text,
"some_another_column" int4,
"created" timestamp(6) WITH TIME ZONE NOT NULL DEFAULT now(),
CONSTRAINT "table_name_pkey" PRIMARY KEY ("id") NOT DEFERRABLE INITIALLY IMMEDIATE,
CONSTRAINT "owner" FOREIGN KEY ("owner") REFERENCES "public"."user" ("id") ON UPDATE NO ACTION ON DELETE CASCADE NOT DEFERRABLE INITIALLY IMMEDIATE,
)
WITH (OIDS=FALSE);
ALTER TABLE "public"."table_name" OWNER TO "postgres";
The relations column will contain multiple uuid keys. This column is not supposed to be SELECTed; I intend to use it only for filtering, with this kind of query selecting rows from only this table:
SELECT
id, some_column, some_another_column
FROM
table_name
WHERE
relations LIKE '%c56c8a4f-765a-4e1c-9638-f3736a25da17%'
AND owner = 'badee659-1fca-412a-bcf6-c73ecf1e65aa';
Of course, I will create a multicolumn (owner, relations) index.
Is this a good approach for performing this kind of search query? The relations column will contain 1 to 10 uuids per row on average.
Or maybe I should create an additional table which contains, say, one uuid per 'relation' and an FK referencing the table_name table? In that case I would use JOIN queries.
Or maybe there are better ways? Should I store the uuids as an array within a jsonb object? Or something else?
Index consideration
Using the LIKE operator with a leading % is not sargable. It won't use your index, because the optimizer cannot narrow the search by the first characters of the relations column.
Design
It's almost always a bad idea to store multiple values in one column as a string with a delimiter.
Remember that relational databases are designed to perform JOIN operations efficiently. In my opinion it would be better to separate that data into rows with atomic values in their columns, as sketched below.
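A sketch of that separation, reusing the names from the question (the junction table and its columns are assumptions):
-- One row per (row, relation) pair instead of a delimited string.
CREATE TABLE table_name_relations (
  table_name_id uuid NOT NULL REFERENCES table_name (id) ON DELETE CASCADE,
  relation_id uuid NOT NULL,
  PRIMARY KEY (table_name_id, relation_id)
);
CREATE INDEX ON table_name_relations (relation_id);
-- The filter then becomes an indexable join:
SELECT t.id, t.some_column, t.some_another_column
FROM table_name t
JOIN table_name_relations r ON r.table_name_id = t.id
WHERE r.relation_id = 'c56c8a4f-765a-4e1c-9638-f3736a25da17'
  AND t.owner = 'badee659-1fca-412a-bcf6-c73ecf1e65aa';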
Json consideration
The json and jsonb datatypes should only be taken into consideration if your columns are unpredictably changing. By this I mean that whenever you can (without much overhead) fit your model into a relational one, you should go for it. The same goes for hstore.
You could read this blog post to grab some information to start with when considering a mechanism for storing dynamic columns.
A little quote from that post:
How to decide when to use json
Use json if your data won't fit in the database using normal relational modelling.
Credit for the above blog post goes to Craig Ringer.

Variable amount of sets as SQL database tables

More of a question concerning the database model for a specific problem. The problem is as follows:
I have a number of objects that make up the rows in a fixed table; they are all distinct, of course. One would like to create sets that contain a variable number of these stored objects. These sets would be user-defined, so nothing hard-coded. Each set will be identified by a number.
My question is: what advice can you experienced SQL programmers give me to implement such a feature? My most direct approach would be to create a table for each such set using table variables or temporary tables, plus an already-present table that contains the names of the sets (as a way to let the user know what sets are currently in the database).
If that is not efficient, what direction should I look in to solve this?
Thanks.
Table variables and temporary tables are short-lived, narrow in scope, and probably not what you want to use for this. One table per set is also not a solution I would choose.
By the sound of it you need three tables. One for Objects, one for Sets and one for the relationship between Objects and Sets.
Something like this (using SQL Server syntax to describe the tables).
create table [Object]
(
ObjectID int identity primary key,
Name varchar(50)
-- more columns here necessary for your object.
)
go
create table [Set]
(
SetID int identity primary key,
Name varchar(50)
)
go
create table [SetObject]
(
SetID int references [Set](SetID),
ObjectID int references [Object](ObjectID),
primary key (SetID, ObjectID)
)
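Populating and querying a set then looks like this (a sketch; it assumes the new set receives identity value 1 and that objects with ObjectID 1 and 2 already exist):
insert into [Set] (Name) values ('My first set')
insert into [SetObject] (SetID, ObjectID) values (1, 1), (1, 2)
-- list the members of set 1
select o.ObjectID, o.Name
from [SetObject] so
join [Object] o on o.ObjectID = so.ObjectID
where so.SetID = 1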

How to create multiple one-to-ones

I have a database set up with many tables and it all looks good apart from one bit...
Inventory Table <*-----1> Storage Table <1-----1> Van Table
                               ^
                               1
                               |-------1> Warehouse Table
The Storage table is used since the Van and Warehouse tables are similar, but how do I create a relationship between the Storage and Warehouse/Van tables? It would make sense for them to be 1 to 1, as a Storage row can only be one storage place and type.
I did have the Van/Warehouse tables link to the StorageId primary key and then added a constraint to make sure the Van and Warehouse tables don't share the same StorageId, but this seems like it could be done a better way.
I can see several ways of doing this but they all seem wrong, so any help would be good!
You are using inheritance (also known in entity-relationship modeling as "subclass" or "category"). In general, there are 3 ways to represent it in the database:
1. "All classes in one table": have just one table "covering" the parent and all child classes (i.e. with all parent and child columns), with a CHECK constraint to ensure the right subset of fields is non-NULL (i.e. two different children do not "mix").
2. "Concrete class per table": have a different table for each child, but no parent table. This requires the parent's relationships (in your case Inventory <- Storage) to be repeated in all children.
3. "Class per table": have a parent table and a separate table for each child, which is what you are trying to do. This is cleanest, but can cost some performance (mostly when modifying data; not so much when querying, because you can join directly from the child and skip the parent).
I usually prefer the 3rd approach, but enforce both the presence and the exclusivity of a child at the application level. Enforcing both at the database level is a bit cumbersome, but can be done if the DBMS supports deferred constraints. For example:
-- This CHECK lives on the STORAGE table, where VAN_ID and WAREHOUSE_ID
-- are the FK columns (FK1/FK2) pointing to the child tables:
CHECK (
(
(VAN_ID IS NOT NULL AND VAN_ID = STORAGE_ID)
AND WAREHOUSE_ID IS NULL
)
OR (
VAN_ID IS NULL
AND (WAREHOUSE_ID IS NOT NULL AND WAREHOUSE_ID = STORAGE_ID)
)
)
This will enforce both the exclusivity (due to the CHECK) and the presence (due to the combination of CHECK and FK1/FK2) of the child.
Unfortunately, MS SQL Server does not support deferred constraints, but you may be able to "hide" the whole operation behind stored procedures and forbid clients from modifying the tables directly.
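For comparison, in a DBMS that does support deferred constraints (e.g. PostgreSQL) the circular FKs could be declared like this (a sketch, not from the original answer; the WAREHOUSE side is analogous):
-- STORAGE points at exactly one child; the child points back.
CREATE TABLE storage (
  storage_id int PRIMARY KEY,
  van_id int,
  warehouse_id int,
  CHECK (num_nulls(van_id, warehouse_id) = 1)
);
CREATE TABLE van (
  storage_id int PRIMARY KEY
    REFERENCES storage (storage_id) DEFERRABLE INITIALLY DEFERRED
);
ALTER TABLE storage
  ADD FOREIGN KEY (van_id) REFERENCES van (storage_id)
    DEFERRABLE INITIALLY DEFERRED;
-- Both rows go in within one transaction; the FKs are checked at COMMIT:
BEGIN;
INSERT INTO storage (storage_id, van_id) VALUES (1, 1);
INSERT INTO van (storage_id) VALUES (1);
COMMIT;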
Just the exclusivity can be enforced without deferred constraints:
The STORAGE_TYPE is a type discriminator, usually an integer to save space (in the example below, 0 and 1 are "known" to your application and interpreted accordingly).
The VAN.STORAGE_TYPE and WAREHOUSE.STORAGE_TYPE can be computed (aka "calculated") columns, to save storage and avoid the need for CHECKs.
--- EDIT ---
Computed columns would work under SQL Server like this:
CREATE TABLE STORAGE (
STORAGE_ID int PRIMARY KEY,
STORAGE_TYPE tinyint NOT NULL,
UNIQUE (STORAGE_ID, STORAGE_TYPE)
);
CREATE TABLE VAN (
STORAGE_ID int PRIMARY KEY,
STORAGE_TYPE AS CAST(0 as tinyint) PERSISTED,
FOREIGN KEY (STORAGE_ID, STORAGE_TYPE) REFERENCES STORAGE(STORAGE_ID, STORAGE_TYPE)
);
CREATE TABLE WAREHOUSE (
STORAGE_ID int PRIMARY KEY,
STORAGE_TYPE AS CAST(1 as tinyint) PERSISTED,
FOREIGN KEY (STORAGE_ID, STORAGE_TYPE) REFERENCES STORAGE(STORAGE_ID, STORAGE_TYPE)
);
-- We can make a new van.
INSERT INTO STORAGE VALUES (100, 0);
INSERT INTO VAN VALUES (100);
-- But we cannot make it a warehouse too.
INSERT INTO WAREHOUSE VALUES (100);
-- Msg 547, Level 16, State 0, Line 24
-- The INSERT statement conflicted with the FOREIGN KEY constraint "FK__WAREHOUSE__695C9DA1". The conflict occurred in database "master", table "dbo.STORAGE".
Unfortunately, SQL Server requires for a computed column which is used in a foreign key to be PERSISTED. Other databases may not have this limitation (e.g. Oracle's virtual columns), which can save some storage space.
As you say, there are many solutions. I would recommend starting with the simplest solution, then optimising later if performance or storage become problems. The simplest solution (but not optimal in terms of storage) would be to have a Storage table that has a column for storage type (indicating whether the row represents a van or a warehouse), plus columns for Van attributes as well as Warehouse attributes. In a row that represents a Van, the columns for the Warehouse attributes will all be null. In a row that represents a Warehouse, the columns for the Van attributes will all be null.
That way, you cut down on the number of tables, and keep your queries nice and simple. Be prepared to revisit your decision if storage becomes tight.
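A sketch of that single-table layout (the attribute columns are invented for illustration):
-- One table for both kinds; the columns of the other kind stay NULL.
CREATE TABLE Storage (
  StorageId int PRIMARY KEY,
  StorageType varchar(9) NOT NULL CHECK (StorageType IN ('van', 'warehouse')),
  Registration varchar(20),  -- van-only attribute
  FloorArea int,             -- warehouse-only attribute
  CHECK (
    (StorageType = 'van' AND FloorArea IS NULL)
    OR (StorageType = 'warehouse' AND Registration IS NULL)
  )
);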
Somehow it seems to me that inventory items may change locations, so I would go with something like this:
(schema diagram omitted)