Creating an SQL Schema (postgresql) - sql

I'm having problems creating a schema for a PostgreSQL project.
It's for a social networking site where each profile comes in one of three varieties: generic, education, or employment. Each variety requires different attributes... How do we do all this in one table?
create type ProfileTypeValue as enum
    ('generic', 'education', 'employment');
create table Profiles (
    id integer,
    type ProfileTypeValue,
    -- ....? (the type-specific attributes are the open question)
    primary key (id)
);
Because, for instance, if it's an education profile, then we need an institution name etc., and if it's an employment profile, then we need an employer name attribute, etc.
Is it best to just have 3 different tables, 1 for each profile type? I don't know if that's possible... but I feel like I need an if statement saying: if it's one kind of profile, include these attributes; if it's another kind of profile, include those attributes; etc.

Here are a few options:
1. All in the same table
2. Common profile attributes in one table, profile-type-specific attributes in their own tables with foreign key references to the common profile table
3. Inheritance
4. Key-value store
All in the same table
In this option all the fields are always present, whatever type the profile is. This is the easiest to do the first time around, as you only have to list all the columns. However, it is poor design that will make your life harder in the long run, because maintainability and extensibility suffer. You should read up on database normal forms etc. Don't do this.
Master profile table and profile type dependent details on their own tables
In this option you will create a table for all profiles. This will include all the common attributes. This table will make sure the identifiers are all in the same namespace and each profile has a unique id. For each profile type you'll create a new table that has a foreign key reference to the master profile table. You can then select all employment profiles using an inner join on the employment profile table and the master profile table. This design allows you to create constraints for each profile type. Furthermore, this design lets you have profiles that are both employment and education profiles. You should probably do this.
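A minimal sketch of this layout, assuming Python's built-in sqlite3 module as a stand-in for PostgreSQL so that it runs anywhere. All table and column names (profile, education_profile, employment_profile, institution, employer) are hypothetical and only illustrate the master-plus-detail idea:

```python
import sqlite3

# SQLite is used only so the sketch runs anywhere; the DDL is close to what
# PostgreSQL would accept. All table and column names are hypothetical.
PROFILE_SCHEMA = """
CREATE TABLE profile (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL              -- attribute shared by every profile type
);
CREATE TABLE education_profile (
    profile_id  INTEGER PRIMARY KEY REFERENCES profile(id),
    institution TEXT NOT NULL       -- constraint that applies to this type only
);
CREATE TABLE employment_profile (
    profile_id INTEGER PRIMARY KEY REFERENCES profile(id),
    employer   TEXT NOT NULL
);
"""


def build_profiles_db():
    """Create the schema in memory and load a few sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite ignores FKs unless asked
    conn.executescript(PROFILE_SCHEMA)
    conn.execute("INSERT INTO profile (id, name) VALUES (1, 'Alice'), (2, 'Bob')")
    # Profile 1 is both an education and an employment profile.
    conn.execute("INSERT INTO education_profile VALUES (1, 'Some University')")
    conn.execute("INSERT INTO employment_profile VALUES (1, 'Some Company')")
    conn.execute("INSERT INTO employment_profile VALUES (2, 'Another Company')")
    conn.commit()
    return conn


def employment_profiles(conn):
    """Select all employment profiles with an inner join on the master table."""
    return conn.execute(
        "SELECT p.id, p.name, e.employer "
        "FROM profile AS p "
        "JOIN employment_profile AS e ON e.profile_id = p.id "
        "ORDER BY p.id"
    ).fetchall()
```

Calling employment_profiles(build_profiles_db()) returns one row per employment profile, and a detail row that points at a missing master row is rejected by the foreign key.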
Inheritance
PostgreSQL provides a facility for table inheritance. You can use this by creating a base table for all profile types and then creating child tables for each profile type. Each profile type then inherits all the attributes defined in the parent table. With inheritance you can select all profiles using the parent table and all employment profiles using the employment profile table. If generic profiles use only common attributes, they can be stored in the parent table.
The main disadvantage of inheritance in PostgreSQL is that the parent table and the child tables do not share the same namespace. You cannot create a unique constraint that spans all the tables. This means that you have to make sure that the identifiers are globally unique some other way, e.g. by keeping a separate table for the profile identifiers.
You should consider whether the disadvantages of inheritance matter in your situation. However, this is the sensible way of doing separate tables for all profile types if you are using PostgreSQL, as you don't have to duplicate the definitions of the common attributes.
Key-value store
You could also create a table for common profile attributes and keep the rest of the attributes in (profile, attribute, value) tuples. By doing this, you'd discard the benefits of an RDBMS and you'd have to implement all the logic in your program. Don't do this.
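To illustrate why the key-value layout pushes the logic into your program, here is a small sketch, again assuming sqlite3 as a stand-in and hypothetical names (profile_attribute, graduation_year). Every value is stored as text, so the database happily accepts a nonsensical year, and the application has to reassemble each profile itself:

```python
import sqlite3


def build_kv_db():
    """Common attributes in one table, everything else as (profile, attribute, value)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript("""
        CREATE TABLE profile (
            id   INTEGER PRIMARY KEY,
            name TEXT NOT NULL
        );
        CREATE TABLE profile_attribute (
            profile_id INTEGER NOT NULL REFERENCES profile(id),
            attribute  TEXT NOT NULL,
            value      TEXT,            -- every value is text: no types, no checks
            PRIMARY KEY (profile_id, attribute)
        );
    """)
    conn.execute("INSERT INTO profile VALUES (1, 'Alice')")
    conn.executemany(
        "INSERT INTO profile_attribute VALUES (?, ?, ?)",
        [
            (1, "institution", "Some University"),
            (1, "graduation_year", "not a year"),  # accepted without complaint
        ],
    )
    conn.commit()
    return conn


def load_profile(conn, profile_id):
    """Pivot the attribute rows back into one record in application code."""
    rows = conn.execute(
        "SELECT attribute, value FROM profile_attribute "
        "WHERE profile_id = ? ORDER BY attribute",
        (profile_id,),
    ).fetchall()
    return dict(rows)
```

Any rule such as "an education profile must have an institution" or "graduation_year is a number" would have to live in load_profile or its callers rather than in the schema.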

PostgreSQL supports table-level inheritance. You can make a Profile table as the parent table with the common attributes, and then separate child tables for education and employment with only the attributes specific to those categories.
Check out the PostgreSQL documentation here.

Related

Postgres Foreign Keys on a generalizing link table

I'm doing some work with a system that has many types of entities that may or may not have access to many types of resources. In setting up these tables, I have a structure where I create an Entity_Sequence and a Resource_Sequence, and then create a link table for each different entity and each different resource to associate them with their respective sequence.
For example, I have a Users_Entities_Link table with the columns user_id referencing User, and entity_id, which is a bigint default nextval(Entity_Sequence).
Topping this structure off is an Entities_Resources_Access table of (entity_id, resource_id) that denotes whether the entity has access to the resource. However, given that each of these entities could be related to any one of the entity link tables, and likewise each resource to any one of the resource link tables, I'm trying to figure out the best way to handle the relationship. This seems like a fairly rare problem, so I couldn't find help elsewhere on it.
The best that I could determine myself was to run an after deletion trigger on each of the Entity or Resource link tables that would check if the entity or resource exists in the access table, but that's a lot of debt to handle when adding in new potential entities or resources.
Is there a better solution to either the structural problem of many entities accessing many resources, or to how the sequence relationship is handled? Do I need to add a dummy entity table and a dummy resource table that each only have an ID for the link and access tables to link foreign keys to? That seems like a lot of wasted space if I have a large quantity of any given entity or resource, and also something that I would have to manually unlink if it floated without anything referencing it on deletion of a row in an associated table (like a user).
Here's how the setup is currently designed:
Table some_entity_entity_link
    some_entity_id FK refs some_entity(id)
    entity_id not null default nextval(entity_sequence)
Table another_entity_entity_link
    another_entity_id FK refs another_entity(id)
    entity_id not null default nextval(entity_sequence)
Table some_resource_resource_link
    some_resource_id FK refs some_resource(id)
    resource_id not null default nextval(resource_sequence)
Table another_resource_resource_link
    another_resource_id FK refs another_resource(id)
    resource_id not null default nextval(resource_sequence)
Table entities_resources_access
    entity_id
    resource_id

When do relationships in ERD diagrams get a separate table in a RDBMS

Title is the question. When should a relationship in an ER diagram be given its own table in an RDBMS? For instance, one mail courier (with attributes eid and surname) can deliver a number of packages, but a package (attributes pid, sent_By, going_to) can only have one mail courier. Would it make sense to make a table for the relationship called delivers (with an attribute for the time that the package was delivered)? Or should the eid of the mail_courier and time_delivered from the delivers relationship be added to the package entity? Also, what would be an example of when you would not want to add the attributes to the package entity?
I think what you are trying is to create a one-to-many relationship between two entities. And for that, there is no need to create a separate table; as you mentioned in your question, just add those two attributes to the package table.
Where you would need to create a separate table is when you want a many-to-many relationship between two entities. For example, take Twitter's followers. One user can have many followers, and a follower can follow many users. You can't do that the relational way without creating a new table with just those two columns.
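Both cases can be sketched side by side. This is only an illustration, assuming sqlite3 as the engine; courier_eid, app_user, handle and follows are hypothetical names, while eid, surname, pid, sent_by, going_to and time_delivered come from the question:

```python
import sqlite3

DELIVERY_SCHEMA = """
CREATE TABLE mail_courier (
    eid     INTEGER PRIMARY KEY,
    surname TEXT NOT NULL
);
-- One-to-many: the relationship lives on the "many" side as a foreign key.
CREATE TABLE package (
    pid            INTEGER PRIMARY KEY,
    sent_by        TEXT,
    going_to       TEXT,
    courier_eid    INTEGER REFERENCES mail_courier(eid),
    time_delivered TEXT               -- NULL until the package is delivered
);
-- Many-to-many: the relationship needs a table of its own.
CREATE TABLE app_user (
    id     INTEGER PRIMARY KEY,
    handle TEXT NOT NULL
);
CREATE TABLE follows (
    follower_id INTEGER NOT NULL REFERENCES app_user(id),
    followed_id INTEGER NOT NULL REFERENCES app_user(id),
    PRIMARY KEY (follower_id, followed_id)
);
"""


def build_delivery_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(DELIVERY_SCHEMA)
    conn.execute("INSERT INTO mail_courier VALUES (1, 'Smith')")
    conn.executemany(
        "INSERT INTO package VALUES (?, ?, ?, ?, ?)",
        [(10, "A", "B", 1, "2024-01-01 09:00"), (11, "C", "D", 1, None)],
    )
    conn.executemany(
        "INSERT INTO app_user VALUES (?, ?)", [(1, "ann"), (2, "ben"), (3, "cy")]
    )
    conn.executemany("INSERT INTO follows VALUES (?, ?)", [(2, 1), (3, 1), (1, 2)])
    conn.commit()
    return conn


def packages_for_courier(conn, eid):
    """All packages handled by one courier (the 'many' side of one-to-many)."""
    rows = conn.execute(
        "SELECT pid FROM package WHERE courier_eid = ? ORDER BY pid", (eid,)
    )
    return [row[0] for row in rows]


def followers_of(conn, user_id):
    """Handles of everyone following the given user, via the bridge table."""
    rows = conn.execute(
        "SELECT u.handle FROM follows AS f "
        "JOIN app_user AS u ON u.id = f.follower_id "
        "WHERE f.followed_id = ? ORDER BY u.handle",
        (user_id,),
    )
    return [row[0] for row in rows]
```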

How do I structure a generic item that can have a relationship with different tables?

In my example, I have a watch, which indicates that a user wants notifications about events on a different item, say a group or an organization.
I see two ways to do this:
1. Have a groupwatch resource, with a groupwatch table, with id,user,group (group FK to group resource and table); and an orgwatch resource, with an orgwatch table, with id,user,organization (org FK to organization resource and table)
2. Have a generic watch resource, with a watch table, with id,user,type,typeid. type is one of group or organization, and typeid is the ID of the group or organization being watched.
Since both of them are watches, it seems a waste to have two different tables and resources to watch 2 different objects. It gets worse if I start watching 4, 5, 6, 20, 50 different types of resources.
On the other hand, a foreign key relationship appears impossible if I just have a generic typeid, which means that my database (if relational) and my framework (activerecord or anything else) cannot enforce it correctly.
How do I best implement this type of "association to different types of record/table for each record in my table"?
UPDATE:
Are my only choices for doing this:
1. separate tables/resources for each watch type, which enables the database to enforce relational integrity and do joins
2. a single table for all watches, but I will have to enforce relational integrity and do joins at the app level?
If you add a new type of resource once every six months, you may want to define your tables in such a way that adding new resources involves changing data definitions. If you add a new resource type every week, you may want to make your data definitions stay the same when you add new types. There's a downside to either choice.
If you do choose to define your tables in such a way that the types are visible in the table structure, there are two patterns often used with type/subtype (aka class/subclass) situations.
One pattern has been called "single table inheritance". Put data about all the types in a single table, and leave some columns NULL wherever they do not apply.
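As a rough sketch of single table inheritance applied to the watch example, assuming sqlite3 as the engine: groups and organizations share one hypothetical watchable table, and the watch table can then use an ordinary foreign key. The kind, member_limit and tax_number columns and the CHECK constraints are my own illustration, not part of the pattern's definition:

```python
import sqlite3

WATCHABLE_SCHEMA = """
CREATE TABLE watchable (
    id           INTEGER PRIMARY KEY,
    kind         TEXT NOT NULL CHECK (kind IN ('group', 'organization')),
    name         TEXT NOT NULL,
    member_limit INTEGER,           -- hypothetical, applies to groups only
    tax_number   TEXT,              -- hypothetical, applies to organizations only
    -- Columns that do not apply to a row's kind must stay NULL.
    CHECK (kind = 'group' OR member_limit IS NULL),
    CHECK (kind = 'organization' OR tax_number IS NULL)
);
CREATE TABLE watch (
    id           INTEGER PRIMARY KEY,
    user_id      INTEGER NOT NULL,
    watchable_id INTEGER NOT NULL REFERENCES watchable(id)
);
"""


def build_single_table_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(WATCHABLE_SCHEMA)
    conn.execute(
        "INSERT INTO watchable (id, kind, name, member_limit) "
        "VALUES (1, 'group', 'Chess club', 20)"
    )
    conn.execute(
        "INSERT INTO watchable (id, kind, name, tax_number) "
        "VALUES (2, 'organization', 'Acme', 'X-1')"
    )
    conn.execute("INSERT INTO watch (user_id, watchable_id) VALUES (7, 1), (7, 2)")
    conn.commit()
    return conn


def watched_items(conn, user_id):
    """Everything a user watches, whatever its kind, with a single join."""
    return conn.execute(
        "SELECT t.kind, t.name FROM watch AS w "
        "JOIN watchable AS t ON t.id = w.watchable_id "
        "WHERE w.user_id = ? ORDER BY t.id",
        (user_id,),
    ).fetchall()
```

The trade-off is visible in the table definition: every new type adds nullable columns and another CHECK to the same table.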
Another pattern has been called "class table inheritance". Define one table for the superclass, with all the data that is common to all the types. Then define tables for each subtype (subclass) to contain class specific data. Make the primary key of the subtype tables a duplicate of the primary key in the supertype table, and also declare it as a foreign key that references the primary key of the supertype table. It's going to be up to the app, at insert time, to replicate the value of the primary key in the supertype table over in the subtype table.
I like Fowler's treatment of these two patterns.
http://martinfowler.com/eaaCatalog/classTableInheritance.html
http://www.martinfowler.com/eaaCatalog/singleTableInheritance.html
This matter of sharing primary keys has a few beneficial effects.
First, it enforces the one-to-one nature of the IS-A relationships.
Second, it makes it easy to find out whether a given entry belongs to a desired subtype, by just joining with the subtype table. You don't really need an extra type field.
Third, it speeds up the joins, because of the index that gets built when you declare a primary key.
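Here is a hedged sketch of the shared-primary-key arrangement, assuming sqlite3 and hypothetical tables watchable (supertype), user_group and organization (subtypes). It shows the app replicating the supertype key at insert time, and a join standing in for a type field:

```python
import sqlite3

SHARED_KEY_SCHEMA = """
CREATE TABLE watchable (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
-- Each subtype key is both a primary key and a foreign key to the supertype.
CREATE TABLE user_group (
    id           INTEGER PRIMARY KEY REFERENCES watchable(id),
    member_limit INTEGER
);
CREATE TABLE organization (
    id         INTEGER PRIMARY KEY REFERENCES watchable(id),
    tax_number TEXT
);
"""


def open_shared_key_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SHARED_KEY_SCHEMA)
    return conn


def create_group(conn, name, member_limit):
    """Insert the supertype row, then reuse its key for the subtype row."""
    new_id = conn.execute(
        "INSERT INTO watchable (name) VALUES (?)", (name,)
    ).lastrowid
    conn.execute(
        "INSERT INTO user_group (id, member_limit) VALUES (?, ?)",
        (new_id, member_limit),
    )
    conn.commit()
    return new_id


def create_organization(conn, name, tax_number):
    new_id = conn.execute(
        "INSERT INTO watchable (name) VALUES (?)", (name,)
    ).lastrowid
    conn.execute(
        "INSERT INTO organization (id, tax_number) VALUES (?, ?)",
        (new_id, tax_number),
    )
    conn.commit()
    return new_id


def is_group(conn, watchable_id):
    """Subtype membership is answered by joining on the shared key."""
    row = conn.execute(
        "SELECT 1 FROM watchable AS w JOIN user_group AS g ON g.id = w.id "
        "WHERE w.id = ?",
        (watchable_id,),
    ).fetchone()
    return row is not None
```

Because the subtype key is a primary key, a second user_group row for the same watchable is rejected, which is the one-to-one property mentioned above.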
If you want a structure that can adapt to new attributes without changing data definitions, you can look into E-A-V design. Be careful, though. Sometimes this results in data that is nearly impossible to use, because the logical structure is so obscure. I usually think of E-A-V as an anti-pattern for this reason, although there are some who really like the results they get from it.

What are the benefits of using separate role-bridge table over all-in-one table?

I have a bridge table book_person between the tables book and person to provide a many-to-many relation. In this table I also have role definitions, to set which roles (author, editor, illustrator, translator etc.) a person has on a particular book. I am now considering splitting the roles into separate role tables (like book_author, book_translator etc.), but I am in doubt whether it is a good idea. On the pro side, it makes the DB cleaner, and one simple benefit I see is that the DBIC schema loader detects such simple bridge tables and creates many-to-many accessors for me. On the con side, I see that aggregate functions over roles will need more joins.
What are the benefits of using separate role-bridge tables over an all-in-one role-bridge table? And what are the shortcomings? I am trying to upgrade my apps to use an ORM (DBIx::Class), but I don't know it well yet, so considerations regarding it are also really welcome.
create table book (
    id_book integer primary key,
    name_book text,
    author_book text  -- type missing in the original; text is assumed
);
create table person (
    id_person integer primary key,
    name_person text
);
create table book_person (
    id_person integer,
    id_book integer,
    role_person text,
    primary key (id_person, id_book)
);
I think keeping the role in the book_person table is a good choice if one person has ONLY one role per book, because:
- There are just a few roles which you use.
- You avoid littering your DB with extra tables like book_author and book_translator.
- You don't need the many joins you mentioned.
- A role in your case is just an attribute, and if you don't keep any extra info about a role's capabilities, you shouldn't create one more table for the role-person binding. You already keep it in book_person.
You need to create another table for roles if:
- One person has more than one role.
- You keep some extra info about a role, as I said above.
I guess that's all.
Given that a person can have multiple roles for a book, I would create a separate table (say book_person_role) with person-id/book-id as foreign key and a role-id. Thus you get a one-to-many relation from book_person to book_person_role. I wouldn't create a table per role; that would mean changing the schema when a role is added/deleted/changed.
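A sketch of that suggestion, assuming sqlite3 as the engine. The book, person and book_person tables follow the question; the role lookup table and its columns id_role and name_role are hypothetical additions for illustration:

```python
import sqlite3

BOOK_ROLE_SCHEMA = """
CREATE TABLE book   (id_book   INTEGER PRIMARY KEY, name_book   TEXT);
CREATE TABLE person (id_person INTEGER PRIMARY KEY, name_person TEXT);
CREATE TABLE role   (id_role   INTEGER PRIMARY KEY, name_role   TEXT NOT NULL UNIQUE);
CREATE TABLE book_person (
    id_person INTEGER NOT NULL REFERENCES person(id_person),
    id_book   INTEGER NOT NULL REFERENCES book(id_book),
    PRIMARY KEY (id_person, id_book)
);
-- One row per role a person has on a book (one-to-many from book_person).
CREATE TABLE book_person_role (
    id_person INTEGER NOT NULL,
    id_book   INTEGER NOT NULL,
    id_role   INTEGER NOT NULL REFERENCES role(id_role),
    PRIMARY KEY (id_person, id_book, id_role),
    FOREIGN KEY (id_person, id_book) REFERENCES book_person(id_person, id_book)
);
"""


def build_book_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(BOOK_ROLE_SCHEMA)
    conn.execute("INSERT INTO book VALUES (1, 'Some Book')")
    conn.executemany("INSERT INTO person VALUES (?, ?)", [(1, "Ann"), (2, "Ben")])
    conn.executemany(
        "INSERT INTO role VALUES (?, ?)",
        [(1, "author"), (2, "illustrator"), (3, "translator")],
    )
    conn.executemany("INSERT INTO book_person VALUES (?, ?)", [(1, 1), (2, 1)])
    # Ann is both author and illustrator of book 1; Ben is its translator.
    conn.executemany(
        "INSERT INTO book_person_role VALUES (?, ?, ?)",
        [(1, 1, 1), (1, 1, 2), (2, 1, 3)],
    )
    conn.commit()
    return conn


def roles_on_book(conn, id_book):
    """List (person, role) pairs for one book."""
    return conn.execute(
        "SELECT p.name_person, r.name_role "
        "FROM book_person_role AS bpr "
        "JOIN person AS p ON p.id_person = bpr.id_person "
        "JOIN role   AS r ON r.id_role   = bpr.id_role "
        "WHERE bpr.id_book = ? "
        "ORDER BY p.name_person, r.name_role",
        (id_book,),
    ).fetchall()
```

Adding a new role such as editor is then a plain INSERT into role, with no schema change.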

How to model a mutually exclusive relationship in SQL Server

I have to add functionality to an existing application and I've run into a data situation that I'm not sure how to model. I am restricted to the creation of new tables and code. If I need to alter the existing structure, I think my client may reject the proposal, although if it's the only way to get it right, this is what I will have to do.
I have an Item table that can be linked to any number of tables, and these tables may increase over time. An Item can only be linked to one other table, but the record in the other table may have many items linked to it.
Examples of the tables/entities being linked to are Person, Vehicle, Building, Office. These are all separate tables.
Examples of Items are Pen, Stapler, Cushion, Tyre, A4 Paper, Plastic Bag, Poster, Decoration.
For instance, a Poster may be allocated to a Person or Office or Building. In the future, if they add a Conference Room table, it may also be added to that.
My initial thoughts are:
Item
{
    ID,
    Name
}
LinkedItem
{
    ItemID,
    LinkedToTableName,
    LinkedToID
}
The LinkedToTableName field will then allow me to identify the correct table to link to in my code.
I'm not overly happy with this solution, but I can't quite think of anything else. Please help! :)
Thanks!
It is not a good practice to store table names as column values. This is a bad hack.
There are two standard ways of doing what you are trying to do. The first is called single-table inheritance. This is easily understood by ORM tools but trades off some normalization. The idea is that all of these entities (Person, Vehicle, whatever) are stored in the same table, often with several unused columns per entry, along with a discriminator field that identifies what type the entity is.
The discriminator field is usually an integer type, that is mapped to some enumeration in your code. It may also be a foreign key to some lookup table in your database, identifying which numbers correspond to which types (not table names, just descriptions).
The other way to do this is multiple-table inheritance, which is better for your database but not as easy to map in code. You do this by having a base table which defines some common properties of all the objects - perhaps just an ID and a name - and all of your "specific" tables (Person etc.) use the base ID as a unique foreign key (usually also the primary key).
In the first case, the exclusivity is implicit, since all entities are in one table. In the second case, the relationship is between the Item and the base entity ID, which also guarantees uniqueness.
Note that with multiple-table inheritance, you have a different problem - you can't guarantee that a base ID is used by exactly one inheritance table. It could be used by several, or not used at all. That is why multiple-table inheritance schemes usually also have a discriminator column, to identify which table is "expected." Again, this discriminator doesn't hold a table name, it holds a lookup value which the consumer may (or may not) use to determine which other table to join to.
Multiple-table inheritance is a closer match to your current schema, so I would recommend going with that unless you need to use this with Linq to SQL or a similar ORM.
See here for a good detailed tutorial: Implementing Table Inheritance in SQL Server.
Find something common to Person, Vehicle, Building, Office. For the lack of a better term I have used Entity. Then implement super-type/sub-type relationship between the Entity and its sub-types. Note that the EntityID is a PK and a FK in all sub-type tables. Now, you can link the Item table to the Entity (owner).
In this model, one item can belong to only one Entity; one Entity can have (own) many items.
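A minimal sketch of this model, assuming sqlite3 as a stand-in for SQL Server. Entity, EntityID and Item come from the answer; the Name columns on Person and Office are hypothetical, and only two of the sub-types are shown:

```python
import sqlite3

ENTITY_SCHEMA = """
CREATE TABLE Entity (
    EntityID INTEGER PRIMARY KEY
);
-- In every sub-type table, EntityID is both the PK and an FK to Entity.
CREATE TABLE Person (
    EntityID INTEGER PRIMARY KEY REFERENCES Entity(EntityID),
    Name     TEXT NOT NULL
);
CREATE TABLE Office (
    EntityID INTEGER PRIMARY KEY REFERENCES Entity(EntityID),
    Name     TEXT NOT NULL
);
CREATE TABLE Item (
    ID       INTEGER PRIMARY KEY,
    Name     TEXT NOT NULL,
    EntityID INTEGER NOT NULL REFERENCES Entity(EntityID)  -- the single owner
);
"""


def build_entity_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(ENTITY_SCHEMA)
    conn.executemany("INSERT INTO Entity VALUES (?)", [(1,), (2,)])
    conn.execute("INSERT INTO Person VALUES (1, 'Some Person')")
    conn.execute("INSERT INTO Office VALUES (2, 'Some Office')")
    conn.executemany(
        "INSERT INTO Item VALUES (?, ?, ?)",
        [(100, "Pen", 1), (101, "Stapler", 1), (102, "Poster", 2)],
    )
    conn.commit()
    return conn


def items_owned_by(conn, entity_id):
    """One Entity can own many items; each Item row names exactly one owner."""
    rows = conn.execute(
        "SELECT Name FROM Item WHERE EntityID = ? ORDER BY ID", (entity_id,)
    )
    return [row[0] for row in rows]
```

Adding a Conference Room later would mean one new sub-type table keyed on EntityID, with Item left untouched.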
Your link table is OK.
The trouble you will have is that you will need to generate dynamic SQL at runtime. Parameterized SQL does not typically allow the objects in the FROM list to be parameters.
If you want to avoid this, you may be able to denormalize a little, say by creating a table to hold the id (assuming the ids are unique across the other tables), the type_id representing which table is the source, and a generated description, e.g. the name value from the initial record.
You would trigger the creation of this denormalized list when the base info is modified, and you could use that for generalized queries, and then resort to your dynamic queries when needed at runtime.
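To make the dynamic SQL point concrete, here is a small sketch in Python with sqlite3 standing in for SQL Server. The LinkedItem columns come from the question; the allow-list LINKABLE_TABLES and the assumption that every linked table has ID and Name columns are hypothetical. The table name cannot be bound as a parameter, so it is checked against the allow-list before being spliced into the query text, while the id is still bound normally:

```python
import sqlite3

# Hypothetical allow-list: the only table names ever spliced into SQL text.
LINKABLE_TABLES = {"Person", "Office"}


def build_linked_db():
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE Person (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Office (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE Item   (ID INTEGER PRIMARY KEY, Name TEXT NOT NULL);
        CREATE TABLE LinkedItem (
            ItemID            INTEGER PRIMARY KEY REFERENCES Item(ID),
            LinkedToTableName TEXT NOT NULL,
            LinkedToID        INTEGER NOT NULL
        );
    """)
    conn.execute("INSERT INTO Person VALUES (1, 'Some Person')")
    conn.execute("INSERT INTO Office VALUES (1, 'Some Office')")
    conn.executemany("INSERT INTO Item VALUES (?, ?)", [(10, "Pen"), (11, "Poster")])
    conn.executemany(
        "INSERT INTO LinkedItem VALUES (?, ?, ?)",
        [(10, "Person", 1), (11, "Office", 1)],
    )
    conn.commit()
    return conn


def linked_owner_name(conn, item_id):
    """Resolve an item's owner by building the second query at runtime."""
    link = conn.execute(
        "SELECT LinkedToTableName, LinkedToID FROM LinkedItem WHERE ItemID = ?",
        (item_id,),
    ).fetchone()
    if link is None:
        return None
    table, linked_id = link
    if table not in LINKABLE_TABLES:
        raise ValueError(f"unexpected table name: {table!r}")
    # The table name is interpolated (it cannot be a bound parameter);
    # the id is still passed as a normal parameter.
    owner = conn.execute(
        f'SELECT Name FROM "{table}" WHERE ID = ?', (linked_id,)
    ).fetchone()
    return owner[0] if owner else None
```

Note that nothing in the database stops LinkedToID from pointing at a row that does not exist, which is the integrity gap the other answers describe.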